Thursday, October 30, 2014

Importance of knowing the odds


The scientific study of probability is a relatively new development, dating from the 16th century, when mathematicians such as Cardano, Huygens and Pascal first developed the mathematics of probability as the science of knowing "what are the odds" of a specific event happening. Yet the ability to estimate the likelihood of an event that could hurt us or benefit us is a vital mechanism for our development and survival as a species. When a father throws his baby in the air, he does it because he can and because it is fun for the child. But on a deeper level, this is one of the earliest lessons a parent gives in the concept of risk and reward – a vital skill for the child to master in order to shift the odds in its favor and to make the right decisions and choices as it grows and lives. Instead of living by Nike's "just do it" slogan, this lesson is the first step in learning what is likely to happen if I do it – and therefore, should I do it?

Probability is so central to our well-being as individuals and corporations because it answers the question of how likely an event is to occur – and yet most of us estimate it instinctively, not as part of a plan, method or strategy. Once we have figured out the likelihood of occurrence, we can decide on next steps that are aligned with the odds of the event happening. If chances of a thunderstorm in a specific area are high, we may change our plan to go camping there. If chances are high that someone will not respond to our marketing message, we may not send an expensive marketing offer to that customer.
So, whether in everyday life or in business, having some means of quantifying probabilities is a hugely important tool for navigating through life. The good news is that we all have such means – in fact, we are born with it. It is called the human brain! We have in-born, powerful data mining software that we should constantly work on improving, so that when we approach a situation that may be significant for our physical or emotional well-being, or for our pockets, we ask ourselves: "has something similar occurred before, and what was the outcome?" This then tells us whether to proceed or to retreat.
We spend our lives analyzing and comparing, and then we make decisions. So when we see the word "analytics" on a billboard, it wrongly implies that analytical technologies are used only by big companies, and only for commercial reasons. Not at all! We all do it, all the time. The human brain is a super powerful computer with intuition, creativity and serious massively parallel processing power, with billions of neurons acting simultaneously to make a decision, produce a thought, or assign the odds of something happening. However, it has one limitation: sheer number processing. If we have only 10 variables, each with 10 different values, there are 10 billion potential combinations – and that is when we rely on computer software and its algorithms to do the speedy number crunching for us, and to tell us the probability of something happening that we may benefit from, or of some situation that we want to avoid.
So, we all need some analytical capacity at our disposal, and at the level of organizations (profit or non-profit) it becomes an utmost necessity to have computerized analytical software. If all underlying conditions in a deterministic universe were known, there would be no probabilities – there would be certainty of a specific outcome! Can we ever come close to fully knowing the underlying conditions and causes, and their values, to absolute certainty for anything but the simplest events around us? Not a chance – this has zero probability! And that is why we need so much probabilistic knowledge. That is not to say that everything is predictable: in areas of science that stipulate indeterminacy, such as quantum mechanics or chaos theory, probabilistic models may not be equally applicable. Things get slightly more complex, or simpler (depending on how one looks at it), if we change our required scale of "determinism" and decide: well, I am not even interested in modeling cause and effect; instead I want to know associations, loose connections, linkages, patterns and correlations – anything that would incrementally and cumulatively increase the odds of knowing what will happen in a period of time, based on how a similar scenario played out before.
The fundamental question is: "would you play if you knew the odds?" That question is applicable in all spheres of life – it is almost like asking whether we need air to breathe. However, not everyone recognizes the importance of being able to answer that type of question, its purpose, and the value it can generate. It is somewhat easier to see the value in a colorful report, even if it presents trivial and non-actionable insight, than in a simple number that says the probability of our top customers switching to a competitor is high. But those who do see the value will have brighter colours to paint their reports with.
Goran Dragosavac

Tuesday, October 14, 2014

How government agencies in South Africa could benefit from greater use of analytics

In some of the most developed countries, there is pervasive use of analytics for a variety of purposes. The latest US General Accounting Office report shows the high level and types of usage across different departments, with the departments of defense and homeland security slightly ahead of the others. The primary purposes of using analytical technologies in the government sector are improving service or performance; detecting fraud, waste and abuse; analyzing scientific and research information; managing human resources; detecting criminal activities or patterns; and analyzing intelligence and detecting terrorist activities. This is motivated by growth in the volumes and availability of data collected by government agencies, and by advances in analytical technologies that can be deployed on such information. Another contributing factor is the decreasing cost of storage, which means that larger amounts of data can be kept more cheaply than ever before.

So, the question is: how can local adoption and consumption of analytics in the public sector be increased to levels close to the usage in so-called "first world" countries? There is tremendous need in South Africa for analytically-enabled applications across the board. Imagine the benefits of "early warning" systems (EWS) that can alert before a crisis, allowing for fast response times. This is applicable in all government departments – from early warning detection systems in Eskom's production units that could "ring the bell" just before an unplanned outage, to an early warning system that would indicate water pump failure, like the recent one that caused week-long water shortages in areas around Johannesburg.
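As a minimal sketch of the idea (the sensor values and thresholds below are entirely hypothetical, not from any actual Eskom or municipal system), an early warning detector can be as simple as flagging readings that deviate sharply from a rolling baseline:

```python
import random
from statistics import mean, stdev

def early_warnings(readings, window=30, threshold=3.0):
    """Flag indices whose reading deviates more than `threshold`
    standard deviations from the rolling baseline before it."""
    alerts = []
    for i in range(window, len(readings)):
        baseline = readings[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and abs(readings[i] - mu) / sigma > threshold:
            alerts.append(i)
    return alerts

# Synthetic pump-vibration data: a stable baseline with small noise,
# then a sharp spike injected as a hypothetical failure precursor.
random.seed(42)
vibration = [1.0 + random.gauss(0, 0.05) for _ in range(100)]
vibration[80] = 5.0
print(early_warnings(vibration))
```

A production system would of course use learned models rather than a fixed threshold, but the principle – compare the present against the recent past and alert on deviation – is the same.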

Word "crisis" is often used in context of South Africa’s public health delivery. Everyone knows that staff shortages are major contributor to the poor state of affairs in area of public health. What is not clearly known is the magnitude of the difference and ranking order between different hospitals in different areas. Some of the government policies and programs are not only ineffective in reducing problems but directly contributing to underlying cause. That means that real-time awareness and feed-back are painfully lacking. This is precisely where analytical technologies can massively assist, so that governmental programs and policies much better represent reality.
There was a case in one province where a school was built on one side of a river, while the majority of learners reside in rural villages on the other side. For most of the year the river is not difficult to cross, but for one month of the year it is heavily flooded, and dozens of children die every year by being swept away by the flood-streams. This could have been prevented through analytically enabled decision making.

Think of the case of a children's hospital where analytics found mismatches between the stated causes and the outcomes of injuries. If the stated cause is a fall from a bed, but the specific type of injury and its symptoms are highly unlikely to be produced by such a fall – could it be that the parents are lying and the child has been abused? After further scrutiny of such cases, that is exactly what was discovered. As a result, this children's hospital implemented a policy that for any injury where the stated cause doesn't match the expected set of symptoms, social workers are alerted to take a closer look at the family.
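A minimal sketch of such a mismatch check (the records and threshold below are invented for illustration, not from the hospital's actual system) estimates how often each injury type historically follows each stated cause, and flags cases whose combination is historically rare:

```python
from collections import Counter

# Hypothetical historical records of (stated_cause, injury_type).
history = (
    [("fall_from_bed", "bruised_limb")] * 90 +
    [("fall_from_bed", "spiral_fracture")] * 2 +
    [("sports", "spiral_fracture")] * 40
)

cause_counts = Counter(cause for cause, _ in history)
pair_counts = Counter(history)

def mismatch(cause, injury, threshold=0.05):
    """Flag when the estimated P(injury | stated cause) is below threshold."""
    p = pair_counts[(cause, injury)] / cause_counts[cause]
    return p < threshold

print(mismatch("fall_from_bed", "spiral_fracture"))  # rare combination -> True
print(mismatch("fall_from_bed", "bruised_limb"))     # common combination -> False
```

The real system would condition on far more context (age, injury severity, symptom clusters), but the core logic is this conditional-probability comparison.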

Another example: certain government hospitals are persistently far above average in the incidence of infant mortality. The fact that some of these hospitals are the worst on an ongoing basis suggests there is some negative pattern at work causing their mortality numbers to be worse than elsewhere. Analytics can potentially extract such a pattern, and by breaking it through appropriate measures and actions, one can reduce the problem to average or below-average levels – a reduction directly attributable to actionable analytics.
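Identifying such persistent outliers can start with something as simple as a standard-score comparison across facilities (the hospital names and figures below are hypothetical, purely to illustrate the mechanics):

```python
from statistics import mean, stdev

# Hypothetical infant mortality rates per 1,000 live births.
rates = {"Hospital A": 11.8, "Hospital B": 12.4, "Hospital C": 12.1,
         "Hospital D": 21.5, "Hospital E": 11.9, "Hospital F": 12.6,
         "Hospital G": 12.0, "Hospital H": 12.3, "Hospital I": 11.7,
         "Hospital J": 12.2}

mu, sigma = mean(rates.values()), stdev(rates.values())

# Flag facilities more than two standard deviations above the mean.
outliers = [h for h, r in rates.items() if (r - mu) / sigma > 2]
print(outliers)  # -> ['Hospital D']
```

Once a facility is flagged persistently across periods, the deeper analytical work begins: mining its case records for the specific pattern that sets it apart.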

And then there is the massive problem of fraud, waste and abuse, where analytics can be used for detection and ultimately prevention. But what is the main reason for the slow adoption of cutting-edge analytical technologies in the public sector? Yes, there may be issues with data quality and access, a shortage of skills and a lack of analytical technologies – but the biggest challenge is lack of motivation. Neither the penalty for doing nothing nor the reward for doing something is strong enough to drive the needed change of management culture. That is why improvement is hard to come by. Still, there are pockets of excellence in the public sector that prove analytical technologies can be used to vastly improve service delivery and performance, reduce fraud, better represent reality for better decision making, and ultimately make the idealistic concept of the "smart and just city" more achievable. In other words, there is a strong case for greater use of analytics to make communities built on sustainable economic development and a high quality of life – with less crime, greater and quicker justice delivery, wise management of natural resources and, last but not least, more effective transformation and empowerment of previously disadvantaged sectors of society – far more of a reality tomorrow than they are today.
Goran Dragosavac

Unexpected use of analytics


While the most common applications of analytics have been in database marketing and CRM-type applications, and most people associate analytics with these areas, it is hard to find any other area of human endeavor where analytics has not been used to either describe or predict – whether in research, science and technology, sports, politics, entertainment, or any other area where there is a question, historical data relevant to that question, and some analytical skill and technology.
In the wake of India's 16th national election, it was clear that some parties had taken a page from Obama's re-election campaign, which used – in a big way – technology, social media and big data to connect with voters. Analytics helped micro-segment the electorate, focusing on swing states and different gender and minority groupings, and tailoring messages for segment-specific audiences. Not to mention the use of analytics to rework advertising campaigns and, most importantly, to raise funds.
Another area of pervasive use of analytics is counter-terrorism, as well as defense and military agencies. A big lesson of the September 11 attacks was the importance of being able to integrate disparate pieces of data – anything from field reports to social media postings, from broadcast news accounts to e-commerce transactions, from bank records to records in classified government databases. Once the data is mapped and reduced, users can track high-value individuals and organizations of interest, and establish connections between people, organizations, events and places that would otherwise be difficult to make.
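A toy sketch of that kind of link analysis (the names and records below are entirely made up) treats people and the things they share – events, accounts, places – as nodes in one graph, and surfaces indirect connections by path search:

```python
from collections import defaultdict, deque

# Hypothetical records linking people to shared entities.
records = [("Alice", "flight_117"), ("Bob", "flight_117"),
           ("Bob", "account_9"), ("Carol", "account_9"),
           ("Dave", "hotel_Riviera")]

# Build an undirected graph over people and entities alike.
graph = defaultdict(set)
for person, entity in records:
    graph[person].add(entity)
    graph[entity].add(person)

def connection(a, b):
    """Shortest chain of shared entities linking a to b (breadth-first search)."""
    queue, seen = deque([[a]]), {a}
    while queue:
        path = queue.popleft()
        if path[-1] == b:
            return path
        for nxt in graph[path[-1]] - seen:
            seen.add(nxt)
            queue.append(path + [nxt])
    return None  # no connection found

print(connection("Alice", "Carol"))
```

Here Alice and Carol have never appeared in the same record, yet the search links them through a shared flight and a shared account – exactly the kind of non-obvious connection analysts are after.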
It is well known that the US Marine Corps is experimenting with big data analytics technologies, such as Hadoop and graph databases, for the purpose of quicker intelligence gathering and dissemination that can be used for real-time decision making by field commanders. Also, according to the US Department of Veterans Affairs, the number of suicides among veterans and active-duty military personnel is 22 a day – by all accounts an under-reported epidemic – and so analytical practitioners were approached to see how analytics could help spot patterns of suicide and prevent it before it occurs. The first step is bringing the disparate, relevant data together, then deriving a number of "stress load" factors and modeling the behavioral dynamics that could flip the switch and push a person toward suicide. If it can be modeled, it can be prevented; and while suicide prevention may not be an "exact science", it is easy to see how analytics can be hugely beneficial in this area.
In the area of entertainment, applications of analytics are limitless. Aziz Ansari, an American comedian, started inviting fans to participate with him in the development of new material. All they had to do was subscribe to his channel and give feedback on his new material. Of course, the catch was that by subscribing they gave him information about themselves, so Ansari could see which comic topics worked well or poorly with which segment of his fans, and he could use that feedback to adjust his material to the segment he was performing to. The media and entertainment industry of today is well aware that consumers have multiple ways of telling it what they think about its content, especially across social media. So it needs, in essence, to capture these digital voices, analyze them, and tweak and adjust content accordingly. Which segments hate a certain program, which segments like it (and therefore where marketing spend should be increased), who streamed the latest trailers – males or females? Who complained about moving it to a new time slot, and so on.
In sport, the use of analytics is equally ever-increasing: from analyzing the injury pattern of a football player before the transfer window, to work out the risk that, after millions of dollars are spent on him, he will be sidelined for months by yet another injury; to analyzing the patterns of play of a competing team, or even an individual player, in order to select the right players and devise tactics that negate the competitor's strengths. It is the new science of winning – whether a coach uses the neural networks in his brain or those of a computer algorithm.
But beyond the fun side, analytics has been used for more than winning customers, profits, audiences or games. It is increasingly used to save lives and property, whether in predicting the most likely path of a hurricane or the likely spread of a wildfire, flood or even disease. Analyzing patterns of injuries is a reality in many children's hospitals for the prevention of falls, traffic accidents, assaults, burns and so on. Knowing where these injuries happen most often, how and when, and who the most likely victims are becomes actionable intelligence that can be used in injury prevention programs to reduce suffering and save lives.
It is still early days, and there is tremendous potential to do more across the board; but big data technologies, coupled with the ability to mine and analyze different types of data – text, pictures, videos and sensor streams – will contribute to a new generation of analytical applications that will hopefully make the world even slightly more predictable, and a safer place.

Goran Dragosavac

Big data - proceed with caution


While the collection and analysis of big data hold great promise, quantity doesn't always translate to quality. Quantity of data is represented by the number of records and the number of variables, and one can argue that good old statistical sampling techniques are still relevant: if the variability is captured with a random sample, there will be very little incremental benefit, if any, in doing the analysis on all rows.
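A quick sketch illustrates the point (using synthetic data, purely for illustration): a modest random sample of a skewed population estimates its mean almost as well as scanning every row.

```python
import random

random.seed(7)

# Synthetic, skewed "population" of a million transaction amounts.
population = [random.expovariate(1 / 50.0) for _ in range(1_000_000)]
pop_mean = sum(population) / len(population)

# A 1% random sample recovers the mean closely - no full scan needed.
sample = random.sample(population, 10_000)
sample_mean = sum(sample) / len(sample)

print(round(pop_mean, 2), round(sample_mean, 2))
```

The standard error of a sample mean shrinks with the square root of the sample size, not the population size, which is exactly why the last 99% of rows buys so little extra precision here.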
The second dimension of big data is the number of variables. A data set of only 10 variables with 10 distinct values each gives potentially 10 billion pattern combinations; and as the number of variables increases, so does the potential for extracting spurious, non-explicable patterns and correlations.
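Both halves of that claim are easy to demonstrate (with synthetic data, for illustration only): the combination count is simply values raised to the number of variables, and among enough pairs of purely random variables, some pair will look strongly correlated by chance alone.

```python
import random
from math import sqrt

# 10 variables with 10 distinct values each: 10^10 combinations.
print(10 ** 10)  # 10000000000

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

# Among many pairs of *independent* random series, some pair
# correlates strongly by pure chance - a spurious pattern.
random.seed(1)
max_r = 0.0
for _ in range(2000):
    x = [random.gauss(0, 1) for _ in range(10)]
    y = [random.gauss(0, 1) for _ in range(10)]
    max_r = max(max_r, abs(pearson(x, y)))
print(round(max_r, 2))
```

The more variables (and hence pairs) a data set has, the more such accidental correlations surface, which is why significance correction and validation on held-out data matter more, not less, at scale.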
The key question remains: how can analytics handle streaming data that keeps increasing in volume? Each method, technique or algorithm has an optimal point, after which come diminishing returns, a plateau and then degradation, while computational requirements continue to grow. Some suggest that algorithms need to be rewritten to move the optimal point further down the path of data infinity.
And what makes it more challenging is that big data is characterized more by data variety and velocity than by sheer volume. Data doesn't only come in a standard structured format; it comes in a stream in the form of free text, pictures, sounds and whatever else may come into play. And it comes with a high degree of variability, where formats within the stream can change as the data is captured.
So, all this necessitates that analytical technologies be redesigned to take advantage of massively parallel processing architectures, exploit heterogeneous data arriving at high volume and velocity, and still produce robust and accurate models. Some argue that traditional statistical methods, which open more questions than they answer, may not survive in this data flood era, and that new machine-learning methods are needed to deal with "big data noise" and see the "big picture around the corner".
There is often the naïve assumption that analysis will happen on this big data as it has been collected. In most cases, only relevant subsets of data will be needed for analysis, which will be integrated with other relevant data sources and most likely aggregated to allow for knowledge induction and generalization.
There is also a big challenge around data management. Since data is collected at different time points and from different locations – temporal and spatial variability – and comes in different formats without adequate metadata describing who, what, when, how and from where, serious issues can arise in contextualizing and acting on the intelligence extracted from such data.
And then there are issues around system and component design. Since not all big data and all business requirements are the same, designers will need to carefully consider the functionality, conceptual model, organization and interfaces that would meet the needs of end-users. Answering the business question is more important than processing all the data, so knowing how much data is enough for a given set of business questions matters, since this will drive the design and architecture of the processing system.
Then there is the challenge of data ownership, privacy and security. Who owns Twitter or Facebook data – the service providers where the data is stored, or the account holders? There are serious attempts by researchers to develop algorithms that automatically randomize personal data within large data collections to mitigate privacy concerns. International Data Corporation has suggested five levels of increasing security – privacy, compliance-driven, custodial, confidential and lockdown – and there is still work ahead to define these security levels with respect to analytical exploitation before any legislative measures are in place.
One must not forget what the end-goal is here. It is about business value, and the advantage of being able to make decisions founded on big data analytics that were beyond reach before. The main challenge is to prioritize big data analytical engagements so that resources are spent on high-priority, high-value business questions. Successful completion of such complex big data analytics projects will require multiple experts from different domains and different locations to share data as well as analytical technologies, and to be able to provide input and share in the exploration of results.
Therefore, big data analytic systems must support collaboration in an equally big way! And lastly, the results of analytics must be interpretable to end-users and relevant to the questions at hand; measures of relevance and interest are needed to rank and reduce the sea of patterns so that only relevant, non-trivial and potentially useful results are presented. And in presenting and disseminating the results of analytics, visualization plays a special role for both interpretation and collaboration.
Not all of these challenges may be equally relevant in all situations, but it is helpful to at least be aware of them. While there will not be a U-turn on big data, or on big data analytics, the issue is how to address some of these challenges while keeping an eye on the ball – which is to ensure that big data technologies deliver on their promise of providing better answers, more quickly, to more complex questions.

Goran Dragosavac