Thursday, June 27, 2013

Main applications of analytics and data mining in healthcare

Disease Management

Disease management is concerned with both the predictive and the descriptive aspects of a specific disease: what is the probability of a specific disease outcome, and which factors are associated with that outcome, with the focus on actionable factors? One has to separate effects from causes for a specific disease, and that can be done by separating the event period from the period of input data collection. Disease management can involve a specific aspect of the disease whose resolution benefits not only health providers but, more importantly, the patient. The descriptive component of disease management involves desirable as well as undesirable patterns; acting on these patterns means either supporting them or breaking them, and then measuring the effects of these actions against specific disease management goals.
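The separation of the input period from the event period can be sketched as follows. This is only an illustration: the record layout, the cutoff dates and the choice of "readmission" as the outcome are all assumptions, not something taken from a real system.

```python
from datetime import date

# Hypothetical records: (patient_id, event_date, event_type).
records = [
    ("p1", date(2012, 3, 10), "lab_high_glucose"),
    ("p1", date(2012, 9, 2), "readmission"),
    ("p2", date(2012, 5, 21), "lab_high_glucose"),
    ("p2", date(2012, 6, 15), "clinic_visit"),
    ("p3", date(2012, 8, 30), "readmission"),
]

CUTOFF = date(2012, 7, 1)         # assumed end of the input period
OUTCOME_END = date(2012, 12, 31)  # assumed end of the event period
OUTCOME_EVENT = "readmission"     # assumed outcome of interest


def split_windows(rows, cutoff, outcome_end, outcome_event):
    """Inputs come only from before the cutoff, outcomes only from after it."""
    inputs, outcomes = {}, {}
    for pid, when, kind in rows:
        inputs.setdefault(pid, [])
        outcomes.setdefault(pid, False)
        if when < cutoff:
            inputs[pid].append(kind)
        elif when <= outcome_end and kind == outcome_event:
            outcomes[pid] = True
    return inputs, outcomes


inputs, outcomes = split_windows(records, CUTOFF, OUTCOME_END, OUTCOME_EVENT)
for pid in sorted(inputs):
    print(pid, inputs[pid], outcomes[pid])
```

Because nothing recorded after the cutoff can reach the inputs, a factor found to be associated with the outcome at least precedes it in time.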

Some of the examples of disease management questions:
    • If surgical procedure "X" is done, infection "Y" occurs within two weeks 45% of the time. Why? What are the reasons and contributing factors?
    • What seasonal patterns, if any, exist in emergency room nosocomial infections, and what are the contributing factors?
    • Why do some congestive heart failure (CHF) patients return to the heart clinic after bypass surgery for care within 3 months, while others don't?
    • Compare and contrast high length of stay patient groups based upon bed location, nursing teams, and treatment modalities.
    • Compare and contrast treatment results or glucose levels for type II diabetic patients for a given time period, by physician, gender, age group, etc.
    • What practice patterns for managing primary mammogram candidates will yield the best outcomes in terms of survival rates or complication rates at the least cost?
    • What percentage of women in membership between the ages 40 - 60 have had a mammogram in the last 12 months?
    • What is the comparative mean value of hypertension levels within a certain group or population of patients and does it fall within acceptable statistical levels? Do variations in clinical practice patterns have a cause and effect relationship?  
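
Some of these questions are simple descriptive queries. As a minimal sketch, the mammogram question could be answered like this, assuming a hypothetical membership table with sex, birth date and date of last mammogram, and approximating "the last 12 months" as 365 days:

```python
from datetime import date, timedelta

# Hypothetical rows: (member_id, sex, birth_date, last_mammogram or None).
members = [
    ("m1", "F", date(1965, 4, 2), date(2013, 1, 15)),
    ("m2", "F", date(1970, 11, 20), date(2011, 6, 1)),
    ("m3", "F", date(1958, 7, 9), None),
    ("m4", "M", date(1962, 2, 14), None),
    ("m5", "F", date(1985, 3, 3), date(2013, 5, 5)),
    ("m6", "F", date(1960, 9, 30), date(2012, 8, 12)),
]
AS_OF = date(2013, 6, 27)  # assumed reporting date


def age_on(birth, as_of):
    """Age in completed years on the reporting date."""
    years = as_of.year - birth.year
    if (as_of.month, as_of.day) < (birth.month, birth.day):
        years -= 1
    return years


def mammogram_coverage(rows, as_of, min_age=40, max_age=60):
    """Return (screened, eligible) counts for women in the age band."""
    since = as_of - timedelta(days=365)  # approximation of 12 months
    eligible = [r for r in rows
                if r[1] == "F" and min_age <= age_on(r[2], as_of) <= max_age]
    screened = [r for r in eligible if r[3] is not None and r[3] >= since]
    return len(screened), len(eligible)


screened, eligible = mammogram_coverage(members, AS_OF)
print(f"{screened} of {eligible} eligible women screened "
      f"({100 * screened / eligible:.0f}%)")
```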

Outcomes Analysis: Clinical and Financial

Clinical Outcomes
A Clinical Outcome is the result of medical or surgical intervention or nonintervention. It can refer to, but is not limited to, the following:

  • Mortality
  • Morbidity
  • Re-admittance rates
  • Changes in birth and death rates for a global population, for example, residents of a state
  • The outcome of a given diagnostic procedure, lab result or medical test
  • The results for a patient after care, for example, how long it took to restore the patient's ability to walk, or to work, or how long and to what degree did the patient have pain
  • Whether the patient recovered, and how long it took
  • The patient's own perception of their care and progress.
It is thought that through a historical record of outcome experiences, caregivers will know better which treatment modalities result in consistently better outcomes for patients. Effective Outcomes Management often relies on a successful data warehousing strategy designed to track historical outcome experiences in many areas such as epidemiological studies, lab results, responses to treatments, mortality and morbidity rates, length of patient stay and clinical effectiveness measures.

Financial Outcomes

The definition of a financial outcome varies depending upon an organization's goals and overall strategy. As an example, financial outcomes might cover measures such as hospital length of stay, net margins, cost breakouts, number of ER visits and office visits - just to name a few.

Fraud Detection

It would be nice if we could develop some type of industry wrapper to data mining technology for the health care market specifically. But for now, this may be an area of opportunity for AEs, because the industry has yet to spend many resources on fraud detection and has not developed sophisticated tools and technologies, not only for detecting fraud but also for predicting and catching it before claims adjudication.
Fraud and Abuse is usually defined as "the intentional deception or misrepresentation that an individual knows to be false or does not believe to be true and makes, knowing that the deception could result in some unauthorized benefit to himself/herself or some other person". The most frequent kind of fraud arises from a false statement or misrepresentation made, or caused to be made, that is material to entitlement or payment.
Violators and perpetrators of fraud may include physicians or other practitioners, a hospital or other institutional provider, a clinical laboratory or other supplier, an employee of any provider, a billing service, beneficiary, Medicare carrier employee or any person in a position to file a claim for payment or benefits.

Types of abuses

  • Misrepresentation of medical necessity: For example, a physician who recommends that eye cataract surgery be performed on a healthy eye.
  • Billing errors: Encompasses everything from billing the wrong date of service to up-coding.
  • Over-provision of services: Providing medically unnecessary tests to generate a fee.
  • Misrepresentation of services provided.
  • Offering or acceptance of kickbacks, and/or a routine waiver of co-payments.

Fraud schemes range from those perpetrated by individuals acting alone to broad-based activities by institutions or groups of individuals, sometimes employing sophisticated telemarketing and other promotional techniques to lure consumers into serving as the unwitting tools in the schemes. Seldom do perpetrators target only one insurer or target the public or private sector exclusively. Rather, most are found to be defrauding several private and public sector victims simultaneously.


Medical Errors

The issue of reducing medical errors has been a heated political topic and will continue to be controversial in the next several years. It is believed the key to decreasing these errors will be to properly identify them, analyze the causes, and then change the system and/or processes to prevent them from happening in the future. A November 1999 study by the U.S. Institute of Medicine (IOM) cited 90,000 avoidable deaths, 3 million medical errors and 2.2 million avoidable injuries each year attributable to medical errors. That's the equivalent of having one jumbo jet crash per day with 200 people dying in each crash.
The IOM defines medical error as "the failure to complete a planned action as intended or the use of a wrong plan to achieve an aim." An adverse event is defined as an injury caused by medical management rather than by the underlying disease or condition of the patient. Some adverse events are not preventable and they reflect the risk associated with treatment, such as a life-threatening allergic reaction to a drug when the patient had no known allergies to it. However, the patient who receives an antibiotic to which he or she is known to be allergic, goes into anaphylactic shock, and dies, represents a preventable adverse event.
Most people believe that medical errors usually involve drugs, such as a patient getting the wrong prescription or dosage, or mishandled surgeries, such as amputation of the wrong limb. However, there are many other types of medical errors, including:

  • Diagnostic error, such as misdiagnosis leading to an incorrect choice of therapy, failure to use an indicated diagnostic test, misinterpretation of test results, and failure to act on abnormal results.
  • Equipment failure
  • Infections, such as nosocomial and post-surgical wound infections.
  • Blood transfusion-related injuries
  • Misinterpretation of medical orders
  • Incorrect medicines and/or prescriptions
  • Surgical errors
  • Lab report errors.
Most errors result from problems created by today's complex health care system. But errors also happen when doctors and their patients have problems communicating. For example, a recent study supported by the Agency for Healthcare Research and Quality (AHRQ) found that doctors often do not do enough to help their patients make informed decisions. Uninvolved and uninformed patients are less likely to accept the doctor's choice of treatment and less likely to do what they need to do to make the treatment work.

Performance Management in Healthcare

Healthcare provider organizations use performance management methodologies to focus on their key challenges:

  • How are our resources (employees, physicians, capital assets) helping us to accomplish our strategic goals?
  • How are we going to excel at key business processes (access, throughput, value of service to patients)?
  • How are we going to create loyalty (patient satisfaction, physician referrals, market share) with our key stakeholders?
  • How are we going to sustain our ability (have enough financial resources) to enhance the value of the organization?

Full service performance management programs address each of those four perspectives. 



Thursday, April 18, 2013

Psychologist says maths can predict chances of divorce

A psychologist claims that a newly devised mathematical model can predict with 94% accuracy which couples will divorce - entirely on the basis of the first few minutes of a discussion about some disputed issue.  John Gottman, of the University of Washington, and two applied mathematicians analysed hundreds of videotaped conversations between couples in Professor Gottman's relationship research institute. They also analysed pulse rates and other physiological data to provide a "bitterness rating" for each conversation. 
The researchers were looking for what they called the "masters and disasters" of marriage. What mattered was not the dispute itself, but a couple's attitudes during the argument. "When the masters of marriage are talking about something important, they may be arguing, but they are also laughing and teasing and there are signs of affection because they have made emotional connections," Prof Gottman said. "But a lot of people don't know how to connect or how to build a sense of humour, and this means that a lot of fighting that couples engage in is a failure to make emotional connections.
"We wouldn't have known this without the mathematical model."
The researchers will take part in a symposium on love and marriage at the American Association for the Advancement of Science in Seattle tomorrow. On St Valentine's Day, they will produce the magic ratio of positive to negative interactions that is the mark of marital success. This ratio is 5 to 1: couples who keep their tempers and consider each other more than 80% of the time while arguing stand a chance of celebrating their golden wedding. Those who fall below this ratio might as well dial the lawyers, or at least the marriage guidance counsellors. The team say their model charts a "Dow Jones industrial average for marital conversation".
Prof Gottman has spent almost 30 years trying to discover what makes marriages work and fail. In 1999, he unveiled a systematic study of conversations between 124 couples who had been married less than nine months, and rated them for emotion, gesture and attitude. The "positive" codes were for affection, humour, joy, interest and validation. And then there were ratings for disgust, contempt, anger, fear, defensiveness, whining and sadness. At the end of three years, 17 couples had divorced.
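
The ratio itself is simple arithmetic: 5 positive interactions for every negative one means five out of every six coded moments are positive. A minimal sketch follows, using the code names listed above; the sample conversation is invented and the function is not the researchers' actual model.

```python
POSITIVE = {"affection", "humour", "joy", "interest", "validation"}
NEGATIVE = {"disgust", "contempt", "anger", "fear",
            "defensiveness", "whining", "sadness"}
MAGIC_RATIO = 5.0  # the 5-to-1 ratio quoted in the article


def interaction_ratio(codes):
    """Ratio of positive to negative codes in one coded conversation."""
    pos = sum(1 for c in codes if c in POSITIVE)
    neg = sum(1 for c in codes if c in NEGATIVE)
    return float("inf") if neg == 0 else pos / neg


# Invented example: 10 positive and 2 negative coded moments.
conversation = ["humour", "interest", "anger", "affection", "validation",
                "joy", "interest", "contempt", "humour", "joy",
                "affection", "interest"]

ratio = interaction_ratio(conversation)
positive_share = ratio / (ratio + 1)
print(f"ratio {ratio:.1f} to 1, positive share {positive_share:.0%}, "
      f"meets magic ratio: {ratio >= MAGIC_RATIO}")
```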

Tread with caution when building pregnancy models

Many retailers know that if they could really anticipate our purchasing patterns and where they lead, it would be very beneficial to them, since they could reach the customer more quickly and efficiently. And for many retailers the “holy grail” application in the family or women’s segment is pregnancy prediction.
We all know that the life of any individual or family is very different in terms of priorities, habits and shopping behavior before and after a baby is born. At the very least, no one would dispute that it should be different. So, being able to time such an “earth-shattering” event, where the old world is gone and a new star is born, and then “help” that individual or family by peddling your own products ahead of competitors, can really get you a large share of their wallet for the purpose of serving their needs better than the competition. What is wrong with that? Well, a few things can go wrong here, mostly in the “privacy” department, and some “smarties” who got ahead of themselves eventually learned their lesson and had to take a few steps back.
Let’s start by conceptually outlining how you could build a pregnancy predictive model, before putting up a few warning signs of the “proceed with caution” or “danger ahead” kind.
The first thing you need to do is to put a "carrot on the hook" for any female customers who would be willing to share their pregnancy secret with you (first or second trimester preferable) in exchange for some hefty promotional discounts. Once you have a critical mass of newly pregnant customers, it is just a matter of capturing their purchasing history so that you can differentiate between them and the rest (the non-pregnant segment) in the form of a robust and accurate predictive model. Once such a model is in place, it is a matter of implementing it, monitoring it and measuring the value it generates.
All sounds well and good. Here is the reality.
Once upon a time there was a very clever man in a very clever marketing department of a forward-thinking retail company. That man created a very smart data-mining model that could predict whether a female customer was pregnant. Soon afterwards, a mailing went out to the customers the model flagged as likely pregnant. As the story goes, some customers were very impressed and wondered “how did they know?” But there were some who were not so impressed, and they asked a different question: “how did they dare to know?” There were also some who felt wrongly “impregnated”, like the father who stormed into the marketing department accusing them of leading his teenage daughter into getting pregnant so that they could sell her their new range of baby products. A few months later the same father ended up sending a letter of apology after discovering that his daughter was indeed pregnant! Needless to say, he was completely stunned that this retailer knew something he did not, even though his daughter lived with him.
The biggest problem was that many customers felt spied on and felt that their privacy had been compromised, so they started cutting ties with this retailer and doing everything they could to hide their purchasing behavior. This prompted the retailer to adjust how it executed the model. The only remedy was to blur the fact that it had such probabilistic knowledge. This resulted in promotions where baby-product coupons were masked with other vouchers, so it was no longer obvious that marketers had such knowledge, which kept customers at ease. So, if you are competing for the baby product market, think carefully about how you navigate this. There could be some stormy waters just when you think it is smooth sailing.
Goran Dragosavac


Wednesday, April 17, 2013

Text Mining on the F-word

I have a colleague who works as an analytics practitioner, and recently she was involved in a banking project analyzing free-text data collected online.

The idea was to hear who is talking out there about this company, what they are saying, how influential the voices are, what the sentiment is, what the critical mass is, and so on. And there is no better word to start your exploration of negative sentiment than the F-word, and then go on from there.

The next thing my colleague did was to use a technique called concept linking, which takes a selected word, in this case the F-word, and produces a graphical display of the linkages between that word and other entities. The thicker the link, the stronger the connection between the words.
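
I don't know the internals of the tool she used, but the basic idea of linking a selected word to the terms around it can be approximated by counting co-occurrences. Here is a rough sketch with invented comments, a milder stand-in word and a hand-picked stopword list; the length of the text bar plays the role of link thickness.

```python
import re
from collections import Counter

# Invented customer comments; "damn" stands in for the actual expletive.
comments = [
    "the app is damn slow and the fees are too high",
    "damn fees again, I am closing my account",
    "branch staff were friendly and helpful",
    "why is the app so damn slow today",
]
STOPWORDS = {"the", "is", "and", "are", "too", "i", "am", "my",
             "so", "why", "again", "today", "were"}


def link_strengths(docs, target, stopwords):
    """Count, per term, the documents in which it appears with the target."""
    links = Counter()
    for text in docs:
        words = set(re.findall(r"[a-z']+", text.lower()))
        if target in words:
            links.update(words - stopwords - {target})
    return links


links = link_strengths(comments, "damn", STOPWORDS)
for term, count in sorted(links.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"damn {'=' * count} {term}")
```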

So, there she was, sitting with a senior bank manager who was probably dressed in a grey suit and tie, using some neat technologies for linguistic exploration to find the most F–ed up areas of the business. Isn't this just pure pragmatism! Basically, let’s see what the customers are most angry about before we see if we can do something about it.
Next time someone throws an expletive in your face, don’t get angry; try to learn from it!
Goran Dragosavac

Sunday, February 10, 2013

The benefits of having a multidimensional view of the customer

When we talk about a multi-dimensional view of the customer, we mean a view of the customer purely from the business perspective, in relation to profitability, risk, responsiveness, loyalty, behavior and preferences. The ability to see a single customer along these dimensions would undoubtedly give any organization an incredible competitive advantage: being able to serve customers better and, in return, being rewarded with a larger share of the customer’s wallet.

And this can certainly be achieved by using predictive analytics, whose main outputs are probabilities, which in this case would be the probability of being loyal, profitable, etc. So if you know that a specific customer is of low value to you, with a cost of serving far greater than the value he brings you in terms of profitability, who cares about his loyalty and preferences? You don’t want to waste a cent of marketing budget on him; in fact, you want to open the door as wide as you can and let him go. By contrast, a customer in the highest value segment whose loyalty scores are dwindling deserves a phone call from your top account managers to see how you can improve your service to him. And if you know his preferences and buying habits, you know what to offer him to make him rethink his intention to leave you. This is what we mean by knowing the next best action toward the customer, even if that action is: don’t do anything, let him go!
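
To make this concrete, here is a toy sketch of how a few model scores could be turned into a next best action. The score names, the thresholds and the action labels are all made up for illustration; in practice they would come from your own models and business rules.

```python
from dataclasses import dataclass


@dataclass
class CustomerScores:
    value: float           # hypothetical probability of being profitable
    loyalty: float         # hypothetical probability of staying
    responsiveness: float  # hypothetical probability of responding to an offer


def next_best_action(s, low_value=0.3, high_value=0.7, low_loyalty=0.4):
    """Map a constellation of probabilities to one action (assumed thresholds)."""
    if s.value < low_value:
        return "no action"  # let the customer go
    if s.value >= high_value and s.loyalty < low_loyalty:
        return "call from account manager"
    if s.responsiveness >= 0.5:
        return "send preferred offer"
    return "monitor"


print(next_best_action(CustomerScores(0.1, 0.9, 0.9)))
print(next_best_action(CustomerScores(0.9, 0.2, 0.1)))
print(next_best_action(CustomerScores(0.5, 0.8, 0.6)))
```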
At this level, analytics are no longer used for decision support; they are used for decision making. All you need is to look at the constellation of these probabilities, and the next action toward a specific customer or customer segment becomes crystal clear. You just need to act on what these numbers are telling you and “count the blessings”. This is what I call the “holy grail” of analytics, and until an organization can do all of the above and more, its usage of analytics is nowhere close to optimal. For those companies that use analytics at that level and for those purposes, the pay-offs are huge, but you may never know about it.
Goran Dragosavac

Thursday, January 31, 2013

Nine Laws of Data Mining

by Tom Khabaza

This content was created during the first quarter of 2010 to publish the “Nine Laws of Data Mining”, which explain the reasons underlying the data mining process. If you prefer brevity, see my tweets. If you are a member of LinkedIn, see the “9 Laws of Data Mining” subgroup of the CRISP-DM group for a discussion forum. This page contains laws 1-4, with further laws on additional pages. The 9 Laws are also expressed as haikus here.

Data mining is the creation of new knowledge in natural or artificial form, by using business knowledge to discover and interpret patterns in data. In its current form, data mining as a field of practice came into existence in the 1990s, aided by the emergence of data mining algorithms packaged within workbenches so as to be suitable for business analysts. Perhaps because of its origins in practice rather than in theory, relatively little attention has been paid to understanding the nature of the data mining process. The development of the CRISP-DM methodology in the late 1990s was a substantial step towards a standardised description of the process that had already been found successful and was (and is) followed by most practising data miners.


Although CRISP-DM describes how data mining is performed, it does not explain what data mining is or why the process has the properties that it does. In this paper I propose nine maxims or “laws” of data mining (most of which are well-known to practitioners), together with explanations where known. This provides the start of a theory to explain (and not merely describe) the data mining process.

It is not my purpose to criticise CRISP-DM; many of the concepts introduced by CRISP-DM are crucial to the understanding of data mining outlined here, and I also depend on CRISP-DM’s common terminology. This is merely the next step in the process that started with CRISP-DM.


1st Law of Data Mining – “Business Goals Law”:

Business objectives are the origin of every data mining solution

This defines the field of data mining: data mining is concerned with solving business problems and achieving business goals. Data mining is not primarily a technology; it is a process, which has one or more business objectives at its heart. Without a business objective (whether or not this is articulated), there is no data mining.

Hence the maxim: “Data Mining is a Business Process”.


2nd Law of Data Mining – “Business Knowledge Law”:

Business knowledge is central to every step of the data mining process

This defines a crucial characteristic of the data mining process. A naive reading of CRISP-DM would see business knowledge used at the start of the process in defining goals, and at the end of the process in guiding deployment of results. This would be to miss a key property of the data mining process, that business knowledge has a central role in every step.

For convenience I use the CRISP-DM phases to illustrate:

· Business understanding must be based on business knowledge, and so must the mapping of business objectives to data mining goals. (This mapping is also based on data knowledge and data mining knowledge).

· Data understanding uses business knowledge to understand which data is related to the business problem, and how it is related.

· Data preparation means using business knowledge to shape the data so that the required business questions can be asked and answered. (For further detail see the 3rd Law – the Data Preparation law).

· Modelling means using data mining algorithms to create predictive models and interpreting both the models and their behaviour in business terms – that is, understanding their business relevance.

· Evaluation means understanding the business impact of using the models.

· Deployment means putting the data mining results to work in a business process.

In summary, without business knowledge, not a single step of the data mining process can be effective; there are no “purely technical” steps. Business knowledge guides the process towards useful results, and enables the recognition of those results that are useful. Data mining is an iterative process, with business knowledge at its core, driving continual improvement of results.

The reason behind this can be explained in terms of the “chasm of representation” (an idea used by Alan Montgomery in data mining presentations of the 1990s). Montgomery pointed out that the business goals in data mining refer to the reality of the business, whereas investigation takes place at the level of data which is only a representation of that reality; there is a gap (or “chasm”) between what is represented in the data and what takes place in the real world. In data mining, business knowledge is used to bridge this gap; whatever is found in the data has significance only when interpreted using business knowledge, and anything missing from the data must be provided through business knowledge. Only business knowledge can bridge the gap, which is why it is central to every step of the data mining process.


3rd Law of Data Mining – “Data Preparation Law”:

Data preparation is more than half of every data mining process

It is a well-known maxim of data mining that most of the effort in a data mining project is spent in data acquisition and preparation. Informal estimates vary from 50 to 80 percent. Naive explanations might be summarised as “data is difficult”, and moves to automate various parts of data acquisition, data cleaning, data transformation and data preparation are often viewed as attempts to mitigate this “problem”. While automation can be beneficial, there is a risk that proponents of this technology will believe that it can remove the large proportion of effort which goes into data preparation. This would be to misunderstand the reasons why data preparation is required in data mining.

The purpose of data preparation is to put the data into a form in which the data mining question can be asked, and to make it easier for the analytical techniques (such as data mining algorithms) to answer it. Every change to the data of any sort (including cleaning, large and small transformations, and augmentation) means a change to the problem space which the analysis must explore. The reason that data preparation is important, and forms such a large proportion of data mining effort, is that the data miner is deliberately manipulating the problem space to make it easier for their analytical techniques to find a solution.

There are two aspects to this “problem space shaping”. The first is putting the data into a form in which it can be analysed at all – for example, most data mining algorithms require data in a single table, with one record per example. The data miner knows this as a general parameter of what the algorithm can do, and therefore puts the data into a suitable format. The second aspect is making the data more informative with respect to the business problem – for example, certain derived fields or aggregates may be relevant to the data mining question; the data miner knows this through business knowledge and data knowledge. By including these fields in the data, the data miner manipulates the search space to make it possible or easier for their preferred techniques to find a solution.
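
Both aspects can be illustrated with a small sketch. It assumes a hypothetical transaction log and reshapes it into one record per customer, with derived aggregate fields; which aggregates are actually informative is exactly what business and data knowledge must decide.

```python
from collections import defaultdict

# Hypothetical transaction log: (customer_id, ISO date, amount).
transactions = [
    ("c1", "2013-01-05", 120.0),
    ("c1", "2013-02-11", 80.0),
    ("c2", "2013-01-20", 15.5),
    ("c1", "2013-03-02", 100.0),
    ("c2", "2013-03-15", 24.5),
]


def one_row_per_customer(rows):
    """Reshape many rows per customer into a single table, one record each."""
    grouped = defaultdict(list)
    for customer_id, day, amount in rows:
        grouped[customer_id].append((day, amount))
    table = []
    for customer_id in sorted(grouped):
        items = sorted(grouped[customer_id])  # ISO dates sort chronologically
        amounts = [amount for _, amount in items]
        table.append({
            "customer_id": customer_id,
            "n_purchases": len(amounts),                # derived field
            "total_spend": sum(amounts),                # derived aggregate
            "avg_spend": sum(amounts) / len(amounts),   # derived aggregate
            "last_purchase": items[-1][0],
        })
    return table


for record in one_row_per_customer(transactions):
    print(record)
```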

It is therefore essential that data preparation is informed in detail by business knowledge, data knowledge and data mining knowledge. These aspects of data preparation cannot be automated in any simple way.

This law also explains the otherwise paradoxical observation that even after all the data acquisition, cleaning and organisation that goes into creating a data warehouse, data preparation is still crucial to, and more than half of, the data mining process. Furthermore, even after a major data preparation stage, further data preparation is often required during the iterative process of building useful models, as shown in the CRISP-DM diagram.

4th Law of Data Mining – “NFL-DM”:

The right model for a given application can only be discovered by experiment

or “There is No Free Lunch for the Data Miner”

It is an axiom of machine learning that, if we knew enough about a problem space, we could choose or design an algorithm to find optimal solutions in that problem space with maximal efficiency. Arguments for the superiority of one algorithm over others in data mining rest on the idea that data mining problem spaces have one particular set of properties, or that these properties can be discovered by analysis and built into the algorithm. However, these views arise from the erroneous idea that, in data mining, the data miner formulates the problem and the algorithm finds the solution. In fact, the data miner both formulates the problem and finds the solution – the algorithm is merely a tool which the data miner uses to assist with certain steps in this process.

There are 5 factors which contribute to the necessity for experiment in finding data mining solutions:

1. If the problem space were well-understood, the data mining process would not be needed – data mining is the process of searching for as yet unknown connections.

2. For a given application, there is not only one problem space; different models may be used to solve different parts of the problem, and the way in which the problem is decomposed is itself often the result of data mining and not known before the process begins.

3. The data miner manipulates, or “shapes”, the problem space by data preparation, so that the grounds for evaluating a model are constantly shifting.

4. There is no technical measure of value for a predictive model (see 8th law).

5. The business objective itself undergoes revision and development during the data mining process, so that the appropriate data mining goals may change completely.

This last point, the ongoing development of business objectives during data mining, is implied by CRISP-DM but is often missed. It is widely known that CRISP-DM is not a “waterfall” process in which each phase is completed before the next begins. In fact, any CRISP-DM phase can continue throughout the project, and this is as true for Business Understanding as it is for any other phase. The business objective is not simply given at the start, it evolves throughout the process. This may be why some data miners are willing to start projects without a clear business objective – they know that business objectives are also a result of the process, and not a static given.

Wolpert’s “No Free Lunch” (NFL) theorem, as applied to machine learning, states that no one bias (as embodied in an algorithm) will be better than any other when averaged across all possible problems (datasets). This is because, if we consider all possible problems, their solutions are evenly distributed, so that an algorithm (or bias) which is advantageous for one subset will be disadvantageous for another. This is strikingly similar to what all data miners know, that no one algorithm is the right choice for every problem. Yet the problems or datasets tackled by data mining are anything but random, and most unlikely to be evenly distributed across the space of all possible problems – they represent a very biased sample, so why should the conclusions of NFL apply? The answer relates to the factors given above: because problem spaces are initially unknown, because multiple problem spaces may relate to each data mining goal, because problem spaces may be manipulated by data preparation, because models cannot be evaluated by technical means, and because the business problem itself may evolve. For all these reasons, data mining problem spaces are developed by the data mining process, and subject to constant change during the process, so that the conditions under which the algorithms operate mimic a random selection of datasets and Wolpert’s NFL theorem therefore applies. There is no free lunch for the data miner.
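
The practical consequence, that the better model has to be found by trying, can be shown with a toy experiment. The sketch below is an illustration only: the two tiny one-dimensional datasets are invented, and the two candidate models (a single-threshold rule and a 1-nearest-neighbour rule) are deliberately simple stand-ins for real algorithms.

```python
def fit_stump(train):
    """Threshold rule: predict 1 when x exceeds the best training threshold."""
    xs = sorted(x for x, _ in train)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def errors(t):
        return sum((x > t) != bool(y) for x, y in train)

    best = min(candidates, key=errors)
    return lambda x: int(x > best)


def fit_1nn(train):
    """1-nearest neighbour: copy the label of the closest training point."""
    return lambda x: min(train, key=lambda row: abs(row[0] - x))[1]


def accuracy(model, test):
    return sum(model(x) == y for x, y in test) / len(test)


# Invented data. "threshold-shaped": class 1 above 5.5, one noisy label at x=3.
# "band-shaped": class 1 only in the middle of the range.
datasets = {
    "threshold-shaped": (
        [(1, 0), (2, 0), (3, 1), (4, 0), (5, 0),
         (6, 1), (7, 1), (8, 1), (9, 1), (10, 1)],
        [(2.6, 0), (3.4, 0), (4.5, 0), (6.5, 1), (8.5, 1)],
    ),
    "band-shaped": (
        [(1, 0), (2, 0), (3, 0), (4, 1), (5, 1),
         (6, 1), (7, 1), (8, 0), (9, 0), (10, 0)],
        [(2.5, 0), (4.5, 1), (6.5, 1), (8.5, 0), (9.5, 0)],
    ),
}

results = {}
for name, (train, test) in datasets.items():
    results[name] = {
        "stump": accuracy(fit_stump(train), test),
        "1nn": accuracy(fit_1nn(train), test),
    }
    print(name, results[name])
```

On the first dataset the threshold rule wins because it ignores the noisy point; on the second the nearest-neighbour rule wins because a single threshold cannot describe a band. Neither choice could be made safely without running the experiment.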

This describes the data mining process in general. However, there may well be cases where the ground is already “well-trodden” – the business goals are stable, the data and its pre-processing are stable, an acceptable algorithm or algorithms and their role(s) in the solution have been discovered and settled upon. In these situations, some of the properties of the generic data mining process are lessened. Such stability is temporary, because both the relation of the data to the business (see 2nd law) and our understanding of the problem (see 9th law) will change. However, as long as this stability lasts, the data miner’s lunch may be free, or at least relatively inexpensive.


5th Law of Data Mining – “Watkins’ Law”:

There are always patterns

This law was first stated by David Watkins. We might expect that a proportion of data mining projects would fail because the patterns needed to solve the business problem are not present in the data, but this does not accord with the experience of practising data miners.

Previous explanations have suggested that this is because:

· There is always something interesting to be found in a business-relevant dataset, so that even if the expected patterns were not found, something else useful would be found (this does accord with data miners’ experience), and

· A data mining project would not be undertaken unless business experts expected that patterns would be present, and it should not be surprising that the experts are usually right.

However, Watkins formulated this in a simpler and more direct way: “There are always patterns”, and this accords more accurately with the experience of data miners than either of the previous explanations. Watkins later amended this to mean that in data mining projects about customer relationships, there are always patterns connecting customers’ previous behaviour with their future behaviour, and that these patterns can be used profitably (“Watkins’ CRM Law”). However, data miners’ experience is that this is not limited to CRM problems – there are always patterns in any data mining problem (“Watkins’ General Law”).

The explanation of Watkins’ General Law is as follows:

· The business objective of a data mining project defines the domain of interest, and this is reflected in the data mining goal.

· Data relevant to the business objective and consequent data mining goal is generated by processes within the domain.

· These processes are governed by rules, and the data that is generated by the processes reflects those rules.

· In these terms, the purpose of the data mining process is to reveal the domain rules by combining pattern-discovery technology (data mining algorithms) with the business knowledge required to interpret the results of the algorithms in terms of the domain.

· Data mining requires relevant data, that is data generated by the domain processes in question, which inevitably holds patterns from the rules which govern these processes.

To summarise this argument: there are always patterns because they are an inevitable by-product of the processes which produce the data. To find the patterns, start from the process or what you know of it – the business knowledge.

Discovery of these patterns also forms an iterative process with business knowledge; the patterns contribute to business knowledge, and business knowledge is the key component required to interpret the patterns. In this iterative process, data mining algorithms simply link business knowledge to patterns which cannot be observed with the naked eye.

If this explanation is correct, then Watkins’ law is entirely general. There will always be patterns for every data mining problem in every domain unless there is no relevant data; this is guaranteed by the definition of relevance.


6th Law of Data Mining – “Insight Law”: Data mining amplifies perception in the business domain

How does data mining produce insight? This law approaches the heart of data mining – why it must be a business process and not a technical one. Business problems are solved by people, not by algorithms. The data miner and the business expert “see” the solution to a problem, that is the patterns in the domain that allow the business objective to be achieved. Thus data mining is, or assists as part of, a perceptual process. Data mining algorithms reveal patterns that are not normally visible to human perception. The data mining process integrates these algorithms with the normal human perceptual process, which is active in nature. Within the data mining process, the human problem solver interprets the results of data mining algorithms and integrates them into their business understanding, and thence into a business process.

This is similar to the concept of an “intelligence amplifier”. Early in the field of Artificial Intelligence, it was suggested that the first practical outcomes from AI would be not intelligent machines, but rather tools which acted as “intelligence amplifiers”, assisting human users by boosting their mental capacities and therefore their effective intelligence. Data mining provides a kind of intelligence amplifier, helping business experts to solve business problems in a way which they could not achieve unaided.

In summary: Data mining algorithms provide a capability to detect patterns beyond normal human capabilities. The data mining process allows data miners and business experts to integrate this capability into their own problem solving and into business processes.


7th Law of Data Mining – “Prediction Law”: Prediction increases information locally by generalisation

The term “prediction” has become the accepted description of what data mining models do – we talk about “predictive models” and “predictive analytics”. This is because some of the most popular data mining models are often used to “predict the most likely outcome” (as well as indicating how likely the outcome may be). This is the typical use of classification and regression models in data mining solutions.

However, other kinds of data mining models, such as clustering and association models, are also characterised as “predictive”; this is a much looser sense of the term. A clustering model might be described as “predicting” the group into which an individual falls, and an association model might be described as “predicting” one or more attributes on the basis of those that are known.

Similarly we might analyse the use of the term “predict” in different domains: a classification model might be said to predict customer behaviour – more properly we might say that it predicts which customers should be targeted in a certain way, even though not all the targeted individuals will behave in the “predicted” manner. A fraud detection model might be said to predict whether individual transactions should be treated as high-risk, even though not all those so treated are in fact cases of fraud.

These broad uses of the term “prediction” have led to the term “predictive analytics” as an umbrella term for data mining and the application of its results in business solutions. But we should remain aware that this is not the ordinary everyday meaning of “prediction” – we cannot expect to predict the behaviour of a specific individual, or the outcome of a specific fraud investigation.

What, then, is “prediction” in this sense? What do classification, regression, clustering and association algorithms and their resultant models have in common? The answer lies in “scoring”, that is the application of a predictive model to a new example. The model produces a prediction, or score, which is a new piece of information about the example. The available information about the example in question has been increased, locally, on the basis of the patterns found by the algorithm and embodied in the model, that is on the basis of generalisation or induction. It is important to remember that this new information is not “data”, in the sense of a “given”; it is information only in the statistical sense.
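
Scoring, as described above, can be sketched in a few lines. This is a minimal illustration, not a real modelling method: the "model" is just the observed outcome rate per customer segment, and the field names (`segment`, `score`), the function names and the history records are all hypothetical.

```python
from collections import defaultdict


def fit_rates(history):
    """Generalise from past examples: outcome rate per segment (a toy model)."""
    counts = defaultdict(lambda: [0, 0])          # segment -> [positives, total]
    for segment, outcome in history:
        counts[segment][0] += outcome
        counts[segment][1] += 1
    return {seg: pos / total for seg, (pos, total) in counts.items()}


def score(model, example):
    """Apply the model to a new example, returning a copy with a score added."""
    enriched = dict(example)
    enriched["score"] = model.get(example["segment"])  # None for unseen segments
    return enriched


# Hypothetical history: (segment, 1 if the customer responded else 0).
history = [("A", 1), ("A", 1), ("A", 0), ("A", 1),
           ("B", 0), ("B", 0), ("B", 1), ("B", 0)]
model = fit_rates(history)

new_customer = {"id": 42, "segment": "A"}
print(score(model, new_customer))   # {'id': 42, 'segment': 'A', 'score': 0.75}
```

The new example arrives with two attributes and leaves with three. The added score was never observed for this customer. It is inferred from the pattern in the history, which is the sense in which the information is increased "locally by generalisation".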


8th Law of Data Mining – “Value Law”: The value of data mining results is not determined by the accuracy or stability of predictive models

Accuracy and stability are useful measures of how well a predictive model makes its predictions. Accuracy means how often the predictions are correct (where they are truly predictions) and stability means how much (or rather how little) the predictions would change if the data used to create the model were a different sample from the same population. Given the central role of the concept of prediction in data mining, the accuracy and stability of a predictive model might be expected to determine its value, but this is not the case.
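
The two measures defined above can be made concrete with a small sketch. Everything here is an assumption chosen for illustration: the one-variable threshold learner, the two training samples and the holdout set are invented, and stability is approximated as the share of holdout cases on which models built from two different samples agree.

```python
def fit_threshold(sample):
    """Toy learner: cut halfway between the mean x of class 0 and of class 1."""
    xs0 = [x for x, y in sample if y == 0]
    xs1 = [x for x, y in sample if y == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2


def predict(threshold, xs):
    return [1 if x > threshold else 0 for x in xs]


def accuracy(predictions, actuals):
    """How often the predictions are correct."""
    return sum(p == a for p, a in zip(predictions, actuals)) / len(actuals)


def agreement(preds_a, preds_b):
    """Stability proxy: how little predictions change between two samples."""
    return sum(a == b for a, b in zip(preds_a, preds_b)) / len(preds_a)


# Two hypothetical samples drawn from the same population, plus a holdout set.
sample_1 = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (8, 1)]
sample_2 = [(2, 0), (3, 0), (4, 0), (7, 1), (8, 1), (9, 1)]
holdout = [(1, 0), (4, 0), (5, 1), (6, 1), (9, 1)]

xs = [x for x, _ in holdout]
ys = [y for _, y in holdout]
preds_1 = predict(fit_threshold(sample_1), xs)
preds_2 = predict(fit_threshold(sample_2), xs)

print("accuracy, model 1:", accuracy(preds_1, ys))    # 1.0
print("accuracy, model 2:", accuracy(preds_2, ys))    # 0.8
print("stability (agreement):", agreement(preds_1, preds_2))  # 0.8
```

Both numbers are purely technical: neither says anything about whether the model is predicting the right thing for the business.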

The value of a predictive model arises in two ways:

· The model’s predictions drive improved (more effective) action, and

· The model delivers insight (new knowledge) which leads to improved strategy.

In the case of insight, accuracy is connected only loosely to the value of any new knowledge delivered. Some predictive capability may be necessary to convince us that the discovered patterns are real. However, a model which is incomprehensibly complex or totally opaque may be highly accurate in its predictions, yet deliver no useful insight, whereas a simpler and less accurate model may be much more useful for delivering insight.

The disconnect between accuracy and value in the case of improved action is less obvious, but still present, and can be highlighted by the question “Is the model predicting the right thing, and for the right reasons?” In other words, the value of a model derives as much from its fit to the business problem as it does from its predictive accuracy. For example, a customer attrition model might make highly accurate predictions, yet make its predictions too late for the business to act on them effectively. Alternatively an accurate customer attrition model might drive effective action to retain customers, but only for the least profitable subset of customers. A high degree of accuracy does not enhance the value of these models when they have a poor fit to the business problem.
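
The attrition example can be put into numbers with a rough sketch. All figures are invented for illustration (six customers, a 50% save rate, a contact cost of 10), and `campaign_value` is a hypothetical helper, not an established metric. The point is only that a model's accuracy ranking and its business-value ranking can disagree.

```python
def accuracy(customers, flagged):
    """Share of customers whose churn / no-churn status the model gets right."""
    hits = sum((c["id"] in flagged) == c["churns"] for c in customers)
    return hits / len(customers)


def campaign_value(customers, flagged, save_rate=0.5, contact_cost=10.0):
    """Margin retained by contacting flagged customers, minus contact costs."""
    value = 0.0
    for c in customers:
        if c["id"] in flagged:
            value -= contact_cost
            if c["churns"]:
                value += save_rate * c["margin"]   # expected margin retained
    return value


# Hypothetical customer base: three low-margin churners, one high-margin churner.
customers = [
    {"id": 1, "churns": True, "margin": 20.0},
    {"id": 2, "churns": True, "margin": 20.0},
    {"id": 3, "churns": True, "margin": 20.0},
    {"id": 4, "churns": True, "margin": 1000.0},
    {"id": 5, "churns": False, "margin": 500.0},
    {"id": 6, "churns": False, "margin": 500.0},
]

model_a = {1, 2, 3}   # finds only the least profitable churners
model_b = {4, 5}      # misses most churners but finds the valuable one

for name, flagged in (("A", model_a), ("B", model_b)):
    print(name, round(accuracy(customers, flagged), 2),
          campaign_value(customers, flagged))
```

Under these assumptions model A is right on five of six customers yet yields a campaign value of 0.0, while model B is right on only two of six and yields 480.0.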

The same is true of model stability; although an interesting measure for predictive models, stability cannot be substituted for the ability of a model to provide business insight, or for its fit to the business problem. Neither can any other technical measure.

In summary, the value of a predictive model is not determined by any technical measure. Data miners should not focus on predictive accuracy, model stability, or any other technical metric for predictive models at the expense of business insight and business fit.


9th Law of Data Mining – “Law of Change”: All patterns are subject to change

The patterns discovered by data mining do not last forever. This is well-known in many applications of data mining, but the universality of this property and the reasons for it are less widely appreciated.

In marketing and CRM applications of data mining, it is well-understood that patterns of customer behaviour are subject to change over time. Fashions change, markets and competition change, and the economy changes as a whole; for all these reasons, predictive models become out-of-date and should be refreshed regularly or when they cease to predict accurately.

The same is true in risk and fraud-related applications of data mining. Patterns of fraud change with a changing environment and because criminals change their behaviour in order to stay ahead of crime prevention efforts. Fraud detection applications must therefore be designed to detect new, unknown types of fraud, just as they must deal with old and familiar ones.
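
The refresh policy described above (rebuild on a schedule, or when the model ceases to predict accurately) can be sketched as a simple monitor. The function name, the window size, the accuracy threshold and the maximum age are all assumptions chosen for illustration. Sensible values depend on the application.

```python
def needs_refresh(hits, age_days, window=4, min_accuracy=0.7, max_age_days=90):
    """Decide whether a deployed model should be rebuilt.

    hits     -- list of booleans, True where a past prediction proved correct
    age_days -- days since the model was last built
    """
    if age_days >= max_age_days:          # scheduled refresh
        return True
    recent = hits[-window:]
    if len(recent) < window:              # not enough evidence yet
        return False
    return sum(recent) / window < min_accuracy   # accuracy has decayed


print(needs_refresh([True, True, True, True], age_days=10))     # False
print(needs_refresh([True, True, False, False], age_days=10))   # True
print(needs_refresh([True, True, True, True], age_days=120))    # True
```

A monitor of this kind only detects that known patterns have stopped working. It does not by itself detect new, unknown types of fraud, which need a separate mechanism.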

Some kinds of data mining might be thought to find patterns which will not change over time – for example in scientific applications of data mining, do we not discover unchanging universal laws? Perhaps surprisingly, the answer is that even these patterns should be expected to change.

The reason is that patterns are not simply regularities which exist in the world and are reflected in the data – these regularities may indeed be static in some domains. Rather, the patterns discovered by data mining are part of a perceptual process, an active process in which data mining mediates between the world as described by the data and the understanding of the observer or business expert. Because our understanding continually develops and grows, so we should expect the patterns also to change. Tomorrow’s data may look superficially similar, but it will have been collected by different means, for (perhaps subtly) different purposes, and have different semantics; the analysis process, because it is driven by business knowledge, will change as that knowledge changes. For all these reasons, the patterns will be different.

To express this briefly, all patterns are subject to change because they reflect not only a changing world but also our changing understanding.


The 9 Laws of Data Mining are simple truths about data mining. Most of the 9 laws are already well-known to data miners, although some are expressed in an unfamiliar way (for example, the 5th, 6th and 7th laws). Most of the new ideas associated with the 9 laws are in the explanations, which express an attempt to understand the reasons behind the well-known form of the data mining process.

Why should we care why the data mining process takes the form that it does? In addition to the simple appeal of knowledge and understanding, there is a practical reason to pursue these questions.

The data mining process came into being in the form that exists today because of technological developments – the widespread availability of machine learning algorithms, and the development of workbenches which integrated these algorithms with other techniques and made them accessible to users with a business-oriented outlook. Should we expect technological change to change the data mining process? Eventually it must, but if we understand the reasons for the form of the process, then we can distinguish between technology which might change it and technology which cannot.

Several technological developments have been hailed as revolutions in predictive analytics, for example the advent of automated data preparation and model re-building, and the integration of business rules with predictive models in deployment frameworks. The 9 laws of data mining suggest, and their explanations demonstrate, that these developments will not change the nature of the process. The 9 laws, and further development of these ideas, should be used to judge any future claims of revolutionising the data mining process, in addition to their educational value for data miners.

I would like to thank Chris Thornton and David Watkins, who supplied the insights which inspired this work, and also to thank all those who have contributed to the LinkedIn “9 Laws of Data Mining” discussion group, which has provided invaluable food for thought.