Analytics and Data Mining

Wednesday, September 26, 2012

What to do with False Positives?

I often hear complaints from business folk that their models need improvement because there are too many false positives. Just for clarity: in the case of fraud detection, false positives are transactions that the model flagged as fraudulent when in reality they weren’t. Sure, one should always try to keep false positives to a minimum, but they are not always the model’s fault. Sometimes what looks like clear-cut fraud just isn’t. It is a fuzzy area where the patterns of your event and non-event blur into each other. It could go either way!
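
To make the definition concrete, here is a minimal sketch in Python that counts false positives by comparing a model's flags with what actually happened. The labels are made up for illustration (1 = fraud, 0 = legitimate), and the function name `confusion_counts` is my own, not part of any library.

```python
# Illustrative labels only: 1 = fraud, 0 = legitimate.
actual    = [0, 0, 1, 1, 0, 1, 0, 0]   # what really happened
predicted = [0, 1, 1, 0, 1, 1, 0, 0]   # what the model flagged


def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            counts["tp"] += 1      # flagged as fraud, and it was fraud
        elif p == 1 and a == 0:
            counts["fp"] += 1      # flagged as fraud, but it was legitimate
        elif p == 0 and a == 0:
            counts["tn"] += 1      # not flagged, and it was legitimate
        else:
            counts["fn"] += 1      # not flagged, but it was fraud
    return counts


counts = confusion_counts(actual, predicted)

# Share of legitimate transactions that were wrongly flagged.
false_positive_rate = counts["fp"] / (counts["fp"] + counts["tn"])

# Share of flagged transactions that really were fraud.
precision = counts["tp"] / (counts["tp"] + counts["fp"])

print(counts, false_positive_rate, precision)
```

When people complain about "too many false positives", it helps to ask which of these two numbers they mean: the share of good transactions that get flagged, or the share of flags that turn out to be wrong.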

Some implementations of analytics have been built on false positives. Take the people who look like buyers of a particular brand and yet are not. The logical assumption is that if a marketing stimulus is sent to these people, they are more likely to become buyers of that brand than randomly selected folk, because of their high degree of look-alike-ness. I have completed several successful projects geared solely toward acting on these so-called modeling mistakes.

Another example is building a model that predicts which customers will become dormant within a given period of time. After building the model, we use it to score an existing base made up of known (historical) dormant customers as well as those who are not dormant. Then we focus on the false positives and compare them to the ones that were correctly predicted. Often the difference between the two groups in terms of usage patterns is so small that we may as well call them all dormant customers. Even though the false positives are technically not dormant yet, for all intents and purposes they really are. So we go back to the business definition of what constitutes a dormant customer and look at the whole phenomenon from a fresh angle, thanks to the comparative study of accurate predictions and false positives.
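
The comparison described above can be sketched roughly as follows. This is only an illustration under my own assumptions: the customer records, the two usage features (`monthly_txns`, `days_since_last_use`) and the helper names are hypothetical, and a real study would use far more customers and features.

```python
from statistics import mean

# Hypothetical scored base: "predicted" is the model's dormant flag,
# "actual" is the historical dormant flag under the current business definition.
customers = [
    {"id": 1, "predicted": 1, "actual": 1, "monthly_txns": 0.5,  "days_since_last_use": 80},
    {"id": 2, "predicted": 1, "actual": 1, "monthly_txns": 1.0,  "days_since_last_use": 70},
    {"id": 3, "predicted": 1, "actual": 0, "monthly_txns": 1.0,  "days_since_last_use": 60},
    {"id": 4, "predicted": 1, "actual": 0, "monthly_txns": 1.5,  "days_since_last_use": 66},
    {"id": 5, "predicted": 0, "actual": 0, "monthly_txns": 12.0, "days_since_last_use": 3},
    {"id": 6, "predicted": 0, "actual": 0, "monthly_txns": 9.0,  "days_since_last_use": 5},
]

FEATURES = ["monthly_txns", "days_since_last_use"]


def group_of(customer):
    """Label a customer as a true positive, a false positive, or anything else."""
    if customer["predicted"] == 1 and customer["actual"] == 1:
        return "true_positive"
    if customer["predicted"] == 1 and customer["actual"] == 0:
        return "false_positive"
    return "other"


def usage_profiles(customers, features):
    """Average each usage feature within each group."""
    groups = {}
    for c in customers:
        groups.setdefault(group_of(c), []).append(c)
    return {
        name: {f: mean(row[f] for row in rows) for f in features}
        for name, rows in groups.items()
    }


profiles = usage_profiles(customers, FEATURES)
for name, profile in profiles.items():
    print(name, profile)
```

In this made-up data the false positives sit right next to the true positives and far away from everyone else, which is exactly the situation where it makes sense to revisit the business definition of dormancy.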

So what I am trying to say in this article is that what appears to be a modeling “mistake” can be turned into value in more than one way. There is always a reason why models make mistakes, and tiredness is never one of them.

Goran Dragosavac