What to do with False Positives?
I often hear complaints from business folk that their models need
improvement because there are too many false positives. Just for clarity: in the
case of fraudulent transactions, false positives are those transactions
which were flagged as fraudulent when in reality they weren't. Sure, one should always
minimize the occurrence of false positives as much as possible, but they are not
always the model's fault. Sometimes what looks like clear-cut fraud just isn't.
It is a fuzzy area where the difference between the patterns of your event and
non-event is completely blurred. It could go either way!
Some implementations of analytics have been
built on false positives. These are the people who look like buyers of a
particular brand, and yet they are not. The logical assumption is that
if some marketing stimulus is sent to these people, they are more likely to
become buyers of that brand than randomly selected folk, because they resemble
its existing buyers so closely. I have completed several successful projects geared
solely towards acting on these so-called modeling mistakes.
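To make the look-alike idea concrete, here is a minimal Python sketch. The record layout, the field names, the 0.5 cut-off and the sample customers are all made up for illustration; they do not come from any real project. It simply picks out the people the model scored as buyers who are not buyers yet, and ranks them by score.

```python
from typing import NamedTuple


class Scored(NamedTuple):
    """One scored customer (hypothetical layout, for illustration only)."""
    customer_id: str
    score: float    # model's estimated probability of being a buyer
    is_buyer: bool  # known outcome: does this person already buy the brand?


def lookalike_targets(scored, threshold=0.5):
    """Return the false positives: non-buyers scored at or above the threshold.

    The result is sorted so the strongest look-alikes come first.
    """
    false_positives = [
        r for r in scored if r.score >= threshold and not r.is_buyer
    ]
    return sorted(false_positives, key=lambda r: r.score, reverse=True)


# Made-up sample data.
customers = [
    Scored("c1", 0.91, True),   # true positive: already a buyer
    Scored("c2", 0.84, False),  # false positive: looks like a buyer
    Scored("c3", 0.12, False),  # true negative
    Scored("c4", 0.67, False),  # false positive
    Scored("c5", 0.40, True),   # false negative
]

targets = lookalike_targets(customers, threshold=0.5)
target_ids = [r.customer_id for r in targets]
print(target_ids)  # ['c2', 'c4']
```

The cut-off is a business choice rather than a property of the model: lowering it widens the campaign audience, raising it keeps only the closest look-alikes.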
Another example is building a model capable of predicting
which customers will become dormant within a period of time. After building the model, we
score an existing base comprising known (historical) dormant customers as well as those who are not.
Then we focus on the false positives and compare them to the ones that are correctly
predicted. Often the difference between the two groups in terms of their usage
patterns is so small that we may as well call them all dormant customers.
Even though the false positives are technically not dormant yet, for all
intents and purposes they really are. So we go back to the business definition
of what constitutes a dormant customer and look at the whole phenomenon from a
fresh angle, thanks to comparative
studies between accurate predictions and false positives.
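The comparison described above can be sketched in a few lines of Python. This is only an illustration under assumptions: a single made-up usage measure per customer, invented sample values, and a simple comparison of group averages, whereas a real study would look at many usage variables.

```python
from statistics import mean


def confusion_group(predicted_dormant, actually_dormant):
    """Label one customer as TP, FP, FN or TN."""
    if predicted_dormant and actually_dormant:
        return "TP"
    if predicted_dormant and not actually_dormant:
        return "FP"
    if not predicted_dormant and actually_dormant:
        return "FN"
    return "TN"


def mean_usage_by_group(rows):
    """Average usage per confusion group.

    Each row is (predicted_dormant, actually_dormant, usage), where usage is a
    hypothetical measure such as transactions per month.
    """
    groups = {}
    for predicted, actual, usage in rows:
        groups.setdefault(confusion_group(predicted, actual), []).append(usage)
    return {group: mean(values) for group, values in groups.items()}


# Made-up scored base: (predicted dormant, actually dormant, monthly usage).
rows = [
    (True, True, 1.0), (True, True, 3.0),        # correctly predicted dormant
    (True, False, 2.0), (True, False, 4.0),      # false positives
    (False, False, 20.0), (False, False, 30.0),  # active customers
    (False, True, 12.0),                         # missed dormant customer
]

profile = mean_usage_by_group(rows)
gap_to_dormant = abs(profile["FP"] - profile["TP"])
gap_to_active = abs(profile["FP"] - profile["TN"])
print(profile["TP"], profile["FP"], profile["TN"])  # 2.0 3.0 25.0
print(gap_to_dormant, gap_to_active)                # 1.0 22.0
```

In this invented data the false positives sit far closer to the correctly predicted dormant customers than to the active ones, which is the kind of result that would send us back to the business definition of dormancy.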
So what I am trying to say in this article is that what
appears to be a modeling "mistake" can be turned into value from more than
one angle. There is always a reason why models make mistakes, and
tiredness is never one of them.
Goran Dragosavac