Friday, September 30, 2011
What to do when the data doesn’t fit the analytical question?
A smart response to this question is: either we get new data, or we get a new question!
Let’s imagine our task is to find what members of the same group, for example home loan customers, have in common. Now imagine that we ONLY have data for the home loan customers.
We can certainly examine all their characteristics, but there is no guarantee that those characteristics will differ from those of customers who purchased other banking products. What we need is a point of reference: additional data on customers who hold products other than home loans. To find out what is similar about home loan customers, we need to figure out what is different between them and everyone else, which is pretty much one and the same thing.
This is essentially a classification problem, but one we would be trying to solve with a unary target variable (every customer has the same value for the product purchased), which gives a classifier nothing to discriminate against. Since we don’t have, and cannot get, additional data for customers who hold other types of products, we need to go for the second-best scenario. Instead of “reformulating” the data through artful and creative data preparation to better fit the analytical question, we have no option but to do exactly the opposite: reformulate the analytical question to fit the data at hand.
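To make the unary target problem concrete, here is a minimal sketch in plain Python. It measures how much a split on one attribute reduces the entropy of the target, which is the kind of signal a decision tree looks for. The `income` values, the labels and the threshold are all made up for illustration; they are not real customer data.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return sum(-(n / total) * log2(n / total) for n in counts.values())

def information_gain(feature_values, labels, threshold):
    """Entropy reduction obtained by splitting at feature <= threshold."""
    left = [y for x, y in zip(feature_values, labels) if x <= threshold]
    right = [y for x, y in zip(feature_values, labels) if x > threshold]
    if not left or not right:
        return 0.0
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

# Hypothetical attribute: customer income in thousands.
income = [30, 45, 60, 75, 90, 120]

# Only home loan customers available: the target is unary.
unary_target = ["home_loan"] * 6
# What we would like to have: a reference group holding other products.
mixed_target = ["home_loan", "home_loan", "home_loan", "other", "other", "other"]

gain_unary = information_gain(income, unary_target, threshold=60)
gain_mixed = information_gain(income, mixed_target, threshold=60)

print("gain with unary target:", gain_unary)
print("gain with reference group:", gain_mixed)
```

With a unary target the entropy is already zero, so no split on any attribute can improve on it. Only once a reference group is present does the same split carry information.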
This means our new question becomes: what groups of similar customers exist within the single class of loan customers, and how do they differ from the other groups of loan customers? That replaces the original question of what makes my “loan” customers similar. This is a very different question, and by reformulating it we also pick a new “tool” from our workbench: instead of using a classification algorithm, we turn to a clustering method.
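As a rough illustration of the reformulated question, the sketch below runs a bare-bones k-means on a handful of invented home loan customers. Everything here is an assumption made for the example: the two attributes (age, loan amount in tens of thousands), the choice of k-means as the clustering method, k = 2, and the simple deterministic initialization. In practice the attributes would be scaled first and a library implementation would be the sensible choice.

```python
from math import dist

def k_means(points, k, iterations=20):
    """Plain k-means on tuples of numbers; returns (labels, centroids)."""
    # Naive deterministic start: take k evenly spaced points as centroids.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Recompute each centroid as the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return labels, centroids

# Hypothetical home loan customers: (age, loan amount in tens of thousands).
customers = [(25, 20), (27, 22), (30, 25), (52, 60), (55, 65), (58, 62)]

labels, centroids = k_means(customers, k=2)
print(labels)
print(centroids)
```

No target variable is needed at all: the groups come from the similarity between the loan customers themselves, and the centroids give a first description of how one group differs from the other.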
So the usual premise, where data and analytical methods are functions of the business question, doesn’t work in this situation, and the practical solution is to alter the initial objective.