Saturday, March 26, 2011

Automated model-building – science fiction or reality?

How many times have we all heard unscrupulous vendors of analytical software say something like: “All you need to do is press a few buttons on our analytical software and models will be built automatically, while you just sit back and sip your coffee”?
I used to warn companies about such exaggerated claims.
Well, with the advent of SAS Institute's Rapid Predictive Modeler (RPM), believe it or not, this is now very much a reality. The reason for creating an automated modeling tool is to allow non-technical audiences with little statistical knowledge to build models on the fly.
So, they would literally press a few buttons within an environment that is familiar to them, and the tool would build the models elsewhere and then return the results to the original environment.
The user can choose how complex the models should be, picking from the basic, intermediate and advanced options. They can also choose which components of the output are returned.
But RPM is not just for non-technical (read: non-statistical) users. One of the options is to save the model in the modeling environment (Enterprise Miner), so the user can access the modeling diagram and use it for benchmarking, validation and, of course, to speed up their own search for optimal models. For example, I have used RPM to build an advanced modeling diagram with multiple transformations, variable selection and modeling options within a few minutes. Building the same type of advanced model in my usual way would take me a whole day.
So, if you wonder about the cost of this software, the good news is that it comes for free, provided you already have access to the latest version of SAS's data mining software, Enterprise Miner 6.2. The rest is easy!

Sunday, March 20, 2011

SAS Institute wins patent approval for its Fraud Detection capability

Two and a half years after winning the Frost & Sullivan North American Technology Innovation of the Year Award in the field of Enterprise Fraud Detection and Prevention technologies, SAS Institute has won approval for "Computer-Implemented Predictive Model Generation Systems and Methods" (US patent 7,788,195 B1), which is at the heart of SAS® Fraud Management, an integral component of the SAS Enterprise Financial Crimes Framework for Banking.

Let’s just note that the Frost & Sullivan criteria are based on the following:

Significance of the innovation(s) in the industry, and across industries (if applicable)
Potential of the products of innovation(s) to become industry standard(s)
Competitive advantage of innovation vis-à-vis other related innovations
Impact (or potential impact) of innovation(s) on company or industry mind share

So, it seems that by patenting its award-winning product, SAS Institute is fairly confident that it will make an impact toward further improvements and successes in the fraud detection arena.
SAS Fraud Management employs a "Self-Organizing Neural Network Arboretum" (SONNA) modeling capability to build its unique hybrid approach to consortium and custom models. This capability offers significant improvement in fraud detection performance compared to the performance of regular, non-linear modeling techniques like neural networks. The capability is based on computing a signature of the card-holder over a period of time, then matching each specific transaction against that overall blueprint and assigning a score for how likely the transaction is to be fraudulent.
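To make the "signature, then match, then score" idea a bit more concrete, here is a toy sketch in Python. It is emphatically not SONNA or the patented SAS method, whose details are not described here. It only assumes that a signature can be summarized as the usual spending amount and the usual merchant categories, and that the score grows as a transaction moves away from that profile. All names, weights and thresholds below are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Signature:
    """Hypothetical card-holder profile built from past transactions."""
    mean_amount: float
    std_amount: float
    usual_categories: frozenset


def build_signature(history):
    """Summarize a history given as a list of (amount, merchant_category)."""
    amounts = [amount for amount, _ in history]
    categories = frozenset(category for _, category in history)
    # Fall back to 1.0 when all past amounts are identical (zero spread).
    spread = pstdev(amounts) or 1.0
    return Signature(mean(amounts), spread, categories)


def fraud_score(signature, amount, category):
    """Return a score in [0, 1]; higher means further from the signature."""
    z = abs(amount - signature.mean_amount) / signature.std_amount
    amount_part = min(z / 4.0, 1.0)  # deviations beyond 4 std are capped
    category_part = 0.0 if category in signature.usual_categories else 1.0
    # Illustrative weights only: 75% amount deviation, 25% unusual category.
    return 0.75 * amount_part + 0.25 * category_part


history = [(20.0, "grocery"), (25.0, "grocery"), (30.0, "fuel"),
           (22.0, "grocery"), (28.0, "fuel")]
signature = build_signature(history)
typical = fraud_score(signature, 24.0, "grocery")
unusual = fraud_score(signature, 900.0, "jewelry")
print(typical, unusual)
```

A transaction close to the card-holder's usual behaviour gets a score near zero, while a large purchase in a never-seen category gets a score near one. A production system would use far richer signatures and a trained model rather than hand-picked weights.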
SAS Fraud Solution provides for the creation of risk-based reason factor groups, giving an organization's fraud operations and analytics groups more insight into, and context for, the risks and reasons behind the model score, which helps optimize responses and automate actions for a high-scoring transaction.
In the words of Revathi Subramanian, primary patent inventor and Research and Development Director in the SAS Fraud Modeling Department: "Banks will be able to better protect themselves and their customers against fraudsters using these advanced techniques. This patent is a demonstration of SAS' continued commitment to innovation and reinvestment that makes it one of the foremost business analytics companies in the world."
SAS has developed consortium models for Asia Pacific, Mexico, the United Kingdom and the United States, but the models can also be used in other non-listed regions to drive immediate return on investment.
"SAS feels that customized predictive models are the best way to maximize ROI from a fraud solution," said Ellen Joyner, Global Marketing Manager, Financial Crimes Prevention at SAS. "We understand that all clients may not have the data history of known fraudsters and their behavior to customize predictive models or the requirement for a customized approach.
SAS has developed regional models based upon consortium data that helps SAS to readily deploy 'out of the box' analytics across multiple product and channel portfolios."

New buzz - Social Media Analytics!

We have all seen the incredible power of social media in politics and in the social sphere. Media channels like Twitter are allowing news to spread like wildfire and are changing how we think and what we do, giving us a more objective view of the world around us.

The business and commercial spheres are affected just as much. There is a human need for connection and communication. Traditional loyalty to companies and brands is a thing of the past (unless you have an apple in your logo), and consumers are increasingly mistrustful of advertising, looking for more objective sources of advice on what to buy and from whom.

Another powerful dynamic is the sheer strength of the consumer's voice within social media, which comes in large numbers and with no geographic boundaries. This is certainly taking big business by surprise: companies are realizing that their brand and reputation are under scrutiny like never before, and that even the smallest misstep can very quickly have large effects on corporate reputation and, ultimately, on the bottom line.

So, the fact of the matter is that social media does have an impact on your brand. And this is not just Twitter or Facebook, but thousands of other online channels where people post and share information.

So, if you are a big business, it is a given that somebody is talking about you. It is also a given that what is being said can have an impact on your brand. What is not obvious is who is talking, where they are talking from and what they are saying.

Fortunately, there are technologies that can assist you. First, you need a means of finding and collecting online sources of relevant information. Then you need technologies that enable you to store that information internally. The third step is to categorize and analyze it. This collection of technologies is known as Social Media Analytics (SMA).

There is no doubt that SMA helps you dig deep beneath the surface and differentiate between sources of positive and negative sentiment. You can also categorize this sentiment by area of your business, measure the influence of a specific online source on your brand, find key influential authors, spot trends, etc.
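For readers who like to see things in code, the categorize-and-analyze step can be sketched in a few lines of Python. This is only a toy illustration, not how any commercial SMA product works: the word lists, the business areas and the sample posts are all made up, sentiment is a simple count of positive minus negative words, and the number of posts per author stands in as a very crude proxy for influence.

```python
from collections import defaultdict

# Hypothetical word lists; a real system would use a trained sentiment model.
POSITIVE = {"love", "great", "fast", "helpful"}
NEGATIVE = {"hate", "slow", "broken", "rude"}

# Hypothetical mapping from business area to trigger keywords.
AREAS = {
    "support": {"support", "helpdesk", "agent"},
    "delivery": {"delivery", "shipping", "courier"},
}


def tokenize(text):
    """Lower-case the text and strip basic punctuation from each word."""
    return [word.strip(".,!?").lower() for word in text.split()]


def sentiment(tokens):
    """Number of positive words minus number of negative words."""
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def areas_of(tokens):
    """Business areas mentioned in a post, or {'other'} if none match."""
    found = {area for area, keywords in AREAS.items() if keywords & set(tokens)}
    return found or {"other"}


def summarize(posts):
    """posts: iterable of dicts with 'author' and 'text' keys."""
    sentiment_by_area = defaultdict(int)
    posts_by_author = defaultdict(int)
    for post in posts:
        tokens = tokenize(post["text"])
        score = sentiment(tokens)
        for area in areas_of(tokens):
            sentiment_by_area[area] += score
        posts_by_author[post["author"]] += 1
    return dict(sentiment_by_area), dict(posts_by_author)


posts = [
    {"author": "ann", "text": "Love the fast delivery!"},
    {"author": "bob", "text": "Support agent was rude and slow."},
    {"author": "ann", "text": "Shipping was slow this time."},
]
area_sentiment, author_counts = summarize(posts)
print(area_sentiment)
print(author_counts)
```

The collecting and storing steps are left out on purpose; here the posts are simply a list in memory. The point is only that once the text is collected, sentiment can be rolled up by business area and by author.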

In other words, you can do something about it, and you can do it early, before it is too late. If you do it only for your own corporate benefit, this can certainly have short-term effects. The trick is to do it for the benefit of your customers first; they are then likely to reward you with their loyalty and even forgive you for some future slip-ups. Did anyone mention the company with the apple in its logo?

Regards to all, and send me your comments.


Saturday, March 19, 2011

What skills do you need to become a good data miner?

This is a question I am often asked.

In order to become a good data mining practitioner, one needs to understand statistical concepts and the basic principles of knowledge induction. Knowing inferential statistics, t-tests, analysis of variance, regression, etc. is important; this is your "bread and butter" knowledge. Then you need to know some academic or commercial software, since you won't be doing logistic regression, for example, on a piece of paper.

At this point, you are at the mercy of your educational institution. If it has access to the software that large organizations in your local market are using, then your employability will be far higher than if it uses open source academic software.

In the area where I live, there is one university whose statistics graduates are rarely seen in big companies. And it is no surprise: the people who run this university have an attitude of "why pay a cent for software "x" when I get software "y" for free?" Well, if the market demand is for software "x", that should be reason enough.

So, you know the stats and you know some statistical software. Is that it? It sure is not. A few times I have had statisticians with master's degrees and PhDs phoning me to ask "where do I start with solving my business problem?" So, knowing methodologies and processes is vitally important. No predictive modeling project will ever start with a predictive model. In fact, no data mining project will ever start with data or with statistical/data mining software. And methodology is not what you learn in most stats classes, right?

So, is knowing statistical methods, some statistical software and methodologies all you need to become a good analyst/data miner? Not quite! How about knowledge of the applications in which you would use your statistical skills? If you are building a credit-card fraud detection model, as opposed to a policy lapse prediction model, the basic principles of model building are the same, but the modeling nuances are not. I often say that a good data miner understands that everything that happens in the data preparation phase, as well as in the modeling phase, is a function of the business problem and the business objectives. One can imagine that there are big differences between fraud in the credit-card space and lapses of insurance policies. Those differences will translate into how you prepare your data and how you build your model.

Therefore, knowing applications is very important. And regardless of which industry you are in, you will encounter some applications more often than others. Segmentation, cross-selling, retention/churn, customer value and profitability are particularly common, so every practitioner should be familiar with at least those.

So, is that all? Nope.

How about knowledge of best practices? Instead of relying on your intuition and gut feeling when confronted with a specific modeling puzzle, wouldn't it be helpful to know what others have done when confronted with the same problem, and how successful they were? That is where knowing best practices comes in handy. And if the solution to your modeling problem is not in any best practices manual, then relying on your intuition and creativity becomes essential.

So, let's imagine you have all these skills, and you are required to present your results to the board of your multinational client, which is a Chinese company. And it gets worse! You need to present these results in Mandarin, and you can't find anyone to translate. This means you will not be able to convince the board that you will dramatically reduce their business problem. And all of that means that, with all your skills, your project has FAILED!

What am I trying to say here? You have to be able to communicate with your business sponsors in a way that they can understand, so that you can convince them that what you have done will have positive business effects. So you have to be able to switch between two different languages: the language of techies and the language of business folk. And this is not an easy task!

I have seen good modelers blow it by putting their business audiences to sleep, and I have seen modelers achieve huge success with mediocre modeling results, just by virtue of being able to talk the language of their business audiences.

So, now you can see why a data miner is a rare creature. Hopefully, this article can help you identify your own areas for improvement.

Good luck, and send me a comment!