To become a good data mining practitioner, one needs to understand statistical concepts and the basic principles of knowledge induction. Knowing inferential statistics, t-tests, analysis of variance, regression, etc. is important; this is your "bread and butter" knowledge. Then you need to know some academic or commercial software, since you won't be doing logistic regression, for example, on a piece of paper.
At this point, you are at the mercy of your educational institution. If it has access to the software that large organizations in your local market are using, your employability will be far higher than if it uses open source academic software.
In the area where I live, there is one university whose statistics graduates are rarely seen in big companies. And it is no surprise: the people who run this university have an attitude of "why pay a cent for software 'x' when I get software 'y' for free?" Well, if the market demand is for software "x", that should be reason enough.
So, you know the stats and you know some statistical software. Is that it? It sure is not. A few times I have had statisticians with master's degrees and PhDs phoning me to ask, "Where do I start with solving my business problem?" So, knowing methodologies and processes is vitally important. No predictive modeling project will ever start with a predictive model. In fact, no data mining project will ever start with data or with statistical/data mining software. And methodology is not what you learn in most stats classes, right?
So, is knowing statistical methods, some statistical software, and methodologies all you need to become a good analyst/data miner? Not quite! How about knowledge of the applications in which you would use your statistical skills? If you are building a credit-card fraud detection model, as opposed to a policy lapse prediction model, the basic principles of model building are the same, but the modeling nuances are not. I often say that a good data miner understands that everything that happens in the data preparation phase, as well as in the modeling phase, is a function of the business problem and business objectives. So, one can imagine that there are big differences between fraud in the credit-card space and insurance policy lapses. And those differences will translate into how you prepare your data and how you build your model.
Therefore, knowing applications is very important. And regardless of which industry you are in, you will encounter some applications more often than others. Segmentation, cross-selling, retention/churn, customer value, and profitability are particularly common, so every practitioner should be familiar with at least those.
So, is that all? Nope.
How about knowledge of best practices? Instead of relying on your intuition and gut feeling when confronted with a specific modeling puzzle, wouldn't it be helpful to know what others have done when confronted with the same problem, and how successful they were? That is where knowing best practices comes in handy. And if the solution to your modeling problem is not in any best practices manual, then relying on your intuition and creativity becomes essential.
So, let's imagine you have all these skills, and you are required to present your results to the board of your multinational client, which is a Chinese company. And it gets worse! You need to present these results in Mandarin, and you can't find anyone to translate. This means you will not be able to convince the board that you will dramatically reduce their business problem. And all that means that, with all your skills, your project has FAILED!
What am I trying to say here? You have got to have the ability to communicate with your business sponsors in a way that they can understand, so that you can convince them that what you have done will have positive business effects. So you have to be able to switch between two different languages: the language of techies and the language of business folk. And this is not an easy task!
I have seen good modelers blow it all by putting their business audiences to sleep, and I have seen modelers have huge success with mediocre modeling results, just by virtue of being able to talk the language of their business audiences.
So, now you can see why a data miner is a rare creature. Hopefully, this article can help you identify your own areas for improvement.
Good luck, and send me a comment!