Friday, September 30, 2011

What to do when the data doesn’t fit the analytical question?

A smart response to this question might be: well, either we get new data or a new question!
Let’s imagine our task is to find the similarity between members of the same group, for example home loan customers. Now imagine a situation where we ONLY have data for the home loan customers.
We can certainly examine all their characteristics, but there is no guarantee that those characteristics will differ from those of purchasers of other banking products. What we need is a point of reference: additional data on customers who hold any product other than a home loan. To find out what is similar about home loan customers, we need to figure out what is different between them and everyone else, which is pretty much one and the same thing.
This is inherently a classification problem, which we would be trying to solve with a unary target variable (one where all purchasers have the same value for the product purchased). Since we don’t have, and cannot get, additional data for customers who hold other types of products, we need to go for the second-best scenario. Instead of “reformulating” the data through artful and creative data preparation to better fit the analytical question, we have no option but to do exactly the opposite: reformulate the analytical question to fit the data at hand.
Our new question becomes: what are the groups of similar customers within the single class of loan customers, and how do they differ from the other groups of loan customers? This replaces the original question of what makes my “loan” customers similar. It is a very different question, and by reformulating it we are also picking a new “tool” from our workbench: instead of using a classification algorithm, we fall back on a clustering method.
So the usual premise, where data and analytical methods are functions of the business question, doesn’t work in this situation, and the practical solution is to alter the initial objective.
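
To make the reformulated question concrete, here is a minimal sketch of clustering within a single class of customers. The two features (loan amount and age), the toy values and the helper `kmeans` are illustrative assumptions, and plain k-means is used only as one possible clustering method, not as the method prescribed above.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means on a list of equal-length numeric tuples.

    Uses the first k points as initial centroids so the result is
    deterministic; a real project would use a better initialisation.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Hypothetical home loan customers: (loan amount in thousands, age).
# In practice the features should be scaled before clustering.
loan_customers = [(100, 25), (400, 50), (110, 27), (420, 52), (95, 24), (390, 48)]
labels, centroids = kmeans(loan_customers, k=2)
print(labels)
```

Each resulting cluster can then be profiled against the others, which is exactly the "how do the groups differ" question that the data at hand can answer.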

Goran Dragosavac

Wednesday, September 28, 2011

If you are new to Web Mining…

If you are selling products and services via the web channel, you may consider analyzing who is visiting your web site, how people who buy differ from those who don’t, and, for those who buy, what their clickstream sequence and navigational pattern are.
Each customer's action on a website generates data: not just high-level interactions such as buying something, but also something as simple as using a search engine or navigating through a site. All these interactions between digital service providers and the consumer can be recorded and stored in digital databases. These large data sets contain information helpful to business marketing strategies, both for retrospective analysis and for data-driven forecasting.

Companies today are in the unprecedented position of being able to collect vast amounts of customer information relatively easily. By using web mining, companies can analyze and predict the behavior of their customers. All web site visitors leave digital trails, which web servers automatically store in log files. Web analysis tools analyze and process these web server log files to produce meaningful information. Essentially, a complete profile of site traffic is created, which shows, for example, how many visitors there were to the site, what sites they came from, and which pages on the site are most popular. Web analysis tools provide companies with previously unknown statistics and useful insights into the behavior of their online customers. While the usage and popularity of such tools may continue to increase, many online retailers are now demanding more useful information about their customers from the vast amounts of data generated by their web sites.

Organizations have typically invested large amounts of money in developing their web sites and web strategy, and they would like to know what return they are receiving on that investment. Most sites use hits and page views as a measure of the web site's success, which clearly is not going to answer that question. A website is commonly used for:

- Selling products/services
- Providing product/company information
- Providing customer support

Typical questions that an e-retailer needs to answer are:

- How to increase the browser-to-buyer conversion rate?
- How to increase the web retention rate? (Defined as the ratio of the number of browsers who return to the web site within a certain window of time to the total number of browsers.)
- How to reduce the clicks-to-close value? (A smaller number indicates that customers are finding what they are looking for more easily. Personalization of web services is the right approach to reducing this value.)
- Does the web site design satisfy the needs of the various customer segments?
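
The first three measures in the list above can be sketched in a few lines. The session records, field names and function names below are hypothetical, and clicks-to-close is taken here to mean the average number of clicks in sessions that ended in a purchase, which is one reasonable reading rather than a standard definition.

```python
from collections import defaultdict

# Hypothetical session records: one row per visit to the site.
sessions = [
    {"visitor": "a", "day": 1, "clicks": 12, "purchased": False},
    {"visitor": "a", "day": 5, "clicks": 6, "purchased": True},
    {"visitor": "b", "day": 2, "clicks": 4, "purchased": False},
    {"visitor": "c", "day": 3, "clicks": 8, "purchased": True},
    {"visitor": "d", "day": 4, "clicks": 3, "purchased": False},
    {"visitor": "d", "day": 40, "clicks": 5, "purchased": False},
]

def conversion_rate(sessions):
    """Share of distinct visitors who bought at least once."""
    visitors = {s["visitor"] for s in sessions}
    buyers = {s["visitor"] for s in sessions if s["purchased"]}
    return len(buyers) / len(visitors)

def retention_rate(sessions, window_days):
    """Share of visitors who came back within window_days of their first visit."""
    days = defaultdict(list)
    for s in sessions:
        days[s["visitor"]].append(s["day"])
    returned = sum(
        1 for visit_days in days.values()
        if any(0 < d - min(visit_days) <= window_days for d in visit_days)
    )
    return returned / len(days)

def clicks_to_close(sessions):
    """Average number of clicks in sessions that ended in a purchase."""
    closing = [s["clicks"] for s in sessions if s["purchased"]]
    return sum(closing) / len(closing)

print(conversion_rate(sessions), retention_rate(sessions, 30), clicks_to_close(sessions))
```

Note that all three measures are computed per visitor or per session, not per page hit, which is why hit counts alone cannot supply them.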

Using page hits will NOT provide an answer to any of these questions. Current traffic analysis tools are geared toward providing high-level predefined reports about domain names, IP addresses, browsers, cookies and other machine-to-machine activity. These server activity reports simply do not provide the type of bottom-line analysis that e-tailers, service providers, marketers and advertisers in the business world have come to demand. These software packages (i.e., web analysis tools) originated from the need to report on the activity of the web server and not on the activity of the user.

Web mining may be subdivided into:
- Web-content mining
- Web-structure mining
- Web-usage mining
- User profile data

Web-content mining is the mining of Internet pages, common in the next generation of XML/RDF-based search engines and Web spiders.
Web-structure mining is the application of data mining to reconstruct the structure of a Web site or sites.
Web-usage mining is the mining of log files and associated data from a particular Web site to discover knowledge of browser and buyer behavior on that site. User profile data, such as demographic information about the users of the web site, registration data and customer profile information, can provide valuable information about the site's customers and can be a platform for segmentation and profiling. Web-usage mining is what is widely understood to be web mining, and it is the main subject of this introduction.

Goran Dragosavac

Data Mining in Retail Industry

The retail industry collects large amounts of data on sales and customer shopping history. The quantity of data collected continues to expand rapidly, especially due to the increasing ease, availability and popularity of business conducted on the web, or e-commerce. The retail industry therefore provides a rich source for data mining. Retail data mining can help identify customer behavior, discover customer shopping patterns and trends, improve the quality of customer service, achieve better customer retention and satisfaction, enhance goods consumption ratios, design more effective goods transportation and distribution policies, and reduce the cost of business.

Some of the retail applications of data mining are in the following areas:

Customer Relationship Management
 Customer Segmentation: Customer segmentation is a vital ingredient in a retail organization's marketing recipe. It can offer insights into how different segments respond to shifts in demographics, fashions and trends. For example, it can help classify customers into the following segments:
·          Customers who respond to new promotions
·          Customers who respond to new product launches
·          Customers who respond to discounts
·          Customers who show propensity to purchase specific products

 Campaign/ Promotion Effectiveness Analysis: Once a campaign is launched its effectiveness can be studied across different media and in terms of costs and benefits; this greatly helps in understanding what goes into a successful marketing campaign. Campaign/ promotion effectiveness analysis can answer questions like:
·         Which media channels have been most successful in the past for various campaigns?
·         Which geographic locations responded well to a particular campaign?
·         What were the relative costs and benefits of this campaign?
·         Which customer segments responded to the campaign?
 Customer Lifetime Value (CLV): Not all customers are equally profitable. CLV attempts to calculate a projected relative measure of value. It does so by calculating Risk Adjusted Revenue (based on the probability of the customer acquiring categories/products that he currently doesn't have in his portfolio) and Risk Adjusted Loss (based on the probability of the customer dropping categories/products that he currently owns), adding these to some Net Present Value, and deducting the cost of servicing the customer.
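
The CLV description above can be turned into a small calculation. This is only a sketch under stated assumptions: each probability is weighted by an assumed value for the category, the Risk Adjusted Loss is subtracted, and the function name, parameters and figures are all hypothetical.

```python
def customer_lifetime_value(current_npv, acquire, drop, servicing_cost):
    """Relative CLV score under the assumptions stated above.

    acquire: (probability, value) pairs for categories not yet held
    drop:    (probability, value) pairs for categories currently held
    """
    risk_adjusted_revenue = sum(p * value for p, value in acquire)
    risk_adjusted_loss = sum(p * value for p, value in drop)
    return current_npv + risk_adjusted_revenue - risk_adjusted_loss - servicing_cost

score = customer_lifetime_value(
    current_npv=1000.0,
    acquire=[(0.5, 400.0), (0.25, 800.0)],  # categories the customer might start buying
    drop=[(0.5, 600.0)],                    # category the customer might stop buying
    servicing_cost=150.0,
)
print(score)
```

Because the result is a relative measure, it is most useful for ranking customers against each other rather than as an absolute monetary forecast.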
Customer Potential: Customers who are not very profitable today may have the potential to be profitable in future. Hence it is essential to identify customers with high potential before deciding how best to realize that potential through the right marketing stimuli.
 Customer Loyalty Analysis: It is more economical to retain an existing customer than to acquire a new one. To develop effective customer retention programs it is vital to analyze the reasons for customer attrition. Business Intelligence helps in understanding customer attrition with respect to various factors influencing a customer and at times one can drill down to individual transactions, which might have resulted in the change of loyalty.
 Cross Selling: Retailers use the vast amount of customer information available to them to cross sell other products at the time of purchase. This can be done through product portfolio analysis and then selling the products that are missing from typical portfolios. Market basket analysis is another good method for effective cross selling. Look-alike modeling is yet another strategy, where a model is built that produces a quantitative measure of the customer's affinity for a specific product.
 Product Pricing: Pricing is one of the most crucial marketing decisions taken by retailers. Often an increase in price of a product can result in lower sales and customer adoption of replacement products. Using data warehousing and data mining, retailers can develop sophisticated price models for different products, which can establish price - sales relationships for the product and how changes in prices affect the sales of other products.
 Target Marketing/Response Modeling: Retailers can optimize the overall marketing and promotion effort by targeting campaigns to specific customers or groups of customers. Target marketing can be based on a very simple analysis of the buying habits of the customer or the customer group; but increasingly data mining tools are being used to define specific customer segments that are likely to respond to particular types of campaigns.
Supply Chain Management & Procurement
Supply chain management (SCM) promises unprecedented efficiencies in inventory control and procurement to the retailers. With cash registers equipped with bar-code scanners, retailers can now automatically manage the flow of products and transmit stock replenishment orders to the vendors. The data collected for this purpose can provide deep insights into the dynamics of the supply chain. However, most of the commercial SCM applications provide only transaction-based functionality for inventory management and procurement; they lack sophisticated analytical capabilities required to provide an integrated view of the supply chain.
 Vendor Performance Analysis: Performance of each vendor can be analyzed on the basis of a number of factors like cost, delivery time, quality of products delivered, payment lead time, etc. In addition to this, the role of suppliers in specific product outages can be critically analyzed.
 Inventory Control (Inventory levels, safety stock, lot size, and lead time analysis): Both current and historic reports on key inventory indicators like inventory levels, lot size, etc. can be generated from the data warehouse, thereby helping in both operational and strategic decisions relating to the inventory.
 Product Movement and the Supply Chain: Some products move much faster off the shelf than others. On-time replenishment orders are critical for these products. Analyzing the movement of specific products, using BI tools, can help in predicting when a re-order will be needed.
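
As a rough illustration of predicting when a re-order will be needed, the sketch below uses the common textbook reorder-point rule (average daily sales multiplied by lead time, plus safety stock). The rule, the function names and the figures are assumptions for illustration and are not taken from any particular BI tool.

```python
def reorder_point(avg_daily_sales, lead_time_days, safety_stock):
    """Stock level at which a replenishment order should be placed."""
    return avg_daily_sales * lead_time_days + safety_stock

def days_until_reorder(current_stock, avg_daily_sales, lead_time_days, safety_stock):
    """Estimated days before stock falls to the reorder point (0 if already there)."""
    surplus = current_stock - reorder_point(avg_daily_sales, lead_time_days, safety_stock)
    return max(surplus / avg_daily_sales, 0.0)

daily_sales = [18, 22, 20, 19, 21]                  # hypothetical units sold per day
avg_daily_sales = sum(daily_sales) / len(daily_sales)

print(reorder_point(avg_daily_sales, lead_time_days=5, safety_stock=40))
print(days_until_reorder(300, avg_daily_sales, lead_time_days=5, safety_stock=40))
```

A fast-moving product has a higher average daily sales figure, so it reaches its reorder point sooner for the same stock on hand.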
 Demand Forecasting: Complex demand forecasting models can be created using a number of factors like sales figures, basic economic indicators, environmental conditions, etc. If correctly implemented, a data warehouse can significantly help in improving the retailer’s relations with suppliers and can complement the existing SCM application.
Storefront Operations
The information needs of the store manager are no longer restricted to the day to day operations. Today’s consumer is much more sophisticated and she demands a compelling shopping experience. For this the store manager needs to have an in-depth understanding of her tastes and purchasing behavior. Data warehousing and data mining can help the manager gain this insight. Following are some of the uses of BI in storefront operations:
Store Segmentation: This analysis takes the data that is common to different stores and finds out which stores are similar in terms of product or customer dimensions. In other words, which stores are similar based on the products that sell more quickly or more slowly than in the rest of the stores? The next step is to build a profile of the customers who buy from a specific store.
 Market Basket Analysis: It is used to study natural affinities between products. One of the classic examples of market basket analysis is the beer-diaper affinity, which states that men who buy diapers are also likely to buy beer. This is an example of 'two-product affinity'. But in real life, market basket analysis can get extremely complex, resulting in hitherto unknown affinities between a number of products. This analysis has various uses in the retail organization. One very common use is for in-store product placement. Another popular use is product bundling, i.e. grouping products to be sold in a single package deal. Other uses include designing the company's e-commerce web site and product catalogs.
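
A two-product affinity of the kind described above is usually quantified with support, confidence and lift. The sketch below computes them for every product pair in a handful of made-up baskets; the function name `pair_affinities` and the data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_affinities(baskets):
    """Return support, confidence and lift for every ordered product pair."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets
        for pair in combinations(sorted(set(basket)), 2)
    )
    rules = {}
    for (a, b), together in pair_counts.items():
        support = together / n  # share of baskets containing both products
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / item_counts[lhs]   # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # > 1 means positive affinity
            rules[(lhs, rhs)] = {"support": support, "confidence": confidence, "lift": lift}
    return rules

# Hypothetical transactions, one list of products per basket.
baskets = [
    ["diapers", "beer", "milk"],
    ["diapers", "beer"],
    ["milk", "bread"],
    ["diapers", "bread"],
]
rules = pair_affinities(baskets)
print(rules[("diapers", "beer")])
```

Counting pairs this way does not scale to affinities between many products at once; for that, an algorithm such as Apriori or FP-Growth would normally be used.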
 Category Management: It gives the retailer an insight into the right number of SKUs to stock in a particular category. The objective is to achieve maximum profitability from a category; too few SKUs would mean that the customer is not provided with adequate choice, and too many would mean that the SKUs are cannibalizing each other. It goes without saying that effective category management is vital for a retailer's survival in this market.
 Out-Of-Stock Analysis: This analysis probes into the various reasons resulting in an out-of-stock situation. Typically a number of variables are involved and it can get very complicated. An integral part of the analysis is calculating the revenue lost due to product stock-outs.
Alternative Sales Channels
 E Business Analysis: The Internet has emerged as a powerful alternative channel for established retailers. Increasing competition from retailers operating purely over the Internet - commonly known as 'e-tailers' - has forced the 'Bricks and Mortar' retailers to quickly adopt this channel. Their success would largely depend on how they use the Net to complement their existing channels. Web logs and Information forms filled over the web are very rich sources of data that can provide insightful information about customer's browsing behavior, purchasing patterns, likes and dislikes, etc. Two main types of analysis done on the web site data are:
·         Web Log Analysis: This involves analyzing the basic traffic information over the e-commerce web site. This analysis is primarily required to optimize the operations over the Internet. It typically includes the following analyses:
·         Site Navigation: An analysis of the typical route followed by the user while navigating the web site. It also includes an analysis of the most popular pages in the web site. This can significantly help in site optimization by making it more user-friendly.
·         Referrer Analysis: An analysis of the sites that are most prolific in diverting traffic to the company’s web site.
·         Error Analysis: An analysis of the errors encountered by the user while navigating the web site. This can help in solving the errors and making the browsing experience more pleasurable.
·         Keyword Analysis: An analysis of the most popular keywords used by various users in Internet search engines to reach the retailer’s e-commerce web site.
·         Product Recommendation: If someone buys product A, which other product is he likely to buy? There are usually three different angles to exploit when setting up a recommendation engine: natural product affinities, customer affinities and preferences, and peer dynamics and the wisdom of the crowds.
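
The page popularity, referrer and error analyses listed above can be sketched against raw web server logs. The example assumes the logs are in the widely used Combined Log Format; the sample lines, host names and the `summarize` helper are made up for illustration.

```python
import re
from collections import Counter

# Combined Log Format: host ident user [time] "request" status bytes "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "[^"]*"'
)

def summarize(lines):
    """Count successful page views, referrers and error responses."""
    pages, referrers, errors = Counter(), Counter(), Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue  # skip lines that do not follow the expected format
        status = int(match["status"])
        if status >= 400:
            errors[(status, match["path"])] += 1
        else:
            pages[match["path"]] += 1
        if match["referrer"] not in ("", "-"):
            referrers[match["referrer"]] += 1
    return pages, referrers, errors

# Hypothetical log lines.
sample_log = [
    '10.0.0.1 - - [30/Sep/2011:10:00:00 +0000] "GET /home HTTP/1.1" 200 512 "http://search.example/?q=shoes" "Mozilla/5.0"',
    '10.0.0.1 - - [30/Sep/2011:10:00:09 +0000] "GET /shoes HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    '10.0.0.2 - - [30/Sep/2011:10:01:00 +0000] "GET /home HTTP/1.1" 200 512 "http://search.example/?q=shoes" "Mozilla/5.0"',
    '10.0.0.2 - - [30/Sep/2011:10:01:30 +0000] "GET /old-page HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
]
pages, referrers, errors = summarize(sample_log)
print(pages.most_common(1), referrers.most_common(1), errors)
```

Keyword analysis could be layered on top by parsing the query string of each search-engine referrer, where the engine still passes it.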

 Channel Profitability: Data mining can help analyze channel profitability, and whether it makes sense for the retailer to continue building up expertise in that channel. The decision of continuing with a channel would also include a number of subjective factors like outlook of key enabling technologies for that channel.
 Product – Channel Affinity: Some product categories sell particularly well on certain channels. Data mining can help identify hidden product-channel affinities and help the retailer design better promotion and marketing campaigns.
Finance and Fixed Asset Management
The role of financial reporting has undergone a paradigm shift during the last decade. It is no longer restricted to just the financial statements required by law; increasingly it is being used to help in strategic decision making. Also, many organizations have embraced a free information architecture, whereby financial information is openly available for internal use. Many of the analytics described so far use financial data. Many companies, across industries, have integrated financial data into their enterprise-wide data warehouse or established a separate Financial Data Warehouse (FDW). Following are some of the uses of BI in finance:
 Budgetary Analysis: Data warehousing facilitates analysis of budgeted versus actual expenditure for various cost heads, such as promotion, so that overruns can be analyzed in more detail. It can also be used to allocate budgets for the coming financial period.
Fixed Asset Return Analysis: This is used to analyze financial viability of the fixed assets owned or leased by the company. It would typically involve measures like profitability per sq. foot of store space, total lease cost vs. profitability, etc.
 Financial Ratio Analysis: Various financial ratios like debt-equity, liquidity ratios, etc. can be analyzed over a period of time. The ability to drill down and join inter-related reports and analyses – provided by all major OLAP tool vendors – can make ratio analysis much more intuitive.
 Profitability Analysis: This includes profitability of individual stores, departments within the store, product categories, brands, and individual SKUs.