RFM Segmentation

Sunday, August 9, 2015

RFM segmentation is well known in the retail industry. Its basic premise is that by knowing the recency, frequency and monetary value of purchases, you are in a good position to start profiling a specific customer in terms of value, purchasing behavior and loyalty. The same logic, however, can be applied to any phenomenon we are trying to predict: knowing how often something happens, how recently it happened and its magnitude has the same type of predictive power as it has in the retail context. Whenever I have used it for predictive modeling, RFM has always come out as one of the top predictors. So let me delve deeper into the basic principles of the RFM method.

RFM segments the customer base by recency of purchase (R), frequency of purchase (F) and monetary value (M). Recency is the most powerful of the three, much as in forecasting models, where the most recent observations in a time series often carry the highest weight and are the most predictive of the next value. Frequency is the second most powerful, as long as it is measured over a recent period (the last month or quarter) rather than over the entire life span of the customer relationship. Monetary value is the least powerful. Since total value over a period is directly correlated with frequency, it is advisable to use an average value (such as average spend per purchase) instead.
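
To make these definitions concrete, here is a minimal Python sketch that derives the three inputs from a transaction log. It assumes a hypothetical log of `(customer_id, purchase_date, amount)` tuples, measures recency in days since the last purchase, and restricts frequency and average value to a recent window; the 90-day `window_days` default, the `rfm_features` helper and the data are illustrative assumptions, not values from this post.

```python
from collections import defaultdict
from datetime import date

def rfm_features(transactions, as_of, window_days=90):
    """Return {customer_id: (recency_days, frequency, avg_value)}.

    Recency looks at the full history; frequency and average value only
    count purchases made within `window_days` of `as_of` (assumed window).
    """
    last_purchase = {}
    in_window = defaultdict(list)
    for cust, day, amount in transactions:
        if cust not in last_purchase or day > last_purchase[cust]:
            last_purchase[cust] = day
        if 0 <= (as_of - day).days <= window_days:
            in_window[cust].append(amount)
    features = {}
    for cust, last in last_purchase.items():
        amounts = in_window.get(cust, [])
        frequency = len(amounts)
        # Average value per purchase, so M is not just a proxy for F.
        avg_value = sum(amounts) / frequency if frequency else 0.0
        features[cust] = ((as_of - last).days, frequency, avg_value)
    return features

# Hypothetical transaction log: (customer_id, purchase_date, amount).
transactions = [
    ("c1", date(2015, 7, 28), 120.0),
    ("c1", date(2015, 6, 15), 80.0),
    ("c1", date(2014, 11, 2), 300.0),  # outside the 90-day window
    ("c2", date(2015, 5, 20), 40.0),
    ("c3", date(2015, 8, 1), 15.0),
    ("c3", date(2015, 7, 30), 25.0),
    ("c3", date(2015, 7, 10), 20.0),
]
feats = rfm_features(transactions, as_of=date(2015, 8, 9))
print(feats["c1"])  # (12, 2, 100.0)
```

Customer c1's large purchase from 2014 still matters for nothing here: it is too old to affect recency and falls outside the frequency window, which is the point of limiting F and M to a recent period.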

There are several ways to calculate RFM groups and scores; the classic approach is as follows:

First, create 5 segments based on recency by dividing the data file into exact quintiles: the contacts with the most recent transactions (i.e., the top 20% of the file) are given a recency value of 5, the next 20% are given a recency value of 4, and so on. Then each of those quintiles is segmented into 5 further quintiles based on each contact's frequency value: the contacts with the highest transaction frequency get a frequency value of 5, the next 20% get a frequency value of 4, and so on. Finally, each of these segments is split into 5 further quintiles based on each contact's monetary value, i.e., the total amount that all of the contact's transactions add up to. The contacts with the highest monetary values (the top 20%) are given a monetary value of 5, the next 20% a monetary value of 4, and so on. At the end of this process you will have 125 segments, with RFM groups ranging from 111 to 555 and the same number of contacts in each segment (up to rounding), and each contact will have an RFM score (R + F + M) between 3 and 15.
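
The nested quintile procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: `split_into` and `nested_rfm` are hypothetical helpers, ties are broken by sort order, and the customer data is randomly generated. With 250 customers every one of the 125 groups receives exactly 2 contacts; with counts not divisible by 125 the segment sizes are only approximately equal.

```python
import random
from collections import Counter

def split_into(items, n=5):
    """Split a list into n contiguous chunks whose sizes differ by at most 1."""
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

def nested_rfm(customers, n=5):
    """customers: list of (id, recency_days, frequency, monetary).

    Returns {id: (R, F, M)}: recency quintiles over the whole file,
    frequency quintiles inside each R segment, and monetary quintiles
    inside each RF segment.
    """
    scores = {}
    # Fewest days since last purchase first, so the first chunk gets R = 5.
    by_recency = sorted(customers, key=lambda c: c[1])
    for r_idx, r_chunk in enumerate(split_into(by_recency, n)):
        by_freq = sorted(r_chunk, key=lambda c: c[2], reverse=True)
        for f_idx, f_chunk in enumerate(split_into(by_freq, n)):
            by_money = sorted(f_chunk, key=lambda c: c[3], reverse=True)
            for m_idx, m_chunk in enumerate(split_into(by_money, n)):
                for cust in m_chunk:
                    scores[cust[0]] = (n - r_idx, n - f_idx, n - m_idx)
    return scores

# Synthetic data: 250 customers with random recency, frequency and value.
rng = random.Random(42)
customers = [
    (f"c{i}", rng.randint(1, 365), rng.randint(1, 20), round(rng.uniform(5, 500), 2))
    for i in range(250)
]
scores = nested_rfm(customers)
group_sizes = Counter(f"{r}{f}{m}" for r, f, m in scores.values())
rfm_score = {cid: sum(rfm) for cid, rfm in scores.items()}  # ranges 3..15
print(len(group_sizes), set(group_sizes.values()))  # 125 {2}
```

Because frequency and monetary quintiles are computed inside each recency segment, a contact's F score is relative to its recency peers rather than to the whole file.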

An alternative is to still calculate RFM groups and scores with quintiles, but using the Independent RFM Quintile approach: the recency, frequency and monetary quintiles are each calculated across the whole data file, so none of them depends on any other RFM factor or on any other quintile. Another approach is to use user-definable bands for each criterion (i.e., each RFM factor) to determine which recency, frequency and monetary value each contact should be given. Even though RFM segmentation can be used on a “stand-alone” basis, I always tend to combine it with other demographic and affinity variables to get a more holistic view of each segment's make-up.
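
Both alternatives are easy to sketch. Here `independent_quintile` ranks each factor across the whole file on its own, and `band_score` maps a value to a score through analyst-chosen cutoffs; both helper names and all cutoff values are hypothetical and only illustrate the idea.

```python
import bisect

def independent_quintile(values, higher_is_better=True, n=5):
    """Score every value 1..n by its rank across the whole file only."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=higher_is_better)
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = n - (rank * n) // len(values)
    return scores

def band_score(value, cutoffs, higher_is_better=True):
    """Score 1..len(cutoffs)+1 from analyst-defined ascending cutoffs."""
    if higher_is_better:
        # Each cutoff the value reaches moves it up one band.
        return bisect.bisect_right(cutoffs, value) + 1
    # Lower is better (e.g. days since last purchase): each cutoff exceeded costs a band.
    return len(cutoffs) + 1 - bisect.bisect_left(cutoffs, value)

recency_days = [5, 40, 75, 120, 300, 12, 200, 60, 90, 15]
frequency = [12, 3, 1, 6, 1, 9, 2, 4, 2, 20]
r_scores = independent_quintile(recency_days, higher_is_better=False)
f_scores = independent_quintile(frequency)
print(r_scores)  # [5, 4, 3, 2, 1, 5, 1, 3, 2, 4]

# Hypothetical bands: recency cutoffs in days, frequency cutoffs in visits.
print(band_score(40, [30, 60, 90, 180], higher_is_better=False))  # 4
print(band_score(12, [2, 4, 8, 16]))  # 4
```

Quintiles guarantee balanced group sizes, while fixed bands keep a score's meaning stable from one period to the next; which matters more depends on how the scores will be used.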

I have devised my own approach, which I use often and which differs somewhat from the classic one. It goes as follows:

1.) Create a variable Total Spend for each customer (over the last 12 months)

2.) Create a variable Total Number of Visits for each customer

3.) Divide both variables into 3 equal-frequency bins (tertiles): the 1st bin holds the lowest third (roughly 33%) of all customers in terms of spending, and likewise for visits, which are binned as a separate variable.

4.) Determine which group each customer belonged to (for that period) in terms of total spending and total visits, and label him accordingly. Example: the variable “FRM_Spend_label” would have the values “L”, “M” and “H”; if a customer's total spending over 12 months falls within the thresholds of the second bin, give him the value “M” (medium) in “FRM_Spend_label”.

5.) Do the same thing for visits, creating a new variable “FRM_visit_variable”.

6.) Do something slightly different for recency: starting from the same end point as for spending and visits, look back only 3 months instead of 12. Then, if the customer made a purchase in month 1 (the most recent month), give him the value “H”; if his most recent purchase was in month 2, give him “M”; and if his most recent purchase was in month 3, give him “L”.

Note: it might happen that most customers have some sort of purchase in every month, in which case it would be advisable to raise the threshold above “0”. In other words, count a month as containing a recent purchase only if the monthly total is above some specified amount greater than “0”.

7.) Combine all three FRM dimensions into a single variable whose values are combinations of “H”, “M” and “L”. A value of “HLH” means that the customer falls in the top group in terms of number of store visits; that his most recent store visit (with a purchase larger than…) was in month 3, i.e. there was no such visit in the two most recent months; and that he falls in the top group in terms of the total monetary value he brings to the company.

8.) In the last step I apply a “19 + 1” rule: I retain the 19 most frequent combinations and drop all other combinations into an “Other” category, so that my FRM variable has no more than 20 distinct values.
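
The eight steps can be sketched end to end in Python. This is a minimal illustration under several assumptions: each customer arrives pre-aggregated as a dict with hypothetical keys `spend_12m`, `visits_12m` and `monthly_spend` (the last three monthly totals, most recent first); the bins in step 3 are equal-frequency thirds; ties in the 19 + 1 ranking go to the combination seen first; and customers with no qualifying purchase in the 3-month window, a case the steps do not cover, get a placeholder recency label `N`.

```python
from collections import Counter

def tertile_labels(values):
    """Label values L/M/H by equal-frequency thirds (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [""] * len(values)
    for rank, i in enumerate(order):
        labels[i] = "LMH"[(rank * 3) // len(values)]
    return labels

def recency_label(monthly_spend, threshold=0.0):
    """monthly_spend: [month1, month2, month3], month1 being the most recent.

    A month counts as a purchase month only if its total exceeds `threshold`.
    """
    for label, amount in zip("HML", monthly_spend):
        if amount > threshold:
            return label
    return "N"  # assumption: no qualifying purchase in the 3-month window

def frm_variable(customers, threshold=0.0, keep=19):
    """Return one combined label per customer, collapsed with the 19 + 1 rule."""
    spend = tertile_labels([c["spend_12m"] for c in customers])
    visits = tertile_labels([c["visits_12m"] for c in customers])
    recency = [recency_label(c["monthly_spend"], threshold) for c in customers]
    # Order follows the "HLH" example: visits (F), recency (R), spend (M).
    combos = [v + r + s for v, r, s in zip(visits, recency, spend)]
    # Keep the `keep` most common combinations; everything else becomes "Other".
    top = {combo for combo, _ in Counter(combos).most_common(keep)}
    return [combo if combo in top else "Other" for combo in combos]

customers = [
    {"spend_12m": 1200, "visits_12m": 40, "monthly_spend": [0, 0, 150]},
    {"spend_12m": 90, "visits_12m": 3, "monthly_spend": [30, 0, 0]},
    {"spend_12m": 500, "visits_12m": 15, "monthly_spend": [0, 60, 0]},
    {"spend_12m": 800, "visits_12m": 25, "monthly_spend": [5, 0, 0]},
    {"spend_12m": 40, "visits_12m": 2, "monthly_spend": [0, 0, 0]},
    {"spend_12m": 300, "visits_12m": 10, "monthly_spend": [20, 10, 0]},
]
print(frm_variable(customers, threshold=10))
# ['HLH', 'LHL', 'MMM', 'HNH', 'LNL', 'MHM']
```

With `threshold=10`, the fourth customer's 5-unit purchase in month 1 no longer counts as recent, which is the effect described in the note under step 6; with the default threshold of 0 the same customer would be labeled “HHH”.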

Hope this helps!