Build a user rating system from 0 to 1

Huahua is a product operator at an e-commerce company. Whenever a new product launches, his usual approach is to run campaigns, ride trending topics, and push marketing. These tactics attracted a large number of freeloaders but very few real customers.

While Huahua was struggling with this, Doudou, a senior colleague on the same team, offered a suggestion: build a user rating system with AHP and RFM and use it to run more refined operations. Delighted, Huahua immediately searched Baidu: what exactly are AHP and RFM, and how are they used? The author explains in detail below.

1. AHP weight setting

1. What is AHP?

The Analytic Hierarchy Process (AHP) was proposed by American operations researcher Thomas L. Saaty in the mid-1970s.

AHP decomposes the elements of a decision into levels such as goals, criteria, and alternatives. It is mainly used to bring quantitative analysis to decision problems that are qualitative in nature.

For example, an e-commerce platform creates a comprehensive scoring model for users based on user behavior data to identify loyal users, active users, silent users, etc., and then conducts refined operations for each type of user.

2. Basic principles of AHP

The idea of AHP is to stay close to the decision maker's subjective judgment and reasoning: it quantifies that reasoning or judgment process so that logical errors are avoided when the structure is complex or there are many options.

The specific steps are as follows:

1) Establish a scoring system

Build a user value scoring system and carry out refined operations for various types of users.

Set a goal and list all the factors that affect it. By using methods such as expert scoring and user questionnaires, all influencing factors are listed one by one, such as activity, loyalty, purchasing power, etc.

2) Constructing hierarchical structure and judgment matrix

For each factor, list the indicators (or alternatives) that influence it.

The indicators that influence user activity include page views, duration of stay, number of products viewed, and number of orders placed.

The indicators that influence user loyalty include the most recent visit time, visit frequency, and the number of active evaluations.

The indicators that influence user purchasing power include the highest single transaction amount, average order amount, and number of purchases.

3) Calculate the weight coefficient

Calculate the indicator weights of each indicator layer and criterion layer respectively, and then calculate the decision formula (as shown below).

4) Consistency check

If the consistency ratio CR < 0.1, proceed to the next stage; otherwise, reassign the weights of the indicators (that is, rebuild the judgment matrix).

5) Hierarchical sorting

Hierarchical sorting is divided into single-level sorting and total sorting. Single-level sorting ranks the importance of the factors at one level with respect to a single factor at the level above. Total sorting determines the ranking weights of all factors at a given level relative to the overall goal.

The hierarchical sorting is done from the highest level to the lowest level. For the highest level, the result of single sorting of its level is also the result of total sorting.

3. Determine the weights

1) Construct a judgment matrix

When determining the weights between factors at each level, purely qualitative results are usually hard for others to accept. Saaty therefore proposed the consistent matrix method: factors are compared two at a time on a relative scale rather than all at once, which reduces the difficulty of comparing factors of different natures and improves accuracy.

Expert scoring was used to compare all factors pairwise to determine the appropriate scale. After establishing the hierarchical structure, compare the weights of the factors and subordinate indicators to achieve qualitative to quantitative transformation; for example, use the 1-9 point scaling method to construct the scoring matrix A of the decision-making layer, as shown in the figure below.
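As a minimal sketch of the pairwise comparison step, the helper below builds a reciprocal judgment matrix from the scores above the diagonal. The function name `build_judgment_matrix` and the three example scores are assumptions for illustration; the matrix A from the original figure is not reproduced here.

```python
def build_judgment_matrix(n, upper):
    """Build an n x n positive reciprocal matrix from upper-triangle scores.

    `upper` maps (i, j) with i < j to the 1-9 scale score of factor i
    compared with factor j. The diagonal is 1 and a[j][i] = 1 / a[i][j].
    """
    matrix = [[1.0] * n for _ in range(n)]
    for (i, j), score in upper.items():
        matrix[i][j] = float(score)
        matrix[j][i] = 1.0 / score
    return matrix


# Hypothetical expert scores (assumed, not taken from the article's figure):
# index 0 = activity, 1 = loyalty, 2 = purchasing power
A = build_judgment_matrix(3, {(0, 1): 3, (0, 2): 7, (1, 2): 5})
for row in A:
    print([round(x, 4) for x in row])
```

Only the n(n-1)/2 comparisons above the diagonal need to be collected from the experts; the rest of the matrix follows from reciprocity.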

The scoring matrix above is in fact the judgment matrix of AHP.

2) Consistency test

The purpose of consistency check is to check the coordination between the importance of each element to avoid the contradictory situation that A is more important than B, B is more important than C, and C is more important than A.

Related theories:

Consistency Matrix:

Determine whether the matrix is a consistency matrix:

A judgment matrix is not required to be perfectly consistent when it is constructed, because objective things are complex and human cognition is diverse. However, the judgment matrix is the basis for calculating the ranking weight vector, so it should be at least approximately consistent.
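The figures that stated these definitions are not reproduced here. As a hedged restatement from standard AHP theory (not copied from the original figures), a judgment matrix is a positive reciprocal matrix, and it is a consistency matrix when the pairwise scores are transitive:

```latex
% Positive reciprocal (judgment) matrix of order n
A = (a_{ij})_{n \times n}, \qquad a_{ij} > 0, \qquad a_{ii} = 1, \qquad a_{ji} = \frac{1}{a_{ij}}

% Consistency condition
a_{ij}\, a_{jk} = a_{ik} \qquad \text{for all } i, j, k = 1, \dots, n

% Equivalent test via the largest eigenvalue
A \text{ is consistent} \iff \lambda_{\max}(A) = n,
\qquad \text{and in general } \lambda_{\max}(A) \ge n
```

The further the largest eigenvalue rises above n, the less consistent the matrix is, which is what the consistency check measures.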

Check the consistency of the judgment matrix:

First, approximate the eigenvector by hand using the sum-product method:

Compute the eigenvalues of matrix A by hand:

Find the eigenvectors:

Find the largest eigenvalue:

The manual solution has low accuracy and only obtains an approximate value of the maximum eigenvalue.

Consistency Check
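The manual steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the author's worksheet: the matrix `A` is an assumed example (the original figure is missing), the function names are hypothetical, and `RANDOM_INDEX` holds the commonly cited values of Saaty's random consistency index RI.

```python
# Commonly cited random consistency index (RI) values by matrix order n
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def sum_product_weights(matrix):
    """Sum-product method: normalize each column, then average each row."""
    n = len(matrix)
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [sum(matrix[i][j] / col_sums[j] for j in range(n)) / n
            for i in range(n)]


def approx_lambda_max(matrix, weights):
    """Approximate the largest eigenvalue as the mean of (A w)_i / w_i."""
    n = len(matrix)
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    return sum(aw[i] / weights[i] for i in range(n)) / n


def consistency_check(matrix):
    """Return (weights, lambda_max, CR) with CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    weights = sum_product_weights(matrix)
    lam = approx_lambda_max(matrix, weights)
    if n <= 2:
        return weights, lam, 0.0  # matrices of order 1 or 2 are always consistent
    ci = (lam - n) / (n - 1)
    return weights, lam, ci / RANDOM_INDEX[n]


# Assumed judgment matrix for activity, loyalty, purchasing power
A = [[1, 3, 7],
     [1 / 3, 1, 5],
     [1 / 7, 1 / 5, 1]]
weights, lam, cr = consistency_check(A)
print([round(w, 5) for w in weights], round(lam, 4), round(cr, 4), cr < 0.1)
```

With this assumed matrix, the sum-product weights round to 0.64339, 0.28284 and 0.07377, the same criterion-layer weights this article reports, and CR comes out below 0.1.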

3) Calculate the indicator layer weight

Calculate the weight of activity:

Therefore, the weights of the indicators relative to the activity criterion are:

  • The weight of the number of page views: b1=0.63231
  • The weight of the length of stay: b2=0.21452
  • Weight of the number of times a product is viewed: b3=0.10961
  • Weight of number of orders: b4=0.04357

Calculate the weight of loyalty:

Therefore, the weights of the indicators relative to the loyalty criterion are:

  • Weight of the most recent access time: c1=0.61935
  • The weight of access frequency: c2=0.28423
  • Weight of active evaluation times: c3=0.09642

Calculate the purchasing power weight:

Therefore, the weights of the indicators relative to the purchasing power criterion are:

  • The weight of the highest single amount: d1=0.70706
  • Weight of average order amount: d2=0.20141
  • Weight of number of purchases: d3=0.09153

List all weights:

What should I do if the consistency check fails?

When the author actually constructed the scoring matrices, the consistency check failed several times (that is, CR >= 0.1). This can be caused by subjective scoring or by an unreasonable model structure. In that case, the experts need to rebuild the scoring matrix, or even the AHP model itself.

Impact of model construction:

Consistency is affected by whether the factors are reasonable, whether their meanings are clear, and whether elements overlap. It is recommended that each layer contain no more than 7 elements; if the elements differ greatly in strength, try not to put them in the same layer.

Impact of calculation accuracy:

Different eigenvalue solution methods (such as the sum-product method, square root method, etc.), errors in Excel calculation values, errors in calculation tools, etc. may all lead to some deviations in consistency verification results. You can use more accurate calculation tools such as Matlab, as shown in the following figure.

4) Conclusion

Using the AHP model, we can get the following formula:

Activity = b1*number of pages viewed + b2*length of stay + b3*number of products viewed + b4*number of orders placed;

Loyalty = c1*last visit time + c2*visit frequency + c3*number of active evaluations;

Purchasing power = d1*highest single transaction amount + d2*average order amount + d3*number of purchases;

User value score = 0.64339*activity+0.28284*loyalty+0.07377*purchasing power.
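The four formulas above can be sketched as one function. The weights are the ones listed in this article; the dictionary keys, the function name `user_value_score`, and the assumption that every indicator has already been normalized to a common 0-1 scale are illustrative choices, not part of the original.

```python
# Indicator-layer weights from the article (b1..b4, c1..c3, d1..d3)
INDICATOR_WEIGHTS = {
    "activity": {"page_views": 0.63231, "stay_duration": 0.21452,
                 "products_viewed": 0.10961, "orders": 0.04357},
    "loyalty": {"last_visit": 0.61935, "visit_frequency": 0.28423,
                "active_reviews": 0.09642},
    "purchasing_power": {"max_order_amount": 0.70706,
                         "avg_order_amount": 0.20141,
                         "purchase_count": 0.09153},
}
# Criterion-layer weights from the article
CRITERION_WEIGHTS = {"activity": 0.64339, "loyalty": 0.28284,
                     "purchasing_power": 0.07377}


def user_value_score(indicators):
    """Weighted sum of indicators per criterion, then of criteria.

    `indicators` maps each indicator name to a value that is assumed
    to be normalized to the range 0-1 beforehand.
    """
    score = 0.0
    for criterion, weights in INDICATOR_WEIGHTS.items():
        criterion_score = sum(w * indicators[name] for name, w in weights.items())
        score += CRITERION_WEIGHTS[criterion] * criterion_score
    return score


# Hypothetical user whose indicators are all at the maximum of the scale
best_user = {name: 1.0 for ws in INDICATOR_WEIGHTS.values() for name in ws}
print(round(user_value_score(best_user), 3))
```

Because the weights within each layer sum to roughly 1, the score stays on the same scale as the normalized inputs.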

AHP can build a model from relatively little quantitative data, but its result only indicates the relative importance of the factors; it cannot by itself determine a user's value rating.

Therefore, the RFM model and the AHP model are combined to calculate the scores of each factor and obtain the score of each user.

2. RFM Calculation Score

1. What is RFM?

The RFM model is an important tool and means to measure customer value and customer profitability.

The model divides customers into multiple categories based on three indicators: a customer's recent purchase behavior (Recency), overall purchase frequency (Frequency), and consumption amount (Monetary). Finally, the overall distribution of customers is evaluated based on the proportion of different types of customers (as shown in the figure below), and targeted marketing is carried out for different types of customers.

In an RFM user stratification model, what score does an important development customer get? What score does a general value customer get? To answer this, the author uses six months of transaction data from an e-commerce company, from November 1, 2018 to April 30, 2019. To protect privacy, the data has been anonymized.

2. Steps to build the RFM model

1) Acquiring and cleaning data

The RFM model is mainly used to analyze user purchasing behavior. The data usually obtained includes payment time, actual payment amount, order status and other information. Some of the data is shown in the figure below.

The raw data may contain null values, outliers, and similar problems. Such dirty data cannot be analyzed and has to be removed through simple data cleaning. Data cleaning involves two tasks: handling abnormal values, for example by deletion or mean imputation, and identifying them, for example by checking business rules and semantic conflicts.

For example, in the transaction data the author found records where "Shipping Time" was empty; these are dirty data and need to be removed. Other records had an "Order Status" of "After payment, the user successfully refunds and the transaction is automatically closed"; refunded orders should not enter the model and also need to be removed.

After cleaning, filter on "Shipping Time" and "Order Status" again. Records with an empty "Shipping Time" or the refund order status no longer appear, which confirms that they have been filtered out.
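The same cleaning rule can be sketched outside a spreadsheet. The field names (`shipping_time`, `order_status`, `paid_amount`), the sample rows, and the exact status string are assumptions for illustration; real exports will use their own column names.

```python
# Assumed label for refunded orders; adjust to the value used in the real export
REFUND_STATUS = ("After payment, the user successfully refunds "
                 "and the transaction is automatically closed")


def clean_orders(orders):
    """Keep only rows with a shipping time and a non-refunded status."""
    return [
        order for order in orders
        if order.get("shipping_time") and order.get("order_status") != REFUND_STATUS
    ]


# Hypothetical sample rows
orders = [
    {"buyer": "u1", "shipping_time": "2018-11-12 10:00",
     "order_status": "Transaction successful", "paid_amount": 59.0},
    {"buyer": "u2", "shipping_time": "",
     "order_status": "Transaction successful", "paid_amount": 20.0},
    {"buyer": "u3", "shipping_time": "2019-01-03 09:30",
     "order_status": REFUND_STATUS, "paid_amount": 88.0},
]
cleaned = clean_orders(orders)
print([order["buyer"] for order in cleaned])
```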

2) Build the model

Next, extract the values of R, F, and M: R (the number of days since the last purchase), F (the number of purchases), and M (the average purchase amount).

Build a pivot table. Drag "Buyer Nickname" to both the row area and the value area and set the value to count; this gives each buyer's number of purchases, the F value. Drag "Payment Time" to the value area and set it to maximum, which gives each buyer's most recent payment time. Drag "Actual Payment Amount" to the value area and set it to average, which gives the M value, as shown in the figure below.

Copy the pivoted data to a new sheet (paste special, "values and number formats"), then compute R. Since the last order date in the data is April 30, 2019, the author sets the modeling date to May 1, 2019 and calculates the number of days between each customer's last payment and May 1. This is each customer's R value, as shown in the figure below.
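For readers who prefer code to pivot tables, the R, F, and M extraction can be sketched as a simple group-by. The tuple layout `(buyer, payment_date, amount)`, the function name `rfm_table`, and the sample orders are assumptions; the modeling date of May 1, 2019 comes from the text.

```python
from collections import defaultdict
from datetime import date

MODEL_DATE = date(2019, 5, 1)  # modeling date used in the article


def rfm_table(orders):
    """Group orders by buyer and derive R, F and M for each buyer.

    R = days from the buyer's last payment to MODEL_DATE
    F = number of purchases
    M = average payment amount
    """
    grouped = defaultdict(list)
    for buyer, pay_date, amount in orders:
        grouped[buyer].append((pay_date, amount))

    table = {}
    for buyer, rows in grouped.items():
        last_payment = max(pay_date for pay_date, _ in rows)
        table[buyer] = {
            "R": (MODEL_DATE - last_payment).days,
            "F": len(rows),
            "M": sum(amount for _, amount in rows) / len(rows),
        }
    return table


# Hypothetical cleaned orders
orders = [
    ("u1", date(2018, 11, 11), 100.0),
    ("u1", date(2019, 4, 30), 50.0),
    ("u2", date(2018, 11, 11), 30.0),
]
rfm = rfm_table(orders)
print(rfm)
```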

Using the RFM calculation method, all factors (R, F, M) are mapped to a 0-5 score range.

Alternatively, normalize with the following formulas (as shown below). Use the first formula for positively correlated factors and the second for negatively correlated ones. R is negatively correlated, because the shorter the time since the last purchase, the more valuable the customer. F and M are both positively correlated.

Normalization can also use (X-Xmin)/mean(X) and (Xmax-X)/mean(X). Note that if the real data is unevenly distributed, the mean can be skewed: if some customers spend 1 million yuan and others spend 1,000 yuan, the mean is pulled far from the typical value. In that case, normalize with tertiles, the median, or (Xmax-Xmin) instead.
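A minimal sketch of the positive and negative normalization, assuming the (Xmax-Xmin) denominator mentioned above (min-max scaling). The original formula figure is not reproduced, so the function name `min_max` and its `positive` flag are illustrative choices rather than the author's exact formula.

```python
def min_max(values, positive=True):
    """Scale values to the range 0-1.

    positive=True  -> (X - Xmin) / (Xmax - Xmin), for F and M
    positive=False -> (Xmax - X) / (Xmax - Xmin), for R, where a smaller
                      number of days since the last purchase is better
    """
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]  # no spread, so no information
    span = highest - lowest
    if positive:
        return [(v - lowest) / span for v in values]
    return [(highest - v) / span for v in values]


# Hypothetical values
f_scores = min_max([1, 2, 3])                   # purchase counts
r_scores = min_max([1, 11, 21], positive=False)  # days since last purchase
print(f_scores, r_scores)
```

Multiplying the result by 5 maps it to the 0-5 score range if that scale is preferred.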

Because the available data fields are limited, the criterion-layer scores cannot be built up from the indicator layer. The criterion weights calculated by AHP for activity, loyalty, and purchasing power (0.64339, 0.28284, and 0.07377) are therefore applied directly.

This yields the standardized data and the weighted user value, as shown in the following figure:

R, F, M, and user value are each converted to 0 or 1: a value greater than the mean becomes 1, otherwise 0. Four binary flags give 16 user types, as shown in the figure below.
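The thresholding step can be sketched as follows. It assumes the inputs are the normalized scores (so a high R score means a recent purchase) and that the code digits are ordered R, F, M, user value; the key `"V"` for user value, the function names, and the sample users are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def type_codes(users):
    """Map each user to a 4-digit code: 1 if a score is above the mean, else 0.

    `users` maps a user id to normalized scores under the keys
    "R", "F", "M" and "V" (user value); digits follow that order.
    """
    keys = ("R", "F", "M", "V")
    means = {k: mean([scores[k] for scores in users.values()]) for k in keys}
    return {
        user: "".join("1" if scores[k] > means[k] else "0" for k in keys)
        for user, scores in users.items()
    }


# Hypothetical normalized scores
users = {
    "u1": {"R": 0.9, "F": 0.1, "M": 0.8, "V": 0.7},
    "u2": {"R": 0.1, "F": 0.1, "M": 0.1, "V": 0.1},
    "u3": {"R": 0.2, "F": 0.7, "M": 0.3, "V": 0.3},
}
codes = type_codes(users)
print(codes)
```

Each of the 2^4 = 16 possible codes corresponds to one user type.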

Substituting user types into the data, some of the results are shown in the following figure.

3. Model Visualization

1) Analyze the proportion of each type of customers

Pivot the RFM model table that has just been completed, drag "Customer Type" to the row area, and then drag "Customer Type" to the value area twice, the first time is for counting, and the second time is to view the customer ratio, as shown in the figure below.

Draw a picture to more clearly see the proportion of users of different customer types, as shown below.

2) Analyze the proportion of customer amounts

Pivot the RFM model table, drag "Customer Type" to the row area, and then drag "Cumulative Amount" to the value area twice. The first time is to calculate the cumulative consumption amount of each type of customer, and the second time is to view the proportion of the amount of each type of customer, as shown in the figure below.

Draw a graph to more clearly view the proportion of amounts of different customer types, as shown below.
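The two pivot tables above (customer share and amount share per type) can be sketched with a single pass over the data. The row layout `(customer_type, cumulative_amount)`, the function name `type_shares`, and the sample rows are assumptions for illustration.

```python
from collections import Counter, defaultdict


def type_shares(rows):
    """Count customers and sum amounts per type, then compute each share."""
    counts = Counter()
    amounts = defaultdict(float)
    for customer_type, amount in rows:
        counts[customer_type] += 1
        amounts[customer_type] += amount

    total_customers = sum(counts.values())
    total_amount = sum(amounts.values())
    return {
        customer_type: {
            "customers": counts[customer_type],
            "customer_share": counts[customer_type] / total_customers,
            "amount": amounts[customer_type],
            "amount_share": (amounts[customer_type] / total_amount
                             if total_amount else 0.0),
        }
        for customer_type in counts
    }


# Hypothetical rows: one per customer
rows = [("0000", 50.0), ("0000", 50.0), ("0010", 300.0), ("1011", 100.0)]
shares = type_shares(rows)
for customer_type, stats in shares.items():
    print(customer_type, stats)
```

Comparing `customer_share` with `amount_share` for the same type shows quickly which groups contribute more revenue than their headcount suggests.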

3. Summary and Suggestions

1) By customer count, general retention customers (0000) are the largest group at 8,725 people, or 34.52% of the total. These customers have not purchased recently, buy less often than average, have a relatively low average order amount, and have low user value. They placed their orders around Double 11 in 2018 and are price-sensitive, so try to reactivate them during promotional campaigns (such as National Day or Children's Day).

2) Secondary important retained customers (0010) have not purchased recently and buy infrequently, but spend large amounts. There are 6,905 of them, 27.16% of the total, and they account for the highest share of payment amount. In other words, the customers who contribute most to the merchant's sales ordered long ago, buy infrequently, and are on the verge of churning. Unlike general retention customers, however, their average order amount is higher.

For these customers, operations staff should obtain their contact information, follow up, and ask why they have gone dormant. It is also possible that the products themselves have a low repurchase rate and a high price. Another approach is to start from the product: compare the customer's purchase date with the product's expected repurchase cycle to see whether the last purchase has simply not been used up yet.

3) Important development customers (1011) have purchased recently, buy infrequently, spend large amounts, and have high user value. There are 2,614 of them, 20.28% of the total, and their payment amounts are relatively high. These customers are generally new customers.

For these customers, operations staff should promptly send text messages, distribute coupons, and so on to raise their purchase frequency and loyalty, and ultimately convert them into important value customers.

Author: A data person’s private land

Source: A data person’s private land
