How to predict user churn rate and make strategies in advance?

The user churn rate directly reflects the market acceptance of the product and the quality of its operations. The goal of predicting churn is to estimate when users will leave, so that retention strategies can be prepared in advance.

What is churn rate? Why do we need to care about user churn?

Simply put: User churn rate refers to the ratio of the number of churned users to the total number of users who use or consume the product (or service). It quantifies user churn and is the main indicator for judging it. It directly reflects the market acceptance of the product and the quality of its operations.

Generally speaking, this indicator is mostly used in the case of "subscription products", such as the information subscription app "Hammer Reading", the vast majority of online SaaS products, and even traditional milk subscriptions.

Since retaining current users is more cost-effective than acquiring new ones, the goal of predicting churn is to:

Predict when users will leave (before the end of the subscription period) and reach out to them at the right time to retain them, for example by SMS, email, or in-app message, offering deeply discounted products or exclusive coupons to attract return visits. These strategies are very effective for some users who would otherwise be lost!

Next, I will use simple statistical knowledge to introduce a user churn prediction model based on user inactivity records.

This model can give an easily understandable user churn prediction without using a machine learning algorithm so that we have a fairly accurate insight into users who are about to leave.

Without further ado, let’s get to the point~

1. Operational Definition of User Activity

Before we can start predicting user churn, we need to record the historical activity of users. The purpose of doing this is to understand whether users are using our products or services.

So, the question is, what is the operational definition of user "activity" (i.e., the method of defining the meaning of variables based on observable, measurable, and actionable characteristics)?

In fact, the definition of "user activity" depends on your business background and is closely related to the product or service scenario. Different types of products have different definitions of "user activity". Taking Sina Micro Public Opinion's "Information Monitoring" as an example, it is a subscription-based big data product. After users retrieve information through a combination of various keywords, they choose to subscribe via email or client and receive subscription information according to a customized receiving frequency.

For this data product, user activity can be defined as follows: a user counts as active if, within a specified time period (the unit of analysis is up to the analyst and can be a day, week, month, quarter or year), their record includes the following payment, usage or interaction behaviors:

The user's subscription to "Information Monitoring" has not expired;
The user logs in to the product page on the web or mobile terminal;
The user uses part or all of the product's functions, such as targeted monitoring based on information sources or regions;
The user has made some consumption during this period, such as text data downloads and subscription renewals;
The user gave feedback on the product during this period, including complaints.
…

For this product, it makes sense to analyze user behavior in months because the shortest subscription period is one month and the longest subscription period is one year.

Once we have a clear definition of “user activity”, we can use these operational definitions to encode the user (in)activity for each month, using binary values (0,1) — if the user was active in month X, we set their activity value to 1, otherwise it is set to 0.
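The encoding can be sketched in a few lines of Python. This is a minimal illustration, assuming each user's qualifying actions (logins, downloads, renewals and so on) are already available as a list of dates; the function name `monthly_activity_flags` and the sample dates are hypothetical and not part of the original product.

```python
from datetime import date

def monthly_activity_flags(event_dates, year):
    """Return 12 flags (January..December): 1 if any event fell in that month, else 0."""
    active_months = {d.month for d in event_dates if d.year == year}
    return [1 if month in active_months else 0 for month in range(1, 13)]

# Hypothetical user: one qualifying action in January, two in March.
events = [date(2023, 1, 14), date(2023, 3, 2), date(2023, 3, 20)]
flags = monthly_activity_flags(events, 2023)
print(flags)
```

Several actions in the same month still produce a single 1, since the flag only records whether the user was active at all in that month.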

2. Create an “Inactive User Profile”

Now each user has a monthly activity flag. Next, we build a "user inactivity file" from these flags: for each user, we count the number of consecutive months they have been inactive.

Here, the author chose a one-year "analysis window" (that is, 12 months as the analysis time range) and presented the activity records and inactivity records as tables: the blue table shows each user's activity record for each month, and the green table shows the corresponding inactivity record.

Based on the possible active situations of users during this time period, the author enumerates three typical users, as shown in the following table:

1. User A:

The user was active at the beginning of the "analysis window", but became inactive in May (that is, May was the first inactive month). The inactivity then continued through December, the end of the "analysis window".

Therefore, from May to December, the "User Inactivity File" counts up by one each month, from 1 in May to 8 in December.

2. User B:

Like user A, this user was also active at the beginning. The difference is that the user was inactive from March to June, was active for only one month in July, then became inactive again in August and September, and finally returned to active status in October, November and December of the "analysis window".

In this case, every time a user returns to active status from inactive status, the previous inactive month count needs to be reset. That is to say, when we count the consecutive inactive months of the user again, we need to start counting from 1 again, and the counts of the previous inactive months will no longer be accumulated.

3. User C:

Unlike the two users above, this user is already inactive at the start of the "analysis window". This may occur because the user's subscription expired long ago (it is best to rule out this situation before formal analysis, because it is difficult to handle), or because the user became inactive before the start of the "analysis window".

Because we cannot see user activity before the "analysis window", we do not know how long the user had already been inactive. Given this, we give these months a special mark: -1 marks user C's initial months of inactivity. The user's later inactive months are counted in the same way as for the other two users.

Note: The green table, that is, the "User Inactivity File", is the data the user churn model is built on.
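The counting rules for the three users (count up while inactive, reset on return to activity, mark unknown leading inactivity with -1) can be sketched as follows. The function name `inactivity_profile` is illustrative. The flags for users A and B follow the descriptions above; user C's exact months are not given in the text, so that row is a hypothetical example.

```python
def inactivity_profile(activity):
    """Convert monthly 0/1 activity flags into consecutive-inactive-month counts.

    Inactive months at the start of the window are marked -1, because the
    streak began before the window and its true length is unknown.
    """
    profile = []
    streak = 0
    seen_active = False
    for flag in activity:
        if flag == 1:
            seen_active = True
            streak = 0          # returning to activity resets the count
            profile.append(0)
        elif not seen_active:
            profile.append(-1)  # inactivity that started before the window
        else:
            streak += 1
            profile.append(streak)
    return profile

user_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # inactive from May to December
user_b = [1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1]  # two separate inactive streaks
user_c = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1]  # hypothetical: inactive at window start

for name, flags in [("A", user_a), ("B", user_b), ("C", user_c)]:
    print(name, inactivity_profile(flags))
```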

3. Build a user churn model

With the above operational definition of user inactivity, we can count, for each month of the "analysis window" (January to December), how many users have been inactive for 0 to 12 consecutive months.

This step can be achieved with a pivot table, by aggregating the number of users per month and per inactivity level. As shown in the following table:

In the table above, each column is a month and each row is an inactivity level X. The value of each cell is the number of users who, in that month, have been inactive for X consecutive months.

For example: the first highlighted value (574) in the table above is the number of users who have been inactive for one month in January. These users come from the 4,815 users who were active in the preceding December. The second highlighted value (425) is the number of users who have been inactive for two consecutive months in February. These 425 users come from the 574 users who were inactive for one month in January, which is the base for the users inactive for two months in February. Note that the first row (0 consecutive inactive months) holds the number of active users, which serves as the base for the next month's first inactivity level.
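The aggregation can also be done in code instead of a spreadsheet pivot table. The following is a rough sketch using only the standard library; `inactivity_pivot` and the three sample profiles are illustrative (user C's row is hypothetical), and months marked -1 are left out of the counts, which is an assumption about how the unknown months should be treated.

```python
from collections import Counter

MAX_LEVEL = 12  # inactivity levels 0..12, as in the text
N_MONTHS = 12   # January..December

def inactivity_pivot(profiles):
    """Build table[level][month] = number of users at that inactivity level.

    `profiles` maps a user id to a 12-element inactivity profile.
    Months marked -1 (unknown history) are not counted.
    """
    counter = Counter()
    for profile in profiles.values():
        for month, level in enumerate(profile):
            if level >= 0:
                counter[(level, month)] += 1
    return [[counter[(level, month)] for month in range(N_MONTHS)]
            for level in range(MAX_LEVEL + 1)]

profiles = {
    "A": [0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8],
    "B": [0, 0, 1, 2, 3, 4, 0, 1, 2, 0, 0, 0],
    "C": [-1, -1, -1, 0, 0, 1, 2, 0, 0, 0, 0, 0],  # hypothetical user
}
table = inactivity_pivot(profiles)
print(table[0])  # row 0: number of active users in each month
```

Rows are inactivity levels and columns are months, matching the layout described above.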

Using this data, we can calculate, for each month within the "analysis window", the percentage of users at each inactivity level who were carried over from the previous level in the previous month. As shown in the green table below:

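In the above table, the highlighted value (74%) means that, of the users who were inactive for one month in January, 74% were still inactive in February, that is, inactive for two consecutive months.

The percentage is calculated as follows:

percentage (X consecutive inactive months, month M) = number of users inactive for X consecutive months in month M / number of users inactive for X-1 consecutive months in month M-1

For the highlighted cell: 425 / 574 ≈ 74%.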
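As a small sketch of this calculation, assuming the counts are stored as `table[level][month]` with month 0 standing for January (the function name `continuation_rate` and this layout are illustrative choices). Only the two counts quoted in the text are real; all other cells are zero placeholders.

```python
def continuation_rate(table, level, month):
    """Share of users at `level - 1` in `month - 1` who are at `level` in `month`.

    Returns None when there is no previous level or month inside the
    window, or when the base count is zero.
    """
    if level < 1 or month < 1:
        return None
    base = table[level - 1][month - 1]
    if base == 0:
        return None
    return table[level][month] / base

# Placeholder table: only the two counts quoted in the text are filled in.
table = [[0] * 12 for _ in range(13)]
table[1][0] = 574  # inactive for 1 month in January
table[2][1] = 425  # inactive for 2 consecutive months in February

rate = continuation_rate(table, level=2, month=1)
print(f"{rate:.0%}")  # 74%
```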

To get the most representative value for each inactivity level, I took the average of the percentages over the last 4 months of the analysis window (September, October, November and December). For some levels fewer than four values are available (e.g. only October, November and December); in this case, we take the average of all available values (the range of values used to calculate the average is outlined in red):
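The averaging rule can be sketched as below. This reads the rule as "average whatever values exist within the last four months", with missing cells represented as `None`; the function name `representative_rate` and all sample percentages are hypothetical, since the original table is not reproduced here.

```python
def representative_rate(monthly_rates, window=4):
    """Average the available (non-None) rates among the last `window` months."""
    recent = [r for r in monthly_rates[-window:] if r is not None]
    if not recent:
        return None
    return sum(recent) / len(recent)

# Hypothetical monthly percentages for two inactivity levels (January..December).
level_2 = [None, 0.74, 0.78, 0.80, 0.79, 0.82, 0.80, 0.83, 0.80, 0.82, 0.81, 0.81]
level_10 = [None] * 9 + [0.97, 0.98, 0.96]  # only three values exist

print(round(representative_rate(level_2), 4))
print(round(representative_rate(level_10), 4))
```

For the second list only October to December are available, so the average is taken over three values instead of four.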

4. Calculate the probability of user churn

Haha, if you are still reading this article, then congratulations! We are about to get into the most exciting part. In this section, we will use a little knowledge of statistics.

Let’s revisit the ultimate goal of this article — calculating the probability of user churn for each consecutive number of inactive months (0-12).

That is, if a user has been inactive for X consecutive months, what is the likelihood that this user will churn next?

Mathematically speaking, we can use Bayes' formula to calculate the churn probability. Although Bayes' formula is a mathematical formula, its principle can be understood without numbers. If you see someone always doing good things, that person is most likely a good person. That is to say, when you cannot directly observe the nature of a thing, you can judge how probable a given nature is from the number of observed events that are consistent with it.

To express it in mathematical language, the more events that support a certain property occur, the greater the possibility that the property is true.

Its mathematical form is as follows:

P(A|B) = P(B|A) * P(A) / P(B)

Here, both A and B represent events, and P(B)≠0. P(A) and P(B) are the prior probabilities (or marginal probabilities) of A and B respectively. P(A) is called "prior" because it does not take any information about B into account, and likewise P(B) takes none about A. P(A|B) is the conditional probability of A given that B has occurred, and is also called the posterior probability of A because it is derived from the value of B. P(B|A) is the conditional probability of B given that A has occurred, and is also called the posterior probability of B because it is derived from the value of A.

In this case, the corresponding formula is as follows:

P(churn | inactive for X consecutive months) = P(inactive for X consecutive months | churn) * P(churn) / P(inactive for X consecutive months)

However, one term in the above formula is meaningless: P(inactive for X consecutive months | churn), which means "the probability of being inactive for X consecutive months given that the user has already churned." Just think about it: a user who has already churned can no longer be counted as "inactive", so this probability has no business significance.

In view of this, the author decisively dropped this term (remember this!). From this, we get the final version of the churn probability formula:

P(churn | inactive for X consecutive months) = P(churn) / P(inactive for X consecutive months)

Next, let’s look at the two terms on the right side of the formula (the numerator and denominator), and then calculate their values for each inactive month to get the user churn probability value we want (note that it is a conditional probability value, that is, the user churn probability under the condition of continuous inactivity for X months).

Let's start with the denominator. P(inactive for X consecutive months) is the value calculated earlier, the average percentage of users over the last 4 months of the analysis window:

P(1) = 19%
P(2) = 81%
P(3) = 89%
P(4) = 92%
P(5) = 93%
P(6) = 95%
P(7) = 96%
P(8) = 97%
…

Next, let's work out the numerator P(churn) through an example.

First of all, what is the churn probability P(C1) of a user who has been inactive for 1 month? A user who goes on to churn passes through every inactivity level on the way: inactive for 1 month, 2 months, 3 months, and so on. Therefore, we define the churn probability P(C1) of a user who has been inactive for 1 month as follows, where each number k stands for the event "inactive for k consecutive months", as in P(k) above:

P(C1) = P(1 ∩ 2 ∩ 3 ∩ ... ∩ n)

Now, in the same way, what is the P(churn) of users who have been inactive for 2 months, that is, P(C2)? A user who goes on to churn will remain inactive for 2 months, 3 months, 4 months, and so on, up to n months (at most 12 in this window). Therefore, we define the churn probability P(C2) of a user who has been inactive for 2 consecutive months as follows:

P(C2) = P(2 ∩ 3 ∩ 4 ∩ ... ∩ n)

By the same reasoning, we calculate the churn probability for every number of consecutive inactive months X:

P(CX) = P(X ∩ (X+1) ∩ ... ∩ n)

Here, n is the limit on the number of consecutive inactive months, chosen as the point where this probability becomes stable. From the values above, we can see that this happens at the 7th consecutive month, where the probability stays at 95~96%.

For simplicity, we assume that inactivity in each of the consecutive months is an independent event, in which case:

P(A ∩ B) = P(A) * P(B)

Therefore, we can use the following formula:

P(CX) = P(X) * P(X+1) * ... * P(n)

Now that we have calculated the numerator and denominator for each inactive month probability, we can start the last step - calculating the user churn probability for each number of consecutive inactive months. As we have discussed previously, the value of n is 7.

The final calculation results are shown in the following table:

Note that the churn rate for active users (i.e., those inactive for 0 consecutive months in the first row) is calculated as P(1) * P(2) * P(3) * P(4) * ... * P(7). We are not dividing by anything here because P(0 consecutive months of inactivity) is 1 when the user is active.
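The whole calculation fits in a short function. The sketch below is one reading of the steps described in this section, not the author's own implementation: the numerator is the product P(X) * P(X+1) * ... * P(n) under the independence assumption, the denominator is P(X), and active users (X = 0) use the product P(1) * ... * P(n) with no division. The function name `churn_probabilities` is illustrative; the P values are the ones listed earlier, with n = 7.

```python
from math import prod

def churn_probabilities(p, n):
    """Return {X: P(churn | inactive for X consecutive months)} for X = 0..n.

    `p[X]` is P(inactive for X consecutive months) for X = 1..n.
    """
    result = {}
    for x in range(n + 1):
        if x == 0:
            # Active users: P(1) * ... * P(n), with nothing to divide by.
            result[x] = prod(p[k] for k in range(1, n + 1))
        else:
            # P(CX) / P(X), where P(CX) = P(X) * P(X+1) * ... * P(n).
            result[x] = prod(p[k] for k in range(x, n + 1)) / p[x]
    return result

# Average percentages for the last 4 months, as listed in the text (n = 7).
p = {1: 0.19, 2: 0.81, 3: 0.89, 4: 0.92, 5: 0.93, 6: 0.95, 7: 0.96}
churn = churn_probabilities(p, n=7)
for x, value in churn.items():
    print(f"{x} consecutive inactive months: {value:.1%}")
```

Under this reading, dividing by P(X) cancels the first factor of the numerator, so for X ≥ 1 the result is simply P(X+1) * ... * P(n), and it reaches 1 at X = n.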

Finally, we can also use a churn rate curve to intuitively reflect the changes in the churn rate, thereby determining the best time to retain inactive users. The curve is shown in the figure below:

5. Conclusion

In this article, the author does not provide implementation details for running this model in batch. If you understand the logic behind the model, you can implement it in SQL, Python, or even Excel.

Furthermore, in practice, this model is best run separately for different user groups. In this article, the author runs it on only one type of user. Dividing users into groups according to different criteria will be more meaningful for actual business. For example, you can segment users based on their value and then make churn predictions for each user subgroup.

Finally, the author only conducts user churn analysis on a monthly scale. For many business scenarios, a finer-grained analysis may be more meaningful, such as by week or by day.

The author of this article is @Scottish Fold Ear Cat. It is compiled and published by (Qinggua Media). Please indicate the author information and source when reprinting!
