User growth analysis: How to segment users?

Introduction: In the growth analysis of a product, we often want to focus on a group of users who meet certain conditions. We want to know not only the overall behavior of these users (number of visits, visit duration, etc.), but also which sub-groups among them differ the most. User segmentation helps us analyze these significantly different groups in depth, so that we can explore the reasons behind the indicator numbers and find ways to achieve user growth.

1. Application Scenarios of User Segmentation

In our daily data work, we often receive requests like this: we want to focus on a group of users who meet certain conditions. We want to know not only the overall behavior of these users (number of visits, visit duration, etc.), but also exactly who meets the conditions, so that we can check their data, export the user list, and send targeted messages. Sometimes we also want to examine the specific operations of certain users when they use a certain function. User segmentation is the tool and method for meeting such needs. It helps us analyze groups with large differences in depth, so that we can explore the reasons behind the indicator numbers and find ways to achieve user growth.

Take user portrait segmentation as an example. Its core value lies in the refined positioning of population characteristics and the discovery of potential user groups. It lets websites, advertisers, enterprises and advertising companies fully understand the differentiated characteristics of user groups, helps customers find marketing opportunities and operational directions based on those characteristics, and improves customers' core influence.

2. User Grouping

Figure 1: Five types of user segments

Type 1: No grouping, such as targeting all active users or sending group text messages. The disadvantage is that it is not targeted and can easily annoy users.

Type 2: Grouping based on basic user information, such as registration information. Compared with not segmenting users at all, this method is somewhat targeted, but because it does not really understand the users, it rarely produces the expected results.

Type 3: User portrait grouping, based on age, gender, region, user preferences, etc. The focus of portrait construction is to "label" the user group. A label is usually a highly refined feature identifier defined by humans. The labels of a user group are then combined to outline a three-dimensional "portrait" of the group. Portrait segmentation lets us truly understand certain characteristics of users, which is very helpful for business promotion.

Type 4: Grouping based on user behavior. At this stage, we build on portrait grouping and focus on user behavior characteristics. For example, we formulate different marketing and promotion strategies based on the user's registration channel and activity habits.

Type 5: Clustering and predictive modeling. Clustering divides users into different groups based on their combined characteristic indicators, such as entertainment, idle, social, and office users. Predictive modeling tries to guess the user's next attitude and behavior (for example, what they want to know and what they want to do). Because of this, it is very helpful in turning complex behavioral processes into marketing automation.

3. Common user segmentation dimensions

1. Demographic indicators: age, gender, region
2. Payment status: Free, Trial, Paid user
3. Purchase history: non-paying users, one-time paying users, multiple-paying users
4. Access location: The region where the user uses the product
5. Frequency of use: How often users use the product
6. Depth of use: Light, medium, heavy user
7. Ad clicks: users clicked on the ad vs. did not click on the ad

4. Introduction to commonly used clustering methods

The sections above introduced some methods and ideas for segmenting users. Next, we focus on user clustering. Clustering can be divided into hierarchical clustering (merging method, decomposition method, tree diagram) and non-hierarchical clustering (partition clustering, spectral clustering, etc.). The methods most commonly used for clustering Internet users are K-means clustering (a partition method) and two-step clustering.

Characteristics of Cluster Analysis:

Simple and intuitive;
It is mainly used in exploratory research. The analysis can provide multiple possible solutions, and choosing the final one requires the researcher's subjective judgment and follow-up analysis;
Cluster analysis will return a set of categories whether or not distinct categories actually exist in the data;
The solution depends entirely on the clustering variables selected by the researcher. Adding or deleting variables may have a substantial impact on the final solution;
When using cluster analysis, researchers should pay special attention to the various factors that may affect the results;
Outliers and special variables have a large impact on clustering;
When the clustering variables are measured on inconsistent scales, they need to be standardized in advance.
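The standardization mentioned in the last point can be illustrated with a z-score transformation. This is a minimal sketch using only the Python standard library; the variable names and values (`visits_per_week`, `session_seconds`) are hypothetical and are not taken from the original analysis.

```python
from statistics import mean, pstdev

def standardize(columns):
    """Z-score each column: (value - mean) / population standard deviation."""
    scaled = {}
    for name, values in columns.items():
        mu = mean(values)
        sigma = pstdev(values)
        # A constant column carries no information, so map it to zeros.
        scaled[name] = [(v - mu) / sigma if sigma > 0 else 0.0 for v in values]
    return scaled

# Hypothetical variables measured on very different scales.
users = {
    "visits_per_week": [1, 3, 5, 7],
    "session_seconds": [100, 300, 500, 700],
}
scaled = standardize(users)
```

After the transformation both variables have mean 0 and standard deviation 1, so neither dominates a distance calculation simply because of its unit.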

Weaknesses of cluster analysis:

Clustering is an unsupervised classification method and cannot automatically discover how many classes the data should be divided into;
It is unrealistic to expect it to find clearly separated classes or market segments of roughly equal size;
When clustering samples, the relationships between variables must be determined by the researcher;
It does not automatically give an optimal clustering result.

Application process of cluster analysis:

(1) Select clustering variables

When selecting features, we will try our best to select variables that have an impact on product usage behavior based on certain assumptions. These variables generally include user attitudes, opinions, and behaviors that are closely related to the product. However, the cluster analysis process has certain requirements for the variables used for clustering: 1. The values of these variables in different research objects have obvious differences; 2. There cannot be a high correlation between these variables.

First, more clustering variables are not necessarily better. Variables without obvious differences contribute nothing to clustering and may bias the results. Second, including highly correlated variables is equivalent to weighting them, which amplifies the effect of certain factors on user classification. Methods for identifying appropriate clustering variables: 1. Perform cluster analysis on the variables themselves and select one representative variable from each resulting category; 2. Perform principal component analysis or factor analysis to generate new variables to use as clustering variables.

(2) Cluster analysis

Compared with the preparation before clustering, running the analysis itself is simple: once the data is ready, import it into a statistical tool and run it. The main question at this stage is how many categories users should be divided into. Usually several criteria are combined: 1. Look at the inflection point (hierarchical clustering produces an aggregation coefficient graph, and the number of categories is generally chosen near the inflection point); 2. Judge based on experience or product characteristics (user differences vary between products); 3. Make sure the result can be explained clearly and logically.

Figure 2: Aggregation coefficient graph
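The "inflection point" criterion can be approximated numerically. The sketch below assumes we already have a within-cluster sum of squares for each candidate number of clusters and picks the point where the curve bends most sharply (the largest drop in improvement). The function name `elbow_k` and the numbers are hypothetical, and this heuristic is only one of several ways to locate an inflection point; it does not replace the judgment based on experience and interpretability described above.

```python
def elbow_k(wcss_by_k):
    """Return the k where the improvement in fit slows down the most."""
    ks = sorted(wcss_by_k)
    best_k, best_bend = None, float("-inf")
    # Slide over consecutive triples (k-1, k, k+1) of candidate cluster counts.
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        drop_before = wcss_by_k[prev_k] - wcss_by_k[k]
        drop_after = wcss_by_k[k] - wcss_by_k[next_k]
        bend = drop_before - drop_after
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k

# Hypothetical within-cluster sums of squares for k = 1..6.
wcss = {1: 1000, 2: 600, 3: 300, 4: 270, 5: 250, 6: 240}
suggested_k = elbow_k(wcss)
```

With these illustrative numbers the curve flattens after three clusters, so `suggested_k` is 3.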

(3) Find out the important characteristics of each type of user

After determining a classification scheme, we need to go back and observe the performance of each category of users on each variable. Based on the results of the difference test, we use colors to distinguish the levels of different types of users on this indicator. The same goes for other variables. Finally, we will discover important characteristics that distinguish different categories of users from other categories of users.

(4) Cluster interpretation and naming

When understanding and interpreting user segments, it is best to incorporate more data, such as demographic data, feature preference data, and so on. Then, select the most obvious features of each category and name it, and you're done.

5. Application Case of K-means Clustering in User Segmentation

In this case, we first look at K-Means clustering (also called fast clustering), the most commonly used non-hierarchical clustering method. Because its calculation is simple and intuitive and it is relatively fast (compared with hierarchical clustering), K-Means is often the first algorithm tried in exploratory analysis. Because it is so widely adopted, it also saves a lot of explanation time when communicating with collaborators.

1. K-means algorithm principle:

1. Randomly select k elements as the centers of k clusters.
2. Calculate the similarity between each remaining element and the k cluster centers, and assign each element to the cluster with the highest similarity.
3. Based on the clustering result, recalculate the centers of the k clusters by taking the arithmetic mean of each dimension over all elements in the cluster.
4. Recluster all elements according to the new centers.
5. Repeat steps 3 and 4 until the clustering result no longer changes, then output the result.

Assume that the original data set we extracted is (X1, X2, …, Xn), where each Xi is a d-dimensional vector. Given a number of groups k (k ≤ n), the purpose of K-means clustering is to divide the original data into k categories, S = {S1, S2, …, Sk}. Mathematically, this means minimizing the within-cluster sum of squared distances, where μi is the mean of category Si:

$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$$
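The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the original analysis: similarity is taken to be squared Euclidean distance, points are tuples of numbers, and the names `k_means`, `max_iter` and `seed` are chosen here for the example.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Pick k distinct elements as the initial centers.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign every element to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Stop when the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute each center as the per-dimension mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centers

# Two well-separated hypothetical groups of users in a 2-D feature space.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = k_means(points, k=2)
```

On this toy data the first three points end up in one cluster and the last three in the other. On real data the result depends on the initial centers, so it is common to run the algorithm several times with different seeds and keep the run with the smallest within-cluster sum of squares.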

2. User grouping background and goals:

A certain product covers various social groups (different ages, different industries, different interests, etc.), so it is necessary to segment the general user market and then carry out targeted operational activities.

3. Clustering variable selection:

User portrait features, user status features, user activity features

4. Cluster analysis and results:

Through correlation analysis and variable importance analysis, some poorly performing variables were eliminated. The model was then trained multiple times on the remaining 11 variables, varying the target number of clusters, the participating variables, and the tolerance for individual differences within a group, until the final clustering result was obtained.

Figure 3: K-means clustering effect of user grouping

5. Interpretation and naming of results:

Cluster 1: Low-end, young group
Cluster 2: Active student group
Cluster 3: High-stickiness workplace group
Cluster 4: Low-stickiness workplace group
Cluster 5: Older, low-activity group

Table 2: K-means clustering results for user grouping

6. Comparison of the effects of two-step clustering and k-means clustering

The K-Means clustering method described above is simple, intuitive, and fast. Its disadvantages are that it can only use numerical variables, not categorical ones, and that it is very sensitive to outliers, which can seriously distort the clustering result. Moreover, when the data set is so large that all data points cannot be loaded into memory (which is common at Tencent), K-Means cannot be run on a single machine.

The two-step clustering method overcomes these shortcomings. It can include both categorical and numerical variables, and it runs smoothly when hardware is limited or the data set is very large. Two-step clustering can be seen as a combination of an improved BIRCH clustering algorithm and hierarchical clustering. First, the "clustering feature tree" from the BIRCH algorithm is used for pre-clustering to form subclasses, and then the subclasses are used as input for hierarchical clustering.

1. The principle of two-step clustering:

Step 1: Pre-clustering process:

Construct a cluster feature tree (CFT) and divide it into many subclasses.

At the beginning, a certain observation is placed at the root node of the tree, which records the variable information of the observation. Then, based on the specified distance measure as the similarity basis, each subsequent observation is placed in the most similar node according to its similarity to the existing nodes. If a similar node is not found, a new node is formed for it. In this step, outliers will be identified and removed, and will not affect the results as easily as in K-Means.
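The pre-clustering idea can be illustrated with a heavily simplified sketch. It is not the BIRCH algorithm or the two-step procedure itself: it keeps a flat list of sub-clusters instead of a tree, uses Euclidean distance to the sub-cluster centroid as the similarity measure, and does no outlier handling. The names `ClusterFeature`, `pre_cluster` and `threshold` are assumptions made for this example.

```python
import math

class ClusterFeature:
    """Compact summary of a sub-cluster: point count and per-dimension sum."""

    def __init__(self, point):
        self.n = 1
        self.linear_sum = list(point)

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def add(self, point):
        # Absorb a point by updating the summary; the point itself is not stored.
        self.n += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]

def pre_cluster(points, threshold):
    """Single pass: put each point in the nearest sub-cluster or open a new one."""
    features = []
    for p in points:
        nearest = min(
            features,
            key=lambda cf: math.dist(cf.centroid(), p),
            default=None,
        )
        if nearest is not None and math.dist(nearest.centroid(), p) <= threshold:
            nearest.add(p)
        else:
            features.append(ClusterFeature(p))
    return features

# Hypothetical observations arriving one at a time.
points = [(0, 0), (0.5, 0), (10, 10), (10.5, 10), (0, 0.5)]
sub_clusters = pre_cluster(points, threshold=2.0)
```

Because each sub-cluster is stored only as a count and a sum, the data can be processed in one pass without holding every observation in memory, which is the property that makes this kind of pre-clustering suitable for large data sets.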

Step 2: Formal clustering:

The pre-clustering completed in the first step is taken as input and re-clustered using the hierarchical clustering method (using the log-likelihood function as the distance measure). At each stage, the Schwarz Bayesian Information Criterion (BIC) is used to evaluate whether the existing classification is suitable for the existing data.

Finally, a classification scheme that meets the criteria is given.

2. Advantages of two-step clustering:

1. Can process massive data;
2. Standardizes data automatically;
3. Can handle mixed data with both categorical and continuous variables;
4. Outliers can be automatically discarded or assigned to the nearest class;
5. The number of categories can be determined automatically or specified manually according to business needs.

3. Comparison of the effects of two-step clustering:

Running two-step clustering on the same data used in the K-means case in section 5 gives the following optimal model:

Figure 4: Two-step clustering effect of user grouping

4. Interpretation of two-step clustering results:

Cluster 1: Low-end, young group
Cluster 2: Highly active students or new entrants to the workplace
Cluster 3: Young people with low activity
Cluster 4: Young people who stay logged in but idle
Cluster 5: Workplace office group
Cluster 6: Older people with low activity

Table 3: Two-step clustering results of user groups

7. Business Case - Mining Customer Groups with Special Behavior Patterns through K-Means Clustering

1. Business requirements

In this case, the product manager wants to understand the behavior patterns of users who log in but are inactive, and to segment this large user group by combinations of behaviors. The aim is to focus on the different needs of different groups, and even to explore needs in vertical fields, so that measures can be taken on the product or operations side to activate silent users and increase DAU.

2. Analyze the goal

Discover user segments whose usage patterns differ from those of typical users in the overall user base
Roughly estimate the number of users in each segment
Understand the behavioral characteristics and user profiles of each segment
Based on the above results, put forward product or operations suggestions for driving user activity, or clarify the direction for further exploration.

3. Analysis process

a) Feature extraction

The analysis focuses on users' click behavior. To capture typical behavior, four full weeks of data (28 days) were selected, with no holidays in the time window. In addition, because exploratory analysis requires repeated iterations and computing performance is limited, only one in a thousand users was randomly sampled from the overall user base as a representative sample.

b) Feature screening

In the feature extraction stage, click data of nearly 200 function points were extracted. However, some of these features have very low coverage, with only one percent of users having used them within 28 days. These low-coverage features will be removed first.
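The coverage screening described above can be sketched as follows. The data layout (a dictionary mapping each function point to a list of per-user click counts), the function names, and the exact cut-off `min_coverage=0.01` are assumptions made for illustration; the source only says that features used by about one percent of users were treated as low coverage.

```python
def coverage(counts):
    """Share of users who clicked the function at least once."""
    return sum(1 for c in counts if c > 0) / len(counts)

def drop_low_coverage(clicks_by_feature, min_coverage=0.01):
    """Keep only the features whose coverage is above the threshold."""
    return {
        name: counts
        for name, counts in clicks_by_feature.items()
        if coverage(counts) > min_coverage
    }

# Hypothetical click counts for 200 sampled users over 28 days.
clicks = {
    "chat": [3] * 100 + [0] * 100,   # used by 50% of users
    "rare_tool": [1] + [0] * 199,    # used by 0.5% of users
}
kept = drop_low_coverage(clicks)
```

Here `kept` contains only `chat`, because `rare_tool` falls below the assumed one percent threshold.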

In addition, as mentioned earlier, highly correlated variables will also interfere with the clustering process. Here, the Pearson correlation coefficient is calculated for all features pairwise. For highly correlated features (correlation coefficient greater than 0.5), only the features with the widest coverage are retained to maximize the reflection of user differences.
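One possible way to implement this correlation screening is a greedy pass over the features from widest to narrowest coverage. This is a sketch under assumptions, not the original procedure: the greedy ordering, the function names, and the toy data are all hypothetical, and only the 0.5 threshold and the "keep the widest coverage" rule come from the text.

```python
import math

def coverage(counts):
    """Share of users who clicked the function at least once."""
    return sum(1 for c in counts if c > 0) / len(counts)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature has no defined correlation
    return cov / math.sqrt(var_x * var_y)

def drop_correlated(clicks_by_feature, threshold=0.5):
    """Visit features by descending coverage; skip any that correlates
    above the threshold with a feature already kept."""
    ordered = sorted(
        clicks_by_feature,
        key=lambda name: coverage(clicks_by_feature[name]),
        reverse=True,
    )
    kept = []
    for name in ordered:
        if all(
            pearson(clicks_by_feature[name], clicks_by_feature[k]) <= threshold
            for k in kept
        ):
            kept.append(name)
    return kept

# Hypothetical click counts for six users.
clicks = {
    "chat": [5, 3, 8, 1, 0, 2],
    "emoji": [4, 3, 7, 1, 0, 0],   # moves together with chat
    "wallet": [0, 1, 0, 3, 0, 2],
}
kept = drop_correlated(clicks)
```

In this toy data `emoji` is strongly correlated with `chat` and has narrower coverage, so it is dropped, while `wallet` is kept.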

c) Feature transformation-exploration

After the above two steps, the author ran many clustering explorations, but without exception the results showed one super-large category plus dozens of very small categories (a few to a few dozen users each). Such a result clearly defeats our analysis goal. First, the small groups found this way are too small to have any business value; second, the super-large category is basically equivalent to the overall user base, so no differences among users can be found.

Why is there such a result? Mainly because the click behavior basically follows the power-law distribution. A large number of users are concentrated in the low-frequency range, while a very small number of users have extremely high frequencies. In this way, in a typical clustering algorithm, high-frequency users will be clustered into small categories with very few people, while a large number of low-frequency users will be clustered into a super large category.

Figure 5: Click behavior distribution

Figure 6: K-Means clustering diagram of click behavior count

For this situation, the typical solution is to take the logarithm of the frequency, which transforms the power-law distribution into an approximately normal distribution, and then perform clustering. In this study, taking the natural logarithm improved the clustering only slightly: the result was still one super-large category plus several small categories with very few users. The reason lies in one characteristic of click behavior data: core functions and popular items have a large number of clicks, while relatively unpopular functions have a large number of 0 values. In this case, taking the logarithm does not help.

Figure 7: Opening times distribution

Figure 8: Distribution of opening times (natural logarithmic transformation)
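Why the logarithm does not help with zero-heavy features can be seen in a small example. The sketch assumes the transform log(1 + x) so that zero counts stay defined; the source only says "natural logarithm", so this choice and the toy numbers are assumptions.

```python
import math

def log_transform(counts):
    """Natural log of (1 + count), so that 0 clicks maps to 0."""
    return [math.log1p(c) for c in counts]

def zero_share(values):
    """Fraction of users with a value of exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)

# Hypothetical clicks on an unpopular function: mostly zeros, one heavy user.
raw = [0, 0, 0, 0, 0, 0, 1, 2, 3, 1000]
logged = log_transform(raw)
```

The transform compresses the extreme value (1000 becomes roughly 6.9), but the share of users at zero is unchanged, so most users still sit on a single point and remain indistinguishable to the clustering algorithm.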

Back to the goal of this analysis: we need to "discover user segments whose usage patterns differ from those of typical users in the overall user base." If we discard the unpopular functions and only look at the popular ones, we will not be able to find relatively niche behavior patterns, and the analysis goal cannot be achieved. This sparse data reminded the author of text classification. In the bag-of-words model of text classification, the word vector of each "document" also contains a large number of zero values. The bag-of-words solution is to weight the word vector using the TF-IDF method, which is briefly introduced below.

d) Feature transformation - TF-IDF

In the bag-of-words model of text classification, it is necessary to group together "documents" (such as a news article, a microblog, or a comment) according to the topics they discuss, and a document contains many terms. TF (Term Frequency) refers to the ratio of the number of times a word appears in a document to the total number of words in the document. This simple calculation can tell you which words are more common in a document without being affected by the length of the document itself.

On the other hand, some words are "popular" words that appear in all articles. These words are of little help in distinguishing the topic of an article (such as "report" or "reporter" in news). The weights of such "popular" words need to be lowered, which can be achieved with a calculation based on (total number of documents / number of documents containing a certain word). After taking the logarithm of this ratio, a word that appears in every document gets a weight of 0, and the fewer documents contain the word, the larger the value. This calculation is IDF (Inverse Document Frequency).

From the above discussion, readers may already have guessed that if "document" is replaced by "user" and "number of word occurrences" is replaced by "number of function clicks", the same method can be used to classify types of user behavior. First, TF reflects the functional preferences of low-frequency users, so they are no longer lumped together into one low-frequency category next to high-frequency users just because they use the product less overall. At the same time, IDF gives niche functions greater weight, making it easier for niche preferences to stand out in clustering.
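The mapping from documents to users can be sketched as follows. This is a minimal illustration, not the original implementation: it uses TF = clicks on a function / total clicks by the user, and IDF = ln(number of users / number of users who clicked the function), which is one common TF-IDF variant. The user and function names are hypothetical.

```python
import math

def tf_idf(clicks_by_user):
    """Weight each user's function clicks by TF-IDF, treating users as documents."""
    n_users = len(clicks_by_user)
    # Count how many users clicked each function at least once.
    users_with = {}
    for clicks in clicks_by_user.values():
        for function, count in clicks.items():
            if count > 0:
                users_with[function] = users_with.get(function, 0) + 1
    weights = {}
    for user, clicks in clicks_by_user.items():
        total = sum(clicks.values())
        weights[user] = {
            function: (count / total) * math.log(n_users / users_with[function])
            for function, count in clicks.items()
            if count > 0
        }
    return weights

# Hypothetical click counts per user and function.
clicks = {
    "heavy": {"chat": 900, "wallet": 100},
    "light": {"chat": 9, "wallet": 1},
    "niche": {"chat": 5, "games": 5},
    "casual": {"chat": 3},
}
weights = tf_idf(clicks)
```

In this toy data `chat` is used by everyone, so its weight is 0 for all users. The `heavy` and `light` users get the same weight for `wallet` even though their click volumes differ by a factor of 100, and the `niche` user's `games` preference receives the largest weight. The resulting weight vectors can then be passed to a clustering algorithm in place of the raw counts.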

e) Clustering results

After this feature transformation, clustering with the K-Means algorithm produced results much more in line with the analysis objectives. From the overall user data, we found various groups with distinct behavioral characteristics, and roughly estimated the size, behavioral characteristics and background characteristics of each group. On this basis, we combined user research data to explore suggestions for product improvement.

8. Summary

The biggest change that user segmentation brings to user data research is that it breaks down data silos and lets us truly understand users. By analyzing the characteristics of the users behind an indicator number (their demographic attributes, behavioral characteristics, etc.), we can discover the reasons behind product problems and find opportunities or directions for effective product improvement.

When performing cluster analysis, the selection and preparation of features are very important: 1. Appropriate variables should differ significantly across samples; 2. There should not be strong correlation between variables, otherwise methods such as PCA should be used to reduce the dimension first; 3. The data needs to be transformed according to its own characteristics and the business characteristics (for example standardization or taking the logarithm).

Choosing a clustering algorithm requires weighing data characteristics (whether there are categorical variables, outliers, the data volume, whether the data forms clumps), computing speed (exploratory analysis often needs faster computation), and accuracy (whether the groups can be accurately identified). For algorithm parameters, such as the number of categories K in K-Means, technical indicators and business background should be combined to select a logically reasonable classification scheme.

There are many clustering algorithms, each with its own characteristics and strengths. This article only takes two of the more commonly used methods as examples to stimulate discussion and hopefully inspire readers.

Author: Tencent QQ Big Data, authorized to publish by Qinggua Media.

Source: Tencent QQ Big Data
