Introduction: In product growth analysis, we often want to focus on a group of users who meet certain conditions. We want to know not only the overall behavior of these users (number of visits, visit duration, etc.) but also which sub-groups among them differ most from one another. User segmentation lets us analyze the groups with significant differences in depth, so that we can find the reasons behind the indicator numbers and explore ways to achieve user growth.

1. Application Scenarios of User Segmentation

In our daily data work, we often receive requests like this: we want to focus on a group of users who meet certain conditions. We want to know not only the overall behavior of these users (number of visits, visit duration, etc.) but also exactly who meets the conditions, so that we can check their data, export the user list, and send targeted tips messages. Sometimes we also want to look at the specific operations certain users perform when using a certain function.

User segmentation is the tool and method for meeting such needs. It helps us analyze groups with large differences in depth, find the reasons behind the indicator numbers, and explore ways to achieve user growth.

Take user portrait segmentation as an example. Its core value lies in pinpointing the characteristics of a population and discovering potential user groups. It enables websites, advertisers, enterprises, and advertising companies to fully understand the differentiated characteristics of user groups, helps customers find marketing opportunities and operational directions based on those characteristics, and comprehensively improves customers' core influence.

2. User Grouping

Figure 1: Five types of user segments

Type 1: No grouping, such as targeting all active users with a mass text message. The disadvantage is that it is untargeted and can easily annoy users.
Type 2: Grouping based on basic user information, such as registration information. Compared with no segmentation, this method is somewhat targeted, but because it does not really understand the users, it rarely produces the expected results.

Type 3: User portrait grouping, such as by age, gender, region, and user preferences. The focus of portrait construction is to "label" the user group. A label is usually a highly refined feature identifier defined by people. The labels of a user group are then combined to outline a three-dimensional "portrait" of the group. Portrait segmentation lets us truly understand certain characteristics of users, which is very helpful for business promotion.

Type 4: Grouping based on user behavior. At this stage, we build on portrait grouping and focus on behavioral characteristics. For example, we formulate different marketing and promotion strategies based on the user's registration channel and activity habits.

Type 5: Clustering and predictive modeling. Clustering divides users into groups based on their combined feature indicators, such as entertainment, idle, social, office, etc. Predictive modeling tries to guess the user's next attitude and behavior (for example, what they want to know and what they want to do). Because of this, it is very helpful for turning complex behavioral processes into marketing automation.

3. Common user segmentation dimensions

1. Statistical indicators: age, gender, region

4. Introduction to commonly used clustering methods

The sections above introduced some methods and ideas for segmentation. Next, we focus on user clustering. Clustering can be divided into hierarchical clustering (agglomerative methods, divisive methods, dendrograms) and non-hierarchical clustering (partition clustering, spectral clustering, etc.).
The clustering methods most commonly used for Internet users are K-means clustering and two-step clustering (both are partition clustering).

Characteristics of cluster analysis:
Weaknesses of cluster analysis:
Application process of cluster analysis:

(1) Select clustering variables

When selecting features, we try to choose, based on certain assumptions, variables that affect how the product is used. These variables generally include user attitudes, opinions, and behaviors that are closely related to the product. However, cluster analysis places two requirements on the variables used:
1. The values of each variable must differ clearly across the objects being studied;
2. The variables must not be highly correlated with one another.

First, more clustering variables are not necessarily better. Variables without clear differences contribute nothing meaningful to clustering and may bias the results. Second, including highly correlated variables is equivalent to weighting them, which amplifies the effect of certain factors on the user classification.

Methods for identifying appropriate clustering variables:
1. Run a cluster analysis on the variables themselves and select one representative variable from each resulting category;
2. Run principal component analysis or factor analysis to generate new variables to use as clustering variables.

(2) Cluster analysis

Compared with the preparation work, running the clustering itself is extremely simple. Once the data is ready, import it into a statistical tool, run it, and the results come out. One question that arises here is how many categories the users should be divided into. Usually, the decision combines several criteria:
1. Look at the inflection point (hierarchical clustering produces an agglomeration coefficient graph, and we generally choose a number of categories near the inflection point);
2. Judge based on experience or product characteristics (user differences vary from product to product);
3. The result must be logically explainable.
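The "inflection point" criterion above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `pick_elbow` and the coefficient values are hypothetical, and the bend is located with a simple second difference, which is one of several possible heuristics rather than the method any particular statistical tool uses.

```python
def pick_elbow(coefficients):
    """Return the cluster count at the sharpest bend of a decreasing curve.

    coefficients[i] is the agglomeration coefficient (or within-cluster
    sum of squares) obtained with i + 1 clusters.
    """
    if len(coefficients) < 3:
        raise ValueError("need at least three points to locate a bend")
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(coefficients) - 1):
        drop_before = coefficients[i - 1] - coefficients[i]
        drop_after = coefficients[i] - coefficients[i + 1]
        # Second difference: large when the curve falls steeply before
        # this point and flattens out after it.
        bend = drop_before - drop_after
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


# Hypothetical coefficients for 1..7 clusters (not data from this article).
coeffs = [1000.0, 640.0, 400.0, 210.0, 180.0, 160.0, 150.0]
print(pick_elbow(coeffs))
```

The value returned is best treated as a starting point: as the criteria above say, the neighboring cluster counts should also be checked against experience and against whether the groups can be explained logically.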
Figure 2: Agglomeration coefficient graph

(3) Find the important characteristics of each type of user

After settling on a classification scheme, we go back and observe how each category of users performs on each variable. Based on the results of the difference test, we use colors to mark the level of each user category on a given indicator, and do the same for the other variables. In the end, we find the important characteristics that distinguish each category of users from the others.

(4) Cluster interpretation and naming

When understanding and interpreting user segments, it is best to bring in more data, such as demographic data and feature preference data. Then pick the most prominent features of each category, give it a name, and you are done.

5. Application Case of K-means Clustering in User Segmentation

In this case, we first look at K-Means clustering (also called fast clustering), the most commonly used non-hierarchical clustering method. Because its calculation is simple and intuitive and it is relatively fast (compared with hierarchical clustering), K-Means is often the first algorithm used in exploratory analysis. Because it is so widely adopted, it also saves a lot of explanation time when communicating with collaborators.

1. K-means algorithm principle:
Assume that the original data we extracted is the set (X1, X2, …, Xn), where each Xi is a d-dimensional vector. Given a number of groups k (k ≤ n), the purpose of K-means clustering is to divide the original data into k categories, S = {S1, S2, …, Sk}. Numerically, this means finding the partition that minimizes the following expression (μi is the mean of the points in category Si):

$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{X_j \in S_i} \lVert X_j - \mu_i \rVert^2$$

2. User grouping background and goals: A certain product covers a wide range of social groups (different ages, industries, interests, etc.), so the overall user base needs to be segmented before targeted operational activities can be carried out.

3. Clustering variable selection: User portrait features, user status features, user activity features

4. Cluster analysis and results: Through correlation analysis and variable importance analysis, some poorly performing variables were eliminated. The model was then trained multiple times on the remaining 11 variables, varying the target number of clusters, the participating variables, and the tolerance for individual differences within a group, until the final clustering results were obtained.

Figure 3: K-means clustering effect of user grouping

5. Interpretation and naming of results:
Cluster 1: Low-end, young group
Cluster 2: Active student group
Cluster 3: High-stickiness workplace group
Cluster 4: Low-stickiness workplace group
Cluster 5: Older, low-activity group

Table 2: K-means clustering results for user grouping

6. Comparison of the effects of two-step clustering and K-means clustering

The K-Means clustering method described above is simple, intuitive, and fast. Its disadvantages are that it can only use numerical variables and cannot include categorical variables, and that it is very sensitive to outliers, which can easily and seriously distort the clustering results.
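To make the K-means objective from the principle section concrete, here is a minimal pure-Python sketch of the standard iteration (assign each point to its nearest center, then move each center to the mean of its points). It is an illustration only, not the tool used in the case above: the function names, the random initialization from data points, and the toy data are all assumptions.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Return (labels, centers, objective) for a list of equal-length tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: start from k data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean_point(members)
    # Value of the objective: sum of squared distances to assigned centers.
    objective = sum(
        squared_distance(p, centers[lab]) for p, lab in zip(points, labels)
    )
    return labels, centers, objective


# Hypothetical toy data: two well-separated groups of three points each.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers, objective = kmeans(data, k=2)
print(labels, objective)
```

Because each center is a plain mean, a single extreme point can drag a center far from the bulk of its cluster, which is the outlier sensitivity mentioned above. The distance computation also explains why only numerical variables can be used directly.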
K-Means also has a scale limitation: when the data set is so large that not all data points can be loaded into memory (which is common at Tencent), it cannot be run on a single machine. The two-step clustering method overcomes these shortcomings. It can handle both categorical and numerical variables, and it runs smoothly when hardware is limited or the data set is very large. Two-step clustering can be seen as a combination of an improved BIRCH clustering algorithm and hierarchical clustering: first, the "clustering feature tree" from BIRCH is used for pre-clustering to form sub-clusters, and then the sub-clusters are used as input for hierarchical clustering.

1. The principle of two-step clustering:

Step 1: Pre-clustering. Build a cluster feature tree (CFT) that divides the data into many sub-clusters. At the start, one observation is placed at the root node of the tree, which records that observation's variable information. Then, using the specified distance measure as the basis for similarity, each subsequent observation is placed in the existing node it is most similar to. If no similar node is found, a new node is created for it. Outliers are identified and removed in this step, so they do not affect the results as easily as in K-Means.

Step 2: Formal clustering. The sub-clusters produced in the first step are taken as input and re-clustered with hierarchical clustering (using the log-likelihood function as the distance measure). At each stage, the Schwarz Bayesian Information Criterion (BIC) is used to evaluate whether the current classification suits the data. Finally, a classification scheme that meets the criterion is produced.

2. Advantages of two-step clustering:
1. Massive data processing;

3. Comparison of the effects of two-step clustering: Running two-step clustering on the same data used in the K-means case above gives the following optimal model.

Figure 4: Two-step clustering effect of user grouping

4. Interpretation of two-step clustering results:
Cluster 1: Low-end, young group
Cluster 2: Highly active students or new entrants to the workplace
Cluster 3: Young people with low activity
Cluster 4: Young people who stay logged in but idle
Cluster 5: Workplace office group
Cluster 6: Older people with low activity

Table 3: Two-step clustering results of user groups

7. Business Case: Mining Customer Groups with Special Behavior Patterns through K-Means Clustering

1. Business requirements

In this case, the product manager wants to understand the behavior patterns of logged-in but inactive users and to segment the overall user base by different combinations of behavior. The aim is to focus on the distinct needs of different groups, and even to uncover needs in vertical fields, so that measures can be taken on the product or operations side to activate silent users and increase DAU.

2. Analysis goal
3. Analysis process

a) Feature extraction

The analysis focuses on users' click behavior. To capture typical behavior, four full weeks of data (28 days) were selected, with no holidays in the time window. In addition, because exploratory analysis requires repeated iterations and computing performance matters, only one thousandth of the overall user base was randomly sampled as representatives.

b) Feature screening

In the feature extraction stage, click data for nearly 200 function points was extracted. However, some of these features have very low coverage, with only one percent of users having used them within the 28 days. These low-coverage features are removed first. In addition, as mentioned earlier, highly correlated variables interfere with the clustering process. Here, the Pearson correlation coefficient is calculated pairwise for all features. Among highly correlated features (correlation coefficient greater than 0.5), only the feature with the widest coverage is retained, so as to reflect user differences as much as possible.

c) Feature transformation: exploration

After the two steps above, the author ran many clustering explorations, but without exception the results showed one super-large category plus dozens of very small categories (a handful or a few dozen users each). Such a result clearly runs against our analysis goal. First, the small groups found this way are too small to have any business value. Second, the super-large category is basically equivalent to the overall user base, so no differences among users can be found.

Why does this happen? Mainly because click behavior basically follows a power-law distribution. A large number of users are concentrated in the low-frequency range, while a very small number of users have extremely high frequencies.
As a result, a typical clustering algorithm puts the high-frequency users into small categories with very few people, while the large number of low-frequency users ends up in one super-large category.

Figure 5: Click behavior distribution

Figure 6: K-Means clustering diagram of click behavior count

The typical solution in this situation is to take the logarithm of the frequency, transforming the power-law distribution into an approximately normal distribution before clustering. In this study, taking the natural logarithm improved the clustering only slightly; the result was still one super-large category plus several very small ones. The reason lies in a characteristic of click behavior data: core functions and popular items have a large number of clicks, while relatively unpopular functions have a large number of 0 values. For such data, taking the logarithm brings no improvement.

Figure 7: Opening times distribution

Figure 8: Distribution of opening times (natural logarithmic transformation)

Return to the goal of this analysis: we need to "discover the segmented groups whose usage behavior patterns are different from those of typical users in the overall user base." If we discard the unpopular functions and look only at the popular ones, we will not find the relatively niche behavior patterns needed to achieve that goal.

This sparse data reminded the author of text classification. In the bag-of-words model of text classification, the word vector of each "document" also contains a large number of zero values. The bag-of-words solution is to weight the word vector using the TF-IDF method.
Here is a brief introduction to this method.

d) Feature transformation: TF-IDF

In the bag-of-words model of text classification, "documents" (such as a news article, a microblog post, or a comment) need to be grouped by the topics they discuss, and each document contains many terms. TF (Term Frequency) is the ratio of the number of times a word appears in a document to the total number of words in that document. This simple calculation shows which words are more common in a document without being affected by the length of the document itself.

On the other hand, some words are "popular" words that appear in all articles. These words do little to distinguish the topic of an article (such as "report" and "reporter" in news). The weights of such "popular" words need to be lowered, which can be done with a calculation such as the logarithm of (total number of documents / number of documents containing the word). A word that appears in every article gets a weight of 0, and the fewer documents contain a word, the larger the value. This calculation is IDF (Inverse Document Frequency).

Readers may already have guessed where this leads: if "document" is replaced by "user" and "number of word occurrences" by "number of function clicks", the same method can be used to classify types of user behavior. First, TF reflects the functional preferences of low-frequency users, so they are not lumped into a single low-frequency category just because they use the product less overall than high-frequency users do. At the same time, IDF gives niche features greater weight, making it easier for niche preferences to stand out in clustering.

e) Clustering results

With this feature transformation followed by K-Means clustering, the results are much more in line with the analysis goal.
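The transformation in d) can be sketched as follows, with users in place of documents and function clicks in place of word counts. This is a minimal illustration under assumptions: the function name, the feature names, and the click counts are hypothetical, and the IDF uses the plain logarithm of (number of users / number of users who clicked the function) with no smoothing, which may differ from the exact variant used in the original analysis.

```python
import math


def tfidf_user_features(click_counts):
    """Weight per-user click counts with TF-IDF.

    click_counts: list of dicts, one per user, mapping a function name
    to that user's click count.
    Returns (weighted, idf) where weighted is a list of dicts.
    """
    n_users = len(click_counts)
    # "Document frequency": how many users clicked each function at all.
    users_with_feature = {}
    for counts in click_counts:
        for feature, clicks in counts.items():
            if clicks > 0:
                users_with_feature[feature] = users_with_feature.get(feature, 0) + 1
    # A function used by every user gets log(1) = 0; rarer ones get more.
    idf = {f: math.log(n_users / df) for f, df in users_with_feature.items()}

    weighted = []
    for counts in click_counts:
        total = sum(counts.values())
        if total == 0:
            weighted.append({})  # a user with no clicks has no preferences
            continue
        # TF is the share of this user's clicks that went to the function.
        weighted.append(
            {f: (c / total) * idf[f] for f, c in counts.items() if c > 0}
        )
    return weighted, idf


# Hypothetical users: a heavy user and a light user with the same mix of
# functions, plus a user with a niche preference.
users = [
    {"chat": 900, "feed": 100},
    {"chat": 9, "feed": 1},
    {"chat": 5, "radio": 5},
]
weighted, idf = tfidf_user_features(users)
for row in weighted:
    print(row)
```

In this toy data the heavy and light users end up with the same weights because their click shares are identical, the universally used function drops to zero weight, and the niche function carries the largest weight. These are the two effects the text attributes to TF and IDF.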
From the overall user data, we found various groups with distinct behavioral characteristics, and roughly estimated the size, behavioral characteristics, and background characteristics of each group. On this basis, we combined user research data to explore suggestions for product improvement.

8. Summary

The biggest change that user segmentation brings to user data research is that it breaks down data silos and lets us truly understand users. We analyze the characteristics of the users behind a given indicator number (their demographic attributes, behavioral characteristics, etc.), discover the reasons behind product problems, and find opportunities or directions for effective product improvement.

When performing cluster analysis, the selection and preparation of features are very important:
1. Suitable variables must differ significantly across samples;
2. There should be no strong correlation between variables; otherwise, a method such as PCA must be used first to reduce the dimensionality;
3. The data must be transformed according to its own characteristics and the business characteristics (for example, standardization or taking the logarithm).

Choosing a clustering algorithm requires weighing the data characteristics (whether there are categorical variables, whether there are outliers, the data volume, whether the data forms clumps), computing speed (exploratory analysis often needs fast computation), and accuracy (whether the groups can be accurately identified). For the parameters of an algorithm, such as the number of categories K in K-Means, technical indicators and business background must be combined to choose a logically reasonable classification scheme.

There are many clustering algorithms, each with its own characteristics and strengths. This article takes only two of the more commonly used methods as examples, in the hope of stimulating discussion and inspiring readers.
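As a closing illustration of the feature preparation rules, the screening step from the business case (drop low-coverage features, then keep only the widest-coverage feature among highly correlated ones) can be sketched as below. Everything specific here is an assumption: the function names, the greedy keep-in-coverage-order strategy, the feature names, and the toy data. Only the 0.5 correlation threshold and the idea of a coverage cutoff come from the text.

```python
import math


def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant column as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def coverage(values):
    """Share of users with at least one click on the feature."""
    return sum(1 for v in values if v > 0) / len(values)


def screen_features(columns, min_coverage=0.01, max_corr=0.5):
    """columns maps a feature name to a list of per-user click counts."""
    cover = {name: coverage(vals) for name, vals in columns.items()}
    # Step 1: drop features whose coverage does not exceed the cutoff.
    candidates = [name for name in columns if cover[name] > min_coverage]
    # Step 2: visit features from widest to narrowest coverage and keep
    # one only if it is not highly correlated with any feature already kept.
    candidates.sort(key=lambda name: cover[name], reverse=True)
    kept = []
    for name in candidates:
        if all(pearson(columns[name], columns[k]) <= max_corr for k in kept):
            kept.append(name)
    return kept


# Hypothetical click counts for six users.
columns = {
    "chat": [5, 3, 8, 1, 0, 2],
    "chat_emoji": [4, 3, 7, 1, 0, 0],  # moves together with "chat"
    "radio": [0, 2, 0, 0, 3, 0],
    "rare": [0, 0, 0, 0, 0, 0],        # nobody used it
}
print(screen_features(columns))
```

Visiting features in descending order of coverage is what guarantees that, within a correlated group, the feature with the widest coverage is the one retained.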