How do operations teams perform data analysis? Five basic steps

After reading many growth case studies and then turning back to your own work and performance targets, do you still feel that you don't know how to reach them? That is because many case studies describe only the background and objectives of the problem and the results after optimization, not the analysis that led to the conclusions. The actual analysis is often glossed over with a single word: "discovery".

Some people will argue that data analysis is subjective and cannot follow a unified process, especially given how quickly the Internet industry changes. So is data analysis a collection of scattered techniques with no rules to follow, or a process with clear steps that can be followed rigorously? I believe it is the latter.

Let us introduce a general data analysis methodology: the five-step method of data analysis.

This framework has the following features:

  1. It is not tied to a specific business (although the details of individual steps must be adapted to the business); it is built around the information needed for decision-making;
  2. It is open and can absorb personal experience and cutting-edge technology;
  3. It can be combined with big data technology to remove manual steps and achieve automation;
  4. Its logic is clear and easy to learn.

1. Five-step analysis

This simple five-step method can handle at least 80% of the data analysis problems that come up in daily work. The remaining 20% of scenarios can be covered by extending this basic methodology, which we discuss later.

1.1 Five basic steps

First, let's go through the five basic steps:

  1. Summary
  2. Segmentation
  3. Evaluation
  4. Attribution
  5. Decision-making

1.1.1 Summary

In this step we focus on indicators: the familiar ones such as DNU (daily new users), DAU (daily active users), GMV (gross merchandise volume) and ROI (return on investment). Every discussion of data analysis reminds us to "have clear goals", so there is no need to elaborate on the importance of goals here.

The goal is of course the most important indicator of all. But the goal alone is not enough; we also need auxiliary indicators. For example, ROI is calculated from input and output, and GMV can be calculated by multiplying the number of users by the average GMV per user. In this way, we break the calculation of a goal into a combination of related indicators. These indicators are more basic, and we can influence their trends through operational measures.

This part is not difficult to understand. What takes work is mapping out how the indicators are calculated from one another and, step by step, identifying every indicator we need to care about. Internet product operations never lack indicators to look at; there are dizzyingly many of them. We only need to care about the ones that are relevant to our goals.
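The decomposition described above can be sketched in a few lines. This is only an illustration: the figures are made up, the function names are hypothetical, and ROI is taken here as output divided by input, which is one common convention (some teams use net return divided by input instead).

```python
# Hypothetical figures, chosen only to illustrate the decomposition.
paying_users = 2_000        # number of users who generated GMV
avg_gmv_per_user = 150.0    # average GMV per user
cost = 60_000.0             # total investment (the "input")


def gmv(users: int, avg_per_user: float) -> float:
    """GMV = number of users x average GMV per user."""
    return users * avg_per_user


def roi(output: float, investment: float) -> float:
    """ROI as output / input (an assumed convention, see the note above)."""
    if investment <= 0:
        raise ValueError("investment must be positive")
    return output / investment


total_gmv = gmv(paying_users, avg_gmv_per_user)
print(total_gmv)             # 300000.0
print(roi(total_gmv, cost))  # 5.0
```

Writing the goal as a function of more basic indicators makes it obvious which inputs an operational measure would have to move.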

1.1.2 Segmentation

This step is equivalent to adding one or more dimensions to an indicator. The simplest dimension is time: for example, we look at the trend of UV by day. We can also look at the GMV brought by different pages, the GMV of different user groups, and so on. If we think of the indicator from the previous step as a single number, adding one dimension turns it into a column of data, adding two dimensions turns it into a table, and so on.

As with indicators, it is easy to find many dimensions along which an indicator can be split: the dates and user groups mentioned above, the source channel of new users, the conversion path that active traffic comes through, and so on. Combining these dimensions produces far more splits than anyone can keep track of.

What matters, therefore, is ranking the dimensions by importance.
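A small sketch of the "number, column, table" idea, using made-up order records and only the Python standard library. The record layout and field values are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical order records: (date, user_group, gmv)
orders = [
    ("2021-05-01", "new", 100.0),
    ("2021-05-01", "old", 250.0),
    ("2021-05-02", "new", 80.0),
    ("2021-05-02", "old", 300.0),
    ("2021-05-02", "old", 70.0),
]

# No dimension: the indicator is a single number.
total = sum(g for _, _, g in orders)

# One dimension (date): the indicator becomes a column.
by_date = defaultdict(float)
for date, _, g in orders:
    by_date[date] += g

# Two dimensions (date x user group): the indicator becomes a table.
by_date_group = defaultdict(float)
for date, group, g in orders:
    by_date_group[(date, group)] += g

print(total)                # 800.0
print(dict(by_date))        # one GMV value per date
print(dict(by_date_group))  # one GMV value per (date, group) cell
```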

How to distinguish them?

We need to prioritize the split dimensions by whether they are actionable. For example, we mentioned earlier the GMV brought by different pages in the app. If we have no technical means or operational tools to send more traffic to pages with higher GMV and less traffic to pages with lower GMV, then splitting by page leaves us no room to act, let alone room to optimize after acting.

If this is the case, we should treat the source page dimension as a "nice to have" dimension rather than a key dimension.

Another example is user segmentation, especially when we hope to drive growth by acquiring more high-quality new users through external ad placements. In that situation we usually want to profile our existing high-quality users first, identify features that distinguish them, and then use those features to target higher-quality users when we run campaigns.

The theory makes sense, but unfortunately external channels cannot provide very precise audience targeting; they offer only coarse-grained divisions such as demographics and industry preferences. This is true even of the channels we currently regard as very accurate at tagging users.

When it comes to attracting new users, therefore, our ability to segment users is limited. It is not completely impossible, but it is very limited. The greater potential of user segmentation lies in promoting activation, that is, in dividing up our own existing user base.

For example, it is common in growth cases to place different copy or image materials in the same position on the same page to conduct A/B testing between versions. Then the displayed version is a dimension that can be freely manipulated, because once we find which version is better, we can take action quickly. Therefore, the display version dimension is very suitable for segmenting indicators.

If the [Summary] step is mostly about monitoring, the [Segmentation] step is where real analysis begins to show. Here we need to find splitting dimensions that are real and actionable, so that our conclusions can be put into practice as soon as possible. One question remains open, though: when several actionable splits exist, they still differ from one another.

For example, we can simply swap images and copy, or we can painstakingly ship a major version iteration. How should the analysis reflect and measure this difference in effort? This brings us to evaluation.

1.1.3 Evaluation

In the evaluation part, we need to use the indicator used as the target in the [Summary] step as the only criterion for evaluation. If our goal is simple GMV, or even simpler PV and UV, then after the [Segmentation] step, we can basically start to draw conclusions.

But this is not the case in practice. Our goal may be a compound goal - while increasing GMV, we must also control costs; while increasing PV, we also need to bring in GMV; or it may be directly a compound indicator such as ROI.

In that case we can no longer look at the target indicator alone; we have to look at the composite indicator. For example, suppose our goal is to increase GMV while controlling costs. To simplify the problem, we split the cost into two parts: the cost of activating old users to generate GMV, and the cost of acquiring new users to generate GMV. We split it this way because in operations the means of attracting new customers and the means of promoting activation are usually different, which matches the principle from the [Segmentation] step: split by how much room there is to act.

After that, we can subdivide the two indicators, GMV generated and cost invested, along the different dimensions of attracting new customers and promoting activation. For example, to attract new users we buy Baidu keywords, advertise through advertising alliances, and exchange traffic with other apps; to promote activation we run an A/B test on four banners, A, B, C and D, in the app.

For new users, we can evaluate Baidu keywords, the advertising alliance and the partner apps separately and see how much new GMV each one brings for every dollar invested. We can then choose the better methods of attracting new customers and allocate the budget among the existing methods more effectively. For old users, we can likewise evaluate how much GMV each dollar invested in the A/B test of banners A, B, C and D generates.
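The "GMV per dollar invested" comparison can be sketched as follows. All cost and GMV figures are invented for illustration, and `gmv_per_dollar` and `rank` are hypothetical helper names.

```python
# Hypothetical cost and GMV per acquisition channel and per banner version.
acquisition = {
    "Baidu keywords":       {"cost": 10_000.0, "gmv": 30_000.0},
    "advertising alliance": {"cost": 8_000.0,  "gmv": 20_000.0},
    "partner apps":         {"cost": 5_000.0,  "gmv": 17_500.0},
}
banners = {
    "A": {"cost": 1_000.0, "gmv": 4_000.0},
    "B": {"cost": 1_000.0, "gmv": 3_000.0},
    "C": {"cost": 1_000.0, "gmv": 5_000.0},
    "D": {"cost": 1_000.0, "gmv": 2_000.0},
}


def gmv_per_dollar(stats):
    """GMV generated per dollar of cost; options with no spend are skipped."""
    return {
        name: s["gmv"] / s["cost"]
        for name, s in stats.items()
        if s["cost"] > 0
    }


def rank(stats):
    """Options sorted from the highest to the lowest GMV per dollar."""
    scores = gmv_per_dollar(stats)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


print(rank(acquisition))
# [('partner apps', 3.5), ('Baidu keywords', 3.0), ('advertising alliance', 2.5)]
print(rank(banners)[0])
# ('C', 5.0)
```

Because both groups are scored with the same measure, new-user channels and old-user banners can be compared within their own group using a single criterion.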

In short, in the [Evaluation] step we divide the indicators from the [Summary] step into two categories: the ultimate goal, and the means of achieving the goal. In the previous example, the cost invested is the means of increasing GMV, so every dollar of cost is evaluated by the GMV it generates. There are then several ways to reach the GMV goal. For activating old users, for example:

  • Keep the cost unchanged and switch to images and copy that convert better, increasing the GMV per dollar invested;
  • Keep the GMV per dollar unchanged and increase the cost (within limits).

This still ignores, for the moment, the value that the GMV itself may bring. If we take that value into account, it offsets part of the investment cost, and more options become available.

In the previous example, the splitting dimensions are relatively simple: only banners in the app and external methods of attracting new users. It is therefore easy to separate them using markers in the data. In practice, however, some situations cannot be separated so cleanly. In user interaction, for example, the path that leads to a piece of GMV may pass through several steps; or, with the four banners A, B, C and D, a user may click two or even three of them before buying.

So how do we split the credit? That is the problem addressed by the next step, [Attribution].

1.1.4 Attribution

This step is the "last mile" before reaching conclusions and making decisions; it is what we usually call analyzing the "why".

In the cases discussed in the previous steps, we ended up with quantitative indicators that can be compared directly. In such cases nothing special is needed in the [Attribution] step, and we can draw conclusions simply by comparing the numbers. But what should we do when several steps or methods are involved and cannot be cleanly separated? A few attribution approaches are commonly used in daily data analysis.

For example, in our previous example, the user clicked on four positions ABCD in sequence to generate GMV.

  • First interaction attribution model: all credit goes to the first time the user does something, which usually shows up in the data as the earliest timestamp, the smallest sequence number, and so on. In this case we give A 100%, and B, C and D 0%.
  • Last interaction attribution model: all credit goes to the last time the user does something, which shows up in the data as the most recent timestamp, the largest sequence number, and so on. In this case we give D 100%, and A, B and C 0%.
  • Linear attribution model: credit is shared equally, so A, B, C and D each receive 25%.
  • Weighted attribution model: each contributing factor is assigned a weight; for example, A and B are each credited with 30%, and C and D with 20% each. Because weight adds another dimension, the weights have to be designed, and designing them is itself a piece of analysis. Common schemes give the first and last items the most weight and the others progressively less, or let the weight decay over time.

Of course, when choosing an attribution method we also take the characteristics of the specific business into account, including how the order of behaviors, the length of stay and similar factors affect the analysis goal.
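The four attribution models in this section can be sketched as small functions that map a click path to credit shares. This is a minimal illustration: the function names and the order GMV of 200 are hypothetical, and the sketch assumes every touchpoint appears only once in the path.

```python
def first_interaction(path):
    """All credit goes to the earliest touchpoint."""
    return {p: (1.0 if i == 0 else 0.0) for i, p in enumerate(path)}


def last_interaction(path):
    """All credit goes to the most recent touchpoint."""
    last = len(path) - 1
    return {p: (1.0 if i == last else 0.0) for i, p in enumerate(path)}


def linear(path):
    """Credit is shared equally among all touchpoints."""
    share = 1.0 / len(path)
    return {p: share for p in path}


def weighted(path, weights):
    """Credit follows designer-chosen weights, normalized to sum to 1."""
    if len(weights) != len(path):
        raise ValueError("one weight per touchpoint is required")
    total = sum(weights)
    return {p: w / total for p, w in zip(path, weights)}


path = ["A", "B", "C", "D"]   # click order from the example above
order_gmv = 200.0             # hypothetical GMV of the resulting order

print(first_interaction(path))  # {'A': 1.0, 'B': 0.0, 'C': 0.0, 'D': 0.0}
print(linear(path))             # {'A': 0.25, 'B': 0.25, 'C': 0.25, 'D': 0.25}

# Weighted model with the 30/30/20/20 split from the text.
credit = weighted(path, [30, 30, 20, 20])
attributed = {p: round(order_gmv * c, 2) for p, c in credit.items()}
print(attributed)               # GMV credited to each banner
```

Whichever model is chosen, the shares sum to 1, so the attributed GMV of all touchpoints adds back up to the GMV of the order.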

1.1.5 Decision-making

Finally, we can make a decision. Once the previous steps have gradually eliminated the uncertainty, decision-making is actually the simplest step: it is just a matter of picking the best-performing version, the best-performing location, and the best means of attracting new customers.

When we have some new ideas, we can also add them as a version in the A/B Test and add them to this evaluation system for comprehensive evaluation.

1.2 Application Cases

This methodology applies not only to ad hoc analyses in daily work; its outline can also be found inside several established methodologies.

Let’s look at a few examples of established methodologies:

1.2.1 A/B Test

The first case is the A/B test. In an A/B test, we first determine the purpose of the experiment, that is, which indicator we want to improve. We then use the different versions in the experiment as the segmentation dimension, and whether the indicator improves as the evaluation criterion. If the experiment runs into problems that require attribution, we also need to decide how to attribute.

As the business grows more complex, the difficulty of A/B testing is no longer in comparing versions and drawing conclusions, but in designing experiments so that more of them can be run in less time and with less user traffic while still producing valid conclusions. This is the starting point of all platforms in this area, and it is the core of Google's well-known paper "Overlapping Experiment Infrastructure".

1.2.2 User Segmentation

User segmentation is a common operational method, but how to judge the accuracy of a segmentation, and how to keep it accurate over time, is a data analysis problem. In feature-based user segmentation, we must first be clear about what kind of user group we want to obtain.

When we have already grouped users and want to study the characteristics of a group, we can use TGI (Target Group Index) as the target indicator and use the size of the TGI to measure how strongly the group leans toward each characteristic.

Conversely, if we want to find users who like funny short videos, and we use the act of tapping "like" as the definition of liking, we can also use TGI to measure the accuracy of the segmentation. We can group users by various means, each grouping method yields a different TGI value, and the method we want is the one with the largest TGI.
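As a sketch of this comparison, the snippet below uses the common definition of TGI: the share of users with the feature inside the target group, divided by the share of users with the feature in the whole population, times 100. The user counts and the names "method X" and "method Y" are hypothetical.

```python
def tgi(group_with_feature, group_size, total_with_feature, total_size):
    """Target Group Index: 100 means the group matches the overall average."""
    if group_size <= 0 or total_size <= 0 or total_with_feature <= 0:
        raise ValueError("sizes and overall feature count must be positive")
    group_rate = group_with_feature / group_size
    overall_rate = total_with_feature / total_size
    return group_rate / overall_rate * 100


# Hypothetical population: 10,000 users, 2,000 of whom liked funny short videos.
TOTAL_USERS = 10_000
TOTAL_LIKERS = 2_000

# Two hypothetical grouping methods: (users who liked, users selected).
methods = {
    "method X": (500, 1_000),
    "method Y": (600, 2_000),
}

scores = {
    name: round(tgi(likers, size, TOTAL_LIKERS, TOTAL_USERS), 1)
    for name, (likers, size) in methods.items()
}
best = max(scores, key=scores.get)

print(scores)  # {'method X': 250.0, 'method Y': 150.0}
print(best)    # method X
```

A TGI above 100 means the group is more inclined toward the feature than the population as a whole. Here method X selects fewer users but concentrates the target behavior more strongly.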

1.2.3 Classic management model: BCG matrix

In the classic BCG matrix, the implicit goal is the overall interest of the enterprise, and the means is the optimal allocation of resources: investing the enterprise's limited resources in its more promising businesses in order to maximize the overall return.

To study this goal in depth, the BCG matrix splits it along two dimensions. In the usual drawing, the horizontal axis represents relative market share and the vertical axis represents market growth rate. Market share and market growth rate are the means of creating profit, and profit is the ultimate goal. Because the means yield different returns, different businesses meet different "destinies" in the four quadrants.
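The two-dimensional split can be sketched as a simple classifier using the conventional quadrant names (star, cash cow, question mark, dog). The cut-off values below, a relative market share of 1.0 and a growth rate of 10%, are commonly quoted defaults and are assumptions here, as are the business names and figures.

```python
def bcg_quadrant(relative_share, growth_rate, share_cut=1.0, growth_cut=0.10):
    """Place a business in a BCG quadrant (cut-offs are assumed defaults)."""
    high_share = relative_share >= share_cut
    high_growth = growth_rate >= growth_cut
    if high_share and high_growth:
        return "star"
    if high_share:
        return "cash cow"
    if high_growth:
        return "question mark"
    return "dog"


# Hypothetical businesses: (relative market share, market growth rate).
portfolio = {
    "business 1": (1.5, 0.20),
    "business 2": (1.5, 0.02),
    "business 3": (0.4, 0.20),
    "business 4": (0.4, 0.02),
}

for name, (share, growth) in portfolio.items():
    print(name, bcg_quadrant(share, growth))
# business 1 star
# business 2 cash cow
# business 3 question mark
# business 4 dog
```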

2. Optimization of methodology

Based on the overall description of the methodology, there are three points where the methodology can be optimized.

(1) Summary

Optimizing the summary step means finding newer and more suitable auxiliary indicators from which to calculate the final target indicator. In finance, for example, instead of calculating from income and expenditure alone, DuPont analysis breaks return on equity down into three aspects: profitability (net profit margin), asset efficiency (asset turnover) and debt level (financial leverage). This breakdown is easier to understand and to act on.
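For reference, the standard three-factor DuPont identity is ROE = net profit margin × asset turnover × equity multiplier. The sketch below checks the identity on made-up financial figures; the function name and numbers are hypothetical.

```python
def dupont(net_income, revenue, total_assets, equity):
    """Three-factor DuPont breakdown of return on equity (ROE)."""
    margin = net_income / revenue        # profitability
    turnover = revenue / total_assets    # asset efficiency
    multiplier = total_assets / equity   # financial leverage
    return {
        "net_profit_margin": margin,
        "asset_turnover": turnover,
        "equity_multiplier": multiplier,
        "roe": margin * turnover * multiplier,
    }


# Hypothetical figures, all in the same currency unit.
parts = dupont(net_income=100.0, revenue=1_000.0,
               total_assets=2_000.0, equity=800.0)

print(round(parts["roe"], 4))   # 0.125
print(round(100.0 / 800.0, 4))  # 0.125, the same ROE computed directly
```

The direct calculation gives the same number, but the three factors show which lever (margin, turnover or leverage) is responsible for a change.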

(2) Segmentation

The earlier discussion of segmentation focused mainly on objective dimensions. As analysis experience accumulates and algorithms improve, more subjective segmentation dimensions are gradually added, such as user tags based on preferences. These dimensions provide new perspectives, but they also come with their own "rules of play".

(3) Attribution

Where a split cannot be determined objectively, the attribution step supplies a manually defined splitting logic. Because that logic is set by people and the objective conditions keep changing, room for optimization gradually appears, and the splitting method must be refined continuously to keep up with business development and changes in the environment.

Author: Yuhao, authorized for publication by Qinggua Media.

Source: Yuhao
