5 Things to Know About Growth AB Testing!


In growth work, AB testing is regarded as something of a golden rule, and it is a tool that product and operations colleagues can use whenever they need it. I have used AB testing often in my own work, and along the way I have learned a number of lessons and studied the topic further.

This time, I will share five key issues that come up when AB testing is used in practice, so that we can discuss them and avoid the common pitfalls.

1. Sample size estimation

In AB testing, the larger the sample sizes of the control and experimental groups and the longer the experiment runs, the more reliable the results will be.

This may seem like common sense, but the underlying reason is statistical significance:

A result is statistically significant when the difference observed between the control and experimental groups would be unlikely to arise from random variation alone, so it can be treated as a real effect rather than noise.

Therefore, AB tests with longer cycles and larger samples are more convincing. In practice, however, products and campaigns iterate quickly, so an AB test cannot run for too long. That is why it is important to estimate the required sample size before starting an AB test.

Calculating the sample size by hand is somewhat complicated. If you have not touched advanced mathematics or probability theory in years, it is a good idea to ask your data analyst colleagues for help or to use a sample size calculator.

In this sample size calculator, enter the conversion rate of the original version (known), then the expected conversion rate of the optimized version, and set the significance level. Generally, a difference is treated as significant at a confidence level of 95% (a significance level of 5%). The calculator then quickly returns the required sample size.

As shown in the figure above, if the original version converts at 10% and the optimized new version is expected to convert at 12%, each group needs more than 2,900 users before the difference in conversion rate between the two versions can be called significant and credible.
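For readers curious about what such a calculator computes, here is a minimal Python sketch of the common normal-approximation formula for comparing two conversion rates. The two-sided test and the 80% statistical power are assumptions: the article does not state its calculator's settings, and calculators differ in formula and defaults, so for the 10% to 12% case this sketch returns roughly 3,840 users per group and will not reproduce the figure's result exactly. The function name `sample_size_per_group` is illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(p_control: float, p_variant: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Users needed in EACH group for a two-sided test of two proportions.

    Assumption: normal approximation with 80% power by default; the
    calculator in the article may use different settings.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    p_bar = (p_control + p_variant) / 2             # average of the two rates
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_control * (1 - p_control)
                                 + p_variant * (1 - p_variant))) ** 2
    # Smaller expected differences give a smaller denominator, so more users.
    return ceil(numerator / (p_control - p_variant) ** 2)


# Baseline 10%, expected 12%, 95% confidence level, 80% power (assumed).
print(sample_size_per_group(0.10, 0.12))
```

Raising the power or shrinking the expected difference increases the required sample, which is why a realistic estimate of the optimized version's conversion rate matters so much.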

After calculating the estimated sample size, another important task is to estimate the experimental period.

If a reliable AB test requires 2,900 samples per group, but our product has only 200 daily active users, each group gets only 100 users a day after the split into two groups. Then 2,900 / 100 = 29, which means the experiment will take 29 days to reach the required sample size.

At this point, evaluate whether this period is acceptable. If it is too long, it is not appropriate to run this AB test at the current stage.
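The duration estimate is simple division, rounded up to whole days. This sketch (the function name is hypothetical) follows the article's arithmetic and, like it, assumes every daily active user is a fresh sample; if the same users come back day after day, unique users accumulate more slowly and the real period will be longer.

```python
from math import ceil


def experiment_days(samples_per_group: int, daily_active_users: int,
                    num_groups: int = 2) -> int:
    """Days until every group reaches the required sample size.

    Assumption: each day's active users are all new samples (no repeat visitors).
    """
    users_per_group_per_day = daily_active_users / num_groups
    return ceil(samples_per_group / users_per_group_per_day)


print(experiment_days(2900, 200))  # 200 DAU split into two groups of 100 -> 29
```

Adding more groups to the same traffic stretches the period proportionally: with four groups, the same product would need 58 days.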

2. Test results analysis

Sample size estimation happens before the AB test. Because the conversion rate of the optimized version is only an estimate, the resulting sample size and experiment period are estimates too; they help us make a preliminary judgment about the sample and period before the test begins.

After the actual AB test is completed, we also need to run a statistical significance test on the actual results to confirm that the difference between the control and experimental groups is significant and credible.

In this tool, we enter the actual data for groups A and B, and it shows the difference in conversion rates between the two groups as well as the statistical significance of the result.

Taking the figure above as an example, although group B's conversion rate is higher than group A's, the sample size is small and the result does not reach statistical significance, so we cannot conclude that the group B optimization is better than group A.

At this point there are two options: continue the experiment and analyze it again once more data has accumulated, or stop the experiment and conclude that this optimization brings no obvious improvement.

Continuing the experiment does not guarantee a significant difference. If the difference in conversion rates shrinks as the sample grows, detecting it would require an even larger sample, because smaller differences need more data to reach significance. This usually indicates that the two versions really are not that different, but whether to stop the AB test should still be decided based on the actual situation.
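Significance tools for conversion rates typically run a test such as the pooled two-proportion z-test sketched below; the article does not say which test its tool uses, so this is an assumption. The counts are hypothetical, since the figure's actual data are not reproduced in the text: B converts at 12% versus A's 10%, but with only 500 users per group the p-value stays far above 0.05. With the same rates and 10,000 users per group, the same difference becomes significant.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate: the best estimate if A and B really convert at the same rate.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 10% vs 12% conversion, only 500 users per group.
z, p = two_proportion_z_test(50, 500, 60, 500)
print(f"z = {z:.2f}, p = {p:.3f}")  # p is well above 0.05: not significant

# Same rates with 10,000 users per group.
z, p = two_proportion_z_test(1000, 10_000, 1200, 10_000)
print(f"z = {z:.2f}, p = {p:.6f}")  # p is far below 0.05: significant
```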

3. Counter-Indicators

When running an AB test, there is usually one core indicator for judging the result, plus some supporting indicators for monitoring the experiment and analyzing the results. Counter-indicators, however, must not be ignored.

What is a counter-indicator? It is an indicator (often called a guardrail metric) that the change under test may affect negatively, even while the core indicator improves.

Let's take a simple example:

Suppose that, to raise the new user registration rate, the new version in an AB experiment oversells the new user benefits. The registration rate does go up, but because user expectations were not managed, new users discover after registering that the benefits they actually receive are much smaller than advertised. They become dissatisfied with the product, and the new users' first-order conversion rate drops.

In this experiment, the new users' first-order conversion rate is the counter-indicator that deserves attention.

To keep experiments fast and effective, AB tests often focus on a few key steps in the flow and a few core indicators, but ignoring counter-indicators risks losing more than you gain.
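One way to keep counter-indicators from being overlooked is to make them part of the decision rule. The sketch below is a minimal illustration, assuming point estimates only (in practice each metric would also need its own significance test) and a hypothetical 2% tolerance; the `Metric` class, the `evaluate` function, and the numbers are all illustrative rather than taken from the article.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    control: float                # value in group A (original version)
    variant: float                # value in group B (new version)
    higher_is_better: bool = True

    def relative_change(self) -> float:
        """Relative change of B vs A, signed so that positive always means better."""
        change = (self.variant - self.control) / self.control
        return change if self.higher_is_better else -change


def evaluate(core: Metric, counters: list[Metric],
             tolerance: float = 0.02) -> tuple[bool, list[str]]:
    """Recommend shipping only if the core metric improves and no
    counter-indicator worsens by more than `tolerance` (relative).
    Note: compares point estimates only; no significance test is done here."""
    problems = []
    if core.relative_change() <= 0:
        problems.append(f"core metric '{core.name}' did not improve")
    for m in counters:
        if m.relative_change() < -tolerance:
            problems.append(f"counter-indicator '{m.name}' worsened by "
                            f"{-m.relative_change():.1%}")
    return not problems, problems


# Hypothetical numbers modelled on the registration example above.
core = Metric("new user registration rate", control=0.20, variant=0.25)
counter = Metric("new user first-order conversion rate", control=0.30, variant=0.24)
ship, reasons = evaluate(core, [counter])
print(ship, reasons)  # False: the counter-indicator fell by about 20%
```

The tolerance lets tiny, noisy dips pass while still blocking a launch that trades one metric for another.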

4. Simpson’s Paradox

Simpson's paradox describes a situation in which a trend that holds in each of several groups of data disappears or reverses when the groups are combined. It is named after the British statistician Edward H. Simpson, who described it in 1951.

Here is a simple example from an AB test of the new user first-purchase flow:

On day 1, group A's conversion rate was 10% (10/100) and group B's was 12% (120/1000);

On day 2, group A's conversion rate was 15% (150/1000) and group B's was 16% (160/1000);

Looking at each day separately, group B's conversion rate was higher than group A's.

But over both days combined, group A's conversion rate was 14.5% (160/1100) and group B's was 14% (280/2000);

So on the combined data group A outperforms group B, the opposite of the daily comparison. The experiment's outcome therefore cannot be judged directly from the pooled totals.

Simpson's paradox places extra demands on AB testing, such as choosing user samples sensibly, monitoring and adjusting sample sizes, and analyzing the data comprehensively.

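In this example, the daily and combined figures point in opposite directions because the two groups' sample sizes were very different on day 1 (100 users in A versus 1,000 in B), and day 1 had lower conversion rates than day 2 for both groups. Group B's total is therefore weighted much more heavily toward the weaker day than group A's.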
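The example's numbers can be checked directly. The sketch below computes the daily, pooled, and equal-weight conversion rates for each group. Pooled, group A comes out ahead (about 14.5% vs 14.0%); averaging the daily rates so that each day counts equally, one simple stratified view and an assumption here rather than the article's method, gives 12.5% vs 14.0% and agrees with the daily comparisons. Keeping the traffic split between groups constant throughout the experiment avoids the problem in the first place, because both groups' totals then weight the days identically.

```python
# (conversions, users) per day for each group, taken from the example above.
data = {
    "A": [(10, 100), (150, 1000)],    # day 1, day 2
    "B": [(120, 1000), (160, 1000)],
}


def daily_rates(days):
    return [conv / users for conv, users in days]


def pooled_rate(days):
    """Add up all days first, then divide: days with more users dominate."""
    return sum(c for c, _ in days) / sum(u for _, u in days)


def equal_weight_rate(days):
    """Average the daily rates so each day counts equally (a simple stratified view)."""
    rates = daily_rates(days)
    return sum(rates) / len(rates)


for group, days in data.items():
    print(group,
          [f"{r:.1%}" for r in daily_rates(days)],
          f"pooled={pooled_rate(days):.1%}",
          f"equal-weight={equal_weight_rate(days):.1%}")
```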

5. Layered Experiment

For large products and mature growth teams, multiple AB tests run at the same time, which calls for layered experiments.

A layered experiment organizes multiple experiments into layers, and the traffic used by the experiments in one layer can be reused by the experiments in the next layer. This is not easy to grasp in the abstract, so here is an example:

Take the new user flow of an e-commerce product as an example. After a new user downloads and opens the app, the homepage shows an entry to the new user gift package. Tapping it takes the user to a page showing the new user benefits and discounted products. After browsing these products, the new user places an order. This is the basic new user conversion path.

To optimize this new user conversion flow, multiple AB experiments were run at the same time on the gift package's homepage entry, its landing page, and the new user product detail page.

On the homepage entry, AB experiments were run on both the button color and the guide copy. To keep each experiment to a single variable, everything else, including the copy, was identical in the button color experiment, and everything else, including the button color, was identical in the copy experiment. This requires splitting 100% of the traffic into two parts, say 50% each: 50% of users join the button color experiment (25% see a red button and 25% see a yellow button, with the same copy), and the other 50% join the copy experiment (25% see the "Receive Benefits" copy and 25% see the "Place an Order for 1 Yuan" copy, with the same button color).

After users reach the new user page, an AB experiment is run on how the benefits are displayed. All of the traffic (100%) arriving from the first layer (the homepage) enters this benefits display experiment, even though that same traffic has just gone through the button color and copy experiments. To keep the upper-layer experiments from influencing the benefits display experiment, users from the upper layer are randomly redistributed across groups A and B of the benefits display experiment, so that every upper-layer group is split evenly between them. This is what is meant by orthogonal traffic in a layered experiment.
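A common way to get this orthogonality is to assign users by hashing their ID together with a per-layer salt, so that each layer splits traffic independently of every other layer. The article does not describe how its platform assigns traffic, so this is a minimal sketch under that assumption; the layer names, bucket boundaries, function names, and user IDs are hypothetical. Within the homepage layer the four arms are mutually exclusive (25% each), matching the button color and copy experiments; the second layer re-splits all traffic 50/50 for the benefits display experiment, and the cross-tabulation checks that each homepage arm is divided roughly evenly between benefits groups A and B.

```python
import hashlib
from collections import Counter


def bucket(user_id: str, layer: str, num_buckets: int = 100) -> int:
    """Deterministically map a user to a bucket 0..num_buckets-1 within one layer.
    Salting the hash with the layer name makes each layer's split independent."""
    digest = hashlib.sha256(f"{layer}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def homepage_group(user_id: str) -> str:
    """Layer 1 (hypothetical): four mutually exclusive arms, 25% each."""
    b = bucket(user_id, "homepage")
    if b < 25:
        return "color:red"
    if b < 50:
        return "color:yellow"
    if b < 75:
        return "copy:Receive Benefits"
    return "copy:Place an Order for 1 Yuan"


def benefits_display_group(user_id: str) -> str:
    """Layer 2 (hypothetical): all traffic re-split 50/50 with a different salt."""
    return "benefits:A" if bucket(user_id, "benefits_display") < 50 else "benefits:B"


# Cross-tabulate hypothetical users: every layer-1 arm should be split
# roughly evenly between the two layer-2 groups.
users = [f"user-{i}" for i in range(20_000)]
crosstab = Counter((homepage_group(u), benefits_display_group(u)) for u in users)
for (g1, g2), count in sorted(crosstab.items()):
    print(f"{g1:32} {g2:12} {count}")
```

Because assignment is a pure function of user ID and layer name, a returning user always sees the same variants, and adding a new layer needs only a new salt rather than a new slice of traffic.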

Layered experiments are uncommon in day-to-day work, but mature products must account for them in order to run multiple AB experiments at the same time efficiently and rigorously. Teams also need to keep communicating, so that no one runs an experiment in isolation without realizing that its results are affected by someone else's experiment and draws the wrong conclusion.

That concludes these five points about AB testing. AB testing will continue in future work, and new problems will keep arising.

Author: Wu Yijiu

Source: Wu Yijiu
