Experiments to ignite user growth: A/B testing best practices

To achieve data-driven growth in the second half of the Internet competition, A/B testing must serve as a growth engine. This article covers the value of A/B testing in improving business conversion rates, how to promote A/B testing effectively within a team, and how to design an A/B testing system soundly.

1. Toutiao’s Growth Secret: Driven by A/B Testing

TikTok is arguably the fastest-growing product right now, and it seems to be on the phone of everyone you pass on the street. It has given Tencent a deep sense of crisis and forced it to respond. Since the second half of 2017, TikTok's growth has been explosive.

Its parent company ByteDance, valued at $75 billion, is itself a company that places great emphasis on experiments and uses A/B testing to drive scientific growth.

A/B testing comes naturally to Toutiao's products, and the entire company, from top executive Zhang Yiming down, attaches great importance to it. 36Kr once wrote in a report, "When Toutiao releases a new app, the name is packaged into N variants and submitted to the major app stores for multiple A/B tests before a decision is made. Zhang Yiming told his colleagues: Even if you are 99.9% sure it is the best name, what is the harm in testing it?"

Toutiao has applied data thinking since the day it chose its name. The founding team held no brainstorming session, took no vote, and had no boss make the final call. Instead, they took a scientific, experimental approach and settled on the name Toutiao by observing data.

First, they took the top 10 free apps on the App Store, classified them by name style (catchy colloquial names, sentimental names, names imitating special sounds, company name plus purpose, and so on), and calculated the share of each category. The analysis concluded that catchy, plain-language names work best.

Second, they ran A/B tests across distribution channels. They picked channels that had historically performed similarly and released the candidate names separately, each with exactly the same interface, features and logo. They then tracked core metrics such as downloads and user activity on each channel, and found that the name Toutiao performed best.

2. What is A/B testing?

A/B testing is a product optimization method. For a single optimization goal, the team develops two variants (such as two versions of a page) and lets some users see variant A while others see variant B. Metrics such as conversion rate, click volume and retention rate are then collected for each variant and compared to judge which variant is better and to make a decision.

The diagram above is a typical example of A/B testing.

In companies where A/B testing is more mature, it may not be limited to just two versions, A and B. There may be ABC testing, ABCD testing, or even ABCDE testing.

Some cases call for special variants such as AAB testing. To verify the accuracy of the A/B testing system itself, two identical control groups are set up alongside the treatment group, hence the name AAB testing.

Regardless of how many variants run at the same time, we refer to all of them as A/B tests (also written ABtest).

Combining public data and in-depth industry surveys, we have compiled an overview of the industry's A/B testing frequency, from which we can see that a company's market value or size is positively correlated with the frequency of A/B testing.

Large companies like Google have relatively mature A/B testing systems and data analysis platforms and run an average of 2,000 A/B tests per week, ranging from relatively complex experiments, such as recommendation algorithm tests, to relatively simple ones. Domestic first-tier Internet companies such as BAT also run hundreds of A/B tests every week.

Most of the companies we work with come from a wide range of industries, such as Internet finance, e-commerce and O2O. They lack the capacity and resources to build a mature A/B testing platform on their own, so they choose to work with Testin A/B testing to apply A/B testing to their business quickly.

For example, before using Testin A/B testing, one Internet finance customer averaged only 0.1 A/B tests per week (about one test every ten weeks). After adopting the Testin A/B testing service, its testing frequency rose sharply, to about 30 A/B tests per week.

Of those 30 experiments per week, about 1/3 achieve a 5%-30% increase in conversion rate metrics; the remaining 2/3 fall short and produce no meaningful improvement.

From this example, we can see that about 2/3 of product ideas do not meet expectations: their conversion rate is no better than, or even worse than, the original version. This is the fundamental reason A/B testing is needed. When product decisions rest on intuition alone, about 2/3 of the changes are not actually improvements.

The chart above shows the growth of A/B testing at Microsoft's Bing search engine, covering Bing's A/B test experiments from 2008 to 2015.

It can be seen that in the early days of Bing products, the frequency of A/B tests per week was maintained at 10 to 50. After 2012, the frequency of Bing A/B tests per week began to grow rapidly.

The green curve in the lower right corner of the chart is the A/B test frequency growth curve of Bing mobile. From this chart, we can see that Bing attaches great importance to and seriously implements A/B testing experiments to drive data growth and promote business development.

3. A/B testing application scenarios and cases

Let’s first look at the four major application scenarios of A/B testing in mobile products: apps, landing pages, back-end algorithms and mini programs.

Apps are currently the main carrier of mobile Internet growth. PC or H5 pages (such as the viral campaigns that flood WeChat Moments) and advertising landing pages all count as landing pages. Back-end algorithm scenarios include recommendation algorithms, advertising algorithms and per-user personalization.

Currently, the fastest growing application scenario is mini programs.

The focus of A/B testing differs by scenario, but the core goal always revolves around business growth: the familiar "North Star metric", or specific goals set for the test such as DAU and MAU.

Case 1: Camera photography application

Taking Camera360 as an example, it chose Testin A/B testing service to help it make product optimization decisions.

This case comes from the product's commercialization effort, which aimed to increase the share of users paying for emoticons or props in the in-app store. To hit the payment target, however, the team first needed to raise the click-through rate of the store entrance.

They therefore designed multiple store entrance variants (changing the icon style and copy) and used A/B testing to verify which variant maximized the store entrance click-through rate.

During the verification, they also ran targeted tests on specific user groups, such as users in Japan, China and South Korea. Ultimately, they launched 7 to 8 test versions of the entrance at the same time, and A/B testing raised the overall click-through rate by about 80%.

Case 2: Internet finance application

This case is an app in the Internet finance industry. The team hoped to increase the number of sign-ins, and thereby retention, by changing the copy on the sign-in button. The button copy was changed from "Sign in" to "Sign in to Make Money", and an A/B test was run with 5% of traffic allocated to each of versions A and B.

The test found that the number of sign-ins in the new version was 4.17% higher than in the original. The 95% confidence interval for the lift was 1.7% to 6.6%, which indicates the range of improvement to expect when the result from this small sample is rolled out to all users. The p-value was below 0.05, indicating a statistically significant difference between the new and old versions, and the power was 100%, indicating that the test had ample statistical power to detect the difference.
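To make the statistics in this case more concrete, here is a minimal sketch of how a lift, a 95% confidence interval and a p-value can be computed for two versions using a two-proportion z-test. The article does not publish the raw counts or the exact method Testin uses, so the function name `compare_conversion`, the sample sizes and the sign-in counts below are all hypothetical, and the interval on the relative lift is a rough approximation that ignores the uncertainty in the baseline rate.

```python
import math
from statistics import NormalDist


def compare_conversion(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Two-sided two-proportion z-test, plus an approximate CI for the lift of B over A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a

    # Pooled standard error for the hypothesis test (H0: both rates are equal).
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the interval on the absolute difference.
    se_diff = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    low, high = diff - z_crit * se_diff, diff + z_crit * se_diff

    return {
        "rate_a": p_a,
        "rate_b": p_b,
        "relative_lift": diff / p_a,
        # Approximation: divides by the observed baseline rate as if it were exact.
        "ci_relative": (low / p_a, high / p_a),
        "p_value": p_value,
    }


# Hypothetical counts: 50,000 users per version, 24,000 vs 25,000 sign-ins.
result = compare_conversion(24000, 50000, 25000, 50000)
print(result)
```

If the whole confidence interval lies above zero and the p-value is below 0.05, the new version can reasonably be treated as the better one.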

Through this simple A/B test, the App retention rate was greatly improved.

In this test, we also used the visual editor in Testin A/B testing to create the variant by directly modifying the attributes of the relevant elements, without involving developers.

So when does a product need A/B testing?

We know that conducting A/B testing requires costs, such as developing multiple versions and building a usable A/B testing and data analysis platform.

Considering the input-output ratio, there are two necessary conditions for establishing an A/B testing platform. One is that product decisions have a great impact, and the other is that product solution selection is difficult.

If a decision has a big impact on the product but the choice is easy, there is no need for A/B testing. For example, whether to add WeChat and other third-party login methods to an app has a big impact, but the decision is easy because the industry already has a standard solution.

Conversely, if you are adding a very small feature with a deeply buried entry point and few users, A/B testing it is a low priority. A/B testing is most appropriate only when a product decision is both high-impact and hard to choose.

Taking our own tests as an example, we rank the features to be tested by impact and by difficulty of choice, and then decide which features to A/B test.
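The two conditions above can be sketched as a simple filter and ranking. This is only an illustration: the 1-to-5 scoring scale, the thresholds, the `Candidate` class and the sample backlog are hypothetical, not a description of how any particular team scores its ideas.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    impact: int      # 1 (minor) to 5 (major); hypothetical scale
    difficulty: int  # 1 (obvious choice) to 5 (hard to choose); hypothetical scale


def prioritize(candidates, min_impact=3, min_difficulty=3):
    """Keep ideas that are both high-impact and hard to choose, highest score first."""
    eligible = [
        c for c in candidates
        if c.impact >= min_impact and c.difficulty >= min_difficulty
    ]
    return sorted(eligible, key=lambda c: c.impact * c.difficulty, reverse=True)


BACKLOG = [
    Candidate("third-party login", impact=5, difficulty=1),            # big impact, easy choice
    Candidate("deeply buried minor feature", impact=1, difficulty=3),  # small impact
    Candidate("store entrance icon and copy", impact=4, difficulty=4),
    Candidate("sign-in button copy", impact=3, difficulty=4),
]

for candidate in prioritize(BACKLOG):
    print(candidate.name)
```

Only the last two backlog items pass both thresholds, so they are the ones worth the cost of an experiment.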

4. Three Elements of A/B Testing

Through conversations with partners such as Ziroom, 36kr, Bullet SMS and 51 Credit Card, we found three key elements in implementing A/B testing:

  • First, people: the thinking habits and mindset of the entire team.
  • Second, business process: the growth workflow.
  • Third, tools.

To elaborate, from the perspective of "people", the most important thing is to require the entire team to have the thinking habit of data-driven growth and A/B testing-driven decision-making.

At the same time, if the growth or product team leader himself does not have this awareness, believes that A/B testing is unimportant, and relies more on experience to make product optimization decisions, then A/B testing will be difficult to do.

Whether in apps or mini programs, new products emerge in an endless stream and competition is extremely fierce. In addition, the Internet traffic dividend is gradually ending and customer acquisition costs are rising. To keep growing the business, the most effective approach is to implement A/B testing and drive growth with data.

Industry development trends determine that all teams will gradually migrate to the path of using scientific experiments for growth. Even if your current team has difficulty promoting A/B testing, I believe that in the near future, A/B testing will be the most important driver of product growth.

I have had in-depth exchanges with many growth practitioners in Europe and the United States, and I came away with the strong impression that the A/B testing culture is stronger in their Internet companies. This is mainly because labor costs in the United States are relatively high, so these companies pay close attention to the input-output ratio and entered the stage of refined operations very early.

In terms of business process:

  • First, you need to pay attention to the form of your product, whether it is based on an APP, mini program, official account or website. The A/B testing implementation plan will be different for different business scenarios.
  • Second, we need to consider whether A/B testing is well integrated into the product iteration or growth team workflow. The best practice is to tightly couple the entire product optimization iteration process and release rhythm with A/B testing to form an assembly line operation. This is why companies such as BAT can achieve such a high weekly frequency of A/B testing.

In terms of tools, there are two options: build your own or use a third-party service.

Self-developed services have certain advantages in terms of controllability and business coupling, but for general enterprises, their R&D costs and labor costs are very high. Developing A/B testing services also involves relatively strict data statistics and requires the deployment of professional data analysts.

If you use third-party tools currently on the market, such as Testin A/B testing service, you can minimize costs and accelerate the implementation of A/B testing services.

For example, one mini program customer integrated the Testin A/B testing service and ran three A/B tests on the very same day. Whether you build your own tools or use a third-party tool, the key is to choose what suits your team.

5. A/B Testing Best Practices

The best process for A/B testing can be divided into four steps:

  • Analyze data: Analyze the metrics of the existing version, such as the registration conversion rate. For example, if the registration conversion rate is only 10%, look for ideas to improve it;
  • Propose an idea: For example, to improve a registration process in which users previously had to enter an SMS verification code, you plan to switch to an image verification code as the alternative. This gives a basic hypothesis: the change is likely to increase the conversion rate;
  • Importance ranking: Due to limited team resources, it is impossible to verify all the requirements and ideas. Therefore, it is necessary to rank the importance and select the most important improvement plans for A/B testing, and then proceed to the fourth step;
  • A/B testing: In this process, we need to monitor the A/B test data. There are generally two results: one is that the data proves that the experiment is invalid, and the other is that the data proves that the experiment is effective. After a lot of testing, we found that for most A/B testing experiments, 1/3 were proven to be effective, and 2/3 were proven to be ineffective (the effects were not much different from the original version, or the effects were worse than the original version).

Note that not every experiment will show a significant effect on metric growth. If every idea were guaranteed to work, there would be no need to run experiments at all.

If you encounter this situation, you need to tell your team members not to be discouraged. It is precisely because some experiments have proven ineffective that we will find effective ways to grow.

Experimental failure is a high-probability event. Our best approach is to increase the frequency of testing and continue testing, rather than just testing briefly and returning to the old path of empirical decision-making.

If your team has never done A/B testing, here are three suggestions for you:

  1. Start with the simplest copywriting A/B test, such as testing the conversion rate of different copywriting in key buttons;
  2. Share more experiences among the team, and share your successful experiences. Everyone is willing to try things that work. Don’t share failed experiences every day. If you share too many failed experiences, you and your team will question A/B testing and affect team morale.
  3. You can give priority to using third-party free A/B testing tools, such as Testin A/B testing, which currently supports App, Web/H5, and mini programs.

6. Enterprise A/B Testing Maturity Model

The above introduces the three key factors for implementing A/B testing and the best practice process for A/B testing. In this section, I will share with you the enterprise A/B testing maturity model.

We divide enterprise A/B testing into four stages: the start-up stage, the growth stage, the mature stage, and the large-scale application stage. The core indicator of maturity is how many A/B tests can be run per week.

In the start-up stage, a company runs 0 to 1 A/B tests per week on average. The organization is only beginning to try A/B testing and has no mature internal experimentation platform, so experiments still rely on the simplest traffic-splitting and data analysis methods.

A/B tests at this stage are not yet standard A/B tests. In terms of the evaluation system, a basic metric such as conversion rate has been set, but the metrics are not yet systematic.

What are systematic indicators? That is, it evolves from a single indicator to a multi-dimensional indicator system, systematically tracking the multi-faceted impact of experiments on products.

The third stage is a relatively mature stage. At this time, 3 to 10 tests can be completed per week. A/B testing has become a part of the product iteration process, and advanced functions such as visual A/B testing and back-end A/B testing are required to meet diverse A/B testing needs.

The mature and large-scale application stages introduce the term OEC (overall evaluation criterion). OEC can be understood as a comprehensive evaluation metric, for example a composite obtained as a weighted average of many individual metrics. Setting an OEC helps steer the performance of the entire organization.
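As a rough illustration of the weighted-average idea, the sketch below combines several per-metric changes into one composite score. The metric names, the weights and the choice to feed in relative changes against the control group are all assumptions made for this example; each organization has to decide these for itself.

```python
def oec(metric_changes, weights):
    """Weighted average of per-metric relative changes (variant vs. control)."""
    if set(metric_changes) != set(weights):
        raise ValueError("metric_changes and weights must cover the same metrics")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(weights[name] * metric_changes[name] for name in weights)
    return weighted_sum / total_weight


# Hypothetical weights and observed relative changes for one experiment.
WEIGHTS = {"conversion": 0.5, "retention": 0.3, "revenue_per_user": 0.2}
CHANGES = {"conversion": 0.04, "retention": -0.01, "revenue_per_user": 0.02}

score = oec(CHANGES, WEIGHTS)
print(f"OEC: {score:+.3f}")
```

A single score like this lets a team accept a variant that lifts conversion even though it slightly hurts retention, as long as the weighted result stays positive.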

7. A/B testing system design capabilities

The sections above covered how to implement A/B testing. Next, let me share the capabilities a typical A/B testing system needs:

1. Scientific traffic segmentation

This covers uniqueness, uniformity, flexibility, directionality and layered traffic splitting.

  1. Uniqueness means using a precise and efficient hash algorithm to ensure that a given user is assigned the same experiment version every time they log into the application;
  2. Uniformity means ensuring that users are distributed evenly across the groups on every dimension;
  3. Flexibility means being able to adjust the traffic allocation between experiment versions at any time during the experiment;
  4. Directionality means precise, targeted traffic splitting based on user tags, such as device tags and other custom tags;
  5. Layered traffic splitting makes it possible to run a large number of A/B tests in parallel.
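The first few properties can be sketched with a hash-based assignment function. This is a minimal illustration under stated assumptions, not the algorithm of any specific platform: SHA-256, the 10,000-bucket resolution, and the names `bucket`, `assign` and `ALLOC` are all hypothetical choices.

```python
import hashlib

BUCKETS = 10000  # hypothetical resolution: one bucket is 0.01% of traffic


def bucket(user_id: str, experiment: str) -> int:
    """Deterministically map a user to a bucket, salted by the experiment name."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def assign(user_id, experiment, allocation, eligible=True):
    """Return the version for a user, or None if the user is not in the experiment.

    allocation: list of (version, share) pairs whose shares sum to at most 1.0.
    eligible: result of any tag-based targeting check done by the caller.
    """
    if not eligible:
        return None
    user_bucket = bucket(user_id, experiment)
    upper = 0
    for version, share in allocation:
        upper += round(share * BUCKETS)
        if user_bucket < upper:
            return version
    return None


# 5% of traffic to each of versions A and B; everyone else is left out.
ALLOC = [("A", 0.05), ("B", 0.05)]
print(assign("user-42", "signin_copy", ALLOC))
```

Because the result depends only on the user ID and the experiment name, the same user always gets the same version (uniqueness), and a good hash spreads users evenly across buckets (uniformity). Changing the shares in `ALLOC` gives flexibility, with the caveat that in this simple cumulative scheme resizing one version can move some users into another.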

Left: layered traffic splitting not enabled; Right: layered traffic splitting enabled

Here we focus on why a layered traffic splitting mechanism is needed. Without it, the following limitations apply:

  • Each user can participate in only one A/B test;
  • Multiple experiments cannot each be tested on all users at the same time, which may bias results because of insufficient population coverage;
  • The traffic available to each experiment is limited by the other ongoing experiments, and there is no flexible traffic allocation mechanism.

With the layered traffic segmentation mechanism, the needs of conducting A/B testing in parallel for different businesses or scenarios, or between different product modules can be well met.
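One common way to get this behavior is to hash each user once per layer with a layer-specific salt, so that experiments inside a layer are mutually exclusive while experiments in different layers overlap freely. The sketch below assumes that approach; the layer names, experiment names and bucket ranges are hypothetical.

```python
import hashlib


def layer_bucket(user_id: str, layer: str, buckets: int = 100) -> int:
    """Hash the user with the layer name as salt, so each layer reshuffles users."""
    digest = hashlib.sha256(f"{layer}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


# Hypothetical configuration: each layer owns 100 buckets, split among its experiments.
LAYERS = {
    "ui_layer": [("button_copy_test", range(0, 50)), ("icon_test", range(50, 100))],
    "algo_layer": [("ranking_test", range(0, 100))],
}


def experiments_for(user_id: str) -> dict:
    """Return the one experiment per layer that this user falls into."""
    active = {}
    for layer, experiments in LAYERS.items():
        user_bucket = layer_bucket(user_id, layer)
        for name, bucket_range in experiments:
            if user_bucket in bucket_range:
                active[layer] = name
                break
    return active


print(experiments_for("user-42"))
```

Here every user takes part in one UI experiment and in the ranking experiment at the same time, so the ranking test can use the full user population instead of whatever traffic the UI tests leave over.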

2. Scientific statistical algorithms

Use scientific statistical analysis methods to analyze experimental data and provide reliable test results:

  • Interval estimation: gives a 95% confidence interval to avoid the decision risk of relying on a point estimate;
  • Statistical significance: uses the p-value to judge whether the differences between experiment versions are significant;
  • Statistical power: uses Power to judge whether the experiment versions have sufficient statistical power;
  • Lean analysis: denoises the experimental data to improve the quality of the statistical results.
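To illustrate the statistical power check, the sketch below uses the standard normal approximation for comparing two conversion rates. It estimates the power of a test at a given sample size and the sample size needed to reach a target power. The 10% baseline echoes the registration example earlier in the article; the 12% target rate and the function names are hypothetical, and a production system may use a different method.

```python
import math
from statistics import NormalDist


def approx_power(p_a, p_b, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test (normal approximation)."""
    se = math.sqrt(p_a * (1 - p_a) / n_per_group + p_b * (1 - p_b) / n_per_group)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Ignores the negligible chance of rejecting in the wrong direction.
    return NormalDist().cdf(abs(p_b - p_a) / se - z_crit)


def sample_size_per_group(p_a, p_b, alpha=0.05, power=0.8):
    """Users needed in each group to detect p_a vs p_b with the requested power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_a - p_b) ** 2)


# Baseline 10% registration conversion, hoping to detect an increase to 12%.
needed = sample_size_per_group(0.10, 0.12)
print(needed, round(approx_power(0.10, 0.12, needed), 3))
```

Running this kind of calculation before an experiment starts shows whether the available traffic is enough to detect the expected improvement at all.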

That covers the basics. Space is limited, so I hope to share more about A/B testing with you another time.

Author: Chen Guancheng. Published with authorization by Qinggua Media.

Source: testindata
