Data operations case study: optimizing an information feed product

This article gives a basic introduction to recommendation engines and the factors that influence information feed ranking. It then uses a real case from feed data operations to show the value of data operations, for readers to learn from and refer to.

1. Introduction to Information Flow

Information feeds have become almost ubiquitous and run through our daily Internet life around the clock. When you commute on the subway, you can open Toutiao for the latest news, where the feed has already lined up the hot articles for you to read. When you want a good meal, the feed on Dianping recommends plenty of restaurants in your city. When you can't sleep at night and want to reward yourself after a hard day's work, the dazzling array of recommended products on Taobao is so accurate that you can't stop browsing...

Although the feed format is now widely used, its earliest application was in news and content, starting with the News Feed feature that Facebook released in 2006.

The platform sorts content with established algorithms and rules and then aggregates it, so that users can consume content smoothly and efficiently on a single page. Users no longer need to jump back and forth between portals and blog sites as they did in the early days of the mobile Internet, and by aggregating content in one place, platforms keep users inside their own territory more effectively.

The English term for information flow is "feed", which is a very well-chosen word. It vividly depicts what happens in an information flow: the platform "feeds" content to users in a certain order.

Users have limited time to consume content. How can a platform feed users the content they like best within that limited time, so that they consume more content on the platform (and thereby bring it higher potential commercial value)? This is the "recommendation ranking" problem that everyone operating a feed has been studying for years.

2. The basis of information flow: recommendation engine

The core of a recommendation engine is "how to recommend the right items to the right users", so establishing the connection between "items" and "users" is the central problem of any recommendation algorithm. The whole recommendation process can be summarized as "recall" → "sort" → "adjust weight" → "output results". A simple metaphor helps explain it.

Most readers took part in military training during their school days. The parade review at the end is the highlight of the whole training. So how should the formation be arranged?

First, the instructor "recalls" the candidates: all the students in Class A are called to the playground to wait for arrangement. Only Class A takes part; the students in Class B and Class C are not needed.
Next, the instructor asks the students to "sort" themselves from tall to short, so that the formation does not look uneven.
Then comes the "adjust" step: although the students are now lined up by height, some of them have to play in the military band during the parade, so the instructor takes them out of the line.
Finally, the formation that results from these rules is the "output": Class A's lineup for the final parade.
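The four steps of the metaphor can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular platform implements recommendation: the `Item` fields (`category`, `score`, `blocked`) and the function names are hypothetical, and real recall and scoring stages are far more elaborate.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    category: str          # hypothetical field used for recall
    score: float           # hypothetical relevance score used for sorting
    blocked: bool = False  # hypothetical flag set by an adjustment rule


def recall(items, wanted_category):
    """Recall: keep only eligible candidates (only Class A is called out)."""
    return [it for it in items if it.category == wanted_category]


def sort_candidates(candidates):
    """Sort: order candidates by score, highest first (tall to short)."""
    return sorted(candidates, key=lambda it: it.score, reverse=True)


def adjust(ranked):
    """Adjust: remove items excluded by a rule (the military band members)."""
    return [it for it in ranked if not it.blocked]


def recommend(items, wanted_category, limit=3):
    """Output: the final ordered list shown to the user."""
    return adjust(sort_candidates(recall(items, wanted_category)))[:limit]


catalog = [
    Item("a1", "tech", 0.9),
    Item("a2", "tech", 0.7, blocked=True),
    Item("a3", "tech", 0.8),
    Item("b1", "food", 0.95),
]
feed = [it.item_id for it in recommend(catalog, "tech")]
print(feed)  # ['a1', 'a3']
```

Note that `b1` never appears even though it has the highest score: it was not recalled in the first place, just as the students of Class B never reach the playground.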

Recommendation algorithms are a deep and technical subject, but because this book is mainly aimed at operators, the author summarizes the main factors that affect feed ranking at a more visible, surface level:

Time factor. Time is a fairly basic ranking factor, and many content products start out using it as the primary one. For example, public accounts were initially sorted entirely by time. As the volume of content keeps growing, however, helping users find the content they care about most becomes a job for the other factors.
User portrait factor. The premise is intuitive: different people have different tastes. Although we often say that we should not "label" people, an algorithm can only understand a person through labels. For example, if you carry the label "Internet practitioner", the content recommended to you will lean toward Internet industry news, new technology trends, and so on; if you carry the label "expectant mother", you will be recommended more parenting content.
Interest factor. Both are ways of understanding people, but user portraits focus on a person's "attributes", while interests focus on a person's "hobbies". Some products learn user interests directly, asking users to tick their areas of interest on first launch and then recommending matching content. Others infer interests indirectly, from signals such as how long a user spends reading a piece of content or how likely the user is to click on a certain type of content.
Positive and negative feedback factors. As the name suggests, this is the positive or negative feedback that users give on recommended content. Positive feedback includes "like" and the "one-click triple"; negative feedback includes "report" and "don't show me this again". Many users are well aware of this. When they start using a new feed product, they use these signals to "tame" the feed and "tune" it toward the content that suits them best.
Interaction factors. These can be seen as a further refinement of the feedback factors. For example, many uploaders ("UP hosts") on Bilibili ask viewers to "share, comment, and like", hoping that interaction metrics such as forwards, shares, comments, and likes will lead the algorithm to treat their content as high quality and give it more exposure. In addition, certain specific user behaviors, such as a purchase, are very important signals that raise the weight of related items.
Social factor. For products built on social relationships, recommendation algorithms have more room to work. The best positioned is WeChat, which has accumulated deep social relationship data on more than one billion users in China. For example, content ranking in "Take a Look" uses the user's social relationships: the more of our friends have marked a piece of content as "watching", the higher it ranks in our own "Take a Look".
Heat factor. Current affairs change quickly, and breaking news attracts more attention, so popular current events receive a higher recommendation ranking. Social trends also keep changing: the latest hit TV series or fashion trend can make certain items popular for a time and lift their ranking as well.
Manual intervention factor. Ranking by recommendation algorithm solves most efficiency problems, but some low-quality content, such as false news and vulgar content, requires manual intervention to demote or filter it.
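To make the idea of "factors" concrete, here is a minimal sketch of how a few of them might be combined into a single ranking score. Everything in it is an assumption made for illustration: the weights, the 24-hour half-life, the tag-overlap measure, and the field names are invented, and production systems typically learn such combinations from data rather than fixing them by hand.

```python
# Hypothetical weights; a real system would tune or learn these.
WEIGHTS = {"freshness": 0.3, "interest": 0.4, "heat": 0.3}


def freshness(age_hours, half_life_hours=24.0):
    """Time factor: 1.0 for brand-new content, 0.5 after one half-life."""
    return 0.5 ** (age_hours / half_life_hours)


def interest_match(user_tags, item_tags):
    """Portrait/interest factor: share of the item's tags the user also has."""
    item_set = set(item_tags)
    if not item_set:
        return 0.0
    return len(set(user_tags) & item_set) / len(item_set)


def heat(clicks, impressions):
    """Heat factor: click-through rate as a simple popularity proxy."""
    return clicks / impressions if impressions else 0.0


def rank_score(user_tags, item, manual_penalty=1.0):
    """Weighted sum of the factors, scaled by a manual intervention penalty.

    manual_penalty is in [0, 1]: 1.0 leaves the score untouched,
    0.0 effectively filters the item out.
    """
    base = (
        WEIGHTS["freshness"] * freshness(item["age_hours"])
        + WEIGHTS["interest"] * interest_match(user_tags, item["tags"])
        + WEIGHTS["heat"] * heat(item["clicks"], item["impressions"])
    )
    return base * manual_penalty


article = {
    "age_hours": 24,
    "tags": ["internet", "tech"],
    "clicks": 10,
    "impressions": 100,
}
score = rank_score(["internet"], article)
print(round(score, 2))  # 0.38
```

Setting `manual_penalty=0.0` for an item flagged as false news drives its score to zero regardless of how fresh or popular it is, which mirrors the manual intervention factor described above.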

3. Problem: How to cold start information feeds?

That said, I would like to share my experience of operating a tool product. Most readers will know the dilemma such products face: users stay only briefly and stickiness is poor, which limits both the efficiency and the means of monetization. There are many competing products on the market, and if we could not quickly prove the value of our product with data, the entire product risked being cancelled.

How to increase the time users spend in the product therefore became a very important question for our team. Our tool product offers WiFi connection. Previously, after users successfully connected to WiFi, they landed on a "Connection Successful" page, and there was nothing else on it.

At that moment, however, the user is at the emotional high point of having completed a task, and is on WiFi, where data usage is not a concern. We wondered whether we could add an information feed to this page, giving users something worth reading while also creating a scenario for commercial monetization.

But we were a tool product team with no experience in content operations. How could we build an information feed from 0 to 1? After analyzing our situation, we decided to start quickly from two questions. First, where would the content come from? Some of our sister products had ready-made content. Second, who would build the recommendation algorithm? We needed to develop it ourselves, and although our algorithm team had no experience in content recommendation, its experience with recommendation in software distribution was similar enough to learn from and reuse.

As the saying goes, even a good cook cannot cook without rice. We had both the "rice" and the "good cook", but whether our users would find "fried rice" or "rice soup" the most delicious was something we could only learn by trying.

There are many possible ranking factors, but because of the nature of a tool product, few of them were available to us. Given our situation, we decided to run the following three groups of A/B test experiments:

Sort by user profile. The user attribute data available to us included the user's list of installed software, from which preferences can be inferred to some extent, and the user's geographic location, which makes it possible to recommend local news, nearby attractions, and similar information. By combining these two kinds of data, we could recommend suitable content to each user.
Sort by popularity. Because the content we obtained did not come with popularity data from other platforms, popularity ranking lags in our product: users must keep "feeding" the algorithm through their clicks before it can learn which content is popular and recommend it to more users.
Sort by release time. This is effectively the basic control group. It intervenes very little in the ordering of content and serves as the baseline against which the first two groups are compared.

Based on these three experimental settings, we randomly selected three groups of test users, applied one strategy to each, and set "average information consumption time" as the key evaluation metric. We then waited three long days for the results. During those three days, the team placed bets on which strategy would perform best. Readers, which strategy do you think won?
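The article does not say how users were assigned to the three groups. One common approach, shown here purely as an assumed sketch, is to hash each user ID with an experiment-specific salt so that the assignment is effectively random across users yet stable for any single user. The group names and the salt `feed-exp-1` are hypothetical.

```python
import hashlib
from collections import Counter

# The three experiment arms described above (names are illustrative).
GROUPS = ["profile", "popularity", "time"]


def assign_group(user_id: str, salt: str = "feed-exp-1") -> str:
    """Deterministically map a user to one of the experiment groups.

    The salt is a hypothetical experiment name; changing it reshuffles
    users so that different experiments do not share the same split.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return GROUPS[int(digest, 16) % len(GROUPS)]


# Sanity check: the split should be roughly even across many users.
counts = Counter(assign_group(f"user-{i}") for i in range(30000))
print(counts)
```

Because the assignment depends only on the user ID and the salt, a user who reconnects on day two sees the same strategy as on day one, which keeps the three-day measurement consistent.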

4. Analysis: Find the deeper causes of the problem

Almost all the bets within the team were on the first two groups. Colleagues who favored user profiles had a straightforward argument: users are more interested in content that is relevant to them. Colleagues who favored popularity also had a point: content that many people click on tends to be novel and intriguing, so it naturally attracts more readers.

But when our operations staff collected and sorted the experimental data, they were surprised: Plan 3, sorting by time, which nobody had bet on, actually had a higher "average information consumption time" than the first two plans. The team was discouraged for a while, and some even doubted the technical ability of our colleagues in the algorithm team.

As operators, this is when we need to go one step further with data analysis and ask: do the metrics show the whole truth?

To analyze the problem, we first broke it down.

Experimental metrics:

Is there a problem with the metrics we chose?
Is there a problem with how the metrics are calculated?
Are the metrics for each experimental plan calculated with the same definitions?

Experimental design:

Was the selection of users for each experimental group random enough?
Is all the data that each strategy requires actually available?
Did each strategy fully take effect for its user group?

After this breakdown, we found that the poor metrics of the first two plans did not necessarily tell the whole story. First, the metric "average information consumption time" was itself problematic. Our product is a tool product, after all, and most users leave as soon as they have connected to WiFi. The information feed was always going to be a feature for the subset of users with some time to spare.

As a result, "average information consumption time" is highly dispersed within each experimental group, and a few extreme-value users in Plan 3 pulled up its overall average. To address this, we can adjust for extreme values in the calculation and add "average information click-through rate" as a second metric, so that each plan is evaluated more objectively.
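The effect of a few extreme users on an average is easy to demonstrate. The sketch below uses made-up numbers, not the experiment's actual data, and winsorizing (capping values at a high percentile) is just one possible way to "adjust for extreme values"; the article does not say which method the team used.

```python
def plain_mean(values):
    """Ordinary arithmetic mean."""
    return sum(values) / len(values)


def winsorized_mean(values, upper_pct=0.95):
    """Cap values above roughly the given percentile, then average."""
    ordered = sorted(values)
    cap = ordered[int(upper_pct * (len(ordered) - 1))]
    return sum(min(v, cap) for v in values) / len(values)


def click_through_rate(clicks, impressions):
    """Clicks divided by impressions; 0.0 when nothing was shown."""
    return clicks / impressions if impressions else 0.0


# Made-up reading times in seconds for 20 users per group.
# Most users leave immediately (0 seconds) after connecting to WiFi.
time_group = [0] * 17 + [20, 30, 3600]  # one extreme user
profile_group = [0] * 12 + [20, 30, 40, 40, 50, 60, 60, 80]

print(plain_mean(time_group), plain_mean(profile_group))            # 182.5 19.0
print(winsorized_mean(time_group), winsorized_mean(profile_group))  # 4.0 18.0
```

With the raw mean, the time-sorted group looks almost ten times better. Once the single one-hour reader is capped, the ordering reverses. `click_through_rate` can then serve as the second metric, since it is not affected by how long any single user stays.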

Second, the analysis showed that, because of data collection problems, Plan 1 and Plan 2 never achieved the full effect of their strategies. In Plan 1, "sort by user profile", many users in the experimental group had incomplete installation list data because of Android permission restrictions, and the IP-based geolocation of some users was not accurate enough. Testing revealed that some users in Guangzhou were being recommended local news from Beijing, which naturally hurt the strategy.

In Plan 2, some "clickbait" content had a high click-through rate, so the first screen for users in that group filled up with clickbait. The quality of this content was very low, and users left quickly after clicking, which led to the strategy's poor results.
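The question "is the data each strategy requires actually available?" can be checked before trusting an experiment's results. The following is a minimal sketch with hypothetical field names (`installed_apps`, `city`) and invented sample records. It only detects missing data: a wrong value, such as a Guangzhou user located in Beijing, would need a separate accuracy check.

```python
def strategy_coverage(users, required_fields):
    """Share of users for whom every required field is present and non-empty."""
    if not users:
        return 0.0
    complete = sum(
        1 for user in users if all(user.get(field) for field in required_fields)
    )
    return complete / len(users)


# Invented sample of users in the "sort by user profile" group.
profile_group_users = [
    {"user_id": "u1", "installed_apps": ["news", "maps"], "city": "Guangzhou"},
    {"user_id": "u2", "installed_apps": [], "city": "Guangzhou"},  # list blocked
    {"user_id": "u3", "installed_apps": ["games"], "city": None},  # no location
    {"user_id": "u4", "installed_apps": ["video"], "city": "Beijing"},
]

coverage = strategy_coverage(profile_group_users, ["installed_apps", "city"])
print(coverage)  # 0.5
```

If only half of a group actually received the intended strategy, the group's metric measures a mixture of the strategy and a fallback, so a weak result says little about the strategy itself.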

5. The Importance of Data Operations Thinking

Had we looked only at the experimental results without analyzing the metrics further, we might have concluded that "time sorting" was the best solution for our users, that we should develop in this direction, and that there was no need to optimize the model at all. Only through analysis can we see the full picture and keep proposing optimizations to iterate on.

What this case shows is the importance of breaking problems down and of analyzing them logically. I hope that through this book I can share these thinking frameworks with you, the readers, and help you become better operators.

Afterword

In the future, we will share more articles on data operations and Internet products (and perhaps some personal artistic hobbies) on this platform. Everyone is welcome to get in touch!

Author: Huang Yiyuan

Source: Huang Yiyuan
