Weimei Weight Loss: What is the role of user tags on the website?

A website's recommendation system rests on two cornerstones: user tags and content analysis. Content analysis draws on machine learning. Of the two, user tagging is the more difficult.

The user tags commonly used at Toutiao include the topics a user is interested in and the keywords that matter most to that user. Gender information can be obtained from third-party social accounts. Age is mainly predicted by a model, based largely on the user's reading time and device model. Frequently visited locations are obtained when users authorize the website to access their location.

Of course, the simplest and most basic user tags are the content tags of the articles a user has browsed. Four strategies are applied when processing them. The first is noise filtering: the website uses how long users stay on an article to filter out clicks on clickbait titles. The second is hotspot penalty: user actions on extremely popular articles are a weaker signal of personal interest, so they are penalized, for example by demotion. The third is time decay: user interests shift over time, so the strategy leans toward recent user actions, and as new actions accumulate, the weight of older ones gradually decreases. The fourth is impression penalty: if an article is recommended to a user but is not clicked, the weights associated with it are penalized.
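The four strategies above can be sketched as a single update rule on a user's tag weights. This is only an illustration: the function names, the thresholds, and the specific decay and discount formulas are assumptions made for the example, not values or formulas described by Toutiao.

```python
import math
from collections import defaultdict

# All constants are illustrative assumptions, not values from the article.
MIN_DWELL_SECONDS = 10.0   # clicks with a shorter stay are treated as noise
HALF_LIFE_DAYS = 7.0       # a tag weight halves after this many days
IMPRESSION_PENALTY = 0.1   # subtracted when a shown article is not clicked


def hotspot_discount(article_clicks: int) -> float:
    """Hotspot penalty: the more popular the article, the smaller the gain."""
    return 1.0 / (1.0 + math.log10(1 + article_clicks))


def decay(weight: float, days_elapsed: float) -> float:
    """Time decay: exponential decay so that recent actions dominate."""
    return weight * 0.5 ** (days_elapsed / HALF_LIFE_DAYS)


def update_profile(profile, tags, clicked, dwell_seconds,
                   article_clicks, days_since_update):
    """Apply one impression event to a user's tag weights (hypothetical helper)."""
    # Time decay: shrink every existing weight before applying the new event.
    for tag in list(profile):
        profile[tag] = decay(profile[tag], days_since_update)

    if not clicked:
        # Impression penalty: the article was shown but not clicked.
        for tag in tags:
            profile[tag] = max(0.0, profile[tag] - IMPRESSION_PENALTY)
        return profile

    if dwell_seconds < MIN_DWELL_SECONDS:
        # Noise filtering: a very short stay suggests a clickbait click.
        return profile

    gain = hotspot_discount(article_clicks)
    for tag in tags:
        profile[tag] += gain
    return profile


profile = defaultdict(float)
update_profile(profile, ["basketball"], clicked=True, dwell_seconds=60.0,
               article_clicks=0, days_since_update=0.0)
```

In this sketch a click on an article nobody else has read adds a full unit of weight, while a click on an article with a million clicks adds only about one seventh of that.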

It should be noted that user tag mining mostly relies on fairly simple keywords. The first version of Toutiao's user tag system, for example, was built on a batch computing framework, and its process was comparatively simple.

But the problem was that, as the user base grew rapidly, the number of interest model types and other batch processing tasks kept increasing, and the amount of computation involved became too large. In 2014, the Hadoop job that batch-processed tag updates for millions of users could barely finish within the same day. The shortage of cluster computing resources easily affected other work, the pressure of concentrated writes to the distributed storage system began to increase, and the update delay for user interest tags grew longer and longer.

To face these challenges, at the end of 2014 Toutiao launched a streaming computing system for user tags on a Storm cluster. After the switch to streaming mode, tags are updated whenever a user action arrives. The CPU cost is relatively small: the change saves 80% of CPU time and greatly reduces computing resource overhead. Only dozens of machines are needed to support daily interest model updates for tens of millions of users, and features are updated very quickly, in near real time. This system has been in use since it went online.

Of course, we also found that not all user tags require a streaming system. Information such as a user's gender, age, and permanent location does not need to be recalculated in real time and is still updated daily.
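The split between tags that follow user actions and tags that are refreshed once a day can be sketched as follows. This is plain Python rather than Storm or Hadoop code, and the class and method names are hypothetical; it only shows that the streaming path touches one user per event, while slow-changing attributes go through a separate daily job.

```python
from collections import defaultdict


class TagStore:
    """Toy in-memory store; a real system would use distributed storage."""

    def __init__(self):
        # user_id -> tag -> weight, updated on every user action
        self.interest = defaultdict(lambda: defaultdict(float))
        # user_id -> slow-changing attributes, refreshed once a day
        self.demographics = {}

    def on_user_action(self, user_id, tags, weight=1.0):
        """Streaming path: update only the user who just acted."""
        for tag in tags:
            self.interest[user_id][tag] += weight

    def daily_batch(self, predicted_attributes):
        """Batch path: overwrite gender, age, location and similar attributes."""
        self.demographics.update(predicted_attributes)


store = TagStore()
store.on_user_action("u1", ["technology", "ai"])
store.on_user_action("u1", ["ai"])
store.daily_batch({"u1": {"gender": "unknown", "age_band": "25-34"}})
```

Because each event only modifies one user's tags, the work per update is small, which is consistent with the reduced CPU cost reported for the streaming system.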

IV. Evaluation and Analysis

The above introduces the overall architecture of the recommendation system. So how do we evaluate the recommendation effect?

There is a saying that I think is very wise, "If you can't measure something, you can't optimize it." The same goes for recommendation systems.

In fact, many factors affect the recommendation effect: changes in the candidate set, improvements or additions to the recall module, new recommendation features, improvements to the model architecture, optimization of algorithm parameters, and so on. Evaluation matters because many optimizations may ultimately have negative effects, and results do not necessarily improve after an optimization is launched.

Comprehensively evaluating a recommendation system requires a complete evaluation system, a powerful experimental platform, and easy-to-use experimental analysis tools. A complete system means that the effect is not measured by a single indicator. We cannot look only at click-through rate or length of stay; a comprehensive evaluation is needed. Over the past few years we have been trying to combine as many indicators as possible into one synthetic evaluation indicator, but we are still exploring. At present, each online launch is still decided after in-depth discussion by a review committee made up of more experienced colleagues from each business.

Many companies do not perform well in algorithm development, not because their engineers are not capable enough, but because they lack a powerful experimental platform and convenient experimental analysis tools that can intelligently analyze the confidence of data indicators.
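As one concrete example of analyzing the confidence of a data indicator, the sketch below compares click-through rates between a control group and an experiment group with a two-proportion z-test. The choice of test and the function name are assumptions for illustration; the article does not say which statistical method Toutiao's tools use.

```python
import math


def ctr_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided two-proportion z-test on click-through rate.

    Returns (z, p_value). Group A is the control, group B the experiment.
    """
    ctr_a = clicks_a / views_a
    ctr_b = clicks_b / views_b
    # Pooled rate under the null hypothesis that both groups share one CTR.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / views_a + 1.0 / views_b))
    if se == 0.0:
        # No variation at all (for example, zero clicks in both groups).
        return 0.0, 1.0
    z = (ctr_b - ctr_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


z, p = ctr_z_test(clicks_a=1000, views_a=10000, clicks_b=1200, views_b=10000)
```

A small p-value suggests the difference in click-through rate is unlikely to be noise, but as noted above, a single indicator is not enough to decide a launch.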

A good evaluation system needs to follow several principles. The first is to take both short-term and long-term indicators into account. When I was in charge of e-commerce at my previous company, I observed that many strategy adjustments seemed fresh to users in the short term but were of no help in the long run.

Secondly, we must take both user indicators and ecosystem indicators into account. As a content creation platform, Toutiao must provide value to content creators so that they can create with more dignity, and it also has an obligation to satisfy users. The two must be balanced. The interests of advertisers must also be considered. This is a process of multi-party bargaining and balancing.

In addition, attention should be paid to synergistic effects. Strict traffic isolation is difficult to achieve in experiments, so external effects between experiments must be watched for.

A very direct advantage of a powerful experimental platform is that when many experiments are online at the same time, the platform allocates traffic automatically without manual coordination, and the traffic is reclaimed as soon as an experiment ends, which improves management efficiency. This helps a company reduce analysis costs, speed up algorithm iteration, and keep algorithm optimization for the entire system moving forward quickly.
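Automatic traffic allocation and reclamation can be sketched with hash-based buckets. This is a minimal illustration under assumed names (`TrafficAllocator`, `bucket_of`, the salt string, and 1000 buckets are all hypothetical), not a description of Toutiao's platform.

```python
import hashlib

NUM_BUCKETS = 1000  # assumed granularity: each bucket is 0.1% of traffic


def bucket_of(user_id: str, salt: str = "layer-1") -> int:
    """Map a user to a stable bucket, so the same user always sees the same arm."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS


class TrafficAllocator:
    def __init__(self):
        self.free = list(range(NUM_BUCKETS))  # buckets not used by any experiment
        self.owned = {}                       # experiment name -> set of buckets

    def start(self, experiment: str, share: float) -> None:
        """Give an experiment a share of traffic taken from the free buckets."""
        needed = round(share * NUM_BUCKETS)
        if needed > len(self.free):
            raise ValueError("not enough free traffic")
        self.owned[experiment] = set(self.free[:needed])
        self.free = self.free[needed:]

    def stop(self, experiment: str) -> None:
        """Reclaim the experiment's buckets as soon as it ends."""
        self.free.extend(sorted(self.owned.pop(experiment)))

    def experiment_for(self, user_id: str):
        """Return the experiment this user falls into, or None."""
        bucket = bucket_of(user_id)
        for name, buckets in self.owned.items():
            if bucket in buckets:
                return name
        return None


allocator = TrafficAllocator()
allocator.start("new_recall_module", share=0.10)
allocator.start("ranking_model_v2", share=0.20)
allocator.stop("new_recall_module")  # its 10% of traffic is free again
```

Because experiments draw from disjoint buckets, they do not share users within this layer, although, as noted earlier, strict isolation is hard to guarantee in practice.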
