The mixing layer integrates the results of multiple heterogeneous queues, such as advertising, games, and organic traffic. It has to find the optimal solution under multiple upstream, downstream, and business constraints, which makes it relatively complex and difficult to control. This article introduces some explorations and reflections of the vivo advertising strategy team on mixing in the information flow and app store scenarios, from both the business and the model perspective.

1. Background

First, let us introduce what mixing is. As shown in the figure, mixing means reasonably blending heterogeneous content from different queues, while safeguarding user experience, so as to achieve optimal revenue and better serve advertisers and users. The core challenges of mixing are:
This introduction focuses on the mixing practice in vivo's information flow and store scenarios. The information flow scenarios at vivo include the browser, i-video, and the minus-one screen. They are characterized by numerous scenarios, deep scroll depth, diverse ad formats, and strong user demand for personalization. The store scenario, by contrast, is a vertical scenario as a whole. It involves a balance between advertising, games, and organic content, and requires reaching a comprehensive optimum under strict requirements such as maintaining content quality and user experience. We will introduce the characteristics of these two scenarios one by one below.

2. Information Flow Mixing Practice

2.1 Introduction to information flow mixing

Let us begin with the practice of mixing in information flow scenarios. As shown in the figure below, the main problem solved on the mixing side is how to mix the content queue and the advertising queue; in other words, how to insert ads at appropriate positions while balancing user experience and advertiser interests. For traditional information flow media, early mixing was mainly based on fixed-position templates: operators manually determined the insertion relationship between ads and content. This is simple and direct, but it also brings three obvious problems:
2.2 Industry Solution Research

Next, we introduce several common solutions in the industry.

Take the solution of a workplace social platform as an example. It sets the optimization goal as maximizing revenue value on the premise that user experience value stays above a certain threshold. For an ad to be inserted, user experience is converted into monetary terms and weighted together with commercial value to obtain an overall value. If the overall value is greater than the user experience value, the ad is delivered; otherwise, the product content is delivered. In addition, constraints such as the spacing between ads are also considered at delivery time, as shown in the figure on the right. This method is simple and direct, and many teams have achieved good results with similar solutions. However, it only considers the value of a single item, ignores the mutual influence between items, and does not take long-term benefits into account.

Next, we introduce a solution from a short-video platform, which uses reinforcement learning for mixing. It abstracts the information flow mixing problem as a sequence insertion problem and treats inserting different ads into different slots as different actions, which are selected through reinforcement learning. Its reward design combines advertising value (such as revenue) with user experience value (such as scrolling and leaving), and the two are balanced by tuning hyperparameters. However, this solution depends heavily on engineering, and the paper relies mainly on offline testing and lacks online analysis. In addition, the model only considers inserting a single ad, not multiple ads.

As for vivo's information flow scenarios, the mixing iteration went through three stages: fixed-position mixing, Qlearning mixing, and deep solution space mixing.
The overall idea is to accumulate samples and quickly explore benefits with a simple reinforcement learning solution in the Qlearning stage, and then upgrade to a deep learning solution.

2.3 Qlearning mixing

The above is the basic process of reinforcement learning. The defining feature of reinforcement learning is learning through interaction: the agent continuously learns and adapts to the environment based on the rewards or penalties it receives while interacting with it. State, reward, and action are the three most critical elements of reinforcement learning and are discussed in detail below.

What are the benefits of the Qlearning mixing mechanism in vivo's information flow? First, it takes full-page and long-term benefits into account, which suits multi-refresh scenarios. In addition, the Qlearning model allows iteration in small, fast steps, accumulating samples quickly while quickly verifying the effect.

In the current overall system architecture, the mixing system sits after the ADX. After receiving the content queue and the advertising queue, the Qlearning model outputs a weighting coefficient that adjusts the weight of the ads; business strategies are then applied on top, and a fused queue is generated. User behavior in turn triggers updates of the Qlearning model.

The operating principle of the Qlearning model is shown in the figure. First, the Q-table is initialized; then an action is selected, and the Q-table is updated according to the reward obtained by that action. The loss function considers both short-term and long-term benefits. In vivo's practice, the reward design jointly considers user experience indicators such as duration and advertising value, smooths the two, and balances them through hyperparameters.
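To make the Q-table procedure above concrete, here is a minimal Python sketch of tabular Q-learning. It is not vivo's implementation: the discrete set of candidate weighting coefficients (`ACTIONS`), the learning rate, discount factor, exploration rate, the blend weight `LAMBDA`, and all function names are assumptions chosen for illustration.

```python
import random
from collections import defaultdict

# Hypothetical discrete action set: candidate ad weighting coefficients.
ACTIONS = [0.8, 1.0, 1.2, 1.5]
ALPHA = 0.1    # learning rate (assumed)
GAMMA = 0.9    # discount factor, i.e. the weight of long-term benefit (assumed)
EPSILON = 0.1  # exploration rate (assumed)
LAMBDA = 0.5   # trade-off between user experience and ad value (assumed)


def new_q_table():
    """Q-table mapping a state key to one Q-value per action, initialized to zero."""
    return defaultdict(lambda: [0.0] * len(ACTIONS))


def reward(duration_score, ad_value_score, lam=LAMBDA):
    """Blend an already-smoothed experience score with an already-smoothed ad-value score."""
    return lam * duration_score + (1.0 - lam) * ad_value_score


def choose_action(q_table, state, epsilon=EPSILON, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise pick the best-known action."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    values = q_table[state]
    return max(range(len(ACTIONS)), key=values.__getitem__)


def update(q_table, state, action, r, next_state, alpha=ALPHA, gamma=GAMMA):
    """Q-learning update: immediate reward plus the discounted best value of the next state."""
    target = r + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]
```

In this sketch the immediate reward `r` plays the role of the short-term benefit and the `gamma * max(...)` term the long-term benefit; the chosen index would be mapped back to `ACTIONS[index]` and multiplied into the ad ranking score before the queues are merged.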
In the action design, the first phase uses a numerical approach: the model generates an ad weighting coefficient, which is applied to the ad ranking score; the ads are then mixed with the content side to produce the mixed ranking. The state design includes four parts: user features, context features, content features, and advertising features. Statistical features and context features have a large impact on the Qlearning model. In vivo's information flow scenarios, Qlearning mixing has achieved good results and now covers most scenarios.

2.4 Deep position mixing

Qlearning mixing has some limitations:
To address these limitations of Qlearning, we developed deep position mixing. The mixing mechanism was upgraded from the original numerical type to a position type that directly generates positions, and the model itself was upgraded from Qlearning to deep learning. This brings three benefits:
Our overall model architecture follows the mainstream in the industry and is similar to a dual-tower DQN. The left tower mainly takes state information as input, including user attributes and behaviors; the right tower takes action information, that is, the basic information of an arrangement in the solution space. It is worth mentioning that we feed the arrangement from the previous refresh into the current model as a feature.

The new solution space model has a larger action space and a higher ceiling. However, sparse actions are hard to learn sufficiently, which easily leads to inaccurate predictions. To solve this, we added small-traffic random experiments online to increase the hit rate of sparse actions and enrich sample diversity.

Sequence features are among the most important features of the model and an important way for a reinforcement learning model to characterize state. We made some optimizations to the sequence. In the sequence attention module, to capture how well the user's historical interests match the ad to be inserted, we use a transformer to represent the user behavior sequence, and then apply attention between the ad to be inserted and the sequence to represent the degree of match. In addition, in the sequence match module, we introduce prior information to generate strong cross features that supplement the attention; for the match weight, we extract information through CTR, hit, time weight, TF-IDF, and so on.

3. App Store Mixing

3.1 Introduction to store mixing

Next, we introduce the app store mixing module. The core problem of store mixing is to mix the advertising queue and the game queue. As shown in the figure, the ranking of ads and the ranking of games are defined differently, which makes them hard to compare directly. In addition, the payback period of jointly operated games is long, and LTV is difficult to estimate accurately.
Even if everything is ranked by eCPM, the effect is hard to guarantee. Let us sort out the core challenges facing the app store:
Returning to vivo App Store mixing, the overall iteration includes four stages: fixed-position mixing, PID volume guarantee, constrained mixing, and refined traffic diversion for mixing.

3.2 PID volume guarantee

First, let us introduce the PID solution. PID originated in the field of automation. In the early stage, to meet the demands of the business side, we referred to mainstream industry solutions and built an initial mixing capability by guaranteeing the volume of ads and games. However, this solution is relatively simple, and it is hard to tie PID to a revenue target, which makes optimal revenue difficult to achieve.

3.3 Constrained mixing

There is a certain conflict between guaranteeing volume and maximizing revenue. The biggest difficulty is how to reach the optimal overall business revenue while satisfying the volume guarantee constraints. Store mixing at vivo adopts the idea of traffic splitting and fine-tuning: after PID volume guarantee, a re-ranking step is applied that jointly considers the balance between user experience, advertising revenue, and game value. To handle the conflict between re-ranking and PID volume guarantee, re-ranking only takes effect on some positions, such as the first screen, so that revenue can be explored on part of the traffic while the volume guarantee demand is still met.

At the re-ranking layer, we initially considered reusing the information flow mixing solution and applying reinforcement learning. However, there are two problems:
To fit the characteristics of the store scenario, we made some adaptations and optimizations:
This version has achieved good results in vivo store mixing and has been rolled out to full traffic.

3.4 Refined traffic diversion for mixing

Building on constrained re-ranking, we considered whether further optimization was possible.
So how can we achieve optimal revenue while guaranteeing volume? We began to experiment with refined traffic diversion for mixing: for some branches, we removed the volume guarantee restriction and relaxed the constraints, so that PID can focus on business demands such as volume guarantee while the model focuses on exploring a better solution space. In the current version, when a request comes in, the traffic diversion module determines whether it is high-quality traffic. For high-quality traffic, we explore revenue through the mixing model; for low-quality traffic, we use PID to guarantee volume; the final results are then merged. In this way, the re-ranking strategy can take full effect on the requests in part of the traffic, while overall volume remains within the normal range. The diversion methods we have tried so far include diversion by commercial value, game preference, ad slot, and experience mechanism.

We have also iterated on the re-ranking model. The current re-ranking layer and numerical model have some problems:
To solve these problems:
In experimental traffic, this model brings more significant gains than the original model; it is also unaffected by upstream scoring and is more stable.

4. Future Outlook

The outlook for the future includes four aspects:
In exploring heterogeneous mixing, vivo has encountered many challenges and has also achieved certain gains. Interested readers are welcome to leave a comment for discussion.