Technical Practice of Heterogeneous Mixed Ranking at vivo Internet

The mixing layer integrates the results of multiple heterogeneous queues, such as advertising, games, and organic traffic. It must find the best solution under many upstream, downstream, and business constraints, which makes it relatively complex and hard to control. This article introduces the vivo advertising strategy team's explorations and reflections on mixing for information flow and the app store, from both the business and the model perspectives.

1. Background

First, what is mixing? As shown in the figure, mixing (also called mixed ranking) means reasonably combining heterogeneous content from different queues, on the premise that user experience is protected, so as to achieve optimal revenue and better serve advertisers and users.

The core challenges of mixing are:

Items in different queues have different modeling objectives, so they are hard to compare directly. For example, some queues are modeled on CTR while others are modeled on eCPM.
Candidate queues are often subject to a large number of product rules, such as spacing constraints, volume guarantees, and first-position constraints.
Because the candidate queues are produced by the upstream teams' fine-ranking algorithms, business restrictions forbid changing the order within a queue during mixing, so the mixing must be order-preserving.
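
The order-preserving requirement and the spacing rules above can be sketched as a constrained merge of two queues. This is a minimal illustration, not vivo's implementation: the function name `order_preserving_merge`, the `min_gap` and `first_ad_pos` parameters, and the assumption that both queues already carry scores on a comparable scale are all hypothetical.

```python
from typing import List, Tuple

# (item id, score). Assumption: scores were already mapped to a common scale.
Item = Tuple[str, float]


def order_preserving_merge(
    content: List[Item],
    ads: List[Item],
    min_gap: int = 2,
    first_ad_pos: int = 1,
) -> List[str]:
    """Greedy merge that never reorders items inside either queue.

    min_gap: minimum number of content items between two ads (spacing rule).
    first_ad_pos: earliest 0-based position an ad may occupy (first-position rule).
    """
    merged: List[str] = []
    ci = ai = 0
    since_last_ad = min_gap  # lets the first ad appear once first_ad_pos is reached
    while ci < len(content) or ai < len(ads):
        content_left = ci < len(content)
        ad_allowed = (
            ai < len(ads)
            and len(merged) >= first_ad_pos
            and since_last_ad >= min_gap
        )
        # Only the heads of the two queues are compared, so order is preserved.
        take_ad = ad_allowed and (not content_left or ads[ai][1] > content[ci][1])
        if take_ad:
            merged.append(ads[ai][0])
            ai += 1
            since_last_ad = 0
        elif content_left:
            merged.append(content[ci][0])
            ci += 1
            since_last_ad += 1
        else:
            break  # only ads remain and the rules forbid placing them
    return merged
```

Because only the queue heads are ever compared, the relative order inside each queue is untouched; what the mixing layer decides is where the two queues interleave.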

This article focuses on mixing practice in vivo's information flow and app store scenarios.

vivo's information flow scenarios include the browser, i-video, and the minus-one screen. They are characterized by a large number of scenarios, deep scrolling, diverse ad formats, and strong demand for personalization. The store scenario, by contrast, is a vertical scenario overall.

It involves balancing ads, games, and organic content, and requires a comprehensively optimal solution under strict requirements such as maintaining content quality and user experience. We introduce the characteristics of these two scenarios one by one below.

2. Information Flow Mixing Practice

2.1 Introduction to information flow mixing

Let us begin by introducing the practice of mixing in information flow scenarios.

For information flow scenarios, as shown in the figure below, the main problem on the mixing side is how to mix the content queue with the ad queue, that is, how to insert ads at appropriate positions while balancing user experience and advertiser interests.

For traditional information flow media, early mixing was mostly based on fixed-position templates: operators manually decided where ads were inserted relative to content, which is simple and direct.

But it also brings three obvious problems:

From the user's perspective, ads appear with equal probability in preferred scenarios and non-preferred scenarios, which damages the user experience.
From the business perspective, traffic is not delivered accurately, business service efficiency is low, and advertiser experience is poor.
On the platform side, resource mismatch leads to waste of platform resources.

2.2 Industry Solution Research

Next, we will introduce several common solutions in the industry.

Take the solution of a workplace social platform as an example. Its optimization goal is to maximize revenue on the premise that the user experience value stays above a certain threshold. For each ad to be inserted, user experience is converted into monetary terms and weighted together with commercial value to give an overall value.

If the ad's overall value is greater than the user experience value, the ad is delivered; otherwise the product content is delivered. Constraints such as spacing are also applied at delivery time, as shown in the figure on the right.
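
One way to read the rule above is as a per-slot comparison between a monetized ad value and a monetized content value. The sketch below follows that reading only. `alpha` (the money value of one unit of engagement), the engagement inputs, and the `min_gap` spacing rule are hypothetical names, not details of the platform's actual system.

```python
def choose_slot(
    ad_revenue: float,
    ad_engagement: float,
    content_engagement: float,
    alpha: float,
    slots_since_last_ad: int,
    min_gap: int,
) -> str:
    """Return "ad" or "content" for a single slot.

    alpha converts user-experience units into money (an assumption of this sketch).
    """
    # Overall ad value: commercial value plus monetized user experience.
    ad_value = ad_revenue + alpha * ad_engagement
    # Value of showing the organic item instead.
    content_value = alpha * content_engagement
    # The spacing constraint is checked before the value comparison.
    if slots_since_last_ad >= min_gap and ad_value > content_value:
        return "ad"
    return "content"
```

Each slot is decided on its own here, which is exactly the limitation noted next: the rule sees one item at a time and ignores interactions between items.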

This method is simple and direct, and many teams have achieved good results with similar solutions. However, it only considers the value of a single item; it does not consider the mutual influence between items and gives no consideration to long-term benefits.

Next is a solution from a short-video platform, which uses reinforcement learning for mixing. It abstracts information flow mixing as a sequence insertion problem: inserting different ads into different slots corresponds to different actions, and reinforcement learning selects among them. The reward design combines advertising value (such as revenue) and user experience value (such as scrolling down and leaving), and the two are balanced by tuning hyperparameters.

However, this solution is highly dependent on engineering and the paper mainly uses offline testing, lacking online analysis. In addition, the model only considers single ad insertion, not multiple ads.

In vivo's information flow scenarios, mixing has gone through three stages: fixed-position mixing, Q-learning mixing, and deep solution-space mixing.

The overall idea is to accumulate samples and quickly explore gains with a simple reinforcement learning solution in the Q-learning stage, and then upgrade to a deep learning solution.

2.3 Q-learning mixing

The above is the basic process of reinforcement learning. The biggest feature of reinforcement learning is learning through interaction. The agent continuously learns knowledge and adapts to the environment based on the rewards or penalties it receives during interaction with the environment. State, reward, and action are the three most critical elements in reinforcement learning, which will be discussed in detail later.

What are the benefits of the Q-learning mixing mechanism for vivo's information flow? First, it takes whole-page revenue and long-term revenue into account, which fits scenarios where users refresh the feed many times. Second, the Q-learning model can be iterated in small, fast steps, accumulating samples quickly while verifying the effect quickly.

In the current overall system architecture, the mixing system sits after the ADX. After receiving the content queue and the ad queue, the Q-learning model outputs a weighting coefficient that adjusts the weight of the ads, and a fused queue is generated after business strategies are applied on top. User behavior also triggers updates of the Q-learning model.

The operating principle of the Q-learning model is shown in the figure. The Q-table is initialized first; then an action is selected, and the Q-table is updated according to the reward that action obtains. The loss function considers both short-term and long-term benefits.
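
The initialize, select, update loop described above corresponds to textbook tabular Q-learning. The sketch below is a generic version under stated assumptions: the class name `QTable`, the epsilon-greedy selection, the learning rate `alpha`, and the use of candidate ad weighting coefficients as actions are illustrative choices, not details confirmed by the source.

```python
import random
from collections import defaultdict


class QTable:
    """Minimal tabular Q-learning; states and actions must be hashable."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)      # e.g. candidate ad weighting coefficients
        self.alpha = alpha                # learning rate
        self.gamma = gamma                # discount on long-term value
        self.epsilon = epsilon            # exploration probability
        self.q = defaultdict(float)       # (state, action) -> value, starts at 0
        self.rng = random.Random(seed)

    def select(self, state):
        # Explore with probability epsilon, otherwise take the best known action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Short-term part: reward. Long-term part: discounted best next value.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

The update target `reward + gamma * best_next` is where short-term and long-term benefits meet: `gamma` controls how much value from later refreshes flows back into the current decision.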

In vivo's practice, the reward design jointly considers user experience indicators such as duration together with advertising value, smooths the two, and balances them through hyperparameters. For action design, the first phase uses a numerical approach: the model generates an ad weighting coefficient that is applied to the ad ranking score, and the weighted ads are then ranked together with the content side to achieve mixing.
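
As a rough illustration of the reward and action design just described, the sketch below smooths a duration signal and an ad-value signal and blends them with one hyperparameter, then applies a weighting coefficient to ad scores. The `log1p` smoothing, the blend weight `lam`, and both function names are assumptions; the source does not specify the smoothing function or the exact form of the blend.

```python
import math
from typing import List


def reward(duration_sec: float, ad_value: float, lam: float = 0.5) -> float:
    """Blend a smoothed user-experience signal with a smoothed ad-value signal.

    log1p damps heavy tails in both signals (an assumed choice of smoothing);
    lam is the hyperparameter that trades user experience against ad value.
    """
    return lam * math.log1p(duration_sec) + (1.0 - lam) * math.log1p(ad_value)


def apply_ad_weight(ad_scores: List[float], weight: float) -> List[float]:
    """Numerical action: scale every ad ranking score by the chosen coefficient."""
    return [score * weight for score in ad_scores]
```

A coefficient above 1 pushes ads earlier in the fused queue and a coefficient below 1 pushes them later, so a small set of candidate coefficients is enough to form a discrete action space.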

The state design includes four parts: user features, context features, content features, and ad features. Among them, statistical features and context features have a large impact on the Q-learning model.

In vivo's information flow scenarios, Q-learning mixing has achieved good results and now covers most scenarios.

2.4 Deep position-based mixing

Q-learning mixing has some limitations:

The Q-table has a simple structure and a small information capacity.
The Q-learning model can use only a limited set of features, and fine-grained features such as behavior sequences are hard to model.
Q-learning mixing currently depends on upstream scoring, so fluctuations in upstream scores cause fluctuations in the results.

To address these problems, we developed deep position-based mixing. The mixing mechanism was upgraded from the original numerical approach to a position-based one that directly generates positions, and the model itself was upgraded from Q-learning to deep learning.

This brings three benefits:

Decoupling from upstream greatly improves mixing stability.
A deep network can hold a large amount of information.
It can take item interactions between pages into account.

Our overall model architecture follows the mainstream approach in the industry and is similar to a dual-tower DQN. The left tower mainly takes state information, including user attributes and behaviors, while the right tower takes action information, that is, the basic information of a solution-space arrangement.
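
A dual-tower value model of this kind can be sketched in a few lines. Everything concrete below is an assumption made for illustration: the single ReLU layer per tower, the dot product that joins the towers, the random untrained weights, and the encoding of an arrangement as a 0/1 vector marking ad slots. The source only states that the architecture is similar to a dual-tower DQN.

```python
import random
from typing import List, Sequence


def dense_relu(x: Sequence[float], w: List[List[float]], b: List[float]) -> List[float]:
    """One fully connected layer followed by ReLU; w has shape (out, in)."""
    return [
        max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + bi)
        for row, bi in zip(w, b)
    ]


def init_layer(n_out: int, n_in: int, rng: random.Random):
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class TwoTowerQ:
    """State tower and action tower joined by a dot product (untrained sketch)."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.ws, self.bs = init_layer(hidden, state_dim, rng)   # left tower: state
        self.wa, self.ba = init_layer(hidden, action_dim, rng)  # right tower: action

    def q_value(self, state: Sequence[float], action: Sequence[float]) -> float:
        s = dense_relu(state, self.ws, self.bs)
        a = dense_relu(action, self.wa, self.ba)
        return sum(si * ai for si, ai in zip(s, a))

    def best_action(self, state: Sequence[float], candidates: List[List[int]]) -> List[int]:
        # Score every candidate arrangement and keep the one with the highest Q.
        return max(candidates, key=lambda act: self.q_value(state, act))
```

For example, with four slots the candidates `[0, 1, 0, 0]` and `[0, 0, 0, 1]` would stand for placing one ad in the second or the fourth slot. Since the model scores whole arrangements, it no longer needs upstream ad scores to be comparable with content scores.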

It is worth mentioning that we feed the solution from the previous refresh into the current model as a feature.

The new solution-space model has a larger action space and a higher ceiling. However, sparse actions are hard to learn fully, which easily leads to inaccurate predictions. To address this, we added small-traffic random experiments online to increase how often sparse actions are hit and to enrich sample diversity.

Sequence features are among the most important features of the model and are an important way for a reinforcement learning model to characterize the state. We have made some optimizations to the sequence. In the sequence attention module, to capture how well the user's historical interests match the ad to be inserted, we use a transformer to represent the user behavior sequence, and then apply an attention operation between the ad to be inserted and the sequence to represent the degree of match. In addition, in the sequence match module, we introduce prior information to generate strong cross features that supplement the attention; for the match weights, we extract information through CTR, hit, time weight, TF-IDF, and so on.
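
The attention step between the candidate ad and the behavior sequence can be illustrated with plain scaled dot-product attention, using the ad embedding as the query. This sketch covers only that step and assumes the sequence has already been encoded (for example by a transformer); the function name and the list-based embeddings are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def target_attention(
    query: Sequence[float],
    keys: List[Sequence[float]],
    values: List[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Scaled dot-product attention with a single query.

    query: embedding of the ad to be inserted.
    keys / values: encoded user behavior sequence, one vector per behavior.
    Returns the pooled interest vector and the attention weights.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    # Numerically stable softmax over the sequence positions.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    pooled = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return pooled, weights
```

Behaviors whose embeddings align with the ad receive larger weights, so the pooled vector summarizes the part of the user's history that is relevant to this particular ad.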

3. App Store Mixing

3.1 Introduction to store mixing

Next we introduce the app store mixing module.

The core problem of store mixing is to rank the ad queue and the game queue together. As shown in the figure, the ranking scores of ads and games are defined differently, which makes them hard to compare directly. In addition, jointly operated games have a long payback period, and LTV is hard to estimate accurately. Even if everything is sorted by eCPM, the effect is hard to guarantee.

Let’s sort out the core challenges facing app store mixing:

Many business parties are involved, and the overall optimum must be reached while satisfying the requirements of user experience, ads, and games.
Store mixing often comes with demands such as volume guarantees, but a volume guarantee is not tied to overall revenue. Pursuing overall revenue inevitably changes the delivered volumes and creates a conflict. How can we reach the overall optimum while keeping the guarantees?
Unlike information flow, the store is a high-cost consumption scenario with sparse user behavior. Many users download only once over a long period.
Game LTV estimation is a hard problem across the industry. How can the mixing side tolerate a certain margin of error in game LTV?

Back to vivo app store mixing: the overall iteration includes four stages, namely fixed-position mixing, PID volume guarantee, constrained mixing, and refined traffic splitting for mixing.

3.2 PID volume guarantee

First, the PID solution. PID (proportional-integral-derivative) control originated in the field of automation. In the early stage, to meet the demands of the business side, we referred to mainstream industry solutions and built an initial mixing capability by guaranteeing delivery volumes for ads and games. However, the solution is relatively simple, and PID is hard to tie to a revenue target, which makes optimal revenue hard to achieve.
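
For readers unfamiliar with PID, the sketch below shows a textbook discrete PID controller and one hypothetical way to use it for a volume guarantee: nudging a boost coefficient so that the observed ad share tracks a target share. The gains, the share metric, and the `update_boost` helper are assumptions, not vivo's configuration.

```python
class PID:
    """Textbook discrete PID controller with a unit time step."""

    def __init__(self, kp: float, ki: float, kd: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, target: float, observed: float) -> float:
        error = target - observed
        self.integral += error                 # accumulated past error
        derivative = error - self.prev_error   # change since the last step
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def update_boost(boost: float, pid: PID, target_share: float, observed_share: float) -> float:
    """Raise the boost when delivery is below target, lower it when above."""
    return max(0.0, boost + pid.step(target_share, observed_share))
```

The controller only sees the gap between target and observed volume, which illustrates the limitation noted above: nothing in the loop refers to revenue.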

3.3 Constrained Mixing

There is a certain conflict between guaranteeing volume and maximizing revenue. The biggest difficulty is how to achieve the best overall business revenue while meeting the volume-guarantee constraints.

vivo store mixing adopts the idea of traffic splitting and fine-tuning: a re-ranking step runs after the PID volume guarantee and jointly weighs user experience, ad revenue, and game value. To resolve the conflict between re-ranking and the PID volume guarantee, re-ranking takes effect only at some positions, so that revenue can be explored on part of the traffic, such as the first screen, while the volume-guarantee demand is still met.

At the re-ranking layer, we initially considered reusing the information flow mixing solution and applying reinforcement learning. However, there are two problems:

The re-ranking takes effect only on the first screen, so the state transitions of conventional reinforcement learning are missing.
Compared with the information flow scenario, the store scenario involves more business parties. How to consider the trade-offs among user experience, advertising revenue, and game value is a more complicated issue.

To fit the characteristics of the store scenario, we made some adaptations and optimizations:

First, the loss. Unlike traditional reinforcement learning, store behaviors are sparse and the re-ranking takes effect only on the first screen, so state transitions are missing. We set gamma to 0, which makes the overall setup similar to supervised learning and improves system stability.
In reward design, we jointly consider multiple factors, such as whole-page game revenue, ad revenue, and user experience, to achieve the best overall benefit.
In action design, the first phase still used the numerical approach.
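
To see why setting gamma to 0 makes training resemble supervised learning, consider the standard one-step Q-learning target. The squared-error form of the loss is an assumption here, since the source does not give the exact loss.

```latex
% Standard one-step Q-learning target and squared TD loss
y = r + \gamma \max_{a'} Q(s', a'), \qquad
\mathcal{L} = \bigl( y - Q(s, a) \bigr)^2

% With \gamma = 0 the bootstrapped term vanishes
y = r, \qquad
\mathcal{L} = \bigl( r - Q(s, a) \bigr)^2
```

With the bootstrapped term gone, the target no longer depends on the model's own estimate of the next state, so the model simply regresses the immediate reward for each state and action pair.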

This version has achieved good results in vivo store mixing and has been fully rolled out.

3.4 Refined traffic splitting for mixing

Building on constrained re-ranking, we considered whether further optimization was possible.

First, the candidate set for re-ranking is generated by the PID volume guarantee, so it is not globally optimal.
Second, when the candidate set is all ads or all games, the current re-ranking has no room to act (such cases account for more than half of those seen online).

So how can we achieve optimal returns while ensuring quantity?

We began to try refined traffic splitting for mixing: for some branches we removed the volume-guarantee restriction and relaxed the constraints, so that PID can focus on meeting business demands such as volume guarantees while the model focuses on exploring a better solution space.

In the current version, when a request arrives, the traffic-splitting module decides whether it is high-quality traffic. High-quality traffic goes to the mixing model to explore revenue, low-quality traffic goes to PID to guarantee volume, and the final results are merged. In this way, the re-ranking strategy can act on the whole request for part of the traffic, while overall volume delivery stays within the normal range.
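
The routing step can be sketched as a simple dispatcher. The `quality_score` field, the threshold, and the two ranker callables are hypothetical placeholders; the source does not say how traffic quality is scored.

```python
from typing import Callable, Dict, List, Tuple

Request = Dict[str, object]
Ranker = Callable[[Request], List[str]]


def route(request: Request, quality_threshold: float = 0.5) -> str:
    """Pick a branch: "model" for high-quality traffic, "pid" otherwise."""
    score = float(request["quality_score"])  # hypothetical field set upstream
    return "model" if score >= quality_threshold else "pid"


def serve(
    request: Request,
    model_ranker: Ranker,
    pid_ranker: Ranker,
    quality_threshold: float = 0.5,
) -> Tuple[str, List[str]]:
    """Run the whole request through exactly one branch and return its result."""
    branch = route(request, quality_threshold)
    ranker = model_ranker if branch == "model" else pid_ranker
    return branch, ranker(request)
```

Because each request is handled end to end by one branch, the model branch is free of the volume constraint, and the share of traffic sent to PID determines how much room is left to meet the guarantees.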

The splitting methods we have tried so far include splitting by commercial value, by game preference, by ad slot, and by experience mechanism.

We have also iterated on the re-ranking model. The current re-ranking layer with its numerical model has some problems:

Numerical mixing depends on upstream scoring, so changes in upstream bias affect the accuracy of the mixing model.
The influence of listwise factors such as context information and position information is not considered.

To address these problems:

We use a generative model instead of a numerical model to generate the mixed result directly, decoupling it from upstream.
Drawing on the idea of context-dnn, we adopt a context-aware approach that incorporates contextual influence into both the generation method and the label design.

On experimental traffic, this model shows clearer gains than the original model, and it is more stable because it is not affected by upstream scoring.

4. Future Prospects

The outlook for the future includes four aspects:

Model optimization: optimize mixing in depth, make the modeling more fine-grained and more personalized, and incorporate more real-time feedback signals to improve model performance.
Cross-scenario linkage: try cross-scenario linked mixing to achieve the best exchange ratio and the optimum across all scenarios.
Unified paradigm: establish a unified mixing paradigm of sequence generation and sequence evaluation across scenarios.
On-device mixing: try on-device mixing to capture user interests in a more timely way and improve user experience.

In exploring heterogeneous mixing, vivo has encountered many challenges and has also achieved certain gains.

Interested readers are welcome to leave a message for discussion.
