Design and implementation of the 2022 Spring Festival Douyin video red envelope system

What we did

Business Background

During the Spring Festival event, Douyin combines videos and Spring Festival red envelopes, and users can send blessings to fans and friends by shooting videos and sending red envelopes.

Business gameplay

The entire activity is divided into two modes: B2C and C2C. The following is a brief introduction to the processes of these two modes.

B2C Red Packet

In the B2C red envelope gameplay, users first join the Spring Festival Red Envelope Rain event in Douyin or Douyin Lite, where they have a certain probability of receiving a red envelope subsidy. After receiving a subsidy, users can jump straight to the camera page, or open the camera page later when they shoot a video. After shooting a video, the camera page shows a red envelope widget that lists the subsidies already issued to the user. The user selects a subsidy and taps Next to publish the video, which sends the video red envelope.

Figure 1: Spring Festival Red Envelope Rain event. Figure 2: Red envelope subsidy. Figure 3: Red envelope widget. Figure 4: B2C red envelope sending tab.

C2C Red Packet

In the C2C red envelope gameplay, the user shoots a video, taps the widget, fills in the amount and number of red envelopes, selects who may receive them, and taps Send Red Envelope to bring up the payment checkout. After completing the payment, the user taps Next to publish the video, which sends the C2C red envelope.

Figure 1: C2C red envelope sending tab. Figure 2: Payment interface. Figure 3: Widget display after red envelope payment.

Red Envelope Collection

The collection process is the same for B2C and C2C red envelopes. When a user comes across a video with a video red envelope on Douyin, a button for receiving the red envelope appears below the video. Tapping it pops up the red envelope cover, and tapping the cover opens the red envelope. After the red envelope is received successfully, a result pop-up shows the amount received and links to the red envelope details page, where the user can see how much other users received from the same red envelope.

Figure 1: Red envelope video. Figure 2: Red envelope cover. Figure 3: Red envelope collection result. Figure 4: Red envelope collection details.

Some of the problems we encountered

Design of universal red envelope system

As mentioned above, this Spring Festival event needs to support both B2C and C2C red envelopes. The two types share some behavior and differ in many ways. Both consist of two operations, issuing and receiving. They differ in how they are funded and paid out: B2C red envelopes are sent using subsidies, while C2C red envelopes require the user to pay; money received from a B2C red envelope must be withdrawn by the user, while money received from a C2C red envelope goes directly into the user's balance. A universal red envelope system is therefore needed to support multiple red envelope types.

In addition, for the red envelope system itself, in addition to sending and receiving red envelopes, it also involves the query of some red envelope information and the advancement of various state machines. How to divide these functional modules is also a point that needs to be considered.

Processing of the issuance of large traffic subsidies

As mentioned earlier, the B2C gameplay starts by distributing subsidies. During the Spring Festival, a large number of users take part in each red envelope rain. If all of this traffic is written straight to the database, it consumes a large amount of database resources, which are very scarce during the Spring Festival. Reducing this resource consumption is another issue that needs to be considered.

Selection of red envelope collection plan

In the red envelope business, receiving is a high-frequency operation. When designing the receiving method, the business scenario needs to consider whether a red envelope will be received by multiple users at the same time. Multiple users receiving the same red envelope at the same time may cause hot account problems and become a bottleneck for system performance. There are also multiple solutions to solve the hot account problem. We need to select the appropriate solution based on the business scenario characteristics of video red envelopes.

Stability Disaster Recovery

This Spring Festival event includes two business processes, B2C and C2C, in which each business flow link relies on many downstream services and basic services. In such a large-scale event, if a black swan event occurs, how to quickly stop the loss and reduce the overall impact on the system is an issue that must be considered.

Fund security guarantee

During the Spring Festival, B2C issues a large number of red envelope subsidies. If subsidies are over-issued, or if a write-off bug lets one subsidy be written off multiple times, the result is a large capital loss. C2C also involves the inflow and outflow of users' own funds. If users find that they have less money than expected after receiving a red envelope, this can also lead to many customer complaints and capital losses. Adequate preparation for fund security is therefore required.

Stress testing of the red envelope system

In traditional stress testing, we usually load a single high-traffic interface to find the system's bottleneck. In the red envelope system, however, sending, receiving, and querying all happen at the same time, and the interfaces depend on each other. A red envelope must be sent first, then several people claim it, and only after a claim can the claim details be viewed. With single-interface stress testing, mocking the data is very difficult to begin with, and the test data for payment has to be generated specially because it involves real-name accounts. It is also hard to find the real bottleneck by testing one interface at a time. How to run full-link stress tests that reveal the system's true bottleneck is therefore another problem we need to solve.

How we do it

Design of universal red envelope system

For the red envelope system, the core operations are sending red envelopes, receiving them, and refunding unreceived ones. We also need to query red envelope information and receipt information, and we need to maintain the status of each of the three core operations. In our business scenario there is also subsidy issuance, which is unique to B2C, so subsidy status must be maintained as well.

After the preliminary introduction to the red envelope system above, we can see several functional modules of red envelopes, including issuance, collection, refund, subsidy issuance, and various information inquiries, as well as state machine maintenance. After sorting out the functions of red envelopes, we began to divide the modules of red envelopes.

Division Principles

  • Functional cohesion, each system only handles one task, which facilitates subsequent system development and iteration, as well as troubleshooting
  • The API gateway layer only performs simple proxy processing
  • Asynchronous task decomposition
  • Separate read and write, split red envelope core operations and red envelope queries into two services

Divide modules

Red Packet Gateway Service

  • HTTP API gateway. Externally it connects to the clients and H5 pages; internally it wraps the RPC interfaces of the various systems and provides rate limiting, permission control, and degradation.

Red Envelope Core Services

  • Carries the core functions of red envelopes: issuing, receiving, and refunding red envelopes and issuing red envelope subsidies. It also maintains the red envelope state machine and advances red envelope status.

Red Envelope Query Service

  • Mainly carries the red envelope query function, including red envelope details, red envelope sending status, red envelope receiving status, red envelope receiving details, and red envelope subsidy information

Red Packet Asynchronous Service

  • Mainly carries out red envelope asynchronous tasks to ensure the flow of the state machine, including red envelope transfer, red envelope refund, and red envelope subsidy status advancement

Red Envelope Basic Services

  • Carries the shared code used by the red envelope services, such as DB, Redis, and TCC operations, public constants, and utility classes. It is effectively the basic toolkit of the red envelope system.

Red Envelope Reconciliation Service

  • Mainly carries the reconciliation logic of red envelopes and finance, and reconciles with finance on a daily basis

Overall architecture

Finally, the system architecture of the entire video red envelope is shown in the figure

Processing of the issuance of large traffic subsidies

Synchronous reward distribution

In the red envelope subsidy distribution process, in order to cope with the large traffic during the Spring Festival, the entire process has undergone several iterations of solutions.

In the initial design, subsidies were issued synchronously. The upstream link called the red envelope system's interface to issue a subsidy. Once the call succeeded, the user saw that the subsidy had been issued and could use it to send a red envelope. The overall process of the initial solution is as follows:

One problem with the above solution is that during the Spring Festival event, the entire link needs to be able to withstand the total traffic during the event, and ultimately the traffic will hit the database, and database resources are relatively scarce during the Spring Festival.

Asynchronous reward distribution

To solve the problem with synchronous issuance, the process was changed to shave peaks through MQ, which reduces downstream traffic pressure and effectively turns the synchronous flow into an asynchronous one. After the user takes part in the activity, an encrypted token is issued to the client for display and for later interaction with the server. The asynchronous subsidy issuance scheme is shown in the figure below.

This solves the traffic problem but introduces another one. In the initial solution, a user's subsidy is stored in the red envelope system first, so the record can be found in the red envelope database when the user later queries or uses the subsidy. In asynchronous mode, it is estimated to take up to 10 minutes for all subsidies to be written to the database. Once the app shows that a subsidy has been issued, the user may immediately use it to send a video red envelope, or open the red envelope widget to check the subsidies received. At that moment the subsidy may not yet exist in the red envelope system.

Final Solution

To solve these problems, we changed the logic of both sending a video red envelope with a subsidy and querying subsidies. When a user sends a video red envelope with a subsidy, we first write that subsidy to the database, and only after the write succeeds can the subsidy be used. The query interface cannot tell whether all subsidies have been written yet, so on every query we fetch the full token list from the reward issuing side, fetch the user's subsidies from our database, and merge the two to get the full subsidy list.

In this process, we work around the MQ delay by writing the record ourselves when the user takes an action. The user's actions are sending a red envelope with a subsidy and querying subsidies. Why do we write the record only when a subsidy is used, and not when subsidies are queried? Queries are high-frequency and work on batches. Before touching the DB we cannot tell which subsidies are already recorded, so each query would involve a batch DB operation, repeated on every query, which wastes a great deal of DB resources. Using a subsidy, by contrast, is a low-frequency operation on a single subsidy. Writing the record only when the user writes off a subsidy greatly reduces the pressure on the database and saves database resources.
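The merge-on-query and write-on-use behavior described above can be sketched in Go roughly as follows. This is a minimal illustration under stated assumptions, not the production code: the `Subsidy` fields, `MergeSubsidies`, and the in-memory `Store` are hypothetical names, and a map stands in for the red envelope database.

```go
package main

import (
	"fmt"
	"sort"
)

// Subsidy is a hypothetical record for one red envelope subsidy.
type Subsidy struct {
	ID     string // subsidy order number, assumed unique
	Amount int64  // amount in cents
	Stored bool   // true once the record exists in the red envelope DB
}

// MergeSubsidies combines the token list from the reward issuing side with
// the records already in the DB. DB records win when both contain an ID.
func MergeSubsidies(tokens, stored []Subsidy) []Subsidy {
	byID := make(map[string]Subsidy, len(tokens)+len(stored))
	for _, t := range tokens {
		t.Stored = false
		byID[t.ID] = t
	}
	for _, s := range stored {
		s.Stored = true
		byID[s.ID] = s
	}
	out := make([]Subsidy, 0, len(byID))
	for _, s := range byID {
		out = append(out, s)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

// Store is an in-memory stand-in for the subsidy table.
type Store struct{ rows map[string]Subsidy }

// UseSubsidy writes the subsidy first if MQ has not delivered it yet, and
// only then returns it for funding a red envelope.
func (s *Store) UseSubsidy(sub Subsidy) Subsidy {
	if row, ok := s.rows[sub.ID]; ok {
		return row
	}
	sub.Stored = true
	s.rows[sub.ID] = sub
	return sub
}

func main() {
	tokens := []Subsidy{{ID: "s1", Amount: 100}, {ID: "s2", Amount: 200}}
	stored := []Subsidy{{ID: "s1", Amount: 100}}
	for _, s := range MergeSubsidies(tokens, stored) {
		fmt.Printf("%s amount=%d stored=%v\n", s.ID, s.Amount, s.Stored)
	}
	st := &Store{rows: map[string]Subsidy{}}
	fmt.Println(st.UseSubsidy(tokens[1]).Stored)
}
```

Note that the query path only reads and merges; the single write happens in `UseSubsidy`, which mirrors the choice to write on use rather than on query.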

Selection of red envelope collection plan

In the technical solution for receiving video red envelopes, we also have some options and thoughts, which we would like to share with you here.

Pessimistic locking scheme

Solution 1 is the most common approach. When a user claims a red envelope, we lock the database row, deduct the amount, and release the lock, which completes the claim. The advantage is that the logic is clear and straightforward. The problem is that when many users claim the same red envelope at the same time, they contend for the database row lock and have to queue. Too many queued requests tie up database connections and hurt overall system performance. If the upstream caller gets no response for a long time, it times out, and the user side may keep retrying until the database connections are exhausted and the system crashes.

Red Envelope Pre-splitting Plan

The problem with solution 1 is that multiple users claiming the red envelopes at the same time will cause lock conflicts. To resolve lock conflicts, the locks can be split into finer granularity to increase the concurrency of a single red envelope. The specific solution is as follows:

Solution 2 changes the sending process. When a red envelope is sent, it is pre-split into multiple small red envelopes, which refines the lock granularity. Instead of competing for one lock on a single red envelope, claimers are spread across the locks of many small red envelopes. The problem then becomes how to assign a small red envelope to each user. A common approach is to generate a serial number with Redis auto-increment when the user requests a claim, where the serial number identifies the small red envelope to hand out. This approach depends heavily on Redis, though. When the Redis network jitters or the Redis service is abnormal, we have to fall back to querying the DB for unclaimed red envelopes to obtain a serial number, so the overall implementation is relatively complicated.
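To make the pre-splitting idea concrete, here is a small Go sketch. The splitting rule (each share is at least 1 cent and at most twice the running average) is an illustrative assumption, since the article does not say how amounts are split. The `Sequence` type is a hypothetical in-process stand-in for a Redis auto-increment counter.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync/atomic"
)

// PreSplit divides total (in cents) into n shares of at least 1 cent each.
// The random rule used here is an assumption made for illustration.
func PreSplit(total int64, n int, seed int64) []int64 {
	if n <= 0 || total < int64(n) {
		return nil
	}
	rng := rand.New(rand.NewSource(seed))
	shares := make([]int64, n)
	remaining := total
	for i := 0; i < n-1; i++ {
		left := int64(n - i) // shares not yet assigned, including this one
		hi := 2 * remaining / left
		if limit := remaining - (left - 1); hi > limit {
			hi = limit // leave at least 1 cent for every later share
		}
		shares[i] = 1 + rng.Int63n(hi) // uniform in [1, hi]
		remaining -= shares[i]
	}
	shares[n-1] = remaining
	return shares
}

// Sequence stands in for a per-red-envelope Redis counter.
type Sequence struct{ next int64 }

// Next returns the index of the share assigned to the caller, or -1 once
// all n shares have been handed out.
func (s *Sequence) Next(n int) int {
	i := atomic.AddInt64(&s.next, 1) - 1
	if i >= int64(n) {
		return -1
	}
	return int(i)
}

func main() {
	shares := PreSplit(1000, 5, 42)
	seq := &Sequence{}
	for {
		i := seq.Next(len(shares))
		if i < 0 {
			break
		}
		fmt.Printf("share %d: %d cents\n", i, shares[i])
	}
}
```

Each claimer only touches the row for its own share, which is what removes the single-row lock contention. The cost is the dependency on the counter, as noted above.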

Final Solution

In the video red envelope scenario, users shoot a video and attach a red envelope, and other users then claim the red envelope when they swipe to the video in the recommendation feed. Compared with group chat scenarios such as WeChat and Feishu, concurrency on a single video red envelope is not very high, because swiping behavior and the feed itself already disperse the traffic. From a business perspective, after a user claims a red envelope we need to return the number of unclaimed red envelopes for display. Solution 1 makes this inventory easy to obtain, while Solution 2 makes it more troublesome. Solution 1 is also the better fit in terms of development complexity and disaster recovery. However, we still need to handle its risks: we need other ways to protect DB resources and minimize lock conflicts. The specific measures are as follows:

Red Packet Redis Rate Limiting

  • To minimize DB lock conflicts, we first rate-limit by red envelope order number, letting through at most 1.5 times the number of remaining red envelopes each time. A rate-limited request gets a special error code, and the front end polls again, up to 10 times. When there are too many requests, this lets them be processed gradually.
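The admission rule above could look like the following Go sketch. Several details are assumptions made for illustration: an in-memory map stands in for the Redis counter, the limit is rounded up, and "each time" is modeled as a window that the caller resets explicitly. The names `ClaimLimiter`, `Allow`, and `ResetWindow` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// ClaimLimiter is an in-memory stand-in for a Redis counter keyed by the
// red envelope order number.
type ClaimLimiter struct {
	mu     sync.Mutex
	passed map[string]int // requests admitted in the current window
}

func NewClaimLimiter() *ClaimLimiter {
	return &ClaimLimiter{passed: make(map[string]int)}
}

// Allow admits a claim request while fewer than ceil(remaining * 1.5)
// requests have passed in the current window.
func (l *ClaimLimiter) Allow(orderNo string, remaining int) bool {
	limit := (remaining*3 + 1) / 2
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.passed[orderNo] >= limit {
		return false // caller returns the special "retry later" error code
	}
	l.passed[orderNo]++
	return true
}

// ResetWindow starts a new window for one red envelope.
func (l *ClaimLimiter) ResetWindow(orderNo string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.passed, orderNo)
}

func main() {
	lim := NewClaimLimiter()
	admitted := 0
	for i := 0; i < 8; i++ {
		if lim.Allow("rp-1", 4) {
			admitted++
		}
	}
	fmt.Println("admitted:", admitted)
}
```

Admitting slightly more requests than there are red envelopes leaves headroom for requests that fail downstream, while the DB still sees a bounded number of contenders per red envelope.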

Memory Queuing

  • In addition to Redis rate limiting, we add an in-memory lock per red envelope to the claim flow to reduce DB locking. For a given red envelope, only the request holding the in-memory lock may go on to the DB, so DB lock conflicts are moved into memory and resolved earlier. Memory is very cheap compared with DB resources, and when request volume is too high we can scale out horizontally.
  • Implementing the in-memory lock required several changes. First, requests for the same red envelope must reach the same TCE instance. We adjusted gateway routing so that, when the gateway calls the downstream service, it routes by red envelope number and requests with the same number go to the same instance. Second, we implemented a set of channel-based in-memory locks in the core service, and the lock for a red envelope is released once the claim completes. Finally, to keep locks from occupying too much memory or not being released in time, a timer task cleans them up periodically.
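A channel-based per-key lock along these lines might look like the sketch below. It is only an illustration under assumptions: `KeyedLock`, `Acquire`, and `Release` are hypothetical names, routing of one red envelope's requests to one instance is assumed to have happened already, and the periodic cleanup task is left out.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// shortWait is an arbitrary wait budget used by the demo below.
const shortWait = 20 * time.Millisecond

// KeyedLock holds one buffered channel of capacity 1 per red envelope.
// A value in the channel means the lock is held.
type KeyedLock struct {
	mu    sync.Mutex
	slots map[string]chan struct{}
}

func NewKeyedLock() *KeyedLock {
	return &KeyedLock{slots: make(map[string]chan struct{})}
}

// slot returns the channel for key, creating it on first use.
func (k *KeyedLock) slot(key string) chan struct{} {
	k.mu.Lock()
	defer k.mu.Unlock()
	ch, ok := k.slots[key]
	if !ok {
		ch = make(chan struct{}, 1)
		k.slots[key] = ch
	}
	return ch
}

// Acquire waits up to wait for the lock on key and reports success.
func (k *KeyedLock) Acquire(key string, wait time.Duration) bool {
	timer := time.NewTimer(wait)
	defer timer.Stop()
	select {
	case k.slot(key) <- struct{}{}:
		return true
	case <-timer.C:
		return false
	}
}

// Release frees the lock on key. Releasing an unheld lock is a no-op.
func (k *KeyedLock) Release(key string) {
	select {
	case <-k.slot(key):
	default:
	}
}

func main() {
	locks := NewKeyedLock()
	fmt.Println(locks.Acquire("rp-1", shortWait)) // first claimer gets the lock
	fmt.Println(locks.Acquire("rp-1", shortWait)) // second claimer times out
	locks.Release("rp-1")
	fmt.Println(locks.Acquire("rp-1", shortWait)) // lock is free again
}
```

Requests that fail to acquire the lock never reach the DB, so contention for one red envelope costs a goroutine and a timer instead of a database connection.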

Asynchronous transfer

  • In terms of interface latency, the transfer is a slow operation. It involves third-party payment institutions and cross-data-center requests, and its response time is long. Making the transfer asynchronous reduces the latency of the claim interface and improves service performance and user experience.
  • In addition, from the user's perspective, users are more concerned about whether the red envelope is received successfully after clicking on it. As for whether the balance is synchronized to the account, users are not so sensitive. In addition, the transfer itself has a process from transfer to successful transfer. Asynchronous transfer has basically no impact on user perception.

Stability Disaster Recovery

Disaster recovery for the red envelope system relies mainly on interface rate limiting, service degradation, and multiple mechanisms that keep the state machines advancing. These are introduced below:

Interface rate limiting

Interface rate limiting is a common disaster recovery technique. It keeps the load on the system within what it can handle and prevents excessive external requests from bringing it down. Before setting limits, we first need to agree with upstream, downstream, and product teams on estimated volumes of red envelopes issued and received, and then work out the traffic of each module along the whole link from those estimates. The following is the B2C full-link request volume we worked out at the time.

With the request volume of each module in hand, we can total the traffic for each interface, each red envelope service, and each downstream dependency. Setting rate limits then becomes much easier.

Service degradation

Core dependency downgrade

During the Spring Festival event, the red envelope link depends on many services. These downstream dependencies can be divided into core and non-core dependencies. When a core downstream service is abnormal, part of the link may become unavailable. In that case we can degrade directly at the API layer and return a friendly text prompt, then lift the degradation once the downstream service recovers. For example, in the C2C sending flow, users must complete payment before they can send a red envelope. If the payment process is abnormal, or a payment stays short of the success state for a long time, the red envelope fails to send after the user has paid. The front end also keeps polling the red envelope status, so request volume rises sharply, puts pressure on the service, and can even affect B2C sending and querying. Degrading the C2C sending interface at that point reduces service pressure and limits the impact on other business logic.

Non-core dependency downgrade

Besides core dependencies, the red envelope system has some non-core downstream dependencies. If one of these is abnormal, we can give up part of the user experience to keep the service available. For example, as described in the final solution for subsidy issuance above, users need to fetch all available subsidies before sending a B2C red envelope. We query the full token list from the reward issuing side, query our own DB, and return the merged result. If the token list interface is abnormal, we can degrade and return only the subsidy data in our own DB. Users can still send red envelopes in this case. Only the display of some subsidies is affected, not the whole sending link.
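A non-core downgrade of this kind can be sketched in Go as below. The function name `ListSubsidyIDs`, the injected `fetchTokens` callback, and the `degraded` flag are hypothetical, and string IDs stand in for full subsidy records.

```go
package main

import (
	"errors"
	"fmt"
)

// errTokenService simulates a failing token list interface in the demo.
var errTokenService = errors.New("token list service unavailable")

// ListSubsidyIDs merges the IDs stored in our own DB with the token list
// from the reward issuing side. If fetching tokens fails, it degrades to
// the DB-only list and reports degraded=true.
func ListSubsidyIDs(fetchTokens func() ([]string, error), dbIDs []string) (ids []string, degraded bool) {
	seen := make(map[string]bool)
	for _, id := range dbIDs {
		if !seen[id] {
			seen[id] = true
			ids = append(ids, id)
		}
	}
	tokens, err := fetchTokens()
	if err != nil {
		return ids, true // non-core dependency failed: serve what we have
	}
	for _, id := range tokens {
		if !seen[id] {
			seen[id] = true
			ids = append(ids, id)
		}
	}
	return ids, false
}

func main() {
	healthy := func() ([]string, error) { return []string{"s2", "s3"}, nil }
	broken := func() ([]string, error) { return nil, errTokenService }

	ids, degraded := ListSubsidyIDs(healthy, []string{"s1", "s2"})
	fmt.Println(ids, degraded)
	ids, degraded = ListSubsidyIDs(broken, []string{"s1", "s2"})
	fmt.Println(ids, degraded)
}
```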

Multiple mechanisms ensure the advancement of the state machine

In the red envelope system, an order that stays short of its final state for a long time may cause customer complaints. Examples are money that has not arrived long after the user claimed a red envelope, or a refund that has not been paid long after a C2C red envelope went unclaimed. We therefore need to make sure every order in the system is pushed to its final state promptly and accurately.

We use several mechanisms for this. The first is the callback: once the dependent system finishes processing an order, it notifies the red envelope system promptly. This is the most timely mechanism. Relying on callbacks alone is risky, however, because a callback can be lost when the dependent party is abnormal or the network jitters. So we also send an MQ message at each stage of the red envelope and consume it after a certain interval to actively query the order status from the dependent party and update our own. Finally, each state machine has a scheduled task as a backstop. If an order has still not reached its final state after the scheduled task has run several times, an alert is sent through Lark so that someone can step in promptly and find the problem.

Fund security guarantee

Transaction idempotence

In programming, idempotency means that executing a request any number of times has the same effect as executing it once. For fund security, idempotent processing keyed on order numbers prevents asset loss. In the red envelope system, the issuing, receiving, and refunding interfaces all use a unique key on the order number to guarantee idempotency. The subsidy issuance interface is idempotent as well: if the same order number is used in several subsidy requests, we must make sure only one subsidy is issued.

Idempotency can be implemented in many ways, for example with a database or with Redis. The most reliable way is a unique key conflict in the database, but this introduces an extra problem when the database is sharded. Take subsidy issuance as an example. We created the business tables sharded by uid, so the sharding key of the subsidy table is uid, although the subsidy order number is also set as a unique key. The risk is that if the upstream system calls subsidy issuance twice with the same external order number but a different uid, the two requests may land on different database instances. The unique index then fails to catch the duplicate, causing asset loss. To remove this risk, we introduced an additional database that uses the external order number of the subsidy as its sharding key.
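The sharding problem and its fix can be illustrated with the following Go sketch. Maps stand in for database tables, and the names `Issuer`, `Issue`, and `ErrUIDMismatch` are hypothetical. Rejecting a replay that carries a different uid is an assumption about the desired behavior, not something the article specifies.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrUIDMismatch is returned when an external order number is replayed
// with a different uid (assumed policy for this sketch).
var ErrUIDMismatch = errors.New("external order number already bound to another uid")

// Issuer models subsidy tables sharded by uid, plus one extra index keyed
// by external order number that acts as the global unique key.
type Issuer struct {
	mu     sync.Mutex
	owner  map[string]int64   // external order number -> uid
	shards []map[string]int64 // per-uid shard: external order number -> amount
}

func NewIssuer(shardCount int) *Issuer {
	s := &Issuer{owner: make(map[string]int64)}
	for i := 0; i < shardCount; i++ {
		s.shards = append(s.shards, make(map[string]int64))
	}
	return s
}

// Issue writes at most one subsidy per external order number. uid is
// assumed to be non-negative.
func (s *Issuer) Issue(outOrderNo string, uid, amount int64) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if prev, ok := s.owner[outOrderNo]; ok {
		if prev != uid {
			return ErrUIDMismatch
		}
		return nil // idempotent replay: nothing new is written
	}
	s.owner[outOrderNo] = uid
	s.shards[int(uid%int64(len(s.shards)))][outOrderNo] = amount
	return nil
}

// Issued counts subsidy rows across all shards.
func (s *Issuer) Issued() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	n := 0
	for _, shard := range s.shards {
		n += len(shard)
	}
	return n
}

func main() {
	iss := NewIssuer(4)
	fmt.Println(iss.Issue("out-1", 1, 500)) // first request
	fmt.Println(iss.Issue("out-1", 1, 500)) // same request replayed
	fmt.Println(iss.Issue("out-1", 2, 500)) // same order number, other uid
	fmt.Println("rows:", iss.Issued())
}
```

Without the `owner` index, the third call would hash to a different shard and pass that shard's local unique check, which is exactly the duplicate issuance described above.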

B2C Red Packet Verification

In addition to taking appropriate financial security considerations into account when designing the system during the development process, we also need to verify whether there are any financial security issues in our system through reconciliation.

In the B2C link, the flow runs mainly from subsidy distribution to red envelope collection. We run hourly Hive reconciliation on the upstream and downstream data of each step in this flow.

C2C Red Packet Verification

In the C2C link, the flow starts when the user initiates payment, continues with transfers to the users who claim the red envelope, and ends with a refund when the red envelope expires. Payment, transfer, and refund each need to be reconciled. We also need to ensure that the amount a user paid into a red envelope is greater than or equal to the amount transferred plus the amount refunded. It is "greater than or equal to" because the full cycle from successful issuance to successful refund takes more than 24 hours. In addition, there may be orders whose transfers are still in transit, resulting in multiple refund orders. If strict equality were required, there would be no controllable point in time at which to run the reconciliation.
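The amount check described above reduces to a simple inequality, sketched here in Go. The type and constant names are hypothetical, and amounts are assumed to be in cents.

```go
package main

import "fmt"

// C2CTotals holds the reconciled amounts for one C2C red envelope, in cents.
type C2CTotals struct {
	Issued      int64 // amount the sender paid
	Transferred int64 // amount already transferred to claimers
	Refunded    int64 // amount already refunded to the sender
}

// Status is the outcome of checking one red envelope.
type Status int

const (
	Balanced Status = iota // issued == transferred + refunded
	InFlight               // issued > transferred + refunded: money still moving
	Overpaid               // issued < transferred + refunded: fund loss, alert
)

// Check applies the rule issued >= transferred + refunded.
func Check(t C2CTotals) Status {
	out := t.Transferred + t.Refunded
	switch {
	case t.Issued == out:
		return Balanced
	case t.Issued > out:
		return InFlight
	default:
		return Overpaid
	}
}

func main() {
	fmt.Println(Check(C2CTotals{Issued: 1000, Transferred: 600, Refunded: 400}) == Balanced)
	fmt.Println(Check(C2CTotals{Issued: 1000, Transferred: 600}) == InFlight)
	fmt.Println(Check(C2CTotals{Issued: 1000, Transferred: 700, Refunded: 400}) == Overpaid)
}
```

Only the `Overpaid` case indicates a fund security problem. `InFlight` is expected while transfers or refunds are still being processed.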

Stress testing of the red envelope system

As mentioned earlier, the link of the red envelope system includes multiple interfaces, such as sending, receiving, and checking. It is necessary to simulate the real behavior of users to perform stress testing in order to obtain the real performance of the system. Here we use the script stress testing method of the stress testing platform to perform stress testing.

First, the whole link has to be adapted for stress testing, and we need to confirm with upstream and downstream teams whether their services can take stress-test traffic. Where they cannot, the calls have to be mocked. For storage services such as databases, Redis, and MQ, the stress-test marker must be passed through correctly, otherwise the test may affect online traffic.

After the stress testing link is transformed, you need to construct the corresponding stress testing script, which is divided into two scripts for B2C and C2C.

B2C Red Packet Link Stress Test

The above is the full B2C stress-testing link. First, subsidies are issued, then subsidies are queried, and red envelopes are sent using them. To simulate many people claiming a red envelope, we start multiple goroutines that claim it concurrently.
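The concurrent claim step of such a script can be sketched as follows. This is a simplified illustration, not the stress-testing platform's script format: `Envelope` is a hypothetical in-memory stand-in for the claim interface, and `RunClaimers` launches one goroutine per simulated user.

```go
package main

import (
	"fmt"
	"sync"
)

// Envelope is an in-memory stand-in for the system under test.
type Envelope struct {
	mu        sync.Mutex
	remaining int
	claimed   map[int64]bool
}

func NewEnvelope(count int) *Envelope {
	return &Envelope{remaining: count, claimed: make(map[int64]bool)}
}

// Claim succeeds once per user while red envelopes remain.
func (e *Envelope) Claim(uid int64) bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.remaining == 0 || e.claimed[uid] {
		return false
	}
	e.claimed[uid] = true
	e.remaining--
	return true
}

// RunClaimers starts one goroutine per user, waits for all of them, and
// returns how many claims succeeded.
func RunClaimers(e *Envelope, users int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	succeeded := 0
	for uid := 1; uid <= users; uid++ {
		wg.Add(1)
		go func(uid int64) {
			defer wg.Done()
			if e.Claim(uid) {
				mu.Lock()
				succeeded++
				mu.Unlock()
			}
		}(int64(uid))
	}
	wg.Wait()
	return succeeded
}

func main() {
	e := NewEnvelope(10)
	fmt.Println("successful claims:", RunClaimers(e, 50))
}
```

In a real script the goroutines would call the claim interface over the network, and the count of successes would be checked against the number of red envelopes sent.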

C2C Red Packet Link Stress Test

Because C2C red envelopes involve payment, the link is a separate set of processes, so C2C needs its own script. The C2C flow depends on external systems. If we waited until the whole link was ready and then stress-tested everything together, unknown problems could surface all at once. We therefore stress-test our own services first and move on to the full link only after that passes. In the figure, the blue payment-related modules have mock switches that control the results used in the stress test. When a mock switch is on, a result is constructed and returned directly. When it is off, the request goes to the finance system as usual to obtain the result.
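A mock switch of this kind might be wired as in the Go sketch below. The names `SwitchablePay`, `PayResult`, and `Query` are hypothetical, and the real finance call is injected as a function so the example stays self-contained.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// PayResult is a hypothetical payment query result.
type PayResult struct {
	OrderNo string
	Success bool
	Mocked  bool // true when the result was constructed locally
}

// SwitchablePay wraps the real payment query with a mock switch that can
// be flipped at runtime.
type SwitchablePay struct {
	mockOn int32 // 1 = mock on, 0 = mock off; accessed atomically
	real   func(orderNo string) PayResult
}

func NewSwitchablePay(real func(orderNo string) PayResult) *SwitchablePay {
	return &SwitchablePay{real: real}
}

// SetMock turns the mock switch on or off.
func (p *SwitchablePay) SetMock(on bool) {
	var v int32
	if on {
		v = 1
	}
	atomic.StoreInt32(&p.mockOn, v)
}

// Query returns a constructed success result when the switch is on, and
// calls the real dependency otherwise.
func (p *SwitchablePay) Query(orderNo string) PayResult {
	if atomic.LoadInt32(&p.mockOn) == 1 {
		return PayResult{OrderNo: orderNo, Success: true, Mocked: true}
	}
	return p.real(orderNo)
}

func main() {
	pay := NewSwitchablePay(func(orderNo string) PayResult {
		// Placeholder for the real request to the finance system.
		return PayResult{OrderNo: orderNo, Success: true}
	})
	pay.SetMock(true)
	fmt.Printf("%+v\n", pay.Query("pay-1"))
	pay.SetMock(false)
	fmt.Printf("%+v\n", pay.Query("pay-1"))
}
```

With the switch on, our own services can be stress-tested in isolation. Turning it off later exercises the full link without changing the script.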

Subsequent planning

Service Set

With the disaster recovery described above, if a change to the red envelope core service goes wrong or the main data center of the database goes down, all users are affected. The only option is to degrade, and the system as a whole cannot be switched over and restored quickly. We are considering moving the service to a Set architecture later. The service instances and their storage are grouped into separate Sets (units), and each Set handles only the traffic assigned to it. Traffic is split and faults are isolated between units, and data is backed up between Sets. When a unit becomes abnormal, its traffic can then be switched to the backup unit promptly.
