What we did

Business Background
During the Spring Festival event, Douyin combined videos with Spring Festival red envelopes, so that users could send blessings to fans and friends by shooting a video and attaching a red envelope.

Business gameplay
The activity is divided into two modes, B2C and C2C. The processes of the two modes are briefly introduced below.

B2C Red Packet
In the B2C gameplay, users first take part in the Spring Festival Red Envelope Rain event in Douyin or Douyin Lite, where they have a certain probability of receiving a red envelope subsidy. After receiving a subsidy, users can jump straight to the camera page, or enter the camera page later when shooting a video. After shooting a video, users see a red envelope widget on the camera page, and the widget shows the subsidies that have been issued to them. The user selects a subsidy and taps Next to publish the video, at which point the video red envelope is issued.

Figure 1: Spring Festival Red Envelope Rain Event
Figure 2: Red Envelope Subsidy
Figure 3: Red Envelope Widget
Figure 4: B2C Red Envelope Sending Tab Page

C2C Red Packet
In the C2C gameplay, the user shoots a video, taps the widget, fills in the amount and number of red envelopes, selects who may receive them, and taps Send Red Envelope to bring up the cashier. After completing the payment, the user taps Next to publish the video, which completes the issuance of the C2C red envelope.

Figure 1: C2C Red Envelope Sending Tab Page
Figure 2: Payment Interface
Figure 3: Widget Display After Red Envelope Payment

Red Envelope Collection
The collection process is the same for B2C and C2C. When a user comes across a video carrying a video red envelope on Douyin, a button for receiving the red envelope appears below the video. When the user taps it, the red envelope cover pops up.
The user taps the cover to open the red envelope and receive it. After the red envelope is successfully received, a result pop-up is displayed. In the result, the user can see the amount received and jump to the red envelope details page, which shows how much other users received from the same red envelope.

Figure 1: Red Envelope Video
Figure 2: Red Envelope Cover
Figure 3: Red Envelope Collection Result
Figure 4: Red Envelope Collection Details

Some of the problems we encountered

Design of universal red envelope system
As mentioned above, this Spring Festival event needed to support both B2C and C2C red envelopes. The two types have some similarities and many differences. What they share is the two operations of issuing and receiving red envelopes. They differ in several ways: B2C red envelopes are sent using subsidies, while C2C red envelopes require the user to complete a payment; and after receiving a B2C red envelope the user needs to withdraw the cash, while a C2C red envelope goes directly into the user's change balance. We therefore needed a universal red envelope system that supports multiple red envelope types. Beyond sending and receiving, the red envelope system also involves querying red envelope information and advancing various state machines. How to divide these functions into modules was another point to consider.

Processing of the issuance of large traffic subsidies
As mentioned earlier, the B2C gameplay first distributes subsidies. During the Spring Festival, a large number of users take part in each red envelope rain. If this traffic were sent directly to the database, a large amount of database resources would be required.
However, database resources are very scarce during the Spring Festival, so how to reduce this resource consumption was also an issue to consider.

Selection of red envelope collection plan
In the red envelope business, receiving is a high-frequency operation. When designing the receiving method, we need to consider whether, in our business scenario, one red envelope will be received by multiple users at the same time. Multiple users receiving the same red envelope at the same time may cause a hot account problem and become a bottleneck for system performance. There are several ways to solve the hot account problem, and we needed to select one that fits the characteristics of video red envelopes.

Stability Disaster Recovery
This Spring Festival event included two business processes, B2C and C2C, and every link in each process relies on many downstream services and basic services. In an event of this scale, if a black swan event occurs, how to stop the loss quickly and reduce the overall impact on the system is an issue that must be considered.

Fund security guarantee
During the Spring Festival, B2C issues a large number of red envelope subsidies. If subsidies are over-issued, or if a write-off bug allows one subsidy to be written off multiple times, the result is a large capital loss. C2C also involves the inflow and outflow of users' own funds. If users find they have less money after receiving a red envelope, this may also cause a large number of customer complaints and capital losses. Adequate preparations for fund security were therefore required.

Stress testing of the red envelope system
In the traditional approach, we usually stress test a single high-traffic interface to find the bottleneck of the system.
However, in the red envelope system, sending, receiving, and querying all happen at the same time, and these interfaces depend on each other. For example, a red envelope must be sent first, then multiple people are triggered to receive it, and only after it has been received can the receiving details be viewed. With the traditional single-interface approach, mocking the data is very difficult to begin with, and the stress test data for payment has to be specially generated because it involves real-name identities. In addition, stress testing a single interface rarely reveals the real bottleneck of the system. How to run a full-link stress test to find the actual bottleneck was therefore another problem we needed to solve.

How we do it

Design of universal red envelope system
The core operations of the red envelope system are sending, receiving, and refunding unreceived red envelopes. We also need to query red envelope information and receipt information. For each of the three core operations we need to maintain a status, and in our business scenario there is also the subsidy issuance unique to B2C, whose status must be maintained as well.

From this overview we can identify the functional modules of the red envelope system: issuance, collection, refund, subsidy issuance, information queries, and state machine maintenance. After sorting out these functions, we began to divide the system into modules.

Division Principles
Divide modules
Red Packet Gateway Service
Red Envelope Core Services
Red Envelope Query Service
Red Packet Asynchronous Service
Red Envelope Basic Services
Red Envelope Reconciliation Service
Overall architecture
Finally, the system architecture of the entire video red envelope is shown in the figure.

Processing of the issuance of large traffic subsidies

Synchronous reward distribution
To cope with the heavy traffic during the Spring Festival, the red envelope subsidy distribution process went through several iterations. In the initial design, subsidies were issued synchronously. The upstream link called the red envelope system interface to issue a coupon. Once the coupon was issued successfully, the user was shown that the coupon had arrived and could use it to issue a red envelope. The overall process of the initial solution is as follows:

One problem with this solution is that the entire link has to withstand the total traffic of the event, and that traffic ultimately hits the database, whose resources are relatively scarce during the Spring Festival.

Asynchronous reward distribution
To solve the problem of synchronous issuance, the process was changed to shave peaks through MQ, which reduces the traffic pressure downstream. In effect, issuance changes from synchronous to asynchronous. After the user takes part in the activity, an encrypted Token is issued to the client, which uses it for display and for interaction with the server. The asynchronous coupon issuance plan for the activity is shown in the figure below.

This solves the traffic problem but introduces others. In the initial plan, the user's red envelope subsidies were stored in the red envelope system first, so the corresponding records could be found in the red envelope database for later queries and write-offs. In the asynchronous mode, however, we estimated that it could take up to 10 minutes for all subsidies to be recorded.
After the app shows the user that a coupon has been issued, the user may immediately use the subsidy to send a video red envelope, or open the red envelope widget to check the subsidies received. At that moment the subsidy may not yet have been recorded in the red envelope system.

Final Solution
To solve these problems, we modified the logic for issuing video red envelopes with a subsidy and for querying subsidies. When a user uses a subsidy to issue a video red envelope, we first store the subsidy in the database. Only after the store succeeds can the subsidy be used to issue the red envelope. For the query interface, we cannot know whether all subsidies have been recorded. So on every query we fetch the full token list from the reward issuing side, query the user's subsidies in our database, and merge the two to obtain the full subsidy list.

In other words, to work around the MQ delay we store the record proactively when the user acts. The user's actions are using a subsidy to issue a red envelope and querying subsidies. Why do we store the record only when the subsidy is used, and not when it is queried? Because querying is a high-frequency behavior and involves batch operations. Before touching the DB we cannot know whether a subsidy has been recorded, so storing on query would mean batch processing against the DB, repeated on every query, which would waste a large amount of DB resources. Storing when the subsidy is used, by contrast, is a low-frequency operation on a single subsidy. We only need to store it when the user writes it off, which greatly reduces the pressure on the database and saves database resources.
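The query-side merge described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the `Subsidy` type, its fields, and the `mergeSubsidies` function are hypothetical names, and the sketch assumes each subsidy is identified by its external order number, with the database record taking precedence when a subsidy appears in both sources.

```go
package main

import "fmt"

// Subsidy is a hypothetical record type; the field names are illustrative only.
type Subsidy struct {
	OrderNo string // external order number identifying the subsidy
	Amount  int64  // amount in cents
	Stored  bool   // true if the record already exists in the red envelope DB
}

// mergeSubsidies combines the token list from the reward issuing side with the
// records already stored in the red envelope DB. Entries are deduplicated by
// order number, and a stored record wins over a token for the same order.
func mergeSubsidies(tokens, stored []Subsidy) []Subsidy {
	seen := make(map[string]bool, len(tokens)+len(stored))
	merged := make([]Subsidy, 0, len(tokens)+len(stored))
	for _, s := range stored {
		if seen[s.OrderNo] {
			continue
		}
		seen[s.OrderNo] = true
		s.Stored = true
		merged = append(merged, s)
	}
	for _, t := range tokens {
		if seen[t.OrderNo] {
			continue // already recorded in the DB
		}
		seen[t.OrderNo] = true
		t.Stored = false // still only known to the reward issuing side
		merged = append(merged, t)
	}
	return merged
}

func main() {
	tokens := []Subsidy{{OrderNo: "A", Amount: 100}, {OrderNo: "B", Amount: 200}}
	stored := []Subsidy{{OrderNo: "A", Amount: 100}}
	for _, s := range mergeSubsidies(tokens, stored) {
		fmt.Printf("%s amount=%d stored=%v\n", s.OrderNo, s.Amount, s.Stored)
	}
}
```

In this sketch, a subsidy with `Stored == false` is one that would still need to be written to the database before it could be used to issue a red envelope.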
Selection of red envelope collection plan
We also weighed several options for the technical solution for receiving video red envelopes, and would like to share our thinking here.

Pessimistic locking scheme
Solution 1 is the most common idea. When a user receives a red envelope, the database row is locked, the amount is deducted, and the lock is released, which completes the collection. The advantage of this solution is that it is clear and straightforward. The problem is that when multiple users collect the same red envelope at the same time, their requests conflict on the database row lock and have to wait in line. When too many requests are queued, database connections are wasted and the performance of the whole system suffers. If the upstream receives no response for a long time, the request times out. The user side may then keep retrying, exhausting the database connections and crashing the system.

Red Envelope Pre-splitting Plan
The problem with Solution 1 is that multiple users claiming the same red envelope at the same time cause lock conflicts. To resolve them, the lock can be split to a finer granularity, which increases the concurrency a single red envelope can support. The specific solution is as follows:

In Solution 2, the sending process is changed. When a red envelope is sent, it is pre-split into multiple smaller red envelopes, which refines the lock granularity. When users receive red envelopes, competition for a single lock is replaced by allocation across multiple locks. The problem then becomes how to assign the split red envelopes to users. A common idea is to generate a serial number with the auto-increment operation of redis when the user requests a red envelope.
The serial number identifies the split red envelope the user should receive. However, this method depends heavily on redis. When the redis network jitters or the redis service is abnormal, the system has to degrade to querying the DB for red envelopes that have not been received in order to obtain a serial number, and the overall implementation is relatively complicated.

Final Solution
In the video red envelope scenario, users shoot videos and send red envelopes, and a red envelope is then claimed when the video is swiped to in the recommendation feed. Compared with group chat scenarios such as WeChat and Feishu, the concurrency on a single video red envelope is not very high, because the user's swiping and the feed itself already disperse the traffic. From a business perspective, the requirement also calls for returning the number of unclaimed red envelopes to the user for display once a claim completes. With Solution 1 the remaining inventory is easy to obtain, while with Solution 2 it is more troublesome. In terms of development complexity and disaster recovery, Solution 1 is also the more suitable choice. However, we need to deal with the risks of Solution 1: we need other ways to protect DB resources and minimize lock conflicts. The specific measures are as follows:

Red Packet Redis Rate Limiting
Memory Queuing
Asynchronous transfer
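The article does not describe how the memory queuing measure was implemented. One way such a mechanism could work is sketched below as an assumption: claims on the same red envelope are serialized inside each service process, so that at most one of them competes for the database row lock at a time, and claims beyond a bounded queue length are rejected early. The names `ClaimQueue`, `Do`, and `ErrBusy` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrBusy is returned when too many claims are already pending for one red envelope.
var ErrBusy = errors.New("too many pending claims for this red envelope")

// lane tracks the claims running or queued for a single red envelope.
type lane struct {
	lock    sync.Mutex // held while one claim talks to the DB
	pending int        // running plus queued claims, guarded by ClaimQueue.mu
}

// ClaimQueue serializes claims per red envelope inside one process.
type ClaimQueue struct {
	mu       sync.Mutex
	lanes    map[string]*lane
	maxQueue int // maximum running plus queued claims per red envelope
}

func NewClaimQueue(maxQueue int) *ClaimQueue {
	return &ClaimQueue{lanes: make(map[string]*lane), maxQueue: maxQueue}
}

// Do runs claim while holding the per-envelope lock, or fails fast with ErrBusy
// when the queue for that envelope is full.
func (q *ClaimQueue) Do(envelopeID string, claim func() error) error {
	q.mu.Lock()
	l, ok := q.lanes[envelopeID]
	if !ok {
		l = &lane{}
		q.lanes[envelopeID] = l
	}
	if l.pending >= q.maxQueue {
		q.mu.Unlock()
		return ErrBusy
	}
	l.pending++
	q.mu.Unlock()

	// Runs last: drop the lane once nobody is running or waiting on it.
	defer func() {
		q.mu.Lock()
		l.pending--
		if l.pending == 0 {
			delete(q.lanes, envelopeID)
		}
		q.mu.Unlock()
	}()

	l.lock.Lock()
	defer l.lock.Unlock()
	return claim()
}

func main() {
	q := NewClaimQueue(10)
	remaining, granted := 3, 0 // stand-ins for the red envelope row in the DB
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = q.Do("envelope-1", func() error {
				if remaining == 0 {
					return errors.New("red envelope is empty")
				}
				remaining--
				granted++
				return nil
			})
		}()
	}
	wg.Wait()
	fmt.Println("granted:", granted, "remaining:", remaining)
}
```

A queue like this only reduces contention coming from a single process. The database lock in Solution 1 would still be what guarantees correctness across service instances.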
Stability Disaster Recovery
Disaster recovery for the red envelope system relies mainly on interface rate limiting, service degradation, and multiple mechanisms that keep the state machines advancing. These methods are introduced below.

Interface rate limiting
Interface rate limiting is a common disaster recovery method. It keeps the load on the system within the range it can handle and prevents excessive external requests from crashing it. Before implementing rate limiting, we first communicated with upstream and downstream teams and with product to obtain estimates of how many red envelopes would be issued and received, and then worked out the traffic of the entire link, module by module, from those estimates. The following is the B2C full-link request volume we compiled at that time.

With the request volume of each module in hand, we can sum up the traffic for each interface, for each service of the red envelope system, and for each downstream service we depend on. Setting the limits then becomes much more convenient.

Service degradation

Core dependency downgrade
During the Spring Festival event, the red envelope link depends on many services. These downstream dependencies can be divided into core and non-core dependencies. When a core downstream service is abnormal, a link may become unavailable. In that case we can degrade directly at the API layer and return a friendly text prompt, then lift the degradation after the downstream service recovers. For example, in the C2C sending process, users need to complete the payment before they can send a red envelope.
If the payment flow on the finance side is abnormal, or the payment stays short of the success status for a long time, the red envelope fails to send after the user has paid. The front end also keeps polling the red envelope status, which sharply increases the number of requests, puts pressure on the service, and can even affect B2C issuance and queries. In this case the C2C issuance interface can be degraded to return early, which reduces the pressure on the service and the impact on other business logic.

Non-core dependency downgrade
Besides core dependencies, the red envelope system has some non-core downstream dependencies. If one of these services is abnormal, we can give up part of the user experience to keep the service available. For example, as described in the subsidy issuance section above, users need to obtain all available red envelope subsidies before sending a B2C red envelope. We query the full token list from the reward issuing side, query our own DB, and return the merged result. If the interface for obtaining the token list is abnormal, we can degrade and return only the subsidy data in our own DB. Users can then still issue red envelopes. Only the display of some subsidies is affected, not the entire sending link.

Multiple mechanisms ensure the advancement of the state machine
In the red envelope system, an order that stays short of its final state for a long time may cause customer complaints. Examples are a claimed amount that has not arrived long after the user claimed the red envelope, or a refund that has not arrived long after a C2C red envelope went unclaimed. Every order in the system must therefore be pushed to its final state in a timely and accurate manner. We have several ways to ensure this. The first is callback.
After the dependent system finishes processing an order, it promptly notifies the red envelope system. This is the most timely method. However, if we rely on callbacks alone, an abnormal dependent party or network jitter may cause a callback to be lost. So at each stage of the red envelope we also send an mq message to the red envelope system, consume it after a certain interval, and actively query the dependent party's order status to update our own. Finally, each state machine has a scheduled task as a backstop. If an order still has not reached its final state after the scheduled task has run several times, a Lark alert is sent so that someone can step in promptly and investigate.

Fund security guarantee

Transaction idempotence
In programming, idempotency means that executing a request any number of times has the same effect as executing it once. For fund security, idempotent processing keyed on the order number prevents asset losses. In the red envelope system, the issuance, receipt, and refund of red envelopes all use the order number as a unique key to make the interfaces idempotent.

The subsidy issuance interface is idempotent as well. If the same order number is used in multiple subsidy issuance requests, we must ensure that only one coupon is issued. Idempotency can be implemented in several ways, including through a database or redis. The most reliable is a unique key conflict in the database, but this introduces additional problems when the database is sharded across instances. Here we briefly describe the case of subsidy issuance.
When designing the business system, we sharded the business tables by uid, so the sharding key of the subsidy table is uid, even though the subsidy order number is set as the unique key. The risk is that if the upstream system issues a subsidy twice with the same external order number but a different uid, the two requests may land on different database instances. The unique index then fails to catch the duplicate, resulting in asset loss. To remove this risk, we introduced an additional database that uses the external order number of the subsidy issuance as the sharding key.

B2C Red Packet Verification
Besides building fund security into the system design during development, we also need reconciliation to verify whether the system has any fund security issues. The B2C link runs mainly from subsidy distribution to red envelope collection. We run hourly Hive reconciliation on the upstream and downstream data of these links.

C2C Red Packet Verification
The C2C link starts with the user initiating payment, continues with recipients receiving transfers, and ends with the expired red envelope being refunded. The three processes of payment, transfer, and refund each need to be reconciled. We also need to ensure that the amount a user paid for a red envelope is greater than or equal to the amount transferred plus the amount refunded. The check is greater than or equal rather than strictly equal because the full cycle from successful issuance to successful refund takes more than 24 hours. In addition, there may be orders where the transfer is in transit, resulting in multiple refund orders. If strict equality were required, there would be no controllable point in time at which to run the reconciliation.
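The C2C invariant above (amount issued ≥ amount transferred + amount refunded) can be expressed as a simple check. The sketch below is only an illustration: the `EnvelopeLedger` type and `Mismatches` function are hypothetical names, and it assumes the per-envelope totals have already been aggregated (for example by the hourly Hive job) into integer cents.

```go
package main

import "fmt"

// EnvelopeLedger holds hypothetical per-envelope totals, in cents.
type EnvelopeLedger struct {
	EnvelopeID  string
	Issued      int64 // amount the sender paid
	Transferred int64 // amount transferred to recipients so far
	Refunded    int64 // amount refunded to the sender so far
}

// Mismatches returns the ledgers that violate Issued >= Transferred + Refunded.
// Equality is not required, because transfers and refunds may still be in flight
// when the reconciliation runs.
func Mismatches(ledgers []EnvelopeLedger) []EnvelopeLedger {
	var bad []EnvelopeLedger
	for _, l := range ledgers {
		if l.Issued < l.Transferred+l.Refunded {
			bad = append(bad, l)
		}
	}
	return bad
}

func main() {
	ledgers := []EnvelopeLedger{
		{EnvelopeID: "settled", Issued: 1000, Transferred: 600, Refunded: 400},
		{EnvelopeID: "in-flight", Issued: 1000, Transferred: 300, Refunded: 0},
		{EnvelopeID: "overpaid", Issued: 1000, Transferred: 800, Refunded: 400},
	}
	for _, l := range Mismatches(ledgers) {
		fmt.Printf("possible capital loss on %s: paid out %d of %d\n",
			l.EnvelopeID, l.Transferred+l.Refunded, l.Issued)
	}
}
```

Only the third ledger is flagged here. The second passes even though it is not fully settled, which is the reason for using greater than or equal.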
Stress testing of the red envelope system
As mentioned earlier, the red envelope link includes multiple interfaces, such as sending, receiving, and querying. To measure the real performance of the system, the stress test has to simulate real user behavior. We used the scripted stress testing mode of our stress testing platform for this.

First, the entire link has to be adapted for stress testing, and we have to confirm with upstream and downstream teams whether their services can be stress tested. Where they cannot, the corresponding calls need to be mocked. For storage services, namely databases, redis, and mq, the stress test flag must be propagated correctly, otherwise test traffic may affect production. After the link has been adapted, the stress test scripts are written, one for B2C and one for C2C.

B2C Red Packet Link Stress Test
The above is the entire link of the B2C stress test. First, subsidies are issued, then subsidies are queried and red envelopes are issued with them. To simulate multiple people collecting a red envelope, we start multiple goroutines that collect it concurrently.

C2C Red Packet Link Stress Test
Because C2C red envelopes involve payment, the link is a separate set of processes, so C2C needs its own script. The link also involves external system dependencies. If we waited for the entire link to be ready and only then stress tested everything together, unknown problems might surface late. We therefore stress test our own part first and start the full-link stress test only after it passes. In the figure, the blue modules related to payment have mock switches that control the results during stress testing.
When a mock switch is on, a result is constructed and returned directly. When it is off, the request goes to the finance side as normal to obtain the result.

Subsequent planning

Service Set
In the disaster recovery setup described above, if a change to the red envelope core service goes wrong, or the primary data center of the database goes down, all users are affected. The only option then is to degrade and return early, and the system as a whole cannot be switched over and restored quickly. We are therefore considering moving the service to a Set architecture: the service instances and their corresponding storage are grouped into separate Sets, and each Set handles only the traffic of its own unit. This provides traffic splitting and fault isolation between units, as well as data backup between Sets. When a unit becomes abnormal, its traffic can be switched promptly to the backup unit.