WeChat handles an enormous amount of traffic, and its instantaneous peaks would be a huge challenge for any team or architect. Many of us have wondered how the WeChat team copes with the traffic generated by red envelope grabbing. As it happens, the Tencent Lecture Hall public account published this article today. It does not go into specific technical details, but the macro-level strategy is still worth learning from, so I would like to share it with you.

400x Challenge

Compared with last year, when users sent red envelopes to one another, this year's shake-for-a-red-envelope feature caused business volume to explode. The wave of red envelopes sent out at 10:30 on New Year's Eve alone reached 120 million, which is 400 times the New Year's Eve peak in 2014 (the peak number of red envelopes opened per minute in 2014 was only 25,000)!

Figure: backend data soaring the moment the red envelope grabbing stage began

What is so difficult about sending out 1 billion red envelopes? The WeChat team identified three major difficulties:

Fast - How to ensure that users can shake out red envelopes quickly?
Accurate - How to ensure that a red envelope you shake out can be opened successfully?
Stable - How to ensure that an opened red envelope can be shared?

A large number of users shaking at the same time instantly generates tens of millions of requests per second. If that volume of requests reached the backend directly without being diverted, the backend services would inevitably be overloaded or even crash. The backend monitoring curve from New Year's Eve shown above says it all: even with heavy diversion and pressure reduction at the front end, the backend server load still jumped more than tenfold in an instant.

Three major response strategies put into practice

To address these three difficulties, the WeChat backend development team adopted three main strategies: lossy service, flexible availability, and "big system, small implementation".

Lossy service - pursuing high availability and fast response

What is lossy service? It means carefully splitting up the product flow and selectively sacrificing some data consistency and completeness so that most core functions keep running. This is an operating strategy Tencent built up during the PC era: with limited resources and under ever-changing Internet conditions, do everything possible to meet users' core needs.

The core steps of WeChat red envelopes are shaking, opening, and sharing. The whole system must be designed so that these three steps complete in one pass as far as possible. When any related system behaves abnormally, the service must be degraded immediately to prevent a system avalanche.

System degradation has two aspects. The first is to split and simplify the core functions and use auxiliary lightweight services to keep the shortest critical path working. For example, the shake logic is placed in the access layer, which turns tens of millions of requests per second into tens of thousands of red envelope requests per second before passing them to the backend logic of the red envelope service, reducing the likelihood of an avalanche.

Comment: Lossy service means letting important things and important people go first. This is common in real life too: soldiers get priority when buying tickets, roads are closed for leaders' inspections, and leaders' cars go first. The same applies to ordinary people like us.

At the same time, the backend uses asynchronous splitting. When a user request arrives, only a legitimacy check is performed.
Once the check passes, success is reported to the user right away, and the remaining business logic goes into an asynchronous queue for processing. This shortens the user's waiting time and greatly reduces the probability of an avalanche at peak load.

Figure: the most time-consuming step, the deposit operation, is skipped on the request path and processed asynchronously

The second aspect is overload protection. The overload protection strategy for WeChat red envelopes was built into the client in advance: when a connection fails or times out, the client shows a corresponding prompt, which reduces repeated requests from users. The access layer also protects itself by throttling responses to clients that send requests too frequently. System load is divided into several levels, and the client is instructed to apply a different rate limit as each threshold is reached. When abnormal situations occur, measures such as reducing the number of red envelopes and using asynchronous flow control to slow the rate of opening and sharing are taken to relieve pressure on the servers. In addition, WeChat red envelopes has a full-link stress testing process that automatically evaluates the entire business chain in advance to guard against overload.

Comment: Block backend traffic at the front end. For example, once communication fails, that user no longer puts any pressure on the backend.

Figure: you may never have seen this screen, but it was on standby in your phone all along

Under the layered protection of the lossy service approach, WeChat red envelopes passed the first wave of the shake-for-a-red-envelope event with almost full marks. Overload protection played an obvious role: pressure was reduced and filtered layer by layer at the client and the access layer, so that only load on the order of tens of thousands of requests reached the backend.

Flexible availability - refining scenarios to grasp core needs

Flexible availability is an approach that grows out of the lossy service philosophy. The key is to work from real usage scenarios, adjust product strategy according to resource consumption, and design several levels of user experience, so that key data is returned successfully as often as possible, requests are still accepted normally, and the service never goes down easily. Flexible service leans more toward the product side: it depends on a deep understanding of the core value of each product scenario, on grasping users' core needs in each scenario, and on designing different levels of service to meet them.

The red envelope team applied flexible service to WeChat red envelopes through several kinds of measures:

1. System disaster recovery: When facing large-scale request volumes, disaster recovery is essential. It generally divides into logic-layer and data-layer disaster recovery. For this deployment, the WeChat backend development team adopted a 30% switchover scheme at the logic layer: core services can fail over automatically even when up to 1/3 of the servers have problems, preserving service quality and raising the alert level in exchange for system availability.

2. Resource isolation: As the name suggests, resources are isolated to reduce the impact that service branches have on one another. At the logic level, when service A dispatches tasks to services B and C at the same time, an upper limit is set on how much it can allocate to each, so that a problem in one branch cannot drag down the whole service chain. Even if some services fail, the service as a whole does not collapse.

3. Quick rejection: When a service is overloaded, it rejects requests as early as possible, and the caller retries on a different machine so that no single server is overwhelmed. In lossy service, rejecting quickly and rejecting early are guiding principles: solve the problem at the source of the flow, because the earlier a request is turned away, the lower the cost and the smaller the impact. The problem is solved by using the front end to protect the back end.

Comment: One point to note is that a client is different from a web system. This only works if the critical path is anticipated in advance and the relevant instructions and policies ship with a client version update. When data requests fail, the client automatically reduces its request frequency. For example, after a failed request the user will certainly try to refresh, but the client may not actually send a request to the backend and may instead immediately return a message asking the user to be patient. If this is not built in beforehand, it is too late to react once a problem appears.

4. Payment grouping: On the payment path, all red envelopes are divided into 50 groups and placed on 50 separate sets that do not affect one another. If one set has a problem, at most 1/50 of users are affected, so service for the majority is undisturbed. Grouping into sets is another important technique for flexible availability. The idea closely resembles shipping containers in real life: standardized, large-scale box design copes with complex and varied cargo and keeps each link in the logistics chain independent and flexible.

5. Traffic preloading: On the client side, resources that consume a lot of traffic, such as audio and images, are downloaded and installed automatically ahead of time to spread out the traffic peak. On the day of the event, the CDN also prepares hundreds of GB of bandwidth to handle the load.
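
As a rough illustration of quick rejection (item 3) and payment grouping (item 4), here is a minimal Python sketch. The article does not say how WeChat actually assigns red envelopes to sets or how a server decides it is overloaded, so the hash-based assignment, the `Server` class, its `capacity` counter, and the function names are all assumptions made for illustration only.

```python
import hashlib

NUM_SETS = 50  # figure from the article: 50 independent sets


def set_for(envelope_id: str, num_sets: int = NUM_SETS) -> int:
    """Map an envelope ID to a set index with a stable hash (hypothetical scheme)."""
    digest = hashlib.sha256(envelope_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_sets


class Overloaded(Exception):
    """Raised when a request is rejected instead of being queued."""


class Server:
    """Toy server that counts accepted requests against a fixed capacity."""

    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.capacity = capacity
        self.in_flight = 0  # never released in this sketch

    def handle(self, request: str) -> str:
        # Quick rejection: refuse immediately once capacity is reached.
        if self.in_flight >= self.capacity:
            raise Overloaded(self.name)
        self.in_flight += 1
        return f"{self.name} handled {request}"


def call_with_failover(servers, request):
    """Try each server in turn, moving on as soon as one rejects the request."""
    for server in servers:
        try:
            return server.handle(request)
        except Overloaded:
            continue
    raise Overloaded("all servers are overloaded")
```

Because the set index depends only on the envelope ID, a fault in one set touches only the envelopes mapped to it, which is the 1/50 blast radius the article describes. The failover loop shows why rejection should be cheap: the caller learns at once that a machine is full and can move on instead of waiting for a timeout.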
Traffic preloading is similar to the fast/slow separation used in overload protection: loading traffic-heavy resources ahead of time avoids contention during peak hours.

Comment: This is preparation in advance, making full use of caches along every path.

Big system, small implementation - keeping each process to a single function

"Big system, small implementation" is best described as a mindset. Its core idea is to break a functionally complex large system into small ones, reduce coupling and interdependence between modules, and use multiple independent modules to deliver the functions of the overall system. It simplifies the complex through divide and conquer, which makes development and rapid delivery easier.

The WeChat red envelope backend is a huge system with many modules. The WeChat backend development team took a highly modular approach, dividing it into small, highly autonomous subsystems to achieve high cohesion and low coupling, so that no module depends too heavily on another. The benefit is that no single module can bring down the whole service, and truly gray-scale (gradual) service becomes possible.

Comment: Reducing coupling lowers the difficulty of solving problems and improves day-to-day maintainability.

Massive service capabilities determine success or failure

From Didi Taxi in 2014 to WeChat red envelopes in 2015, Tencent has proven its strength in massive-scale services one case after another. The real hero behind the smooth operation of WeChat red envelopes is an internal Tencent technical system called "Massive Way 2.0". The three approaches of lossy service, flexible service, and big system, small implementation all derive from it. The mobile Internet war is heating up, and BAT (Baidu, Alibaba, Tencent) is doing everything it can to compete for the payment entry point.
As a business moves from starting up to running and then taking off, the massive service capabilities behind these giants will have an increasingly profound effect on whether they ultimately succeed or fail.
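
For readers who want something concrete, the following is a minimal Python sketch of two ideas from the lossy service section: asynchronous splitting (validate the request, report success, defer the slow deposit step) and tiered overload protection (tell clients to slow down as load crosses thresholds). The article gives no implementation details, so the thresholds, intervals, class name, and method names below are hypothetical.

```python
from collections import deque

# Hypothetical tiers: (load threshold, minimum seconds between client requests).
LEVELS = [(0.9, 10.0), (0.7, 3.0), (0.5, 1.0)]


def client_interval(load: float) -> float:
    """Return the request interval a client should respect at this load level."""
    for threshold, interval in LEVELS:
        if load >= threshold:
            return interval
    return 0.0  # below every threshold: no throttling


class Backend:
    """Toy backend that acknowledges first and credits balances later."""

    def __init__(self) -> None:
        self.queue = deque()  # pending (user_id, amount) deposits
        self.balances = {}    # user_id -> amount in cents

    def open_envelope(self, user_id: str, amount: int) -> bool:
        # Synchronous path: legitimacy check only, then acknowledge.
        if not user_id or amount <= 0:
            return False
        self.queue.append((user_id, amount))
        return True

    def drain(self) -> int:
        """Asynchronous path: apply queued deposits, return how many were applied."""
        processed = 0
        while self.queue:
            user_id, amount = self.queue.popleft()
            self.balances[user_id] = self.balances.get(user_id, 0) + amount
            processed += 1
        return processed
```

In a real deployment the queue would be a durable message queue drained by separate workers; an in-process `deque` is used here only to keep the sketch self-contained. The point is that the user-facing call returns before the slow step runs.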