As modern data centers scale up, network topology and routing become increasingly complex. Traditional data centers were built on mainframes and minicomputers; their networks were relatively small, and ordinary chassis switches could meet their requirements. With the spread of the CLOS architecture, clusters of standard x86 servers, with their low cost and high scalability, have gradually displaced mainframes and minicomputers and become the mainstream of data centers. The figure below shows a typical data center design based on the CLOS architecture. In such a large-scale network, moving data from sender to receiver as fast as possible becomes a key factor in network performance tuning.

At the Future Data Center Core Technology Seminar held by JD.com's IT Resource Service Department, R&D directors and senior engineers from JD.com's artificial intelligence, big data, and cloud computing teams discussed in depth how the network affects application performance.

One reason the network matters is that processors keep getting faster, so the point-to-point latency expected between applications keeps shrinking. For example, the MPI protocol used in high-performance computing and AI applications can achieve point-to-point transmission latency below 1 microsecond (1 µs), while the single-hop latency of most switches today exceeds 3 microseconds. From the topology diagram above, we can see that a packet crossing the same data center traverses 5 hops (from Rack ToR to Row Spine, to Data Center Spine, back to Row Spine, and then to Rack ToR), consuming 15 microseconds of latency. Against a 1 µs application-level latency, those 15 µs mean that more than 90% of the application's end-to-end time is spent in the network, and this does not even count retransmissions caused by packet loss.

How can we reduce the impact of the network on application performance?

1.
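The latency arithmetic above can be checked with a short calculation. This is a back-of-the-envelope sketch, not a measurement; the hop count and per-hop latency are parameters taken from the figures quoted in the text.

```python
def network_share(hops=5, hop_latency_us=3.0, app_latency_us=1.0):
    """Fraction of end-to-end time spent in the network.

    hops            -- switch hops between sender and receiver
    hop_latency_us  -- single-hop switch latency in microseconds
    app_latency_us  -- application point-to-point latency (e.g. MPI)
    """
    net = hops * hop_latency_us
    return net / (net + app_latency_us)

# 5 hops at 3 us each = 15 us of network time against 1 us of
# application time: the network accounts for ~94% of the total.
print(f"{network_share():.1%}")
```

Plugging in the 0.3 µs switch discussed below drops the network's share to well under two thirds, which is the point the article is making.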
Use high-performance switches. If single-hop switch latency can be reduced from 3 microseconds to 0.3 microseconds, the latency of the entire network falls to one tenth of its original value.

2. Use switches with stable forwarding performance. Some switches forward at different speeds depending on packet size: latency is low for small packets but rises significantly for large ones, making network performance unpredictable. Better switches maintain low latency regardless of packet size.

3. Avoid unfairness in many-to-one communication. If such unfairness occurs, forwarding speeds become uneven and a first-come, first-served effect emerges.

4. Establish a fast network congestion control mechanism. In large networks congestion is inevitable; managing it effectively, and reducing the packet loss and retransmission it causes, is one of the hardest problems in network management today.

5. Slowing senders down beats retransmitting after packet loss. Slowing transmission and loss-plus-retransmission are the two standard responses to congestion, and practice has shown that slowing down resolves congestion more effectively than dropping and retransmitting.

Management and control of network congestion

The seminar discussion made clear that an application's characteristics determine its communication pattern on the network: multiple initiators accessing one or more targets in storage applications, many-to-many communication in MPI, workers talking to parameter servers in machine learning, one-to-many distribution in CDNs, and so on.
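The preference for slowing senders over dropping and retransmitting (item 5 above) can be illustrated with a toy model. Everything here is an assumption for illustration: the packet count, the loss fraction, and the retransmission timeout are invented numbers, not figures from the article.

```python
PACKETS = 1000      # packets to deliver through a congested link
RATE_US = 1.0       # microseconds per packet at full rate (assumed)

def slow_down(factor=0.5):
    """Deliver everything at a reduced rate; nothing is lost."""
    return PACKETS * RATE_US / factor

def drop_and_retransmit(loss=0.2, rto_us=1000.0):
    """Send at full rate, lose a fraction, wait out one retransmission
    timeout, then resend the lost packets (retries assumed to succeed)."""
    first_pass = PACKETS * RATE_US
    retries = PACKETS * loss
    return first_pass + rto_us + retries * RATE_US

print(slow_down(), drop_and_retransmit())
```

Even in this crude model, halving the rate (2000 µs total) finishes sooner than losing 20% of the packets and paying a single timeout (2200 µs), because a retransmission timeout dwarfs per-packet transmission time.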
When many-to-one traffic occurs, reducing loss-induced retransmission requires slowing the senders down to relieve pressure on the switch buffers. For congestion management and control, the industry generally uses two mechanisms: PFC (Priority-based Flow Control) and ECN (Explicit Congestion Notification).

1. PFC is a congestion management mechanism triggered at the switch ingress port. Under normal, uncongested conditions the ingress buffer holds no data. When the egress buffer reaches a certain threshold, the ingress buffer starts to accumulate; when it in turn crosses its configured threshold, the ingress port actively forces its upstream neighbor to slow down. Because PFC operates per priority class, this back-pressure can affect every application sharing the same priority.

2. ECN is a congestion control mechanism triggered at the switch egress port. When the egress buffer crosses its configured threshold, the switch sets the ECN bits in the packet header, marking the data. When marked packets arrive, the receiver generates a CNP (Congestion Notification Packet) and sends it back to the sender; the CNP identifies the flow or QP that caused the congestion. On receiving the CNP, the sender reduces its sending rate. ECN is therefore a congestion control mechanism scoped to a TCP flow or RDMA QP: it acts only on the flow or QP causing congestion and does not affect other applications.

Wang Zhongping, technical director of the hardware system department of JD.com's IT Resource Service Department, proposed that PFC and ECN be used in combination to strike an effective balance between performance and operability.
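The ECN feedback loop described above can be sketched as a tiny discrete-time simulation. This is a hypothetical, highly simplified model: the queue thresholds, drain rate, initial rate, and the halve-on-CNP policy are all assumptions chosen for illustration (real NICs use schemes such as DCQCN with far more nuanced rate control), and the CNP round trip is modeled as instantaneous.

```python
from dataclasses import dataclass

ECN_THRESHOLD = 8   # egress-queue depth (packets) that triggers ECN marking
DRAIN_RATE = 2      # packets the egress port forwards per tick

@dataclass
class Sender:
    rate: int = 6   # packets injected per tick

    def on_cnp(self):
        """Receiver saw an ECN mark and returned a CNP: halve the rate."""
        self.rate = max(1, self.rate // 2)

def simulate(ticks=10):
    sender, queue, log = Sender(), 0, []
    for _ in range(ticks):
        queue += sender.rate                  # traffic arrives at the switch
        queue = max(0, queue - DRAIN_RATE)    # egress port drains the queue
        if queue > ECN_THRESHOLD:             # switch marks departing packets;
            sender.on_cnp()                   # CNP modeled as instantaneous
        log.append((sender.rate, queue))
    return log

for rate, depth in simulate():
    print(f"rate={rate} queue={depth}")
```

Running it shows the intended behavior: the queue overshoots the threshold, CNPs drive the offending sender's rate down, and the queue then drains back below the threshold, all without touching any other flow.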
The following recommendations can be referred to during implementation. Lv Ke, head of JD.com's IT Resource Service Department, said: "Reducing the network's impact on application performance is a very complex problem, and one that every data center operator has been trying to solve. The best approach is for our network engineers and application owners to discuss the application's requirements for the network together; our professional technical team will then test and select the most suitable network products and solutions based on those requirements."