As modern data centers scale up, network topology and routing become increasingly complex. Traditional data centers were built on mainframes and minicomputers; their networks were relatively small, and ordinary chassis switches could meet their requirements. With the spread of the CLOS architecture, clusters of standard x86 servers, with their low cost and high scalability, have gradually displaced mainframes and minicomputers and become the mainstream of data centers. The figure below shows a typical data center design based on the CLOS architecture. In such a large-scale network, moving data from sender to receiver as fast as possible becomes a key factor in network performance tuning.

At the Future Data Center Core Technology Seminar held by JD.com's IT Resource Service Department, R&D directors and senior engineers from JD.com's artificial intelligence, big data, and cloud computing teams discussed in depth how the network affects application performance.

One reason the network matters is that processors keep getting faster, so the point-to-point latency expected between applications keeps shrinking. For example, the MPI protocol used in high-performance computing and AI applications can achieve point-to-point transmission latency below 1 microsecond (1 µs), while the single-hop latency of most switches today exceeds 3 microseconds. From the topology diagram above, we can see that a packet crossing the same data center traverses 5 hops (from Rack ToR to Row Spine, to Data Center Spine, back to Row Spine, and then to Rack ToR), consuming 15 microseconds of latency. Against a 1 µs application-level latency, those 15 µs mean that more than 90% of the application's end-to-end time is spent in the network, and this does not even count retransmissions caused by packet loss.

How can we reduce the impact of the network on application performance?

1.
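The latency arithmetic above can be checked with a short calculation. This is a back-of-the-envelope sketch, not a measurement; the hop count and per-hop latency are parameters taken from the figures quoted in the text.

```python
def network_share(hops=5, hop_latency_us=3.0, app_latency_us=1.0):
    """Fraction of end-to-end time spent in the network.

    hops            -- switch hops between sender and receiver
    hop_latency_us  -- single-hop switch latency in microseconds
    app_latency_us  -- application point-to-point latency (e.g. MPI)
    """
    net = hops * hop_latency_us
    return net / (net + app_latency_us)

# 5 hops at 3 us each = 15 us of network time against 1 us of
# application time: the network accounts for ~94% of the total.
print(f"{network_share():.1%}")
```

Plugging in the 0.3 µs switch discussed below drops the network's share to well under two thirds, which is the point the article is making.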
Use high-performance switches. If single-hop switch latency can be reduced from 3 microseconds to 0.3 microseconds, the latency of the entire network falls to one tenth of its original value.

2. Use switches with stable forwarding performance. Some switches forward at different speeds depending on packet size: latency is low for small packets but rises significantly for large ones, making network performance unpredictable. Better switches maintain low latency regardless of packet size.

3. Avoid unfairness in many-to-one communication. If such unfairness occurs, forwarding speeds become uneven and a first-come, first-served effect emerges.

4. Establish a fast network congestion control mechanism. In large networks congestion is inevitable; managing it effectively, and reducing the packet loss and retransmission it causes, is one of the hardest problems in network management today.

5. Slowing senders down beats retransmitting after packet loss. Slowing transmission and loss-plus-retransmission are the two standard responses to congestion, and practice has shown that slowing down resolves congestion more effectively than dropping and retransmitting.

Management and control of network congestion

The seminar discussion made clear that an application's characteristics determine its communication pattern on the network: multiple initiators accessing one or more targets in storage applications, many-to-many communication in MPI, workers talking to parameter servers in machine learning, one-to-many distribution in CDNs, and so on.
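The preference for slowing senders over dropping and retransmitting (item 5 above) can be illustrated with a toy model. Everything here is an assumption for illustration: the packet count, the loss fraction, and the retransmission timeout are invented numbers, not figures from the article.

```python
PACKETS = 1000      # packets to deliver through a congested link
RATE_US = 1.0       # microseconds per packet at full rate (assumed)

def slow_down(factor=0.5):
    """Deliver everything at a reduced rate; nothing is lost."""
    return PACKETS * RATE_US / factor

def drop_and_retransmit(loss=0.2, rto_us=1000.0):
    """Send at full rate, lose a fraction, wait out one retransmission
    timeout, then resend the lost packets (retries assumed to succeed)."""
    first_pass = PACKETS * RATE_US
    retries = PACKETS * loss
    return first_pass + rto_us + retries * RATE_US

print(slow_down(), drop_and_retransmit())
```

Even in this crude model, halving the rate (2000 µs total) finishes sooner than losing 20% of the packets and paying a single timeout (2200 µs), because a retransmission timeout dwarfs per-packet transmission time.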
When many-to-one traffic occurs, reducing loss-induced retransmission requires slowing the senders down to relieve pressure on the switch buffers. For congestion management and control, the industry generally uses two mechanisms: PFC (Priority-based Flow Control) and ECN (Explicit Congestion Notification).

1. PFC is a congestion management mechanism triggered at the switch ingress port. Under normal, uncongested conditions the ingress buffer holds no data. When the egress buffer reaches a certain threshold, the ingress buffer starts to accumulate; when it in turn crosses its configured threshold, the ingress port actively forces its upstream neighbor to slow down. Because PFC operates per priority class, this back-pressure can affect every application sharing the same priority.

2. ECN is a congestion control mechanism triggered at the switch egress port. When the egress buffer crosses its configured threshold, the switch sets the ECN bits in the packet header, marking the data. When marked packets arrive, the receiver generates a CNP (Congestion Notification Packet) and sends it back to the sender; the CNP identifies the flow or QP that caused the congestion. On receiving the CNP, the sender reduces its sending rate. ECN is therefore a congestion control mechanism scoped to a TCP flow or RDMA QP: it acts only on the flow or QP causing congestion and does not affect other applications.

Wang Zhongping, technical director of the hardware system department of JD.com's IT Resource Service Department, proposed that PFC and ECN be used in combination to strike an effective balance between performance and operability.
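The ECN feedback loop described above can be sketched as a tiny discrete-time simulation. This is a hypothetical, highly simplified model: the queue thresholds, drain rate, initial rate, and the halve-on-CNP policy are all assumptions chosen for illustration (real NICs use schemes such as DCQCN with far more nuanced rate control), and the CNP round trip is modeled as instantaneous.

```python
from dataclasses import dataclass

ECN_THRESHOLD = 8   # egress-queue depth (packets) that triggers ECN marking
DRAIN_RATE = 2      # packets the egress port forwards per tick

@dataclass
class Sender:
    rate: int = 6   # packets injected per tick

    def on_cnp(self):
        """Receiver saw an ECN mark and returned a CNP: halve the rate."""
        self.rate = max(1, self.rate // 2)

def simulate(ticks=10):
    sender, queue, log = Sender(), 0, []
    for _ in range(ticks):
        queue += sender.rate                  # traffic arrives at the switch
        queue = max(0, queue - DRAIN_RATE)    # egress port drains the queue
        if queue > ECN_THRESHOLD:             # switch marks departing packets;
            sender.on_cnp()                   # CNP modeled as instantaneous
        log.append((sender.rate, queue))
    return log

for rate, depth in simulate():
    print(f"rate={rate} queue={depth}")
```

Running it shows the intended behavior: the queue overshoots the threshold, CNPs drive the offending sender's rate down, and the queue then drains back below the threshold, all without touching any other flow.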
The following recommendations can be referred to during implementation. Lv Ke, head of JD.com's IT Resource Service Department, said: "Reducing the network's impact on application performance is a very complex problem, and one that every data center operator has been trying to solve. The best approach is for our network engineers and application owners to discuss the application's requirements for the network together; our professional technical team will then test and select the most suitable network products and solutions based on those requirements."