[51CTO.com original article] Data sent over plaintext HTTP is exposed to hijacking, tampering, eavesdropping, and theft. The fix is to move to HTTPS. HTTPS introduces the TLS/SSL handshake at the session and presentation layers and encrypts data in transit, which protects the integrity and consistency of the data and gives users a safer network experience and better privacy protection.

However, HTTPS adds the TLS/SSL handshake step, and application data must be symmetrically encrypted, both of which cost performance. A good architecture has to balance security and performance; if it leans too far toward either side, the user experience suffers.

With that balance in mind, Suning began its full-site HTTPS transformation at the end of 2015, and the work lasted more than a year. It covered three areas: HTTPS system transformation, HTTPS performance optimization, and HTTPS grayscale launch. As a result, users can now get a first-rate experience when browsing over HTTPS.

Overview of the full-site HTTPS solution

Suning.com started planning its HTTPS work in 2015. At the time there was very little material to refer to, and detailed case studies of HTTPS transformation at e-commerce websites were even harder to find. The following figure shows the HTTPS solution for the whole Suning.com site. As the figure shows, the solution is built in three steps: system transformation, performance optimization, and grayscale launch.
HTTPS solution: system transformation

HTTPS access layer definition

The first priority of system transformation is to open port 443. A mature network stack includes a CDN, hardware load balancing, an application firewall, web servers, application servers, and finally the data layer. Does the whole chain need to speak HTTPS, paying the SSL handshake cost at every layer? The answer is no. The SSL handshake should be completed as early in the chain as possible, so the first thing to decide is where the HTTPS access layer sits. The figure below shows its position in Suning.com's architecture.

As the figure shows, we put the HTTPS access layer between the CDN and the application systems and use a layer-4 plus layer-7 load-balancing architecture. The layer-4 load balancer does not handle HTTPS offloading; its main job is to distribute TCP connections. The entire SSL handshake is completed at the layer-7 load balancer, and the application systems behind it continue to use port 80, so HTTPS offloading is finished at that point. This has two advantages: the application layer needs no changes for HTTPS, and all future HTTPS scheduling, optimization, and configuration can be done at the access layer.

Page resource replacement

Step 1: Understand Mixed Content

When a page is requested over HTTPS but any of its internal elements is loaded over HTTP, browsers report the problem that the W3C specification calls Mixed Content. A secure HTTPS page must therefore not mix in HTTP requests.

Step 2: Replace http:// with //

Replacing http:// with // makes every element of the page follow the protocol of the original page request.

Step 3: Definition and use of x-request-url

We also ran into some pitfalls during the // replacement.
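The replacement in Step 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page is available as an HTML string, and the function name `to_protocol_relative` and the list of attributes it touches are hypothetical, not Suning's actual tooling.

```python
import re

# Match "http://" only where it starts an attribute value (src, href, action)
# or a CSS url(...), so prose that merely mentions "http://" is left alone.
# The attribute list is an assumption chosen for illustration.
_HTTP_REF = re.compile(
    r'''((?:src|href|action)\s*=\s*["']|url\(\s*["']?)http://''',
    re.IGNORECASE,
)


def to_protocol_relative(html: str) -> str:
    """Rewrite http:// resource references to protocol-relative // form."""
    return _HTTP_REF.sub(r"\1//", html)


if __name__ == "__main__":
    page = '<img src="http://img.example.com/a.png"> see http://example.com'
    print(to_protocol_relative(page))
```

References that already use https:// are not matched, so running the rewrite on a partly migrated page leaves them unchanged.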
One pitfall involved Suning.com's single sign-on system, whose interaction flow is shown in the following figure. When the user's authID is invalid, a request is made to https://xxx.suning.com/authStatus for authentication. The access layer offloads HTTPS for every request, so the address the business system sees is HTTP. When the business system requires authentication, it responds with a 302 redirect to the single sign-on system, and the address from the second step, which is now HTTP, is recorded as the original page and returned to the client. The user then requests the single sign-on system. Once it completes authentication, it redirects back to the HTTP address, which causes Mixed Content on the client.

We introduced x-request-url to solve this, as shown below. The protocol of every original request is recorded in x-request-url. When a business system performs authentication, it must follow the protocol recorded in x-request-url, which prevents the client-side Mixed Content caused by the redirect back.

Native apps cannot recognize //

The reason browsers can recognize // while native apps cannot is simple: browsers have been adapted to handle it. At the time, Suning had a server-side system that exposed an interface providing images to all kinds of clients. Right after the HTTPS transformation, nothing looked wrong on either the PC or the client side. The next day, however, many users suddenly could not load images, because the native app could not recognize the // requests. The only fix is for the client developers to adapt the app. The following figure is an example of an app that cannot recognize //.

How to handle certificates and private keys on a commercial CDN

Handling certificates on the CDN is a problem most smaller Internet companies run into, because they cannot build their own CDN the way Alibaba and JD.com can, and the same is true for Suning.
Suning's CDN is a mix of self-built and commercial. As soon as a commercial CDN is used, the question of how to serve HTTPS through it comes up. Once the company hands its private key to a third-party vendor, it has no control over that vendor's CDN servers. If an attacker compromises a vendor's server, the private key leaks and the encryption becomes meaningless. As shown in the figure below, the approaches most widely accepted in the industry are a dual-certificate strategy, layer-4 acceleration, and a keyless solution.
In the keyless solution, when the CDN needs to use the private key, it sends the necessary parameters to the Key Server over an encrypted channel, and the Key Server performs the computation and returns the result.

HTTPS testing strategy

How do you test when a new protocol is introduced? The main steps are as follows:
In addition, we introduced a traffic diversion test system. The idea is simple: it captures traffic by domain name and user request, puts the captured traffic into the Copy Server, amplifies it several times, and sends it back to the system through the Sender. This lets us verify the functional and performance impact of HTTPS with real user traffic.

HTTPS solution: performance optimization

Before discussing how to optimize HTTPS performance, let's look at the whole TLS handshake, shown below. As the figure shows, in the worst case a handshake takes eight steps:
Suning.com's full-site HTTPS solution includes a lot of performance work: sensible use of HSTS, session resumption, and OCSP stapling, plus client-side HTTPS performance optimization and HttpDNS to counter DNS hijacking.

Reasonable use of HSTS

HSTS (HTTP Strict Transport Security) is a web security mechanism that forces clients such as browsers to connect to the server over HTTPS. Its advantage is that it removes the overhead of HTTP 302 redirects. A 302 redirect exposes the site the user is visiting, is easy for a man in the middle to hijack (downgrade hijacking), and, most importantly, slows access down.

The disadvantage is that until max-age expires the client enforces HTTPS on its own, and the server cannot control it. When a downgrade is needed, HTTPS therefore cannot be switched back to HTTP promptly. The max-age value can be changed dynamically, though, so setting max-age to 0 achieves the downgrade once clients receive the updated header. HSTS is also strict: if the site's certificate is invalid, the page becomes inaccessible, and users cannot choose to ignore the error.

Reasonable use of session resume

After the first TLS handshake, whether between the user and the client or between the client and the server, is a full TLS handshake still needed for the next data transfer? Session reuse avoids it. Session resumption has long been defined in the TLS RFCs and has been part of HTTPS since its early days. There are two ways to reuse sessions: Session ID and Session tickets. The following figure shows the implementation process.

Session ID. The server uses the session ID in the Client Hello to look up its session cache. If it finds a matching entry, it reuses the existing session information and finishes the handshake early, which is called a simplified handshake.
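The Session ID lookup described above can be sketched as follows. This is a minimal illustration, not how any particular TLS library implements it: the class `SessionCache`, the function `choose_handshake`, and the five-minute lifetime are all hypothetical, and an in-memory dictionary stands in for whatever store the server really uses.

```python
import time
from typing import Dict, Optional, Tuple


class SessionCache:
    """In-memory stand-in for a server-side TLS session cache keyed by Session ID."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self.ttl_seconds = ttl_seconds
        # session_id -> (expiry timestamp, opaque session state)
        self._store: Dict[bytes, Tuple[float, bytes]] = {}

    def store(self, session_id: bytes, state: bytes,
              now: Optional[float] = None) -> None:
        """Remember the session state negotiated by a full handshake."""
        now = time.time() if now is None else now
        self._store[session_id] = (now + self.ttl_seconds, state)

    def fetch(self, session_id: bytes,
              now: Optional[float] = None) -> Optional[bytes]:
        """Return the cached state, or None if it is unknown or expired."""
        now = time.time() if now is None else now
        entry = self._store.get(session_id)
        if entry is None:
            return None
        expires_at, state = entry
        if now >= expires_at:
            del self._store[session_id]  # drop stale entries lazily
            return None
        return state


def choose_handshake(cache: SessionCache, client_session_id: bytes,
                     now: Optional[float] = None) -> str:
    """Decide between a simplified and a full handshake for a Client Hello."""
    if client_session_id and cache.fetch(client_session_id, now) is not None:
        return "simplified"
    return "full"
```

A cache miss, an expired entry, or an empty Session ID all fall back to the full handshake, so resumption is purely an optimization and never a correctness requirement.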
Session ID is a standard field in the TLS protocol, and all mainstream browsers support it. Note that sharing SSL sessions only among the processes of a single machine is of little use in a cluster environment, so Session IDs have to be shared across machines, for example by storing them in Redis. OpenResty's lua-nginx-module for Nginx provides the ssl_session_fetch_by_lua_block directive specifically for Session ID lookups.

Session tickets. Session tickets complement session IDs. The server encrypts the session information into a ticket and sends it to the browser, and the browser presents the ticket in later handshake requests. If the server can decrypt and process the ticket successfully, the simplified handshake completes. The clear advantage of session tickets is that the server does not need to spend a lot of resources storing session content. However, session tickets are only a TLS extension, and support is not yet broad, at only about 60%. A global key must also be maintained for encryption and decryption, and the security and deployment efficiency of that key have to be considered.

Reasonable use of OCSP stapling

OCSP stands for Online Certificate Status Protocol (RFC 6960). It is used to query the CA for a certificate's status, such as whether the certificate has been revoked. Normally the browser sends a query over OCSP, the CA returns the certificate status, and the browser then decides whether the certificate can be trusted. The figure below shows the OCSP flow.

This process is slow, because the CA's site may be hosted abroad, which means an unstable network and a long RTT. Is there a way to avoid requesting the OCSP content from the CA site directly? OCSP stapling does exactly that.
The principle of OCSP stapling is that the server obtains the certificate status from the CA on behalf of the client, saving the user time. When the browser sends its Client Hello, it includes a certificate status request extension. On seeing this extension, the server returns the OCSP content directly to the browser, which completes the certificate status check. Because the browser no longer needs to query the CA site itself, this noticeably improves access speed.

HTTPS solution: grayscale launch

Grayscale launch follows three principles: grayscale, downgrade, and open-closed. The grayscale principle means the rollout proceeds gradually by region, version, and user level, and the user data collected along the way determines the pace of the whole plan. The downgrade principle ensures that every step is reversible and can be rolled back. The open-closed principle, open to extension and closed to modification, is the cornerstone of reusable design.

HTTPS switch control

For HTTPS switch control, Suning built three main switches: content management, CDN, and client:
New problems encountered during the launch process

With switch control in place, we still ran into new problems during the official launch, including the Referrer, DNS hijacking, and HTTPS performance monitoring.

Referrer

Most browsers do not send Referrer information by default when a protocol downgrade occurs. The typical case is clicking a link on an HTTPS page that leads to an HTTP website: the browser omits the Referer field from the request header. Losing the Referrer has a big impact on big-data analysis, because the source of traffic can no longer be traced. For modern browsers, this can be solved by adding a meta tag to the page: <meta name="referrer" content="always" />

DNS hijacking

DNS hijacking is malicious interference with the domain name resolution process that causes requests to resolve to the wrong node. Under HTTP, a DNS anomaly may not affect whether the request works, but under HTTPS the request will fail, because the illegitimate node has neither the certificate nor the private key. Suning monitors DNS health with active probe tests. As shown in the figure below, we detected a large number of DNS resolution anomalies in one region for the Suning Chinese Specialty Store.

DNS hijacking hits users hard. When a page fails to open, users assume the page itself is broken and do not come back. The following figure shows a problem that occurred for Suning.com in the Hebei region. As the picture shows, the page frame is there but none of the images are. The cause was finally determined to be DNS hijacking.
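The kind of DNS check described above can be sketched as follows. This is a minimal illustration under stated assumptions: probe nodes report (region, resolved IP) pairs, the set of legitimate node IPs is known in advance, and the function name `find_hijacked_regions` and the 50% threshold are hypothetical rather than part of Suning's system. The addresses in the example come from documentation-only ranges.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_hijacked_regions(
    probe_results: Iterable[Tuple[str, str]],
    legitimate_ips: Set[str],
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Return regions whose share of unexpected DNS answers reaches the threshold.

    probe_results holds (region, resolved_ip) pairs reported by probe nodes.
    The returned mapping is region -> fraction of answers outside legitimate_ips.
    """
    total: Dict[str, int] = defaultdict(int)
    bad: Dict[str, int] = defaultdict(int)
    for region, ip in probe_results:
        total[region] += 1
        if ip not in legitimate_ips:
            bad[region] += 1

    suspicious: Dict[str, float] = {}
    for region, count in total.items():
        ratio = bad[region] / count
        if ratio >= threshold:
            suspicious[region] = ratio
    return suspicious


if __name__ == "__main__":
    results = [
        ("hebei", "203.0.113.9"),    # unexpected answer
        ("hebei", "203.0.113.9"),    # unexpected answer
        ("hebei", "198.51.100.10"),  # legitimate node
        ("jiangsu", "198.51.100.10"),
    ]
    print(find_hijacked_regions(results, {"198.51.100.10"}))
```

A per-region result like this could feed a regional decision such as temporarily serving the affected area over HTTP, while other regions stay on HTTPS.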
The solution is to establish a complete risk-control system: build probe nodes across the country, and record and save a snapshot of the entire request and page, as shown below. In the Hebei case, after users in the region sent a request, TCP could not establish a connection and the SSL handshake could not proceed, because DNS hijacking had mapped the domain to an illegitimate node. The fix was the downgrade described earlier, applied by IP: Hebei Mobile users were downgraded from HTTPS to HTTP while HTTPS stayed in use everywhere else, and HTTPS was restored once the local operator fixed the problem.

HTTPS performance monitoring

The following figure shows the monitoring page for the Suning.com mobile client. Good monitoring is the most important part of an HTTPS grayscale launch, and it must have full coverage. To run the grayscale well, you must analyze business data, performance, on-site and off-site placement, CPS, and other data at every step of the rollout. Only after all of the data looks normal do you gradually expand the region and roll out further by app version and user level.

HTTPS future outlook

Looking ahead, there is QUIC (Quick UDP Internet Connections), a low-latency Internet transport protocol built on UDP. The TCP/IP protocol family is the foundation of the Internet. QUIC was proposed by Google as a UDP-based alternative to TCP. Of the two underlying protocols, UDP is lighter and does far less error checking, so it is less reliable than TCP. Some overseas companies trialing QUIC stress that it can ensure security while keeping the handshake from affecting the original transmission. This may be the future direction.

【About the Author】
Zhu Yiquan, an architect at Suning Cloud Commerce IT Headquarters, has participated in the construction of a panoramic application performance monitoring platform, mobile terminal performance optimization and improvement, and the construction of a unified mobile access layer. He has led the launch and optimization of HTTPS for the entire Suning.com website. He focuses on application layer network performance optimization and has extensive experience in HTTPS, HTTP/2, and other fields. His goal is to ensure fast, stable, and secure communications in complex network environments through optimization.

The above content is compiled based on Mr. Zhu Yiquan's speech at the WOTA2017 special session on “Technical Challenges Behind E-commerce Promotions”. Due to space constraints, some topics such as encryption suites, SPDY & HTTP/2, HTTP/2 stress testing tools, and SSL hardware accelerator cards are not covered in this article.

[51CTO original article, please indicate the original author and source as 51CTO.com when reprinting on partner sites]