How to balance security and performance? Exploration and practice of HTTPS optimization for e-commerce websites

[51CTO.com original article] As we all know, data transmitted in plaintext over HTTP is exposed to hijacking, tampering, monitoring, and theft. The solution is to migrate to HTTPS. HTTPS introduces the TLS/SSL handshake protocol at the session and presentation layers and encrypts data in transit, which addresses the problems of plaintext transmission, protects the confidentiality and integrity of data, and gives users a safer network experience and better privacy protection. However, HTTPS adds the TLS/SSL handshake, and application data must be symmetrically encrypted and decrypted, both of which pose a challenge to performance.

A good architecture must balance security and performance. If the balance tilts too far toward either side, the final user experience suffers. With that balance in mind, Suning's full-site HTTPS transformation began at the end of 2015 and lasted for more than a year. The main work consisted of system HTTPS transformation, HTTPS performance optimization, and HTTPS grayscale launch. The result is that users now get a fast, smooth experience when accessing the site over HTTPS.

Overview of the full-site HTTPS solution

Suning.com started planning its HTTPS work in 2015. At that time, there was very little reference material, and detailed case studies of HTTPS transformation for e-commerce websites were even harder to find.

The following figure shows the HTTPS solution for Suning.com's entire website:

As shown in the figure, the entire solution is built in three steps, namely system transformation, performance optimization, and grayscale launch:

  • System transformation. The original system must be modified before it can support HTTPS. First, the HTTPS access layer must be established, that is, port 443 must be opened so that all application systems support HTTPS access. On this basis, page resources are replaced to solve the errors that occur when an HTTPS page issues HTTP requests. After these two steps, issues such as certificate handling on the CDN and the HTTPS testing approach also have to be solved.
  • Performance optimization. The system transformation adds the TLS handshake, which inevitably introduces performance overhead. How can that loss be recovered so that performance and security stay in balance? The performance optimization part includes several optimization points, which are described in detail below.
  • Grayscale launch. This part takes the most time. The HTTPS launch process has many pitfalls, some of which were not discovered beforehand. This is why the entire site, all regions, and all users cannot be switched to HTTPS at once. Grayscale launch can be done according to the operator, city, and user level of the traffic.

HTTPS solution: system transformation

HTTPS access layer definition

The first priority of system transformation is to open port 443. A mature network system includes a CDN, hardware load balancing, an application firewall, Web servers, application servers, and finally the data layer. Does the entire link need to use HTTPS, paying the SSL handshake cost at every layer? The answer is no.

Therefore, the SSL handshake should be completed as early as possible. The first thing to consider during the SSL process is the positioning of the HTTPS access layer.

The figure below shows the location of the HTTPS access layer in Suning.com's architecture.

As shown in the figure, we put the HTTPS access layer between the CDN and the application systems, and adopt a Layer 4 + Layer 7 load balancing architecture. The Layer 4 load balancer does not handle HTTPS offloading; its main responsibility is to distribute TCP connections. The entire SSL handshake is completed at the Layer 7 load balancer, and the application systems behind it are reached over port 80, which completes the HTTPS offloading process.

The advantage of doing this is that, on the one hand, the system application layer does not need to make any adjustments for HTTPS; on the other hand, all future HTTPS scheduling, optimization and configuration can be completed at the access layer.

Page resource replacement

Step 1: Understand Mixed Content

When a page itself is loaded over HTTPS but some of the elements inside it are requested over HTTP, the browser reports an error known as Mixed Content, which is defined in the W3C Mixed Content specification. Therefore, a secure HTTPS page should not have HTTP requests mixed into it.

Step 2: Replace http:// with //

Replace http:// with // (a protocol-relative URL) so that every element of the page follows the protocol of the original page request.
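The replacement step can be sketched as follows. This is a minimal illustration, not the tooling Suning used: the function name, the regular expression, and the example.com hosts are assumptions, and it only handles URLs inside src and href attributes.

```python
import re

# Matches http:// only where it begins a URL inside a src="..." or href="..." attribute.
_HTTP_ATTR = re.compile(r'''((?:src|href)\s*=\s*["'])http://''', re.IGNORECASE)


def to_protocol_relative(html: str) -> str:
    """Rewrite http:// URLs in src/href attributes to protocol-relative // URLs."""
    return _HTTP_ATTR.sub(r"\1//", html)


# Hypothetical page fragment: the image URL is rewritten, the https link is left alone.
page = '<img src="http://image.example.com/a.png"><a href="https://www.example.com/">home</a>'
print(to_protocol_relative(page))
```

A real migration would also need to cover URLs embedded in CSS, JavaScript, and server-side templates, which this sketch does not attempt.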

Step 3: Definition and use of x-request-url

Of course, we also ran into some pitfalls during the // replacement. For example, the following figure shows the interaction process of Suning.com's single sign-on system:

As shown in the figure, when the user's authID is invalid, a request is made to https://xxx.suning.com/authStatus for authentication. The access layer offloads HTTPS for every request, so the address seen by the business system becomes HTTP. When the business system finds that authentication is required, it responds with a 302 redirect to the single sign-on system, and the address from the second step (now HTTP) is recorded as the original page and returned to the user. The user then requests the single sign-on system. After the single sign-on system completes authentication, it redirects back to that HTTP address, which ultimately causes Mixed Content on the user side.

Therefore, we introduce x-request-url to solve the problem, as shown below:

The protocol of every original request is recorded in x-request-url. When the business system needs to authenticate, it must follow the protocol recorded in x-request-url, which avoids the user-side Mixed Content problem caused by the redirect.
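A minimal sketch of how a business system behind the access layer might honor x-request-url when building the 302 redirect. The assumptions here are that the header carries the full original URL, that header names arrive lowercased, and that the single sign-on host (sso.example.com) and the returnUrl parameter are hypothetical names chosen for illustration.

```python
from urllib.parse import urlencode, urlsplit

SSO_HOST = "sso.example.com"  # hypothetical single sign-on host


def original_url(headers: dict, offloaded_url: str) -> str:
    """Return the URL the user actually requested.

    Prefer the value recorded by the access layer in x-request-url;
    fall back to the offloaded (HTTP) URL when the header is absent.
    """
    return headers.get("x-request-url") or offloaded_url


def sso_redirect_location(headers: dict, offloaded_url: str) -> str:
    """Build the Location value for the 302 redirect to single sign-on."""
    target = original_url(headers, offloaded_url)
    scheme = urlsplit(target).scheme or "http"
    # Both the SSO page and the return address follow the original protocol.
    return f"{scheme}://{SSO_HOST}/login?" + urlencode({"returnUrl": target})


headers = {"x-request-url": "https://xxx.example.com/authStatus"}
print(sso_redirect_location(headers, "http://xxx.example.com/authStatus"))
```

Without the header, the function falls back to the offloaded HTTP address, which reproduces the behavior that caused the problem in the first place.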

The problem that the native app cannot recognize //

The reason a browser can recognize // while a native App cannot is simple: the browser resolves // against the protocol of the current page, whereas a native App has no such built-in adaptation.

At that time, Suning had a server-side system that exposed an interface for providing images to various terminals. After the HTTPS transformation, there were no problems on either the PC or the client side. But the next day, many users suddenly could not load images, because the // requests could not be recognized in the native App.

The only solution here is for the client developers to adapt. The following figure is an example of an App that cannot recognize //.
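The client-side adaptation amounts to giving // URLs an explicit protocol before they are requested. The sketch below shows the rule in Python for consistency with the other examples; a native client would apply the same logic in its own language. The function name and the choice of https as the default are assumptions.

```python
def normalize_resource_url(url: str, default_scheme: str = "https") -> str:
    """Give a protocol-relative URL an explicit scheme; leave other URLs untouched."""
    if url.startswith("//"):
        return f"{default_scheme}:{url}"
    return url


print(normalize_resource_url("//image.example.com/a.png"))
```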

How to handle certificates and private keys on commercial CDN?

Handling certificates on the CDN is a problem that most small and mid-sized Internet companies run into, because they cannot build a complete CDN of their own the way Alibaba and JD.com do. Suning is in the same position: its CDN is a mix of self-built and commercial nodes. Once a commercial CDN is used, the question of how to serve HTTPS through it arises. If the company hands its private key to a third-party vendor, it has no control over that vendor's CDN servers. If an attacker compromises one of those servers, the private key leaks and encryption becomes meaningless.

As shown in the figure below, the approaches most widely recognized in the industry are the dual certificate strategy, Layer 4 acceleration, and the keyless solution.

  • Dual certificate strategy. The idea is simple. Between the user and the CDN, a certificate dedicated to the CDN is used for encryption and decryption. Between the CDN and the application server, the application's own certificate is used. This ensures that the application's private key never has to be given to the CDN vendor, but the fundamental problem remains: the key for the certificate deployed on the CDN can still leak, and if it does, users are still affected.
  • Layer 4 acceleration. Many CDN vendors can provide TCP acceleration, such as dynamic back-to-origin optimization. In this mode the vendor only provides a Layer 4 TCP proxy and does not cache requests, so there is no need to expose the certificate to the vendor. This method suits dynamic back-to-origin requests, such as adding to a shopping cart, submitting an order, or logging in.
  • Keyless solution. This suits scenarios such as finance. The company runs a Key Server that performs private-key computations in real time. When the CDN needs to use the private key, it sends the necessary parameters to the Key Server through an encrypted channel, and the Key Server computes the result and returns it, so the private key never leaves the company.

HTTPS Testing Strategy

When a new protocol is introduced, how to test it? The main steps are as follows:

  • Source code scanning. After developers complete the resource replacement, Jenkins traverses the code base and a shell script scans for remaining HTTP links.
  • Page crawling and scanning. We write crawler scripts to scan the links on pages in the test environment.
  • Test environment verification. Automated testing is valuable, but the core processes still need manual coverage to catch unknown effects of HTTPS on page loading. For example, a page may be accessed over HTTPS while a system behind it does not yet support HTTPS, so manual verification is required.
  • Online pre-release and traffic diversion test. Releasing the HTTPS-ready version online has no impact on users, because user traffic still arrives over HTTP. This makes online pre-release possible. After pre-release verification is complete, user traffic is switched from HTTP to HTTPS through 301 redirects. This is discussed in more depth in the grayscale section.
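The source code scanning step in the list above can be sketched as follows. The article describes Jenkins driving a shell script; this Python version is a stand-in for illustration, and the file suffixes, function names, and regular expression are assumptions rather than Suning's actual rules.

```python
import re
from pathlib import Path

# Any http:// URL up to the next whitespace, quote, angle bracket, or closing parenthesis.
HTTP_LINK = re.compile(r"""http://[^\s"'<>)]+""")

# Hypothetical set of file types worth scanning.
SCANNED_SUFFIXES = {".html", ".htm", ".js", ".css", ".jsp"}


def find_http_links(text: str) -> list[tuple[int, str]]:
    """Return (line number, URL) pairs for every http:// link found in the text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in HTTP_LINK.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


def scan_tree(root: str) -> dict[str, list[tuple[int, str]]]:
    """Walk a source tree and report the files that still contain http:// links."""
    report = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in SCANNED_SUFFIXES:
            hits = find_http_links(path.read_text(encoding="utf-8", errors="ignore"))
            if hits:
                report[str(path)] = hits
    return report
```

A CI job could call scan_tree on the checked-out repository and fail the build when the report is not empty.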

In addition, we introduced a traffic diversion test system. The idea is simple: it captures traffic based on domain names and user requests, puts the captured traffic into the Copy Server where it is amplified several times, and then sends it back to the system through the Sender. In this way, the functional and performance impact of HTTPS can be verified with real user traffic.

HTTPS solution: performance optimization

Before we talk about how to optimize the performance of HTTPS, let's take a look at the entire TLS handshake process, as shown below:

As shown in the figure, a handshake process is divided into eight steps in the worst case:

  1. The client sends a SYN packet to the Web server. The server replies with a SYN-ACK, and the client acknowledges it. At this point the connection still carries a plain HTTP request.
  2. To move from HTTP to HTTPS, a 302 or 301 redirect is required.
  3. The user sends the request again over HTTPS, which requires a new TCP handshake.
  4. The first phase of the TLS full handshake runs, from Client Hello to Server Hello.
  5. When the certificate arrives at the client for the first time, the client has to verify it, which begins with resolving the CA's domain name.
  6. A second handshake follows, this time to establish the connection to the CA site.
  7. The client checks the validity of the certificate online.
  8. The second phase of the TLS full handshake completes. The gray part at the bottom of the figure is the actual data communication.

Suning.com's full-site HTTPS solution includes a lot of work on performance optimization, such as the reasonable use of HSTS, session resume, and OCSP stapling, as well as client-side HTTPS performance optimization and HttpDNS to counter DNS hijacking.

Reasonable use of HSTS

HSTS (HTTP Strict Transport Security) is a Web security mechanism that forces clients (such as browsers) to use HTTPS when connecting to a server.

The advantage is that it removes the overhead of HTTP 302 redirects. A 302 redirect not only exposes the site the user is visiting, but is also easily hijacked by a man in the middle (downgrade attacks), and most importantly, it slows down access.

The disadvantage is that HSTS forces HTTPS on the client side until max-age expires, which the server cannot control. Therefore, when a downgrade is required, HTTPS cannot be switched back to HTTP in time. The max-age value can be adjusted dynamically, however, so the downgrade effect can be achieved by setting max-age to 0. In addition, HSTS is strict HTTPS: if the certificate is invalid, the page becomes inaccessible and users cannot choose to ignore the error.
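A small sketch of building the HSTS response header, including the max-age=0 case used for downgrade. The function name and parameters are illustrative; the header name and the max-age and includeSubDomains directives come from RFC 6797.

```python
def hsts_header(max_age_seconds: int, include_subdomains: bool = False) -> tuple[str, str]:
    """Return the (name, value) pair for a Strict-Transport-Security response header."""
    if max_age_seconds < 0:
        raise ValueError("max-age must not be negative")
    value = f"max-age={max_age_seconds}"
    if include_subdomains:
        value += "; includeSubDomains"
    return ("Strict-Transport-Security", value)


# Normal operation: ask clients to use HTTPS only for one year.
print(hsts_header(31536000, include_subdomains=True))

# Downgrade: max-age=0 tells clients to forget the HSTS policy.
print(hsts_header(0))
```

Note that a client only learns the new value the next time it receives the header over HTTPS, so a downgrade through max-age=0 does not reach every user immediately.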

Reasonable use of session resume

After the first TLS handshake between the client and the server, is a full TLS handshake still needed for later connections? This is where session reuse comes in. Session resume is a mechanism that has long been defined in the RFC standards and has been part of the protocol since its early versions.

There are two ways to reuse sessions: Session ID and Session tickets. The following figure shows the implementation process:

Session ID. The server uses the session ID in the Client Hello to query its session cache. If a matching entry exists, the existing session information is used to complete the handshake early, which is called a simplified (abbreviated) handshake. Session ID is a standard field in the TLS protocol, and all browsers on the market support it. Note that sharing SSL sessions only between multiple processes on a single machine is of little use in a cluster environment, because a later request may land on a different machine. It is therefore necessary to share Session IDs across machines, for example by storing them in Redis. In Nginx, the ssl_session_fetch_by_lua_block directive of OpenResty's lua-nginx-module can be used to look up sessions by Session ID.

Session tickets. Session tickets are a supplement to session IDs. The server encrypts the session information into a ticket and sends it to the browser, and the browser presents the ticket in subsequent handshake requests. If the server can successfully decrypt and process the ticket, the simplified handshake can be completed. The obvious advantage of session tickets is that the server does not need to spend a lot of resources storing session content. However, session tickets are only an extension of the TLS protocol, and client support at the time was not very wide, only about 60%. A global key for encryption and decryption also has to be maintained, so the security and deployment efficiency of that key need to be considered.
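The need for multi-machine sharing of Session IDs can be illustrated with a toy simulation. Nothing here performs real TLS: SharedSessionCache is an in-memory stand-in for a shared store such as Redis, and the class and method names are hypothetical.

```python
import os
from typing import Optional


class SharedSessionCache:
    """In-memory stand-in for a session store shared by all access-layer nodes."""

    def __init__(self) -> None:
        self._store: dict[bytes, bytes] = {}

    def put(self, session_id: bytes, session_state: bytes) -> None:
        self._store[session_id] = session_state

    def get(self, session_id: bytes) -> Optional[bytes]:
        return self._store.get(session_id)


class AccessNode:
    """One access-layer machine that terminates TLS (simulated)."""

    def __init__(self, name: str, cache: SharedSessionCache) -> None:
        self.name = name
        self.cache = cache

    def handshake(self, session_id: Optional[bytes]) -> tuple[str, bytes]:
        """Return ("abbreviated", id) when the session is cached, else ("full", new id)."""
        if session_id is not None and self.cache.get(session_id) is not None:
            return ("abbreviated", session_id)
        new_id = os.urandom(32)
        self.cache.put(new_id, os.urandom(48))  # placeholder for the negotiated secret
        return ("full", new_id)


cache = SharedSessionCache()
node_a, node_b = AccessNode("a", cache), AccessNode("b", cache)
kind, sid = node_a.handshake(None)      # first visit: full handshake on node a
print(kind, node_b.handshake(sid)[0])   # second visit lands on node b and still resumes
```

If node b had its own private cache instead, the second visit would fall back to a full handshake, which is the situation the shared store avoids.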

Reasonable use of OCSP stapling

OCSP stands for Online Certificate Status Protocol (RFC 6960). It is used to query the CA site for the status of a certificate, such as whether the certificate has been revoked. Usually, the browser initiates the query using the OCSP protocol, the CA returns the certificate status, and the browser then decides whether the certificate can be trusted.

The figure below shows the OCSP process:

This process is very time-consuming, because the CA site may be located abroad, which means an unstable network path and a long RTT. Is there a way to avoid requesting OCSP content directly from the CA site?

OCSP stapling achieves this. The principle of OCSP stapling is that the server queries the CA for the certificate status on behalf of the client, saving the user's time. When the browser sends the Client Hello, it includes a certificate status request extension. On seeing this extension, the server returns the OCSP content directly to the browser, which completes the certificate status check.

Since the browser does not need to query the CA site directly for the certificate status, this feature significantly improves access speed.

HTTPS solution: grayscale launch

Grayscale launch follows three principles: grayscale, downgrade, and open-closed. The grayscale principle means that the launch should proceed gradually by region, version, and user level, and that the user data collected at each stage determines the pace of the overall plan. The downgrade principle ensures that every step of the operation is reversible and can be rolled back. The open-closed principle means being open to extension and closed to modification, which is the cornerstone of reusable design.
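The grayscale and downgrade principles can be sketched as a single rule check. This is an illustration only: the rule fields, the region name, and the version and level thresholds are hypothetical and do not describe Suning's actual configuration.

```python
from dataclasses import dataclass


@dataclass
class GrayscaleRule:
    """Hypothetical rollout rule; every field name here is an assumption."""
    enabled: bool = True                   # master switch: False rolls everyone back to HTTP
    regions: frozenset = frozenset()       # regions currently included in the grayscale
    min_app_version: tuple = (0, 0, 0)     # lowest App version included
    min_user_level: int = 0                # lowest user level included


def use_https(rule: GrayscaleRule, region: str, app_version: tuple, user_level: int) -> bool:
    """Decide whether a request falls inside the current grayscale scope."""
    if not rule.enabled:
        return False
    return (
        region in rule.regions
        and app_version >= rule.min_app_version
        and user_level >= rule.min_user_level
    )


rule = GrayscaleRule(regions=frozenset({"Jiangsu"}), min_app_version=(5, 0, 0), min_user_level=2)
print(use_https(rule, "Jiangsu", (5, 1, 0), 3))
```

Expanding the rollout means widening the rule, and rolling back means flipping enabled to False, so neither operation requires touching the application code.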

HTTPS switch control

In terms of HTTPS switch control, Suning mainly builds three switches: content management, CDN, and client:

  • Content management switch. The content management switch ensures that all links maintained by operations staff can be switched between HTTP and HTTPS.
  • CDN switch. Every page needs to do a 301 redirect from HTTP to HTTPS, and these redirects are configured in the CDN.
  • Client switch. This is the switch for the mobile acceleration SDK.

New problems encountered during the launch process

After completing the switch control, we encountered some new problems during the official launch, such as the Referrer, DNS hijacking, and HTTPS performance monitoring.

Referrer

Currently, most browsers do not send Referrer information by default when a protocol downgrade occurs. The most typical scenario is clicking a link on an HTTPS page that leads to an HTTP website: the browser does not include the Referer field in the request header. A missing Referrer has a huge impact on big data analysis, because the source of the traffic can no longer be traced.

For modern browsers, this problem can be solved by adding a meta tag to the page: <meta name="referrer" content="always" />. The value always is a legacy keyword; the standardized equivalent in the Referrer Policy specification is unsafe-url.

DNS Hijacking

DNS hijacking refers to the illegal destruction of the domain name resolution process, which causes the request to be resolved to an incorrect node in order to achieve some malicious purpose. When we use HTTP, DNS anomalies may not affect the functionality of the request, but HTTPS will definitely not be able to respond because the illegal node does not have a certificate and private key.

Suning's approach is to monitor the health of DNS through dial tests (active probing). As shown in the figure below, we detected a large number of DNS resolution anomalies in a certain area for Suning's Chinese Specialty Store.

DNS hijacking has a great impact on users. Once a page cannot be opened, users will think that there is something wrong with the page and will not visit it a second time.
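One check that a DNS monitoring probe could run is to compare the addresses a hostname resolves to against the addresses the site is known to serve from. The sketch below is an assumption about how such a probe might look, not a description of Suning's system; the function names and the sample addresses (taken from documentation ranges) are hypothetical.

```python
import socket


def resolve_ipv4(hostname: str) -> set[str]:
    """Resolve a hostname to its set of IPv4 addresses using the local resolver."""
    infos = socket.getaddrinfo(hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def unexpected_addresses(resolved: set[str], expected: set[str]) -> set[str]:
    """Return resolved addresses that are not among the known-good nodes."""
    return resolved - expected


# Hypothetical known-good nodes and a hypothetical probe result.
known_good = {"192.0.2.10", "192.0.2.11"}
probe_result = {"192.0.2.10", "203.0.113.7"}
print(unexpected_addresses(probe_result, known_good))
```

A probe in each region would call resolve_ipv4 for the monitored domains and raise an alert whenever unexpected_addresses returns a non-empty set.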

The following figure shows the problems that occurred in Suning.com's Hebei region:

As shown in the picture, the entire frame of the page is there, but there is no picture. It was finally determined that it was caused by DNS hijacking.

The solution here is to establish a complete risk control system: build dial test nodes across the country, and record and save the complete request and a snapshot of the page, as shown below:

At that time, after a user in the Hebei region sent a request, TCP could not establish a connection and SSL could not complete the handshake. The cause was DNS hijacking, which mapped the domain to an illegitimate node. The solution was the downgrade method mentioned above, applied by IP: if the request comes from a Hebei Mobile user, HTTPS is downgraded to HTTP, while HTTPS continues to be used everywhere else. After the local operator solves the problem, HTTPS is restored.
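The region-level downgrade can be sketched as a lookup against a set of affected (region, carrier) pairs. The data structure and names are assumptions for illustration; in practice the region and carrier would be derived from the client IP by the access layer or CDN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientInfo:
    """Region and carrier derived from the client IP (the lookup itself is not shown)."""
    region: str
    carrier: str


# Hypothetical downgrade list: clients matching an entry stay on HTTP for now.
DOWNGRADED = {("Hebei", "Mobile")}


def should_redirect_to_https(client: ClientInfo, downgraded: set = DOWNGRADED) -> bool:
    """Return False for clients in a downgraded (region, carrier) pair, True otherwise."""
    return (client.region, client.carrier) not in downgraded


print(should_redirect_to_https(ClientInfo("Hebei", "Mobile")))    # stays on HTTP
print(should_redirect_to_https(ClientInfo("Jiangsu", "Mobile")))  # redirected to HTTPS
```

Restoring HTTPS after the operator fixes the problem only requires removing the entry from the downgrade list.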

HTTPS Performance Monitoring

The following figure shows the monitoring page of Suning.com's mobile terminal:

The most important thing in an HTTPS grayscale launch is good monitoring with full coverage. To do grayscale well, you must analyze business, performance, on-site and off-site delivery, CPS, and other data at each step of the launch. Only after all the data looks normal should you gradually expand the region and continue the rollout by App version and user level.

HTTPS Future Outlook

One direction worth watching is QUIC (Quick UDP Internet Connections), a low-latency Internet transport protocol built on UDP. The TCP/IP protocol family is the foundation of the Internet. QUIC was proposed by Google and is intended as a replacement for TCP. Of the two underlying transport protocols, UDP is lighter and does much less error checking, but its reliability is weaker than that of TCP. At present, some companies abroad that are trialing QUIC emphasize that it can ensure security while keeping the handshake from slowing down data transmission. This may be the future direction of development.

【About the Author】

Zhu Yiquan, an architect at Suning Cloud Commerce IT Headquarters, has participated in the construction of a panoramic application performance monitoring platform, mobile terminal performance optimization and improvement, and the construction of a unified mobile access layer. He has led the launch and optimization of HTTPS for the entire Suning.com website. He focuses on application layer network performance optimization and has extensive experience in HTTPS, HTTP/2, and other fields. His goal is to ensure fast, stable, and secure communications in complex network environments through optimization.

The above content is compiled based on Mr. Zhu Yiquan's speech at the WOTA2017 special session on “Technical Challenges Behind E-commerce Promotions”.

Due to space constraints, some encryption suites, SPDY&HTTP/2, HTTP/2 stress testing tools, SSL hardware accelerator cards, etc. are not shared in this article. For more details, please see: Video & PPT
