How to improve JD shopping cart performance by 30%

01 Background

Challenges facing shopping carts:

1) New business: as business models diversify, the shopping cart keeps taking on new business scenarios, and the number of external interfaces it depends on keeps growing;

2) Downward migration: some interfaces that the front end used to call directly have been moved down into the shopping cart middle platform;

3) Pre-placement: many steps from the checkout flow, such as coupons and Jingdou, have been moved forward into the shopping cart;

4) Expansion: to improve the user experience, the number of items the shopping cart can hold keeps increasing.

As a result, both the number of RPC interfaces the shopping cart depends on and the number of paged calls it makes keep increasing. As the entry point of the transaction flow, the shopping cart already carries heavy traffic. Improving performance and protecting the user experience under this business complexity has become a major challenge.

02 Fully asynchronous transformation solution

Adding server resources can relieve the problem to some extent, but it is costly and runs against the spirit of craftsmanship. Can performance be improved by technical means instead? Our analysis showed that an asynchronous transformation is an effective way to solve the problem.

1) Different RPCs in parallel

The shopping cart depends on dozens of interfaces with complex dependencies among them. We first sort out these dependencies and identify which calls can run in parallel. We then split the original code into two phases: issuing the asynchronous RPC requests, and processing the results. Following the dependency graph, we run as many RPCs in parallel as possible, which shortens the time spent waiting for asynchronous responses in the result-processing phase and thereby improves performance.
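The two-phase split can be sketched with plain `CompletableFuture`. This is a minimal illustration, not the shopping cart's actual code: `priceAsync`, `stockAsync`, and `promotionAsync` are hypothetical stand-ins for real RPC clients, and the assumed dependency (promotion needs the price result) is invented for the example.

```java
import java.util.concurrent.CompletableFuture;

final class ParallelRpcSketch {
    // Hypothetical asynchronous RPC stubs; a real client would return the framework's future.
    static CompletableFuture<String> priceAsync(String sku) {
        return CompletableFuture.supplyAsync(() -> "price:" + sku);
    }

    static CompletableFuture<String> stockAsync(String sku) {
        return CompletableFuture.supplyAsync(() -> "stock:" + sku);
    }

    // Assumed to depend on the price result, so it cannot start until price is available.
    static CompletableFuture<String> promotionAsync(String price) {
        return CompletableFuture.supplyAsync(() -> "promo(" + price + ")");
    }

    static String loadCartItem(String sku) {
        // Phase 1: issue every independent request up front so they run in parallel.
        CompletableFuture<String> price = priceAsync(sku);
        CompletableFuture<String> stock = stockAsync(sku);
        // A dependent call is chained, so it starts as soon as its input arrives.
        CompletableFuture<String> promo = price.thenCompose(ParallelRpcSketch::promotionAsync);

        // Phase 2: result processing; total wait is bounded by the slowest dependency chain.
        return price.join() + "|" + stock.join() + "|" + promo.join();
    }
}
```

Calling `ParallelRpcSketch.loadCartItem("A1")` waits roughly as long as the price-then-promotion chain rather than the sum of all three calls.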

2) Batch interface multi-page parallel

Most interfaces the shopping cart depends on are batch interfaces that limit the amount of data per call, so large requests must be split into multiple paged calls. These pages can also be requested in parallel. The transformation wraps this in an asynchronous paging tool so that the business layer is unaware of the paging logic: the tool automatically splits data that exceeds the interface limit into multiple pages and calls them in parallel, improving the response time of a single interface.
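A paging helper along these lines might look as follows. It is only a sketch under assumptions: the class name, the `callPaged` signature, and the `batchCall` function are hypothetical, and the real tool is built on JSF rather than on a bare `Function`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

final class AsyncPagingSketch {
    // Split the request into pages of at most `limit` items each.
    static <T> List<List<T>> split(List<T> items, int limit) {
        List<List<T>> pages = new ArrayList<>();
        for (int i = 0; i < items.size(); i += limit) {
            pages.add(new ArrayList<>(items.subList(i, Math.min(i + limit, items.size()))));
        }
        return pages;
    }

    // Issue one asynchronous batch call per page without waiting in between,
    // so all pages are in flight at the same time.
    static <T, R> List<CompletableFuture<List<R>>> callPaged(
            List<T> items, int limit, Function<List<T>, CompletableFuture<List<R>>> batchCall) {
        List<CompletableFuture<List<R>>> futures = new ArrayList<>();
        for (List<T> page : split(items, limit)) {
            futures.add(batchCall.apply(page));
        }
        return futures; // one future per page, left unmerged for the caller
    }
}
```

The business layer passes the full item list and the interface limit, and never sees how many pages were created.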

3) The underlying layer uses JSF asynchronous calls

The asynchronous calls are built on JSF, JD's RPC framework. Version 1.7.5 or later is recommended because it supports CompletableFuture.

03 Problems and Solutions

The overall design of the asynchronous transformation is not complicated, but the implementation ran into many details:

1) Exception retry needs to be refined

With synchronous calls, a timeout triggers a retry at the call site. After the switch to asynchronous calls, that retry never fires, because issuing the request generally does not raise an error. The retry has to wait until the asynchronous response times out in the result-processing phase.

In addition, when multiple pages run in parallel and one page times out, only the failed page should be retried. Because the bottom layer encapsulates the paged calls, the business code cannot tell which page timed out when it fetches the data. The request context must therefore be saved in a wrapper class at the time of the asynchronous call and returned to the business layer together with the result, so that when a Get times out, only the failed page is retried.

Not every exception warrants a retry. Exceptions such as rate limiting must not be retried. The underlying tool filters out rate-limiting exceptions automatically and also supports custom rules.
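The retry rules above can be sketched as follows. All names here are hypothetical: `PageCall` stands in for the wrapper class, `RateLimitedException` for whatever exception the framework raises on rate limiting, and the `retryable` predicate for the custom rule hook. The sketch retries once, which is an assumption and not a detail stated above.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;
import java.util.function.Predicate;

final class PageRetrySketch {
    /** Hypothetical stand-in for a rate-limiting error. */
    static final class RateLimitedException extends RuntimeException {
        RateLimitedException(String message) { super(message); }
    }

    /** Wrapper that keeps a page's request next to its future, so that page alone can be retried. */
    static final class PageCall<T, R> {
        final List<T> request;
        final CompletableFuture<R> future;
        PageCall(List<T> request, CompletableFuture<R> future) {
            this.request = request;
            this.future = future;
        }
    }

    /** Default rule: retry everything except rate limiting and interruption. */
    static final Predicate<Throwable> DEFAULT_RETRYABLE =
            t -> !(t instanceof RateLimitedException) && !(t instanceof InterruptedException);

    /** Fetch one page; on a retryable failure, re-issue only this page once. */
    static <T, R> R getWithRetry(PageCall<T, R> call, Function<List<T>, CompletableFuture<R>> rpc,
                                 long timeoutMs, Predicate<Throwable> retryable) {
        try {
            return await(call.future, timeoutMs);
        } catch (IllegalStateException first) {
            Throwable cause = first.getCause() != null ? first.getCause() : first;
            if (!retryable.test(cause)) {
                throw first; // e.g. rate limited: retrying would only add load
            }
            return await(rpc.apply(call.request), timeoutMs);
        }
    }

    // Wait for a result, converting checked exceptions into IllegalStateException with the cause attached.
    private static <R> R await(CompletableFuture<R> future, long timeoutMs) {
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            throw new IllegalStateException("page timed out", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("page failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
    }
}
```

A custom rule is just another predicate, for example `DEFAULT_RETRYABLE.and(t -> !(t instanceof IllegalArgumentException))`.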

2) Asynchronous RPC monitoring is more complicated

The underlying RPC latency monitoring has to be split into two parts: the start time is recorded when the paged call is issued, and the end time is recorded when the asynchronous result arrives. If the call fails or the Get times out, the call must be marked as failed. Retries must be timed as well, and normal calls and retry calls must be recorded separately.

Besides RPC latency, the Get waiting time in the result-processing phase must also be monitored, since this is the time that actually affects application performance. Because the underlying layer makes paged calls, the number of business calls differs from the number of underlying RPC calls.
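One possible shape for this two-part monitoring is sketched below. The class and its counters are hypothetical and use in-memory atomics for illustration; a real deployment would report to its own monitoring system instead.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

final class AsyncRpcMonitorSketch {
    final AtomicInteger rpcCalls = new AtomicInteger();    // normal page calls
    final AtomicInteger retryCalls = new AtomicInteger();  // retry calls, counted separately
    final AtomicInteger failures = new AtomicInteger();    // failed calls and Get timeouts
    final AtomicLong rpcNanos = new AtomicLong();          // issue-to-response time
    final AtomicLong getWaitNanos = new AtomicLong();      // time the caller actually blocked

    // Part 1: record the start when the call is issued and the end when the response arrives.
    <R> CompletableFuture<R> track(Supplier<CompletableFuture<R>> call, boolean isRetry) {
        long start = System.nanoTime();
        (isRetry ? retryCalls : rpcCalls).incrementAndGet();
        CompletableFuture<R> future = call.get();
        future.whenComplete((result, error) -> {
            rpcNanos.addAndGet(System.nanoTime() - start);
            if (error != null) {
                failures.incrementAndGet();
            }
        });
        return future; // the original future is returned, not a derived one
    }

    // Part 2: record how long result processing waited; returns null when no result is available.
    <R> R timedGet(CompletableFuture<R> future, long timeoutMs) {
        long start = System.nanoTime();
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            failures.incrementAndGet(); // a Get timeout marks the call as failed
            return null;
        } catch (ExecutionException e) {
            return null; // already counted by the completion callback in track()
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            getWaitNanos.addAndGet(System.nanoTime() - start);
        }
    }
}
```

Keeping `rpcNanos` and `getWaitNanos` apart shows the case where an RPC is slow but the caller barely waits because other work overlapped with it.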

3) Paged asynchronous results must not be merged, otherwise the failing Provider's information cannot be obtained

The results of the underlying asynchronous calls must be returned to the upper layer unchanged, via the wrapper class. Besides the single-page retry described above, the other reason is that the asynchronous result must be retained so that the Provider that timed out can be logged after a page times out. The Provider information is carried by the JSF framework's JSFCompletableFuture, so merging the results in the bottom layer would lose it.

4) The timeout of each page must be controlled separately

The paging call process is shown in the figure above. During result processing, the Get timeout of each page must be controlled separately because results are fetched sequentially: when fetching a later page, the time already spent waiting on earlier pages must be deducted, so that the total time to fetch all results does not exceed the timeout of a single page. The formula is:

Timeout = RPC timeout > (current time - asynchronous call start time) ? RPC timeout - (current time - asynchronous call start time) : 0
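In Java the formula reduces to a `Math.max` against zero. The sketch below applies it while fetching pages in order; the class and method names are hypothetical, and returning `null` for a page that failed or timed out is a simplification for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class PageTimeoutSketch {
    // Same as the formula above: whatever is left of the RPC timeout, never negative.
    static long remainingTimeoutMs(long rpcTimeoutMs, long startMs, long nowMs) {
        return Math.max(0L, rpcTimeoutMs - (nowMs - startMs));
    }

    // Fetch pages sequentially; every Get shares the budget that started at startMs.
    static <R> List<R> getAll(List<CompletableFuture<R>> pages, long rpcTimeoutMs, long startMs) {
        List<R> results = new ArrayList<>();
        for (CompletableFuture<R> page : pages) {
            long timeout = remainingTimeoutMs(rpcTimeoutMs, startMs, System.currentTimeMillis());
            try {
                // With a timeout of 0, an already completed page still returns its result.
                results.add(page.get(timeout, TimeUnit.MILLISECONDS));
            } catch (TimeoutException | ExecutionException e) {
                results.add(null); // this page failed or ran out of time
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                results.add(null);
            }
        }
        return results;
    }
}
```

For example, with a 200 ms RPC timeout and 50 ms already spent waiting on the first page, the second page gets at most 150 ms.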

5) Paging balance

To avoid skew caused by a nearly empty last page, the requested data should be distributed evenly across the pages, which maximizes the performance of the whole request.
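One way to balance the pages is to keep the minimum page count and spread the items evenly over it. The helper below is a hypothetical sketch of that idea, not the tool's actual algorithm.

```java
import java.util.ArrayList;
import java.util.List;

final class BalancedPagingSketch {
    // Use the fewest pages that respect `limit`, then make page sizes differ by at most one.
    static <T> List<List<T>> balancedPages(List<T> items, int limit) {
        List<List<T>> pages = new ArrayList<>();
        int n = items.size();
        if (n == 0) {
            return pages;
        }
        int pageCount = (n + limit - 1) / limit; // ceil(n / limit)
        int base = n / pageCount;                // minimum items per page
        int extra = n % pageCount;               // the first `extra` pages take one more item
        int from = 0;
        for (int p = 0; p < pageCount; p++) {
            int size = base + (p < extra ? 1 : 0);
            pages.add(new ArrayList<>(items.subList(from, from + size)));
            from += size;
        }
        return pages;
    }
}
```

With 101 items and a limit of 50, naive splitting yields pages of 50, 50, and 1, while this helper yields 34, 34, and 33, so no single page dominates the response time.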

04 Benefits

After the transformation, the latency of the shopping cart's core interface dropped by 30%, which protects the user experience and saves a large amount of server resources. New RPC interfaces added later will have no significant impact on shopping cart performance as long as they are off the critical path of the call topology. Likewise, when cart capacity grows, the performance impact is relatively small, except for the few interfaces that cannot be called in pages.