From Not Knowing Where to Start to Targeted Fixes: How Suning Financial Optimized Mobile Login

[Original article from 51CTO.com] As the ancients said: "The difficult things in the world must be done starting from the easy; the great things in the world must be done starting from the small." In other words, tackle a hard problem while it is still easy to solve, and accomplish a big task by starting with its small details. The same holds for software engineering.

While optimizing login, Suning Financial followed this idea of "tackling the difficult while it is easy and achieving the great through the small", and improved login response time, success rate, and user experience.

This article introduces the principles and techniques behind Suning Financial's mobile login optimization in four parts:

  • Find the way to login optimization
  • Improve system monitoring and measurement
  • Sort out every aspect of login
  • Optimize each aspect of login

Find the way to login optimization

As our user base grew rapidly and more business systems were integrated, we received frequent user feedback about slow login responses, being kicked out of a logged-in session, and similar problems. The project team had a hard time working out how to fix them; it felt like looking for a needle in a haystack.

Our team kept coming back to two questions:

  • When a login exception occurs, why does it take so much time and manpower to reconstruct what happened?
  • Why do our developers only find out about a login anomaly after it has already happened?

"Plan for the difficult while it is still easy." We decided to set aside single-point optimizations and start with the easy part: collect data first, trace user behavior and system runtime snapshots, improve system monitoring and measurement, and build an end-to-end full-link monitoring system that yields a full-link set of metrics.

"Achieve the great by starting with the small." We then sorted out every link in the login process and applied a targeted fix to each detail, resolving the problems one by one.

Improve system monitoring and measurement

To build the end-to-end full-link monitoring system, we split the work into three small stages:

  • Collect client data
  • Connect the end-to-end link
  • Connect gateway and backend microservice monitoring

The details are as follows:

Figure 1: End-to-end monitoring system link diagram

Improve client monitoring collection

The client automatically collects, for every login method, the response time, success rate, a system runtime snapshot when an exception occurs, and the user's access trace.
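As a rough illustration of this kind of collection, the sketch below times each login attempt, records whether it succeeded, and keeps the exception text as a minimal stand-in for a runtime snapshot. The class names, field names, and login method labels are hypothetical and are not taken from the Suning Financial client.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class LoginSample:
    method: str                   # login method label, e.g. "password" (hypothetical)
    elapsed_ms: float             # measured response time
    success: bool
    error: Optional[str] = None   # exception text, a minimal "snapshot" on failure


@dataclass
class LoginMetrics:
    samples: List[LoginSample] = field(default_factory=list)

    def record(self, method: str, attempt: Callable[[], None]) -> None:
        """Run one login attempt, timing it and storing the outcome."""
        start = time.perf_counter()
        try:
            attempt()
            ok, err = True, None
        except Exception as exc:
            ok, err = False, repr(exc)
        elapsed_ms = (time.perf_counter() - start) * 1000
        self.samples.append(LoginSample(method, elapsed_ms, ok, err))

    def success_rate(self, method: str) -> float:
        """Fraction of successful attempts for one login method."""
        rows = [s for s in self.samples if s.method == method]
        return sum(s.success for s in rows) / len(rows) if rows else 0.0
```

A real client would also attach device and network state to each sample and upload the samples in batches.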

When a network anomaly occurs, a network diagnostic widget collects the relevant data, and the anomaly types that occur most often are monitored specifically.

Open end-to-end links

The client generates a globally unique uniqueFlag for every network request. The uniqueFlag is uploaded to the client monitoring system and also passed to the API gateway system as part of the client request information, which links the client logs and the backend logs across the full path.
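A minimal sketch of this idea follows. It assumes the identifier is a random UUID and that it travels in a request header; the header name `X-Unique-Flag`, the function names, and the log record shape are illustrative assumptions, since the article does not say how uniqueFlag is generated or transmitted.

```python
import uuid
from typing import Dict, List


def new_unique_flag() -> str:
    """Return a fresh identifier for one network request (UUID4 is an assumption)."""
    return uuid.uuid4().hex


def build_login_request(username: str, monitor_log: List[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Create a login request whose uniqueFlag is also reported to client monitoring."""
    flag = new_unique_flag()
    # Record for the client monitoring system.
    monitor_log.append({"uniqueFlag": flag, "event": "login_request"})
    # The same value goes to the API gateway (header name is hypothetical).
    return {"headers": {"X-Unique-Flag": flag}, "body": {"username": username}}
```

Because the same value appears in the client monitoring record and in the request seen by the gateway, the two log streams can be joined on it later.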

Connect the backend service system monitoring

The API gateway system and the backend service systems use the self-developed RSF microservice calling framework. On every microservice call, a traceID is carried through all microservices on the link. Logs such as backend business logic, exception stacks, and runtime snapshots are collected under that traceID and sent to the monitoring system asynchronously.
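RSF is an internal framework, so the sketch below does not reproduce its API. It only illustrates the general pattern, using Python's standard `contextvars` and `queue` modules: a trace identifier is set once at the gateway, is visible to every downstream call, and log records are queued rather than written synchronously. All function and service names are hypothetical.

```python
import contextvars
import queue
import uuid

# Trace identifier for the request currently being handled.
trace_id_var = contextvars.ContextVar("trace_id", default="")
# Records wait here; a background worker would ship them to monitoring.
log_queue = queue.Queue()


def log(service: str, message: str) -> None:
    """Queue a log record tagged with the current traceID (non-blocking)."""
    log_queue.put({"traceID": trace_id_var.get(), "service": service, "message": message})


def sso_service(username: str) -> bool:
    """Stand-in for a downstream microservice call."""
    log("sso", f"verifying {username}")
    return True


def gateway_login(username: str, unique_flag: str) -> bool:
    """Gateway entry point: start a trace, then call downstream services."""
    trace_id_var.set(uuid.uuid4().hex)
    log("gateway", f"login request uniqueFlag={unique_flag}")
    return sso_service(username)
```

In a real RPC framework the traceID would be serialized into each outgoing call rather than shared in process memory; the shared context variable here only stands in for that propagation.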

Both the API gateway system and the single sign-on system are connected to the self-developed second-level monitoring system (monitoring at one-second granularity), which reports system problems within seconds.

Figure 2: Dashboard of the second-level monitoring system for mobile login

Sort out every aspect of login

Login on the Suning Financial mobile client involves many systems running in a complex hardware and network environment, as shown in the following figure:

Figure 3: Factors affecting Suning Financial mobile login

Many factors are involved, spanning the client, the network, and the backend systems:

  • User network status - whether the user network is connected and available
  • User network quality - user network access method, whether it is a weak network environment
  • Client login related logic processing
  • Client Cookie Management
  • Carrier DNS Service
  • CDN Services
  • Network link quality
  • Backend system environment
  • Backend system calls
  • Backend system login-related business logic
  • Backend system cookie management
  • ...

Optimize each aspect of login

Optimizing network links

With end-to-end full-link monitoring in place, problems that used to be hard to locate became much easier to analyze.

Table 1: Login response time before optimization

The monitoring data in Table 1 show that the network time is clearly too long, which indicates that the network quality of the Suning Financial App was poor.

So the first step was to improve network quality. We did this in several ways:

Upgrading CDN services

For historical reasons, our earlier client network library did not support SNI, so it had to connect to an older version of the acceleration platform that did not support the SNI extension. The old platform had fewer edge nodes and a smaller coverage area.

We promptly updated the client's network library to support SNI and switched our financial CDN service to the new acceleration platform.
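For readers unfamiliar with SNI: it is a TLS extension in which the client names the host it wants in its first handshake message, so a shared edge server can pick the right certificate. The sketch below uses Python's standard `ssl` module to produce a ClientHello in memory, without any network connection, and shows that the requested hostname is carried in it. The hostname is a placeholder, and this is not the client library the article refers to.

```python
import ssl


def client_hello_for(hostname: str) -> bytes:
    """Return the first TLS handshake bytes a client would send for hostname."""
    ctx = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    # server_hostname is what populates the SNI extension.
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: no server is attached, we only want the outgoing bytes.
        pass
    return outgoing.read()


# Placeholder hostname, used only for illustration.
hello = client_hello_for("login.example.com")
sni_present = b"login.example.com" in hello
```

A client library without SNI support sends no hostname at this stage, which is why such a client cannot be served correctly by a platform that relies on the extension.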

Compared with the old platform, the new platform is a qualitative improvement: it is more efficient, has more nodes and wider coverage, and supports HTTP/2.

Use HTTP/2 multiplexing to speed up transmission

HTTP/2 transmits binary frames instead of text, lifts the limit on concurrent requests through full multiplexing over a single connection, and so gets the most out of each established connection. HTTP/2 gave the best network performance on 3G, 4G, and Wi-Fi networks.

Figure 4: HTTP/2.0 acceleration effect
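To make "binary frames" and "multiplexing" concrete, the sketch below packs and unpacks the fixed 9-byte HTTP/2 frame header defined by the HTTP/2 specification: a 24-bit payload length, an 8-bit type, 8-bit flags, and a reserved bit followed by a 31-bit stream identifier. It is an illustration of the wire format only, not a usable HTTP/2 implementation, and the helper names are our own.

```python
import struct
from typing import Tuple

FRAME_DATA = 0x0        # DATA frame type
FLAG_END_STREAM = 0x1   # last frame of a stream


def pack_frame(frame_type: int, flags: int, stream_id: int, payload: bytes) -> bytes:
    """Build one HTTP/2 frame: 9-byte header followed by the payload."""
    length = len(payload)
    if length >= 1 << 24:
        raise ValueError("payload too large for a 24-bit length field")
    header = struct.pack(">I", length)[1:]  # keep the low 3 bytes: 24-bit length
    # Type, flags, then stream id with the reserved top bit cleared.
    header += struct.pack(">BBI", frame_type, flags, stream_id & 0x7FFFFFFF)
    return header + payload


def unpack_header(frame: bytes) -> Tuple[int, int, int, int]:
    """Return (length, type, flags, stream_id) from a frame's first 9 bytes."""
    length = int.from_bytes(frame[0:3], "big")
    frame_type, flags = frame[3], frame[4]
    stream_id = int.from_bytes(frame[5:9], "big") & 0x7FFFFFFF
    return length, frame_type, flags, stream_id
```

Because every frame carries its stream identifier, frames belonging to different requests (for example streams 1 and 3) can be interleaved on one connection and reassembled by the receiver, which is what removes the need for one connection per concurrent request.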

After these network link optimizations, monitoring data showed that network link time improved by 200 ms on average, but we were still not satisfied.

Merge network requests and remove all redirects

Could the network link be optimized further? Link analysis showed that it involves the user's device, the carrier's base network, the CDN vendor, and server performance. Optimizing the network any further would be very hard, and the room for improvement is limited.

"Plan for the difficult while it is still easy." So we asked ourselves: could we reduce the number of unpredictable network interactions by adjusting the core login flow itself?

The logs collected by the monitoring system showed that each login involved three interactions between the client and the backend systems, two of which were redirect requests. The sequence diagram is as follows:

Figure 5: Login core process sequence diagram before optimization

Clearly we needed to take steps away, and the simplest way was to remove the redirects. By modifying the single sign-on system and the API gateway system, we reduced the three network requests between the client and the backend systems to one.

What used to be client redirect requests are now calls from the API gateway system to the other backend systems through RPC. The sequence diagram is as follows:

Figure 6: Login core process sequence diagram after optimization
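The change can be sketched as follows. The backend class, its two steps, and all names are hypothetical stand-ins, since the article does not describe what each redirect did; plain method calls stand in for the RPC calls. The last function is a deliberately crude model in which client-side network cost is one round trip per client request.

```python
from typing import Dict


class SsoBackend:
    """Stand-in for the single sign-on system; method names are hypothetical."""

    def verify_credentials(self, username: str, password: str) -> str:
        if not password:
            raise PermissionError("empty password")
        return f"ticket-for-{username}"

    def issue_session(self, ticket: str) -> str:
        return f"session-from-{ticket}"


def gateway_login(backend: SsoBackend, username: str, password: str) -> Dict[str, str]:
    """Handle login within a single client request.

    Steps the client used to reach by following redirects are now
    server-side calls made by the gateway (RPC in the article).
    """
    ticket = backend.verify_credentials(username, password)
    session = backend.issue_session(ticket)
    return {"status": "ok", "session": session}


def client_network_ms(round_trips: int, rtt_ms: float) -> float:
    """Rough client-side network cost: one round trip per client request."""
    return round_trips * rtt_ms
```

Under this model, going from three client requests to one cuts the client-side network time to a third for any given round-trip time, while the gateway-to-backend calls run inside the data center where latency is far lower and more predictable.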

After merging the network requests, the improvement was striking. The login response times are as follows:

Table 2: Login response time comparison after optimization

Standardize Cookie Management

The response time results were very good, but the login success rate was still not high enough. Analysis with the monitoring system pointed to the following main causes:

  • Network errors of various kinds: because of the nature of login scenarios, user network anomalies and unstable connections are common.
  • The cookie is invalid or empty. There are two cases:
    Case 1: The user has not used the app for more than 15 days, so the login cookie expires normally. This is expected login business logic.
    Case 2: The cookie is abnormally invalid or empty. Our analysis shows that this is mainly caused by chaotic cookie management across multiple client and backend systems, and it is also the main reason users get kicked out of their login session.
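The two cookie cases above can be told apart mechanically once the time of the user's last activity is known. The sketch below is one possible way to label such events for monitoring; the function name and labels are hypothetical, and only the 15-day lifetime comes from the article.

```python
from datetime import datetime, timedelta

COOKIE_LIFETIME = timedelta(days=15)  # login cookie lifetime stated in the article


def classify_cookie_failure(last_active: datetime, now: datetime) -> str:
    """Label an invalid-or-empty cookie event for the monitoring system."""
    if now - last_active > COOKIE_LIFETIME:
        return "expired_normally"  # Case 1: expected business logic
    return "abnormal"              # Case 2: worth investigating
```

Separating the two labels keeps expected expirations from inflating the failure count that the team is trying to drive down.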

How can cookies be managed in a more standardized way? We standardized frontend and backend cookie management in the following respects:

Standardize Cookie Management of Backend Systems

The backend systems (the API gateway system and the single sign-on system) now set cookie attributes in a uniform, standardized way, especially domain, path, and expires.
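One way to enforce such uniformity is to have every backend system build its login cookie through a single shared helper. The sketch below uses Python's standard `http.cookies` module; the helper name, the cookie name, and the domain are placeholders, and the choice to set Secure, HttpOnly, and Max-Age alongside expires is our assumption rather than something the article states.

```python
from http.cookies import SimpleCookie

LOGIN_COOKIE_MAX_AGE = 15 * 24 * 60 * 60  # 15 days, in seconds


def build_login_cookie(name: str, value: str, domain: str) -> str:
    """Return a Set-Cookie value with one fixed set of attributes."""
    cookie = SimpleCookie()
    cookie[name] = value
    morsel = cookie[name]
    morsel["domain"] = domain                 # same domain rule for every system
    morsel["path"] = "/"                      # same path for every system
    morsel["expires"] = LOGIN_COOKIE_MAX_AGE  # int seconds: rendered as a date that far ahead
    morsel["max-age"] = LOGIN_COOKIE_MAX_AGE
    morsel["secure"] = True                   # assumption: HTTPS only
    morsel["httponly"] = True                 # assumption: hidden from page scripts
    return morsel.OutputString()


# Placeholder cookie name and domain, for illustration only.
header_value = build_login_cookie("login_token", "abc123", ".example.com")
```

When the gateway and the single sign-on system both go through one helper like this, a cookie set by one system cannot silently shadow or outlive the cookie set by the other.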

Standardize client cookie management, changing from manual management to automatic system management

The client has to manage login state for many business systems, both native and H5.

Because the business systems differ from one another, the earlier Android design handled cookies manually, which made later maintenance very costly.

After optimization, the Android client uses the system framework's CookieManager to manage cookies and UC's CookieManager to store them, as shown below:

Figure 7: Optimized Android Cookie Management

After cookie management was standardized, the login success rate rose significantly; see Table 3:

Table 3: Comparison of login success rates after optimization

We went from searching for a needle in a haystack whenever a login anomaly occurred to locating problems quickly and accurately, and from sorting out every link in the login process to optimizing each step. Throughout, we held to the idea of "tackling the difficult while it is easy and achieving the great through the small".

Conclusion

"The road ahead is long and arduous, yet I will keep exploring." Thanks to the unremitting efforts of the project team, login in the Suning Financial App has become more efficient and stable.

Still, in the geek spirit, our pursuit of the best possible experience and the best possible product never ends. We will keep exploring intelligent monitoring, deeper network link optimization, and new interaction methods.

Author: Zhang Xudong, Gu Yu

About the authors: Zhang Xudong is a senior engineer at Suning Financial R&D Center, mainly responsible for mobile performance monitoring and performance optimization. He has more than 7 years of work experience across social, e-commerce, education, payment, finance, and related industries, and has a deep understanding of and rich experience in mobile Internet technology R&D.

Gu Yu is a senior server engineer at Suning Financial R&D Center with 9 years of experience in mobile Internet development, specializing in server architecture and planning. He is currently responsible for Suning Financial's mobile gateway architecture and has a deep and systematic understanding of the planning, requirements, and experience of Internet servers.

[51CTO original article, please indicate the original author and source as 51CTO.com when reprinting on partner sites]
