Five-minute technical talk | A brief discussion on Android application startup optimization methods

How difficult startup speed optimization is depends closely on the specific app. In general, the more users and business features an app has, the harder startup optimization becomes, so developers working on different apps often understand startup optimization very differently. This article analyzes startup optimization in depth, from defining the problem, to breaking it down in detail, to the concrete optimization steps and the tools they require, to help developers solve startup performance problems efficiently. Apart from the tools section, which targets the Android platform, the ideas here should apply universally.

Part 01

Problem Definition  

Startup optimization is a very common task. When they hear the term, many developers instinctively interpret it as: "Startup optimization means improving the app's startup speed." This interpretation is the most direct and simple one, but it covers only part of what startup optimization involves.

Overall, startup optimization work can be summarized as follows: given limited system resources, optimize the application startup process so that the critical path of startup makes the fullest possible use of those resources, and keep startup performance within a stable range over the long term.

Any problem should be defined first; a definition reduces complexity and sets the scope of the solution. For startup optimization, the first step is to define the start and end points of the startup you want to optimize. These points may differ from app to app, and we usually recommend choosing them from the user-experience perspective in a way that suits each case, for example from the moment the user taps the app icon to the moment the main interface is displayed and usable.

Part 02

Data is the most important

Data is the top priority in optimization work because it sets the direction of that work. Optimization should follow one principle: discover problems through data, and verify their solutions through data.

Therefore, before implementing any optimization strategy, the first step is to collect startup-phase data in as much detail as possible: the current startup time, the time of each sub-phase, the time taken by each asynchronous task, CPU and other system resource usage during startup, process and thread states, and so on. All of this data should be collected and analyzed along different dimensions.
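As a minimal sketch of collecting per-phase timings, the hypothetical `StartupRecorder` below records named sub-phases and reports each phase's duration and its share of total startup time. The class, the phase names, and the use of `time.perf_counter()` are illustrative assumptions; on Android this data would more likely come from trace sections or app-level timestamps. A `FakeClock` is injected only to make the demo deterministic.

```python
import time
from contextlib import contextmanager


class StartupRecorder:
    """Hypothetical recorder for startup sub-phase durations (illustrative only)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._start = clock()   # treated as the startup start point
        self.phases = {}        # phase name -> accumulated duration in seconds

    @contextmanager
    def phase(self, name):
        begin = self._clock()
        try:
            yield
        finally:
            # Accumulate, so a phase entered twice is summed rather than overwritten.
            self.phases[name] = self.phases.get(name, 0.0) + (self._clock() - begin)

    def report(self):
        """Return {phase: (seconds, share of total elapsed time)}."""
        total = self._clock() - self._start  # "now" is treated as the end point
        return {name: (dur, dur / total if total else 0.0)
                for name, dur in self.phases.items()}


class FakeClock:
    """Deterministic clock for the demo; a real app would use the default clock."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


clock = FakeClock()
rec = StartupRecorder(clock=clock)
with rec.phase("application_on_create"):
    clock.advance(0.2)
with rec.phase("first_activity"):
    clock.advance(0.3)
print(rec.report())
```

Keeping phase names stable across builds is what later lets a change in one number be traced back to one module.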

Sorting out and collecting this data serves two purposes. First, it establishes the current metrics. Second, the process itself deepens developers' code-level understanding of the whole startup process: it helps them grasp the main flow of startup clearly and map each data item to the code logic behind it. Then, when a data item fluctuates, developers can go straight to the code module most likely responsible.

Part 03

Optimization ideas

Looking at startup speed optimization on its own, we can approach it from two directions: business process optimization and system resource utilization optimization.

3.1 Business process optimization

Business process optimization means optimizing the business processes involved in the startup phase. Here it has two aspects:

  • Non-essential tasks, that is, tasks that do not affect whether the app is usable, should be deferred as much as possible.
  • Subtasks on the critical startup path must hand off to each other efficiently, so that no task runs so long that other tasks are left waiting.
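The deferral in the first point can be sketched with a hypothetical `StartupTaskQueue` that runs essential tasks immediately and parks non-essential ones until startup is declared finished. What counts as "finished" is an assumption here: on Android it might be the first frame of the main interface, or the main thread going idle (for example via `MessageQueue.IdleHandler`).

```python
from typing import Callable, List


class StartupTaskQueue:
    """Hypothetical queue: essential tasks run now, non-essential ones wait."""

    def __init__(self) -> None:
        self._deferred: List[Callable[[], None]] = []
        self._startup_done = False
        self.log: List[str] = []       # names of tasks in the order they ran

    def submit(self, name: str, task: Callable[[], None], essential: bool) -> None:
        if essential or self._startup_done:
            # Essential work, or anything submitted after startup, runs immediately.
            self._run(name, task)
        else:
            # Non-essential work is parked until startup has finished.
            self._deferred.append(lambda: self._run(name, task))

    def on_startup_finished(self) -> None:
        """Call once the chosen startup end point has been reached."""
        self._startup_done = True
        pending, self._deferred = self._deferred, []
        for run in pending:
            run()

    def _run(self, name: str, task: Callable[[], None]) -> None:
        task()
        self.log.append(name)


queue = StartupTaskQueue()
queue.submit("init_network", lambda: None, essential=True)
queue.submit("init_analytics", lambda: None, essential=False)
queue.submit("init_router", lambda: None, essential=True)
queue.on_startup_finished()
print(queue.log)  # ['init_network', 'init_router', 'init_analytics']
```

In a real app the deferred batch would usually be spread out or moved to a background thread rather than run in one burst, so it does not cause jank right after the first frame.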

Point 1 should be easy to understand. Let’s explain point 2:

Across the whole startup process, we need to identify the critical path, which often spans several different flows. Wherever a task or flow on the critical path has to wait, that is a place to optimize.
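To make "critical path" concrete: if startup tasks and their dependencies are modeled as a directed acyclic graph with estimated durations, the critical path is the longest-duration chain through the graph, and it bounds total startup time no matter how many threads are available. The sketch below computes it with the standard-library `graphlib` module (Python 3.9+); the task names, durations, and dependencies are made up for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(durations, deps):
    """Return (length, path) of the longest-duration chain in a task DAG.

    durations: task -> estimated duration in ms
    deps:      task -> set of tasks that must finish before it starts
    """
    graph = {task: deps.get(task, set()) for task in durations}
    finish = {}     # earliest finish time of each task
    via = {}        # predecessor on the longest chain ending at each task
    for task in TopologicalSorter(graph).static_order():
        start, prev = 0, None
        for pred in deps.get(task, set()):
            if finish[pred] > start:
                start, prev = finish[pred], pred
        finish[task] = start + durations[task]
        via[task] = prev
    end = max(finish, key=finish.get)
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = via[node]
    return finish[end], path[::-1]


durations = {"app_init": 50, "load_config": 30, "init_sdk": 80,
             "fetch_ad": 120, "show_home": 40}
deps = {"load_config": {"app_init"}, "init_sdk": {"load_config"},
        "fetch_ad": {"app_init"}, "show_home": {"init_sdk", "fetch_ad"}}
print(critical_path(durations, deps))  # (210, ['app_init', 'fetch_ad', 'show_home'])
```

In this invented graph, `fetch_ad` rather than the SDK chain decides when `show_home` can start, so shortening or starting `fetch_ad` earlier is what would shorten startup.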

For example, consider an app that shows a splash-screen ad at startup: it must fetch ad data during startup and enters the main interface only after the ad finishes displaying. Fetching the ad data and displaying the ad can be seen as two related subtasks of the startup phase. If the ad data has still not arrived when the main thread reaches the ad-display phase, the main thread has to wait for it; in other words, the two subtasks (threads) are not handed off efficiently.
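One common way to shrink this wait, sketched below under stated assumptions, is to start the ad request as early as possible on a background thread and block on it only when the ad is actually needed, recording how long the main flow waited. `fetch_ad_data` and the sleep durations are hypothetical stand-ins for real network and initialization work.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_ad_data():
    # Hypothetical stand-in for the network request that returns ad data.
    time.sleep(0.05)
    return {"ad_id": "demo"}


def start_app(executor, other_init_seconds):
    # Start the ad request first so it overlaps with the rest of startup.
    ad_future = executor.submit(fetch_ad_data)
    time.sleep(other_init_seconds)         # stand-in for other initialization work
    wait_begin = time.perf_counter()
    ad = ad_future.result(timeout=1.0)     # block only when the ad is needed
    waited = time.perf_counter() - wait_begin
    return ad, waited


with ThreadPoolExecutor(max_workers=2) as pool:
    ad, waited = start_app(pool, other_init_seconds=0.01)
print(ad, f"main flow waited {waited * 1000:.0f} ms for ad data")
```

The recorded `waited` value is exactly the gap described above; logging it per launch shows how often the two subtasks fail to line up. A real app would usually also apply a deadline and skip the ad when the data is late; the `timeout` argument is where such a policy could go.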

Within business process optimization, most attention goes to managing startup tasks. Many published articles already describe how to design an asynchronous framework for managing startup tasks, so I will not repeat that here; instead, I will add a few other aspects of task management:

  • Process management, and startup task management for different processes. On Android, most large apps use multiple processes, so the initialization timing of every process other than the main process should be a focus of startup optimization; this also comes up under system resource optimization. In addition, the task framework should support different startup tasks in different processes, and the same startup task should be able to run in different modes (synchronous or asynchronous) in different processes.
  • The startup framework should collect statistics on task execution time, task waiting time, and overall task throughput, and can even adjust the number of concurrent threads based on this data.
  • The startup process should be divided into stages, with each task executed in a designated stage.
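The three points above can be combined into a small sketch. Each hypothetical `StartupTask` declares a stage and, per process name, whether it runs synchronously or asynchronously (or not at all); `StartupScheduler` runs one stage at a time for a single process and records each task's wait and execution time. The data model, process names, and stage numbers are illustrative assumptions rather than an existing framework, and dependencies between tasks within a stage are omitted for brevity.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List

SYNC, ASYNC = "sync", "async"


@dataclass
class StartupTask:
    name: str
    stage: int                 # stages run in ascending order
    modes: Dict[str, str]      # process name -> SYNC or ASYNC; absent means "skip"
    action: Callable[[], None]


@dataclass
class TaskStats:
    wait_s: float              # time between submission and start
    exec_s: float              # time spent running the task


@dataclass
class StartupScheduler:
    process: str               # e.g. "main" or ":push" (hypothetical names)
    max_workers: int = 4       # knob a data-driven tuner could adjust
    stats: Dict[str, TaskStats] = field(default_factory=dict)

    def _run(self, task: StartupTask, submitted_at: float) -> None:
        started = time.perf_counter()
        task.action()
        # Single dict assignments are thread-safe in CPython, so workers record here.
        self.stats[task.name] = TaskStats(started - submitted_at,
                                          time.perf_counter() - started)

    def run(self, tasks: List[StartupTask]) -> float:
        """Run this process's tasks stage by stage; return elapsed seconds."""
        begin = time.perf_counter()
        mine = [t for t in tasks if self.process in t.modes]
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            for stage in sorted({t.stage for t in mine}):
                futures = []
                for task in [t for t in mine if t.stage == stage]:
                    submitted = time.perf_counter()
                    if task.modes[self.process] == ASYNC:
                        futures.append(pool.submit(self._run, task, submitted))
                    else:
                        self._run(task, submitted)
                for f in futures:
                    f.result()  # the stage ends when its async tasks finish
        return time.perf_counter() - begin

    def throughput(self, elapsed_s: float) -> float:
        """Completed tasks per second over a run of elapsed_s seconds."""
        return len(self.stats) / elapsed_s if elapsed_s > 0 else 0.0


tasks = [
    StartupTask("crash_reporter", 0, {"main": SYNC, ":push": SYNC}, lambda: None),
    StartupTask("image_loader", 1, {"main": ASYNC}, lambda: time.sleep(0.01)),
    StartupTask("push_client", 1, {":push": SYNC}, lambda: None),
]
main = StartupScheduler("main")
elapsed = main.run(tasks)
print(sorted(main.stats), f"{main.throughput(elapsed):.1f} tasks/s")
```

Large `wait_s` values on async tasks suggest the pool is saturated, which is the kind of signal the second point proposes feeding back into `max_workers`.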

3.2 System Resource Optimization

Business process optimization and system resource utilization optimization are not separate. Business process optimization ensures that we use system resources as effectively as possible, while system resource utilization optimization lets us check, from another angle, whether the current business process is reasonable.

Many system resource indicators are hard to measure: for example, how do you measure CPU utilization during startup, or how efficiently threads execute? As a result, it is difficult to collect online data for this part of the work, and developers should focus on improving development tools and finding problems through those tools.
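As a rough starting point for the CPU question, utilization over an interval can be estimated as CPU time consumed divided by (wall-clock time multiplied by the number of cores). The sketch below applies that formula to the current Python process using `time.process_time()`, which counts CPU time across all of the process's threads; on Android, a comparable CPU-time figure could come from `android.os.Process.getElapsedCpuTime()` or `/proc/self/stat`. This is a whole-process average and says nothing about which threads got the CPU, which is why trace-based tools are still needed.

```python
import os
import time


def measure_cpu_utilization(work):
    """Run `work()` and estimate this process's CPU utilization while it ran.

    utilization = CPU time consumed / (wall time * number of cores)
    """
    cores = os.cpu_count() or 1
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    work()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start   # user + system time, all threads
    return cpu / (wall * cores) if wall > 0 else 0.0


busy = measure_cpu_utilization(lambda: sum(i * i for i in range(300_000)))
idle = measure_cpu_utilization(lambda: time.sleep(0.05))
print(f"busy: {busy:.1%} of all cores, idle: {idle:.1%}")
```

A single busy thread on an 8-core device can reach at most about 12.5% by this measure, so a low number does not by itself mean the critical path is getting enough CPU.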

System resource optimization mainly focuses on the following indicators:

  • Whether the CPU is used reasonably. Pay attention to thread lock contention, whether the main thread gets enough CPU time slices, the main thread's execution states (runnable, running, sleeping), whether the main process gets too little CPU time, and whether threads compete excessively. Related optimization directions include finding and resolving lock contention, adjusting thread priorities, controlling when processes start, and checking whether IO blocks threads.
  • Whether IO is used reasonably (both network and local IO). Pay attention to frequent IO, unnecessary IO during startup, IO that blocks the main thread, reads and writes of large files, and so on.
  • Thread status, including whether thread pools are used properly, whether there are too many threads, and whether there are delays in task coordination between threads.
  • Memory, mainly whether garbage collection (GC) runs too often and whether the startup phase occupies too much heap memory.
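To illustrate the CPU indicators above, the sketch below summarizes one thread's state intervals (the kind of data shown in Perfetto's thread-state tracks) into time per state within the startup window. A large Runnable share means the thread was ready but not scheduled, a hint of CPU contention; long sleeping or uninterruptible stretches point toward locks or IO. The interval format, the state labels, and the 15% threshold are simplified assumptions, not Perfetto's actual schema or a recommended limit.

```python
from collections import defaultdict


def summarize_states(intervals, window_start, window_end):
    """Sum time per thread state inside [window_start, window_end).

    intervals: iterable of (start_ms, end_ms, state) for one thread.
    Returns {state: (total_ms, share_of_window)}.
    """
    totals = defaultdict(float)
    for start, end, state in intervals:
        # Clip each interval to the startup window before accumulating.
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:
            totals[state] += overlap
    window = window_end - window_start
    return {state: (t, t / window) for state, t in totals.items()}


def looks_cpu_starved(summary, runnable_share=0.15):
    # Hypothetical rule of thumb: flag the thread if it spent more than
    # `runnable_share` of the window ready to run but not running.
    return summary.get("Runnable", (0.0, 0.0))[1] > runnable_share


main_thread = [
    (0, 300, "Running"),
    (300, 480, "Runnable"),
    (480, 500, "Sleeping"),
    (500, 900, "Running"),
    (900, 1100, "Uninterruptible (IO)"),
]
summary = summarize_states(main_thread, window_start=0, window_end=1000)
print(summary)
print("possible CPU contention:", looks_cpu_starved(summary))
```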

These two directions cover most of the work involved in optimizing the startup speed metric. Note that this work is not done once and for all; it needs to be revisited regularly, because the startup profile changes as optimization proceeds (for example, locks that previously did not conflict may start to).

It is also worth noting that each specific optimization strategy should be tested thoroughly in a local lab environment, for example by evaluating its benefit on high-end, mid-range, and low-end devices and on different network types. Because device models in the field vary widely and network conditions are complex, a strategy often fails to deliver the expected effect after release, so it is best to verify each optimization by comparing data in an online A/B experiment.
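A minimal sketch of comparing control and treatment groups per device tier, using medians so a few outliers do not dominate. The tier labels and sample numbers are invented; a real analysis would also need adequate sample sizes and a significance test, which this sketch omits.

```python
from statistics import median


def compare_by_tier(control, treatment):
    """control / treatment: tier -> list of startup times in ms.

    Returns tier -> (control median, treatment median, relative change).
    A negative relative change means the treatment started faster.
    """
    result = {}
    for tier in sorted(control.keys() & treatment.keys()):
        c, t = median(control[tier]), median(treatment[tier])
        result[tier] = (c, t, (t - c) / c)
    return result


control = {"high": [800, 820, 790], "mid": [1200, 1250, 1180], "low": [2100, 2300, 2200]}
treatment = {"high": [780, 790, 770], "mid": [1100, 1150, 1120], "low": [2250, 2300, 2280]}
for tier, (c, t, change) in compare_by_tier(control, treatment).items():
    print(f"{tier:>4}: {c:.0f} ms -> {t:.0f} ms ({change:+.1%})")
```

In this invented data the strategy helps high-end and mid-range devices but regresses low-end ones, the kind of result a single overall average could hide.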

3.3 Continuous Performance Guarantee - Fighting Entropy

Another important aspect of startup optimization is ensuring that the gains achieved so far do not erode as features are added. Optimizing flows and resource usage essentially imposes a certain order on code execution, and that order is what keeps resource utilization high on the critical startup path. However, as the codebase grows and the app takes on more business, this order is very easy to break. By analogy with the law of increasing entropy, if we take no action, the system will inevitably drift toward disorder: no matter how much effort went into earlier optimization, performance will gradually deteriorate, and if this trend is not addressed, all of that effort will be wasted.

So how do we fight entropy? In daily life, a stretch of road is repaired from time to time; that repair work is a way of fighting entropy, and it is also the strategy engineering should adopt: find problems and fix them. In code, however, problems are not so easy to see. The data can tell us that performance has deteriorated, but pinpointing the exact cause often takes a great deal of effort. For continuous assurance, then, the goal is to find problems as early as possible and to locate and fix them as quickly as possible.

The continuous assurance mechanism requires building a laboratory performance testing environment, together with the following capabilities:

  • Establish a performance baseline, so that performance issues introduced by code merges during daily development are caught promptly and traced directly to the responsible merge request (MR), greatly reducing the effort of locating problems;
  • Continuously monitor online performance data during the grayscale (staged rollout) phase, and check the version's performance impact before full release;
  • Establish online performance metrics and alerting to catch performance issues caused by online feature changes;
  • Use per-phase metrics that directly show how a change affects each startup phase, narrowing the scope of problem location.
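The baseline check in the first point could look roughly like the sketch below, run by a CI job for each merge request: compare per-phase startup times from a lab run against a stored baseline and report phases that grew beyond both a relative and an absolute tolerance. The 5% and 10 ms tolerances and the phase names are assumptions for illustration; in practice they would be tuned to the lab environment's noise.

```python
def find_regressions(baseline, current, rel_tol=0.05, abs_floor_ms=10.0):
    """Return {phase: growth_ms} for phases that regressed beyond both limits.

    baseline / current: phase name -> median startup time in ms from lab runs.
    The absolute floor keeps very short phases from tripping on noise.
    """
    regressions = {}
    for phase, base in baseline.items():
        now = current.get(phase)
        if now is None:
            continue  # phase missing from this run; worth reporting separately
        delta = now - base
        if delta > abs_floor_ms and delta > base * rel_tol:
            regressions[phase] = delta
    return regressions


baseline = {"application_on_create": 320.0, "splash": 150.0, "home_first_frame": 410.0}
current = {"application_on_create": 380.0, "splash": 155.0, "home_first_frame": 415.0}
print(find_regressions(baseline, current))  # {'application_on_create': 60.0}
```

A CI job could fail the merge request whenever the result is non-empty, which ties a regression directly to the MR that introduced it and names the phase to look at.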

In general, for startup performance optimization we need to establish the workflow shown in the figure below to keep the optimization work efficient and effective and its results lasting:

[Figure: startup performance optimization workflow]

Above, we have mainly outlined the directions of thinking for startup optimization work, without covering every optimization point within them. Note that optimization strategies must be adapted to each app's circumstances: identify problem points based on your own situation and set priorities, deciding which strategies come first and which are not worth implementing (considering ROI). At the same time, evaluate the expected benefit of each strategy in advance, rather than listing every strategy and working through them one by one.

Part 04

Optimization Tools (Android)

As the saying goes, to do a good job, one must first sharpen one's tools. In optimization work, we often run into problems like these:

  • The startup process is long, involves many tasks, and its code is complex, so sorting out the overall startup flow takes a lot of effort;
  • System resources during startup, such as CPU usage, multi-thread lock contention, memory, and IO, are hard to measure;
  • Time-consuming functions or bottleneck flows during startup cannot be located precisely;
  • Before an optimization strategy is released, its actual benefit cannot be measured accurately;
  • ...

For specific areas such as thread optimization and IO optimization, tools like Matrix can help us find problems. Here I want to focus on Perfetto, the tool provided by Android, because it shows the entire system startup process and how the various system resources are used. Moreover, on top of the APIs Perfetto provides, we can build powerful automated analysis tools for finding problems, locating them, and evaluating the effectiveness of optimization strategies.

4.1 Introduction to Perfetto Features

First, Perfetto's visualization tools let us analyze how the app behaved over a period of time from many different angles. Here are a few simple examples:

Check the CPU time occupied by each process over a period of time:

[Figure: CPU time occupied by each process over a period of time]

Check the CPU usage of threads in different processes over a period of time:

[Figure: CPU usage of threads in different processes over a period of time]

Display thread execution status information over a period of time:

[Figure: thread execution states over a period of time]

Beyond these examples, the visualization tools can also analyze multi-thread lock contention, file IO, heap memory changes, and more, all of which are very helpful in day-to-day optimization work. These are only examples; many more features are best explored through hands-on use.

The real power of Perfetto lies not in its visualization tools but in its data collection and analysis capabilities:

  • Collected system data is organized into tables that support SQL queries, so developers can run customized analyses over those tables with SQL;
  • It provides a Tracing SDK that lets app developers add custom events to Perfetto trace files, and those events are then available during analysis;
  • It provides a Python API, so developers can analyze trace files with Python.
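To show what SQL-based analysis looks like, the sketch below builds a tiny in-memory SQLite table shaped like a heavily simplified version of Perfetto's `slice` table (timestamps and durations in nanoseconds) and queries the slowest main-thread slices that finish inside the startup window. The reduced schema, the `thread_name` column, and the rows are invented for illustration; against a real trace, a similar query would be run through Perfetto's trace processor (for example via the `TraceProcessor` class in the `perfetto` Python package), where slices are linked to threads through track tables instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for Perfetto's slice table (assumed schema, not the real one).
conn.execute("CREATE TABLE slice (ts INTEGER, dur INTEGER, name TEXT, thread_name TEXT)")
rows = [
    (0,           40_000_000,  "bindApplication",       "main"),
    (40_000_000,  25_000_000,  "Application.onCreate",  "main"),
    (45_000_000,  120_000_000, "fetchAdData",           "io-worker"),
    (70_000_000,  90_000_000,  "activityStart",         "main"),
    (170_000_000, 60_000_000,  "Choreographer#doFrame", "main"),
    (400_000_000, 30_000_000,  "loadRecommendations",   "main"),
]
conn.executemany("INSERT INTO slice VALUES (?, ?, ?, ?)", rows)

STARTUP_END_NS = 250_000_000  # assumed startup end point (e.g. first frame done)

query = """
SELECT name, dur / 1e6 AS dur_ms
FROM slice
WHERE thread_name = 'main' AND ts + dur <= ?
ORDER BY dur DESC
LIMIT 3
"""
top = conn.execute(query, (STARTUP_END_NS,)).fetchall()
for name, dur_ms in top:
    print(f"{name}: {dur_ms:.1f} ms")
```

Because the analysis is just a query, the same SQL can be saved and rerun against every new trace, which is what makes the pipeline automation described next practical.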

In fact, Perfetto gives us a toolset for automatically analyzing app performance and discovering performance problems. Combined with the daily development pipeline, this automation can be a great help in detecting performance problems early and preventing regressions.

As the app iterates from version to version, its startup data changes accordingly, so startup optimization also needs long-term iteration to keep the startup experience good. This article has summarized the ideas, implementation methods, and tools of startup optimization work. In practice, apps with different user scales face different optimization problems; I hope this summary offers some inspiration.
