CI build performance issues caused by upgrading JDK 11 on Feishu Android

Author: Qin Bingbing & Song Zhiyang

I. Summary

This article starts with the CI build performance degradation caused by upgrading Feishu Android to JDK 11. Drawing on the behavior of newer JDK versions in Docker containers and their GC defaults, it digs into the JVM and Gradle source code and details the analysis process and fixes, for reference by other teams upgrading their JDK.

II. Background

Recently, when Feishu was adapted for Android 12, targetSdkVersion and compileSdkVersion were changed to 31. After the change, the build started failing.

Many people on Stack Overflow have hit the same problem. The simple, non-invasive solution is to upgrade the JDK used for building from 8 to 11.

Feishu currently uses AGP 4.1.0. Since the future upgrade to AGP 7.0 will require JDK 11, and newer versions of Android Studio already bundle JDK 11, the JDK used for builds was also upgraded to 11.

III. Problem

After the upgrade, many colleagues reported that sub-repository component publishing (i.e., AAR publishing) had become very slow, and the overall dashboard metrics did rise significantly.

In addition to the obvious increase in the sub-repository component publishing metric, the weekly routine metric review found that the main repository packaging metric had also increased significantly, from 17 min to 26 min, a rise of about 50%.

IV. Analysis

1. Main repository packaging and sub-repository component publishing became single-threaded

Both the sub-repository component publishing metric and the main repository packaging metric peaked on 06-17. We pulled the 10 slowest main repository packaging builds from 06-17 for analysis.

The first major finding: all 10 of these builds ran single-threaded.

A normal build, by contrast, runs concurrently.

The same was true for sub-repository component publishing, which changed from concurrent to single-threaded.

2. The switch from concurrent to single-threaded is related to the JDK upgrade

We checked the properties related to concurrent builds and found that org.gradle.parallel had always been true and had not changed. Comparing the machine information, the concurrent builds used JDK 8 with 96 available cores, while the single-threaded builds used JDK 11 with only 1 available core. Preliminary analysis suggested the problem lay here: after upgrading from JDK 8 to JDK 11, concurrent builds became single-threaded, causing a significant increase in build time. Moreover, the JDK 11 upgrade was merged into the mainline on 06-13, and build time rose sharply on 06-14, which matches the timeline.

3. Overall concurrency was restored, but the metrics did not improve

To restore concurrent builds, another related property naturally comes to mind: org.gradle.workers.max.

Since PCs and servers have different numbers of available cores, to avoid hard-coding we dynamically passed the --max-workers parameter during CI packaging. After setting it, main repository packaging resumed concurrent builds, and sub-repository component publishing also became concurrent again.

However, after observing the dashboard metrics for a week, build time had not dropped significantly; it stabilized at around 25 min, far above the previous level of 17 min.

4. Time spent on key tasks did not decrease

Detailed analysis showed that the trends of ByteXTransform and DexBuilder matched the overall build trend: both stayed at a high level from 06-21 onward and never fell back. (ByteX is an open-source bytecode-processing framework from ByteDance built on the AGP Transform API; it improves Transform performance by merging multiple Transforms, which would otherwise repeat I/O serially, into a single Transform that processes classes concurrently; see the related materials for details.) ByteXTransform regressed by about 200 s and DexBuilder by about 200 s. Since these two tasks run serially, together they regressed by about 400 s, close to the overall build regression of 9 min. GC had also not improved since 06-21.

5. The behavior of the API for obtaining the CPU core count changed

Further analysis revealed that the other Transforms (for historical reasons, some Transforms have not yet been migrated to ByteX) did not regress; only ByteXTransform regressed significantly, by about 200 s. Since ByteXTransform processes classes concurrently while the other Transforms process classes in a single thread by default, we narrowed the problem down to a single line of code.

While debugging DexBuilder, we found that its core logic, convertToDexArchive, also executes concurrently.

Although concurrency had been restored via --max-workers, the OsAvailableProcessors field was still 1. In the source code, this field is obtained through ManagementFactory.getOperatingSystemMXBean().getAvailableProcessors().

ManagementFactory.getOperatingSystemMXBean().getAvailableProcessors() behaves the same as Runtime.getRuntime().availableProcessors(); both bottom out in a native method. So the suspicion was that JDK 11's native implementation now returns 1 for the core count. As a result, although the overall build had regained concurrency, ByteXTransform and DexBuilder, which rely on these APIs for their concurrency settings, were still affected, so their time consumption did not drop.

Calling these two APIs directly from a .gradle script confirmed the inference: the reported core count had indeed changed from 96 to 1.
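The same check can be reproduced outside of Gradle with a few lines of plain Java. Both calls go through the same native core-count logic, so in an affected container both print 1:

```java
import java.lang.management.ManagementFactory;

public class CoreCount {
    public static void main(String[] args) {
        // Both APIs bottom out in the same native core-count implementation.
        int fromRuntime = Runtime.getRuntime().availableProcessors();
        int fromMxBean = ManagementFactory.getOperatingSystemMXBean()
                .getAvailableProcessors();
        System.out.println("Runtime.availableProcessors()               = " + fromRuntime);
        System.out.println("OperatingSystemMXBean.availableProcessors() = " + fromMxBean);
    }
}
```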

In addition, some colleagues found that not all CI builds regressed: only CI builds running in Docker containers regressed significantly, while builds in the native Linux environment were normal. So the native implementation of the core-count API is likely container-aware.

The GC regression has the same root cause. We printed all JVM flags with -XX:+PrintFlagsFinal to verify the inference. The single-threaded builds used SerialGC: GC ran single-threaded, could not exploit the multi-core machine, and accounted for a high share of build time. The concurrent builds used G1GC with ParallelGCThreads = 63 and ConcGCThreads = 16 (about 1/4 of ParallelGCThreads): GC concurrency was high, balancing low pause times and high throughput, so GC overhead was naturally low.

// GC-related flag values for the single-threaded builds
bool UseG1GC             = false  {product} {default}
bool UseParallelGC       = false  {product} {default}
bool UseSerialGC         = true   {product} {ergonomic}
uint ParallelGCThreads   = 0      {product} {default}
uint ConcGCThreads       = 0      {product} {default}

// GC-related flag values for the concurrent builds
bool UseG1GC             = true   {product} {ergonomic}
bool UseParallelGC       = false  {product} {default}
bool UseSerialGC         = false  {product} {default}
uint ParallelGCThreads   = 63     {product} {default}
uint ConcGCThreads       = 16     {product} {ergonomic}

6. Native source code analysis

Next, we analyze the native implementation of obtaining the available core count in JDK 8 and JDK 11. Since Android Studio ships OpenJDK by default, we analyze the OpenJDK sources.

JDK 8 Implementation

On Linux, JDK 8's os::active_processor_count() simply queries the operating system for the number of online processors (sysconf(_SC_NPROCESSORS_ONLN)) and has no container awareness.

JDK 11 Implementation

In JDK 11, container support is enabled by default and the number of available cores is not explicitly set, so the count is determined by OSContainer::active_processor_count().

Querying the CPU cgroup parameters in the Docker environment and substituting them into this calculation, it is easy to see that the available core count works out to 1, which is why the native method returns 1:

cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
cat /sys/fs/cgroup/cpu/cpu.cfs_period_us
cat /sys/fs/cgroup/cpu/cpu.shares
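The calculation can be sketched in plain Java. This is a simplified model of JDK 11's OSContainer::active_processor_count(): the quota-over-shares preference matches the default of -XX:+PreferContainerQuotaForCPUCount, but treat it as an illustration, not the exact OpenJDK code; the quota/period values below are illustrative:

```java
public class ContainerCpuCount {
    /**
     * Simplified model of JDK 11's OSContainer::active_processor_count().
     * quotaUs/periodUs come from cpu.cfs_quota_us / cpu.cfs_period_us,
     * shares from cpu.shares; -1 means "no limit configured".
     */
    static int activeProcessorCount(int hostCpus, long quotaUs, long periodUs, long shares) {
        int quotaCount = 0;
        if (quotaUs > -1 && periodUs > 0) {
            quotaCount = (int) Math.ceil((double) quotaUs / (double) periodUs);
        }
        int shareCount = 0;
        if (shares > -1) {
            // 1024 is the cgroup v1 "one CPU" share baseline.
            shareCount = (int) Math.ceil((double) shares / 1024.0);
        }
        // With the default PreferContainerQuotaForCPUCount, the quota wins over shares.
        int limit = quotaCount != 0 ? quotaCount : shareCount;
        if (limit == 0) {
            return hostCpus; // no container limit configured
        }
        return Math.max(1, Math.min(limit, hostCpus));
    }

    public static void main(String[] args) {
        // Quota 100000us over a 100000us period => 1 core, even on a 96-core host.
        System.out.println(activeProcessorCount(96, 100_000, 100_000, 1024));
        // No quota or shares configured => the host count is used.
        System.out.println(activeProcessorCount(96, -1, 100_000, -1));
    }
}
```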

V. Repair

1. Set relevant JVM parameters

To summarize the analysis: the core of the problem is that under the default Docker container configuration, the return value of JDK 11's core-count API changed. The default value of org.gradle.workers.max during a Gradle build, ByteXTransform's thread count, the maxWorkers set by DexBuilder, the OsAvailableProcessors field, and the GC choice all rely on this API. With JDK 8 the API returns 96; with JDK 11 it returns 1. The fix is therefore to make JDK 11 also return 96.

From the source code, there are two main ways to fix this problem:

Set -XX:ActiveProcessorCount=[count] to specify the number of available cores for the JVM

Set -XX:-UseContainerSupport to disable containerization in JVM

Set -XX:ActiveProcessorCount=[count]

According to Oracle's official documentation and source code, you can specify the number of available cores of the JVM to affect the Gradle build.

This method suits scenarios with long-running processes, where resources must not be monopolized by a single Docker instance. For example, if a resident web-service process has no resource limits, a bug or a surge of requests can make the JVM keep requesting resources from the operating system until the process is eventually killed by Kubernetes or the OS.

Set -XX:-UseContainerSupport

According to Oracle's official documentation and source code, container support can be disabled by explicitly setting -XX:-UseContainerSupport. The JVM then determines the CPU count by querying the operating system directly, instead of from the Docker container's configuration.

This method suits scenarios where the build task is short-lived and resources should be used to the fullest to finish quickly. Our CI runs short-lived build tasks: when a task completes, the Docker instance is cached or destroyed as appropriate and its resources are released.

Selected Parameters

For CI builds, one could query the physical machine's core count and set -XX:ActiveProcessorCount accordingly. But given our usage scenario, we chose the simpler -XX:-UseContainerSupport to improve build performance.

2. How to set parameters

Setting via command line

This is the first method that comes to mind. But after executing `./gradlew clean app:lark-application:assembleProductionChinaRelease "-Dorg.gradle.jvmargs=-Xms12g -Xss4m -XX:-UseContainerSupport"`, something unexpected happened: although the OsAvailableProcessors field and ByteXTransform's time consumption returned to normal, the overall build was still single-threaded and DexBuilder's time consumption did not drop.

This is related to Gradle's build mechanism.

  • Executing the command above triggers GradleWrapperMain#main, which starts the GradleWrapperMain process (the "wrapper process" below).
  • The wrapper process parses the org.gradle.jvmargs property and passes it to the Gradle daemon process (the "daemon process" below) over a socket, so the -XX:-UseContainerSupport above only takes effect in the daemon process, not the wrapper process. The wrapper process also initializes DefaultParallelismConfiguration#maxWorkerCount and passes it to the daemon process.
  • The daemon process has container support disabled, so it obtains the correct core count through the API: the OsAvailableProcessors field displays correctly and ByteXTransform runs concurrently. But the wrapper process does not disable container support, so it obtains a core count of 1 and passes it to the daemon process, causing the overall build and DexBuilder to run single-threaded.

One puzzling point: ByteXTransform and DexBuilder are both tasks executed in the daemon process, so why did ByteXTransform recover while DexBuilder did not?

Because ByteXTransform actively calls the API itself and so obtains the correct core count, it runs concurrently. DexBuilder, however, is scheduled by the Gradle Worker API (see the related materials), and its maxWorkers is set passively (passed from the wrapper process to the daemon process). If you specify the core count for the wrapper process via -XX:ActiveProcessorCount=[count] and set a breakpoint, you will find maxWorkers = count. So when the wrapper process does not disable container support, it obtains a core count of 1, DexBuilder executes single-threaded, and its time consumption does not recover.

This raises another question: since both the overall build and DexBuilder are scheduled by the Gradle Worker API, why did the overall build regain concurrency when `./gradlew clean app:lark-application:assembleProductionChinaRelease --max-workers=96` was executed on CI earlier, while DexBuilder still did not return to normal?

Because the concurrency of DexBuilder is affected not only by maxWorkers but also by numberOfBuckets.

For release packages, DexBuilder's input is the output (minified.jar) of the upstream MinifyWithProguard task (not MinifyWithR8, because R8 is explicitly turned off). minified.jar is divided into numberOfBuckets ClassBuckets, and each ClassBucket is set on a DexWorkAction as part of its DexWorkActionParams. Finally, each DexWorkAction is submitted to a thread allocated by WorkerExecutor to complete the Class-to-DexArchive conversion.

By default, numberOfBuckets = DexArchiveBuilderTask#DEFAULT_NUM_BUCKETS = Math.max(availableProcessors / 2, 1); on a 12-core machine, that is Math.max(12 / 2, 1) = 6.

Although DexBuilder's maxWorkers was set to 12 via --max-workers, the daemon process still had container support enabled by default, and Runtime.getRuntime().availableProcessors() returned 1. numberOfBuckets was therefore not the expected 6 but 1, so classes could not be grouped and converted to dex concurrently, and DexBuilder's time consumption did not recover. The same logic applies on CI, where numberOfBuckets dropped from 48 to 1, drastically reducing concurrency.
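The bucket arithmetic above is easy to reproduce. A small sketch modeled on DEFAULT_NUM_BUCKETS as described (the helper name is ours):

```java
public class DexBuckets {
    // Modeled on AGP's DexArchiveBuilderTask#DEFAULT_NUM_BUCKETS:
    // half the visible cores, but never fewer than one bucket.
    static int defaultNumBuckets(int availableProcessors) {
        return Math.max(availableProcessors / 2, 1);
    }

    public static void main(String[] args) {
        System.out.println(defaultNumBuckets(12)); // local 12-core machine -> 6
        System.out.println(defaultNumBuckets(1));  // containerized daemon sees 1 core -> 1
        System.out.println(defaultNumBuckets(96)); // CI host with 96 cores -> 48
    }
}
```

This is why restoring maxWorkers alone is not enough: the bucket count is derived from the core count the daemon itself observes, not from --max-workers.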

Therefore, to restore overall build concurrency and bring DexBuilder's time consumption back to normal, the maxWorkers received by the daemon process must also be restored, i.e., the wrapper process must obtain the correct core count. This can be achieved by setting DEFAULT_JVM_OPTS in the gradlew script in the project root directory.

With both set, when the build command is executed, the wrapper and daemon processes both obtain the correct core count through the API, and the overall build, ByteXTransform, DexBuilder, and the OsAvailableProcessors field all return to normal.

However, while the above command works fine in the CI Docker container, on a local Mac it fails with an error that UseContainerSupport is not recognized. This could be solved by detecting the build machine and environment (local Mac, CI Linux native, CI Docker container) and setting the parameter dynamically, but that is clearly cumbersome.

Setting via an environment variable

We later found that the JAVA_TOOL_OPTIONS environment variable is checked whenever a JVM is created. One simple setting takes effect for both the wrapper and daemon processes and solves all of the above problems.

Selecting a setting method

Comparing the two setting methods, we chose the simpler one: setting -XX:-UseContainerSupport via the JAVA_TOOL_OPTIONS environment variable.

3. Keeping both old and new branches buildable

Due to Feishu's business characteristics, old branches also need long-term maintenance, and they contain build logic incompatible with JDK 11. For both new and old branches to produce packages normally, the JDK version used for the build must be set dynamically.

In addition, UseContainerSupport was only introduced in JDK 8u191 (so later JDK 8 builds have the same problem: when the education team upgraded to AGP 4.1.0 and moved to JDK 1.8.0_332, they hit the same issue). If the flag is passed to JDK 1.8.0_131, it is not recognized and the JVM fails to start.

Therefore, Feishu's final solution is to set the JDK version dynamically per branch, and to explicitly set JAVA_TOOL_OPTIONS to -XX:-UseContainerSupport only when JDK 11 is used. For other teams: if your old branches build normally with JDK 11, you can default to JDK 11 and bake this environment variable into the Docker image, with no changes to the build logic.
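The branch-dependent logic boils down to "only pass the flag to JVMs that recognize it". A hypothetical helper sketching that decision (the parsing rules are ours, based only on the facts above: the flag appeared in 8u191 and exists in JDK 10+):

```java
public class ContainerFlagGate {
    /**
     * Hypothetical helper: returns whether a JVM of the given version string
     * recognizes -XX:-UseContainerSupport (backported in 8u191; present in 10+).
     * Handles the legacy ("1.8.0_332") and new ("11.0.2") version schemes.
     */
    static boolean recognizesUseContainerSupport(String version) {
        if (version.startsWith("1.")) {
            // Legacy scheme: "1.8.0_131" -> update number 131.
            int underscore = version.indexOf('_');
            int update = underscore >= 0
                    ? Integer.parseInt(version.substring(underscore + 1))
                    : 0;
            return version.startsWith("1.8") && update >= 191;
        }
        // New scheme: "11.0.2" -> major version 11.
        int dot = version.indexOf('.');
        int major = Integer.parseInt(dot >= 0 ? version.substring(0, dot) : version);
        return major >= 10;
    }

    public static void main(String[] args) {
        System.out.println(recognizesUseContainerSupport("1.8.0_131")); // false
        System.out.println(recognizesUseContainerSupport("1.8.0_332")); // true
        System.out.println(recognizesUseContainerSupport("11.0.2"));    // true
    }
}
```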

VI. Effect

The changes were merged after 22:00 on 06-30, and overall build time dropped sharply on 07-01, returning to the level before 06-13 (when the JDK 11 upgrade was merged). ByteXTransform and DexBuilder also returned to their previous levels, the build metrics, the OsAvailableProcessors field, and GC all returned to normal, and the world was quiet again.

VII. Conclusion

Although the build performance regression was ultimately fixed, there is still much to improve across the whole cycle of introducing, discovering, and analyzing problems. For example: more thorough testing of changes to basic build tooling (Gradle, AGP, Kotlin, JDK) could catch problems in advance; a complete anti-regression mechanism could intercept them effectively; differentiated monitoring and alerting could detect regressions promptly; and a powerful automatic attribution mechanism could provide more input for problem analysis. We will continue to improve in these areas to provide a better R&D experience.
