Author: Qin Bingbing & Song Zhiyang

I. Summary

This article starts from the CI build performance degradation that Feishu Android encountered after upgrading to JDK 11. Combining the behavior of newer JDKs inside Docker containers with the accompanying GC changes, it digs into the source code of the JVM and Gradle and walks through the analysis and the fix in detail, as a reference for other teams upgrading their JDK.

II. Background

Recently, while adapting Feishu to Android 12, we changed targetSdkVersion and compileSdkVersion to 31. After the change, the build started failing; many people on Stack Overflow have run into the same problem. The simple, non-invasive solution is to upgrade the JDK used for building from 8 to 11. Feishu currently uses AGP 4.1.0; since the eventual upgrade to AGP 7.0 will require JDK 11 anyway, and newer versions of Android Studio have already laid the groundwork, we upgraded the build JDK to 11.

III. Problem

After the upgrade, many colleagues reported that sub-repository component publishing (i.e., AAR publishing) had become very slow, and the overall dashboard metrics rose sharply. Besides the obvious increase in the component-publishing metric, the weekly routine metrics review found that the main-repository packaging metric had also increased significantly, from 17 min to 26 min, a rise of about 50%.

IV. Analysis

1. Main-repository packaging and sub-repository component publishing became single-threaded

Both the component-publishing and main-repository packaging metrics peaked on 06-17, so we pulled the 10 slowest main-repository builds from that day for analysis. The first big finding: all 10 builds ran single-threaded, whereas a normal build runs concurrently. The same was true for sub-repository component publishing, which had changed from concurrent to single-threaded.

2. The switch from concurrent to single-threaded is related to the JDK upgrade

We checked the properties related to concurrent builds and found that org.gradle.parallel had always been true and had not changed. Comparing machine information, however, showed that the concurrent builds used JDK 8 with 96 available cores, while the single-threaded builds used JDK 11 with 1 available core. The preliminary conclusion was that the problem lay here: after upgrading from JDK 8 to JDK 11, concurrent builds became single-threaded, which significantly increased build time. Moreover, the JDK 11 upgrade was merged into the mainline on 06-13 and build times rose sharply on 06-14, which matches the timeline.

3. Overall concurrency was restored, but the metrics did not drop

To restore concurrent builds, the obvious next candidate is the related property org.gradle.workers.max. Since PCs and servers have different core counts, we avoided hard-coding it and instead passed the --max-workers parameter dynamically during CI packaging. After setting the parameter, main-repository packaging resumed concurrent builds, and sub-repository component publishing resumed concurrency as well. However, after observing the dashboard for a week, build time had not dropped significantly; it stabilized around 25 min, far above the previous level of 17 min.
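Gradle's documented default for --max-workers is the number of processors the JVM reports, which explains both the collapse to a single worker and why passing --max-workers explicitly restored concurrency. Below is a minimal sketch of that fallback; it is an illustration based on Gradle's documented behavior, not Gradle's actual source.

```java
// Illustrative sketch only (assumption: Gradle's documented default for
// --max-workers is the JVM-reported processor count), so a JVM that sees
// 1 core runs the build single-threaded unless the value is overridden.
public class DefaultWorkers {
    static int effectiveMaxWorkers(Integer explicitMaxWorkers) {
        int cores = Runtime.getRuntime().availableProcessors(); // 1 on JDK 11 in our Docker containers
        return (explicitMaxWorkers != null) ? explicitMaxWorkers : Math.max(cores, 1);
    }

    public static void main(String[] args) {
        System.out.println(effectiveMaxWorkers(null)); // 1  -> single-threaded build
        System.out.println(effectiveMaxWorkers(96));   // 96 -> concurrency restored via --max-workers
    }
}
```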
4. Time spent on key tasks did not decrease

Deeper analysis showed that the trends of ByteXTransform and DexBuilder were consistent with the overall build trend: both have stayed high since 06-21 without falling back. (ByteX is an open-source bytecode-processing framework from ByteDance built on the AGP Transform API; it improves Transform performance by merging multiple Transforms, which would otherwise repeat IO serially, into a single Transform that processes classes concurrently. See the related materials for details.) ByteXTransform degraded by about 200 s, and DexBuilder also degraded by about 200 s. These two tasks execute serially, so together they degraded by about 400 s, close to the overall build degradation of 9 min. The GC situation likewise had not improved since 06-21.

5. The API for obtaining the CPU core count changed behavior

Further analysis revealed that the other Transforms (for historical reasons, some Transforms have not yet been migrated to ByteX) did not degrade; only ByteXTransform degraded significantly, by about 200 s. Since ByteXTransform processes classes concurrently internally, while the other Transforms process classes single-threaded by default, we narrowed the suspicion to a single line of code. While debugging DexBuilder, we also found that its core logic, convertToDexArchive, executes concurrently. And although --max-workers had restored the concurrency of the build as a whole, the OsAvailableProcessors field was still 1; in the source code, this field is obtained through the following API:

ManagementFactory.getOperatingSystemMXBean().getAvailableProcessors()

This call has the same effect as Runtime.getRuntime().availableProcessors(), and underneath both are native methods. Based on the above, the likely cause was that the native implementation in JDK 11 makes the core-count API return 1: even though the overall build had regained concurrency, ByteXTransform and DexBuilder, which rely on this API for their concurrency settings, were still broken, so their task times did not recover. Calling the two APIs directly from a .gradle script confirmed the inference: the returned core count had indeed changed from 96 to 1. In addition, some colleagues noticed that not all CI builds had degraded: only builds running in Docker containers degraded significantly, while builds in the native Linux environment were normal. So the native implementation of the core count was likely related to the Docker container.
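For reference, the check we ran from the .gradle script is equivalent to the following standalone snippet (a minimal sketch; the values noted in the comment are what we observed in our environment):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Both calls funnel into the same native implementation, so both reported
// 96 cores under JDK 8 and 1 core under JDK 11 inside our Docker containers.
public class CoreCountCheck {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        System.out.println("OperatingSystemMXBean: " + os.getAvailableProcessors());
        System.out.println("Runtime:               " + Runtime.getRuntime().availableProcessors());
    }
}
```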
The GC degradation has the same cause. We used -XX:+PrintFlagsFinal to print all JVM parameters to verify the inference. The single-threaded builds use SerialGC: GC runs single-threaded, cannot exploit the multi-core machine, and accounts for a high share of build time. The concurrent builds use G1GC with ParallelGCThreads = 64 and ConcGCThreads = 16 (about 1/4 of ParallelGCThreads): GC concurrency is high and balances low pause times with high throughput, so GC overhead is naturally low.

// GC-related parameter values for single-threaded builds (output omitted)
// GC-related parameter values for concurrent builds (output omitted)

6. Native source code analysis

Next, we analyzed the native implementation of the available-core count in JDK 8 and JDK 11. Since Android Studio uses OpenJDK by default, we used the OpenJDK sources for the analysis.

(JDK 8 implementation: source excerpt omitted)
(JDK 11 implementation: source excerpt omitted)

By default, JDK 11 does not set the number of available cores explicitly and enables container support, so the available core count is determined by OSContainer::active_processor_count(). Querying the CPU parameters in the Docker environment and substituting them into the calculation logic, it is easy to conclude that the available core count is 1, which is why the native method returns 1:

cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
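A simplified reconstruction of that calculation for cgroup v1 is sketched below. This is hedged: the real OSContainer::active_processor_count also considers cpu.shares and related JVM flags; the file paths match the cat command above.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

// Simplified sketch of JDK 11's cgroup-v1 CPU calculation: with a CPU quota
// set, the core count is roughly ceil(cfs_quota_us / cfs_period_us), capped
// by the host core count; a quota of -1 means "unlimited".
public class ContainerCores {
    static long readLong(String path) throws Exception {
        return Long.parseLong(Files.readString(Paths.get(path)).trim());
    }

    public static void main(String[] args) throws Exception {
        long quota  = readLong("/sys/fs/cgroup/cpu/cpu.cfs_quota_us");
        long period = readLong("/sys/fs/cgroup/cpu/cpu.cfs_period_us");
        int hostCores = Runtime.getRuntime().availableProcessors(); // stand-in for the host core count
        int cores = (quota <= 0) ? hostCores
                                 : (int) Math.ceil((double) quota / (double) period);
        System.out.println(Math.min(cores, hostCores)); // 1 with the quota configured on our CI
    }
}
```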
V. Repair

1. Set the relevant JVM parameters

To summarize the analysis above, the core of the problem is that under a Docker container's default configuration, the return value of the core-count API changed in JDK 11. The default value of org.gradle.workers.max during a Gradle build, the thread count of ByteXTransform, the maxWorkers set by DexBuilder, the OsAvailableProcessors field, and the choice of GC all rely on this API. Under JDK 8 the API returned 96; under JDK 11 it returned 1. The fix, then, is to make JDK 11 also return 96 normally. From the source code, there are two main ways to do this:

Set -XX:ActiveProcessorCount=[count] to specify the number of cores available to the JVM
Set -XX:-UseContainerSupport to disable the JVM's container support

Set -XX:ActiveProcessorCount=[count]
According to Oracle's official documentation and the source code, you can specify the number of cores available to the JVM, which in turn affects the Gradle build. This approach suits resident processes, because it prevents a single Docker instance from hogging resources without bound. For example, if a resident web-service process is not resource-limited, then under a program bug or a surge of requests the JVM will keep requesting resources from the operating system until the process is killed by Kubernetes or the OS.

Set -XX:-UseContainerSupport
According to Oracle's official documentation and the source code, container support can be disabled by explicitly setting -XX:-UseContainerSupport. The CPU count is then obtained directly from the operating system rather than from the Docker container's configuration. This approach suits short build tasks, where resources should be scheduled as aggressively as possible to finish quickly. Our CI builds are short-lived: when a task completes, the Docker instance is cached or destroyed as appropriate and its resources are released.

Selected parameters
For CI builds we could query the physical machine's core count and then set -XX:ActiveProcessorCount. But given the usage scenario, we chose the simpler -XX:-UseContainerSupport to improve build performance.
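The effect of the two candidate flags can be checked with a tiny probe (a sketch; the expected outputs in the comments are from our Docker environment, not universal values):

```java
// Probe the JVM-visible core count under each candidate flag. Expected output
// inside our CI container (96 physical cores, CPU quota yielding 1):
//
//   java CoreProbe                              -> 1   (default: container support on)
//   java -XX:ActiveProcessorCount=96 CoreProbe  -> 96  (core count pinned explicitly)
//   java -XX:-UseContainerSupport CoreProbe     -> 96  (container detection disabled)
public class CoreProbe {
    public static void main(String[] args) {
        System.out.println(Runtime.getRuntime().availableProcessors());
    }
}
```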
2. How to set the parameters

Setting via the command line
This is the first method that comes to mind. But after executing "./gradlew clean app:lark-application:assembleProductionChinaRelease -Dorg.gradle.jvmargs=-Xms12g -Xss4m -XX:-UseContainerSupport", something unexpected happened: although the OsAvailableProcessors field and the ByteXTransform time returned to normal, the overall build was still single-threaded and the DexBuilder time did not drop. This is related to Gradle's build mechanism.

One puzzling point here: ByteXTransform and DexBuilder are both tasks executed in the daemon process, so why did ByteXTransform return to normal while DexBuilder did not? Because ByteXTransform actively calls the API itself, it obtains the correct core count inside the daemon and can run concurrently. DexBuilder, however, is scheduled through the Gradle Worker API (see the related materials for details), and the maxWorkers used during execution is set passively: it is passed from the wrapper process to the daemon process. If you specify the core count for the wrapper process via -XX:ActiveProcessorCount=[count] and set a breakpoint, you will find that maxWorkers = count. So as long as the wrapper process keeps container support enabled, it sees 1 core, DexBuilder executes single-threaded, and its time does not recover.

A related question raised earlier: since both the overall build and DexBuilder are scheduled through the Gradle Worker API, why did the overall build regain concurrency when "./gradlew clean app:lark-application:assembleProductionChinaRelease --max-workers=96" was executed on CI, while DexBuilder still did not recover? Because DexBuilder's concurrency is bounded not only by maxWorkers but also by numberOfBuckets. For release packages, DexBuilder's input is the output (minified.jar) of the upstream MinifyWithProguard task (not MinifyWithR8, because R8 is explicitly disabled). minified.jar is divided into numberOfBuckets ClassBuckets, each ClassBucket is attached to a DexWorkAction as part of its DexWorkActionParams, and finally each DexWorkAction is submitted to a thread assigned by WorkerExecutor to complete the conversion from classes to a DexArchive. By default, numberOfBuckets = DexArchiveBuilderTask#DEFAULT_NUM_BUCKETS = Math.max(12 / 2, 1) = 6, where 12 is the core count visible locally. But even though DexBuilder's maxWorkers was set to 12 through --max-workers, the daemon process still had container support enabled and Runtime.getRuntime().availableProcessors() returned 1, so numberOfBuckets was not the expected 6 but 1. Classes could then no longer be grouped and converted to dex concurrently, which is why DexBuilder's time did not recover. The same logic applies on CI, where numberOfBuckets dropped from 48 to 1, drastically reducing concurrency.
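A minimal reconstruction of the bucket arithmetic described above (a sketch mirroring the DexArchiveBuilderTask#DEFAULT_NUM_BUCKETS formula quoted from the source):

```java
// Reconstruction of DEFAULT_NUM_BUCKETS = max(availableProcessors / 2, 1),
// showing how the bucket count collapses when the daemon sees only 1 core.
public class NumberOfBuckets {
    static int defaultNumBuckets(int availableProcessors) {
        return Math.max(availableProcessors / 2, 1);
    }

    public static void main(String[] args) {
        System.out.println(defaultNumBuckets(12)); // 6  : local 12-core machine
        System.out.println(defaultNumBuckets(96)); // 48 : CI host with the correct core count
        System.out.println(defaultNumBuckets(1));  // 1  : containerized daemon, no dex concurrency
    }
}
```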
Therefore, to restore the build's overall concurrency and bring DexBuilder's time back to normal, the maxWorkers received by the daemon process must also be corrected; that is, the wrapper process must obtain the correct core count. This can be achieved by setting DEFAULT_JVM_OPTS in the gradlew script in the project root directory. With that in place, when the final build command is executed, both the wrapper process and the daemon process obtain the correct core count through the API, and the overall build, ByteXTransform, DexBuilder, and the OsAvailableProcessors field all return to normal. However, while this command works fine in the CI Docker container, on a local Mac it reports that UseContainerSupport cannot be recognized. This could be solved by setting the parameter dynamically according to the build machine and environment (local Mac, CI Linux native environment, CI Docker container), but that is clearly cumbersome.

Setting via environment variables
Later we found that the environment variable JAVA_TOOL_OPTIONS is detected when a JVM is created. A simple setting (e.g., JAVA_TOOL_OPTIONS=-XX:-UseContainerSupport) takes effect for both the wrapper process and the daemon process and solves all of the problems above.

Selecting the setting method
Comparing the two setting methods above, we chose the simpler one: setting -XX:-UseContainerSupport through the environment variable.

3. Keeping both new and old branches working

Due to Feishu's business characteristics, old branches also need long-term maintenance, and they contain build logic that is incompatible with JDK 11. For both new and old branches to produce packages normally, the JDK version used for the build must be selected dynamically. In addition, UseContainerSupport was introduced in JDK 8u191, so higher versions of JDK 8 exhibit the same problem: when the education team upgraded to AGP 4.1.0, they moved to JDK 1.8.0_332 and ran into it as well. Conversely, if the flag is set while building with JDK 1.8.0_131, it is not recognized and the JVM cannot be created. Feishu's final solution is therefore to select the build JDK dynamically per branch, and to explicitly set JAVA_TOOL_OPTIONS to -XX:-UseContainerSupport only when JDK 11 is used. For other teams, if old branches build normally with JDK 11, you can simply default to JDK 11 and bake this environment variable into the Docker image, with no changes to the build logic.

VI. Effect

The fixes were merged after 22:00 on 06-30, and the overall build time on 07-01 dropped significantly, returning to the level before 06-13 (when the JDK 11 upgrade was merged). The times of ByteXTransform and DexBuilder also returned to their previous levels, the build metrics returned to normal, the OsAvailableProcessors field returned to normal, the GC behavior returned to normal, and all was quiet again.

VII. Conclusion

Although the build performance degradation was eventually resolved, there are many points to improve across the whole cycle of introducing, discovering, and analyzing the problem. For example: more thorough testing of changes to basic build tooling (including Gradle, AGP, Kotlin, and the JDK) could catch problems in advance; a complete anti-regression mechanism could intercept problems effectively; differentiated monitoring and alerting could detect regressions in time; and a powerful automatic-attribution mechanism could provide more input for problem analysis. We will continue to improve in these areas to provide a better R&D experience.