Preface The idea of writing an article about Android GC came from investigating a Meizu phone's image sliding freeze problem. The frame drop and freeze problems caused by continuous GC made us think of many solutions, so we planned to take a detailed look at the principles of memory allocation and GC, why there is continuous GC, what is the difference between GC ALLOC and GC COCURRENT, whether there is a way to expand the heap memory and reduce the frequency of GC, etc. 1. JVM memory recovery mechanism 1.1 Recycling Algorithm Mark and Sweep GC Starting from the "GC Roots" set, the entire memory is traversed once, retaining all objects that can be directly or indirectly referenced by the GC Roots, and the remaining objects are treated as garbage and recycled. This algorithm requires interrupting the execution of other components in the process and may cause memory fragmentation. Copying algorithm The existing memory space is divided into two blocks, and only one of them is used at a time. During garbage collection, the surviving objects in the memory being used are copied to the unused memory block. After that, all objects in the memory block being used are cleared, the roles of the two memories are swapped, and garbage collection is completed. Mark-Compact First, all reachable objects need to be marked starting from the root node. However, after that, it does not simply clean up the unmarked objects, but compresses all live objects to one end of the memory. After that, it cleans up all the space outside the boundary. This method avoids the generation of fragmentation and does not require two identical memory spaces. Therefore, it is more cost-effective. Generation All newly created objects are placed in a memory area called the young generation. The characteristic of the young generation is that objects are quickly recycled. Therefore, a more efficient replication algorithm is selected for the young generation. When an object survives after several recyclings, the object will be placed in a memory space called the old generation. The replication algorithm is suitable for the new generation, while the mark-compression algorithm is used for the old generation. 1.2 Differences between copy and mark-compact algorithms At first glance, there seems to be no big difference between these two algorithms. Both are marked and then moved to another memory address for recycling. So why do different generations use different recycling algorithms? In fact, the biggest difference between the two is that the former exchanges space for time while the latter exchanges time for space. The former does not have independent "mark" and "copy" phases when working, but combines them into one action, called scavenge (or evacuate, or just copy). That is, every time a live object that has not been visited in this collection is found, it is directly copied to a new location and a forwarding pointer is set at the same time. This way of working requires more space. The latter requires separate mark and compact phases when working. The mark phase is used to find and mark all live objects, and then the compact phase moves objects to achieve the purpose of compaction. If the compaction method is sliding compaction, then after marking, objects can be "slid" to one side of the space one by one in order. Because the object graph in the entire space has been traversed first and all live objects are known, they can be moved in the same space without the need for an extra space. Therefore, the recycling of the new generation will be faster, and the recycling of the old generation will take longer. At the same time, the application will be suspended during the compression phase, so we should try to avoid objects appearing in the old generation. 2. Dalvik virtual machine 2.1 Java Heap The Java heap is actually composed of an Active heap and a Zygote heap. The Zygote heap is used to manage various objects preloaded and created by the Zygote process during startup, while the Active heap is created before the Zygote process forks the first child process. All application processes started later are forked by the Zygote process and have their own Dalvik virtual machine. In the process of creating an application, the Dalvik virtual machine uses the COW strategy to copy the address space of the Zygote process. COW strategy: At the beginning (before copying the address space of the Zygote process), the application process and the Zygote process share the same heap for allocating objects. When the Zygote process or the application process writes to the heap, the kernel will perform a real copy operation, so that the Zygote process and the application process each have their own copy, which is called COW. Because copying is very time-consuming, it must be avoided as much as possible or as little as possible. In order to achieve this goal, when the first application process is created, the part of the heap memory that has been used will be divided into one part, and the part of the heap memory that has not been used will be divided into another part. The former is called the Zygote heap, and the latter is called the Active heap. In this way, you only need to copy the contents of the zygote heap to the application process. In the future, whether it is the Zygote process or the application process, when they need to allocate objects, they will do it on the Active heap. In this way, the Zygote heap can be written as little as possible, thereby reducing the copy-on-write operation. The objects allocated in the Zygote heap are actually mainly the classes, resources, and objects preloaded by the Zygote process during the startup process. This means that these preloaded classes, resources, and objects can be shared for a long time in the Zygote process and the application process. This can not only reduce copy operations, but also reduce the demand for memory. 2.2 Some indicators related to GC I remember that when we were optimizing the gc lag problem of a Meizu phone, we found that it was easy to trigger GC_FOR_MALLOC. This GC category will be discussed later. It is caused by insufficient memory for allocating objects. But why is there still insufficient memory when we set a large heap size? Here we need to understand the following concepts: the starting size, maximum size, and growth limit of the Java heap. When starting the Dalvik virtual machine, we can specify the above three values through the three options -Xms, -Xmx and -XX:HeapGrowthLimit. The above three values represent
In addition to the above three indicators, there are several other indicators that are worth our attention, namely the minimum free value (Min Free), the maximum free value (Max Free) and the target utilization (Target Utilization). Assuming that after a certain GC, the size of the memory occupied by the surviving objects is LiveSize, then the ideal size of the heap should be (LiveSize / U). However, (LiveSize / U) must be greater than or equal to (LiveSize + MinFree) and less than or equal to (LiveSize + MaxFree). After each GC, the garbage collector will try to make the heap utilization closer to the target utilization. So when we try to manually generate some objects of several hundred KB and try to expand the available heap size, it will lead to frequent GC, because the allocation of these objects will cause GC, and after GC, the heap memory will return to the appropriate proportion, and the local variables we use will be recycled soon. In theory, there are still so many surviving objects, and our heap size will be reduced and cannot achieve the purpose of expansion. At the same time, this is also a factor in the generation of CONCURRENT GC, which we will talk about in detail later. 2.3 Types of GC
In fact, the three types of GC, GC_FOR_MALLOC, GC_CONCURRENT and GC_BEFORE_OOM, are all triggered during the object allocation process. The main difference between concurrent and non-concurrent GC is that the former conditionally suspends and wakes up non-GC threads during the GC process, while the latter always suspends non-GC threads during the GC process. Parallel GC can make the application more responsive by conditionally suspending and waking up non-GC threads. However, at the same time, parallel GC needs to perform the operations of marking root set objects and recursively marking objects that have been accessed during the GC process one more time, so it also requires more CPU resources. Later in the concurrent and non-concurrent GC of Art, we will also emphasize the difference between the two. 2.4 Object Allocation and GC Triggering Timing
The example picture is as follows: From this process, we can see that GC will be triggered during object allocation. If the first object allocation fails, we will trigger GC but will not recycle the Soft reference. If the allocation fails again, we will also recycle the Soft memory. The GC triggered by the former is the GC_FOR_MALLOC type GC, and the latter is the GC_BEFORE_OOM type GC. When the memory allocation is successful, we will determine whether the current memory usage has reached the GC_CONCURRENT threshold. If it has, GC_CONCURRENT will be triggered again. So how is this threshold determined? As mentioned above, we will record a target utilization after GC. Theoretically, this value should be within the above range. If it is not, we will select the boundary value as the target value. The virtual machine will record this target value as the total memory that can be allocated. At the same time, a fixed value (200~500K) is subtracted from the target value and used as the threshold for triggering the GC_CONCURRENT event. 2.5 Recycling Algorithms and Memory Fragmentation Most mainstream Daviks use the Mark and Sweep recycling algorithm, and some implement copy GC. This is different from HotSpot. The specific algorithm to be used is determined at compile time and cannot be changed dynamically at runtime. If the "WITH_COPYING_GC" option is specified in the command to compile the dalvik virtual machine, compile the "/dalvik/vm/alloc/Copying.cpp" source code - this is the implementation of the copy GC algorithm in Android, otherwise compile "/dalvik/vm/alloc/HeapSource.cpp" - it implements the Mark and Sweep GC algorithm. Due to the shortcomings of the Mark and Sweep algorithm, it is easy to cause memory fragmentation. Therefore, under this algorithm, when we have a large number of discontinuous small memories, it is still very easy to cause GC when allocating a larger object. For example, when we decode a picture on the phone, the specific situation is as follows: Therefore, for mobile phones using the Dalvik virtual machine, we must first try to avoid frequently generating many small temporary variables (for example, getView, onDraw and other functions), and secondly, we must try to avoid generating many large objects with long life cycles. 3. ART memory recycling mechanism 3.1 Java Heap The main components of the Java heap used inside the ART runtime include Image Space, Zygote Space, Allocation Space and Large Object Space. Image Space is used to store some preloaded classes. Zygote Space and Allocation Space have the same functions as the Zygote heap and Active heap in the Dalvik virtual machine garbage collection mechanism. Large Object Space is a collection of discrete addresses used to allocate large objects, thereby improving GC management efficiency and overall performance, similar to the following figure: In the GC Log below, we can also see that the LOS information is included in the art GC Log, which helps us check the large memory situation. 3.2 Types of GC
3.3 Object Allocation and GC Triggering Timing Since there is basically no difference between memory allocation under Art and under Dalvik, I will just paste the picture and go over it. 3.4 Concurrent and non-concurrent GC Unlike Dalvik, which has only one recycling algorithm, Art uses different recycling algorithms in different situations. For example, when Alloc memory is insufficient, non-concurrent GC will be used, and when the memory reaches a certain threshold after Alloc, concurrent GC will be triggered. At the same time, the GC strategies in the foreground and background are also different, which we will explain to you one by one later. Non-concurrent GC Step 1. Call the member function InitializePhase implemented by the subclass to execute the GC initialization phase. Step 2. Suspend all ART runtime threads. Step 3. Call the member function MarkingPhase implemented by the subclass to execute the GC marking phase. Step 4. Call the member function ReclaimPhase implemented by the subclass to execute the GC recovery phase. Step 5. Resume the ART runtime thread suspended in step 2. Step 6. Call the member function FinishPhase implemented by the subclass to execute the GC end phase. Concurrent GC Step 1. Call the member function InitializePhase implemented by the subclass to execute the GC initialization phase. Step 2. Acquire the lock for accessing the Java heap. Step 3. Call the member function MarkingPhase implemented by the subclass to execute the GC parallel marking phase. Step 4. Release the lock used to access the Java heap. Step 5. Suspend all ART runtime threads. Step 6. Call the member function HandleDirtyObjectsPhase implemented by the subclass to handle objects modified during the GC parallel marking phase. Step 7. Resume the ART runtime thread suspended in step 4. Step 8. Repeat steps 5 to 7 until all objects modified during the GC parallel phase are processed. Step 9. Acquire the lock for accessing the Java heap. Step 10. Call the member function ReclaimPhase implemented by the subclass to execute the GC recovery phase. Step 11. Release the lock used to access the Java heap. Step 12. Call the member function FinishPhase implemented by the subclass to execute the GC end phase. Therefore, whether it is concurrent or non-concurrent, it will cause the stopworld situation to occur. In the concurrent case, the time of a single stopworld will be shorter, which is the basic difference. 3. Differences between ***rt concurrent and Dalvik concurrent GC First, you can compare the following two pictures: Dalvik GC: Art GC: What is the difference between Art's concurrent GC and Dalvik's concurrent GC? At first glance, they seem to be similar. Although they do not suspend threads all the time, they will also suspend threads to execute the process of marking objects. By reading relevant documents, we can understand that Art's concurrent GC has three main advantages over Dalvik: 1. Tag yourself When allocating objects, Art pushes the newly allocated objects into the Allocation Stack described by the member variable allocationstack of the Heap class, thereby reducing the object traversal range to a certain extent. 2. Pre-reading When marking the memory of the Allocation Stack, the object to be traversed next will be pre-read, and after the object is taken out, other objects referenced by the object will be pushed into the stack until the traversal is completed. 3. Reduce Pause Time Other threads will not be blocked in the Mark phase. There will be dirty data in this phase. For example, data that Mark finds will not be used but is used by other threads at this time will also be processed in the Mark phase instead of leaving it for processing in the ***Block. This will also reduce the time for processing dirty data in the subsequent Block phase. 3.6 Foreground and Background GC Foreground refers to when the application is running in the foreground, and Background refers to when the application is running in the background. Therefore, Foreground GC is the GC executed when the application is running in the foreground, and Background is the GC executed when the application is running in the background. When the application is running in the foreground, responsiveness is the most important, so the GC executed must be efficient. On the contrary, when the application is running in the background, responsiveness is not the most important, so it is suitable to solve the heap memory fragmentation problem. Therefore, Mark-Sweep GC is suitable as Foreground GC, and Mark-Compact GC is suitable as Background GC. Due to the Compact capability, fragmentation can be well avoided on Art, which is also a good capability of Art. 3.7 Art is great In general, Art is much better than Dalvik in GC. Not only is GC more efficient and reduces pause time, but also there is a separate allocation area for large memory in memory allocation. At the same time, there is an algorithm to do memory sorting in the background to reduce memory fragmentation. For developers, we can basically avoid many similar GC-induced lag issues under Art. In addition, according to Google's own data, Art's memory allocation efficiency is 10 times higher than Dalvik, and GC efficiency is 2-3 times higher. 4. GC Log When we want to use GC logs to track down some possible GC-induced jams, we need to understand the composition of GC logs and what different information means. 4.1 Dalvik GC log The log format of dalvik is basically as follows:
4.2 Art GC Log
The basic situation is no different from Dalvik, but there are more GC Reasons and one more OS_Space_Status
Written in ***: The pictures are from the Internet, special thanks to Lao Luo. |
>>: Best Practices for Customizing Android BaseAdapter
Plane mirrors can help us adjust our attire and a...
[Mobile software: Bo Ke Yuan] An international re...
Cui Dongshu, secretary general of the China Passe...
Training course content: This set is the most sui...
Chapter 1: Learning Taobao Express from scratch 1...
So far, the list of members of the "Lunar So...
There is a kind of yellow ant that makes some irr...
After reading this article, I hope you can improv...
Modern life is noisy in most situations. If you d...
Industry insiders pointed out that there is no su...
To make it easier to understand this article, let...
Welcome to the 75th issue of the Nature Trumpet c...
The main purpose of push is to promote activation...
Teacher KEEN's self-study course on short vid...
As time goes by, it is already the Lantern Festiv...