Android GC Principle Exploration

Android GC Principle Exploration

[[191646]]

Preface

The idea of ​​writing an article about Android GC came from investigating a Meizu phone's image sliding freeze problem. The frame drop and freeze problems caused by continuous GC made us think of many solutions, so we planned to take a detailed look at the principles of memory allocation and GC, why there is continuous GC, what is the difference between GC ALLOC and GC COCURRENT, whether there is a way to expand the heap memory and reduce the frequency of GC, etc.

1. JVM memory recovery mechanism

1.1 Recycling Algorithm

Mark and Sweep GC

Starting from the "GC Roots" set, the entire memory is traversed once, retaining all objects that can be directly or indirectly referenced by the GC Roots, and the remaining objects are treated as garbage and recycled. This algorithm requires interrupting the execution of other components in the process and may cause memory fragmentation.

Copying algorithm

The existing memory space is divided into two blocks, and only one of them is used at a time. During garbage collection, the surviving objects in the memory being used are copied to the unused memory block. After that, all objects in the memory block being used are cleared, the roles of the two memories are swapped, and garbage collection is completed.

Mark-Compact

First, all reachable objects need to be marked starting from the root node. However, after that, it does not simply clean up the unmarked objects, but compresses all live objects to one end of the memory. After that, it cleans up all the space outside the boundary. This method avoids the generation of fragmentation and does not require two identical memory spaces. Therefore, it is more cost-effective.

Generation

All newly created objects are placed in a memory area called the young generation. The characteristic of the young generation is that objects are quickly recycled. Therefore, a more efficient replication algorithm is selected for the young generation. When an object survives after several recyclings, the object will be placed in a memory space called the old generation. The replication algorithm is suitable for the new generation, while the mark-compression algorithm is used for the old generation.

1.2 Differences between copy and mark-compact algorithms

At first glance, there seems to be no big difference between these two algorithms. Both are marked and then moved to another memory address for recycling. So why do different generations use different recycling algorithms?

In fact, the biggest difference between the two is that the former exchanges space for time while the latter exchanges time for space.

The former does not have independent "mark" and "copy" phases when working, but combines them into one action, called scavenge (or evacuate, or just copy). That is, every time a live object that has not been visited in this collection is found, it is directly copied to a new location and a forwarding pointer is set at the same time. This way of working requires more space.

The latter requires separate mark and compact phases when working. The mark phase is used to find and mark all live objects, and then the compact phase moves objects to achieve the purpose of compaction. If the compaction method is sliding compaction, then after marking, objects can be "slid" to one side of the space one by one in order. Because the object graph in the entire space has been traversed first and all live objects are known, they can be moved in the same space without the need for an extra space.

Therefore, the recycling of the new generation will be faster, and the recycling of the old generation will take longer. At the same time, the application will be suspended during the compression phase, so we should try to avoid objects appearing in the old generation.

2. Dalvik virtual machine

2.1 Java Heap

The Java heap is actually composed of an Active heap and a Zygote heap. The Zygote heap is used to manage various objects preloaded and created by the Zygote process during startup, while the Active heap is created before the Zygote process forks the first child process. All application processes started later are forked by the Zygote process and have their own Dalvik virtual machine. In the process of creating an application, the Dalvik virtual machine uses the COW strategy to copy the address space of the Zygote process.

COW strategy: At the beginning (before copying the address space of the Zygote process), the application process and the Zygote process share the same heap for allocating objects. When the Zygote process or the application process writes to the heap, the kernel will perform a real copy operation, so that the Zygote process and the application process each have their own copy, which is called COW. Because copying is very time-consuming, it must be avoided as much as possible or as little as possible.

In order to achieve this goal, when the first application process is created, the part of the heap memory that has been used will be divided into one part, and the part of the heap memory that has not been used will be divided into another part. The former is called the Zygote heap, and the latter is called the Active heap. In this way, you only need to copy the contents of the zygote heap to the application process. In the future, whether it is the Zygote process or the application process, when they need to allocate objects, they will do it on the Active heap. In this way, the Zygote heap can be written as little as possible, thereby reducing the copy-on-write operation. The objects allocated in the Zygote heap are actually mainly the classes, resources, and objects preloaded by the Zygote process during the startup process. This means that these preloaded classes, resources, and objects can be shared for a long time in the Zygote process and the application process. This can not only reduce copy operations, but also reduce the demand for memory.

2.2 Some indicators related to GC

I remember that when we were optimizing the gc lag problem of a Meizu phone, we found that it was easy to trigger GC_FOR_MALLOC. This GC category will be discussed later. It is caused by insufficient memory for allocating objects. But why is there still insufficient memory when we set a large heap size? Here we need to understand the following concepts: the starting size, maximum size, and growth limit of the Java heap.

When starting the Dalvik virtual machine, we can specify the above three values ​​through the three options -Xms, -Xmx and -XX:HeapGrowthLimit. The above three values ​​represent

  • Starting Size: When the Dalvik virtual machine starts, an initial heap memory is allocated for the virtual machine to use.
  • Growth Limit: This is the maximum heap size limit that the system gives to each program. If this limit is exceeded, the program will be OOM
  • Maximum Size: The maximum heap memory size in an uncontrolled situation. The maximum heap size that can be obtained from the system when we use the largeheap attribute

In addition to the above three indicators, there are several other indicators that are worth our attention, namely the minimum free value (Min Free), the maximum free value (Max Free) and the target utilization (Target Utilization). Assuming that after a certain GC, the size of the memory occupied by the surviving objects is LiveSize, then the ideal size of the heap should be (LiveSize / U). However, (LiveSize / U) must be greater than or equal to (LiveSize + MinFree) and less than or equal to (LiveSize + MaxFree). After each GC, the garbage collector will try to make the heap utilization closer to the target utilization. So when we try to manually generate some objects of several hundred KB and try to expand the available heap size, it will lead to frequent GC, because the allocation of these objects will cause GC, and after GC, the heap memory will return to the appropriate proportion, and the local variables we use will be recycled soon. In theory, there are still so many surviving objects, and our heap size will be reduced and cannot achieve the purpose of expansion. At the same time, this is also a factor in the generation of CONCURRENT GC, which we will talk about in detail later.

2.3 Types of GC

  • GC_FOR_MALLOC: Indicates that GC is triggered by insufficient memory when allocating objects on the heap.
  • GC_CONCURRENT: When the heap memory of our application reaches a certain amount, or it can be understood that it is almost full, the system will automatically trigger a GC operation to release memory.
  • GC_EXPLICIT: Indicates that the GC is triggered when the application calls the System.gc or VMRuntime.gc interface or receives the SIGUSR1 signal.
  • GC_BEFORE_OOM: Indicates that the GC is triggered by the utmost effort before the OOM exception is thrown.

In fact, the three types of GC, GC_FOR_MALLOC, GC_CONCURRENT and GC_BEFORE_OOM, are all triggered during the object allocation process. The main difference between concurrent and non-concurrent GC is that the former conditionally suspends and wakes up non-GC threads during the GC process, while the latter always suspends non-GC threads during the GC process. Parallel GC can make the application more responsive by conditionally suspending and waking up non-GC threads. However, at the same time, parallel GC needs to perform the operations of marking root set objects and recursively marking objects that have been accessed during the GC process one more time, so it also requires more CPU resources. Later in the concurrent and non-concurrent GC of Art, we will also emphasize the difference between the two.

2.4 Object Allocation and GC Triggering Timing

  1. Call the function dvmHeapSourceAlloc to allocate memory of the specified size on the Java heap. If the allocation is successful, the allocated address is returned directly to the caller. The function dvmHeapSourceAlloc allocates memory without changing the current size of the Java heap, which is a lightweight memory allocation action.
  2. If the memory allocation in the previous step fails, a GC needs to be executed at this time. However, if the GC thread is already running, that is, the value of gDvm.gcHeap->gcRunning is equal to true, then the function dvmWaitForConcurrentGcToComplete is directly called to wait until the GC is completed. Otherwise, the function gcForMalloc needs to be called to execute a GC. The parameter false means not to recycle the objects referenced by the soft reference objects.
  3. After the GC is executed, the function dvmHeapSourceAlloc is called again to try a lightweight memory allocation operation. If the allocation is successful, the allocated address is returned directly to the caller.
  4. If the memory allocation in the previous step fails, you have to consider setting the current size of the Java heap to the maximum value of the Java heap specified when the Dalvik virtual machine is started before allocating memory. This is achieved by calling the function dvmHeapSourceAllocAndGrow.
  5. If the memory allocation is successful by calling the function dvmHeapSourceAllocAndGrow, the allocated address is returned directly to the caller.
  6. If the memory allocation in the previous step still fails, a drastic measure must be taken at this time. The function gcForMalloc is called again to perform GC. The parameter true indicates that the objects referenced by the soft reference object are to be recycled.
  7. After the GC is executed, the function dvmHeapSourceAllocAndGrow is called again to allocate memory. This is the last effort, and the success and success end here.

The example picture is as follows:

From this process, we can see that GC will be triggered during object allocation. If the first object allocation fails, we will trigger GC but will not recycle the Soft reference. If the allocation fails again, we will also recycle the Soft memory. The GC triggered by the former is the GC_FOR_MALLOC type GC, and the latter is the GC_BEFORE_OOM type GC. When the memory allocation is successful, we will determine whether the current memory usage has reached the GC_CONCURRENT threshold. If it has, GC_CONCURRENT will be triggered again.

So how is this threshold determined? As mentioned above, we will record a target utilization after GC. Theoretically, this value should be within the above range. If it is not, we will select the boundary value as the target value. The virtual machine will record this target value as the total memory that can be allocated. At the same time, a fixed value (200~500K) is subtracted from the target value and used as the threshold for triggering the GC_CONCURRENT event.

2.5 Recycling Algorithms and Memory Fragmentation

Most mainstream Daviks use the Mark and Sweep recycling algorithm, and some implement copy GC. This is different from HotSpot. The specific algorithm to be used is determined at compile time and cannot be changed dynamically at runtime. If the "WITH_COPYING_GC" option is specified in the command to compile the dalvik virtual machine, compile the "/dalvik/vm/alloc/Copying.cpp" source code - this is the implementation of the copy GC algorithm in Android, otherwise compile "/dalvik/vm/alloc/HeapSource.cpp" - it implements the Mark and Sweep GC algorithm.

Due to the shortcomings of the Mark and Sweep algorithm, it is easy to cause memory fragmentation. Therefore, under this algorithm, when we have a large number of discontinuous small memories, it is still very easy to cause GC when allocating a larger object. For example, when we decode a picture on the phone, the specific situation is as follows:

Therefore, for mobile phones using the Dalvik virtual machine, we must first try to avoid frequently generating many small temporary variables (for example, getView, onDraw and other functions), and secondly, we must try to avoid generating many large objects with long life cycles.

3. ART memory recycling mechanism

3.1 Java Heap

The main components of the Java heap used inside the ART runtime include Image Space, Zygote Space, Allocation Space and Large Object Space. Image Space is used to store some preloaded classes. Zygote Space and Allocation Space have the same functions as the Zygote heap and Active heap in the Dalvik virtual machine garbage collection mechanism.

Large Object Space is a collection of discrete addresses used to allocate large objects, thereby improving GC management efficiency and overall performance, similar to the following figure:

In the GC Log below, we can also see that the LOS information is included in the art GC Log, which helps us check the large memory situation.

3.2 Types of GC

  • kGcCauseForAlloc, GC caused by insufficient memory when allocating memory, in this case GC will stop the world
  • kGcCauseBackground, when the memory reaches a certain threshold, GC will be triggered. This is a background GC and will not cause stop world
  • kGcCauseExplicit, display the GC performed when calling. If art turns on this option, GC will be performed when system.gc is called.
  • More

3.3 Object Allocation and GC Triggering Timing

Since there is basically no difference between memory allocation under Art and under Dalvik, I will just paste the picture and go over it.

3.4 Concurrent and non-concurrent GC

Unlike Dalvik, which has only one recycling algorithm, Art uses different recycling algorithms in different situations. For example, when Alloc memory is insufficient, non-concurrent GC will be used, and when the memory reaches a certain threshold after Alloc, concurrent GC will be triggered. At the same time, the GC strategies in the foreground and background are also different, which we will explain to you one by one later.

Non-concurrent GC

Step 1. Call the member function InitializePhase implemented by the subclass to execute the GC initialization phase.

Step 2. Suspend all ART runtime threads.

Step 3. Call the member function MarkingPhase implemented by the subclass to execute the GC marking phase.

Step 4. Call the member function ReclaimPhase implemented by the subclass to execute the GC recovery phase.

Step 5. Resume the ART runtime thread suspended in step 2.

Step 6. Call the member function FinishPhase implemented by the subclass to execute the GC end phase.

Concurrent GC

Step 1. Call the member function InitializePhase implemented by the subclass to execute the GC initialization phase.

Step 2. Acquire the lock for accessing the Java heap.

Step 3. Call the member function MarkingPhase implemented by the subclass to execute the GC parallel marking phase.

Step 4. Release the lock used to access the Java heap.

Step 5. Suspend all ART runtime threads.

Step 6. Call the member function HandleDirtyObjectsPhase implemented by the subclass to handle objects modified during the GC parallel marking phase.

Step 7. Resume the ART runtime thread suspended in step 4.

Step 8. Repeat steps 5 to 7 until all objects modified during the GC parallel phase are processed.

Step 9. Acquire the lock for accessing the Java heap.

Step 10. Call the member function ReclaimPhase implemented by the subclass to execute the GC recovery phase.

Step 11. Release the lock used to access the Java heap.

Step 12. Call the member function FinishPhase implemented by the subclass to execute the GC end phase.

Therefore, whether it is concurrent or non-concurrent, it will cause the stopworld situation to occur. In the concurrent case, the time of a single stopworld will be shorter, which is the basic difference.

3. Differences between ***rt concurrent and Dalvik concurrent GC

First, you can compare the following two pictures:

Dalvik GC:

Art GC:

What is the difference between Art's concurrent GC and Dalvik's concurrent GC? At first glance, they seem to be similar. Although they do not suspend threads all the time, they will also suspend threads to execute the process of marking objects. By reading relevant documents, we can understand that Art's concurrent GC has three main advantages over Dalvik:

1. Tag yourself

When allocating objects, Art pushes the newly allocated objects into the Allocation Stack described by the member variable allocationstack of the Heap class, thereby reducing the object traversal range to a certain extent.

2. Pre-reading

When marking the memory of the Allocation Stack, the object to be traversed next will be pre-read, and after the object is taken out, other objects referenced by the object will be pushed into the stack until the traversal is completed.

       3. Reduce Pause Time

Other threads will not be blocked in the Mark phase. There will be dirty data in this phase. For example, data that Mark finds will not be used but is used by other threads at this time will also be processed in the Mark phase instead of leaving it for processing in the ***Block. This will also reduce the time for processing dirty data in the subsequent Block phase.

3.6 Foreground and Background GC

Foreground refers to when the application is running in the foreground, and Background refers to when the application is running in the background. Therefore, Foreground GC is the GC executed when the application is running in the foreground, and Background is the GC executed when the application is running in the background.

When the application is running in the foreground, responsiveness is the most important, so the GC executed must be efficient. On the contrary, when the application is running in the background, responsiveness is not the most important, so it is suitable to solve the heap memory fragmentation problem. Therefore, Mark-Sweep GC is suitable as Foreground GC, and Mark-Compact GC is suitable as Background GC.

Due to the Compact capability, fragmentation can be well avoided on Art, which is also a good capability of Art.

3.7 Art is great

In general, Art is much better than Dalvik in GC. Not only is GC more efficient and reduces pause time, but also there is a separate allocation area for large memory in memory allocation. At the same time, there is an algorithm to do memory sorting in the background to reduce memory fragmentation. For developers, we can basically avoid many similar GC-induced lag issues under Art. In addition, according to Google's own data, Art's memory allocation efficiency is 10 times higher than Dalvik, and GC efficiency is 2-3 times higher.

4. GC Log

When we want to use GC logs to track down some possible GC-induced jams, we need to understand the composition of GC logs and what different information means.

4.1 Dalvik GC log

The log format of dalvik is basically as follows:

  1. D/dalvikvm: <GC_Reason> <Amount_freed>, <Heap_stats>, <Pause_time>, <Total_time>
  • gc_reason: It is what we mentioned above, whether it is gc_alloc or gc_concurrent. Knowing different reasons will help us to do different treatments.
  • amount_freed: Indicates how much memory the system has freed through this GC operation
  • Heap_stats: It will show the current free memory ratio and usage (memory occupied by active objects/total memory of the current program)
  • Pause_time: indicates the time the application is paused due to this GC operation. Regarding this pause time, GC operations cannot be performed concurrently before 2.3, that is, when the system is performing GC, the application can only be blocked and wait for the GC to end. Since 2.3, GC operations have been changed to concurrent, that is, the GC process will not affect the normal operation of the application, but it will be briefly blocked at the beginning and end of the GC operation, so there is a subsequent total_time.
  • Total_time: Indicates the total time spent on this GC and the Pause_time above, that is, stop all is different. The freeze time mainly depends on the pause_time above.

4.2 Art GC Log

  1. I/art: <GC_Reason> <Amount_freed>, <LOS_Space_Status>, <Heap_stats>, <Pause_time>, <Total_time>

The basic situation is no different from Dalvik, but there are more GC Reasons and one more OS_Space_Status

  • LOS_Space_Status: Large Object Space, the space occupied by large objects. This part of memory is not allocated on the heap, but still belongs to the application memory space. It is mainly used to manage objects that occupy large memory, such as bitmaps, to avoid frequent heap GC due to large memory allocation.

Written in ***: The pictures are from the Internet, special thanks to Lao Luo.

<<:  AI in Plain Language: Is it really that difficult to understand deep learning? Junior high school math in just 10 minutes

>>:  Best Practices for Customizing Android BaseAdapter

Recommend

2019 New Version of Taobao Express Hot Selling Tutorial for Beginners

Chapter 1: Learning Taobao Express from scratch 1...

How difficult is it to softly land on an alien planet?

So far, the list of members of the "Lunar So...

Are crazy yellow ants so "crazy" because they have two sets of DNA?

There is a kind of yellow ant that makes some irr...

How to use Xiaohongshu to create a hot-selling online celebrity product

To make it easier to understand this article, let...

6 tips to improve push opening rate of APP!

The main purpose of push is to promote activation...