Author | yeconglu

A complete Android Native memory leak detection tool consists of three main parts: proxy implementation, stack backtrace, and cache management. Proxy implementation is the key to integrating the tool on the Android platform, and stack backtrace is the decisive factor for performance and stability. This article explains how to implement Native memory leak monitoring from three aspects: implementing proxies for the memory management functions, detecting Native memory leaks, and obtaining the Android Native stack.
I. Implementation of proxy memory management functions

First, let's look at three solutions for proxying the memory management functions.
1. Native Hook

(1) Solution comparison: Inline Hook and PLT/GOT Hook

There are currently two main Native Hook solutions: Inline Hook and PLT/GOT Hook.

Inline Hook depends on instruction relocation. In general, relocation is the process of adjusting the addresses in a program so that they point to the correct memory locations when the program is linked and loaded. At compile time it is impossible to predict where the program will be loaded at run time, so compiled code often expresses memory locations as relative addresses. Because the program may be loaded anywhere in memory, these addresses have to be adjusted according to the actual load address during loading. This process is called relocation.

The same problem appears in Inline Hook. The hook overwrites the machine code at the entry of the target function, and the overwritten instructions are later executed from a different address. If one of them is a jump or load that uses a relative address, it would now resolve to the wrong location. Instruction relocation rewrites such instructions so that they still reach their intended targets.

(2) Example: Hooking the malloc function in an Android application

To better understand the application scenarios of Native Hook, let's look at a practical case: hooking the malloc function in an Android application to monitor memory allocations.

① Inline Hook implementation

In my_malloc, we first execute the backed-up instructions and then jump back to the remainder of the original malloc function.
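The mechanics of such a patch (aligning an address to its page, changing the page protection, backing up the entry bytes and restoring them) can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's original listing: it patches an anonymous mapped page that stands in for function code so that it is safe to run, the patch length is fixed at 8 bytes, and the names page_start_of, make_writable, patch_bytes and make_fake_code_page are hypothetical. A real inline hook would also request PROT_EXEC and flush the instruction cache after writing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sys/mman.h>
#include <unistd.h>

constexpr size_t kPatchLen = 8;     // assumed length of the patched entry code
static uint8_t backup[kPatchLen];   // original bytes, saved before patching

// Round an address down to the start of the page that contains it.
// Works because page_size is a power of two.
uintptr_t page_start_of(uintptr_t addr, uintptr_t page_size) {
    return addr & ~(page_size - 1);
}

// Make every page that overlaps [addr, addr + len) readable and writable.
// A hook on real code would pass PROT_READ | PROT_WRITE | PROT_EXEC instead.
bool make_writable(void* addr, size_t len) {
    uintptr_t page_size = static_cast<uintptr_t>(sysconf(_SC_PAGESIZE));
    uintptr_t begin = reinterpret_cast<uintptr_t>(addr);
    uintptr_t start = page_start_of(begin, page_size);
    size_t span = static_cast<size_t>(begin + len - start);
    return mprotect(reinterpret_cast<void*>(start), span,
                    PROT_READ | PROT_WRITE) == 0;
}

// Save the first kPatchLen bytes at target, then overwrite them with patch.
bool patch_bytes(void* target, const uint8_t* patch) {
    if (!make_writable(target, kPatchLen)) return false;
    std::memcpy(backup, target, kPatchLen);
    std::memcpy(target, patch, kPatchLen);
    return true;
}

// Write the saved bytes back, undoing patch_bytes.
bool unhook(void* target) {
    if (!make_writable(target, kPatchLen)) return false;
    std::memcpy(target, backup, kPatchLen);
    return true;
}

// Demo helper: one read-only page filled with 0xAA, standing in for code.
uint8_t* make_fake_code_page() {
    size_t page_size = static_cast<size_t>(sysconf(_SC_PAGESIZE));
    void* page = mmap(nullptr, page_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) return nullptr;
    std::memset(page, 0xAA, page_size);
    if (mprotect(page, page_size, PROT_READ) != 0) return nullptr;
    return static_cast<uint8_t*>(page);
}
```

Calling patch_bytes(code + 16, patch) on a page returned by make_fake_code_page replaces eight bytes, and unhook(code + 16) puts the 0xAA bytes back. The range passed to mprotect starts at the page boundary and extends to the end of the patch, so a patch that straddles two pages is covered as well.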
There are three difficulties here, which are explained in detail below.

● How to modify the protection attributes of memory pages

orig_func_addr & (~(page_size - 1))

This expression computes the starting address of the memory page that contains orig_func_addr. The trick is that page_size is always a power of 2, so in the binary representation of page_size - 1 the low bits are all 1 and the high bits are all 0. After inversion, the low bits are all 0 and the high bits are all 1. ANDing orig_func_addr with ~(page_size - 1) therefore clears the low bits of orig_func_addr, which yields the starting address of the page.

mprotect((void *)page_start, page_size, PROT_READ | PROT_WRITE | PROT_EXEC);

This call modifies the protection attributes of a memory page. mprotect sets the protection of a memory region and takes three parameters: the starting address of the region, the size of the region, and the new protection attributes. Here, the page containing orig_func_addr is made readable, writable, and executable (PROT_READ | PROT_WRITE | PROT_EXEC) so that the code in this page can be modified.

● How to restore the original function

To restore the original function, we save the original machine code before hooking and write it back to the entry point of the function when the hook is removed. The backup array in the code holds the original machine code. In the inline_hook function, the original machine code is copied to the backup array before it is overwritten. An unhook function then writes it back; call unhook whenever the malloc function needs to be restored. Note that this example assumes that the machine code at the function entry point is 8 bytes long.
In actual use, you need to determine the length of the machine code for your situation and adjust the size of the backup array and the arguments to memcpy accordingly.

● How to implement instruction relocation

We take a simple piece of ARM64 assembly as an example to show how instructions are relocated. Assume a target function, TargetFunction, at whose beginning we want to insert a jump instruction that redirects execution to our HookFunction. To achieve this, we back up the instructions that the jump will overwrite, write the jump to HookFunction at the function entry, relocate the jumps and data references among the backed-up instructions so that they still resolve correctly from their new location, and finally jump back to the first unmodified instruction of TargetFunction.
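Relocating a PC-relative instruction comes down to recomputing its offset for the new address. The sketch below does this for one instruction type only, the unconditional ARM64 branch B (opcode bits 000101 followed by a signed 26-bit word offset). The function names are hypothetical, and a real hook framework also has to handle other PC-relative instructions such as BL, B.cond, CBZ, ADR, ADRP and literal loads.

```cpp
#include <cassert>
#include <cstdint>

// True if insn is an unconditional ARM64 branch: bits 31..26 == 0b000101.
bool is_b(uint32_t insn) {
    return (insn & 0xFC000000u) == 0x14000000u;
}

// Absolute target of a B instruction located at address pc.
uint64_t b_target(uint32_t insn, uint64_t pc) {
    int64_t imm26 = static_cast<int64_t>(insn & 0x03FFFFFFu);
    if (imm26 & 0x02000000) imm26 -= 0x04000000;   // sign-extend 26 bits
    return pc + static_cast<uint64_t>(imm26 * 4);  // offset is in 4-byte words
}

// Re-encode a B instruction that is moved from old_pc to new_pc so that it
// still reaches the same target. Returns false if insn is not a B or if the
// target is outside the +/-128 MiB range that a single B can reach; a hook
// framework would then fall back to an absolute jump sequence.
bool relocate_b(uint32_t insn, uint64_t old_pc, uint64_t new_pc, uint32_t* out) {
    if (!is_b(insn)) return false;
    int64_t offset = static_cast<int64_t>(b_target(insn, old_pc) - new_pc);
    if (offset % 4 != 0) return false;
    if (offset < -(1LL << 27) || offset >= (1LL << 27)) return false;
    uint32_t imm26 = static_cast<uint32_t>(offset / 4) & 0x03FFFFFFu;
    *out = 0x14000000u | imm26;
    return true;
}
```

For example, the encoding 0x14000008 at address 0x1000 branches to 0x1020. Moving it to 0x2000 requires an offset of -0xFE0 bytes, which relocate_b encodes as 0x17FFFC08.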
After the above steps, a jump to HookFunction has been inserted at the entry of TargetFunction, and the jumps and data references among the overwritten instructions have been relocated. When execution reaches TargetFunction, the program jumps to HookFunction, which runs its custom logic and the relocated copies of the overwritten instructions and then returns to the unmodified part of the target function.

② PLT/GOT Hook implementation

PLT (Procedure Linkage Table) and GOT (Global Offset Table) are two important tables used to resolve dynamic symbols in shared libraries under Linux.

PLT (Procedure Linkage Table): holds a small entry stub for each function imported from a dynamic link library. When the program calls such a function, it first jumps to the corresponding PLT entry, which loads the actual function address from the GOT and jumps to it.

GOT (Global Offset Table): stores the actual addresses of functions and variables in dynamic link libraries. At run time, the dynamic linker fills the actual addresses into the GOT as needed, and the PLT entries read them from there.

In PLT/GOT Hook, we modify the function address stored in the GOT so that a call to a given function actually reaches our custom function. The custom function can add extra logic (such as memory leak detection) and then call the original function. Because only a data slot changes, no instructions are rewritten and the program does not need to be recompiled.

RTLD_DEFAULT, used in the sample code, is a special handle value. When it is passed as the handle parameter of dlsym(), dlsym() searches for the symbol across the dynamic link libraries loaded by the current process, in the default search order, rather than in one specific library.

(3) The difference between Inline Hook and GOT Hook

The key point is the kind of address each implementation operates on:

① Inline Hook: the address returned by dlsym is the actual address of the function in memory, which usually points to the entry point of the function (that is, its first instruction). This is the address whose instructions are overwritten.

② GOT Hook: the address that has to be modified is the GOT slot holding the address of malloc, which is why the sample code uses a double pointer, void **got_func_addr. Note that dlsym itself always returns the address of the function, not the address of its GOT slot; a working PLT/GOT hook locates the slot by parsing the relocation entries (.rela.plt or .rel.plt) of the library whose calls are to be intercepted.

2. Using LD_PRELOAD

With LD_PRELOAD, you can override memory management functions without modifying the source code. Although this method has many limitations on the Android platform, the underlying principle is still worth understanding.

LD_PRELOAD is an environment variable used to preload dynamic link libraries when a program starts. By setting LD_PRELOAD, we force the specified library to be loaded, thereby changing the behavior of the program without modifying its source code. This method is often used for debugging, performance analysis, and memory leak detection. The principle and method of using LD_PRELOAD to detect memory leaks are as follows.

(1) Principle

When the LD_PRELOAD environment variable is set, the dynamic linker loads the specified library before any other library. Symbols defined in the preloaded library therefore take precedence, which lets a custom library override functions of the original library (such as glibc). For memory leak detection, we override the memory allocation and release functions (such as malloc, calloc, realloc and free) and record the relevant information whenever memory is allocated or released.

(2) Method

Build a shared library that defines these functions, records each call, and forwards it to the original implementation; then set LD_PRELOAD to the path of that library before starting the program.
By using LD_PRELOAD to detect memory leaks, we can change the behavior of the program dynamically without modifying its source code, record every memory allocation and release, and thus detect memory leaks and find their source.

3. Summary

Finally, the advantages and disadvantages of the three proxy implementation methods in this section are summarized in a table.

[Table: advantages and disadvantages of the three proxy implementation methods]

II. Detecting Native memory leaks

This section introduces the overall idea of detecting Native layer memory leaks, based on the PLT/GOT Hook proxy implementation.

1. Principle introduction

To detect memory leaks in the Native layer on Android, you can override memory allocation and release functions such as malloc, calloc, realloc, and free so that relevant information is recorded each time memory is allocated or released. For example, we can create a global memory allocation table that stores every allocated memory block together with its metadata (such as allocation size and allocation location). When memory is released, the corresponding entry is deleted from the table. By checking the table regularly, we can find the memory that has not been released.

2. Code example

The main technical principle of the sample code is to override the memory management functions and reference the original ones through weak symbols. This way, relevant information is recorded each time memory is allocated or released, and the original functions can be looked up and called dynamically at run time. The core logic is explained in the points below.
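A minimal sketch of the allocation-table idea follows. To stay self-contained it uses explicit wrapper functions instead of a hook, so tracked_malloc, tracked_free, live_blocks and live_bytes are hypothetical names, and the "location" is a plain string supplied by the caller rather than a captured call stack. In a real hook that replaces malloc itself, the table's own allocations would re-enter the hook and would need a recursion guard or a separate allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <unordered_map>

// Metadata kept for every live allocation.
struct AllocInfo {
    size_t size;        // requested size in bytes
    const char* where;  // caller-supplied description of the allocation site
};

static std::mutex g_lock;                            // protects g_table
static std::unordered_map<void*, AllocInfo> g_table; // live blocks by address

// Allocate memory and record the block in the table.
void* tracked_malloc(size_t size, const char* where) {
    void* block = std::malloc(size);
    if (block != nullptr) {
        std::lock_guard<std::mutex> guard(g_lock);
        g_table[block] = AllocInfo{size, where};
    }
    return block;
}

// Remove the block from the table, then release it.
void tracked_free(void* block) {
    if (block == nullptr) return;
    {
        std::lock_guard<std::mutex> guard(g_lock);
        g_table.erase(block);
    }
    std::free(block);
}

// Number of blocks that were allocated but not yet freed.
size_t live_blocks() {
    std::lock_guard<std::mutex> guard(g_lock);
    return g_table.size();
}

// Total size of the blocks that were allocated but not yet freed.
size_t live_bytes() {
    std::lock_guard<std::mutex> guard(g_lock);
    size_t total = 0;
    for (const auto& entry : g_table) total += entry.second.size;
    return total;
}
```

Whatever remains in the table at a checkpoint is a leak candidate; whether it is a real leak still depends on the expected lifetime of the block.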
(1) Use weak symbols: prevent calls to dlsym from causing infinite recursion

The dlsym function looks up symbols in dynamic link libraries. In glibc and eglibc, however, dlsym may call calloc internally. If we redefine calloc and call dlsym inside it to obtain the original calloc, infinite recursion occurs.

Functions such as __libc_calloc are therefore declared as weak symbols, which avoids conflicts with the strong definitions of these functions in glibc or eglibc. In the init_original_functions function, we check whether functions such as __libc_calloc are nullptr. If they are, the C library does not define them, so we use dlsym to obtain the addresses of the original functions. If they are not, the C library already defines them, and we use those definitions directly.

(2) Explanation of RTLD_NEXT

RTLD_NEXT is a special pseudo-handle that tells dlsym to find the next occurrence of a symbol in the library search order after the current library. It is typically used to find and call the original function that has been overridden or intercepted.

On Linux, if a program links several dynamic link libraries that define functions with the same name, the program uses the first definition found by default. Sometimes, however, one library needs to override a function of another library and still call the original. This is where RTLD_NEXT is used: dlsym(RTLD_NEXT, "malloc") finds the next symbol named "malloc", which is the original malloc function, and the custom malloc can then call it.

(3) Notes

Detecting memory leaks increases the runtime overhead of the program and may introduce thread-safety problems. When using this method, we need to make sure that the code is thread-safe and that the detection does not noticeably affect program performance. In addition, manual detection may not find all memory leaks, so it is recommended to use other tools (such as AddressSanitizer, LeakSanitizer or Valgrind) as well.

III. Obtaining the Android Native stack

You may have noticed that the implementation of record_call_stack was omitted from the Native memory leak detection in Part II. That leaves one question open: how do we record the call stack when memory is allocated? This last section explains how to obtain the Android Native stack.

1. Use the unwind function

(1) Tools and methods

On Android, we cannot rely on the backtrace_symbols function, because older versions of Bionic libc do not implement it. Instead, we can use the dladdr function to obtain symbol information. The Android NDK provides the unwind.h header file, which declares the unwind functions that can be used to obtain the stack information of the calling thread.

(2) Get the stack information of the current thread

To obtain the stack information of the current thread, we can use the unwind functions in the Android NDK. In the sample code, the capture_backtrace function uses _Unwind_Backtrace to collect the stack; dladdr then provides the base address of the SO library that contains each function (info.dli_fbase), from which the relative address of the function (relative_addr) is calculated. The relative address is printed together with the rest of the stack information.

(3) libunwind related interfaces

① _Unwind_Backtrace

_Unwind_Backtrace is a function of the unwinder library (libgcc or LLVM libunwind, depending on the NDK version) that is used to obtain the call stack of the current thread.
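It traverses the stack frames and calls a user-defined callback function for each frame, from which frame information (such as the instruction address) can be read. The function prototype is as follows:

_Unwind_Reason_Code _Unwind_Backtrace(_Unwind_Trace_Fn trace, void *trace_argument);

Parameters:

● trace: the callback invoked for each stack frame. It receives a struct _Unwind_Context * for the frame and the trace_argument pointer, and returns an _Unwind_Reason_Code.
● trace_argument: a user-defined pointer that is passed unchanged to every callback invocation.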
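② _Unwind_GetIP

_Unwind_GetIP is a function of the unwinder library that returns the instruction pointer of the stack frame being visited (for every frame except the innermost one, this is the return address into that function). Its implementation depends on the underlying hardware architecture (such as ARM or x86) and on the operating system. It takes the struct _Unwind_Context * that the unwinder passes to the callback and returns a pointer-sized unsigned integer; the exact typedef of the return value differs between unwinder implementations.

Parameter:

● context: the context of the stack frame currently being visited, as passed to the _Unwind_Backtrace callback.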
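The two interfaces are typically combined as in the sketch below: the callback stores the instruction pointer of each frame into a caller-provided buffer and stops when the buffer is full. This is a common pattern rather than the article's original listing; BacktraceState and unwind_callback are hypothetical names, and symbolization with dladdr is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unwind.h>

// Cursor into the caller-provided output buffer.
struct BacktraceState {
    uintptr_t* current;  // next free slot
    uintptr_t* end;      // one past the last slot
};

// Called by the unwinder once per stack frame.
static _Unwind_Reason_Code unwind_callback(struct _Unwind_Context* context,
                                           void* arg) {
    BacktraceState* state = static_cast<BacktraceState*>(arg);
    uintptr_t pc = static_cast<uintptr_t>(_Unwind_GetIP(context));
    if (pc != 0) {
        if (state->current == state->end) {
            return _URC_END_OF_STACK;  // buffer full: stop unwinding
        }
        *state->current++ = pc;
    }
    return _URC_NO_REASON;  // continue with the next frame
}

// Fill buffer with up to max_frames instruction addresses of the calling
// thread and return how many were stored.
size_t capture_backtrace(uintptr_t* buffer, size_t max_frames) {
    BacktraceState state = {buffer, buffer + max_frames};
    _Unwind_Backtrace(unwind_callback, &state);
    return static_cast<size_t>(state.current - buffer);
}
```

A caller would declare something like uintptr_t frames[32] and call capture_backtrace(frames, 32). How many frames come back depends on the unwind information available in the binary, so the count should not be assumed.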
③ Availability in different Android versions

_Unwind_Backtrace and _Unwind_GetIP belong to the unwinding interface of the C++ ABI. They are implemented by the toolchain's unwinder library (libgcc or LLVM libunwind), not by the C library, so they are not a glibc feature, and Android's lightweight C library, Bionic libc, does not implement them itself. Their availability on Android therefore depends on the NDK toolchain and the Android system version.

In earlier Android versions (such as Android 4.x), unwinder support is incomplete, which may cause _Unwind_Backtrace and _Unwind_GetIP not to work properly. In this case, other methods are required to obtain stack information, such as manually traversing the stack frames or using a third-party library.

Starting from Android 5.0 (Lollipop), unwinder support is more complete and includes _Unwind_Backtrace and _Unwind_GetIP, so on Android 5.0 and higher these two functions can be used directly to obtain stack information.

Although the two functions are available in newer versions of Android, their behavior may be affected by compiler optimization, the presence of unwind and debugging information, and similar factors. In actual use, we need to choose the most appropriate method for the specific situation.

2. Manually traverse the stack frame to obtain stack information

On Android, the specific implementation of _Unwind_Backtrace depends on the underlying hardware architecture (such as ARM or x86) and the operating system, and it uses architecture-specific registers and data structures to traverse the stack frames. On ARM64, for example, it normally relies on the unwind tables emitted by the compiler (.eh_frame), whereas the manual approach described here relies only on the Frame Pointer (FP) register and the Link Register (LR). If _Unwind_Backtrace is not used, we can manually traverse the stack frames to obtain the stack information.
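The manual traversal is essentially a linked-list walk. The sketch below models it on a synthetic stack instead of the real registers, so that it runs on any 64-bit host. It assumes the AArch64 frame record layout, in which the word at fp holds the caller's fp and the word at fp + 8 holds the saved return address. walk_fp_chain and build_demo_stack are hypothetical names and the return addresses are made-up values; on a device the starting fp would be read from x29, and the code would have to be built with frame pointers enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Follow the frame-pointer chain starting at fp and store each saved return
// address in out. Stops at a null fp, when out is full, or when the chain
// stops moving toward higher addresses (the stack grows downward, so a
// caller's frame record must sit above the callee's).
size_t walk_fp_chain(uintptr_t fp, uintptr_t* out, size_t max_frames) {
    size_t count = 0;
    while (fp != 0 && count < max_frames) {
        uintptr_t prev_fp = *reinterpret_cast<uintptr_t*>(fp);
        uintptr_t return_addr =
            *reinterpret_cast<uintptr_t*>(fp + sizeof(uintptr_t));
        out[count++] = return_addr;
        if (prev_fp <= fp) break;  // end of chain or corrupted record
        fp = prev_fp;
    }
    return count;
}

// Demo helper: lay out three frame records {caller fp, return address} in a
// six-word array. stack[0] is the innermost frame; the outermost frame has a
// null caller fp, which terminates the chain.
void build_demo_stack(uintptr_t* stack) {
    stack[0] = reinterpret_cast<uintptr_t>(&stack[2]);
    stack[1] = 0x1111;  // made-up return address of the innermost frame
    stack[2] = reinterpret_cast<uintptr_t>(&stack[4]);
    stack[3] = 0x2222;
    stack[4] = 0;       // no caller: end of chain
    stack[5] = 0x3333;
}
```

Starting the walk at &stack[0] visits the three records in order and collects 0x1111, 0x2222 and 0x3333. Production code would additionally check that each fp lies within the thread's stack bounds before dereferencing it.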
(1) Sample code for ARM64 architecture

The sample code for the ARM64 architecture manually traverses the stack frames using the Frame Pointer (FP) register. It first reads the current FP (x29) and LR (x30) register values. Then, by following the FP chain, it obtains the return address of each stack frame (the saved LR value). Finally, it uses the dladdr function to get the symbol information for each address and prints the stack information.

In this code, *(uintptr_t*)(fp) reads the value stored at the memory address held in fp. fp is an unsigned integer that represents a memory address; (uintptr_t*)(fp) converts it to a pointer, and the * operator reads the value that the pointer refers to.

On ARM64, a new stack frame is created when a function is called. Each stack frame contains the function's local variables, parameters, return address, and other information related to the call. The FP register (x29) points to the frame record of the current function: the first word of that record is the saved FP of the caller, and the second word is the saved LR (x30), that is, the return address. The fp variable in the code holds the address of the current frame record, so *(uintptr_t*)(fp) yields the frame record address of the caller. The loop assigns this value to fp so that the caller's frame is processed in the next iteration.

(2) Sample code for ARM architecture

On the 32-bit ARM architecture, we can use the Frame Pointer (FP) register (R11) and the Link Register (LR) register (R14) to manually traverse the stack frames. The sample code first reads the current FP (R11) and LR (R14) register values. Then, by following the FP chain, it obtains the return address of each stack frame (the saved LR value). Finally, it uses the dladdr function to get the symbol information for each address and prints the stack information.

As the two samples show, manually traversing the stack frames works in roughly the same way on different architectures; only the registers and data layouts differ. The approach provides a way to obtain stack information without _Unwind_Backtrace and helps us understand and debug the program.

(3) Registers

During a function call, fp (Frame Pointer), lr (Link Register) and sp (Stack Pointer) are the three key registers. They work together to make function calls and returns work correctly:

● fp is used to locate data in the stack frame.
● lr saves the return address of the function.
● sp manages the stack space.

When traversing the stack frames to obtain stack information, we use the relationship between these three registers to locate the position and content of each frame.

(4) Stack frame

The stack frame is an important concept in function calling. Each time a function is called, a new stack frame is created on the stack. The stack frame contains local variables, parameters, the return address, and other information related to the call.

[Figure: a standard function calling process]
In x86 terms, each function call saves EBP and EIP so that the caller's stack frame can be restored on return. The saved EBP values form a linked list, each one pointing to the saved EBP of the calling function.

On Android, the basic principle of stack frames is the same as on other operating systems. From the stack frame delimited by SP and FP, the SP and FP of the parent function can be obtained, and with them the stack frame of the parent function (PC, LR, SP and FP are pushed onto the stack at function entry). Repeating this step yields the call order of all functions.

On ARM64 and ARM, we can traverse the stack frames through the FP chain (frame pointer chain). Start from the current FP register and move up the chain until a null pointer (NULL) or an invalid address is encountered. During the traversal, the return address (the saved LR value) and other related information can be extracted from each frame.

(5) Name mangling

The symbol information in a Native stack may differ from the function names written in the code, because the compiler applies mangling rules when it generates the symbol table. C++ supports function overloading, that is, the same function name can be used with different parameter types and counts. To distinguish these functions, the compiler (GCC in this example) rewrites each function name into a unique symbol name that encodes the function name, the parameter types, and related information.

For example, for a C++ function foo(int, double) declared in namespace test, the generated symbol is _ZN4test3fooEid, where:

● _Z is the prefix of every mangled name.
● N ... E encloses a nested (qualified) name.
● 4test is the namespace test, preceded by the length of its name.
● 3foo is the function name foo, preceded by the length of its name.
● i is the first parameter type, int.
● d is the second parameter type, double.
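A mangled name can be turned back into a readable one programmatically. The sketch below uses abi::__cxa_demangle from the cxxabi.h header, which is provided by the C++ runtimes of GCC and Clang (libstdc++ and libc++abi); the wrapper name demangle is hypothetical. If demangling fails, the wrapper returns the input unchanged.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Convert an Itanium-ABI mangled symbol into a readable name.
// Returns the input unchanged when it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with
    // malloc; the caller has to release it with free.
    char* readable = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return std::string(mangled);
    }
    std::string result(readable);
    std::free(readable);
    return result;
}
```

For the symbol discussed above, demangle("_ZN4test3fooEid") returns "test::foo(int, double)". The return type does not appear because it is not encoded in the mangled name of an ordinary function.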
IV. Practical suggestions

The previous sections covered three aspects of implementing Android Native memory leak monitoring: proxy implementation, detecting Native memory leaks, and obtaining the Android Native stack. To finish, let's compare some existing memory leak detection tools and give some practical suggestions.

1. Comparison of Native memory leak detection tools

In practical applications, we need to choose the most suitable solution for the specific scenario. The first three tools in the comparison table are ready-made but have certain limitations; in particular, they are not suitable for online use.

[Table: comparison of Native memory leak detection tools]

2. Practical advice

In actual projects, we can combine multiple memory leak detection solutions to improve the detection effect.
V. Conclusion

During the development and testing phase, we can use tools such as ASan, LSan and Valgrind to detect memory leaks. In online environments, these tools are not suitable for direct use because of their performance overhead. In this case, we can use manual detection methods, combined with code review and good programming habits, to minimize the occurrence of memory leaks.

However, these tools do not guarantee that all memory leaks will be detected. Discovering and repairing memory leaks requires a deep understanding of the code and good programming habits. Only in this way can we effectively prevent and resolve memory leaks, thereby improving the stability and performance of our applications.