One, Introduction

In the Android development world, developers often rely on third-party libraries (such as game engines, database engines, or mobile payment engines) to build their applications. These libraries are usually closed source, so developers cannot change them. Sometimes they introduce security issues into the application. For example, internal log output intended for debugging may leak credentials when users log in or pay, and resources and scripts that a game engine stores locally in plain text can be easily obtained by attackers.
In this article, I would like to share some recent research results: specifically, how hooking technology can provide simple and effective protection against certain offline attacks on Android applications.

Two, Common security issues in Android applications

(One) Android App Packaging Overview

Android applications are usually written in the Java programming language. When developers need high performance or low-level API access, they can write C/C++ code, compile it into native libraries, and call it through the Java Native Interface (JNI). The Android SDK tools then package all the compiled code, data, and resource files into an Android package (APK).

Android applications are packaged and distributed in APK format, which is a standard ZIP file that any ZIP tool can decompress. Once decompressed, the APK may contain the following folders and files (see Figure 1):

1. META-INF directory
2. classes.dex: Java classes compiled into the DEX file format, which the Dalvik virtual machine understands and executes
3. lib: compiled native code, organized into processor-specific subdirectories
4. assets: application resources that can be retrieved through AssetManager
5. AndroidManifest.xml: the Android configuration file, which describes the program name, version, access rights, library files referenced by the application, and so on
6. res: all application resources are placed in this directory
7. resources.arsc: precompiled resources

Figure 1: Contents of a typical Android APK package

Once the package is installed on the user's device, its files are extracted and placed in the following directories:

1. The entire application package file is copied to /data/app.
2. The classes.dex file is extracted and optimized, and the optimized file is copied to /data/dalvik-cache.
3. The native libraries are extracted and copied to /data/app-lib/<package-name>.
4. A folder named /data/data/<package-name> is created and assigned to the application to store its private data.

(Two) Risk Awareness in Android Development

The folder and file structure described in the previous section exposes several weaknesses that developers must be aware of. Attackers can exploit these weaknesses to obtain a lot of valuable information.

The first weakness is that applications often store the raw data resources used by game engines in the assets folder. These include audio and video materials, game logic scripts, and texture resources for sprites and scenes. Because Android application packages are not encrypted, an attacker can easily obtain these resources by getting the package from the app store or from another Android device.

The second weakness is weak file access control on rooted devices and external storage. An attacker with root privileges on the victim's device can read an application's private data files, and data that the application writes to external storage such as an SD card is exposed as well.
If private data is not well protected, the attacker can extract information such as user accounts and passwords from these files.

Finally, debug information may be visible. If the developer forgets to remove the debug code before releasing the application, an attacker can retrieve the debug output with the Logcat tool.

Three, Hook Technology Overview

(One) What is a hook?

Hooking is a general term for a family of techniques that change the behavior of code by inserting instructions into the code segment at run time, altering the original execution sequence. Figure 2 shows the basic process.

Figure 2: Hooks can change the order in which programs run

In this post, we will look at two types of hooking techniques:

1. Symbol table redirection

By analyzing the symbol table of a dynamic library, we can find every relocation address that refers to an external function Func1(). We then change each of those relocation addresses to the starting address of the hook function Hook_Func1() (see Figure 3).

Figure 3: Schematic diagram of symbol table redirection process

2. Inline redirection

Unlike symbol table redirection, which must modify every relocation address, inline hooking only overwrites the starting bytes of the target function we want to hook (see Figure 4). Inline redirection is more robust than symbol table redirection because the code is modified only once, in a single place. The disadvantage is that every call to the original function, from anywhere in the application, also executes the code in the hook function. The redirected function must therefore identify its caller carefully.

Figure 4: Schematic diagram of the inline redirection process

Four, Implementing the Hook

Because the Android operating system is based on the Linux kernel, many Linux research techniques also apply to Android. The detailed examples in this article are based on Ubuntu 12.04.5 LTS.
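To make the inline technique concrete, the following sketch builds the five-byte patch that an inline hook writes over the start of the target function. It assumes 32-bit x86, where a near jump is the opcode 0xE9 followed by a 32-bit little-endian displacement measured from the end of the jump instruction. The helper name encode_jmp_rel32 is hypothetical, and the sketch only fills a buffer; writing the bytes into a live code page (which also requires making that page writable) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of an x86 near jump: 1 opcode byte + 4 displacement bytes. */
#define JMP_REL32_SIZE 5

/*
 * Build the bytes of "JMP rel32" that, when placed at address `from`,
 * transfer control to address `to`. Hypothetical helper for illustration;
 * 32-bit x86 addresses are assumed.
 */
void encode_jmp_rel32(uint8_t out[JMP_REL32_SIZE], uint32_t from, uint32_t to)
{
    assert(out != NULL);

    /* The displacement is relative to the instruction after the jump.
       Unsigned arithmetic wraps, which yields the correct two's-complement
       value for backward jumps as well. */
    uint32_t rel = to - (from + JMP_REL32_SIZE);

    out[0] = 0xE9;                          /* JMP rel32 opcode */
    out[1] = (uint8_t)(rel & 0xFF);         /* displacement, little-endian */
    out[2] = (uint8_t)((rel >> 8) & 0xFF);
    out[3] = (uint8_t)((rel >> 16) & 0xFF);
    out[4] = (uint8_t)((rel >> 24) & 0xFF);
}
```

With a target at 0xf7e6c7e0 and a hook at 0xf7e6d7e0 (the addresses used in the walkthrough that follows), the displacement is 0xf7e6d7e0 - 0xf7e6c7e5 = 0x00000ffb, so the bytes produced are E9 FB 0F 00 00.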
(One) Inline redirection

The simplest way to create an inline redirection is to insert a JMP instruction at the start address of the function. When code calls the target function, execution immediately jumps to the redirected function. See Figure 5 for an example. In the main process, the code runs func1() to process some data and then returns to the main process. Here, the starting address of func1() is 0xf7e6c7e0.

Figure 5: Function using the first five bytes to insert a JMP instruction in an inline hook

The inline hook injection process replaces the first five bytes at that address with 0xE9 E0 D7 E6 F7. This creates a jump instruction that jumps to the address 0xF7E6D7E0, which is the entry point of the function my_func1(). Note that on x86 the four bytes following the 0xE9 opcode are a displacement measured from the end of the JMP instruction, so an actual patch encodes target minus (source + 5) rather than the raw target address. As a result, every call to func1() is redirected to my_func1(). The data passed to my_func1() goes through a preprocessing stage, and the processed data is then handed to func1() to complete the original process. Figure 6 shows the code execution sequence after hooking func1(), and Figure 7 shows the pseudo C code of func1() after the hook is established.

Figure 6: Using the hook: inserting my_func1() in func1()

With this method, the original code is not aware that the data processing flow has changed, yet additional processing code has effectively been attached to the original function func1(). Developers can use this technique to apply program patches at run time.

Figure 7: Using the hook - Pseudo C code for Figure 6

(Two) Symbol table redirection

Symbol table redirection is more complex than inline redirection. The hook code must parse the entire symbol table, handle all possible cases, and search for and replace the addresses of the relocated functions one by one. The symbol tables of dynamic libraries can differ considerably, depending on which compiler parameters were used and how the developer called the external functions.
To study all the situations related to the symbol table, we create a test project containing two dynamic libraries built with different compiler parameters:

1. Position Independent Code (PIC) object: libtest_PIC.so
2. Non-PIC object: libtest_nonPIC.so

Figure 8 shows the code execution flow of the test program, the source code of libtest1()/libtest2() (note: the two functions are almost identical, except that they are compiled with different compiler parameters), and the output of the program.

Figure 8: Software workflow for a test project

The function we hook is printf(), the most commonly used function for printing information to the console. It is declared in stdio.h, and its code is located in the C library, glibc (libc.so). The libtest_PIC and libtest_nonPIC libraries use three styles of external function call:

1. Direct function call
2. Indirect function call (through a global function pointer and through a local function pointer, as in the statements shown in Figures 15 and 16)
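Because the figures containing the library source are not reproduced in this text, the following is only a hypothetical reconstruction of what libtest1() might look like. It is based solely on the three statements quoted in the captions of Figures 14 to 16; the pointer type, the variable declarations, and the function signature are assumptions.

```c
#include <stdio.h>

/* Assumed declaration: a global function pointer initialised to printf. */
int (*global_printf1)(const char *format, ...) = printf;

void libtest1(void)
{
    /* Assumed declaration: a local function pointer initialised to printf. */
    int (*local_printf)(const char *format, ...) = printf;

    /* Direct call to the external function. */
    printf("libtest1: 1st call to the original printf()\n");

    /* Indirect call through the global function pointer. */
    global_printf1("libtest1: global_printf1()\n");

    /* Indirect call through the local function pointer. */
    local_printf("libtest1: local_printf()\n");
}
```

Each call style makes the compiler reference printf in a different place (a call operand, a data slot, an immediate operand), which is why a single symbol can end up with several relocation entries that a symbol table hook has to find.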
Figure 9: libtest1() code

Figure 10: The code of libtest2() is the same as libtest1()

Figure 11: Output of the test program

Five, Study on non-PIC code in the libtest_nonPIC.so library

A standard dynamic library object file is composed of multiple sections, each with its own function and definition. For example, the .rel.dyn section contains the dynamic relocation table. The section information of the file can be obtained by disassembling it with the command objdump -D libtest_nonPIC.so.

In the relocation section .rel.dyn of libtest_nonPIC.so (see Figure 12), four entries contain relocation information for the function printf(). Each entry in the dynamic relocation section has the following fields:

1. Offset identifies the location of the object to be adjusted.
2. Type identifies the relocation type. R_386_32 places the symbol's 32-bit absolute address at the specified memory location, while R_386_PC32 places the symbol's 32-bit PC-relative address at the specified memory location.
3. Sym is the index of the referenced symbol.

Figure 13 shows the generated assembly code of the function libtest1(). The references to printf() marked in red are the ones listed in the relocation section .rel.dyn in Figure 12.

Figure 12: Relocation section information of the library file libtest_nonPIC.so

Figure 13: Disassembled code of libtest1() compiled in non-PIC format

To redirect the function printf() to another function called hooked_printf(), the hook function writes the address of hooked_printf() to these four offsets, as an absolute address for R_386_32 entries and as a PC-relative value for R_386_PC32 entries.
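The patching step can be sketched as follows. This is a minimal illustration rather than the article's actual hook code: the function name redirect_reloc and its parameters are hypothetical, the library image is an ordinary byte buffer instead of a mapped library, and the usual addends are assumed (0 for R_386_32, -4 for the operand of a PC-relative call). The relocation type values 1 and 2 are the standard i386 ELF values, redefined locally so the sketch does not depend on <elf.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* i386 ELF relocation types (R_386_32 = 1, R_386_PC32 = 2). */
enum { RELOC_386_32 = 1, RELOC_386_PC32 = 2 };

/*
 * Redirect one already-relocated reference to `hook_addr`.
 *   image     : bytes of the loaded library (here, an ordinary buffer)
 *   load_base : address at which image[0] is assumed to be mapped
 *   offset    : the Offset field of the relocation entry
 * Returns 0 on success, -1 for a relocation type this sketch ignores.
 */
int redirect_reloc(uint8_t *image, uint32_t load_base, uint32_t offset,
                   int type, uint32_t hook_addr)
{
    uint32_t value;

    assert(image != NULL);

    switch (type) {
    case RELOC_386_32:
        /* The slot holds the symbol's absolute address. */
        value = hook_addr;
        break;
    case RELOC_386_PC32:
        /* The slot holds the distance from the end of the 4-byte
           operand to the symbol. */
        value = hook_addr - (load_base + offset + 4);
        break;
    default:
        return -1;
    }

    /* memcpy avoids unaligned stores; the value is written in host byte
       order, which matches the target on little-endian x86. */
    memcpy(image + offset, &value, sizeof value);
    return 0;
}
```

In a real hook, the same write would target the mapped library at its load base plus each of the four offsets, after the containing pages have been made writable (for example with mprotect()).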
Figure 14: Workflow of the statement printf("libtest1: 1st call to the original printf()\n");

Figure 15: Workflow of the statement global_printf1("libtest1: global_printf1()\n");

Figure 16: Workflow of the statement local_printf("libtest1: local_printf()\n");

As shown in Figures 14 to 16, when the dynamic linker loads the library into memory, it first looks up the relocated symbol by its name, printf, and then writes the real address of printf to the corresponding locations (offsets 0x4b5, 0x4c2, 0x4cf, and 0x200c). These locations are defined in the relocation section .rel.dyn. After that, the code in libtest1() can jump to the printf() function correctly.

[Translated by 51CTO. Please indicate the original translator and source as 51CTO.com when reprinting on partner sites]