Recommendation
This article introduces a technique for restoring the symbol table of an iOS app and uses it to set symbolic breakpoints on the target program in Xcode, which can significantly cut the time spent on reverse analysis. Taking Alipay as an example, the author shows how to obtain Alipay's call stack by setting a breakpoint on the show method of UIAlertView. The code for this article is open source at https://github.com/tobefuturer/restore-symbol; stars and issues are welcome. Thanks to the author for authorizing this reprint.
About the author: Yang Jun, a graduate student in the Department of Computer Science at Sun Yat-sen University and an iOS developer, specializes in iOS security and reverse engineering. His personal blog is http://blog.imjun.net.
Preface
The symbol table has always been contested ground in reverse engineering: iOS apps have their symbol tables stripped before release to make reverse analysis harder. This article introduces a tool I wrote to restore the symbol table of iOS apps. Let's look at the result right away. This is what Alipay looks like after its symbol table is restored:
The article is a bit long, so please be patient and read to the end; the highlight is marked with ***.
Why restore the symbol table?
In reverse engineering, dynamic analysis with a debugger is indispensable, and Xcode + lldb is a very good debugging combination. For example, Xcode makes it easy to view the call stack: in the picture above, we can clearly see the RPC call sequence of an Alipay login. Without restoring the symbol table, the same debugging page would look like this:
For the same sequence of function calls, Xcode displays completely different results. The reason is that when Xcode shows the call stack, it can only display names for functions that have an entry in the symbol table. To make debugging smooth, we need to restore the symbol table in the executable.
What is a symbol table?
To restore the symbol table, we first need to know what it is and how it is stored in a Mach-O file. Symbol information lives in the __LINKEDIT segment of the Mach-O file and involves two structures: the Symbol Table and the String Table. Open Alipay's executable in MachOView and find the Symbol Table item. The symbol table is a contiguous array in which each entry is a struct nlist (struct nlist_64 in 64-bit binaries).
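To make the layout concrete, here is a minimal Python sketch (not the restore-symbol implementation) that packs struct nlist_64 entries and a matching string table from a name-to-address mapping, then reads one entry back. It assumes a little-endian 64-bit binary and treats every symbol as a local N_SECT symbol in section 1 (often __TEXT,__text); the method names and addresses in the example are hypothetical.

```python
import struct

# struct nlist_64 from <mach-o/nlist.h>, little-endian, no padding (16 bytes):
#   uint32_t n_strx   offset of the symbol name in the string table
#   uint8_t  n_type   type flags
#   uint8_t  n_sect   1-based section number
#   uint16_t n_desc   extra description bits
#   uint64_t n_value  address of the symbol
NLIST_64 = struct.Struct("<IBBHQ")
N_SECT = 0x0E  # n_type value: symbol is defined in section n_sect


def build_symbol_table(symbols, sect_index=1):
    """Return (symtab, strtab) bytes for a {name: address} mapping.

    Assumption for this sketch: every symbol is a local (non-external)
    N_SECT symbol in section `sect_index`.
    """
    strtab = bytearray(b"\x00")  # keep offset 0 for the empty name
    entries = []
    for name, address in sorted(symbols.items(), key=lambda item: item[1]):
        n_strx = len(strtab)  # first field: where this name starts
        strtab += name.encode("utf-8") + b"\x00"
        entries.append(NLIST_64.pack(n_strx, N_SECT, sect_index, 0, address))
    return b"".join(entries), bytes(strtab)


def read_symbol(symtab, strtab, index):
    """Decode entry `index` back into (name, address)."""
    n_strx, _type, _sect, _desc, n_value = NLIST_64.unpack_from(
        symtab, index * NLIST_64.size
    )
    end = strtab.index(b"\x00", n_strx)
    return strtab[n_strx:end].decode("utf-8"), n_value


if __name__ == "__main__":
    # Hypothetical names and addresses, for illustration only.
    symtab, strtab = build_symbol_table({
        "-[LoginViewController viewDidLoad]": 0x100004000,
        "-[LoginViewController login:]": 0x100004200,
    })
    print(NLIST_64.size)  # 16
    for i in range(len(symtab) // NLIST_64.size):
        name, addr = read_symbol(symtab, strtab, i)
        print(hex(addr), name)
```

Sorting by address only keeps the output readable. A real tool would also have to write these bytes into __LINKEDIT and update the LC_SYMTAB load command (symoff, nsyms, stroff, strsize), which this sketch leaves out.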
Here we focus on the first and last fields. The first field, n_strx, is the offset of the symbol name in the string table and gives the function name. The last field, n_value, is the symbol's address in memory, similar to a function pointer. (Only the general structure is described here; for details, see Apple's official Mach-O file format documentation.) In other words, if we know how symbol names map to memory addresses, we can construct symbol table data in this format ourselves.
Now that we know how to construct a symbol table, the next step is to collect the mapping between symbol names and memory addresses.
Getting the symbols of OC methods
Because of how Objective-C works, the compiler writes class names, method names, and similar metadata into the executable, so all the classes in a project can be reconstructed from the structure of the Mach-O file. This is what the well-known reverse engineering tool class-dump does. The header files generated by class-dump include the address of each method:
So we only need to modify the class-dump source slightly to obtain the information we want.
Symbol table recovery tool
With the data format sorted out and the data source identified, we can write the tool. I won't go into the implementation details here. The tool is open source on my GitHub: https://github.com/tobefuturer/restore-symbol
Let's see how to use this tool:
1. Download the source code and compile it
2. Restore the OC symbol table (this step is very simple)
Here origin_AlipayWallet is the Mach-O file, decrypted (shelled) with Clutch, that has no symbol table; -o specifies the output file location.
3. Re-sign and repackage the Mach-O file, then check the result
After its symbol table is restored, the file is about 20 MB larger because of the added symbol information.
Viewing the call stack in Xcode, we can see that the symbols of the OC methods have been restored, and the general call sequence is visible in the stack. However, Alipay relies heavily on block callbacks, so a large share of the symbols still cannot be displayed correctly. Next, let's see how to restore the symbols of these blocks.
Getting block symbol information
Following the same idea, to restore a block's symbol information we must know how blocks are stored in the file.
The structure of a block in memory
First, let's analyze how blocks exist in memory at runtime. A block is a structure in memory, and its general layout is as follows:
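As a rough illustration, the layout can be sketched with Python's ctypes, following the Block_literal_1 and Block_descriptor_1 structures described in Clang's Block ABI specification. It assumes a 64-bit (LP64) host, whose field offsets match arm64 iOS; the Python class names are our own.

```python
import ctypes


class BlockDescriptor(ctypes.Structure):
    # Block_descriptor_1 from Clang's Block ABI; optional copy/dispose
    # helpers and a signature may follow, depending on the block's flags.
    _fields_ = [
        ("reserved", ctypes.c_ulong),  # always 0
        ("size", ctypes.c_ulong),      # size of the block literal in bytes
    ]


class BlockLiteral(ctypes.Structure):
    # Block_literal_1 from Clang's Block ABI. Captured variables, if any,
    # are laid out after `descriptor`.
    _fields_ = [
        ("isa", ctypes.c_void_p),     # &_NSConcreteStackBlock, GlobalBlock or MallocBlock
        ("flags", ctypes.c_int),
        ("reserved", ctypes.c_int),
        ("invoke", ctypes.c_void_p),  # the block's function: the address we want to name
        ("descriptor", ctypes.POINTER(BlockDescriptor)),
    ]


if __name__ == "__main__":
    for field_name, _ in BlockLiteral._fields_:
        print(field_name, getattr(BlockLiteral, field_name).offset)
    print("sizeof", ctypes.sizeof(BlockLiteral))
```

On a 64-bit host this prints offsets 0, 8, 12, 16 and 24 and a size of 32, so invoke sits 16 bytes into the structure; that function address is what the rest of this analysis tries to attach a name to.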
Depending on the situation, a block's isa pointer takes one of three values, each indicating a different type of block:
1. _NSConcreteStackBlock
When a block is created on the stack, a block structure is allocated on the stack and then fields such as isa are filled in.
2. _NSConcreteMallocBlock
When a block on the stack is submitted to GCD or retained by an object, it is copied to the heap, and the type of the copy becomes _NSConcreteMallocBlock.
3. _NSConcreteGlobalBlock
This is a global static block. When a block does not depend on its context (for example, it captures no variables from outside the block and uses only its own variables), its memory can be allocated at compile time in the global static constant area.
Type 2 blocks only appear at runtime, so we focus only on types 1 and 3. Next, we analyze how these two isa values relate to block symbol addresses.
How the block isa pointer relates to symbol addresses
To analyze this, we need the disassembler IDA. Here are two practical examples:
1. _NSConcreteStackBlock
Suppose our source code contains a very simple block like this:
After compilation, the actual assembly looks like this:
At runtime, the block is constructed as follows:
From this we can extract the following pattern. Here comes the key point!!!
Whenever code uses a block on the stack, it loads __NSConcreteStackBlock as the isa pointer and, immediately afterwards, loads a function address. That function address is the address of the block's function.
Study the sentence above together with the following picture (the same file as above, but with the symbol table stripped):
Using this pattern, we can make the following inference during reverse analysis: if __NSConcreteStackBlock is referenced in an OC method, there must be a function address nearby, and that function is a block inside this OC method.
For example, in the figure above, __NSConcreteStackBlock is referenced in viewDidLoad, and the address of sub_100049D4 is loaded immediately afterwards. We can therefore conclude that sub_100049D4 is a block in viewDidLoad, and its symbol name should be viewDidLoad_block.
2. _NSConcreteGlobalBlock
A global static block is a block that does not reference variables outside itself. Because it captures nothing external, its memory can be laid out at compile time with no need to worry about copying, and it lives in the constant area of the executable. If that is unclear, here is an example. We change the source code to this:
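After compilation, it becomes:
Then, following the same reasoning as above, we can infer during reverse analysis that if an OC method references a global block structure (one whose isa is __NSConcreteGlobalBlock), the function recorded in that structure is a block inside that OC method.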
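The two inference rules can be sketched in Python as follows. This does not use IDA's API and is not taken from ida_search_block.py: it assumes a hypothetical, already-extracted list of (kind, target) references for one method in instruction order, plus a mapping from each global block literal to its invoke address. The numbered naming for several blocks in one method is also an assumption.

```python
STACK_BLOCK_ISA = "__NSConcreteStackBlock"


def find_blocks(method_name, refs, global_block_invokes):
    """Name the blocks that belong to one OC method.

    refs: the method's references in instruction order, as hypothetical
      (kind, target) tuples:
        ("isa", symbol)          load of an isa symbol such as __NSConcreteStackBlock
        ("func", address)        load of a function address
        ("global_block", addr)   reference to a global block literal
    global_block_invokes: {global block literal address: its invoke address}.
    Returns {block function address: symbol name}.
    """
    block_funcs = []
    after_stack_isa = False
    for kind, target in refs:
        if kind == "isa" and target == STACK_BLOCK_ISA:
            after_stack_isa = True
        elif kind == "func" and after_stack_isa:
            # Rule 1: the first function address loaded after the stack
            # block isa is that block's function.
            block_funcs.append(target)
            after_stack_isa = False
        elif kind == "global_block" and target in global_block_invokes:
            # Rule 2: a referenced global block's invoke function belongs
            # to this method.
            block_funcs.append(global_block_invokes[target])

    # Hypothetical naming scheme: <method>_block, <method>_block1, ...
    return {
        addr: f"{method_name}_block" + (str(i) if i else "")
        for i, addr in enumerate(block_funcs)
    }


if __name__ == "__main__":
    refs = [
        ("isa", "__NSConcreteStackBlock"),
        ("func", 0x100049D4),
        ("global_block", 0x1000F000),
    ]
    names = find_blocks("viewDidLoad", refs, {0x1000F000: 0x10004A20})
    for addr, name in names.items():
        print(hex(addr), name)
```

On the sample references this names 0x100049d4 viewDidLoad_block and 0x10004a20 viewDidLoad_block1. Handling nested blocks would mean applying the same function to each block's own references, using the parent block's name as the prefix.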
3. Nested blocks
In practice, a block may be nested inside another block:
So blocks can have parent-child relationships. If we collect these relationships, they form a forest in the graph-theory sense, which a simple recursive depth-first search can handle; I won't describe the details again here.
Block symbol table extraction script (IDA + Python)
Having worked through these ideas, we find that the search relies on cross-reference information provided by IDA, and IDA exposes programming interfaces for extracting it. IDA provides a Python SDK, and the finished script is also in the repository at search_oc_block/ida_search_block.py (https://github.com/tobefuturer/restore-symbol/blob/master/search_oc_block/ida_search_block.py).
Extracting the block symbol table
Here is a brief description of how to use the script above:
4. Wait for the script to finish, which should take about 30 to 60 seconds. While it runs, a pop-up window like this appears:
5. When the pop-up window disappears, block symbol table extraction is complete.
6. A JSON block symbol table named block_symbol.json is written to the directory containing the file opened in IDA.
Restoring the symbol table & practical analysis
Use the symbol table recovery tool from before to import the block symbol table into the Mach-O file:
Here -j specifies the JSON symbol table obtained earlier.
***This yields an executable with both the OC method symbol table and the block symbol table.
Here is a short case study that shows what this tool can do.
1. In Xcode, set a breakpoint on -[UIAlertView show].
2. Run the program, enter a phone number and an incorrect password on the Alipay login page, and tap Login.
3. Xcode stops when the 'Wrong Password' alert pops up, and the call stack is shown on the left.
The whole Alipay login flow at a glance:
Project open source address: https://github.com/tobefuturer/restore-symbol
You are welcome to raise issues there, or contact me by email if you have any questions.