About iOS Private API Scanning

Introduction: I recently studied private API scanning. Many existing articles on the topic turned out to be simple excerpts that pass on errors without comment. Based on NetEase Games' open source iOS-private-api-checker project, this article explains three things: how to build a private API library, how the project identifies private APIs in an app, and what problems the solution has.

Audit Cases

Custom methods and private APIs have the same name

The app was not rejected, but Apple asked us to rename the methods in question in our next update.

However, many years ago the widely used Three20 framework contained methods with the same names as private APIs. This caused many apps that used it to fail review.

Use of non-public methods

Apple found that an app used the non-public method allowsAnyHTTPSCertificateForHost:. Along with the rejection, Apple described a way for developers to check for such usage themselves.

Private API calls not executed

Qzone defined its own method _define: and never called it, yet Apple found it and rejected the app. A method with this name appears in the UITextView header exported by class-dump.

Tim Cook threatens to remove Uber app

Uber used a private API to read the device serial number. Apple CEO Tim Cook sharply criticized this and threatened to remove the app from the App Store.

Calling methods

Direct call

    [self.view recursiveDescription];

The private API is not declared in any public header, so the compiler reports an error. You can declare the private method yourself in an anonymous category (a class extension):

    @interface UIView ()
    - (id)recursiveDescription;
    @end

String concatenation

    NSArray *parts = @[@"_priva", @"teMethod"];
    NSString *selectorString = [parts componentsJoinedByString:@""];
    [self performSelector:NSSelectorFromString(selectorString) withObject:nil];

Code Obfuscation

    // The bytes spell "statusBar"
    NSData *data = [NSData dataWithBytes:(unsigned char []){0x73, 0x74, 0x61, 0x74, 0x75, 0x73, 0x42, 0x61, 0x72} length:9];
    NSString *key = [[NSString alloc] initWithData:data encoding:NSASCIIStringEncoding];

Detection Methods

Symbol Table

Use tools such as nm and otool to dump the symbol table of the binary and check it for private API references. The drawback is that this cannot detect private APIs called via string concatenation, because the full name never appears in the binary.
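Here is a minimal Python sketch of the symbol-table approach. It runs `nm -u`, which lists undefined (that is, imported) symbols, on a binary. It then intersects those names with a set of known private symbols. Where that private symbol set comes from is left open, and both function names are illustrative rather than taken from an existing tool.

```python
from __future__ import annotations

import subprocess


def undefined_symbols(nm_output: str) -> set[str]:
    """Collect symbol names from `nm -u` output (one name per line)."""
    symbols = set()
    for line in nm_output.splitlines():
        line = line.strip()
        # Skip blank lines and headers such as "App (for architecture arm64):"
        if not line or line.endswith(":"):
            continue
        symbols.add(line)
    return symbols


def find_private_symbols(binary_path: str, private_symbols: set[str]) -> set[str]:
    """Return the imported symbols of binary_path that appear in private_symbols."""
    result = subprocess.run(["nm", "-u", binary_path],
                            capture_output=True, text=True, check=True)
    return undefined_symbols(result.stdout) & private_symbols
```

Objective-C messages sent through objc_msgSend do not show up as undefined symbols at all. Their selector names are stored as strings in the binary instead.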

Dynamic Scan

Dynamic scanning runs the app and checks, at every method call, whether the called method is a private API. However, it is very slow and cannot guarantee full code coverage.

Static Analysis

Static analysis works on the disassembly of the binary:

  • Find dynamic-invocation APIs such as performSelector: and the class of the receiving object
  • Check their arguments; if an argument is built by string concatenation, deduce the concatenated result

For how to perform this deduction, see the paper Static Analysis of Binary Code to Isolate Malicious Behaviors from Laval University in Canada. If the concatenated string is delivered by a server, however, the app can still evade this check.
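The following toy Python sketch shows what "deducing the concatenation result" means, using constant propagation over an invented three-address representation. Literal strings are tracked through concatenations, so the selector that reaches a performSelector-style call can be recovered. Anything derived from external input stays unknown. The instruction format is purely hypothetical. Real tools work on actual disassembly and must model registers, memory and control flow.

```python
from __future__ import annotations

# Hypothetical toy IR (not real disassembly):
#   ("const", dst, literal)   dst = literal string
#   ("concat", dst, a, b)     dst = a + b
#   ("input", dst)            dst comes from outside, e.g. a server response
#   ("perform", src)          performSelector:NSSelectorFromString(src)


def resolve_selectors(program: list[tuple]) -> list[str | None]:
    """Return the selector passed to each "perform", or None if it cannot be deduced."""
    known: dict[str, str] = {}  # variable -> statically known string value
    selectors: list[str | None] = []
    for instr in program:
        op = instr[0]
        if op == "const":
            _, dst, literal = instr
            known[dst] = literal
        elif op == "concat":
            _, dst, a, b = instr
            if a in known and b in known:
                known[dst] = known[a] + known[b]
            else:
                known.pop(dst, None)  # depends on an unknown value
        elif op == "input":
            known.pop(instr[1], None)
        elif op == "perform":
            selectors.append(known.get(instr[1]))
    return selectors


program = [
    ("const", "a", "_priva"),
    ("const", "b", "teMethod"),
    ("concat", "s", "a", "b"),
    ("perform", "s"),
    ("input", "t"),
    ("concat", "u", "t", "b"),
    ("perform", "u"),
]
print(resolve_selectors(program))  # ['_privateMethod', None]
```

The second call returns None because its selector depends on external input, which is exactly the server-delivered case that static analysis cannot resolve.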

NetEase Solution

Building a private API library

After cloning iOS-private-api-checker from GitHub, you can scan an IPA by uploading it through the project's web interface. Use virtualenv to create a virtual environment for the dependencies so that the system-wide Python environment is not affected.

    # Create a virtual environment
    virtualenv venv

    # Activate the virtual environment
    source venv/bin/activate

    # Install the dependencies
    pip install -r requirements.txt

    # Start the web service
    python run_web.py

The checking principle is described in the project's app/templates/main/index_page.html:

  1. Export the header files of Frameworks and PrivateFrameworks with class-dump, and call the resulting API sets PU and PR respectively
  2. Query all documented APIs from the SQLite database behind Xcode's code hints, and call this set DA
  3. PU - DA is then the set of private APIs in public frameworks; call it A
  4. APIs in PR belong to private frameworks and may not be used at all, so the full private API set is PRAPI = A + PR
  5. Decompile the app binary in the IPA with class-dump and intersect the result with PRAPI to get the private APIs the app uses

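The five steps reduce to plain set algebra. In this minimal Python sketch, each API is modelled as a (class_name, api_name) tuple. The tiny sample sets are invented and stand in for the class-dump and docSet data. The function names are illustrative.

```python
from __future__ import annotations


def private_api_set(pu: set[tuple[str, str]], pr: set[tuple[str, str]],
                    da: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """PRAPI = (PU - DA) + PR, with + meaning set union."""
    a = pu - da      # step 3: undocumented APIs in public frameworks
    return a | pr    # step 4: add everything from private frameworks


def app_private_apis(app_apis: set[tuple[str, str]],
                     prapi: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Step 5: intersect the APIs found in the app with PRAPI."""
    return app_apis & prapi


# Invented sample data; each API is a (class_name, api_name) pair
pu = {("UIView", "setNeedsLayout"), ("UIView", "recursiveDescription")}
da = {("UIView", "setNeedsLayout")}
pr = {("SBApplication", "bundleIdentifier")}
app = {("UIView", "setNeedsLayout"), ("UIView", "recursiveDescription")}
print(app_private_apis(app, private_api_set(pu, pr, da)))
# {('UIView', 'recursiveDescription')}
```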
However, the README.md in the project root directory says:

  • Private api = (api in the header file generated by the library under class-dump Framework - (api in the header file under Framework = documented api + undocumented api)) + api under PrivateFramework

When I first saw this formula, I was unsure what each operand meant, and the equal sign inside the parentheses also puzzled me. The formula also involves the APIs in the header files under Framework, which index_page.html does not mention at all. I therefore suggest ignoring this formula for now, and not dwelling on the description in index_page.html either.

When reading build_api_db.py, I saw the comment in the method rebuild_private_api:

    set_E private api
    undocument_api = set_B - set_C
    set_E = set_A - set_C - undocument_api = set_A - set_B
    if include_private_framework: set_E = set_E + set_D

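In terms of set operations, is set_E = set_A - set_C - undocument_api really equal to set_A - set_B? Since undocument_api = set_B - set_C, removing set_C and then undocument_api from set_A removes exactly the elements of set_B ∪ set_C. The correct result is therefore set_E = set_A - (set_B + set_C), where + is the original author's shorthand for set union (∪). set_A - set_B is not equivalent, because it keeps documented APIs that are absent from set_B. I therefore suggest ignoring this comment.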
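A small counterexample makes the problem concrete. In this Python sketch the set contents are invented, and Python's - and | operators stand for set difference and union:

```python
# Invented sets chosen to produce a counterexample
set_A = {"a", "b", "c", "d"}  # APIs from dumped public framework headers
set_B = {"a"}                 # APIs from the headers shipped in the frameworks
set_C = {"a", "b"}            # documented APIs; "b" is deliberately absent from set_B

undocument_api = set_B - set_C          # empty here
lhs = set_A - set_C - undocument_api    # {"c", "d"}

print(lhs == set_A - (set_B | set_C))   # True: the correct simplification
print(lhs == set_A - set_B)             # False: set_A - set_B wrongly keeps "b"
```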

Although the comments have problems, reading the code shows that the actual implementation logic is correct. Below I briefly explain how the private API library is built, based on build_api_db.py and related code:

  1. set_A: the API set parsed from the header files dumped from every .framework in the system Frameworks directory; stored in the framework_dump_apis table of ios_private.db.
  2. set_B: the API set parsed from the header files contained in every .framework in the system Frameworks directory; stored in the framework_header_apis table.
  3. set_C: the API set parsed from the index file in the docSet; stored in the document_apis table.
  4. set_D: the API set parsed from the header files dumped from every .framework in the system PrivateFrameworks directory; stored in the private_framework_dump_apis table.
  5. set_E: the private APIs. Those identified from set_A are stored in framework_private_apis; the private_apis table holds them plus set_D.
  6. If the second argument of rebuild_sdk_private_api is False, set_D is not added to the private_apis table.

Constructing Set A

Exporting .framework headers with class-dump is already wrapped in api_utils.py, so external scripts such as DumpFrameworks.pl are not needed. Besides, the header directory layout produced by DumpFrameworks.pl does not match what this project expects. There is also no need to download the headers Nicolas Seriot exported with RuntimeBrowser.

What we do need is to make sure the simulator for the target system (such as iOS 8.1) is installed locally, and to know the paths of its Frameworks and PrivateFrameworks directories.

For the former, download the iOS 8.1 Simulator under Xcode > Preferences > Components. For the latter, create an Xcode project, set the environment variable DYLD_PRINT_INITIALIZERS = 1 in its scheme, and run it. The full paths of the loaded .framework files then appear in the console, for example:

    /Library/Developer/CoreSimulator/Profiles/Runtimes/iOS 8.1.simruntime/Contents/Resources/RuntimeRoot/System/Library/Frameworks

Note the space between iOS and 8.1 in this path. It breaks the class-dump invocation; a fix is suggested later.

In my experiments, replacing 8.1 in the path above with 9.3 or 10.3 gives the path for those systems. The path for iOS 11.4 is:

    /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/Library/CoreSimulator/Profiles/Runtimes/iOS.simruntime/Contents/Resources/RuntimeRoot/System/Library/Frameworks

There is no need to memorize these paths. What matters is knowing how to find them, and the find command works as well.
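The search can also be scripted. This Python sketch assumes only what the paths above show: each runtime is a directory whose name ends in .simruntime and contains Contents/Resources/RuntimeRoot/System/Library/Frameworks (or PrivateFrameworks). Which root directory to search is up to you, and the function name is illustrative. pathlib handles the space in "iOS 8.1.simruntime" without any quoting.

```python
from __future__ import annotations

from pathlib import Path

# Layout inside each runtime bundle, as in the paths shown above
SUBPATH = Path("Contents/Resources/RuntimeRoot/System/Library")


def find_framework_dirs(root: str, kind: str = "Frameworks") -> list[Path]:
    """Return the <kind> directory of every *.simruntime bundle found under root."""
    results = []
    for runtime in sorted(Path(root).rglob("*.simruntime")):
        candidate = runtime / SUBPATH / kind
        if candidate.is_dir():
            results.append(candidate)
    return results
```

For example, find_framework_dirs("/Library/Developer/CoreSimulator/Profiles/Runtimes", "PrivateFrameworks") should list the PrivateFrameworks directories of the runtimes installed there.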

Constructing Set B

The Frameworks path was introduced in the section on set A. The framework_header_apis method in api_utils.py builds the API set parsed from the header files contained in every .framework under the Frameworks directory. Note the difference from set A. Set B processes the header files shipped inside each .framework directly. Set A is built from header files exported (dumped) from the Mach-O binary inside each .framework.

Building set A or set D therefore takes one more step than building set B: the dump. That is why the dump writes the exported header files into a directory structure that mirrors the internal layout of the system framework. As a result, the code that later builds each set can be shared.
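Both kinds of header end up in the same directory layout, so a single parser can turn any of them into (class_name, selector) pairs. The Python sketch below is a deliberately simplified, hypothetical parser, not the project's api_utils.py code. It skips @protocol blocks and properties and drops the instance/class method distinction. It will also mishandle multi-line declarations and block-typed parameters.

```python
from __future__ import annotations

import re

INTERFACE_RE = re.compile(r"@interface\s+(\w+)")
# "- (type)name;" or "+ (type)part1:(type)arg part2:(type)arg;"
METHOD_RE = re.compile(r"^\s*[-+]\s*\([^)]*\)\s*([^;{]+)")
PART_RE = re.compile(r"(\w+)\s*:")


def parse_header(text: str) -> set[tuple[str, str]]:
    """Return (class_name, selector) pairs declared in an Objective-C header."""
    apis = set()
    class_name = None
    for line in text.splitlines():
        if m := INTERFACE_RE.search(line):
            class_name = m.group(1)
        elif line.strip().startswith("@end"):
            class_name = None
        elif class_name and (m := METHOD_RE.match(line)):
            signature = m.group(1).strip()
            if not signature:
                continue
            parts = PART_RE.findall(signature)
            # No colon means a unary selector such as "recursiveDescription"
            selector = "".join(p + ":" for p in parts) if parts else signature.split()[0]
            apis.add((class_name, selector))
    return apis
```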

Constructing Set C

The main obstacle to generating the documented API set is that no docSet is available locally. I wrote this article in early September 2018, and my work machine only had Xcode 9. Newer versions of Xcode use a new documentation format that is integrated directly into Xcode. However, Apple provides an XML index with metadata such as the documentation download link for each version. After downloading this XML, you can find the docSet download links for iOS 8.1 and other versions.

    # Metadata for each iOS docSet version
    https://developer.apple.com/library/downloads/docset-index.dvtdownloadableindex

    # iOS 8.1 docSet
    https://devimages-cdn.apple.com/docsets/20141020/031-0773***.dmg

    # iOS 9.3.5 docSet
    https://devimages-cdn.apple.com/docsets/20160321/031-52212-A.dmg

After you install the downloaded dmg, the docSet file appears in the root directory of macOS, and you can move it anywhere you like. The file Contents/Resources/docSet.dsidx inside the docSet is the data source for set C.

I usually view SQLite database files with SQLPro for SQLite: rename docSet.dsidx to docSet.sqlite and double-click it to open. In the ZTOKENTYPE table, these five types are the ones to pay attention to: func, instm, clm, intfm and intfcm.

  • func represents a global C function
  • instm stands for instance method
  • clm stands for class method
  • intfm stands for a protocol instance method (declared with -)
  • intfcm stands for a protocol class method (declared with +)

I guess intf is short for interface, meaning an interface in the OOP sense (that is, a protocol) rather than the Objective-C @interface class declaration.

I have not yet studied how to obtain the documented APIs of the latest iOS versions. The author of Dash can generate the Apple API Reference docset, so generating a dsidx file should be possible in theory. I have recorded a few relevant Dash release notes as leads for future research:

  • "Xcode 8 doesn't come with docsets anymore and that means Dash won't automatically support the iOS 10, macOS 10.12, watchOS 3 and tvOS 10 docs. I'm working on a version of Dash that supports the new docs and will release an update as soon as possible." -- Jun 14th, 2016
  • "Apple API Reference Support. Apple has new API docs. You can use them in Dash by installing the Apple API Reference docset." -- Jul 2nd, 2016
  • "The Apple API Reference docset now reads the docs from within Xcode 8. This reduces disk space usage while also allowing me to modify & improve the docs at display-time. Thanks a lot to the Xcode team at Apple for helping me understand the new documentation format!" -- Oct 25th, 2016

Constructing Set D

Build it the same way as set A, but replace Frameworks in the path with PrivateFrameworks:

    /Library/Developer/CoreSimulator/Profiles/Runtimes/iOS 8.1.simruntime/Contents/Resources/RuntimeRoot/System/Library/PrivateFrameworks/

Constructing Set E

Using set_A as the input:

  1. All methods whose names start with _ are added to set_E;
  2. Every other API that is in neither set_B nor set_C is added to set_E;
  3. Whether an API is in set_B / set_C is decided by comparing three values: api_name, class_name and sdk

Step 3 is implemented with database queries.
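The three rules translate into the following Python sketch. Records are modelled as (api_name, class_name, sdk) triples held in memory. The project performs the membership tests as database queries instead, so this only illustrates the logic, and the function name is illustrative.

```python
from __future__ import annotations


def build_set_e(set_a: set[tuple[str, str, str]],
                set_b: set[tuple[str, str, str]],
                set_c: set[tuple[str, str, str]],
                set_d: set[tuple[str, str, str]] | None = None) -> set[tuple[str, str, str]]:
    """Records are (api_name, class_name, sdk) triples."""
    set_e = set()
    for record in set_a:
        api_name = record[0]
        if api_name.startswith("_"):
            set_e.add(record)  # rule 1: underscore-prefixed methods
        elif record not in set_b and record not in set_c:
            set_e.add(record)  # rules 2 and 3: compared on the whole triple
    if set_d is not None:
        set_e |= set_d         # like "if include_private_framework: set_E = set_E + set_D"
    return set_e
```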

Code defects

1. In build_api_db.py, rebuild_sdk_private_api(sdk_version, False) should be changed to rebuild_sdk_private_api(sdk_version, True); otherwise set_D is never added to the private_apis table

2. In build_api_db.py, after include_private_framework, private_framework_apis should be inserted into the database table instead of framework_dump_private_apis

3. all_headers_path += iterate_dir(framework, "", os.path.join(framework_folder, header_path)) in api/api_utils.py should be changed to all_headers_path += iterate_dir(framework, "", header_path)

4. In db/dsidx_dbs.py, the SQL statement (sql = ...) hard-codes the IDs of the five token types. You need to confirm the IDs these five types actually have in your dsidx file.

For example, in the iOS 8.1 docSet I downloaded from Apple, the IDs of these ZTOKENTYPE entries are not (3,9,12,13,16) but (11,13,1,8,19).

If you downloaded someone else's ios_private.db from Baidu Netdisk or elsewhere, open it and check whether the data in the document_apis table really are APIs.

The original author used (3,9,12,13,16) because those were the IDs in the iOS 7.0 docSet at the time, which I confirmed by going back through the commit history. A more robust approach is therefore to filter the data by ZTYPENAME.
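Filtering on ZTYPENAME instead of hard-coded IDs could look like the following Python sketch, which uses the standard sqlite3 module. Only the ZTOKENTYPE table and its ZTYPENAME column come from the text above. The ZTOKEN and ZCONTAINER tables and their columns are assumptions about the dsidx schema, so check them in SQLPro (or any SQLite browser) before relying on the query.

```python
from __future__ import annotations

import sqlite3

API_TYPES = ("func", "instm", "clm", "intfm", "intfcm")

# Table and column names other than ZTOKENTYPE/ZTYPENAME are assumptions
QUERY = (
    "SELECT t.ZTOKENNAME, c.ZCONTAINERNAME, ty.ZTYPENAME "
    "FROM ZTOKEN AS t "
    "JOIN ZTOKENTYPE AS ty ON t.ZTOKENTYPE = ty.Z_PK "
    "LEFT JOIN ZCONTAINER AS c ON t.ZCONTAINER = c.Z_PK "
    "WHERE ty.ZTYPENAME IN (" + ", ".join("?" for _ in API_TYPES) + ")"
)


def documented_apis(conn: sqlite3.Connection) -> list[tuple[str, str | None, str]]:
    """Return (token name, container name, type name) rows for the five API types."""
    return conn.execute(QUERY, API_TYPES).fetchall()
```

Open the file with sqlite3.connect("docSet.dsidx") and pass the connection in. Because the type names are matched as text, the same query should work regardless of which numeric IDs a given docSet assigns.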

5. ret = subprocess.call(cmd.split()) in dump/class_dump_utils.py is not robust enough

After installing the iOS 8.1 simulator in Xcode 9, I found that the Frameworks path contained a space, so split() broke the path into two arguments. Changing the call to ret = subprocess.call([class_dump_path, '-H', frame_path, '-o', out_path]) avoids this problem.
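The difference is easy to see without running class-dump. In this Python sketch the helper names are illustrative, and the format of the original command string is an assumption. What matters is how each version builds the argument list passed to subprocess.

```python
from __future__ import annotations

import subprocess


def split_argv(class_dump_path: str, frame_path: str, out_path: str) -> list[str]:
    # Original approach: one command string, split on whitespace
    cmd = "%s -H %s -o %s" % (class_dump_path, frame_path, out_path)
    return cmd.split()


def list_argv(class_dump_path: str, frame_path: str, out_path: str) -> list[str]:
    # Fixed approach: every argument is its own list element, spaces included
    return [class_dump_path, "-H", frame_path, "-o", out_path]


def run_class_dump(class_dump_path: str, frame_path: str, out_path: str) -> int:
    return subprocess.call(list_argv(class_dump_path, frame_path, out_path))
```

Take a frame_path under .../iOS 8.1.simruntime/... as an example. split_argv returns six elements because the path is cut at the space. list_argv returns five elements with the path intact.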

Scanning Private APIs

Main logic

Reading iOS_private.py, the main logic for identifying private APIs in an app is as follows:

  1. Extract strings from the Mach-O file with the strings tool and split them on whitespace to get set 1
  2. Use otool -L to get the list of Frameworks and PrivateFrameworks that the Mach-O file links against
  3. Parse the header files that class-dump exports from the Mach-O file to get set 2 (class names and variable names) and set 3 (methods)
  4. Set 4 = set 1 - set 2 (compared by api_name)
  5. Group the framework_private_apis table by api_name and class_name to get set 5 of (class name, method name) combinations
  6. Match set 5 against set 4 by api_name to get set 6
  7. Match set 6 against set 3 by api_name and class_name to get the final set of private APIs

The result of step 2 can serve as part of the query condition in step 5. Entries in the whitelist table are excluded from the result set; in the code, this filtering also happens in step 5.
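Once the tool outputs are parsed, steps 1, 4, 6 and 7 boil down to set operations. The Python sketch below follows the step descriptions above rather than the actual iOS_private.py code. Function and parameter names are illustrative, and sets 3 and 5 are assumed to hold (api_name, class_name) pairs. The whitelist filtering of step 5 is left out.

```python
from __future__ import annotations


def strings_to_set(strings_output: str) -> set[str]:
    """Step 1: split the output of the strings tool on whitespace."""
    return set(strings_output.split())


def scan(set1: set[str], set2: set[str],
         set3: set[tuple[str, str]], set5: set[tuple[str, str]]) -> set[tuple[str, str]]:
    set4 = set1 - set2                                  # step 4: compare by name
    set6 = {pair for pair in set5 if pair[0] in set4}   # step 6: match on api_name
    return set6 & set3                                  # step 7: match on both fields
```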

Code defects

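Set 6 is later matched on the combination of api_name and class_name. The GROUP BY clause in the original code's SQL statement should therefore include class_name as well as api_name. In SQLite, a bare column in a grouped query takes its value from an arbitrary row of the group. Grouping by api_name alone therefore keeps only one class_name per method name and loses matches on the other classes.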
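The corrected grouping can be sketched with Python's sqlite3 module. The framework_private_apis table and its api_name/class_name columns come from the text above. The function name is illustrative, and the whitelist condition is omitted.

```python
from __future__ import annotations

import sqlite3

GROUPED_PRIVATE_APIS = (
    "SELECT api_name, class_name FROM framework_private_apis "
    "GROUP BY api_name, class_name"
)


def private_api_pairs(conn: sqlite3.Connection) -> set[tuple[str, str]]:
    """One row per distinct (api_name, class_name) pair, e.g. across SDK versions."""
    return set(conn.execute(GROUPED_PRIVATE_APIS).fetchall())
```

Suppose the same method name exists on two classes. This query returns both pairs, whereas GROUP BY api_name alone would return a single row with one arbitrarily chosen class_name.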

Improvement Suggestions

If the NetEase solution is used as-is, private APIs are quite likely to go undetected. The detection logic only considers exact matches on both api_name and class_name, which is too restrictive.

  1. When building the private API database, TSRC Lab added further conditions. For example, APIs whose names are entirely lowercase are mostly C functions, so a batch of them was filtered out
  2. In the scanning algorithm, suppose step 5 groups only by api_name and step 6 matches only on api_name. Then a string such as @selector(XXX) in the source code is strong evidence that api_name is a private API
  3. APIs hidden by static string concatenation or encryption/decryption can be identified with dynamic hooks, although this has some limitations
  4. Add scanning for the prefs: and App-Prefs: URL schemes
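Suggestion 4 is easy to prototype on top of the strings output from step 1 of the scanning logic. This Python sketch (function name illustrative) flags every whitespace-separated token that uses one of the two schemes. It does not judge whether a given settings URL is acceptable to Apple.

```python
from __future__ import annotations

import re

# URL schemes are case-insensitive, so match "prefs:" and "App-Prefs:" ignoring case;
# the lookbehind stops matches in the middle of a longer word such as "myprefs:"
PREFS_RE = re.compile(r"(?<![\w-])(?:App-Prefs|prefs):\S*", re.IGNORECASE)


def find_prefs_urls(strings_output: str) -> list[str]:
    """Return tokens in strings output that use the prefs: or App-Prefs: schemes."""
    return PREFS_RE.findall(strings_output)
```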

Locating a specific API

Apple's review may state that the app uses an API it should not use. We must then be able to find where that API is used, whether in our own code or in a third-party SDK. In the root directory of the project, run:

    find . -type f -name "*.a" -not -path "*.app/*" -print0 | xargs -0 grep -l advertisingIdentifier
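If shell quoting becomes a problem, the same search can be sketched in Python. The file selection rules are assumptions. The sketch checks every file ending in .a, plus the binary inside each .framework bundle (assumed to share the bundle's base name), and skips .app directories. The byte search works because Objective-C selector names are stored as plain strings in compiled code.

```python
from __future__ import annotations

import os


def find_api_usage(root: str, needle: str) -> list[str]:
    """Return paths of static libraries and framework binaries under root that contain needle."""
    target = needle.encode()
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into built .app bundles
        dirnames[:] = [d for d in dirnames if not d.endswith(".app")]
        # Inside Foo.framework, the binary is assumed to be named Foo
        framework_binary = None
        if dirpath.endswith(".framework"):
            framework_binary = os.path.basename(dirpath)[: -len(".framework")]
        for name in filenames:
            if name.endswith(".a") or name == framework_binary:
                path = os.path.join(dirpath, name)
                with open(path, "rb") as f:
                    if target in f.read():
                        hits.append(path)
    return sorted(hits)
```

For example, find_api_usage(".", "advertisingIdentifier") returns the matching library paths in sorted order.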

Open Issues

While studying NetEase Games' open source solution, I skipped the question of how to build a documented API set for iOS 10 and later. I will look into it further.
