iOS multithreaded recursive lock: exploration of the underlying principle of @synchronized

Author: Zhang Xuan, Zhang Kai

Review | Chonglou

1. Background and Significance

In system application development, as the complexity of applications increases, multithreaded programming has become one of the key technologies to improve user experience and application performance, and thread synchronization plays a pivotal role. Given that iOS applications generally involve multitasking and concurrent operations, ensuring thread safety has become a vital task. The Objective-C language provides an efficient and reliable synchronization mechanism for this purpose, namely the @synchronized keyword. This mechanism ensures that only one thread can execute the code block at the same time by marking a specific code segment as a critical section, thereby effectively protecting the consistency and integrity of the data.

In addition, the @synchronized keyword also has the ability to recursively lock in a multi-threaded environment. This means that the same thread can acquire the same lock multiple times without causing a deadlock, greatly improving the security and feasibility of recursive calls.

2. Key technologies

In the iOS multithreaded environment, the @synchronized keyword is a key technology for implementing thread synchronization and recursive locking. Its internal implementation principle is based on the Objective-C runtime and the underlying lock mechanism.

First, the @synchronized keyword is converted into a function call in the Objective-C runtime library at compile time, namely objc_sync_enter and objc_sync_exit. These two functions are used to acquire and release locks, respectively. When a thread tries to enter the @synchronized code block, the objc_sync_enter function is called and tries to acquire the lock associated with the given object. If the lock is already held by another thread, the current thread will be blocked until the lock is released. If the lock is successfully acquired, the thread can safely execute the code in the @synchronized code block.

Secondly, the recursive locking capability of the @synchronized keyword is implemented by maintaining a mapping of threads to lock counters at runtime. When the same thread enters the same @synchronized code block multiple times, the lock counter is incremented instead of reacquiring the lock. This way, even if the thread enters the critical section multiple times, it will not cause deadlock. When the thread leaves the @synchronized code block, the lock counter is decremented until the counter reaches zero, at which point the lock is released, allowing other threads to enter the critical section.

2.1 @synchronized source code entry

First, let's take a look at the specific structure of @synchronized through a demo:

Figure 1 demo Objective-C language version example

Using Xcode's toolchain, you can run the clang -rewrite-objc command to rewrite the above demo code into C++. Here we extract the key code as follows:

Figure 2 demo C++ language version example

From the rewritten code it is clear that declaring the local variable _sync_exit constructs a _SYNC_EXIT instance, and its destructor ~_SYNC_EXIT calls objc_sync_exit when the block is left. Fundamentally, the underlying implementation of the @synchronized directive depends on calls to the objc_sync_enter and objc_sync_exit functions. Next, we will analyze the specific implementation of objc_sync_enter and objc_sync_exit in depth.

2.2 objc_sync_enter and objc_sync_exit function implementation

objc_sync_enter and objc_sync_exit are key functions for handling synchronization locks in the Objective-C runtime library. These two functions use the underlying lock mechanism (such as mutex locks or spin locks) to ensure that only one thread can execute a specific code block at the same time. Below, we will analyze the implementation of these two functions and the principles behind them in detail.

2.2.1 Implementation of objc_sync_enter function

The main function of the objc_sync_enter function is to try to acquire the lock associated with a given object. If the lock is already held by another thread, the current thread will be blocked until the lock is released. The implementation of this function usually includes the following steps:

1. Calculate the hash value of the lock object: First, a hash value is calculated based on the object passed in (usually as the unique identifier of the lock in the @synchronized statement). This hash value is used to quickly locate the corresponding lock object in the internal data structure.

2. Find or create a lock object: In the internal data structure (such as a hash table or linked list), find out whether the corresponding lock object exists according to the calculated hash value. If it does not exist, create a new lock object and insert it into the data structure. This lock object may be a structure that encapsulates the underlying synchronization mechanism such as a mutex or spin lock.

3. Try to acquire the lock: Use the found lock object to try to acquire the lock. This usually involves calling the underlying synchronization mechanism, such as calling the lock method of a mutex or related operations of a spin lock. If the lock is already held by another thread, the current thread will be blocked.

4. Record the relationship between threads and locks: In order to support recursive locking, the Objective-C runtime also needs to record which threads have held which locks and the number of times they have held the locks. This is usually achieved through a mapping from a thread to a lock counter.

Figure 3 objc_sync_enter function source code

2.2.2 Implementation of objc_sync_exit function

The main function of the objc_sync_exit function is to release the lock previously acquired by objc_sync_enter. The implementation of this function usually includes the following steps:

1. Calculate the hash value of the lock object: Same as objc_sync_enter. First, calculate the hash value based on the passed in object to find the corresponding lock object.

2. Find the lock object: Find the corresponding lock object in the internal data structure.

3. Release the lock: Use the found lock object and call its release method (such as the unlock method of a mutex lock). This will allow other blocked threads to enter the critical section.

4. Update the relationship between threads and locks: If the current thread is the last to release the lock (that is, the lock counter is reduced to zero), remove the record of the thread from the mapping of threads to lock counters. This ensures the correctness of recursive locking.

Figure 4 objc_sync_exit function source code

2.3 Summary

By deeply exploring the source code of the objc_sync_enter and objc_sync_exit functions, we can clearly see the core role of these two functions in the synchronization mechanism. They ensure that threads can safely enter and exit synchronized blocks by cleverly using the mutex data->mutex. This mechanism is crucial in concurrent programming, preventing multiple threads from accessing shared resources at the same time, thereby avoiding the risk of data competition and inconsistency.

Further, we found that the variable data is actually a pointer to an instance of the SyncData type. Here, SyncData plays a crucial role, encapsulating all necessary information related to synchronization operations, including mutexes, synchronization status, etc. Through the carefully designed SyncData structure, we can more flexibly manage and control the access to synchronization resources and ensure the coordinated operation of the program between different threads.

At the same time, the id2data component also caught our attention. Judging from the name, it seems to be a function or method used to map a certain identifier (such as an object ID) to the corresponding SyncData instance. In concurrent programming, such a mapping relationship is crucial for quickly locating and managing synchronization resources. Through id2data, we can quickly find the corresponding data (that is, the synchronization data of the object) based on the passed in obj (which may be the unique identifier of an object). This mapping mechanism greatly improves the efficiency and accuracy of synchronization operations and provides strong support for concurrent execution of programs.

In order to have a deeper understanding of the specific implementation of the SyncData structure and id2data, we will discuss the structure and member variables of SyncData in detail in the following section 3, as well as its specific application in the synchronization mechanism. At the same time, in the fourth section, we will analyze the implementation details of id2data, including how it receives input parameters, performs mapping lookups, and returns the corresponding SyncData instance. With a deep understanding of these core components, we will be able to better master the synchronization mechanism of Objective-C and flexibly apply it in actual development.

3. Introduction and analysis of SyncData structure

First, let's understand the implementation of the SyncData structure:

Figure 5 SyncData basic structure

From this source code, we can see the basic structure of SyncData: it essentially stores the passed-in object obj, a pointer that chains nodes into a singly linked list, a recursive lock, and a count of the threads using the node. You can see this recursive lock in the following source code; it is essentially a wrapper built on os_unfair_lock. An additional note: os_unfair_lock is a mutex in iOS. In earlier runtime versions, recursive_mutex_t was built on pthread_mutex_t, so here we only need to understand it as a mutex.

Figure 6 recursive_mutex_t basic structure

4. Analysis and research on the underlying implementation of id2data

Before formally analyzing the details of id2data, in order to build a clearer cognitive framework, this article will first conduct a macro-level review and overall explanation of the data structure involved. This is to lay a solid background knowledge foundation for the subsequent in-depth discussion of id2data, ensuring that readers can rely on this solid foundation to more accurately grasp the relevant concepts and logical context.

4.1 SyncCache's underlying implementation logic

Figure 7 SyncCache basic structure

It can be clearly observed that the SyncCache container stores data of the SyncData type. The SyncData structure has been explained in detail in the third part of the article, so it will not be repeated here. In fact, from this detail, we can infer that SyncCache seems to be a cache mechanism specially designed to store SyncCacheItem objects containing SyncData. From this perspective, the function and purpose of SyncCache become very obvious. It is a temporary storage area used to speed up data access. When data needs to be accessed, it can be directly obtained from SyncCache, thereby improving the running efficiency of the program. At the same time, this also reflects the designer's emphasis on data storage and reading efficiency. By introducing a cache mechanism, frequently accessed data can be quickly obtained, thereby improving overall performance.

4.2 Fast Cache

Figure 8 Fast storage internal storage structure

The fast cache here is very similar to SyncCache in essence; both are cache mechanisms used to improve data reading speed. The main difference is that SyncCache stores data in a list, while the fast cache stores only a single SyncCacheItem, whose data and lockCount are kept under two TLS keys. This design lets the fast cache read data faster because it does not need to traverse a list but fetches the values directly by key. It also makes the fast cache's storage footprint smaller, since no memory needs to be allocated for a list.

4.3 The underlying implementation logic of sDataLists

sDataLists is a global static variable, which means there is only one instance in the entire application. It holds an array of SyncList entries, each of which manages and synchronizes the shared lock data hashed to it. Since it is static, wherever in the program you need to access sDataLists, you can reference it directly by name without first creating a local variable. This design makes data management and synchronization more efficient and convenient.

Figure 9 sDataLists storage structure

Figure 10 SyncList basic structure

4.4 Design and implementation of StripedMap

When deeply analyzing the source code architecture of sDataLists, we can clearly identify that sDataLists essentially follows the data structure design of hash tables. Behind this carefully planned architecture, there are far-reaching purposes and careful considerations. As an efficient way to organize data, the core advantage of hash tables is that they significantly improve the speed and efficiency of data retrieval. However, in the current implementation framework, the functions and roles carried by hash tables far exceed this single scope.

If we abandon the StripedMap strategy, the system has to rely on a single global SyncList instance to coordinate the locking and unlocking of all objects. Although simple, this design has a significant performance bottleneck: whenever any object attempts to access or modify the table, its operation is forced to wait until all other objects complete their operations before it can continue. This kind of serial processing greatly increases lock contention and has an adverse impact on the overall performance of the system.

The introduction of StripedMap provides an effective solution to existing problems. Its core value lies in the implementation of sharding for a single SyncList, a mechanism that ensures that multiple objects can operate different SyncList instances in parallel and independently. By pre-configuring and managing a certain number of SyncLists through StripedMap, and adopting a balanced distribution strategy during actual calls, the system can significantly improve its concurrent processing capabilities. Specifically, when performing a locking operation, each object can independently select an independent SyncList to operate, thereby successfully avoiding the performance bottleneck problem that may be caused by the global lock.

4.5 TLS (Thread Local Storage)

TLS, or thread local storage, is a mechanism provided by the operating system that allocates a private memory space to each thread for data that belongs only to that thread. Since the thread is the smallest unit of task scheduling and resource allocation in the operating system, TLS ensures that each thread has an independent storage space, avoiding data interference and conflicts between threads and improving the concurrency and stability of the program.

In a multi-threaded program, each thread may access shared resources, such as global variables or shared memory areas, which can lead to data contention and race conditions, thus affecting the correctness and reliability of the program. To solve this problem, developers usually need to use synchronization mechanisms, such as mutexes and semaphores, to protect access to shared resources. However, these synchronization mechanisms bring additional overhead and reduce program performance. With TLS, some data that needs to be accessed frequently and does not involve shared resources can be stored in the private space of the current thread, avoiding the use of synchronization mechanisms and improving the running efficiency and performance of the program.

The use of TLS requires developers to make appropriate declarations and initializations when writing programs to ensure that each thread can correctly access the data in its private space. At the same time, since TLS is a mechanism provided by the operating system, its specific implementation and interface may vary depending on the operating system, so developers need to adapt and adjust accordingly based on the specific operating system and compiler environment.

4.6 Implementation analysis and research of id2Data

Let us now turn to the specific implementation details of id2data. Given the length of the source code, it will be analyzed piece by piece in Sections 4.6.1 to 4.6.4. First, the system locates and retrieves the lock and the head of the linked list based on the passed-in obj parameter.

Figure 11 id2Data basic structure input parameters

4.6.1 FastCache Fast Lookup

A FastCache mechanism is deployed in the thread local storage (TLS) of each thread. The implementation process of this mechanism mainly includes the following rigorous and orderly steps:

First, the system checks whether there is SyncData data. If so, the fastCacheOccupied flag is immediately set to YES to clearly indicate that there is valid data in the fast cache.

Next, the system will perform a critical check: verify whether the current SyncData object is completely consistent with the target object obj targeted by the current operation. If the two match, it means that the lock data matching the target object has been found in the fast cache.

After confirming that the matching lock data is found, the system will read the lockCount value of the current lock and perform the corresponding operation based on the current incoming operation type (such as locking or unlocking). After the operation is completed, the system will store the updated lockCount value back into TLS so that it can be quickly located in subsequent search operations.

The core advantage of this FastCache mechanism is that it effectively improves the execution efficiency of locking and unlocking operations by pre-caching lock data in the local storage of each thread. This not only avoids frequent global scope search and update operations, but also significantly reduces the contention of lock resources, thereby achieving a significant improvement in concurrent performance.

Figure 12 id2Data basic structure of FastCache

4.6.2 SyncCache cache traversal search

When the FastCache fast lookup fails to locate the target, the program triggers the fetch_cache function to access the current thread's cached data and continue the search. Note that SyncCache is essentially an array of SyncCacheItem elements, each wrapping a SyncData. The process traverses this array looking for an item matching the current operation object obj. Once a match is found, the program performs the corresponding locking or unlocking operation based on the operation type passed in and updates the value of lockCount. If lockCount drops to 0, the thread is considered to have finished with the lock, and the item is removed from the thread cache.

To ensure the correctness and consistency of the above operations in a multi-threaded concurrent environment and avoid potential conflicts, this mechanism uses the OSAtomicDecrement32Barrier function to atomically reduce the value of threadCount in the result structure. This is to ensure that the process of reducing the counter will not be interfered by the locking operations of other threads, thereby maintaining the stability and accuracy of the system state.

Figure 13 SyncCache of the basic structure of id2Data

4.6.3 sDataLists Search

If the target item is not retrieved from either the fast cache or the regular cache, then this thread is performing the @synchronized operation on this object for the first time. In this case, the system turns to retrieving the corresponding SyncData object from sDataLists. First, the global linked list is locked by executing the lock operation; this prevents multiple threads from creating new locks for the same object in parallel and ensures thread safety. Then the global linked list is traversed to find the SyncData instance that matches the current object.

If a matching SyncData is located during the traversal, the instance is assigned to the result variable, and the OSAtomicIncrement32Barrier atomic operation is used to increment result->threadCount. This prevents potential conflicts with concurrent release operations and ensures data consistency and thread safety. After these steps, the SyncData's information can subsequently be accessed through TLS (thread local storage). If the system instead finds an unused SyncData instance (one whose threadCount has dropped to zero), it reuses that node first, optimizing resource utilization and reducing unnecessary object creation and destruction overhead.

Figure 14 sDataLists of the basic structure of id2Data

4.6.4 Create SyncData

If the required data cannot be found in sDataLists either, the runtime must create a SyncData itself and cache it in the thread cache for subsequent queries. Looking at the lock acquisition process of id2data as a whole, it uses a mechanism similar to a three-level cache, searching the fast cache, the regular thread cache, and the hash table in sequence. This design efficiently manages lock resources in a multithreaded environment and ensures threads can quickly obtain locks for locking and unlocking, improving system efficiency. When sDataLists retrieval comes up empty, the strategy is to create a new SyncData instance and incorporate it into the cache system, ensuring data integrity and stable operation.

Figure 15 SyncData creation of id2Data basic structure

Figure 16 id2Data basic structure: SyncData adds cache

Summary

Figure 17 id2Data three-level cache mechanism

From the top-down analysis above, we can see that @synchronized uses the three-level cache mechanism of id2data to quickly obtain the lock for the passed-in object obj, and records the lockCount and usage of these locks, ensuring that only one thread can execute the protected block at a time, thereby preventing data races and ensuring thread safety. The following points can be summarized:

  1. @synchronized maintains a recursive lock for each passed-in object and records how many times each thread has taken it. On this basis, different objects are locked and unlocked with different recursive locks, and the same thread can re-enter its lock, achieving multithreaded recursive calls.
  2. The lock behind @synchronized is a recursive mutex (recursive_mutex_t) built on os_unfair_lock. When @synchronized creates a lock internally, it protects the global table with spinlock_t to guarantee each lock's uniqueness.
  3. @synchronized uses a fast cache, a thread cache, and a global linked list so that threads can obtain locks more quickly, improving efficiency.
  4. Note that if nil is passed to @synchronized, no locking is performed, so this should be avoided in practice.

In short, in the iOS development process, @synchronized provides a convenient thread synchronization mechanism that allows developers to easily implement thread synchronization. Especially when dealing with recursive locks, using @synchronized can effectively avoid deadlocks and greatly improve the security and reliability of the code. This feature is very important for developers because it not only ensures orderly access to shared resources between threads, but also reduces complexity, allowing developers to focus more on the implementation of business logic. Finally, this article explores the process of iOS implementing thread synchronization mechanisms, aiming to provide developers with reference guidance in the field of synchronization system development.

About the Author

Zhang Xuan, a software development engineer at the R&D Center of Agricultural Bank of China, is familiar with iOS development, has development experience in SpringBoot and React, and has a solid foundation in computer fundamentals.

Zhang Kai is a software development engineer at the R&D Center of Agricultural Bank of China Co., Ltd. He is good at SpringBoot+Vue full-stack development and loves programming and learning cutting-edge technology information.

