Using Clang Address Sanitizer directly on Xcode 7


At WWDC 2015, alongside Swift 2.0, there was another piece of exciting news: Clang's Address Sanitizer can now be used directly in Xcode 7. In this article we look at the feature in detail: how it works and how to use it. The topic was suggested by Konstantin Gonikman.

An unusually dangerous situation in C

C is a great programming language in many ways. The fact that it's still going strong, more than 40 years after its invention, speaks volumes about its greatness. It wasn't the first (or second) programming language I learned, but it was the first language that really opened my eyes to the mystery of how computers work. And it's the only language I still use today.

However, C is also a very dangerous programming language, and much of the pain in the software world originates with it. It allows whole classes of strange bugs that simply cannot occur in other languages.

Memory safety is a major problem. There is no memory safety in C. Code like the following will compile fine and may work fine:

    char *ptr = malloc(5);
    ptr[12] = 0;

This code allocates an array of only 5 bytes, then writes through the pointer to the 13th byte. The write may silently corrupt whatever data lives at that address, or it may happen to be harmless. On Apple platforms, for example, malloc always allocates at least 16 bytes even when asked for fewer, so this code runs without incident there, but that is not something to rely on. A bug like this may do little harm, or it may cause endless trouble.

Smarter languages keep track of array sizes and validate every subscript. The equivalent Java code reliably throws an exception, and with exceptions these "magical" problems are much easier to debug. For example, if a variable should be 4 but holds 5, we know the fault lies in some piece of code that modifies that variable (so we can concentrate on debugging the program rather than suspecting the compiler, which is rarely wrong). In C we can make no such assumption. The bug may come from code that deliberately modifies the variable, or from code that accidentally overwrites it through a bad pointer.

The industry has begun to work on this problem. For example, Clang's static code analysis can find certain types of memory safety issues in the code. Programs such as Valgrind can detect unsafe memory accesses at runtime.

Address Sanitizer is another solution. It uses a new approach, which has pros and cons. But it is still a powerful tool for finding code problems.

Memory access verification

Many of these tools find problems by verifying the validity of memory accesses at runtime. The theory is that by comparing the memory accessed with the memory actually allocated by the program, the validity of the memory access can be verified, thus detecting bugs when they occur, rather than waiting until side effects occur.

Ideally, each pointer would carry both the location it points to and the size of the data there, so that every memory access could be verified against them. Nothing in principle prevents a C compiler from doing this. However, attaching metadata to pointers would make the program incompatible with code built by a standard C compiler. That means you could not simply call system libraries, which would severely limit where such a checking system could be used.
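To make the idea of a size-carrying pointer concrete, here is a minimal sketch in C. The names `bounded_ptr`, `bounded_alloc`, and `bounded_store` are hypothetical and exist only for illustration; no standard C compiler or library provides them, which is exactly the compatibility problem described above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical "fat pointer": the address plus the size of the allocation. */
typedef struct {
    char  *base;
    size_t size;
} bounded_ptr;

/* Allocate `size` bytes and remember the size next to the pointer. */
static bounded_ptr bounded_alloc(size_t size) {
    bounded_ptr p = { malloc(size), size };
    if (p.base == NULL) {
        p.size = 0; /* a failed allocation has no valid bytes */
    }
    return p;
}

/* Store one byte, refusing any index outside the allocation.
   Returns true if the store happened, false if it was out of bounds. */
static bool bounded_store(bounded_ptr p, size_t index, char value) {
    if (index >= p.size) {
        return false;
    }
    p.base[index] = value;
    return true;
}
```

With this scheme, the earlier out-of-bounds write becomes `bounded_store(p, 12, 0)` on a 5-byte allocation and is rejected instead of corrupting memory. The catch is that a plain `char *` passed in from a system library carries no size, so nothing can be checked for it.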

Valgrind solves these problems by running the entire program in an emulator. Binaries produced by a standard C compiler run directly, without modification, while Valgrind analyzes the program as it runs and checks every memory access it makes. This lets it work on any program, including system libraries, without changes. The cost is speed: the program becomes very slow, which makes the approach impractical for performance-sensitive programs. In addition, Valgrind needs a deep understanding of the semantics of the platform's system calls in order to track memory changes properly, which requires deep integration with the specific host system. For many years Valgrind had no clear plans to support the Mac, and as of this writing it does not support Mac OS X 10.10.

Guarded memory allocation takes advantage of the CPU's built-in memory protection hardware. It replaces the standard malloc function and marks the memory immediately after each allocation as neither readable nor writable, so when the program runs past the end of an allocation, the access faults. This approach has a drawback: hardware memory protection is coarse. Memory can only be protected at page granularity, and on modern operating systems pages are at least 4kB in size. Each allocation therefore needs at least 8kB: one page to hold the data and another to catch out-of-bounds accesses, even if only a few bytes were requested. Small overruns can also go undetected. To preserve malloc's alignment guarantee, the returned memory must be 16-byte aligned, so if the requested size is not a multiple of 16, the leftover bytes at the end are not protected.

Address Sanitizer attempts to enforce memory restrictions at a finer granularity. It is slower than guarded allocation, but more practical.

Tracking restricted memory

Since hardware memory protection is too coarse for this, the checking has to be done in software. Because no extra data can travel with the pointers themselves, memory must be tracked in some kind of global table, and that table has to be fast to read and to update.

Address Sanitizer uses a simple but clever method: it reserves a fixed region of the process's address space, called "shadow memory". In Address Sanitizer's terminology, memory marked as off-limits is "poisoned", and shadow memory records which bytes are poisoned. A simple formula maps any address in the process to its place in shadow memory: every 8-byte block of normal memory maps to one byte of shadow memory, and that shadow byte tracks the poison state of those 8 bytes.
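The mapping can be written as a shift plus a constant. The sketch below assumes a 64-bit address space; `SHADOW_OFFSET` is a made-up value chosen for illustration, since the real base of the shadow region depends on the platform.

```c
#include <stdint.h>

/* 2^3 = 8 bytes of application memory per shadow byte. */
#define SHADOW_SCALE 3

/* Hypothetical base address of the shadow region (illustrative only). */
#define SHADOW_OFFSET UINT64_C(0x100000000000)

/* Map an application address to the address of its shadow byte. */
static inline uint64_t shadow_address_for(uint64_t app_address) {
    return (app_address >> SHADOW_SCALE) + SHADOW_OFFSET;
}
```

Addresses 0 through 7 share one shadow byte, addresses 8 through 15 share the next, and so on. The lookup is a single shift and add, which is what keeps the table cheap enough to consult on every memory access.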

Since every 8 bytes of memory map to 8 bits (1 byte) of shadow memory, it would be natural to assume that each bit of the shadow byte marks the poison state of one byte. In fact, Address Sanitizer stores an integer in each shadow byte. It assumes that the poisoned bytes in a block are contiguous and sit at the end of the block, so a single number is enough to describe the whole block. A value of 0 means all 8 bytes are accessible. A value k from 1 to 7 means only the first k bytes are accessible and the rest of the block is poisoned. A negative value means all 8 bytes are poisoned. With this encoding, each memory access can be checked against the shadow byte. Because allocations start on aligned addresses, a partly valid block only ever occurs at the tail of an allocation, so the assumption that poisoned bytes form a run at the end of the block causes no problems.
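As an illustration of the encoding, the sketch below fills in the shadow bytes for a single allocation that starts on an 8-byte boundary. It follows the convention that 0 means a fully accessible block, a value k from 1 to 7 means only the first k bytes are accessible, and a negative value means the whole block is poisoned. The function name `describe_allocation` and the single `POISONED` value are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 8
#define POISONED ((int8_t)-1) /* placeholder: any negative value means "all poisoned" */

/* Fill `shadow` (one entry per 8-byte block) for an allocation of `size`
   bytes that begins at the start of block 0. Blocks past the end of the
   allocation are poisoned. */
static void describe_allocation(int8_t *shadow, size_t shadow_len, size_t size) {
    for (size_t i = 0; i < shadow_len; i++) {
        size_t block_start = i * BLOCK_SIZE;
        if (block_start + BLOCK_SIZE <= size) {
            shadow[i] = 0;                            /* fully inside the allocation */
        } else if (block_start < size) {
            shadow[i] = (int8_t)(size - block_start); /* only the first k bytes valid */
        } else {
            shadow[i] = POISONED;                     /* entirely outside */
        }
    }
}
```

A 5-byte allocation yields shadow values 5, -1, -1, ... and a 13-byte allocation yields 0, 5, -1, .... In both cases the partly valid block is the last block of the allocation, which is why one small integer per block is sufficient.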

With this table in place, Address Sanitizer generates extra code in the program that checks every read and write made through a pointer and reports an error if the memory is poisoned. Because this is built into the compiler rather than living only in an external library or runtime, every pointer access can be reliably identified and the appropriate check added to the machine code.
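The effect of the inserted checks can be imitated by hand. The sketch below works on offsets into a small simulated arena rather than on real addresses, and it returns `false` where the real tool would print a report and abort. The names `access_is_poisoned` and `checked_store_byte` are hypothetical, and the check assumes an access never straddles two 8-byte blocks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Consult the shadow byte for the block containing `offset`.
   Shadow encoding: 0 = all 8 bytes accessible, 1..7 = only the first k
   bytes accessible, negative = whole block poisoned. */
static bool access_is_poisoned(const int8_t *shadow, size_t offset, size_t size) {
    int8_t s = shadow[offset >> 3];
    if (s == 0) {
        return false;
    }
    if (s < 0) {
        return true;
    }
    /* Position of the last accessed byte within its block. */
    size_t last = (offset & 7) + size - 1;
    return last >= (size_t)s;
}

/* What a plain `arena[offset] = value;` conceptually becomes once
   instrumented: check first, then perform the store. */
static bool checked_store_byte(uint8_t *arena, const int8_t *shadow,
                               size_t offset, uint8_t value) {
    if (access_is_poisoned(shadow, offset, 1)) {
        return false; /* the real tool reports the error and stops here */
    }
    arena[offset] = value;
    return true;
}
```

With shadow bytes `{5, -1}` describing a 5-byte allocation in a 16-byte arena, a store at offset 4 succeeds, while stores at offset 5 or offset 12 are rejected before they can touch memory.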

Compiler integration also enables some neat tricks, such as protecting local and global variables in addition to memory allocated on the heap. Locals and globals are laid out with small gaps between them, and those gaps are poisoned so that overflows are caught. Guarded allocation cannot help here, and Valgrind struggles with it as well.

Compiler integration also has drawbacks. Specifically, Address Sanitizer cannot catch bad memory accesses made inside system libraries. It is, of course, compatible with them: you can enable Address Sanitizer in an application that links against Cocoa, and it will build and run normally. But it will not catch bad memory accesses made by Cocoa itself, including those Cocoa makes on behalf of your code.

Address Sanitizer can also catch use-after-free errors. Memory is marked as "poisoned" when it is freed, so any later access is reported. Use-after-free errors are particularly harmful once the memory has been reused, because they can corrupt unrelated data. Address Sanitizer guards against this by placing freshly freed memory in a quarantine queue, where it cannot be handed out again for a while.

Of course, adding checks to every pointer access is expensive. How expensive depends on what the code does, since different kinds of code access memory through pointers at different rates. On average the checks slow a program down by about 2-5 times, which is a lot of overhead, but not enough to make the program unusable.
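The quarantine idea for freed memory can be sketched as a small ring buffer that delays the real `free`. This is a deliberate simplification: the fixed slot count, the global state, and the name `quarantined_free` are assumptions made for this example, and the sketch omits the poisoning step.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define QUARANTINE_SLOTS 4

static void  *quarantine[QUARANTINE_SLOTS]; /* blocks waiting to be released */
static size_t quarantine_next = 0;          /* slot that will be recycled next */

/* Put `ptr` into quarantine instead of freeing it right away. The block that
   has waited longest is released to make room. Returns the address of the
   block that was really freed, or 0 if the slot was still empty. */
static uintptr_t quarantined_free(void *ptr) {
    void *evicted = quarantine[quarantine_next];
    uintptr_t evicted_address = (uintptr_t)evicted; /* record before freeing */

    quarantine[quarantine_next] = ptr;
    quarantine_next = (quarantine_next + 1) % QUARANTINE_SLOTS;

    free(evicted); /* free(NULL) is a no-op */
    return evicted_address;
}
```

The first four calls release nothing, and the fifth call finally frees the block passed to the first. While a block sits in the queue, the allocator cannot hand its memory to anyone else, so a stale pointer to it still refers to memory that is marked as poisoned rather than to some unrelated live object.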

How to use?

Using Address Sanitizer with Xcode 7 is simple. When compiling from the command line, pass the -fsanitize=address flag to clang. Here is a test program:

Compile and run through Address Sanitizer:

The program crashes immediately, outputting a lot of content:

There is a lot of information here, and in real life, this information can be a huge help in tracking down the problem. Not only does it show where the bad memory was written to, but it also identifies where the memory was originally allocated. Plus, there is a lot of other additional information.

Using Address Sanitizer from within Xcode is even easier: edit the scheme, click the Diagnostics tab, and check the "Enable Address Sanitizer" option. Then build and run as usual, and you will see the same kind of diagnostic information.

Additional feature: Undefined behavior sanitizer

Incorrect memory access is just one of many "interesting" undefined behaviors in C. Clang also provides other sanitizers that can catch many undefined behaviors. Here is an example program:

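    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        /* The iteration count comes from the first command-line argument. */
        if (argc < 2) {
            return 1;
        }
        int value = 1;
        /* Multiply by 10 each time; enough iterations overflow a signed int. */
        for (int x = 0; x < atoi(argv[1]); x++) {
            value *= 10;
            printf("%d\n", value);
        }
        return 0;
    }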

Run the code:

The last few values in the output look wrong. Signed integer overflow is, of course, undefined behavior in C. It would be nice to catch this error instead of silently producing bad data. The undefined behavior sanitizer can help. Pass -fsanitize=undefined-trap -fsanitize-undefined-trap-on-error to enable it:

Unlike Address Sanitizer, this produces no additional output, but the program stops immediately when the error occurs, and we can easily locate the problem in the debugger.
The undefined behavior sanitizer is not yet integrated into Xcode, but you can use it by adding the compiler flags in your project's build settings.
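For comparison, the same kind of overflow can be detected by hand. The sketch below assumes a GCC or Clang compiler that provides the `__builtin_mul_overflow` builtin and a 32-bit `int`. The helper `checked_times_ten` is a hypothetical name, and this is not what the sanitizer generates, only an illustration of checking the multiplication instead of trusting it.

```c
#include <stdbool.h>

/* Multiply `value` by 10 and store the product in `*result`.
   Returns true if the product fits in an int, false if it overflowed. */
static bool checked_times_ten(int value, int *result) {
    /* The builtin returns true when the mathematically correct result
       does not fit in the type of *result. */
    return !__builtin_mul_overflow(value, 10, result);
}
```

In the loop above, `value *= 10` would become a call to `checked_times_ten(value, &value)`, stopping as soon as it returns false. The sanitizer gives the same protection without touching the source, which is its main appeal.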

Conclusion

Address Sanitizer is a great technology that can help us find many problems in C code. It is not perfect and cannot find every error, but it provides very useful diagnostic information. I strongly recommend trying it on your own code. The results may surprise you.
