How Facebook reduces FOOMs in its app

At Facebook, we're committed to making apps stable, fast, and reliable. On Facebook's iOS app, we've done a lot of work to reduce crash rates and improve overall app stability. Previously, most crashes were due to routine errors, and they usually came with a stack trace pointing to the offending line of code and a hint of what might have caused the problem.

As we continued to address crashes, the share of crashes left to fix kept shrinking, yet feedback on the App Store indicated that people were still experiencing frustrating app crashes. We dug deeper into user reports and theorized that out-of-memory events (OOMs) might be occurring. An OOM typically occurs when the system is running low on memory and the OS terminates the app in order to reclaim memory. It can happen in the foreground or in the background. Internally we call these FOOMs and BOOMs, and it sounds funny when we say the app BOOMed.

From a user's perspective, a foreground out-of-memory termination is hard to tell apart from a regular crash. In both cases the app terminates abnormally, seems to disappear, and the user is returned to the device home screen. If memory consumption grows rapidly, the app can be terminated without any notification. iOS does send memory warnings to the app, but there is no guarantee that a warning will arrive before the OS terminates the app. This makes it difficult to know whether the app was terminated by the OS due to memory pressure.

Analyze the problem

To understand how often the application terminates because of OOMs, we listed every known way the application can terminate and logged each one. This turns the question into "what caused the application to restart?"

An application can need to restart for the following reasons:

The app has been updated

The app exited or terminated itself (for example, by calling exit or abort)

App crashes

User force quits the app

Device reboot (including OS upgrade)

The application is out of memory (OOM) in the foreground or background

By ruling out every restart that is explained by one of the other reasons, we can infer when an OOM occurred. In addition, we track when the app enters the background and the foreground, which allows us to accurately separate OOMs into BOOMs and FOOMs.
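The process of elimination described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the app's actual logging code: the `PreviousRun` and `LaunchInfo` structs, their fields, and the order of the checks are all hypothetical, and how each signal (for example, a user force-quit) is actually detected is outside the scope of the sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical facts about a launch. Field names are assumptions.
struct LaunchInfo {
  std::string appVersion;  // version of the app binary
  std::string osVersion;   // OS version on the device
  long long bootTime;      // device boot timestamp, in seconds
};

// Hypothetical state persisted during the previous run.
struct PreviousRun {
  LaunchInfo info;
  bool exitedCleanly;    // the app exited or terminated itself
  bool crashReported;    // a crash was recorded by the crash reporter
  bool userForceQuit;    // a user force-quit was observed
  bool wasInForeground;  // last recorded foreground/background state
};

enum class RestartReason {
  AppUpdated,
  CleanExit,
  Crash,
  UserForceQuit,
  DeviceReboot,
  ForegroundOOM,
  BackgroundOOM
};

// Rules out every known restart reason; whatever remains is treated as an OOM.
RestartReason classifyRestart(const PreviousRun& prev, const LaunchInfo& now) {
  if (prev.info.appVersion != now.appVersion) return RestartReason::AppUpdated;
  if (prev.exitedCleanly) return RestartReason::CleanExit;
  if (prev.crashReported) return RestartReason::Crash;
  if (prev.userForceQuit) return RestartReason::UserForceQuit;
  // An OS upgrade implies a reboot, so both count as a device restart.
  if (prev.info.osVersion != now.osVersion ||
      prev.info.bootTime != now.bootTime) {
    return RestartReason::DeviceReboot;
  }
  // Nothing else explains the restart: split by the last known app state.
  return prev.wasInForeground ? RestartReason::ForegroundOOM
                              : RestartReason::BackgroundOOM;
}
```

Because the OOM bucket is defined only by what is left over, any restart reason missing from the list would be miscounted as an OOM, which is why the completeness of the list matters.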

The logs showed that the OOM rate was high on devices in a low-memory state, which is consistent with the OS evicting the app process on a memory-constrained device. Looking at the relevant log records helped us verify that the elimination method worked and keep improving the logging (some cases, such as app upgrades, cannot be verified precisely).

Our first attempt at reducing OOMs was to proactively shrink the application's memory footprint as quickly as possible once memory was no longer needed. Unfortunately, we saw no tangible change in the number of OOM crashes, so we shifted our focus to large memory allocations and began looking for memory that might be leaking (never cleaned up), especially potential retain cycles (circular references).

Memory usage analysis

As we started to address memory leaks, we saw a reduction in OOM crashes, but it was smaller than we had expected. We then dug into the memory profiler in Apple's Instruments app and noticed that UIWebView repeatedly allocated a large amount of memory whenever the app opened a web page. We also found that this memory was often not reclaimed, even after the user left the page and the web view was closed.

We tried a lot of optimizations, like clearing caches and content, but the memory footprint of the app process always grew significantly when switching to the web view. iOS 8 introduced a new class, WKWebView, that does most of its work in a separate process, which means that most of the memory used by a web view is not attributed to our process. When memory runs low, the web view process may be killed, but our app will most likely survive. After we migrated our app to WKWebView, we saw a significant reduction in the rate of OOMs. Yay!

Memory allocation rate

When analyzing memory usage with Instruments, we also found that the application allocated a large amount of temporary memory (~30 MB) and then released it almost immediately. If the CPU is idle during this allocation, the OS may terminate the app. We worked to eliminate such temporary allocations, which helped reduce OOM crashes in about 30% of the scenarios. We also experimented and found that allocating once and then managing that memory is better for application reliability than repeatedly allocating and releasing it.
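To illustrate the "allocate once, then manage" idea, here is a minimal sketch of a reusable scratch buffer. The `ScratchBuffer` class is hypothetical and is not taken from the app; it only shows how repeated requests can be served from one long-lived allocation instead of allocating and freeing a fresh block each time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scratch buffer: reserves its storage once and hands the same
// memory out again on every request that fits.
class ScratchBuffer {
 public:
  explicit ScratchBuffer(std::size_t capacityBytes) {
    storage_.reserve(capacityBytes);
  }

  // Returns writable memory of at least `bytes` bytes. The underlying
  // storage is reallocated only when the request exceeds current capacity.
  unsigned char* acquire(std::size_t bytes) {
    if (bytes > storage_.capacity()) {
      ++reallocations_;
    }
    storage_.resize(bytes);  // never shrinks capacity
    return storage_.data();
  }

  // Number of times a request forced the storage to grow.
  std::size_t reallocations() const { return reallocations_; }

 private:
  std::vector<unsigned char> storage_;
  std::size_t reallocations_ = 0;
};
```

A caller would create one such buffer sized for the largest expected request and call `acquire` wherever it previously made a temporary allocation. The trade-off is a higher steady-state footprint in exchange for avoiding sudden allocation spikes.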

Preventing memory regressions

Even with WKWebView, we found that small memory leaks could still significantly affect OOM rates. With our regular release schedule and the many teams contributing to the app, it was very important to catch memory leaks before they shipped in a released app. We changed the scanner, which was originally designed for testing mobile performance, to also record resident memory at the process level, allowing it to flag regressions as soon as they were introduced. This has helped us keep OOM rates much lower than when we first addressed the problem.
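As a rough illustration of flagging a regression from resident-memory measurements, the sketch below compares the mean of a candidate build's samples against a baseline. The function names, the use of a simple mean, and the 5% default tolerance are all assumptions made for the example; the article does not describe how the scanner actually decides that a change is a regression.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Average of a set of resident-memory samples, in megabytes.
double meanMB(const std::vector<double>& samplesMB) {
  assert(!samplesMB.empty());
  const double total =
      std::accumulate(samplesMB.begin(), samplesMB.end(), 0.0);
  return total / static_cast<double>(samplesMB.size());
}

// Hypothetical check: flags the candidate build when its mean resident
// memory exceeds the baseline mean by more than `toleranceFraction`.
bool isMemoryRegression(const std::vector<double>& baselineMB,
                        const std::vector<double>& candidateMB,
                        double toleranceFraction = 0.05) {
  return meanMB(candidateMB) > meanMB(baselineMB) * (1.0 + toleranceFraction);
}
```

Running a check like this on every change, rather than once per release, is what lets a regression be traced back to the change that introduced it.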

In-app memory profiler

A key technique in this work was an in-app memory profiler that lets us analyze the application quickly by tracking the allocation of all Objective-C objects. We configured it in the scanner and built it into our application.

How it works: For each class in the system, a count of currently live instances is maintained. At any point we can ask the profiler to print the current count for each class. We can then compare this data from release to release to understand the overall memory allocation patterns in our application; if the counts change dramatically, that generally points to a memory leak. We needed to do this in a way that was performant enough to have no noticeable impact on users.
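The release-to-release comparison can be sketched as a diff over two snapshots of per-class counts. Everything here is an assumption made for illustration: the `CountsByClass` map keyed by class name, the `growthFactor` and `minInstances` thresholds, and the idea of reporting only classes that grow are not described in the article.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical snapshot format: class name -> number of live instances.
using CountsByClass = std::map<std::string, long>;

struct Growth {
  std::string className;
  long before;
  long after;
};

// Reports classes whose live-instance count grew by more than `growthFactor`
// between two snapshots, ignoring classes with fewer than `minInstances`.
std::vector<Growth> findSuspiciousGrowth(const CountsByClass& before,
                                         const CountsByClass& after,
                                         double growthFactor,
                                         long minInstances) {
  std::vector<Growth> suspicious;
  for (const auto& entry : after) {
    const auto it = before.find(entry.first);
    const long previous = (it == before.end()) ? 0 : it->second;
    const long current = entry.second;
    if (current >= minInstances && current > previous * growthFactor) {
      suspicious.push_back({entry.first, previous, current});
    }
  }
  return suspicious;
}
```

A class flagged this way is only a candidate for investigation: a legitimate feature change can also increase instance counts.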

Here's a quick summary of our strategy and how we track NSObject memory allocations.

We start by creating an allocation tracking class. It is a very simple class with public methods that increment and decrement the instance count for a class. We use C++ instead of Objective-C because that minimizes the tracker's own memory allocation and CPU usage.

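  class AllocationTracker {
   public:
    // Returns the shared tracker instance.
    static AllocationTracker* tracker();

    // Called when an instance of aCls is allocated or deallocated.
    void incrementInstanceCountForClass(Class aCls);
    void decrementInstanceCountForClass(Class aCls);

    // Returns the current live-instance count for each class.
    // (The element type is assumed; it was lost in the original listing.)
    std::vector<std::pair<Class, unsigned int>> countsSnapshot();
    ...
  };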
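The article shows only the tracker's interface. One possible implementation, given purely as a sketch, keeps the counts in a hash map guarded by a mutex. The `Class` alias below is a stand-in for the Objective-C runtime type so that the example compiles as plain C++, and the choice of `std::unordered_map`, `std::mutex`, and the snapshot's element type are assumptions rather than the original design.

```cpp
#include <cassert>
#include <mutex>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for the Objective-C runtime's `Class` type (assumption, so that
// this sketch builds without the Objective-C runtime headers).
using Class = const void*;

class AllocationTracker {
 public:
  // Returns the single shared tracker. Initialization of a function-local
  // static is thread-safe since C++11.
  static AllocationTracker* tracker() {
    static AllocationTracker instance;
    return &instance;
  }

  void incrementInstanceCountForClass(Class aCls) {
    std::lock_guard<std::mutex> lock(mutex_);
    ++counts_[aCls];
  }

  void decrementInstanceCountForClass(Class aCls) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = counts_.find(aCls);
    // Guard against underflow if a deallocation is seen without a
    // matching allocation.
    if (it != counts_.end() && it->second > 0) {
      --it->second;
    }
  }

  // Copies the current counts so callers can inspect them without
  // holding the lock.
  std::vector<std::pair<Class, unsigned int>> countsSnapshot() {
    std::lock_guard<std::mutex> lock(mutex_);
    return std::vector<std::pair<Class, unsigned int>>(counts_.begin(),
                                                       counts_.end());
  }

 private:
  AllocationTracker() = default;

  std::mutex mutex_;
  std::unordered_map<Class, unsigned int> counts_;
};
```

A lock taken on every allocation has a cost of its own, so a production tracker would likely need something cheaper; the mutex is used here only to keep the sketch short and correct under concurrent use.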

We can then use a technique on iOS called method swizzling (using the runtime function class_replaceMethod) to stash the standard implementations of +alloc and -dealloc in the placeholder methods +fb_originalAlloc and -fb_originalDealloc.

We then replace +alloc and -dealloc with our new implementations, which increment and decrement the instance count on allocation and deallocation, respectively.

  @implementation NSObject (AllocationTracker)

  // Installed in place of +alloc: allocate through the original
  // implementation, then count the new instance.
  + (id)fb_newAlloc
  {
    id object = [self fb_originalAlloc];
    AllocationTracker::tracker()->incrementInstanceCountForClass([object class]);
    return object;
  }

  // Installed in place of -dealloc: count the instance as gone, then
  // run the original deallocation.
  - (void)fb_newDealloc
  {
    AllocationTracker::tracker()->decrementInstanceCountForClass([self class]);
    [self fb_originalDealloc];
  }

  @end

Then, while the application is running, we can call the snapshot method regularly to print the number of currently alive instances.

Application Reliability

Once we implemented changes to address memory issues in the Facebook iOS app, we saw a significant reduction in (F)OOMs and app crash reports from users. OOM crashes were a blind spot for us because there was no formal system or API to detect them at will. No one likes an app crashing unexpectedly. But using some tools, best practices for iOS, and some smart ways to address this issue can make our app more reliable and ensure that it doesn't crash when you open a web view to view an interesting article (like this one).

Additional thanks to Linji Yang, Anoop Chaurasiya, Flynn Heiss, Parthiv Patel, Justin Pasqualini, Cloud Xu, Gautham Badrinathan, Ari Grant, and many others for helping reduce the FOOM rate.
