The latest Android keep-alive implementation principle in 2020

This article is provided by the great Hong Yang and quoted and shared by the author.

Keeping app processes alive has long been a goal of the major vendors, especially the leading app developers.

After all, once an app's process dies, it can no longer do any business on the user's phone, and every business model built on it becomes useless on the user side.

Early versions of Android were immature and left many loopholes for apps to exploit, so apps had all sorts of ways to stay alive.

For example, before Android 5.0, a process forked natively by an app was not managed by the system. When the system killed the app, it killed only the Java process the app had started.

As a result, a large number of "cancerous" apps appeared. They forked a native process which, once the app's Java process was killed, restarted the app through the am command, thus achieving immortality.

In those days Android was overrun by such apps; the system had no real control over applications, and it was long criticized for its power consumption and lag.

The system's weakness also gave rise to a series of frameworks and apps for controlling background processes, such as the Xposed framework, Prevent Running, Greenify, Brevent, and Ice Box.

However, with the development of the Android system, everything is evolving in a positive direction.

In Android 5.0 and above, the system kills processes by uid, taking down the entire process group, so native processes can no longer escape.
Android 6.0 introduced Doze: when the device is unplugged and left idle for a while after the screen turns off, it enters a low-power mode, and the system tries to keep it in that dozing state.
Android 7.0 strengthened the previously half-useful Doze mode (the device no longer has to be stationary), and it also came with Project Svelte, an effort specifically aimed at optimizing background behavior on Android. Some implicit broadcasts were removed outright in Android 7.0, so apps can no longer start themselves by listening for them.
Android 8.0 further tightened the limits on background execution: once an app enters the cached state with no active components, the system releases all wake locks the app holds. The system also restricts certain behaviors of apps that are not in the foreground, such as limiting their background services and no longer letting them register for most implicit broadcasts in the Manifest.
Android 9.0 further improved battery saver and added App Standby Buckets, so apps that have not been used for a long time are effectively put into cold storage. In addition, when the system detects that an app is consuming too many resources, it notifies the user and asks whether to restrict the app's background activity.

However, as the saying goes, when virtue rises one foot, vice rises ten. As the system kept evolving, so did the keep-alive methods. About four years ago, MarsDaemon appeared. This library keeps an app alive with a dual-process daemon approach, and it was hugely popular for a while.

The good times did not last, though. With the arrival of Android 8.0, the library gradually stopped working.

Generally speaking, keeping an Android process alive has two aspects:

Keep the process from being killed by the system.
Revive the process after the system has killed it.

As Android becomes more and more complete, it is increasingly impossible for an app to keep itself alive single-handedly; therefore, there are basically two ways left to "stay alive":

Raise the priority of your own process so that the system does not kill it lightly;
Form alliances between apps, so that when one app dies, its partners pull it back up.

Of course, there is also the ultimate method: strike a private deal with the major system manufacturers and get yourself onto the system's memory-cleanup whitelist; the national-level app WeChat is one example. Ordinary developers, of course, are not in a position to take this path.

About a year ago, gityuan described on his blog a keep-alive method used by TIM that can be called the "ultimate immortality technique"; under the current Android kernel implementation, it greatly improves a process's survival rate. I studied how this keep-alive idea works and wrote a reference implementation, Leoric.

Next, I will walk through the implementation principle of this ultimate keep-alive black technology.

The underlying technical principle of keep-alive

Know yourself and know your enemy, and you can fight a hundred battles with no danger of defeat.

Since we want to stay alive, we must first know how we died.

Generally speaking, there are two ways for the system to kill a process, both of which are provided by ActivityManagerService:

killBackgroundProcesses
forceStopPackage

On stock Android, processes are usually killed with the first method, unless the user explicitly taps "Force Stop" on the app's settings screen.

[Figure: Force Stop]

However, Chinese manufacturers, as well as ROMs from OnePlus, Samsung, and others, now generally use the second method.

The first method is too gentle to rein in apps that are determined to cause trouble.

The second method is far more forceful. Generally speaking, once an app has been force-stopped, it can only wait to die.

Therefore, to stay alive we need to know how force-stop works. Let's trace the execution of the system's forceStopPackage method:

First is the forceStopPackage method in ActivityManagerService:

[Figure: ActivityManagerService forceStopPackage]

Here we can see that the system force-stops processes by uid, so native process or Java process, force-stop kills them all. Let's continue into the forceStopPackageLocked method:

[Figure: forceStopPackageLocked]

This method's implementation is very clear:

It first kills all of the app's processes, then cleans up the leftover information about the four major components in system_server. We care about how the processes are killed, so we continue into killPackageProcessesLocked. That method eventually calls removeProcessLocked in ProcessList, which in turn calls the kill method of ProcessRecord. Let's take a look at this kill:

Here we can see that the target process is killed first, and then the target's process group is killed by uid.

If only the target process were killed, dual-process daemonization would be enough to stay alive.

The key lies in killProcessGroup. Tracing further, it turns out to be a native method whose final implementation is in libprocessgroup. The code is as follows:

Note the strange number here: 40.

We continue tracing:

Look at what the system does: it loops 40 times, killing processes each time and waiting 5ms after each kill. Once the loop finishes, it is done.
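To make the shape of that loop concrete, here is a minimal Python model of it. It is not the real libprocessgroup code: it assumes the loop gives up early once a pass finds nothing left to kill, the names kill_process_group, list_alive and kill_one are made up for this sketch, and the "process group" is just an in-memory set.

```python
import time


def kill_process_group(list_alive, kill_one, retries=40, sleep_ms=5,
                       sleep=time.sleep):
    """Model of the retry loop: kill everything, wait, look again.

    Returns the number of passes that actually found something to kill.
    Assumption: the loop stops early when a pass finds no process left.
    """
    passes = 0
    for _ in range(retries):
        victims = list(list_alive())    # snapshot of the group right now
        if not victims:
            break                       # group is empty, nothing more to do
        for pid in victims:
            kill_one(pid)
        passes += 1
        sleep(sleep_ms / 1000.0)        # the 5ms pause between passes
    return passes


# A fake process table standing in for the app's process group.
alive = {101, 102, 103}
passes = kill_process_group(lambda: alive, alive.discard,
                            sleep=lambda seconds: None)
print(passes, sorted(alive))
```

With a group that never respawns anything, a single pass empties it. The interesting case is a group that is non-empty again every time the loop looks.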

Seeing this code, anyone would ask the same question: if the app still has a living process after being killed 40 times in a row, hasn't it escaped?

Implementation

So, how do we achieve this?

Look at that critical 5ms. Suppose that after an app process is killed, the app can start a batch of new processes fast enough (within 5ms). Then, after the system kills all the old processes in one pass and sleeps for 5ms, it wakes up to find a batch of new ones, and this repeats 40 times. As long as we manage to start a new process every time, the app escapes the system's pursuit and achieves immortality.

Yes, a purgatory-like 200ms (40 passes of 5ms each): as long as we survive those 200ms, we have passed the tribulation and ascended.

Have you ever played Whack-A-Mole? The whole process is very similar: you whack one and another pops up. As long as a new one pops up quickly enough every time, we win.
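The whack-a-mole race can be put into a few lines of Python. This is only a toy model under my own simplifying assumption that the app is lost the first time a respawn takes as long as, or longer than, the 5ms pause; race and its parameters are names invented for the sketch.

```python
RETRIES = 40      # kill passes, as described above
SLEEP_MS = 5.0    # pause after each pass, in milliseconds


def race(respawn_latencies_ms, sleep_ms=SLEEP_MS):
    """Return (survived, rounds_won) for one force-stop.

    respawn_latencies_ms holds one respawn latency per kill pass. A round is
    won only if the new process is up before the killer wakes again.
    """
    rounds_won = 0
    for latency in respawn_latencies_ms:
        if latency >= sleep_ms:
            return False, rounds_won    # the killer woke up first
        rounds_won += 1
    return True, rounds_won


print(RETRIES * SLEEP_MS)                        # 200.0 (total window in ms)
print(race([3.0] * RETRIES))                     # (True, 40)
print(race([3.0] * 10 + [7.0] + [3.0] * 29))     # (False, 10)
```

A single slow respawn anywhere in the 40 rounds loses the whole race, which is why the launch path has to be as short as possible.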

Now the crux of the problem lies in:

How to start a bunch of new processes within 5ms?

Look back at the original keep-alive methods: they started the process through the am command. That command is actually a Java program: it spawns a process, boots an ART virtual machine, obtains the binder proxy of AMS, and then makes a synchronous binder call to AMS.

This path is far too slow for a 5ms race against death.

Later, MarsDaemon proposed a new method: use a binder reference to send a Parcel directly to AMS. This is much faster than the am command and greatly improves the success rate. There is still room for improvement, though, because the call is still made from the Java layer, and Java has a much-criticized feature when real-time requirements are this extreme: garbage collection (GC). The chance of hitting a GC pause within 5ms is very small, but because GC exists, Java code running in ART is full of checkpoints.

Imagine you are a courier carrying urgent military intelligence, but the road is lined with checkpoints where you may be ordered to halt. That is unacceptable. Therefore, the best approach is to send the binder call to AMS from native code.

Of course, going one level lower, we could even send data straight to the binder driver through ioctl to complete the call, but that approach has poor compatibility and is more trouble than the native method.

By sending a binder message to AMS from the native layer to start the process, we solve the problem of "starting a process quickly". But that is still not enough. Back to whack-a-mole: if whacking one mole makes one new mole pop up and you can whack it every time, your chance of winning is still fairly high.

But what if every time you whack a mole, all the other moles pop up? That is much harder. If the death of any one of our processes causes all the others to be pulled up, the system will have a hard time killing us.

The new keep-alive technique uses two mechanisms to make sure the processes pull each other up:

The two processes sense each other's death by watching each other's file locks.
Child processes are created with fork. Forked processes belong to the same process group, so when one is killed, the other is killed too, and that death is sensed through the file lock.
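The file-lock trick relies on ordinary Unix behavior: the kernel releases a lock when its holder dies, so a watcher blocked on the same lock wakes up at that moment. The sketch below shows this on a Linux-like machine in Python rather than in the native code an app would use, and it only detects the death; where a real implementation would restart its partner, it simply returns. The function names and the lock-file path are made up.

```python
import fcntl
import os
import signal
import tempfile
import time


def hold_lock_forever(path, ready_fd):
    """Child: take an exclusive lock, report readiness, then idle until killed."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    fcntl.flock(fd, fcntl.LOCK_EX)
    os.write(ready_fd, b"1")
    while True:
        time.sleep(60)


def wait_for_death(path):
    """Watcher: block until the lock holder dies and the kernel frees the lock."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    fcntl.flock(fd, fcntl.LOCK_EX)      # blocks while the holder is alive
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)


def demo():
    """Return (lock_was_held_by_child, seconds_until_death_was_noticed)."""
    path = os.path.join(tempfile.mkdtemp(), "p1.lock")
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(ready_r)
            hold_lock_forever(path, ready_w)
        finally:
            os._exit(1)                 # never fall back into the caller's code
    os.close(ready_w)
    os.read(ready_r, 1)                 # wait until the child holds the lock
    os.close(ready_r)

    probe = os.open(path, os.O_RDWR)
    try:
        fcntl.flock(probe, fcntl.LOCK_EX | fcntl.LOCK_NB)
        held_by_child = False
    except BlockingIOError:
        held_by_child = True            # someone else (the child) owns the lock
    os.close(probe)

    os.kill(pid, signal.SIGKILL)        # stand-in for the system killing p1
    start = time.monotonic()
    wait_for_death(path)                # returns once the kernel drops the lock
    os.waitpid(pid, 0)
    return held_by_child, time.monotonic() - start


if __name__ == "__main__":
    print(demo())
```

The watcher needs no polling and no signal handler: its blocking flock call returns as soon as the holder is gone, which is what makes the detection fast.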

Specifically, two processes p1 and p2 are created and tied to each other through file locks, so that when one is killed, the other restarts it. In addition, p1 produces an orphan process c1 through two forks, p2 produces an orphan process c2 the same way, and c1 and c2 are tied together by a file lock as well. This way, if p1 is killed, p2 senses it immediately. And because p1 and c1 belong to the same process group, killing p1 also kills c1; once c1 dies, c2 senses it immediately and starts p1. The four processes thus guard one another like an iron triangle, which secures the survival rate.
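The "two forks" step is the classic double fork: the intermediate child exits right away, so the grandchild is orphaned and re-parented away from it. A minimal sketch in Python for a Unix-like system, with invented names; it shows only the orphaning, not the file locks or any restart logic.

```python
import os
import struct
import time


def spawn_orphan():
    """Double-fork and return (intermediate_pid, orphan_pid, orphan_ppid)."""
    r, w = os.pipe()
    mid = os.fork()
    if mid == 0:                                    # intermediate child
        try:
            os.close(r)
            mid_pid = os.getpid()
            if os.fork() == 0:                      # grandchild
                try:
                    deadline = time.monotonic() + 2.0
                    # Wait until the intermediate has exited and we have been
                    # re-parented (to init or to a subreaper).
                    while (os.getppid() == mid_pid
                           and time.monotonic() < deadline):
                        time.sleep(0.001)
                    report = struct.pack("!II", os.getpid(), os.getppid())
                    os.write(w, report)
                finally:
                    os._exit(0)
        finally:
            os._exit(0)                             # intermediate exits at once
    os.close(w)
    os.waitpid(mid, 0)                              # reap the intermediate
    orphan_pid, orphan_ppid = struct.unpack("!II", os.read(r, 8))
    os.close(r)
    return mid, orphan_pid, orphan_ppid


if __name__ == "__main__":
    mid, orphan, new_parent = spawn_orphan()
    print("intermediate:", mid, "orphan:", orphan, "new parent:", new_parent)
```

After the call, the orphan's parent is no longer the intermediate process, so nothing in the original parent chain is waiting on it.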

After this analysis, we have a clear idea of the general principle of this solution.

Based on the above principles, I wrote a simple PoC, the code is here:

https://github.com/xinjianteng/Leoric

Those who are interested can take a look.

AMS kills processes one ProcessRecord at a time (https://android.googlesource.com/platform/frameworks/base/+/4f868ed/services/core/java/com/android/server/am/ActivityManagerService.java#5766), which means killProcessGroup in libprocessgroup is executed multiple times.

So while the cgroup of one process is being killed, another process survives as long as it manages, even once, to start a component whose android:process names a different process. The new process gets a new ProcessRecord, so it is not killed in the loop above. In addition, the 40-pass loop leaves a very long time in which to start a new process: the logs show that the interval between killProcessGroup calls runs from tens of milliseconds to more than a hundred.

Room for improvement

The principle of this solution is fairly simple and intuitive, but achieving stable keep-alive requires filling in many details, above all the 5ms race against death, which must be optimized by every means to raise the success rate.
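A rough calculation shows why every round matters. Assuming, purely for illustration, that each of the 40 rounds is won independently with the same probability p, the chance of winning all of them is p raised to the 40th power. The function names below are made up for the sketch.

```python
def survival_probability(p_round, rounds=40):
    """Chance of winning every round, assuming independent rounds."""
    return p_round ** rounds


def required_round_probability(target, rounds=40):
    """Per-round success rate needed to reach an overall target."""
    return target ** (1.0 / rounds)


for p in (0.90, 0.99, 0.999):
    print(p, round(survival_probability(p), 3))
print(round(required_round_probability(0.9), 4))
```

Under this assumption, even a 99% success rate per round leaves only about a two-in-three chance of surviving a whole force-stop, so small gains in launch speed compound quickly.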

Specifically, the current implementation makes the binder call from the Java layer, whereas it should be done in the native layer. I have implemented this before, but such a library is fundamentally against users' interests, so I do not intend to publish the code. Here is a brief outline of the implementation ideas, for learning purposes:

How to perform binder communication at the native layer?

libbinder is an NDK public library. Get the corresponding header file and link it dynamically.

Difficulty: There are many dependencies, and stripping header files is a manual job.

How to organize the data for binder communication?

The communication data is really a binary stream, represented concretely by a (C++/Java) Parcel object. The native layer has no equivalent for writing an Intent into a Parcel, so building the data there is poorly compatible.

Plan:

The Java layer creates a Parcel (containing the Intent), obtains the Parcel object's mNativePtr (its native peer), and passes it to the native layer.
The native layer casts mNativePtr directly to a structure pointer.
Fork the child process, set up a pipe, and prepare to transfer the parcel data.
The child process reads the pipe, gets the binary stream, and reassembles it into a parcel.
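The last two steps, handing an opaque byte stream to a forked child over a pipe, look roughly like this. It is a Python stand-in for what would be native code: the blob is a placeholder for the marshalled Parcel bytes, nothing here touches binder, and the length-prefix framing, the echo back to the parent and the function names are my own choices for the sketch.

```python
import os
import struct


def read_exact(fd, size):
    """Read exactly `size` bytes from fd or raise EOFError."""
    chunks = []
    while size > 0:
        chunk = os.read(fd, size)
        if not chunk:
            raise EOFError("pipe closed early")
        chunks.append(chunk)
        size -= len(chunk)
    return b"".join(chunks)


def write_all(fd, data):
    """Write all of `data` to fd, looping over partial writes."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def hand_off(blob):
    """Send blob to a forked child; return the bytes the child rebuilt."""
    down_r, down_w = os.pipe()          # parent -> child
    up_r, up_w = os.pipe()              # child -> parent (only for the demo)
    pid = os.fork()
    if pid == 0:
        try:
            os.close(down_w)
            os.close(up_r)
            (size,) = struct.unpack("!I", read_exact(down_r, 4))
            rebuilt = read_exact(down_r, size)   # the "reassembled parcel"
            write_all(up_w, rebuilt)
        finally:
            os._exit(0)
    os.close(down_r)
    os.close(up_w)
    write_all(down_w, struct.pack("!I", len(blob)) + blob)
    os.close(down_w)
    echoed = read_exact(up_r, len(blob))
    os.close(up_r)
    os.waitpid(pid, 0)
    return echoed


if __name__ == "__main__":
    print(hand_off(b"\x01\x02placeholder-parcel-bytes"))
```

The child never needs to understand the bytes; it only has to receive them intact, which is why preparing them ahead of time in the Java layer sidesteps the missing native Intent support.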

How to deal with it?

Today I am making this implementation principle public and providing PoC code. This is not to encourage anyone to use this method to keep processes alive, but in the hope that the major system manufacturers become aware of this black technology and push their own systems to solve the problem for good.

I learned of this technique two years ago, when it was still little known.

Over the past month I have found many apps using it, and it has made my Android phone miserable to use. I have nearly 800 apps installed; if every one of them used this technique to stay alive, the system would be unusable.

How does the system respond?

If we compare the system killing a process to beheading, then the essence of this keep-alive technique is to grow a new head very quickly. The countermeasure is therefore simple as well: while killing one process, make the others hold still so they cannot stir up trouble. There are many concrete ways to implement this, which I will not go into here.

How do users respond?

Until manufacturers ship a fix, users have some options for taming rogue apps that use this technique.

Here are two apps I recommend:

Ice Box
Island

Ice Box's freezing and Island's deep sleep can completely stop an app from keeping itself alive. Of course, if you prefer other "freezing" apps, such as Black Room or Tai Chi's Yin Yang Gate, those work too.

Apps that suppress background activity by means other than "freezing" will, in theory, have very limited effect against this keep-alive technique.

Summary

1. There is nothing wrong with black technology in itself. It is simply a way of fighting the system that comes from a deep understanding of how the system works underneath. Many people ask what use it is to understand the system's underlying principles. This article should serve as an answer: it lets you build features others can never achieve, drive products forward through technology, and thereby create enormous commercial value.

2. Powerful as it is, black technology should not exist in this world. Without rules there is no order. Black technology can work for a while, but not forever. In the end, a product's survival rate depends on the product itself. Respecting users and improving the experience is the right path.
