Preface

Linux provides a watchdog. In the Linux kernel, starting the watchdog arms a timer; if nothing writes to /dev/watchdog within the timeout period, the system restarts. A watchdog implemented with a timer works at the software level.

Android has designed its own software-level Watchdog to protect important system services. When one of them fails, Watchdog usually restarts the Android system. Because of this mechanism, the system_server process is sometimes killed by Watchdog, which causes the phone to restart. Today we will analyze how it works.

1. Detailed explanation of WatchDog startup mechanism

The ANR mechanism is for applications. For system processes that are "unresponsive" for a long time, the Android system has the WatchDog mechanism. If a monitored thread or service stays unresponsive longer than the allowed delay, WatchDog triggers the suicide mechanism.

Watchdog is a class that extends Thread. SystemServer.java obtains the Watchdog object through getInstance.

1. Start in SystemServer.java
- private void startOtherServices() {
- ······
- traceBeginAndSlog( "InitWatchdog" );
- final Watchdog watchdog = Watchdog.getInstance();
- watchdog.init(context, mActivityManagerService);
- traceEnd();
- ······
- traceBeginAndSlog( "StartWatchdog" );
- Watchdog.getInstance().start();
- traceEnd();
- }
Because it is a thread, starting it is all that is needed.

2. View the construction method of WatchDog
- private Watchdog() {
- super( "watchdog" );
- // Initialize handler checkers for each common thread we want to check. Note
- // that we are not currently checking the background thread, since it can
- // potentially hold longer running operations with no guarantees about the timeliness
- // of operations there.
- // The shared foreground thread is the main checker. It is where we
- // will also dispatch monitor checks and do other work.
- mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
- "foreground thread", DEFAULT_TIMEOUT);
- mHandlerCheckers.add(mMonitorChecker);
- // Add checker for main thread. We only do a quick check since there
- // can be UI running on the thread.
- mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
- "main thread", DEFAULT_TIMEOUT));
- // Add checker for shared UI thread.
- mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
- "ui thread", DEFAULT_TIMEOUT));
- // And also check IO thread.
- mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
- "i/o thread", DEFAULT_TIMEOUT));
- // And the display thread.
- mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
- "display thread", DEFAULT_TIMEOUT));
- // Initialize monitor for Binder threads.
- addMonitor(new BinderThreadMonitor());
- mOpenFdMonitor = OpenFdMonitor.create();
- // See the notes on DEFAULT_TIMEOUT.
- assert DB ||
- DEFAULT_TIMEOUT > ZygoteConnectionConstants.WRAPPED_PID_TIMEOUT_MILLIS;
- // mtk enhance
- exceptionHWT = new ExceptionLog();
- }
Focus on two objects: mMonitorChecker and mHandlerCheckers.

Sources of the mHandlerCheckers list elements:
Added in the constructor: checkers for FgThread, the main thread, UiThread, IoThread and DisplayThread;
Added externally: Watchdog.getInstance().addThread(handler);

Sources of the mMonitorChecker monitor list (mMonitors):
Added externally: Watchdog.getInstance().addMonitor(monitor);
Special note: the constructor itself calls addMonitor(new BinderThreadMonitor());

3. Check the run method of WatchDog
- public void run() {
- boolean waitedHalf = false ;
- boolean mSFHang = false ;
- while ( true ) {
- ······
- synchronized (this) {
- ······
- for (int i = 0; i < mHandlerCheckers.size(); i++) {
- HandlerChecker hc = mHandlerCheckers.get(i);
- hc.scheduleCheckLocked();
- }
- ······
- }
- ······
- }
On every pass, the loop schedules a check on each element of mHandlerCheckers.

4. Check scheduleCheckLocked of HandlerChecker
- public void scheduleCheckLocked() {
- if (mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) {
- // If the target looper has recently been polling, then
- // there is no reason to enqueue our checker on it since that
- // is as good as it not being deadlocked. This avoid having
- // to do a context switch to check the thread. Note that we
- // only do this if mCheckReboot is false and we have no
- // monitors, since those would need to be executed at this point.
- mCompleted = true ;
- return ;
- }
- if (!mCompleted) {
- // we already have a check in flight, so no need
- return ;
- }
- mCompleted = false ;
- mCurrentMonitor = null ;
- mStartTime = SystemClock.uptimeMillis();
- mHandler.postAtFrontOfQueue(this);
- }
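When mMonitors.size() == 0, the checker only has to verify that the thread behind its Handler is still alive, and it does so with mHandler.getLooper().getQueue().isPolling(): a queue that is polling is idle and waiting for messages, so the thread is not stuck. The mMonitors list of mMonitorChecker always contains at least one element (the BinderThreadMonitor added in the constructor), so this shortcut never applies to it. For mMonitorChecker the focus is on mHandler.postAtFrontOfQueue(this), which makes the target thread execute the run method of HandlerChecker:
- public void run() {
- final int size = mMonitors.size();
- for (int i = 0; i < size; i++) {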
- synchronized (Watchdog.this) {
- mCurrentMonitor = mMonitors.get(i);
- }
- mCurrentMonitor.monitor();
- }
- synchronized (Watchdog.this) {
- mCompleted = true ;
- mCurrentMonitor = null ;
- }
- }
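The interplay of scheduleCheckLocked and run above can be tried outside Android with a small stand-in. This is a minimal sketch under stated assumptions, not the framework code: `HandlerCheckSketch`, `Checker` and `demo` are hypothetical names, and a single-thread `ExecutorService` plays the role of the Handler's thread (it queues in FIFO order, whereas the real code posts at the front of the queue).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class HandlerCheckSketch {

    /** Hypothetical stand-in for HandlerChecker; the executor plays the Handler's thread. */
    static final class Checker implements Runnable {
        private final ExecutorService target;
        private boolean completed = true;

        Checker(ExecutorService target) {
            this.target = target;
        }

        /** Posts this checker to the target thread unless a check is already in flight. */
        synchronized void scheduleCheck() {
            if (!completed) {
                return; // a check is already queued and has not run yet
            }
            completed = false;
            target.execute(this); // FIFO here; the real code posts at the front of the queue
        }

        /** Runs on the target thread, so reaching this point proves the thread is responsive. */
        @Override
        public void run() {
            synchronized (this) {
                completed = true;
            }
        }

        synchronized boolean isCompleted() {
            return completed;
        }
    }

    /** Blocks the target thread, schedules a check, and samples the flag before and after unblocking. */
    static boolean[] demo() {
        ExecutorService target = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);
        target.execute(() -> {
            try {
                release.await(); // simulates a thread stuck in a long operation
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Checker checker = new Checker(target);
        checker.scheduleCheck();
        boolean whileBlocked = checker.isCompleted();
        release.countDown();
        target.shutdown();
        try {
            target.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new boolean[] { whileBlocked, checker.isCompleted() };
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("completed while blocked: " + r[0] + ", after release: " + r[1]);
    }
}
```

While the target thread is stuck, the flag stays false, which is the condition the Watchdog loop later interprets as waiting or overdue; once the thread is released, the queued checker runs and sets the flag back to true.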
run calls monitor() on every entry in mMonitors. Only mMonitorChecker has a non-empty mMonitors list, because addMonitor adds to it. Various services register themselves this way, for example:
- ActivityManagerService.java
- Watchdog.getInstance().addMonitor(this);
- InputManagerService.java
- Watchdog.getInstance().addMonitor(this);
- PowerManagerService.java
- Watchdog.getInstance().addMonitor(this);
- WindowManagerService.java
- Watchdog.getInstance().addMonitor(this);
The monitor method that gets executed is very simple. For example, in ActivityManagerService:
- public void monitor() {
- synchronized (this) { }
- }
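Why an empty synchronized block is enough can be shown with plain Java threads. The following is a rough sketch, assuming hypothetical names (`MonitorProbeSketch`, `FakeService`, `respondsWithin`); it starts a fresh probe thread for each call, whereas the real Watchdog runs the monitors on the foreground thread through its HandlerChecker.

```java
public class MonitorProbeSketch {

    /** Same shape as the Watchdog.Monitor interface used in the article. */
    interface Monitor {
        void monitor();
    }

    /** Hypothetical service whose monitor() only acquires and releases its own lock. */
    static final class FakeService implements Monitor {
        @Override
        public void monitor() {
            synchronized (this) { }
        }
    }

    /** Runs monitor() on a probe thread and reports whether it returned within the timeout. */
    static boolean respondsWithin(Monitor m, long timeoutMillis) {
        Thread probe = new Thread(m::monitor, "probe");
        probe.setDaemon(true); // do not keep the JVM alive if the probe stays blocked
        probe.start();
        try {
            probe.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !probe.isAlive();
    }

    public static void main(String[] args) {
        FakeService service = new FakeService();
        System.out.println("lock free: " + respondsWithin(service, 2000));
        synchronized (service) { // the main thread holds the lock, as a stuck thread would
            System.out.println("lock held: " + respondsWithin(service, 200));
        }
    }
}
```

As long as some other thread holds the service's lock, monitor() cannot return, so the checker that called it never marks itself completed.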
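This only checks whether the lock of the system service can be acquired: if another thread holds it for too long (for example in a deadlock), monitor() blocks and the check never completes. The Binder threads are monitored by an inner class of Watchdog: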
- private static final class BinderThreadMonitor implements Watchdog.Monitor {
- @Override
- public void monitor() {
- Binder.blockUntilThreadAvailable();
- }
- }
- android.os.Binder.java
- public static final native void blockUntilThreadAvailable();
- android_util_Binder.cpp
- static void android_os_Binder_blockUntilThreadAvailable(JNIEnv* env, jobject clazz)
- {
- return IPCThreadState::self()->blockUntilThreadAvailable();
- }
- IPCThreadState.cpp
- void IPCThreadState::blockUntilThreadAvailable()
- {
- pthread_mutex_lock(&mProcess->mThreadCountLock);
- while (mProcess->mExecutingThreadsCount >= mProcess->mMaxThreads) {
- ALOGW( "Waiting for thread to be free. mExecutingThreadsCount=%lu mMaxThreads=%lu\n" ,
- static_cast<unsigned long>(mProcess->mExecutingThreadsCount),
- static_cast<unsigned long>(mProcess->mMaxThreads));
- pthread_cond_wait(&mProcess->mThreadCountDecrement, &mProcess->mThreadCountLock);
- }
- pthread_mutex_unlock(&mProcess->mThreadCountLock);
- }
This only checks that the number of Binder threads currently executing in the process is below mMaxThreads. If it has reached the maximum (31 for system_server), the caller waits until a thread becomes free. The default limit is defined in ProcessState.cpp:
- ProcessState.cpp
- #define DEFAULT_MAX_BINDER_THREADS 15
But SystemServer.java raises the limit for system_server:
- // maximum number of binder threads used for system_server
- // will be higher than the system default
- private static final int sMaxBinderThreads = 31;
- private void run() {
- ······
- BinderInternal.setMaxThreads(sMaxBinderThreads);
- ······
- }
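The wait loop in IPCThreadState::blockUntilThreadAvailable follows the usual lock-plus-condition pattern. A minimal Java rendering of that pattern is sketched below; `ThreadPoolGateSketch`, `beginCall`, `endCall` and `demo` are hypothetical names introduced for illustration, and the class only counts calls instead of managing real Binder threads.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class ThreadPoolGateSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition threadFreed = lock.newCondition();
    private final int maxThreads;
    private int executingThreads = 0;

    ThreadPoolGateSketch(int maxThreads) {
        this.maxThreads = maxThreads;
    }

    /** Marks one more pool thread as busy. */
    void beginCall() {
        lock.lock();
        try {
            executingThreads++;
        } finally {
            lock.unlock();
        }
    }

    /** Marks one pool thread as free again and wakes up any waiter. */
    void endCall() {
        lock.lock();
        try {
            executingThreads--;
            threadFreed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Blocks while every pool thread is busy, mirroring the C++ loop above. */
    void blockUntilThreadAvailable() throws InterruptedException {
        lock.lock();
        try {
            while (executingThreads >= maxThreads) {
                threadFreed.await();
            }
        } finally {
            lock.unlock();
        }
    }

    /** Fills a pool of size 1, then checks that a waiter blocks and is released by endCall(). */
    static boolean[] demo() {
        ThreadPoolGateSketch gate = new ThreadPoolGateSketch(1);
        gate.beginCall();
        Thread monitor = new Thread(() -> {
            try {
                gate.blockUntilThreadAvailable();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "binder-monitor");
        monitor.setDaemon(true);
        monitor.start();
        try {
            monitor.join(200);
            boolean blockedWhileFull = monitor.isAlive();
            gate.endCall();
            monitor.join(5000);
            return new boolean[] { blockedWhileFull, !monitor.isAlive() };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("blocked while pool is full: " + r[0] + ", returned after a thread was freed: " + r[1]);
    }
}
```

If all Binder threads stay busy, the monitor call never returns, and the Watchdog treats that the same way as a service lock that cannot be acquired.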
5. Exit after timeout
- public void run() {
- ······
- Process.killProcess(Process.myPid());
- System.exit(10);
- ······
- }
Kill the current process (system_server) and exit.

2. Principle Explanation

1. Every service that needs to be monitored calls Watchdog's addMonitor to add a Monitor to the mMonitors list, or addThread to add a Looper checker to the mHandlerCheckers list;
2. Once the Watchdog thread is started, its run method executes in an infinite loop;
- The first step is to call HandlerChecker#scheduleCheckLocked on every element of mHandlerCheckers;
- The second step is to check periodically whether a timeout has occurred. The interval between checks is set by the CHECK_INTERVAL constant, which is 30 seconds. Each check calls evaluateCheckerCompletionLocked() to evaluate the completion state of the HandlerCheckers:
- COMPLETED means the check has finished;
- WAITING and WAITED_HALF mean the check is still pending but has not timed out. A trace is dumped once at WAITED_HALF;
- OVERDUE means the check has timed out. The default timeout is 1 minute;
3. If the timeout is reached and a HandlerChecker is still unfinished (OVERDUE), Watchdog gets the blocked HandlerCheckers through getBlockedCheckersLocked(), generates descriptive information, and saves logs that include runtime stack information;
4. Finally, it kills the system_server process.

Summary

Watchdog is a thread that monitors whether system services are running normally and are not deadlocked;
HandlerChecker checks a Handler and its monitors;
Monitor uses locks to determine whether there is a deadlock;
After 30 seconds without a response, the log is output; after 60 seconds, the system restarts. Watchdog kills its own process, so the system_server process id changes after the restart.

This article is reproduced from the WeChat public account "Android Development Programming".
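As a companion to the completion states listed in the principle explanation, the sketch below maps elapsed time to a state. It assumes that WAITED_HALF begins at half the timeout, which matches the name and the 30-second and 60-second figures given in this article; `CompletionStateSketch`, `stateOf` and `worstState` are hypothetical names, not framework methods.

```java
import java.util.Arrays;
import java.util.List;

public class CompletionStateSketch {
    // States ordered from best to worst, so the worst one is the maximum.
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    static final long DEFAULT_TIMEOUT = 60_000;             // 1 minute, as stated in the article
    static final long CHECK_INTERVAL = DEFAULT_TIMEOUT / 2; // 30 seconds

    /** Maps one checker's progress to a state (assumed boundaries: half timeout, full timeout). */
    static int stateOf(boolean completed, long elapsedMillis, long timeoutMillis) {
        if (completed) {
            return COMPLETED;
        }
        if (elapsedMillis < timeoutMillis / 2) {
            return WAITING;
        }
        if (elapsedMillis < timeoutMillis) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }

    /** The overall result is the worst state among all checkers. */
    static int worstState(List<Integer> states) {
        int worst = COMPLETED;
        for (int s : states) {
            worst = Math.max(worst, s);
        }
        return worst;
    }

    public static void main(String[] args) {
        List<Integer> states = Arrays.asList(
                stateOf(true, 45_000, DEFAULT_TIMEOUT),             // responsive checker
                stateOf(false, CHECK_INTERVAL, DEFAULT_TIMEOUT),    // pending for 30 s
                stateOf(false, 10_000, DEFAULT_TIMEOUT));           // pending for 10 s
        System.out.println("worst state: " + worstState(states));
    }
}
```

Taking the maximum means a single stuck checker is enough to move the whole evaluation to WAITED_HALF and later to OVERDUE, regardless of how many other checkers completed.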