Summary of Android hot fix technology

Pluginization and hot fixing are relatively advanced topics in Android development, and they are skills that intermediate developers must master on the way to becoming advanced developers. For pluginization, see my previous article: Android pluginization. This article focuses on hot fixing and gives a brief summary of the currently popular hot fix technologies.

Hotfixes

What is a hotfix?

Simply put, a hot fix is a patch that fixes problems in an app that is already released, and applying the patch does not require publishing a new version.

Technical Background

In the normal software development process, the cycle is: develop offline -> release -> discover a bug -> make an emergency fix -> release again. This cycle is very costly.

The hot fix workflow is more flexible: there is no need to release a new version, fixes are applied efficiently and in near real time, and users do not have to download a new app. The cost is low and, most importantly, bugs are fixed promptly.

Current hot fix technology

The currently popular hot fix technologies are:

  • QQ Space Super Patch, WeChat Tinker
  • Alibaba's Sophix and Alibaba HotFix
  • Ele.me Amigo
  • Meituan Robust
  • 360 RePlugin

Hot Fix Principles

To understand how hot fixes work, we must first understand the Android ClassLoader mechanism; see my previous article: ClassLoader class loading mechanism. Android's ClassLoaders are PathClassLoader and DexClassLoader, both of which inherit from BaseDexClassLoader. PathClassLoader is used to load system classes and the classes of installed applications; DexClassLoader is used to load jar, apk, and dex files. The principles of Alibaba's AndFix and Sophix, introduced below, are as follows:

AndFix

AndFix: the patch class is loaded by its own ClassLoader. Then, in the native layer, a replaceMethod implementation written for each Android version's ArtMethod structure copies the method information (declaring class, access flags, code entry address, and so on) from the patch method into the original method, member by member, according to the hard-coded ArtMethod layout.

Stability is poor, because domestic ROM vendors sometimes change the ArtMethod structure and the hard-coded layout then no longer matches. This is why AndFix supports relatively few device models.
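
To make that fragility concrete, here is a toy Java model, not real ART code: method records live in a ByteBuffer, and an AndFix-style replaceMethod copies individual members at offsets hard-coded from an assumed AOSP layout. The class name AndFixModel, the layouts, and the offsets are all hypothetical; the point is only that a vendor layout with one extra member silently defeats a member-by-member copy.

```java
import java.nio.ByteBuffer;

// Toy model (hypothetical layouts): each "ArtMethod" is a run of 4-byte ints in a ByteBuffer.
class AndFixModel {
    // Assumed AOSP layout: [declaringClass][accessFlags][entryPoint], 4 bytes each.
    static final int ACCESS_FLAGS_OFFSET = 4;
    static final int ENTRY_POINT_OFFSET = 8;

    /** AndFix-style replacement: copy selected members at hard-coded offsets. */
    static void replaceMethod(ByteBuffer mem, int original, int patch) {
        mem.putInt(original + ACCESS_FLAGS_OFFSET, mem.getInt(patch + ACCESS_FLAGS_OFFSET));
        mem.putInt(original + ENTRY_POINT_OFFSET, mem.getInt(patch + ENTRY_POINT_OFFSET));
    }

    /**
     * Lays out two adjacent methods whose real entry point sits at entryOffset,
     * replaces the first with the second, and returns the first method's entry point.
     */
    static int entryPointAfterReplace(int methodSize, int entryOffset) {
        ByteBuffer mem = ByteBuffer.allocate(2 * methodSize);
        int original = 0, patch = methodSize;
        mem.putInt(original + entryOffset, 0x1000); // address of the buggy code
        mem.putInt(patch + entryOffset, 0x2000);    // address of the fixed code
        replaceMethod(mem, original, patch);
        return mem.getInt(original + entryOffset);
    }
}
```

With the assumed AOSP layout, entryPointAfterReplace(12, 8) returns the patched address 0x2000. On a hypothetical vendor layout that inserts one extra 4-byte member before the entry point, entryPointAfterReplace(16, 12) still returns the buggy 0x1000, and two unrelated members have been overwritten instead.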

Sophix

Sophix: the patch class is loaded by its own ClassLoader, and in the native layer the entire ArtMethod structure is replaced at once with memcpy(smeth, dmeth, sizeof(ArtMethod)), which copies the patched method's structure over the original one. When a class is initialized, space is allocated for its methods: a new array is created with AllocArtMethodArray and placed in the class's method array in ART. Because these structures are laid out contiguously, the size of one ArtMethod can be calculated from the difference between the start addresses of the first and second methods of a helper class.

Note: when the patch class is initialized, it also allocates its own ArtMethod space, and the patched ArtMethod then overwrites the content of the old ArtMethod as a whole, regardless of the ArtMethod's internal structure. This greatly improves stability.
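
The whole-structure idea can be sketched with the same kind of toy model, assuming method records sit contiguously in one byte array: the record size is learned from two adjacent helper methods and the whole record is copied, so no member offsets are ever needed. SophixModel and its memory layout are hypothetical stand-ins for the native memcpy.

```java
import java.util.Arrays;

// Toy model (hypothetical layout): method records are contiguous blocks in one byte array.
class SophixModel {
    /** Size of one method record, derived from the addresses of two adjacent methods. */
    static int methodSize(int addrOfFirst, int addrOfSecond) {
        return addrOfSecond - addrOfFirst;
    }

    /** Whole-structure replacement, the analogue of memcpy(original, patch, size). */
    static void replace(byte[] mem, int original, int patch, int size) {
        System.arraycopy(mem, patch, mem, original, size);
    }

    /** Works for any record size: returns true if the original now equals the patch byte for byte. */
    static boolean demo(int realSize) {
        // Linear method array: [helperA][helperB][original][patch]
        byte[] mem = new byte[4 * realSize];
        for (int i = 0; i < mem.length; i++) mem[i] = (byte) i;
        int size = methodSize(0, realSize); // helperA and helperB are adjacent
        int original = 2 * realSize, patch = 3 * realSize;
        replace(mem, original, patch, size);
        return Arrays.equals(
                Arrays.copyOfRange(mem, original, original + size),
                Arrays.copyOfRange(mem, patch, patch + size));
    }
}
```

The same code handles a 12-byte, 16-byte, or 40-byte record, which is the property the text attributes to Sophix: as long as the methods stay in a linear array, the replacement does not depend on what is inside each record.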

Java

Inner class compilation

The difference between static inner class and non-static inner class

The compiler generates a separate top-level class for an inner class, just as it does for the outer class. However, a non-static inner class holds a reference to the outer class instance. This is why Android performance guides recommend making a Handler a static inner class: otherwise the outer Activity cannot be garbage-collected, which leaks memory and can eventually cause OOM.
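
The hidden outer reference can be observed in plain Java with reflection. In this sketch (no Android classes; Screen stands in for an Activity and the two callbacks stand in for Handler subclasses, all names hypothetical), the non-static inner class has one compiler-generated synthetic field pointing at its Screen, while the static nested class has none and holds the Screen only through a WeakReference.

```java
import java.lang.ref.WeakReference;
import java.lang.reflect.Field;

class Screen {
    String title = "home";

    // Non-static: javac adds a synthetic field referencing the enclosing Screen.
    class LeakyCallback {
        String describe() { return title; }
    }

    // Static: no hidden outer reference; the Screen is held weakly so it can be collected.
    static class SafeCallback {
        private final WeakReference<Screen> screen;
        SafeCallback(Screen screen) { this.screen = new WeakReference<>(screen); }
        String describe() {
            Screen s = screen.get();
            return s == null ? "gone" : s.title;
        }
    }

    /** Counts compiler-generated (synthetic) fields declared by a class. */
    static int syntheticFields(Class<?> c) {
        int n = 0;
        for (Field f : c.getDeclaredFields()) if (f.isSynthetic()) n++;
        return n;
    }
}
```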

Inner and outer classes access each other

When an inner class and its outer class access each other's private methods and fields, the compiler (javac before Java 11) automatically generates synthetic static accessor methods named access$NNN (for example access$000) in the class that owns the private member.

Hot deployment solution

Because hot deployment cannot add or remove methods, a patch must not cause these accessors to appear or disappear. If an outer class has inner classes, change the private access of all its fields/methods to protected or public, and do the same for the fields/methods of the inner classes.
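
The following is a hand-written approximation, not compiler output, of what javac up to Java 10 produces when an inner class reads a private field of its outer class. javac would name the accessor access$000; accessCount and the class names are hypothetical stand-ins. The non-private field needs no extra method at all, which is why widening access keeps the method count stable across patches.

```java
// Hand-written approximation of javac (<= Java 10) output; names are hypothetical.
class Counter {
    private int count = 1;   // private: inner-class reads go through a generated accessor
    int visibleCount = 2;    // non-private: the inner class reads the field directly

    // Shape of the synthetic accessor: static, takes the outer instance as a parameter.
    static int accessCount(Counter outer) { return outer.count; }

    class Reader {
        int readPrivate() { return accessCount(Counter.this); } // what the compiled code calls
        int readVisible() { return visibleCount; }              // plain field read, no extra method
    }
}
```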

Anonymous inner class compilation

Naming rules for anonymous inner classes

OuterClass$N: the compiler assigns the number N in the order in which the anonymous inner classes appear in the outer class.

Hot deployment solution

Adding or removing anonymous inner classes is not supported by hot deployment. The patch tool works on class files and cannot tell DexFileDemo$1 and DexFileDemo$2 apart by content, so inserting or deleting one shifts the numbering and mixes up the classes. Adding a new anonymous inner class at the end is allowed.
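
The numbering rule is easy to observe in plain Java. In this minimal sketch (the class name AnonNaming is hypothetical), javac names the first anonymous class AnonNaming$1 and the second AnonNaming$2, so inserting a new anonymous class above the first would rename both existing ones.

```java
import java.util.ArrayList;
import java.util.List;

class AnonNaming {
    static List<String> names() {
        Runnable first = new Runnable() { public void run() {} };  // AnonNaming$1
        Runnable second = new Runnable() { public void run() {} }; // AnonNaming$2
        List<String> result = new ArrayList<>();
        result.add(first.getClass().getName());
        result.add(second.getClass().getName());
        return result;
    }
}
```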

Field Compilation

Static field, non-static field compilation

Hot deployment does not support adding or deleting fields/methods, or modifying the clinit method. The initialization of static fields and static code blocks is compiled into the compiler-synthesized clinit method, while the initialization of non-static fields is compiled into the compiler-generated init constructor.

Static field, static code block

The clinit method is called when the class is initialized during class loading. Static fields and static code blocks appear in clinit in the same order as in the source code. Because the class has already been loaded and initialized, a fixed clinit method does not take effect even if it is patched.

In Dalvik the sequence is dvmResolveClass -> dvmLinkClass -> dvmInitClass, after which the clinit method is executed.

A class is loaded and initialized in the following situations:

  1. Creating an instance of the class (new-instance)
  2. Calling a static method of the class (invoke-static)
  3. Reading a static field of the class (sget)
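
The same rules hold on the JVM, which makes them easy to try out. This sketch (class names hypothetical) shows that static initializers run once, in source order, and only on the first active use; the three triggers mirror the list above, with the JVM equivalents new, invokestatic, and getstatic.

```java
import java.util.ArrayList;
import java.util.List;

class ClinitDemo {
    static final List<String> LOG = new ArrayList<>();

    static String record(String s) { LOG.add(s); return s; }

    static class Ordered {
        static String a = record("field a");  // clinit runs these three in source order
        static { record("static block"); }
        static String b = record("field b");
    }

    static class ByNew { static { record("ByNew"); } }
    static class ByStaticCall { static { record("ByStaticCall"); } static void touch() {} }
    static class ByStaticGet { static { record("ByStaticGet"); } static int value = 1; }

    static List<String> run() {
        String unused = Ordered.a;  // getstatic (sget in Dalvik) triggers Ordered's clinit
        new ByNew();                // new-instance
        ByStaticCall.touch();       // invoke-static
        int v = ByStaticGet.value;  // sget on a non-constant field
        new ByNew();                // already initialized: nothing is recorded
        return new ArrayList<>(LOG);
    }
}
```

run() returns [field a, static block, field b, ByNew, ByStaticCall, ByStaticGet]: each clinit ran exactly once, which is why a patched clinit cannot take effect for a class that is already initialized.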

Non-static field, non-static code block

The compiler translates the class constructor into an init method, which first initializes non-static fields and runs non-static code blocks, in the same order as in the source code, and then runs the constructor body.

When executing the new-instance instruction, the VM loads the class if it has not been loaded yet, allocates memory for the object, and then executes the invoke-direct instruction to call the class's init constructor for initialization.
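
A small Java sketch of that ordering (the class name is hypothetical): field initializers and the instance initializer block are copied into the constructor in source order, ahead of the constructor body.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrderDemo {
    final List<String> log = new ArrayList<>(); // initialized first: it appears first
    String a = note("field a");
    { note("instance block"); }
    String b = note("field b");

    InitOrderDemo() { note("constructor body"); } // runs after all of the above

    private String note(String s) { log.add(s); return s; }
}
```

new InitOrderDemo().log is [field a, instance block, field b, constructor body], which is why hot deployment can treat the whole init method as one ordinary method to replace.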

Hot deployment solution

Modifying static fields and static code blocks is not supported: such changes make hot deployment fail and can only take effect after a cold start. Modifying non-static fields and non-static code blocks is supported, because hot deployment simply treats the init constructor as an ordinary method and replaces it.

final static field compilation

Final static field compilation rules

Final static reference fields are still initialized in clinit. For final static primitive and String fields, class initialization (dvmInitClass) runs initSFields before executing the clinit method. initSFields assigns default values to static fields: reference types default to NULL, while final static primitive and String fields are initialized with their values here.

Final static field optimization principle

  • A final static primitive field is read with the const/4 instruction, and the operand's position in the dex (encoded_array_item) is one byte after the opcode.
  • A final static String field is read with the const-string instruction. This is essentially the same as above, except that the instruction takes the index of the string in the dex file's string constant area, a region of the dex file that holds all string constants and is fully loaded into virtual machine memory.
  • The final static reference type executes the sget instruction, first calling dvmDexGetResolveField to see if the field has been resolved before. If not, call dvmDexResolveField to try to resolve the field. If the class to which the static field belongs has not been resolved, try calling dvmResolveClass to get the sField, and then get the static value through dvmDexGetResolveField(sField).

Hot deployment solution

  • For final static primitive and String fields, the values are embedded at their use sites (const/4, const-string), so changed values are picked up by hot deployment.
  • A final static reference field is initialized in the clinit method, so modifying it makes hot deployment fail.
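
javac applies the same split, which this JVM sketch demonstrates (class names hypothetical): reading primitive or String compile-time constants does not run the holder's static initializer because the values are inlined at the use site, while reading a final static reference field does.

```java
import java.util.ArrayList;
import java.util.List;

class FinalStaticDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Config {
        static final int TIMEOUT = 30;           // compile-time constant, inlined by javac
        static final String NAME = "demo";       // compile-time constant, inlined by javac
        static final Object LOCK = new Object(); // assigned inside clinit
        static { LOG.add("Config clinit"); }
    }

    static List<String> run() {
        int t = Config.TIMEOUT;     // no class initialization
        String n = Config.NAME;     // no class initialization
        LOG.add("read constants " + t + " " + n);
        Object lock = Config.LOCK;  // real static field read: runs Config's clinit
        return new ArrayList<>(LOG);
    }
}
```

run() returns [read constants 30 demo, Config clinit]: the constants were available before Config was ever initialized.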

Generic compilation

Why do we need generics?

Java generics are implemented entirely by the compiler: it performs type checking and type inference and then generates non-generic bytecode. This process is called type erasure.

Before generics, a general-purpose class had to be written against Object, the parent of all classes, and every use required an explicit cast. Correctness then depended entirely on the programmer, and ClassCastException was easy to trigger. Generics solve this with compile-time type checking and type inference.

Generic type erasure

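Bytecode instructions and method descriptors contain no generic type information: each type parameter is erased to its bound, or to Object if it has none. To make erased type definitions distinguishable, you can bound the type parameter, for example <T extends Number>.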
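
A quick reflection check of those erasure rules (class names hypothetical): two parameterizations share one runtime class, an unbounded T erases to Object, and a bounded T erases to its bound.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    static class Box<T> { T value; }                      // T erases to Object
    static class NumberBox<T extends Number> { T value; } // T erases to Number

    /** Runtime type of the class's single declared field after erasure. */
    static Class<?> erasedFieldType(Class<?> c) {
        return c.getDeclaredFields()[0].getType();
    }

    static boolean listsShareOneClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        return strings.getClass() == numbers.getClass();
    }
}
```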

Conflict and resolution between type erasure and polymorphism

Suppose the parent class is a generic class with setNumber(T value), and a subclass that extends it with T = Number wants to override setNumber(Number value). After erasure, the parent's actual method is setNumber(Object value), so the subclass method would become an overload rather than an override. This is the conflict between type erasure and polymorphism. The compiler resolves it by synthesizing a bridge method in the subclass with the parent's erased signature, setNumber(Object), which casts its argument and calls the subclass's overriding method.
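
The setNumber example can be reproduced directly; the class names below are hypothetical. Reflection shows that Child declares two setNumber methods, its own setNumber(Number) and a compiler-generated bridge setNumber(Object), and a call through the parent type still dispatches to the override.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BridgeDemo {
    static class Parent<T> {
        T value;
        void setNumber(T value) { this.value = value; } // erased to setNumber(Object)
    }

    static class Child extends Parent<Number> {
        @Override
        void setNumber(Number value) { this.value = value.intValue() * 2; }
    }

    /** Parameter type of each setNumber declared by Child, marking bridge methods. */
    static List<String> childSetNumberMethods() {
        List<String> out = new ArrayList<>();
        for (Method m : Child.class.getDeclaredMethods()) {
            if (m.getName().equals("setNumber")) {
                out.add(m.getParameterTypes()[0].getSimpleName() + (m.isBridge() ? " (bridge)" : ""));
            }
        }
        Collections.sort(out);
        return out;
    }
}
```

childSetNumberMethods() returns [Number, Object (bridge)]. A patch that introduces generics into a parent class therefore adds such a method to the subclass, which is exactly what hot deployment cannot handle.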

Generic type conversion

When a variable declaration carries generic type information, the compiler automatically inserts a check-cast instruction where a value is read. Because the compiler type-checks generic code, this automatic cast does not throw a ClassCastException as long as the code compiles without unchecked warnings.
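
Where that check-cast lives can be shown by deliberately bypassing the compiler's check with a raw type (heap pollution). In this sketch (the class name is hypothetical), the bad element is stored without any error, and the failure surfaces only at the read, where javac inserted the cast to String.

```java
import java.util.ArrayList;
import java.util.List;

class CheckCastDemo {
    @SuppressWarnings({"unchecked", "rawtypes"})
    static String readPolluted() {
        List<String> strings = new ArrayList<>();
        List raw = strings;
        raw.add(42); // unchecked: erasure means nothing is checked at runtime here
        try {
            String s = strings.get(0); // compiler-inserted checkcast String fails here
            return "no exception: " + s;
        } catch (ClassCastException e) {
            return "ClassCastException at read";
        }
    }
}
```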

Hot deployment solution

If a patch changes the parent class to use generics, the compiler adds bridge methods, which makes hot deployment fail.

Changing a method from <B> void get(B t) to <B extends Number> void get(B t) does not change the method logic, but it changes the erased signature from get(Object) to get(Number). Such a change makes the hot fix meaningless and should be avoided.
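
Reflection confirms the signature change (class names hypothetical): the two get methods have identical bodies but different erased parameter types, so to the runtime they are different methods.

```java
import java.lang.reflect.Method;

class SignatureDemo {
    static class Before { <B> void get(B t) {} }                // erased: get(Object)
    static class After  { <B extends Number> void get(B t) {} } // erased: get(Number)

    /** Erased parameter type of the class's get method, or null if there is none. */
    static String erasedParam(Class<?> c) {
        for (Method m : c.getDeclaredMethods()) {
            if (m.getName().equals("get")) return m.getParameterTypes()[0].getName();
        }
        return null;
    }
}
```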

Lambda Expression Compilation

Lambda expression compilation rules

Lambda expressions bring functional programming features to Java and are the closest thing it has to closures. A functional interface is an interface with exactly one abstract method.

Runnable and Comparator in Java are both typical functional interfaces.

The differences between lambda expressions and anonymous inner classes:

  1. The this keyword: inside a lambda expression, this refers to the enclosing class instance; inside an anonymous inner class, it refers to the anonymous inner class instance itself.
  2. Compilation: the Java compiler compiles a lambda expression into a private method of the enclosing class and uses the Java 7 invokedynamic instruction to bind it dynamically, whereas an anonymous inner class is compiled into a new class named OuterClass$N. For a lambda, the compiler generates a private static method such as lambda$main$0 in the class (an instance method if the lambda uses this), which contains the lambda's logic; variables captured by the lambda become parameters of this method.
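
Both points can be checked with a short sketch (class and method names hypothetical). Names such as lambda$withSuffix$0 are a javac convention rather than a language guarantee, so the reflection check only looks for a private static synthetic method whose name starts with lambda$ and whose two parameters are the captured suffix plus the lambda's own argument.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.function.Function;
import java.util.function.Supplier;

class LambdaDemo {
    Supplier<Object> lambdaThis() { return () -> this; } // this = the LambdaDemo instance

    Supplier<Object> anonymousThis() {
        return new Supplier<Object>() {
            @Override public Object get() { return this; } // this = the anonymous instance
        };
    }

    static Function<String, String> withSuffix(String suffix) {
        return s -> s + suffix; // captures suffix, so it becomes an extra parameter
    }

    /** True if a private static synthetic lambda body with two parameters exists. */
    static boolean hasCapturingLambdaBody() {
        for (Method m : LambdaDemo.class.getDeclaredMethods()) {
            int mod = m.getModifiers();
            if (m.isSynthetic() && m.getName().startsWith("lambda$")
                    && Modifier.isPrivate(mod) && Modifier.isStatic(mod)
                    && m.getParameterCount() == 2) {
                return true;
            }
        }
        return false;
    }
}
```

This is also why the hot fix section below treats adding a lambda, or making an existing one capture a new variable, as a structural change: both alter the methods declared by the enclosing class.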

How lambda expressions are executed:

  • In class files on the HotSpot VM, the invokedynamic instruction calls the static method metafactory of java/lang/invoke/LambdaMetafactory. This method generates, at runtime, a concrete class that implements the functional interface, and this class calls the private static method.
  • In dex files on the Android virtual machine, this concrete class is instead generated when the code is converted into a dex file.

Hot deployment solution

Adding a lambda expression adds an auxiliary method to the enclosing class. If the modified lambda logic references an external variable, the auxiliary class holds the external object and gains a field for it. Either change makes the hot fix fail.

Comparison between Sophix, QQ Super Patch and Tinker Technology

Among the popular hot fix solutions on the market, Sophix, QQ Space Super Patch, and Tinker are briefly compared here. The QQ Space and WeChat approaches require a restart for the fix to take effect, while Alibaba's Sophix uses a non-invasive approach that does not require a cold start.

QQ Space Super Patch

QQ Space Super Patch hooks into the packaging process through instrumentation: it puts a helper class in a separate dex and makes the other classes reference it, which prevents those classes from being marked CLASS_ISPREVERIFIED during dexopt. The principle is as follows:

Load the patch dex, use the resulting DexFile object to build an Element, and insert it at the front of the dexElements array.

Tinker instead provides a differential package and replaces the entire dex: it merges patch.dex with the application's classes.dex to generate a complete dex, loads that complete dex, and uses the resulting DexFile object to build an Element that replaces the dexElements array.
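
Both strategies rely on the same lookup rule: the class loader walks the dexElements array in order and returns the first dex that defines the class. On Android the real state sits in the hidden fields BaseDexClassLoader.pathList and DexPathList.dexElements, reached via reflection; the plain-Java model below only imitates that lookup, and its Element type, which maps class names to version strings, is a hypothetical stand-in.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of DexPathList lookup order; not the Android implementation.
class DexPathListModel {
    static final class Element {
        final String dexName;
        final Map<String, String> classes = new LinkedHashMap<>();
        Element(String dexName) { this.dexName = dexName; }
        Element define(String className, String version) { classes.put(className, version); return this; }
    }

    List<Element> dexElements = new ArrayList<>();

    /** The first element that defines the class wins. */
    String findClass(String name) {
        for (Element e : dexElements) {
            String v = e.classes.get(name);
            if (v != null) return e.dexName + ":" + v;
        }
        return null;
    }

    /** QQ Space style: put the patch element in front of the existing ones. */
    void prependPatch(Element patch) { dexElements.add(0, patch); }

    /** Tinker style: replace the whole array with the merged, complete dex. */
    void replaceAll(Element merged) {
        dexElements = new ArrayList<>();
        dexElements.add(merged);
    }
}
```

After prependPatch, the patched class shadows the original while untouched classes still come from classes.dex. In the real system this only helps for classes that have not been resolved yet, which is why both approaches take effect after a restart.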

There is no patch query/update mechanism in the official MultiDex; a downloaded patch takes effect the next time the app starts.

The process can be summarized as shown in the following figure:

However, careful readers will find that there are still the following problems in the use of QQ Space Super Patch:

  1. It does not support immediate effect and must be restarted to take effect.
  2. To implement the fix, two dex files must be added to the application. dalvikhack.dex contains only one class and has little impact on performance, but once the number of fixed classes in patch.dex reaches a certain level, loading it takes a long time. For a very large application like Taobao, increasing startup time by more than 2 seconds is unacceptable.
  3. In ART mode, modifying the class structure causes memory corruption. To avoid it, all related classes (callers, parent classes, subclasses, and so on) must also be packed into patch.dex, which makes the patch abnormally large and further increases application startup time.

In response to the above problems, Tencent has launched the QFix solution.

QFix calls dvmResolveClass in advance in the native layer, which ensures that dvmDexGetResolvedClass does not return null when it is later called inside dvmResolveClass; this also avoids the verification consistency problem.

This solution requires that, with multiple dex files, the referrer class be in the same dex as the patch class, that fromUnverifiedConstant be true, and that the referrer be loaded in advance.

This solution also has problems. The verification check can be bypassed after dexopt, but dexopt changes much of the original logic, and many odex-level optimizations hard-code field offsets and method accesses, which can cause serious bugs.

WeChat Tinker

To address the shortcomings of QQ Space Super Patch, WeChat proposed delivering a dex differential package and replacing the dex as a whole. The main principle is basically the same as that of QQ Space Super Patch; the difference is that patch.dex is not added to the dexElements array. Instead, patch.dex is delivered as a diff, merged with the application's classes.dex, and the old dex file is then replaced as a whole to achieve the fix. The principle diagram is as follows:

The WeChat hot fix process is shown in the figure:

However, WeChat's solution still has the following problems:

  1. Like the super patch technology, it does not support immediate effect and must be restarted to take effect.
  2. Merging runs in a separate process and can easily fail because of memory consumption or other reasons.
  3. Merging takes up extra disk space. For multi-dex applications, if several dex files are modified, multiple patch.dex files must be delivered and merged with the corresponding classes.dex files. The problem is then more serious, and the merge failure rate is higher.

HotFix

Compared with QQ Space Super Patch and WeChat Tinker, Alibaba's HotFix is positioned for emergency bug fixes: it fixes bugs as quickly as possible, and a downloaded patch takes effect immediately without a restart.

Unlike QQ Space Super Patch and WeChat Tinker, which add or replace an entire dex, AndFix modifies the Field pointer at the native layer at runtime to replace methods. The fix therefore takes effect immediately without a restart and adds no performance overhead to the application. Its principle is as follows:

Replacing a method implementation is done at the native layer and mainly involves three steps:

However, HotFix also has shortcomings:

  1. It does not support adding new fields, modifying init methods (constructors), or replacing resources.
  2. Because manufacturers customize their ROMs, some models are not yet supported, so compatibility is poor.

In summary, the frameworks above compare as follows:

Hot fix summary

There are two main approaches to code repair: Alibaba's underlying replacement approach and Tencent's class loading approach. The underlying replacement approach has many restrictions, but it has the best timeliness: it is lightweight to load and takes effect immediately. The class loading approach has poor timeliness and requires a cold restart to take effect, but it can fix a wider range of problems with fewer restrictions.

Underlying replacement solution

The underlying replacement solution is to directly replace the original method in the loaded class, which is modified based on the original class. Therefore, it is impossible to add or remove methods and fields from the original class, because this will destroy the structure of the original class.

Once the number of methods in the patch class increases or decreases, the number of methods in the class and the entire Dex will change. The change in the number of methods is accompanied by a change in the method index, so that the correct method cannot be indexed normally when accessing the method.

If fields are added or reduced, the indexes of all fields will change, just like when methods are changed. A more serious problem is that if a class suddenly adds a field while the program is running, the instances of this class that have already been generated will still have the original structure and cannot be changed. When new methods use these old instance objects, accessing the newly added fields will produce unexpected results.

This is an inherent limitation of this type of solution. The most criticized weakness of underlying replacement, however, is its instability.

Traditional low-level replacement methods, whether Dexposed, AndFix, or other hook solutions from the security community, all rely on directly modifying specific fields of the virtual machine's method entity, for example changing the JNI function pointer of a Dalvik method or changing the access rights of a class or method. This brings a very serious problem. Since Android is open source, each phone manufacturer can modify the code, and the ArtMethod structure in AndFix is hard-coded according to the structure in the public Android source code. If a manufacturer modifies the ArtMethod structure, it no longer matches the open-source structure, and on that device the generic replacement mechanism breaks. This is the root cause of the instability.

We have also rethought the underlying replacement principle, starting from overcoming its limitations and compatibility problems, and implemented immediate code hot fixes with a more elegant replacement idea. Sophix implements a replacement that ignores the specific structure of the underlying layer: instead of replacing the original members one by one, it replaces the whole ArtMethod structure at once.

In this way, we not only solve the compatibility problem, but also ignore the differences in the underlying ArtMethod structure, so there is no need to distinguish between all Android versions, and the amount of code is greatly reduced. Even if the members of ArtMethod are constantly modified in future Android versions, as long as the ArtMethod array is still arranged in a linear structure, it can be directly applied to future new versions such as Android 8.0 and 9.0, without the need to adapt to new system versions.

Class loading scheme

The class loading approach makes the ClassLoader load the new classes after the app restarts. While the app is running, all classes that need to change have already been loaded, and Android cannot unload a class. Without a restart, the original class stays in the virtual machine and the new class cannot be loaded. Therefore, on the next startup the patched classes are loaded preemptively, before any business logic runs, so that later accesses to the class resolve to the new version. This achieves the hot fix.

Let's take a look at the implementation principles of the three major class loading solutions of Tencent. The QQ Space solution will invade the packaging process and add some useless information for hacking, which is not elegant to implement. The QFix solution needs to obtain the functions of the underlying virtual machine, which is not stable and reliable, and a big problem is that it cannot add public functions.

WeChat's Tinker solution performs full dex loading, and it can be said to take patch synthesis to the extreme. However, we found that a sophisticated weapon does not suit every battlefield. Tinker synthesizes the full dex at the method and instruction level, and the whole pipeline was developed in-house.

Although it can save a lot of space, the comparison of dex content is too fine-grained, the implementation is relatively complex, and the performance consumption is relatively serious. In fact, the size of dex accounts for a relatively low proportion of the entire apk. The size of the dex file in an app is not the main part, and the main space occupied is still the resource file. Therefore, the cost-effectiveness of the space-time cost conversion of the Tinker solution is not high.

In fact, the best granularity for dex comparison is the class level. It is neither as fine-grained as the method and instruction level nor as coarse as a bsdiff comparison, and it achieves the best balance between time and space. Based on this principle, we took a different approach and implemented a completely different full dex replacement solution.
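
A class-level comparison can be sketched as a map diff. Here each dex is assumed to be represented as class name -> hash of that class's bytecode; that representation and the name ClassDiff are illustrative assumptions, not Sophix's actual data structures, and deleted classes are ignored for brevity.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Class-granularity diff sketch: ship only classes that are new or whose bytecode changed.
class ClassDiff {
    static Set<String> changedOrAdded(Map<String, String> oldDex, Map<String, String> newDex) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, String> e : newDex.entrySet()) {
            // A missing entry in oldDex yields null, so added classes are included too.
            if (!e.getValue().equals(oldDex.get(e.getKey()))) result.add(e.getKey());
        }
        return result;
    }
}
```

Comparing whole classes keeps the diff cheap to compute and apply, while still shipping far less than a full dex.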

Sophix also uses the technology of full-scale synthesis of dex, which is derived from Atlas, the plug-in framework of Taobao Mobile. It directly uses the original class search and synthesis mechanism of Android to quickly synthesize a new full-scale dex. In this way, we do not need to deal with the situation where the number of methods exceeds the limit during synthesis, nor do we need to perform destructive reconstruction on the structure of dex.

As can be seen from the figure, we have rearranged the order of dex in the package. In this way, when the virtual machine searches for classes, it will first find the classes in classes.dex, and then classes2.dex and classes3.dex. It can also be regarded as a class instrumentation solution at the dex file level. This method is very clever. It breaks and reorganizes the order of classes.dex in the old package and the patch package, and finally allows the system to naturally recognize this order to achieve the purpose of class coverage. This will greatly reduce the cost of synthetic patches.

Resource repair

In the process of Android hotfix, not only the wrong code needs to be fixed, but also the resource files need to be fixed. Currently, the resource hotfix solutions on the market are basically based on the implementation of Instant Run. The Instant Run implementation process is roughly divided into two steps:

  1. Construct a new AssetManager and, via reflection, call addAssetPath to add the complete new resource package to it. This yields an AssetManager containing all the new resources.
  2. Find all the places that referenced the original AssetManager and, via reflection, replace those references with the new AssetManager.

This method of sending a complete package takes up a lot of space. Some solutions first make a difference in the resource package, synthesize the complete package at runtime, and then load it. This does reduce the size of the package, but there are more synthesis operations at runtime, which consumes runtime time and memory. The synthesized package is also a complete package and still takes up disk space.

so library repair

Repairing an so library essentially means repairing and replacing native methods. In JNI programming, native methods can be registered dynamically or statically. Dynamically registered native methods require implementing JNI_OnLoad and a JNINativeMethod[] array. Statically registered native functions must follow the naming format Java_<fully qualified class name>_<method name>, with the dots in the class name replaced by underscores.
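
The static naming rule can be written down as a small function. This sketch follows the "short name" mangling described in the JNI specification: package separators become _, while _, ;, and [ are escaped as _1, _2, _3, and any other non-alphanumeric character becomes _0xxxx in lowercase hex. The class name JniNames is hypothetical, and overloaded methods (which append the argument signature) are not covered.

```java
// JNI short-name mangling for statically registered native methods (overloads not handled).
class JniNames {
    static String shortName(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    static String mangle(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '.' || c == '/') sb.append('_');
            else if (c == '_') sb.append("_1");
            else if (c == ';') sb.append("_2");
            else if (c == '[') sb.append("_3");
            else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) sb.append(c);
            else sb.append(String.format("_0%04x", (int) c)); // e.g. '$' -> _00024
        }
        return sb.toString();
    }
}
```

For example, shortName("com.example.Foo", "bar") gives Java_com_example_Foo_bar, and a nested class com.example.Outer$Inner maps to Java_com_example_Outer_00024Inner_run for a method named run.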

The dynamically registered native method mapping is completed by calling the JNI_OnLoad method during the loading of the so library. The statically registered native method mapping is completed when the native method is executed for the first time, of course, the premise is that the so library has been loaded.

We use an approach similar to the reflection injection used for class repair: inserting the directory of the patched so library at the front of the nativeLibraryDirectories array makes the system load the patched so library instead of the one in the original directory, which achieves the fix.
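
The lookup order that makes this work can be modeled in a few lines: directories are searched in order and the first one containing lib<name>.so wins. In this sketch existing files are a set of path strings instead of a real file system, and the class name and paths are hypothetical; it models only the ordering described above, not Android's full native library search.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Model of ordered native library lookup; paths and existing files are hypothetical.
class NativeLibLookup {
    final List<String> nativeLibraryDirectories = new ArrayList<>();
    final Set<String> existingFiles;

    NativeLibLookup(Set<String> existingFiles) { this.existingFiles = existingFiles; }

    /** Returns the path of the first lib<name>.so found, or null if none exists. */
    String findLibrary(String name) {
        String fileName = "lib" + name + ".so";
        for (String dir : nativeLibraryDirectories) {
            String candidate = dir + "/" + fileName;
            if (existingFiles.contains(candidate)) return candidate;
        }
        return null;
    }
}
```

Adding "/patch/lib" at index 0 of the list makes findLibrary("core") resolve to the patched copy, mirroring the dexElements trick used for classes.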
