Analysis and application of WebView cache principle

1. Background

Most apps today use a Hybrid approach to some extent. Pages shown in a WebView often load some js files (such as bridge.js, used for communication between Native and the WebView). These files rarely change, so once the WebView has loaded a js file, we would like it to be reused without another network request as long as it has not changed, which saves traffic and resources. How can we achieve this? We have to start with how WebView caching works.

2. WebView cache type

WebView mainly has two types of cache. One is the browser's built-in web page data cache, which every browser supports and which is defined by the HTTP protocol. The other is the H5 cache, which is configured by the web page developer and includes storage mechanisms such as App Cache, DOM Storage, Local Storage, and Web SQL Database. Here we focus on using App Cache to cache js files.

3. Browser's built-in web data cache

1. Working Principle

The browser cache mechanism controls file caching through HTTP header fields such as Cache-Control (or Expires) and Last-Modified (or ETag). For details on these fields and on how the browser updates its cache, see these two articles, which cover them thoroughly: "H5 Cache Mechanism Analysis of Mobile Web Loading Performance Optimization" and "Android: Hand-in-hand Teach You to Build WebView's Cache Mechanism & Resource Preloading Solution". Below, I introduce the headers I usually encounter in practice.

The following two fields are used by the browser to decide whether to cache a file when it receives a response, and whether to send a request when it needs to load a file.

  • Cache-Control: max-age=315360000 means the cached copy stays valid for 315360000 seconds (about 10 years). If the file is needed again within that time, the browser does not send a request and uses the locally cached file directly. This field belongs to the HTTP/1.1 standard.
  • Expires: Thu, 31 Dec 2037 23:55:55 GMT means the file expires at 23:55:55 GMT on December 31, 2037. Before that time, the browser does not request the file again. This field comes from HTTP/1.0. Because it is an absolute time, caching misbehaves when the client and server clocks are not synchronized, which is why Cache-Control was introduced. When both appear in the same HTTP Response Header, Cache-Control takes priority.
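The priority rule between the two fields can be sketched in plain Java. This is a simplified illustration, not how any browser is actually implemented: the class name `CacheFreshness` and its parameters are hypothetical, and a real HTTP cache also considers the Age header, heuristic freshness, and directives such as no-cache.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: a simplified freshness check for a cached response. */
final class CacheFreshness {
    private CacheFreshness() {}

    /**
     * @param storedAt      when the response was stored in the cache
     * @param maxAgeSeconds value of Cache-Control: max-age, or null if absent
     * @param expiresHeader value of the Expires header (RFC 1123 date), or null if absent
     * @param now           the current time
     * @return true if the cached copy may be used without a network request
     */
    static boolean isFresh(Instant storedAt, Long maxAgeSeconds,
                           String expiresHeader, Instant now) {
        if (maxAgeSeconds != null) {
            // Cache-Control: max-age takes priority over Expires.
            return now.isBefore(storedAt.plus(Duration.ofSeconds(maxAgeSeconds)));
        }
        if (expiresHeader != null) {
            // Expires is an absolute time, so it depends on the clocks being in sync.
            Instant expires = ZonedDateTime
                    .parse(expiresHeader, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant();
            return now.isBefore(expires);
        }
        // No explicit lifetime: this sketch simply treats the copy as stale.
        return false;
    }
}
```

For example, a response stored with max-age=60 is treated as stale after two minutes even if its Expires header points to 2037, because max-age wins.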

The following two fields are used by the server to determine whether the file needs to be updated when a request is initiated.

  • Last-Modified: Wed, 28 Sep 2016 09:24:35 GMT means the file was last modified at 09:24:35 GMT on September 28, 2016. The browser sends this value back in the If-Modified-Since field of the Request Header on the next request. For example, if a cached file has outlived its Cache-Control (or Expires) lifetime, the browser sends a request when it needs to load the file, and the request header contains If-Modified-Since: Wed, 28 Sep 2016 09:24:35 GMT. The server compares the file's current Last-Modified time with this time. If the time has not changed, the server returns 304 Not Modified with no response body. If the time has changed, the server returns 200 OK together with the new content.
  • ETag: "57eb8c5c-129" is the file's signature string. It serves the same purpose as Last-Modified, but on the next request the browser sends it to the server as If-None-Match: "57eb8c5c-129" in the Request Header. The server compares the signature of the current file with the one it received. If they are the same, it returns 304 Not Modified; if they differ, it returns 200 OK. When ETag and Last-Modified are both present, the HTTP specification has the server evaluate If-None-Match and ignore If-Modified-Since.
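The server-side decision for a revalidation request can be sketched as follows. The class `ConditionalRequest` and its parameters are hypothetical names for illustration; the sketch assumes that If-None-Match takes precedence when both validators are sent, which is what the HTTP specification (RFC 7232) prescribes, and it ignores details such as weak ETags and lists of several ETags.

```java
import java.time.Instant;

/** Hypothetical helper: picks the status code for a revalidation request. */
final class ConditionalRequest {
    private ConditionalRequest() {}

    /**
     * @param currentEtag     ETag of the file as it is now on the server
     * @param currentModified last-modified time of the file on the server
     * @param ifNoneMatch     value of the If-None-Match request header, or null
     * @param ifModifiedSince parsed If-Modified-Since request header, or null
     * @return 304 if the client's copy is still valid, otherwise 200
     */
    static int statusFor(String currentEtag, Instant currentModified,
                         String ifNoneMatch, Instant ifModifiedSince) {
        if (ifNoneMatch != null) {
            // If-None-Match decides on its own; If-Modified-Since is ignored.
            return ifNoneMatch.equals(currentEtag) ? 304 : 200;
        }
        if (ifModifiedSince != null) {
            // Unchanged if the file was not modified after the client's date.
            return currentModified.isAfter(ifModifiedSince) ? 200 : 304;
        }
        // Unconditional request: always send the full body.
        return 200;
    }
}
```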

2. How to set WebView to support the above protocols

From the above, any mainstream, standards-compliant browser should support these fields at the HTTP protocol level. This is not something we developers can modify, nor a configuration we should modify. The WebView on Android supports these fields as well. However, we can set the WebView's Cache Mode in code to control whether these caching rules are honored. WebView has the following Cache Modes:

  • LOAD_CACHE_ONLY: Do not use the network, only read local cache data.
  • LOAD_DEFAULT: Decide whether to fetch data from the network based on cache-control.
  • LOAD_NORMAL: Deprecated in API level 17; from API level 11 onward it behaves the same as LOAD_DEFAULT.
  • LOAD_NO_CACHE: Do not use cache, only get data from the network.
  • LOAD_CACHE_ELSE_NETWORK: Use the data in the cache as long as it is available locally, regardless of whether it is expired or no-cache. Obtain data from the network only when there is no cache locally.
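The difference between the modes can be summarized as a small decision table. The following plain-Java model is only an illustration of the descriptions above, not the WebView implementation; `CacheModeModel`, `Mode`, and `Source` are hypothetical names, and the deprecated LOAD_NORMAL is left out because it behaves like LOAD_DEFAULT.

```java
/** Hypothetical model of where a resource is loaded from under each Cache Mode. */
final class CacheModeModel {
    enum Mode { LOAD_DEFAULT, LOAD_NO_CACHE, LOAD_CACHE_ONLY, LOAD_CACHE_ELSE_NETWORK }
    enum Source { CACHE, NETWORK, NONE }

    private CacheModeModel() {}

    /**
     * @param mode   the configured Cache Mode
     * @param cached whether a local copy exists
     * @param fresh  whether that copy is still valid according to its HTTP headers
     */
    static Source choose(Mode mode, boolean cached, boolean fresh) {
        switch (mode) {
            case LOAD_NO_CACHE:
                return Source.NETWORK;                      // never read the cache
            case LOAD_CACHE_ONLY:
                return cached ? Source.CACHE : Source.NONE; // never touch the network
            case LOAD_CACHE_ELSE_NETWORK:
                return cached ? Source.CACHE : Source.NETWORK; // expiry is ignored
            case LOAD_DEFAULT:
            default:
                return (cached && fresh) ? Source.CACHE : Source.NETWORK;
        }
    }
}
```

The only difference between LOAD_DEFAULT and LOAD_CACHE_ELSE_NETWORK in this model is the expired-copy case: the former goes back to the network, the latter keeps using the cached file.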

The sample code for setting the Cache Mode of WebView cache is as follows:

    WebSettings settings = webView.getSettings();
    settings.setCacheMode(WebSettings.LOAD_DEFAULT);

Many people online recommend choosing the Cache Mode according to network conditions: LOAD_DEFAULT when the network is available and LOAD_CACHE_ELSE_NETWORK when it is not. In my business, however, js files are always updated in a non-overwriting manner, that is, every change to a js file also changes its URL. I therefore want the browser to cache each js file and keep using it, so I always use LOAD_CACHE_ELSE_NETWORK. Of course, if you can change the Cache-Control field returned by the CDN server that hosts the js, simply use LOAD_DEFAULT. Whether files should be updated in an overwriting or non-overwriting manner is a topic of its own in web front-end development, and not one I will discuss here.

Regarding the iOS WebView, my colleague found in actual testing that the Response Header controlling file caching is the Expires field. In addition, iOS cannot set a Cache Mode for the entire WebView; it can only be set per URLRequest. I hope to learn more about iOS when I get the chance.

3. Storage path in the phone

How are the files cached by the browser stored? This has been a mystery to me ever since I first worked with WebView. This time, for work, I rooted two phones: a Redmi 1 (Android 4.4) and a Xiaomi 4c (Android 5.1). My attempts to root two Nexus phones with newer system versions (6.0 and 7.1) failed, so I looked at where the WebView cache is stored on 4.4 and 5.1 only.

First of all, these files must be somewhere under /data/data/package name/. As mentioned in one of my previous blogs, this is the internal storage directory of each application.

Next, we open the terminal, use adb to connect to the phone, and then follow the commands below.

    # 1. Enter the shell first
    adb shell
    # 2. Switch to the root account
    su
    # 3. Change the permissions of the application folder
    chmod 777 /data/data/your.package.name/
    # 4. Change the permissions of its subfolders, because chmod on these Android versions does not support the recursive -R option found on Linux
    chmod 777 /data/data/your.package.name/*
    # 5. If your application has a deeper directory hierarchy, keep going one level at a time
    chmod 777 /data/data/your.package.name/*/*
    # 6. When the terminal reports that there is no such file or directory, chmod is complete and every folder and file in the internal storage is visible. If you know a better method, please let me know. Thank you very much.

  • Android 4.4 directory: /data/data/package name/app_webview/cache/, as shown in the second folder below.

You may have noticed that the first folder is called Application Cache. We will come back to it later.

  • Android 5.1 directory: /data/data/package-name/cache/org.chromium.android_webview/, as shown in the figure below.

However, on the 5.1 system, the /data/data/package name/app_webview/ folder still exists, but the app_webview/cache folder that stores the WebView's built-in cache on the 4.4 system no longer exists (note that the App Cache directory is still there), as shown in the figure below.

In summary, WebView's protocol-level cache is stored in different locations on different system versions. I have only rooted 4.4 and 5.1, so the WebView cache on other system versions may well live in yet other directories.

The storage format and index format of the cache files may also differ from phone to phone. I have seen people online mention a file called webview.db or webviewCache.db, located not under app_webview/cache or org.chromium.android_webview but in /data/data/package name/databases/. However, I did not see this file on either of my rooted phones, and none of the db files I opened under /data/data/package name/ contained a table storing URL records.

Taking the 5.1 system as an example, under /data/data/package-name/cache/org.chromium.android_webview/ I saw files called index and /index-dir/the-real-index, as well as many files named md5 + underscore + number, which can also be seen in the picture above. I still have questions about how this works, and I hope someone with expertise can answer them.

4. H5 Cache

Having covered WebView's built-in cache, let's turn to the App Cache in H5. This cache is controlled by the web page developer, not by Native, but the Native side still has to configure the WebView to support this H5 feature.

1. Working Principle

When writing web page code, specify the manifest attribute to enable the page to use App Cache. Usually the HTML page code is written like this:

    <html manifest="xxx.appcache">
    </html>

The xxx.appcache file uses a relative path, in which case the path of the appcache file is the same as the page. You can also use an absolute path, but the domain name must be consistent with the page.

A complete xxx.appcache file generally includes three sections, and the basic format is as follows:

    CACHE MANIFEST
    # 2017-05-13 v1.0.0
    /bridge.js

    NETWORK:
    *

    FALLBACK:
    / /404.html

  • The files listed directly under CACHE MANIFEST are the files the browser should cache.
  • The entries under NETWORK are resources that bypass the App Cache and are requested through the normal network path; * means every resource not listed in the manifest.
  • Each entry under FALLBACK gives a URL prefix followed by the page to display when a resource under that prefix fails to load.
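To make the three sections concrete, here is a minimal sketch of a manifest parser in plain Java. It is an illustration only, not the browser's parser: `AppCacheManifest` is a hypothetical class, it assumes FALLBACK entries are written as a pair of URL prefix and fallback page (as the AppCache specification defines them), and it skips features such as the SETTINGS section.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified parser for an .appcache manifest. */
final class AppCacheManifest {
    final List<String> cache = new ArrayList<>();
    final List<String> network = new ArrayList<>();
    final Map<String, String> fallback = new LinkedHashMap<>();

    static AppCacheManifest parse(String text) {
        String[] lines = text.split("\\r?\\n");
        if (!lines[0].trim().equals("CACHE MANIFEST")) {
            throw new IllegalArgumentException("first line must be CACHE MANIFEST");
        }
        AppCacheManifest manifest = new AppCacheManifest();
        String section = "CACHE"; // entries before any section header belong to CACHE
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i].trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank lines and comments carry no entries
            }
            if (line.equals("CACHE:") || line.equals("NETWORK:") || line.equals("FALLBACK:")) {
                section = line.substring(0, line.length() - 1);
            } else if (section.equals("CACHE")) {
                manifest.cache.add(line);
            } else if (section.equals("NETWORK")) {
                manifest.network.add(line);
            } else {
                // FALLBACK: "<url prefix> <fallback page>"; malformed lines are skipped
                String[] pair = line.split("\\s+");
                if (pair.length == 2) {
                    manifest.fallback.put(pair[0], pair[1]);
                }
            }
        }
        return manifest;
    }
}
```

Note that comment lines are skipped by the parser but still count as part of the file, which is why changing only the version comment is enough to make the manifest differ.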

How AppCache works: when an HTML page with a manifest file is loaded, the files listed under CACHE MANIFEST are cached in the browser's App Cache directory. The next time the page is loaded, the cached files are used first, and then a request for the xxx.appcache file is sent to the server. If the xxx.appcache file has not been modified, the server returns 304 Not Modified. If it has been modified, the server returns 200 OK together with the content of the new xxx.appcache file, and the browser then downloads and caches the files the new manifest specifies.

As you can see, AppCache sends a request for xxx.appcache every time the page is loaded, to check whether the manifest file has changed (the comparison is byte by byte). AppCache has a number of pitfalls and is no longer officially recommended, although mainstream browsers still supported it at the time of writing. The referenced article mainly mentions the following pitfalls:

  • To update a cached file, you must change the manifest file that lists it, even if the change is just an added space. A common method is to modify the version number in a manifest comment, for example: # 2012-02-21 v1.0.0
  • The browser uses the cached files first and only then checks whether the manifest has changed in order to update them. As a result, the page may not be using the latest version of a file.
  • During a cache update, if a single file fails to update, the entire update fails.
  • The manifest and the HTML that references it must be on the same host.
  • Relative paths listed in the manifest file are resolved relative to the manifest file.
  • The manifest itself may fail to update correctly, in which case the cached files are not updated either.
  • Cached HTML cannot load resources that are not cached, even when the network is available. For example: http://appcache-demo.s3-website-us-east-1.amazonaws.com/without-network/
  • The manifest file itself cannot be stored in the App Cache; it is updated through the browser's ordinary cache mechanism. The Cache-Control lifetime of the manifest file therefore must not be set too long.
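The first pitfall follows directly from the byte-by-byte comparison of the manifest. A minimal sketch, assuming the check is a plain comparison of the file's bytes (`ManifestUpdateCheck` is a hypothetical name, not a browser API):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: decides whether a freshly fetched manifest triggers an update. */
final class ManifestUpdateCheck {
    private ManifestUpdateCheck() {}

    static boolean hasChanged(String cachedManifest, String fetchedManifest) {
        byte[] oldBytes = cachedManifest.getBytes(StandardCharsets.UTF_8);
        byte[] newBytes = fetchedManifest.getBytes(StandardCharsets.UTF_8);
        // Any differing byte counts as a change, including one inside a comment.
        return !Arrays.equals(oldBytes, newBytes);
    }
}
```

Under this assumption, bumping the comment from v1.0.0 to v1.0.1 makes the manifest count as changed even though the list of files is identical, while replacing /bridge.js on the server without touching the manifest does not.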

2. How to set WebView to support AppCache

AppCache support is not enabled by default in WebView. You need to add the following lines of code to set it up:

    WebSettings webSettings = webView.getSettings();
    webSettings.setAppCacheEnabled(true);
    // Use the internal private cache directory '/data/data/package name/cache/' as the storage path of WebView's AppCache
    String cachePath = getApplicationContext().getCacheDir().getPath();
    webSettings.setAppCachePath(cachePath);
    webSettings.setAppCacheMaxSize(5 * 1024 * 1024);

Note: Both setAppCacheEnabled and setAppCachePath of WebSettings must be called.

3. Path to store AppCache

According to the Android SDK API description, setAppCachePath sets the AppCache path. In my tests, however, the path you pass has no effect, whether it points to the application's own internal private directory or to the external SD card. The AppCache files always end up in the folder /data/data/package name/app_webview/cache/Application Cache, which can be seen in the screenshots of the Android 4.4 and 5.1 system directories above. Yet if you do not call setAppCachePath at all, WebView does not create this directory. This is a bit strange to me. My guess is that, starting from some system version, the SDK implementation fixed the AppCache directory to internal private storage for the sake of the integrity and security of the cached files.

5. Conclusion

Similarities

The built-in cache of WebView and AppCache can both be used for file-level caching, which basically meets the needs of non-overwriting updates of js, css, and other files.

Differences

  • The built-in cache of WebView is implemented at the protocol layer (a standard part of the browser kernel that developers cannot change), while AppCache is implemented at the application layer.
  • The cache directory of WebView may differ between systems. For AppCache, although there is a method for setting the storage path, the files are ultimately stored in a fixed internal private directory.
  • When WebView's built-in cache is in effect, no HTTP request is sent at all, whereas AppCache always sends a request for the manifest file.
  • The behavior of WebView's built-in cache can be changed by setting the CacheMode, while the cache strategy of AppCache is controlled by the manifest file, that is, by the web page developer.

In fact, these two types of cache often work together. When the manifest file does not control the loading of certain resources (for example, the xxx.appcache file above uses * under the NETWORK section, which means all non-cached files must be loaded from the network), those resources fall through to WebView's built-in cache mechanism. Combined with WebView's CacheMode, these files end up cached in WebView's built-in cache. Understanding the principles of both types of cache helps us design our pages and apps better, reduce network requests as much as possible, and improve the App's efficiency.
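How the two caches cooperate can be sketched as a lookup order. This is a simplified model under the assumptions stated in this article, not WebView's real resolution logic; `ResourceResolver` and its parameters are hypothetical names, and FALLBACK handling is omitted.

```java
import java.util.Set;

/** Hypothetical model of how AppCache and the built-in HTTP cache cooperate. */
final class ResourceResolver {
    enum Source { APP_CACHE, HTTP_CACHE, NETWORK, BLOCKED }

    private ResourceResolver() {}

    /**
     * @param url             the resource being loaded by a page that has a manifest
     * @param appCacheEntries URLs listed in the manifest's CACHE section
     * @param networkWildcard whether the NETWORK section contains *
     * @param httpCacheUsable whether the built-in cache would serve this URL
     *                        (depends on the HTTP headers and the CacheMode)
     */
    static Source resolve(String url, Set<String> appCacheEntries,
                          boolean networkWildcard, boolean httpCacheUsable) {
        if (appCacheEntries.contains(url)) {
            return Source.APP_CACHE; // listed in the manifest: served from AppCache
        }
        if (!networkWildcard) {
            return Source.BLOCKED;   // not cached and not whitelisted: cannot load
        }
        // Whitelisted by NETWORK: handled by the ordinary built-in cache rules.
        return httpCacheUsable ? Source.HTTP_CACHE : Source.NETWORK;
    }
}
```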
