Web caching via HTTP protocol

So, do you know what a cache is?

Caching is a bit like a gym membership: once I pay 699 for a year, I can work out for the whole year at no extra cost. On the web, we pay with storage space instead of money, and in exchange pages open much faster. The first time we visit a page, we spend some space to save the downloaded CSS, JS, HTML, images, and other resources locally. On the second, third, and later visits, those files don't need to be downloaded again.

Generally speaking, there are two ways to set up file caching. One is to set response headers on the server; the other is to use the HTML5 manifest file. Let's first look at the response-header approach.

Server Cache Negotiation

Caches set up this way come in two types: one that requires a request to the server for validation, and one that uses the local copy without sending any request.

ETag/Last-Modified

These two mechanisms are similar in that both send a request to the server for validation. You might ask: if the file is already cached, why validate at all? The validation request exists mainly to check whether the file has changed; if it hasn't, the server does not need to send the file body again.

ETag

An ETag identifies a specific version of a file's content, so it can be used to tell whether the content has changed. For example, deleting a single space counts as a change. A common practice is to compute a hash of the file with an algorithm such as MD5 or SHA-1. You could even do this on the front end: take an MD5 implementation, pass the file in, and you get an ETag value. Our focus here, though, is not on generating ETags but on the role they play in caching. ETag is defined in HTTP/1.1; the web server generates it and writes it into the response header.

  // Response headers
  ETag: "751F63A30AB5F98F855D1D90D217B356"

The browser caches the file locally along with this value. The next time you open the same page, it sends the value back in an If-None-Match request header so the server can check whether the file has changed. If it hasn't, the server tells the browser to use its local copy (304 Not Modified); otherwise it returns the new file.

  // Request headers
  If-None-Match: "751F63A30AB5F98F855D1D90D217B356"
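The exchange above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular server implements it: the function names `make_etag` and `respond` are made up for this example, MD5 is used because the article mentions it, and the comparison ignores details such as weak validators and lists of several ETags in If-None-Match.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Return a quoted ETag derived from the MD5 digest of the body."""
    return '"' + hashlib.md5(body).hexdigest().upper() + '"'


def respond(body: bytes, if_none_match: Optional[str] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    if if_none_match is not None and if_none_match == etag:
        # Content unchanged: the browser may reuse its local copy.
        return 304, {"ETag": etag}, b""
    # First visit or changed content: send the full file with its ETag.
    return 200, {"ETag": etag}, body
```

A first call such as `respond(b"hello")` returns status 200 together with the ETag; passing that ETag back as `if_none_match` returns 304 with an empty body, while any change to the bytes (even one added space) yields 200 again.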

Servers normally enable ETag by default, but in case a colleague or backend developer has turned it off through a misconfigured backend config file, it's worth checking. Taking Nginx as an example, open nginx.conf and check whether it contains any of the following statements:

  etag off;
  more_set_headers -s 404 -t 'ETag';
  more_clear_headers 'Etag';

If it does, delete them and restart nginx. The reason people turn ETag off is simple: generating ETags adds load to the server and can limit performance. So whether to enable or disable ETag is a trade-off you have to weigh.

Last-Modified

Unlike ETag, which validates by content, Last-Modified validates by date. The server stamps the file with its modification date and the client stores that date. On the next request, the client sends the date back and the server checks it. If the date has not changed, the server tells the browser to use its local cache. The server enables this mechanism by setting Last-Modified in the response headers.

  // Response headers
  Last-Modified: Tue, 03 Mar 2015 01:38:18 GMT

After receiving this response header, the browser caches the file and saves the date. On the next request, the date is sent back in If-Modified-Since for validation:

  If-Modified-Since: Tue, 03 Mar 2015 01:38:18 GMT

If the date has not changed, the server tells the browser to use the cache. So how do we enable this on the server? By default, servers send Last-Modified for static resources. Note, however, that Last-Modified only has one-second resolution: if a file changes more than once within the same second, Last-Modified cannot detect the later change (but who is awesome enough to update a file several times in one second?). In practice we rarely use Last-Modified alone; we usually combine it with Expires to form a cache with a fallback.
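To make the date comparison and the one-second resolution concrete, here is a small Python sketch using only the standard library. The helper names `last_modified_header` and `is_not_modified` are invented for this illustration, and the modification time is assumed to be a timezone-aware UTC datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def last_modified_header(mtime: datetime) -> str:
    """Format a UTC modification time as an HTTP date (second precision)."""
    return format_datetime(mtime.replace(microsecond=0), usegmt=True)


def is_not_modified(mtime: datetime, if_modified_since: str) -> bool:
    """True when the file has not changed since the date the client sent."""
    client_time = parsedate_to_datetime(if_modified_since)
    # Sub-second detail is dropped because the header cannot carry it.
    return mtime.replace(microsecond=0) <= client_time


# Example: two edits within the same second produce the same header value,
# so the second edit would go unnoticed by this check.
first_edit = datetime(2015, 3, 3, 1, 38, 18, 100000, tzinfo=timezone.utc)
second_edit = datetime(2015, 3, 3, 1, 38, 18, 900000, tzinfo=timezone.utc)
same_header = last_modified_header(first_edit) == last_modified_header(second_edit)
```

When `is_not_modified` returns True, a server would answer 304 and the browser would keep using its local copy.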

Expires/Cache-Control

The biggest difference between Expires/Cache-Control and the validation mechanisms above is that they skip the validation request entirely: the browser uses the local cache directly without asking the server. This approach suits stable projects with infrequent releases.

Expires

An absolute time for Expires can be set on the server side.

  // Response headers
  Expires: Tue, 03 May 2016 09:33:34 GMT

This tells the browser that until May 3, 2016 it can use the cached copy of the file directly. However, because the clocks of the server and the client may differ, Expires can misbehave. It is therefore only recommended for long-term caching; otherwise, choose Cache-Control. How do you set it on the server? Here is an nginx example:

  location ~* \.(?:css|js)$ {
      expires 1d;
      access_log off;
      add_header Cache-Control "public";
  }

Setting expires to one day makes the server add one day to the current time. It then emits both the Expires and Cache-Control headers, so the resulting response headers are:

  Expires: Fri, 28 Feb 2014 10:42:09 GMT
  Cache-Control: max-age=86400   // 24*60*60

(HTTP stipulates that when both max-age and Expires are present, max-age takes precedence.) In nginx, a negative expires value produces Cache-Control: no-cache, while zero or a positive value produces max-age=time. If you don't want caching, you can simply set:

  expires -1;   # already expired; sends Cache-Control: no-cache
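The rule just described (add the lifetime to the current time, then pick max-age or no-cache by sign) can be written out as a short Python sketch. It only mirrors the behavior as described in this article and is not nginx's implementation; the function name `expires_headers` is an assumption for illustration, and `now` is assumed to be a timezone-aware UTC datetime.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Dict


def expires_headers(seconds: int, now: datetime) -> Dict[str, str]:
    """Build Expires and Cache-Control from a relative lifetime in seconds."""
    expires_at = now.replace(microsecond=0) + timedelta(seconds=seconds)
    if seconds < 0:
        # Negative lifetime: already expired, always validate first.
        cache_control = "no-cache"
    else:
        # Zero or positive lifetime: fresh for that many seconds.
        cache_control = f"max-age={seconds}"
    return {
        "Expires": format_datetime(expires_at, usegmt=True),
        "Cache-Control": cache_control,
    }
```

For instance, calling it with 86400 seconds at 10:42:09 GMT on 27 Feb 2014 gives the Expires value of the following day and `max-age=86400`, matching the headers shown earlier.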

Cache-Control

Cache-Control was introduced in HTTP/1.1, in part to fix the clock-difference problem of the HTTP/1.0 Expires header. It has many directives and can in practice fully replace Expires (most servers now support it). To quote the original:

The Cache-Control header is defined in the HTTP/1.1 specification and replaces the headers (such as Expires) that were previously used to define response caching policies. All current browsers support Cache-Control, so using it is sufficient.

Most servers currently send both anyway, which is safe because HTTP stipulates that when Cache-Control (with max-age) and Expires appear together, Expires is ignored. When a file is served from a still-fresh cache this way, the status shown is no longer 304 (file unchanged) but 200 (resource successfully accessed), because no validation request was sent.

Before sending a request, the browser checks whether its cache holds a copy of the requested file. If it does and the copy is still fresh, the browser builds the response directly from the local copy. With the theory laid out, let's look at the directives that can be set in Cache-Control:

  • public: shared cache; the response may be cached by caching proxy servers such as CDNs.
  • private: private cache; the response must not be cached by shared caching proxies, but may be cached by user agents such as browsers.
  • max-age=[seconds]: the cached copy is fresh for this many seconds and needs no update during that time. Similar to Expires, but the time is relative rather than absolute: the copy stays fresh for the given number of seconds after the response was generated.
  • s-maxage=[seconds]: like max-age, but applies only to shared caches (such as proxies).
  • no-cache: this does not mean "do not cache". The copy may be stored, but the cache must validate it with the origin server before each use to check whether the file has changed (in effect, it always goes through ETag/Last-Modified validation).
  • no-store: disables caching; neither browsers nor proxies may keep a copy.
  • must-revalidate: once the cached copy has expired, the cache must revalidate it with the server before using it and may not serve the stale copy. If the file is unchanged, the server answers 304 rather than 200.
  • proxy-revalidate: like must-revalidate, but applies only to proxy caches.
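As a rough illustration of how the directives listed above can be read programmatically, the following Python sketch splits a Cache-Control value into directives and classifies who may store the response. The names `parse_cache_control` and `storable_by` are made up for this example, and the parser is deliberately simplified: it assumes no quoted argument contains a comma.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Parse a Cache-Control value into {directive: argument-or-None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directives without "=" (e.g. no-store) carry no argument.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def storable_by(value: str) -> str:
    """Classify who may store the response: 'nobody', 'browser', or 'shared'."""
    d = parse_cache_control(value)
    if "no-store" in d:
        return "nobody"
    if "private" in d:
        return "browser"
    # Simplification: anything else is treated as storable by shared caches too.
    return "shared"
```

For example, `parse_cache_control("private, max-age=0, must-revalidate")` yields a dictionary with `private` and `must-revalidate` mapped to `None` and `max-age` mapped to the string `"0"`.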

For example, here I can set Cache-Control to:

  // Response headers
  Cache-Control: private, max-age=0, must-revalidate

This marks the file as private: only the browser may cache it, not a proxy. max-age=0 means the cached copy expires immediately, which is not much different from no-cache. must-revalidate then tells the browser that it must validate the expired copy, for example via Last-Modified or ETag. If the file is unchanged, the local cache is used. In practice, the above is roughly equivalent to:

  // Response headers
  Cache-Control: private, no-cache

Results of using no-store

  // Response headers
  Cache-Control: no-store

This means the file must be downloaded again every time, regardless of whether it has changed. It firmly forbids using cached files, and no ETag validation takes place afterwards. Of course, if you need to account for ancient browsers like IE6, you can be blunt about it and send all of the following:

  Cache-Control: no-cache, no-store, must-revalidate   // HTTP/1.1
  Pragma: no-cache                                      // HTTP/1.0
  Expires: 0                                            // Proxies

However, there are basically no browsers left that do not support Cache-Control. Under normal circumstances, you can therefore simply follow the Cache-Control strategy recommended by Google Developers.

How do we usually configure the corresponding cache-control header in nginx?

  # Set no-cache
  # nginx directive:
  expires -1;
  # resulting header:
  #   Cache-Control: no-cache

  # Set max-age=0
  # nginx directive:
  expires 0;
  # resulting header:
  #   Cache-Control: max-age=0

  # Set other headers
  add_header Cache-Control "no-cache";
  add_header Pragma no-cache;

Everything above concerns the server's response headers. What does Cache-Control mean in the browser's request headers? When the request contains Cache-Control: max-age=0, the cached copy must be validated (via ETag or Last-Modified); if it is still valid, it can be used. When the request contains Cache-Control: no-cache, the browser will only accept the latest file from the server. Its effect is comparable to no-store in the response headers.

Combination of Cache Strategies

Last-Modified, ETag, Expires, and Cache-Control, described above, are all HTTP-level caching strategies. They are not the only kind of caching, of course. For example, some meta tags defined in HTML 4.0 can also express a custom cache policy.

  <meta http-equiv="Cache-Control" content="no-cache, no-store, must-revalidate" />
  <meta http-equiv="Pragma" content="no-cache" />
  <meta http-equiv="Expires" content="0" />

In reality, however, these meta tags only take effect for local file:// pages; when the page is served from a server, the HTTP response headers override them. The mainstream approach today is HTTP/1.1 caching, but we generally do not use any single mechanism on its own. So what happens when they are combined?

If your pages are not customized per user (private), caching can greatly improve your site's performance, so it is highly recommended. A website is, simply put, HTML + JS + CSS + fonts + image files (video aside), and we can assign different cache levels to these files.

The above is just a simple setup. Keep in mind that HTML should generally not be cached (for most pages). Long cache lifetimes should only be set once a version is stable; otherwise the cost outweighs the gain. In addition, Cache-Control can be paired with ETag or Last-Modified as a fallback validation, so that later changes to a file are picked up in time.
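How freshness (Cache-Control) and validation (ETag or Last-Modified) work together can be summarized as a small decision function. This Python sketch is a simplified model for illustration, not a browser's actual algorithm; the name `cache_action` and its parameters are assumptions, and directives such as no-store or must-revalidate are left out.

```python
from typing import Optional


def cache_action(age_seconds: int, max_age: Optional[int],
                 no_cache: bool, has_validator: bool) -> str:
    """Decide what to do with a stored response.

    Returns 'use-cache' (no request sent), 'revalidate' (conditional request
    with If-None-Match / If-Modified-Since), or 'fetch' (full download).
    """
    fresh = max_age is not None and age_seconds < max_age
    if fresh and not no_cache:
        # Still within its lifetime: serve the local copy directly.
        return "use-cache"
    if has_validator:
        # Expired (or no-cache) but an ETag/Last-Modified is stored:
        # ask the server whether the copy is still valid.
        return "revalidate"
    # Nothing to validate against: download the file again.
    return "fetch"
```

With a one-day max-age, a copy that is an hour old is used directly; once the day has passed, the stored ETag or Last-Modified lets the browser revalidate instead of downloading the whole file again.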

Clear the cache

The most common way is to change the file's version number or to generate a random file name. If you are just testing locally and want to clear the cache manually, you can use F5 to refresh or Ctrl+F5 for a hard reload.

On a Mac the shortcuts differ: Command+R is the equivalent of F5 (refresh), and Command+Shift+R is the equivalent of Ctrl+F5 (hard reload). In addition, some responses are not cached even when a cache policy is set. These include dynamically generated, authentication-dependent responses, such as those produced after cookie verification or CAPTCHA input. Responses to POST requests are not cached either.
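The file-renaming approach mentioned at the start of this section is often done by embedding a fingerprint of the content in the file name, so the URL changes whenever the file changes. The following Python sketch shows one possible way to do that; the function name `versioned_name`, the use of SHA-1, and the 8-character length are choices made for this illustration only.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path: str, content: bytes, length: int = 8) -> str:
    """Insert a short content hash before the file extension."""
    p = PurePosixPath(path)
    digest = hashlib.sha1(content).hexdigest()[:length]
    # e.g. static/app.js -> static/app.<digest>.js
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))
```

Because the name depends only on the content, an unchanged file keeps its name (and stays cached), while an edited file gets a new name that the browser has never cached.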
