Eight years ago, the Yahoo team published a detailed study of web caching, but the Internet has developed rapidly since then and the numbers have changed. This article describes how the Facebook web team collected and analyzed data on the current state of caching, including how often resources are cached on PC and mobile and how long they stay cached. Web caching is a very important factor in performance optimization, so the article is worth reading. My ability is limited. If there are any translation errors, please feel free to contact me and I will correct them promptly :)

Text:

Page loading speed is something every website should pay attention to, yet it is often overlooked. Caching is a very important factor in improving access speed, because users do not need to recompute or re-download cached resources the next time they visit.

Our team (the Facebook Web Team) recently discussed how facebook.com currently fares without caching. The main questions were: at Facebook we release two versions every day, so how can we make caching more efficient, and what kind of caching strategy suits us?

While looking for a solution, we found an article about cache performance on the Yahoo performance research blog. What surprised us was that 20% of page views were made with an empty cache. However, that result was eight years old, from the time when IE7 had just been released and jQuery had just shipped its first version, so we decided to repeat the study and see whether things had improved.

Re-study:

In the earlier research, Yahoo served an image with HTTP headers that set its expiration time and last-modified time. When a browser already holds a copy of the image, it sends a GET request that carries the last-modified time. If the image has not been modified since then, the server returns 304 (Not Modified) instead of 200 (OK).
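The 200-versus-304 exchange described above can be sketched in a few lines. This is a minimal illustration in Python, not the code used by Yahoo or Facebook; `handle_image_request`, `IMAGE_LAST_MODIFIED`, and `IMAGE_BYTES` are hypothetical names, and the fixed modification date is an assumption.

```python
from datetime import datetime, timezone
from email.utils import formatdate, parsedate_to_datetime

# Assumed values for illustration only.
IMAGE_LAST_MODIFIED = datetime(2015, 1, 1, tzinfo=timezone.utc)
IMAGE_BYTES = b"GIF89a"  # placeholder payload, not a complete image


def _not_modified_since(header_value):
    """True if the client's copy is at least as new as the image."""
    try:
        return parsedate_to_datetime(header_value) >= IMAGE_LAST_MODIFIED
    except (TypeError, ValueError):
        # Unparseable or timezone-less date: treat as an empty cache.
        return False


def handle_image_request(request_headers):
    """Return (status, response_headers, body) for one image request."""
    response_headers = {
        "Last-Modified": formatdate(IMAGE_LAST_MODIFIED.timestamp(), usegmt=True),
    }
    since = request_headers.get("If-Modified-Since")
    if since is not None and _not_modified_since(since):
        # Revalidation succeeded: no body is sent.
        return 304, response_headers, b""
    # First visit (or stale copy): send the full image.
    return 200, response_headers, IMAGE_BYTES
```

A first request with no validation header gets 200 and the image; repeating the request with the returned `Last-Modified` value as `If-Modified-Since` gets 304 and an empty body.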
Because the server sees the status of every browser request, Yahoo used its server logs to count how many users had the image cached. Following the same method, we created a PHP endpoint that serves an image and logs each request to a database. The image is sent with HTTP headers that control the browser cache and any caches created by proxies, and we log the relevant information whenever a user requests the image.

The HTTP header information for this image is set as follows:

However, due to some known bugs, we replaced the two properties in IE7 and IE8 with the following:

When the browser sends a request for the image, one of two things happens:

1. The browser has never fetched this image, so the request carries no additional header information. The server returns status 200 (OK) together with the image data. The browser then caches the Last-Modified value (the time the file was last modified) and the ETag (an entity tag that identifies this version of the resource) from the response headers.

2. The server checks for an If-None-Match or If-Modified-Since header. If one is present, the browser has fetched the image before, so the server does not send the image data and directly returns status 304 (Not Modified). In this case we also replace $now() with $header['if-modified-since'] in the Last-Modified header, so the value returned is the same every time.

The only remaining question was where to use this image. We decided to place an img tag below the Facebook search bar so that the image would be rendered every time Facebook loads. On a full page reload, the resource is loaded according to its cache headers, which made this the best way to test our idea. After making sure that the endpoint logged requests correctly and the image tag could be reached normally, we officially started the study!

Research findings:

After several weeks of data collection, we decided to study the more valuable data from a 7-day period.
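Once each request is logged with its status, the share of empty-cache requests is a simple count of 200 responses against 304 responses. The sketch below assumes a hypothetical log format of `(group, status)` pairs, where the group might be a platform or a browser; the real schema used in the study is not given in the article.

```python
from collections import defaultdict


def empty_cache_share_by_group(log_rows):
    """Map each group to the fraction of its requests answered with 200.

    log_rows is an iterable of (group, status) pairs. Only 200 (empty
    cache) and 304 (primed cache) responses are counted.
    """
    totals = defaultdict(int)
    empties = defaultdict(int)
    for group, status in log_rows:
        if status not in (200, 304):
            continue  # ignore errors and anything else
        totals[group] += 1
        if status == 200:
            empties[group] += 1
    return {group: empties[group] / totals[group] for group in totals}


# Made-up sample rows, purely for illustration.
sample_rows = [
    ("desktop", 200), ("desktop", 304), ("desktop", 304), ("desktop", 304),
    ("mobile", 200), ("mobile", 304),
]
shares = empty_cache_share_by_group(sample_rows)
```

Counting per request like this gives a different number from counting per user, because a user who visits often contributes many 304 rows but at most a few 200 rows.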
The results still surprised us: 25.5% of requests came with an empty cache. To make the data clearer, we separated PC and mobile, but the figures were similar: 24.8% of PC requests and 26.9% of mobile requests came with an empty cache. This did not match our expectations, so we dug deeper into the data.

Breaking the PC data down by browser makes things clearer:

According to the previous week's data, Chrome and Opera users are the most likely to have the image cached. You may notice that this chart contains no data for Firefox. Firefox 31 and earlier show an 80% cache rate in our statistics, but there is a significant drop in version 32 and later. That is because Firefox's new cache strategy conflicts with our measurement method (http://www.janbambas.cz/new-firefox-http-cache-enabled/), so we simply removed Firefox from the statistics.

OK, now let's look at the mobile data:

As you can see, the cache rate of most browsers is between 68% and 84%. The data on mobile platforms is noticeably different, which we attribute to low-end mobile devices (see Year class, a classification system for Android devices). Other than that, the data is similar to that on desktop platforms.

The following figure shows the proportion of users with an empty cache on desktop and mobile:

On average, 44.6% of users have an empty cache, which is consistent with the study done by the Yahoo team in 2007.

Going further:

The article is not over yet. At Facebook we iterate very quickly, releasing two versions almost every day. This made us ask how long a cache lifetime is suitable for us. To find the answer, we subtract the time in the If-Modified-Since request header from the current time. Using this method, we measured the time between the first normal (200) request and each later 304 request, which shows how long it took a user to go from no cache to a cache hit.
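The age calculation just described can be sketched as follows. It relies on the If-Modified-Since value always holding the time of the first fetch, which the article's Last-Modified echo is meant to guarantee. The function names and the nearest-rank percentile method are assumptions for illustration; the article does not say how its percentiles were computed.

```python
import math
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def cache_age_hours(if_modified_since, now):
    """Hours between the first fetch (from the header) and this request."""
    first_fetch = parsedate_to_datetime(if_modified_since)
    return (now - first_fetch).total_seconds() / 3600


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Illustrative request: the header says the image was first fetched
# 47 hours before the (made-up) current time.
now = datetime(2015, 4, 10, 12, 0, tzinfo=timezone.utc)
age = cache_age_hours("Wed, 08 Apr 2015 13:00:00 GMT", now)
```

Collecting `age` for every 304 request and taking `percentile(ages, 50)` and `percentile(ages, 75)` gives values comparable to the p50 and p75 markers in the chart.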
The following is the chart generated from the data:

The horizontal axis is time in hours, and the vertical lines p50 and p75 mark the time within which that proportion of cached requests occurred. For example, p50 tells us that 50% of cached requests occurred within 47 hours. Similarly, p75 marks the time within which 75% of cached requests occurred. The data from mobile tells us that about 50% of cached requests occurred within 12 hours.

Practical Application:

In general, our statistics are similar to those from 2007. If we exclude Firefox (version 32 and later), the peak cache rate is 84.1%, higher than the 80% seen in 2007. On the other hand, cached resources do not live very long. Based on our research, although 42% of requests will be served from cache 47 hours after a new release, the cached resource also survives on the user's machine for only about that long. This finding is meaningful for other websites too.

Why do cached resources not survive for long? It is easy to understand when you look at how the Internet has developed: websites have grown a great deal between 2007 and now. Take 2007 for example. At that time, our home network speed was about 2.5M and Yahoo's homepage was 168.1KB. Now my mobile phone has 8G downlink and Yahoo's homepage has grown to 768KB. The average web page now exceeds 1MB, which puts a lot of pressure on our browsers. (Translator's note: with so many resources to cache, once the default cache size is exceeded, some of the oldest cached files are deleted automatically. For example, the default is 50MB in IE and 320MB in Chrome.) Therefore, making proper use of the browser cache matters more than it did eight years ago.
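The translator's note above explains short cache lifetimes by eviction once a size limit is reached. The sketch below illustrates that idea with a least-recently-used policy. It is a deliberately simplified model: `SizeBoundedCache` is a hypothetical class, and real browsers use more elaborate eviction rules than this.

```python
from collections import OrderedDict


class SizeBoundedCache:
    """Toy cache that evicts least-recently-used entries when full."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._entries = OrderedDict()  # url -> size, least recent first

    def store(self, url, size_bytes):
        """Add or refresh an entry; return the URLs evicted to make room."""
        if url in self._entries:
            self.used_bytes -= self._entries.pop(url)
        self._entries[url] = size_bytes
        self.used_bytes += size_bytes
        evicted = []
        while self.used_bytes > self.capacity_bytes and self._entries:
            old_url, old_size = self._entries.popitem(last=False)
            self.used_bytes -= old_size
            evicted.append(old_url)
        return evicted

    def lookup(self, url):
        """Return True on a hit and mark the entry as recently used."""
        if url not in self._entries:
            return False
        self._entries.move_to_end(url)
        return True
```

With a fixed capacity, larger pages mean fewer resources fit at once, so an entry is pushed out sooner by unrelated browsing, which matches the short lifetimes observed above.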
Practice tells us: use external stylesheets and JS where possible, set Cache-Control and ETag in the response headers, compress data as much as possible, manage the cache by giving changed resources different URLs, and split out resources that need to be updated frequently. These optimizations apply not only to projects of Facebook's scale but to other websites as well. Although our release frequency has a negative impact on cache efficiency, that is not the focus of this article. In fact, we have already begun using the results of this research to benefit everyone who visits Facebook.
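Managing the cache with different URLs is commonly done by putting a hash of the file's content into its name, so a new release changes the URL only for files that actually changed. The sketch below shows one way to do this; the function names, the 12-character digest length, and the one-year max-age are illustrative assumptions, not values taken from Facebook.

```python
import hashlib


def fingerprinted_name(filename, content):
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:12]
    stem, dot, extension = filename.rpartition(".")
    if not dot:
        # No extension: append the hash at the end.
        return f"{filename}.{digest}"
    return f"{stem}.{digest}.{extension}"


def long_lived_headers(content):
    """Response headers for a resource whose URL changes with its content."""
    etag = hashlib.sha256(content).hexdigest()[:16]
    return {
        # One year, in seconds. Safe only because the URL is versioned.
        "Cache-Control": "public, max-age=31536000",
        "ETag": f'"{etag}"',
    }


old_name = fingerprinted_name("app.js", b"console.log(1)")
new_name = fingerprinted_name("app.js", b"console.log(2)")
```

Because `old_name` and `new_name` differ, browsers fetch the new file immediately after a release while unchanged files keep being served from cache.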