What happens when you open a web page?


You type a URL into the browser or click a link, and the page opens. It is one of the most ordinary things we do online, but behind that simple surface lies a remarkably complex chain of technical processes. Want to know more? Read on.

The process of an HTTP request

To keep things simple, let's start with a single HTTP request and briefly walk through how it travels over the network, that is, "what happens between entering the URL and the page finishing its download."

● DNS Lookup: resolve the hostname in the URL to an IP address.

● Socket Connect: the browser and the server establish a TCP connection.

● Send Request: the browser sends the HTTP request.

● Content Download: the server sends the response and the browser downloads it.
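The four steps above can be sketched with plain sockets. This is a minimal illustration, not how a real browser is implemented: the function name `timed_http_get` is made up for this example, it only handles unencrypted HTTP, and it assumes the server closes the connection after responding.

```python
import socket
import time
from urllib.parse import urlsplit


def timed_http_get(url: str, timeout: float = 5.0) -> dict:
    """Fetch `url` over plain HTTP and time each of the four phases (seconds)."""
    parts = urlsplit(url)
    host, port = parts.hostname, parts.port or 80
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    t0 = time.perf_counter()
    # 1. DNS Lookup: hostname -> IPv4 address
    info = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    ip = info[0][4][0]
    t1 = time.perf_counter()

    # 2. Socket Connect: TCP three-way handshake
    sock = socket.create_connection((ip, port), timeout=timeout)
    t2 = time.perf_counter()

    try:
        # 3. Send Request
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {parts.netloc}\r\n"
            "Connection: close\r\n\r\n"
        )
        sock.sendall(request.encode("ascii"))
        t3 = time.perf_counter()

        # 4. Content Download: read until the server closes the connection
        received = 0
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            received += len(chunk)
        t4 = time.perf_counter()
    finally:
        sock.close()

    return {
        "dns_lookup": t1 - t0,
        "socket_connect": t2 - t1,
        "send_request": t3 - t2,
        "content_download": t4 - t3,
        "bytes": received,
    }
```

Calling `timed_http_get("http://example.com/")` returns one timing per phase, which makes it easy to see where the time goes on a given network.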

Going all the way down to the physical layer here would be going too far, so we will stop at this level. If you agree with these steps, let's talk about the performance problems each one involves.

● Recall how a DNS query works. A DNS lookup may have to talk to several DNS servers in turn just to obtain one IP address. How long does that take? And remember that when the lookup finishes, you still have not exchanged a single byte with the target server.

● Establishing a TCP connection requires a three-way handshake. If the server is far away, how long does that handshake take? And remember that once the connection is established, you still have not sent the request. (By this point, roughly 0.5 seconds have often already gone by.)

● When sending the HTTP request, keep in mind that upstream and downstream bandwidth are usually different, and upstream is usually the smaller of the two. A single request is no problem, but pages today typically fire off many resource requests in quick succession. What happens when a narrow upstream link gets congested? And remember that this is only the third step: the server has not sent a response yet, and the browser has nothing to draw.

● Finally the server sends the response. Unfortunately, the server you are visiting is very busy: tens of thousands of people want the same resource, and the server's upstream bandwidth is limited too. What then?

I think these would make pretty good interview questions. By the way, the latency of the first two steps depends mostly on round-trip time and is not greatly affected by network bandwidth. Adding bandwidth can ease the last two steps to some extent, but it costs money, and a lot of it.
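A back-of-the-envelope model makes the point concrete. It is a deliberate simplification with made-up numbers: it assumes one round trip for DNS, one for the TCP handshake, and treats the last two steps as pure size divided by bandwidth, ignoring congestion and server load.

```python
def estimate_phases(rtt_s, request_bytes, response_bytes, up_bps, down_bps):
    """Very rough per-phase time estimate in seconds (illustrative model only)."""
    return {
        "dns_lookup": rtt_s,       # assume one round trip to a resolver
        "socket_connect": rtt_s,   # SYN and SYN-ACK before data can be sent
        "send_request": request_bytes * 8 / up_bps,
        "content_download": response_bytes * 8 / down_bps,
    }


# Hypothetical link: 100 ms round trip, 1 Mbit/s up, 8 Mbit/s down.
slow = estimate_phases(0.1, 2_000, 500_000, 1_000_000, 8_000_000)
# Same link with the bandwidth doubled in both directions.
fast = estimate_phases(0.1, 2_000, 500_000, 2_000_000, 16_000_000)

for phase in slow:
    print(f"{phase:18s} {slow[phase]:.3f}s -> {fast[phase]:.3f}s")
```

In this model, doubling the bandwidth halves the last two rows and leaves the first two untouched. Only a shorter round trip, or skipping the round trip through caching and prefetching, helps there.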

Although the blogger has worked on optimizing WebKit's local rendering, he knows that most of the time spent loading a web page goes to network communication, so optimizing these steps takes less effort and pays off more than optimizing the browser engine.

The main network optimization techniques are caching, prefetching, compression, and parallelism. If you are asked about performance optimization in a future interview, you can frame your answer around these four.

The sections below introduce the existing optimizations stage by stage.

DNS Optimization

For DNS, caching is without doubt the simplest, bluntest, and most effective optimization. And whenever caching comes up, we have to talk about the cache hierarchy:

● Browser DNS cache

● System DNS cache

● Hosts file

● Cache on each DNS server

Of course, DNS cache entries usually expire quickly, so in many cases the lookup has to be repeated. To reduce the delay the user actually perceives (note that this is not the same as network latency), prefetching is a good approach.
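The expiry behavior can be sketched as a small cache with a time-to-live. This is an illustration under assumptions: the class name `TtlDnsCache` is invented here, and a fixed TTL is used because `socket.gethostbyname` does not expose the TTL carried by the real DNS record.

```python
import socket
import time


class TtlDnsCache:
    """Cache hostname -> IP for a fixed number of seconds."""

    def __init__(self, resolver=socket.gethostbyname, ttl_s=60.0, clock=time.monotonic):
        self._resolver = resolver
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries = {}  # host -> (ip, expires_at)
        self.misses = 0     # how many times the real resolver was called

    def resolve(self, host):
        now = self._clock()
        entry = self._entries.get(host)
        if entry is not None and entry[1] > now:
            return entry[0]  # still fresh: no network traffic at all
        # Missing or expired: do the real lookup and remember the answer.
        self.misses += 1
        ip = self._resolver(host)
        self._entries[host] = (ip, now + self._ttl_s)
        return ip
```

A second `resolve()` call for the same host within the TTL returns immediately. Once the entry expires, the next call pays the full lookup cost again, which is exactly the gap that prefetching tries to hide.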

For example, as you type a URL, the browser may judge from your history that you are likely to visit a particular site and resolve its DNS ahead of time. When you type "w", Chrome may already have looked up the IP address of weibo.com for you. Chrome users can open chrome://predictors to see this.

In addition, the browser records your browsing history and learns which other domains the pages under each domain usually link to, building up a picture of the site's topology. When you visit a page under that domain, it resolves the DNS of those linked domains in advance.
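The typing-based prediction can be sketched as a frequency table over past visits. This is not Chrome's actual algorithm, which is not described here. `HostPredictor` and `prefetch_dns` are hypothetical names, and "most visited host with this prefix" is an assumed heuristic.

```python
from collections import Counter


class HostPredictor:
    """Guess the host a user is typing from how often they visited it before."""

    def __init__(self, min_visits=2):
        self._visits = Counter()
        self._min_visits = min_visits

    def record_visit(self, host):
        self._visits[host] += 1

    def predict(self, typed):
        """Return the most-visited known host starting with `typed`, or None."""
        best_host, best_count = None, 0
        for host, count in self._visits.items():
            if (host.startswith(typed)
                    and count >= self._min_visits
                    and count > best_count):
                best_host, best_count = host, count
        return best_host


def prefetch_dns(predictor, typed, resolve):
    """Resolve the predicted host early; `resolve` is any hostname -> IP function."""
    host = predictor.predict(typed)
    if host is not None:
        resolve(host)  # result is discarded; the point is to warm the DNS caches
    return host
```

With a history dominated by weibo.com, `prefetch_dns(predictor, "w", socket.gethostbyname)` would resolve weibo.com as soon as the first letter is typed. The `min_visits` threshold is there so that a single stray visit does not trigger wasted lookups.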

TCP Optimization

Given how involved DNS optimization turned out to be, you can see that even this "simple" step is not so simple.

TCP optimization, by contrast, really is simple: since DNS has already resolved the IP address ahead of time, the browser just continues along the same path and establishes the connection early as well.

So as soon as you type the first letter, the browser resolves the DNS and sets up the TCP connection, possibly before you have finished typing the URL. Likewise, when you first open a site, the browser quickly sets up TCP connections to the other servers that site is likely to need.
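Preconnecting can be sketched as a pool of sockets that are opened early and handed out later. `PreconnectPool` is a hypothetical name for this example. A real browser also has to deal with idle timeouts, TLS, and connection limits, all of which are left out.

```python
import socket


class PreconnectPool:
    """Open TCP connections ahead of time and hand them out on demand."""

    def __init__(self):
        self._idle = {}  # (host, port) -> already-connected socket

    def preconnect(self, host, port, timeout=5.0):
        """Pay for the three-way handshake now, before any request exists."""
        key = (host, port)
        if key not in self._idle:
            self._idle[key] = socket.create_connection(key, timeout=timeout)

    def take(self, host, port, timeout=5.0):
        """Return a connected socket, reusing a preconnected one if available."""
        sock = self._idle.pop((host, port), None)
        if sock is None:
            # Nothing was preconnected: the caller waits for the handshake now.
            sock = socket.create_connection((host, port), timeout=timeout)
        return sock
```

If `preconnect()` runs while the user is still typing, the later `take()` returns immediately and the request can be sent without waiting for the handshake.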

HTTP transport optimization

At this point, some people may ask: since the TCP connection is already established, why not go one step further and prefetch the content of every link directly, so that the page is loaded before the URL has even been typed?

It is a nice idea, but reality is harsh, because bandwidth is limited. DNS lookups and TCP connections are lightweight and use very little bandwidth, but HTTP transfers are different. If you prefetched every link, your bandwidth would fill up quickly, your real requests would be starved, and performance would degrade badly.

So caching appears once again, and with caching comes the hierarchy:

● PageCache: the fastest layer. It caches the DOM structure and rendering results of pages you have already visited directly in memory, which is why going back and forward is so fast.

● HTTP Cache: a file-level cache stored on the local file system and implemented according to RFC 2616.

● Proxy Cache: if you access the Internet through a proxy server, the proxy usually follows the same caching standard.

● CDN: a content server that is geographically close to you. For example, if you are in Beijing and request an image from Taobao, whose servers are in Hangzhou, and a CDN node in Beijing already has that image, the request never has to travel to Hangzhou.

● DMOC (distributed memory object caching system): a CDN mainly stores static data, but pages usually contain a lot of dynamic data that has to be looked up in a database, and under heavy traffic the load becomes very high. So there is usually a layer of in-memory cache servers in front of the server, dedicated to caching these database objects. According to "Taobao Technology in the Past 10 Years", this can eliminate 99.5% of database accesses.

● Server: in the end, few requests actually reach the server itself.

Can you think of anywhere else to add another cache layer? One option is between layers 2 and 3 in the list above, that is, a cache on the router.
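The layered lookup, including an extra router layer, can be sketched as a chain of dictionaries tried from nearest to farthest. This is a toy model with invented names (`CacheLayer`, `fetch`). It ignores freshness rules, and it lets every layer store every object, which real layers do not do.

```python
class CacheLayer:
    """One level of the hierarchy, modeled as a plain dictionary."""

    def __init__(self, name):
        self.name = name
        self.store = {}


def fetch(url, layers, origin):
    """Try each layer in order; on a miss everywhere, ask the origin server.

    Returns (value, name_of_the_layer_that_answered).
    """
    for index, layer in enumerate(layers):
        if url in layer.store:
            value = layer.store[url]
            # Copy the object into the nearer layers so the next hit is faster.
            for nearer in layers[:index]:
                nearer.store[url] = value
            return value, layer.name
    value = origin(url)
    for layer in layers:
        layer.store[url] = value
    return value, "Server"


layers = [
    CacheLayer("PageCache"),
    CacheLayer("HTTP Cache"),
    CacheLayer("Router Cache"),  # the extra layer suggested in the text
    CacheLayer("Proxy Cache"),
    CacheLayer("CDN"),
]
```

The first `fetch()` for a URL falls through to the server, and every later one is answered by the nearest layer that holds a copy.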

The prefetching engine that Xiaomi built for its routers together with Sogou is essentially this: a cache layer on the router plus intelligent prefetching. Why devote a separate paragraph to Xiaomi? Is the blogger a paid Xiaomi shill? No. It is because the blogger's heart sank when he saw the news: it collided head-on with his graduation project.

Last year, when 360 first launched its portable Wi-Fi, the blogger came up with this idea. He planned to build it, start a business around it, and talk to 360 about cooperating. He only recently finished it and submitted the thesis, dreaming of reaching the peak of his life and disrupting the industry, only to discover that Xiaomi and Sogou had already launched the same product and commercialized it. The promised peak was gone. Had he known, he would have applied for a patent last year.

Another common HTTP optimization is compression. Network transfer time = message size / network speed. Since network speed is expensive, it is better to shrink the message. Most servers gzip-compress HTTP message bodies, which you can see in the HTTP headers (Content-Encoding: gzip). I won't go into details.
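The formula is easy to try with Python's standard `gzip` module. The HTML body and the 1 Mbit/s link below are made-up values for illustration, and the sample is highly repetitive, so the ratio is better than a typical page would achieve.

```python
import gzip


def transfer_time_s(size_bytes, bandwidth_bps):
    """Transfer time = message size / network speed (ignores latency)."""
    return size_bytes * 8 / bandwidth_bps


# A repetitive HTML fragment standing in for a real response body.
body = ("<li class='item'>Hello, web performance!</li>\n" * 500).encode("utf-8")
compressed = gzip.compress(body)

link_bps = 1_000_000  # hypothetical 1 Mbit/s link
print(f"raw:  {len(body)} bytes, {transfer_time_s(len(body), link_bps):.4f}s")
print(f"gzip: {len(compressed)} bytes, {transfer_time_s(len(compressed), link_bps):.4f}s")
```

The receiver restores the original bytes with `gzip.decompress(compressed)`, so the saving costs only some CPU time on each side.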

Future Protocol: SPDY

Everything above is traditional practice. What follows is a technology of the future. The HTTP protocol was designed in the last century and no longer fits the way the Web is developing, so Google proposed the SPDY protocol, which currently serves as the basis for the HTTP/2.0 standard being drafted.

SPDY has the following main features:

● Multiplexing: multiple HTTP requests can run in parallel over a single TCP connection, reducing the time spent establishing connections.

● Request prioritization (no concrete implementation has been seen yet).

● HTTP header compression: the HTTP compression mentioned above compresses the body but not the headers. For small HTTP messages the headers still make up a large share, and today's Web contains a great many small messages.

● Server push/hint: the server actively pushes objects to the client (you can think of it as the server prefetching on the client's behalf).
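Header compression pays off because consecutive requests on one connection carry almost identical headers. SPDY compressed header blocks with zlib using one compression context per connection. The sketch below imitates only that idea with Python's `zlib`: it is not the SPDY wire format, it omits SPDY's preset dictionary, and the header values are invented.

```python
import zlib


def compress_header_blocks(blocks):
    """Compress each header block with ONE shared zlib context.

    Returns the compressed size of each block, in order.
    """
    compressor = zlib.compressobj()
    sizes = []
    for block in blocks:
        # Z_SYNC_FLUSH emits everything so far but keeps the history window,
        # so later blocks can refer back to earlier ones.
        data = compressor.compress(block) + compressor.flush(zlib.Z_SYNC_FLUSH)
        sizes.append(len(data))
    return sizes


def make_headers(path):
    return (
        f"GET {path} HTTP/1.1\r\n"
        "Host: www.example.com\r\n"
        "User-Agent: ExampleBrowser/1.0 (illustrative value)\r\n"
        "Accept: text/html,application/xhtml+xml,image/webp,*/*\r\n"
        "Accept-Encoding: gzip, deflate\r\n"
        "Cookie: session=0123456789abcdef0123456789abcdef\r\n\r\n"
    ).encode("ascii")


blocks = [make_headers("/index.html"), make_headers("/style.css")]
sizes = compress_header_blocks(blocks)
print([len(b) for b in blocks], "->", sizes)
```

The second block compresses to far fewer bytes than the first, because nearly everything in it is a back-reference to headers that were already sent.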

The industry is divided on SPDY, and the blogger is cautious too, mainly about features 1 and 4. Feature 4 runs into the same conflict as the direct HTTP prefetching discussed earlier: what if the pushed data is not needed and just eats bandwidth? Implementing hints well is also difficult.

The risk with feature 1 is that if the single TCP connection drops midway, every request multiplexed on it stops. That may be rare on desktop networks, but dropped TCP connections are still fairly common on mobile networks.

Still, as a technology of the future, it is worth keeping an eye on.
