What is a website log? How does a website log help SEO?

A website log is a file with the ".log" extension that records raw information about the web server: the requests it receives and processes, runtime errors, and so on. Strictly speaking, it is a server log. Its greatest value is that it records how the site actually operates, such as how the hosting space behaves and every access request it handles. From the website log you can see exactly which page of your site a user visited, from which IP, at what time, with which operating system, browser, and screen resolution, and whether the visit succeeded.

In other words, a website log records how the server hosting the site handled every request it received from users. Whether a request was processed normally or produced an error, it is written to the website log, a file that uses .log as its extension.

By analyzing website log files, we can see the behavior of both users and search engine spiders on the site. This data lets us gauge how users and spiders take to the site and how healthy the site is. In website log analysis, spider behavior is what we mainly need to examine.
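For readers who want to work with the raw log directly, here is a minimal sketch in Python, assuming the common Apache/Nginx "combined" log format (the sample line and bot names are purely illustrative). It parses one entry and tells a spider hit apart from a normal user visit:

```python
import re

# One line in the common Apache/Nginx "combined" log format (a made-up example).
SAMPLE = ('1.2.3.4 - - [10/Oct/2023:13:55:36 +0800] "GET /news/123.html HTTP/1.1" '
          '200 5120 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; '
          '+http://www.baidu.com/search/spider.html)"')

# Regex for the combined format: IP, identity, user, time, request, status, bytes, referer, user-agent.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of log fields, or None if the line does not match the format."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

record = parse_line(SAMPLE)
print(record["ip"], record["status"], record["request"])

# A simple user-agent check separates user visits from search engine spider visits.
is_spider = any(bot in record["agent"] for bot in ("Baiduspider", "Googlebot", "bingbot"))
print("spider visit" if is_spider else "user visit")
```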

When crawling and indexing, a search engine allocates a certain amount of crawl resources to a site according to its weight. A search-engine-friendly site should make full use of these resources so that spiders can quickly, accurately, and comprehensively crawl the valuable content users want, rather than waste them on worthless pages or pages with access problems.

Website log data analysis and interpretation:

1. Number of visits, dwell time, and crawl volume

From these three figures we can derive the average number of pages crawled per visit, the dwell time per crawled page, and the average dwell time per visit:

Average pages crawled per visit = total crawl volume / number of visits

Dwell time per crawled page = total dwell time / total crawl volume (i.e. dwell time per visit / pages crawled per visit)

Average dwell time per visit = total dwell time / number of visits
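As a quick illustration, this small sketch computes the three averages from a day's totals (the numbers are made up, not real data):

```python
# Daily spider totals pulled from the log (illustrative values only).
total_crawl_volume = 1800   # pages fetched by the spider that day
number_of_visits = 120      # distinct spider visits
total_dwell_time = 5400     # seconds the spider spent on the site in total

pages_per_visit = total_crawl_volume / number_of_visits   # average pages crawled per visit
dwell_per_visit = total_dwell_time / number_of_visits     # average dwell time per visit (s)
dwell_per_page = total_dwell_time / total_crawl_volume    # dwell time per crawled page (s)

print(f"pages/visit: {pages_per_visit:.1f}, "
      f"dwell/visit: {dwell_per_visit:.1f}s, dwell/page: {dwell_per_page:.2f}s")
```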

These figures show the spider's activity level, its affinity for the site, its crawl depth, and so on. The higher the total number of visits, the total dwell time, the crawl volume, the average pages crawled per visit, and the average dwell time per visit, the more the search engine favors the site. The dwell time per crawled page, on the other hand, reflects page load speed: the longer it is, the slower the site responds, which hurts crawling and indexing. We should do our best to speed up page loading and reduce the per-page dwell time, so the same crawl resources can fetch and index more pages.

In addition, from these figures we can chart the site's overall trends over a period of time, such as the spider visit trend, the dwell time trend, and the crawl volume trend.
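One simple way to build such a trend is to count the spider's crawl volume per day and watch how it moves across the weeks. A minimal sketch, using hypothetical parsed records:

```python
from collections import Counter

# Hypothetical (date, crawled URL) pairs extracted from the log for one spider.
spider_hits = [
    ("2023-10-08", "/news/1.html"), ("2023-10-08", "/news/2.html"),
    ("2023-10-09", "/news/3.html"), ("2023-10-10", "/news/1.html"),
]

# Daily crawl volume; plotting these counts over weeks shows the crawling trend.
daily_crawl = Counter(date for date, _ in spider_hits)
for date in sorted(daily_crawl):
    print(date, daily_crawl[date])
```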

2. Directory crawling statistics

Through log analysis we can see which directories of the site the spiders favor, how deep they crawl into each directory, how well important page directories are being crawled, and how much crawling is wasted on invalid page directories. By comparing crawling against indexing for the pages in each directory, we can uncover further problems. Important directories should be given more weight and more crawling through on-site and off-site adjustments; invalid pages should be blocked in robots.txt.
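One way to get directory-level crawl statistics is to group the spider's crawled URLs by their top-level directory. A minimal sketch with hypothetical URLs:

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical URLs taken from spider hits in the log.
crawled_urls = ["/news/1.html", "/news/2.html", "/product/a.html", "/tag/old?page=3"]

def top_directory(url):
    """Return the first path segment, e.g. '/news/' for '/news/1.html'."""
    path = urlsplit(url).path
    parts = [p for p in path.split("/") if p]
    return "/" + parts[0] + "/" if parts else "/"

dir_counts = Counter(top_directory(u) for u in crawled_urls)
print(dir_counts.most_common())   # which directories the spider favors
```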

In addition, by collecting log statistics over many days, we can see how on-site and off-site actions affect each directory, whether the optimization is reasonable, and whether it achieved the expected results. For a given directory, tracking it over a long period shows how its pages perform, and the behavior data lets us infer why.

3. Page crawling

In the website log we can see exactly which pages the spider has crawled. Among them, we can identify pages that should be blocked from crawling, pages with no indexing value that were crawled anyway, duplicate URLs that were crawled, and so on. To make full use of crawl resources, these addresses should be disallowed in robots.txt.
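To spot duplicate URLs that waste crawl resources, one approach is to group crawled URLs by path and flag any path fetched with several different query strings. The sketch below uses hypothetical URLs, and the robots.txt rule in the comment is only an example you would adapt to your own URL scheme:

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical crawled URLs from the log: the same path fetched with different
# query strings usually means the spider is spending resources on duplicates.
crawled_urls = ["/list.html?sort=price", "/list.html?sort=date", "/list.html", "/about.html"]

by_path = defaultdict(set)
for u in crawled_urls:
    parts = urlsplit(u)
    by_path[parts.path].add(parts.query)

for path, queries in by_path.items():
    if len(queries) > 1:
        # Candidate for a robots.txt rule such as "Disallow: /*?sort=" (adjust to your site).
        print("duplicate variants crawled:", path, queries)
```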

We can also analyze why certain pages are not indexed. For new articles, is it because they were never crawled, or because they were crawled but not yet released? Some pages offer little value to readers, yet we may still need them as crawl paths; for such pages, should we add a noindex tag? Then again, would spiders really be so clumsy as to depend on these meaningless channel pages to reach other pages? Don't spiders understand sitemaps? I still have doubts about this.

4. Spider access IP

Some people have suggested using the spider's IP segment to judge whether a site has been demoted. I do not find this very meaningful, because it is pure hindsight, and a demotion should be judged mainly from the first three kinds of data above; a single IP segment tells you little. IP analysis is more useful for spotting scraping spiders, fake spiders, malicious click bots, and the like.
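A common way to separate genuine spiders from fake ones is the reverse-then-forward DNS check (Google, for example, documents this method for verifying Googlebot). The sketch below is a rough illustration only; the example IP is illustrative and the result depends on live DNS:

```python
import socket

def looks_like_real_googlebot(ip):
    """Reverse-resolve the IP, check the host name belongs to Google's crawler
    domains, then resolve that name forward and confirm it maps back to the IP."""
    try:
        host = socket.gethostbyaddr(ip)[0]
    except OSError:
        return False
    if not (host.endswith(".googlebot.com") or host.endswith(".google.com")):
        return False
    try:
        return ip in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False

# Usage sketch: run this for log entries whose user-agent claims to be Googlebot.
print(looks_like_real_googlebot("66.249.66.1"))  # example IP; outcome depends on DNS
```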

5. Access status code

Spider requests frequently return status codes such as 301 and 404. These must be dealt with promptly so they do not harm the site.
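A quick way to keep an eye on these codes is to tally the status codes of spider hits and pull out the 404 URLs. A minimal sketch with hypothetical data:

```python
from collections import Counter

# Hypothetical (url, status) pairs for spider hits extracted from the log.
spider_hits = [("/a.html", "200"), ("/old.html", "404"), ("/moved.html", "301"), ("/b.html", "200")]

status_counts = Counter(status for _, status in spider_hits)
print(status_counts)   # e.g. Counter({'200': 2, '404': 1, '301': 1})

# List the 404 URLs so they can be fixed or redirected promptly.
print([url for url, status in spider_hits if status == "404"])
```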

6. Crawl time period

By analyzing and comparing each spider's crawl volume hour by hour across days, we can see when a particular spider is most active on the site; comparing data across weeks shows its activity cycle over the week. Knowing this gives some guidance on when to publish site updates, and the old folklore about fixed "minor update" days and times is not scientific.
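To find a spider's active hours, you can bucket its fetches by hour of day. The sketch below parses a few hypothetical timestamps in the combined-log time format:

```python
from collections import Counter
from datetime import datetime

# Hypothetical spider fetch timestamps in the combined-log time format.
times = ["10/Oct/2023:02:14:01 +0800", "10/Oct/2023:02:47:33 +0800", "10/Oct/2023:14:05:12 +0800"]

hours = Counter(datetime.strptime(t, "%d/%b/%Y:%H:%M:%S %z").hour for t in times)
for hour in sorted(hours):
    print(f"{hour:02d}:00  {hours[hour]} fetches")   # the busiest hours are the spider's active period
```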
