A website log is a file, normally with the ".log" extension, that records raw information about the web server: the requests it receives and processes, and any runtime errors. Strictly speaking, it is a server log. Its greatest value is that it records how the website operates, from the running of the hosting space to every access request. Through the website log you can clearly see which page of your site a visitor requested, from which IP, at what time, with which operating system, browser, and screen resolution, and whether the request was handled successfully.

In other words, the website log is a record of how the server handles every request it receives, whether the request is processed normally or ends in an error. By analyzing the log files we can see the behavior of both users and search engine spiders on the site, and from that data judge their preferences and the overall health of the website. In log analysis, what we mainly need to study is spider behavior. During crawling and indexing, a search engine allocates a corresponding amount of crawl resources to a site according to its weight. A search-engine-friendly website should make full use of those resources, so that spiders can quickly, accurately, and comprehensively crawl the valuable content users want, without wasting resources on useless content or pages with abnormal access.

Website log data analysis and interpretation:

1. Number of visits, dwell time, and crawl volume

From these three figures we can derive the average number of pages crawled per visit, the dwell time per crawled page, and the average dwell time per visit:

Average pages crawled per visit = total crawl volume / number of visits
Dwell time per crawled page = dwell time per visit / pages crawled per visit
Average dwell time per visit = total dwell time / number of visits

These numbers reflect the spider's activity, its affinity for the site, and its crawl depth. The higher the total visits, dwell time, crawl volume, average pages crawled, and average dwell time, the more popular the website is with search engines. The dwell time per crawled page indicates page response speed: the longer it is, the slower the site, which hurts crawling and inclusion. We should do our best to speed up page loading and reduce per-page dwell time, so that the same crawl resources can fetch and index more pages. From the same data we can also plot the site's overall trends over a period of time, such as the spider visit trend, dwell time trend, and crawl volume trend.

2. Directory crawl statistics

Log analysis shows which directories on the site the spiders favor, how deep they crawl, how well important page directories are crawled, and how much effort is spent on invalid page directories. By comparing crawling with actual inclusion for the pages in each directory, we can uncover further problems. For important directories, we should raise their weight and crawl rate through on-site and off-site adjustments; invalid pages we block in robots.txt. Statistics gathered over multiple days also show the effect of on-site and off-site actions on each directory, whether the optimization is reasonable, and whether it has achieved the expected results. For the same directory observed over a long period, we can see how its pages perform and infer the reasons from the spiders' behavior.
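All of the figures in points 1 and 2 come straight out of the raw access log. As a rough illustration of how they can be extracted, here is a minimal Python sketch; it assumes the common Apache/Nginx combined log format and a hypothetical file name access.log, identifies spiders by a few User-Agent keywords, and simply counts spider requests in total and per directory. It is a sketch of the idea, not a finished analysis tool.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Typical Apache/Nginx "combined" log line (assumed format):
# 1.2.3.4 - - [10/Oct/2023:13:55:36 +0800] "GET /news/123.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; ...)"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) "[^"]*" "(?P<agent>[^"]*)"'
)

SPIDER_KEYWORDS = ("Baiduspider", "Googlebot", "bingbot", "Sogou")

def spider_stats(log_file="access.log"):
    """Count spider requests overall and per directory (crawl volume)."""
    total = 0
    per_dir = Counter()
    with open(log_file, encoding="utf-8", errors="replace") as f:
        for line in f:
            m = LOG_RE.match(line)
            if not m:
                continue
            if not any(k in m.group("agent") for k in SPIDER_KEYWORDS):
                continue  # keep only search engine spiders
            total += 1
            path = urlparse(m.group("path")).path
            # "/news/123.html" -> "/news/", "/index.html" -> "/"
            directory = path.rsplit("/", 1)[0] + "/"
            per_dir[directory] += 1
    return total, per_dir

if __name__ == "__main__":
    total, per_dir = spider_stats()
    print("total spider crawl volume:", total)
    for directory, count in per_dir.most_common(10):
        print(directory, count)
```

Dwell time and the per-visit averages above would additionally require grouping these requests into visits, for example by gaps between timestamps from the same spider, which is left out here for brevity.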
3. Page crawling

In the log we can see the specific pages each spider fetched. From these we can identify pages that should be blocked from crawling, pages with no value worth indexing, duplicate URLs that are being crawled repeatedly, and so on. To make full use of crawl resources, these addresses should be disallowed in robots.txt. The log also helps explain why certain pages are not included: for a new article, was it never crawled at all, or was it crawled but not yet released? Some pages have little reading value but may still be needed as crawl channels; for those, should we add a noindex tag? On the other hand, would a spider really be so clumsy that it has to rely on such meaningless channel pages to discover content? Doesn't it understand sitemaps? I still have doubts about this.

4. Spider access IP

Someone once suggested using the spider's IP segment to judge whether a site has been demoted. I do not find this very meaningful, because it is pure hindsight; a demotion is better judged from the first three data items above, and a single IP segment tells you little. IP analysis is more useful for detecting scraper spiders, fake spiders, malicious click bots, and the like.

5. Access status codes

Spiders frequently encounter status codes such as 301 and 404 in the log. These must be dealt with promptly to avoid harming the site.

6. Crawl time period

By comparing each spider's daily crawl volume we can find the hours in which a given spider is most active on this site, and weekly comparisons reveal its activity cycle across the week. Knowing this gives some guidance on when to publish and update content; the earlier folklore about fixed "small update" days (the so-called "small three, small four") is not scientific.
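To tie points 4 to 6 back to the log as well, here is a second sketch in the same spirit as the one above: it tallies the status codes a given spider hits and its crawl volume per hour, and includes a reverse-DNS helper for checking whether an IP claiming to be Baiduspider is genuine. The file name access.log and the User-Agent keyword are again assumptions, and the accepted host suffixes (baidu.com / baidu.jp) follow Baidu's published guidance but should be verified before relying on them.

```python
import re
import socket
from collections import Counter

# Same combined-log assumptions as the sketch above; "access.log" is hypothetical.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^:]+:(?P<hour>\d{2})[^\]]*\] '
    r'"[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def is_real_baiduspider(ip):
    """Reverse-DNS check: genuine Baiduspider hosts resolve under baidu.com/baidu.jp."""
    try:
        host = socket.gethostbyaddr(ip)[0]
    except OSError:
        return False
    return host.endswith(".baidu.com") or host.endswith(".baidu.jp")

def status_and_hours(log_file="access.log", spider="Baiduspider"):
    """Tally status codes and hourly crawl volume for one spider."""
    status_counts, hourly_counts = Counter(), Counter()
    with open(log_file, encoding="utf-8", errors="replace") as f:
        for line in f:
            m = LOG_RE.match(line)
            if not m or spider not in m.group("agent"):
                continue
            status_counts[m.group("status")] += 1
            hourly_counts[m.group("hour")] += 1
    return status_counts, hourly_counts

if __name__ == "__main__":
    statuses, hours = status_and_hours()
    print("status codes:", dict(statuses))   # e.g. many 404s point to dead links to fix
    for hour in sorted(hours):               # shows the spider's active periods by hour
        print(hour, hours[hour])
```

In practice you would run the reverse-DNS check only on suspicious IPs pulled from the log, since resolving every address is slow.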