Autohome Page Performance Monitoring Construction Practice


1. Introduction

Paying attention to user experience and improving page performance is part of every front-end developer's daily work. Although it is hard to quantify exactly how much performance improvements help the business, the benefits clearly outweigh the costs. How do we measure the performance of a page? How do we help developers quickly locate its performance bottlenecks? These have always been key questions for front-end teams. This article shares some of Autohome's work on building page performance monitoring, covering three aspects:

Technology Selection

  • Which page performance monitoring technology solutions should we choose?
  • Which metrics should be collected so that page performance can be measured objectively and comprehensively, helping R&D colleagues quickly locate performance bottlenecks while minimizing the impact on the page itself?
  • How should the performance of non-homepage SPA pages be evaluated?
  • How do we collect and report as much data as possible without affecting page performance, while ensuring the accuracy of the collected data? When should metrics be collected, and which reporting method should be used?

Overall architecture design

Integrate the selected technical solutions, build a systematic performance monitoring architecture, provide a performance monitoring and analysis tool chain, and support product and R&D colleagues in discovering and locating page performance issues at every stage of DevOps.

Establishing an Evaluation System

Only with data can we measure, and only with scores can we drive improvement.

Using the many metrics collected, we set different baselines and weights according to application characteristics and the importance of each metric, and obtain the application score by weighted average. The score tells R&D colleagues at a glance whether the application's pages are fast or slow, whether its performance is high or low, and whether it needs to be improved.

An application score only reflects the performance of a single application and is mainly used by R&D colleagues. A company has multiple departments, each department has multiple teams, and each team owns multiple applications. We also need performance scores at the company, department and team levels, so that leaders at each level can intuitively understand the page performance of their teams and superiors can compare the performance of subordinate teams. We therefore again use the weighted average algorithm, based on application PV and application level, to obtain team, department and company performance scores.

2. Technology Selection

Based on the operating environment when monitoring page performance, we divide the technical solutions into two types: Synthetic Monitoring (SYN) and Real User Monitoring (RUM).

Synthetic Monitoring (SYN)

Synthetic monitoring refers to running a page in a simulated environment to evaluate its performance. Early representative tools include the well-known YSlow and PageSpeed. As the technology matured, the three most mature SYN tools became Lighthouse, WebPageTest and SiteSpeed. Although Lighthouse only supports Chrome and has a relatively high implementation cost, it has many advantages: Google's backing, easy extensibility, rich metrics and built-in scoring. It has gradually replaced WebPageTest as the preferred SYN tool. The following uses Lighthouse as an example to introduce the operation process, advantages and disadvantages of SYN.

Operation process

From the results page, in addition to outputting key performance metric values and scores, Lighthouse also provides optimization suggestions and diagnostic results. Lighthouse 10.1.0 ships with 94 performance rules and 16 best-practice rules, including some that receive little attention in daily development but are quite meaningful, such as minimizing main-thread work (mainthread-work-breakdown), pages that are blocked from the back/forward cache (bf-cache), and reducing unused JavaScript (unused-javascript).

We recommend running Lighthouse via the Node CLI or Node module and outputting results in both HTML and JSON formats. The JSON data is more comprehensive, including details such as largest-contentful-paint-element and long-tasks.
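As a minimal sketch (assuming the lighthouse and chrome-launcher npm packages are installed), a Node script that runs an audit and writes both report formats might look like this:

```typescript
import fs from 'node:fs';
import lighthouse from 'lighthouse';
import * as chromeLauncher from 'chrome-launcher';

async function runAudit(url: string) {
  // Launch a headless Chrome instance for Lighthouse to drive.
  const chrome = await chromeLauncher.launch({ chromeFlags: ['--headless'] });
  try {
    const result = await lighthouse(url, {
      port: chrome.port,
      output: ['html', 'json'],                      // produce both report formats
      onlyCategories: ['performance', 'best-practices'],
    });
    if (!result) throw new Error('Lighthouse returned no result');

    const [html, json] = result.report as string[];
    fs.writeFileSync('report.html', html);
    fs.writeFileSync('report.json', json);

    // result.lhr holds the parsed Lighthouse result, e.g. the performance score (0–1).
    console.log('performance score:', result.lhr.categories.performance.score);
  } finally {
    await chrome.kill();
  }
}

runAudit('https://www.autohome.com.cn/');
```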

Pros and Cons

According to our practice, Lighthouse has the following advantages and disadvantages:

Improvements

In response to the above shortcomings and product needs, we have made some improvements:

To address the lack of a default benchmark environment, where the same page produces different results on different user terminals because of differences in runtime environment and hardware resources, we made two improvements:

First, provide a SYN benchmark operating environment. We built a self-developed Web version of the SYN service on top of the Lighthouse Node module and deployed it in containers. A queue strategy on the Node server ensures that only one SYN task runs in a single container at any time, and every container has the same hardware resources (4 cores + 4 GB) and network speed configuration (mobile applications use a unified 10 Mbps throttle), so the run results and final scores are relatively fair and reliable.
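A minimal sketch of such a queue (hypothetical names, not Autohome's actual implementation): serialize incoming audit jobs with a promise chain so that at most one Lighthouse run is active per container.

```typescript
// Hypothetical single-concurrency queue: each container processes one SYN task at a time.
type Job<T> = () => Promise<T>;

class SerialQueue {
  private tail: Promise<unknown> = Promise.resolve();

  // enqueue() chains the job onto the tail so jobs run strictly one after another.
  enqueue<T>(job: Job<T>): Promise<T> {
    const run = this.tail.then(job, job);    // start regardless of the previous job's outcome
    this.tail = run.catch(() => undefined);  // keep the chain alive after failures
    return run;
  }
}

const queue = new SerialQueue();
// Example usage inside an HTTP handler: queue the audit and respond when it finishes.
// queue.enqueue(() => runAudit('https://example.com/page'));
```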

Second, support running SYN tasks on a schedule at 6, 12 or 24 hour intervals, and compute the AVG and TP values of the metrics across multiple runs, eliminating the skew introduced by a few abnormal runs.

  • Regarding slow runs and high resource usage: if a page has not been revised, there is no need to test it too frequently. For important pages with high PV we recommend a scheduled task that runs every 12 hours or more; this reflects page performance objectively while saving resources.
  • Integrate SYN into the CI pipeline and use it as a tool for pre-launch page performance testing or competitive comparison.

Applicable scenarios

  • Use SYN as a page performance testing tool and integrate it into the front-end monitoring backend, the QA suite and CI. We recommend that R&D colleagues use SYN to evaluate page performance before delivering a page, and fix page quality defects based on the optimization suggestions and diagnostic results to improve delivery quality.
  • Use SYN to compare competing products.
  • Using SYN as the preferred tool for analyzing slow pages captured by RUM, combined with Chrome DevTools, most problems can be located in practice.

How to use

Summary

SYN has a low implementation cost and makes it easy to unify standards. Compared with RUM, it is less affected by the runtime environment, and its results are more comparable and reproducible, making it an important part of performance monitoring. Quickly building a SYN Web service that provides a benchmark environment, based on Lighthouse and container orchestration technologies such as K8s, is the first stage of building a page performance monitoring system; this stage focuses on key capabilities such as evaluating page performance and analyzing slow pages. Lighthouse still has two problems: it only supports Chrome, so it is not representative enough, and it cannot truly reflect performance on real user terminals. Even so, its advantages outweigh its disadvantages and it can serve as the preferred SYN solution. We address these two problems by adding another technical solution: Real User Monitoring (RUM).

Real User Monitoring (RUM)

As the name implies, RUM monitors real user terminals (browsers) and collects real performance metrics as users run the page. There are two main technical solutions in the industry:

The W3C specifications, widely supported by browser vendors: PerformanceTiming and PerformanceNavigationTiming. They measure the time consumed at each node and stage of page loading from the perspective of the browser's processing flow.

Vendor-specific solutions developed for practical needs, of which Google's web-vitals is the best representative. It uses easier-to-understand metrics to describe page performance from the perspective of user experience.

In addition to the two common solutions above, a few commercial front-end monitoring providers also offer custom performance metrics on top of their W3C and web-vitals support, such as FMP in Alibaba ARMS and SPA_LOAD in ByteDance's WebPro. SPA_LOAD is used to evaluate the performance of non-homepage SPA pages; it is quite innovative and will be mentioned later.

Technology Selection

Technology selection mainly solves two problems: 1) Which of the two W3C specifications, PerformanceTiming and PerformanceNavigationTiming, should be used as the main one? 2) How should the W3C Timing specification and web-vitals collaborate?

Which of the two W3C specifications, PerformanceTiming and PerformanceNavigationTiming, should be used as the main standard?

PerformanceTiming: deprecated in the latest W3C standard, but still supported by current mainstream browsers, and well supported by older browsers, giving it high compatibility.


PerformanceNavigationTiming: the newer standard, introduced with Navigation Timing Level 2 in 2019, which aims to replace Navigation Timing Level 1 (the specification that defines PerformanceTiming).

Changes:

  • Consolidates the functions of PerformanceTiming and PerformanceNavigation.
  • Drops the domLoading node, whose value has little guidance because implementations differ among browser vendors.
  • Adds ServiceWorker-related nodes.
  • Each attribute node uses high-precision relative time with startTime as the origin.

Advantages:

  • Uses high-precision relative time, avoiding inaccurate values for later nodes caused by changes to the user's system clock.
  • Supports ServiceWorker-related statistics.

Disadvantages:

  • Insufficient browser compatibility.

Conclusion:

According to Can I Use statistics, the two differ in compatibility by only 2.67%. However, judging from our actual user distribution, using only PerformanceNavigationTiming would reduce user coverage by 12%, which is unacceptable. Therefore we primarily use PerformanceNavigationTiming and fall back to PerformanceTiming when the browser does not support it. The data formats of the two are similar: we ignore the domLoading node in PerformanceTiming, and treat the workerStart node as unsupported when falling back to PerformanceTiming.
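A minimal sketch of this fallback (a hypothetical helper, not the actual SDK code): prefer the navigation entry from the Performance Timeline and fall back to the legacy performance.timing object.

```typescript
// Prefer PerformanceNavigationTiming; fall back to the deprecated PerformanceTiming.
function getNavigationTiming(): PerformanceNavigationTiming | PerformanceTiming | undefined {
  if (typeof performance === 'undefined') return undefined;

  // Navigation Timing Level 2: a single 'navigation' entry per page load.
  const [nav] = performance.getEntriesByType('navigation') as PerformanceNavigationTiming[];
  if (nav) return nav;

  // Legacy fallback: absolute epoch timestamps; domLoading is ignored and
  // workerStart is treated as unsupported in this branch.
  return performance.timing;
}
```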

How should the W3C Timing specification and web-vitals work together?

PerformanceNavigationTiming (hereinafter referred to as Timing): based on the W3C specification, it measures page performance from the perspective of the browser's processing flow.

Advantages:

  • Good browser compatibility.
  • Rich data and comprehensive metrics. It records the time consumed in each stage, such as Unload, Redirect, DNS, TCP, SSL, Response, DomContentLoadedEvent and LoadEvent; it also exposes the time at which the page reaches each node, such as workerStart, fetchStart, requestStart, domInteractive and domComplete; and from the given node values it is possible to compute the durations of DCL (DOMContentLoaded), the window.load event, or PageLoad (full page load).

Disadvantages:

  • Lacks headline indicators. Although the metrics are numerous and comprehensive, they are not intuitive, and it is hard to express the user-perceived experience with them.

web-vitals: currently contains six metrics: TTFB, FCP, LCP, FID, INP and CLS, among which FID will be replaced by INP. TTFB, FCP and LCP reflect page loading performance, FID and INP represent interaction experience, and CLS represents visual stability. Just six metrics are enough to evaluate page loading, interaction and visual stability. Some web-vitals metrics are still derived from the W3C Largest Contentful Paint, Layout Shift, Event Timing and Paint Timing specifications, but the library offers better compatibility handling and a more complete implementation.

Advantages:

  • The metrics are simple, concise and easy to understand.
  • It comes with baselines, so page performance can be judged directly from the metric values.

Disadvantages:

  • Insufficient browser compatibility, especially on iOS.
  • LCP can be faked: adding a large white background image to the page makes that image's load time the likely LCP value, even though it has no business meaning.
  • Limited by how LCP and CLS work, there are certain requirements on when the metrics are collected, which will be introduced in detail later.

Requirements

Measure page performance truthfully, objectively and comprehensively.

Conclusion

Collecting both timing and web-vitals data at the same time brings the following benefits:

  • Rich metrics and comprehensive data: Timing reflects the processing time of each node and stage from the browser's perspective, while web-vitals directly expresses the user's visual experience. We recommend first judging page performance via web-vitals and then analyzing further via Timing, so the two complement each other and reduce misjudgments caused by insufficient browser compatibility, forged LCP, and so on.
  • Combining Timing and web-vitals data makes it easier to locate problems. For example, if the TTFB collected by web-vitals is slow, Timing can pinpoint which stage is slow: Unload, Redirect, DNS, TCP, or SSL.
  • It mitigates older browsers' lack of web-vitals support: if the browser does not support web-vitals, page performance can still be judged by the DCL, window.load, or PageLoad (full page load) time.

Summary: for RUM we collect both PerformanceNavigationTiming and web-vitals; if the browser does not support PerformanceNavigationTiming, we fall back to PerformanceTiming.

What indicators are collected

Our requirements are: measure page performance objectively and comprehensively, help R&D colleagues quickly locate performance bottlenecks, and affect page performance as little as possible. Specifically, there are three requirements: 1) the metrics must be comprehensive and objective; 2) the bottlenecks of slow pages must be discoverable; 3) on the premise of meeting the first two, the impact on page performance must be minimized.

Indicators should be comprehensive and objective

First, we collected six web-vitals indicators, and the results are as follows:

Google plans to replace FID with INP in 2024. FID reflects the delay of the first interaction, while INP represents the longest delay across all interactions. We believe FID and INP each have their own usage scenarios, so keeping both is not contradictory.
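For reference, registering for all six metrics with the public web-vitals API looks roughly like this (a sketch; the production SDK buffers and reports differently, as described later):

```typescript
import { onTTFB, onFCP, onLCP, onCLS, onFID, onINP, type Metric } from 'web-vitals';

// Collected values are buffered here and flushed at the collection points
// described later (onload, 5 s after open, or page hide).
const vitals: Record<string, number> = {};

function record(metric: Metric) {
  vitals[metric.name] = metric.value;
}

onTTFB(record);
onFCP(record);
onLCP(record);
onCLS(record);
onFID(record);
onINP(record);
```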

Secondly, we processed PerformanceNavigationTiming, and the results are as follows:

Unlike the W3C example diagram below:

The reason is:

  • In an actual page load, the stages do not necessarily run serially as in the diagram; for example, responseEnd may come later than domLoading.
  • The HTTP Cache phase has no explicit start and end nodes; it can only be shown as occurring between the fetchStart and domainLookupStart nodes.
  • The ServiceWorkerInit, ServiceWorkerFetchEvent and Request phases only have start nodes and no end nodes, so their durations cannot be computed. For the Request phase, responseStart cannot be used as the end node, because content is transmitted over the network in frames and the whole page content may not arrive at once.
  • The Processing stage starts at domInteractive, which does not match reality: by the time the page reaches domInteractive the DOM is already ready, so the Processing stage cannot represent the page's processing.

So we combine the W3C sample diagram to show the actual page operation process in the form of points, segments and lines:

  • Point: a node without a corresponding end node, including workerStart, fetchStart, requestStart, domInteractive and domComplete, shown as a white dot.
  • Segment: a processing stage with real start and end nodes, for example the unload stage: unloadEventEnd - unloadEventStart. Similarly Redirect, DNS, TCP, SSL, Response, domContentLoadedEvent and loadEvent, shown as blue bars.
  • Line: events triggered during page loading, such as DCL (DOMContentLoaded) and window.load. In addition, we define a custom PageLoad event to represent the time taken to load the entire page, with the value loadEventEnd - startTime. Shown as yellow bars.

Segment and line specific algorithms:

  • UnloadEvent = unloadEventEnd - unloadEventStart
  • Redirect = redirectEnd - redirectStart
  • DNS = domainLookupEnd - domainLookupStart
  • TCP = connectEnd - connectStart
  • SSL = connectEnd - secureConnectionStart
  • Response = responseEnd - responseStart
  • loadEvent = loadEventEnd - loadEventStart
  • DCL = domContentLoadedEventStart - startTime
  • WindowLoad = loadEventStart - startTime
  • PageLoad = loadEventEnd - startTime
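A sketch of computing these segment and line values from a navigation entry (a hypothetical helper; PerformanceNavigationTiming timestamps are relative to startTime, which is 0 for the navigation entry):

```typescript
function computeStagesAndLines(nav: PerformanceNavigationTiming) {
  return {
    // Segments: stages with real start and end nodes.
    unloadEvent: nav.unloadEventEnd - nav.unloadEventStart,
    redirect: nav.redirectEnd - nav.redirectStart,
    dns: nav.domainLookupEnd - nav.domainLookupStart,
    tcp: nav.connectEnd - nav.connectStart,
    // secureConnectionStart is 0 when the page is not served over HTTPS.
    ssl: nav.secureConnectionStart > 0 ? nav.connectEnd - nav.secureConnectionStart : 0,
    response: nav.responseEnd - nav.responseStart,
    loadEvent: nav.loadEventEnd - nav.loadEventStart,
    // Lines: elapsed time from startTime.
    dcl: nav.domContentLoadedEventStart - nav.startTime,
    windowLoad: nav.loadEventStart - nav.startTime,
    pageLoad: nav.loadEventEnd - nav.startTime,
  };
}

// Usage:
// const [nav] = performance.getEntriesByType('navigation') as PerformanceNavigationTiming[];
// if (nav) console.log(computeStagesAndLines(nav));
```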

In addition, in order to reflect page performance more comprehensively, the following data is also collected and counted:

  • With a small probability (1%), the complete PerformanceEntry data is collected (see the sampling sketch after this list). PerformanceEntry includes LargestContentfulPaint, LayoutShift, PerformanceEventTiming, PerformanceLongTaskTiming, PerformanceNavigationTiming, PerformancePaintTiming, PerformanceResourceTiming, PerformanceServerTiming and more. It covers not only the page's own performance metrics but also resources, network, caching, long-blocking JS tasks, slow event handlers and so on, which is very helpful for evaluating page performance and locating the bottlenecks of slow pages. However, most pages contain a lot of content, so the number of PerformanceEntry records is large; collecting and reporting 100% of them would have a large impact on bandwidth, storage and query performance, so the full data can only be collected with a small probability.
  • The page navigation type, taken from PerformanceNavigationTiming.type, which indicates whether the page was loaded for the first time or reloaded.
  • By resource type, count the number of resources, total transmission volume, and total time consumed.
  • By domain name, count the number of domain name resources, total transmission volume and total time consumed.
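A minimal sketch of the 1% full-entry sampling and the per-resource-type / per-domain aggregation described above (hypothetical helper names, not the SDK's actual code):

```typescript
function bump(bucket: Record<string, { count: number; bytes: number; duration: number }>,
              key: string, r: PerformanceResourceTiming) {
  const agg = (bucket[key] ??= { count: 0, bytes: 0, duration: 0 });
  agg.count += 1;
  agg.bytes += r.transferSize;
  agg.duration += r.duration;
}

function collectEntryData() {
  const entries = performance.getEntries();
  const byType: Record<string, { count: number; bytes: number; duration: number }> = {};
  const byDomain: Record<string, { count: number; bytes: number; duration: number }> = {};

  for (const e of entries) {
    if (e.entryType !== 'resource') continue;
    const r = e as PerformanceResourceTiming;
    bump(byType, r.initiatorType, r);             // count / bytes / time by resource type
    bump(byDomain, new URL(r.name).hostname, r);  // count / bytes / time by domain
  }

  return {
    stats: { byType, byDomain },
    // Only ~1% of page views carry the full PerformanceEntry list.
    fullEntries: Math.random() < 0.01 ? entries.map((e) => e.toJSON()) : undefined,
  };
}
```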

Finding the bottlenecks of slow pages

We refer to the Lighthouse 50-point line and HTTP Archive statistics, and define slow-page standards according to the application type (PC or mobile). The specific thresholds are as follows:

For slow pages, in addition to collecting the PerformanceNavigationTiming, web-vitals, and low-probability complete PerformanceEntry data and statistical data mentioned in the previous section, we also collect:

  • TOP N slow resources. Method: take the N PerformanceEntry records of type PerformanceResourceTiming with the largest duration values.
  • Long tasks, i.e. tasks that occupy the UI thread for more than 50 ms. Method: collect all PerformanceEntry records of type PerformanceLongTaskTiming. However, most browsers currently do not provide the script address (containerSrc) or method name (containerName) of the long task, so this data can only tell us whether long tasks occurred and how many times. The content of PerformanceLongTaskTiming is as follows:

  • Slow events, i.e. interaction events whose processing time exceeds 104 ms. Method: collect all PerformanceEntry records of type PerformanceEventTiming.
  • The number of page redirects, taken from PerformanceNavigationTiming.redirectCount, which helps analyze why the Redirect phase takes a long time.
  • When the CLS or LCP value exceeds the slow-page threshold, the associated elements are recorded.
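A sketch of collecting the first two items above (hypothetical helper names; the long-task observer must be registered early, and buffered: true picks up earlier entries where supported):

```typescript
// Top N slowest resources on the page.
function topSlowResources(n: number): PerformanceResourceTiming[] {
  const resources = performance.getEntriesByType('resource') as PerformanceResourceTiming[];
  return [...resources].sort((a, b) => b.duration - a.duration).slice(0, n);
}

// Long tasks (> 50 ms on the main thread), collected via PerformanceObserver.
const longTasks: PerformanceEntry[] = [];
try {
  const observer = new PerformanceObserver((list) => longTasks.push(...list.getEntries()));
  observer.observe({ type: 'longtask', buffered: true });
} catch {
  // 'longtask' is not supported in all browsers (e.g. Safari); degrade silently.
}
```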

Does not affect page performance

RUM is implemented by instrumenting the page: a JS SDK must be introduced into the page, which inevitably affects page performance. As a tool for discovering and analyzing performance problems, it must not make those problems worse. To minimize the impact, we did two things:

Load the JS SDK asynchronously: the page only includes a small JS loader with a single responsibility; after the DOMContentLoaded event, the full-featured JS main file is loaded asynchronously via a dynamically created script tag.
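A minimal sketch of such a loader (the SDK URL is hypothetical):

```typescript
// Tiny inline loader: inject the full RUM SDK only after DOMContentLoaded has fired.
function loadRumSdk(src: string) {
  const inject = () => {
    const script = document.createElement('script');
    script.src = src;
    script.async = true;
    document.head.appendChild(script);
  };

  if (document.readyState === 'loading') {
    document.addEventListener('DOMContentLoaded', inject, { once: true });
  } else {
    inject(); // DOMContentLoaded has already fired
  }
}

loadRumSdk('https://static.example.com/rum-sdk.main.js'); // hypothetical SDK address
```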

Reduce bandwidth usage:

  • Sampled reporting: slow pages are always reported, while non-slow pages are sampled, with a default sampling ratio of 30%, to reduce the number of reports.
  • Reduce the volume of reported data: the full PerformanceEntry data reflects page performance completely, but many pages produce a very large number of PerformanceEntry records, which is too much to report every time. Therefore the full PerformanceEntry data is only collected with a small probability; otherwise only the aggregated statistics computed from PerformanceEntry are collected and reported.

To sum up, we collect two main categories of indicators: PerformanceEntry and web-vitals.

How to evaluate the performance of non-homepage SPA applications?

By collecting the above metrics, we can objectively and comprehensively evaluate regular pages and SPA homepages. However, non-homepage SPA pages are not standard browser navigations. During SPA route switching:

  • The browser only executes the History.replaceState() method; it does not, and should not, regenerate the PerformanceNavigationTiming data.
  • Most PerformanceEntry data accumulates the metrics of every route since the SPA homepage was opened. Taking PerformanceResourceTiming as an example, the current route's data can be obtained by excluding the historical records from the collection, but the exact moment of the route switch cannot be derived from History.replaceState(): most front-end frameworks first execute their internal routing logic and only then call History.replaceState(), so the call fires later than the route switch actually starts.
  • web-vitals does not currently support collecting performance metrics after a non-homepage SPA route switch.

Therefore, for non-homepage SPA pages these metrics either cannot be obtained at all or cannot be obtained accurately, so for the time being we do not evaluate their performance.

Given how difficult it is to evaluate non-homepage SPA performance, ByteDance's WebPro creatively introduced the concept of SPA_LOAD. The basic logic: starting from the history.replaceState() call, use MutationObserver to watch DOM changes, resource loading, request sending and other change events, find the time at which the page reaches a stable state as the end point, and measure non-homepage SPA performance as the time between the start and end points. SPA_LOAD is similar to the onload event of a regular page, but its start time is later than the actual route switch, and by the end point the page may already have finished loading. It is slightly imperfect, yet it is the best solution available at present, and we may introduce it later.
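A rough sketch of this idea (our own simplified interpretation, not WebPro's implementation): patch history.replaceState to mark the start, then treat a quiet period with no DOM mutations as the stable end point.

```typescript
// Simplified SPA_LOAD-style measurement: route-switch start -> DOM "quiet" point.
const QUIET_MS = 1000; // assumption: 1 s without DOM mutations counts as "stable"

function watchSpaLoads(onMeasure: (durationMs: number) => void) {
  const originalReplaceState = history.replaceState.bind(history);

  history.replaceState = (...args: Parameters<History['replaceState']>) => {
    originalReplaceState(...args);

    const start = performance.now();
    let lastChange = start;
    const observer = new MutationObserver(() => { lastChange = performance.now(); });
    observer.observe(document.documentElement, { childList: true, subtree: true, attributes: true });

    const timer = window.setInterval(() => {
      if (performance.now() - lastChange >= QUIET_MS) {
        window.clearInterval(timer);
        observer.disconnect();
        onMeasure(lastChange - start); // time from route switch to the last DOM change
      }
    }, 200);
  };
}

// watchSpaLoads((ms) => console.log('SPA route load (approx.):', ms));
```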

Timing of collecting indicators

Only by collecting real and accurate metric values can we truly reflect page performance; otherwise we may mislead product and R&D colleagues into misjudging the actual performance. We therefore follow several principles when choosing when to collect metrics:

  • Accuracy comes first. The most important thing is to ensure the metrics are accurate; if a value cannot be accurate, it is better not to collect it at all.
  • More samples make reporting more reliable. For some metrics, such as CLS, INP and PerformanceEventTiming, the later the data is collected, the more accurate the value. However, the later we collect, the less time is left for reporting and the greater the chance the report fails. To collect as many samples as possible, we cannot wait until the page is closed to collect and report.
  • Fairness. For metrics such as CLS, INP and LCP, the values may grow the longer the page stays open. For these metrics we can only guarantee "more samples" and choose a relatively reasonable collection time that is fair to all projects.
  • For some web-vitals metrics, such as LCP, CLS and INP, every change triggers the callback, and Google officially recommends collecting and reporting on every change. That makes the values more accurate, but it consumes too many front-end connections, too much bandwidth and CPU, and greatly complicates the backend receiver; it is not a balanced choice. We therefore need a suitable collection time at which all performance metrics are collected and reported in one batch.

So how do we determine the collection time? We must first analyze the exact generation time of the two types of indicator data, PerformanceEntry and web-vitals:

For PerformanceEntry data, when the onload event is triggered, the page is almost loaded, and most of the indicator data in PerformanceEntry that affects the first screen loading has been generated. The ungenerated data has little impact on the evaluation of page performance, such as the loadEventEnd indicator value in PerformanceNavigationTiming. Therefore, we believe that PerformanceEntry indicators can be collected when the onload event is triggered.

The generation principles of various indicators in web-vitals are different. When the onload event is triggered:

  • TTFB and FCP indicators have been generated and will not change, so they can be collected.
  • The largest element corresponding to LCP has most likely already rendered, so the LCP value is most likely accurate at this point and can be collected.
  • The accuracy of CLS cannot be determined. Its calculation logic: after the page opens, every 5 seconds forms a session window, and the accumulated layout shift within a window is the CLS value; if a later session window's value exceeds the previous one, it replaces it. Therefore, for CLS it is more appropriate to collect 5 seconds after the page opens, preferably at 5 seconds or an integer multiple of it, which is fair to all projects.
  • FID and INP cannot be guaranteed accurate either, since they are only generated after user interaction, which includes events such as click, input, drag and touch. FID is the delay of the first interaction, and INP takes the longest delay across all interactions. Both depend on user behavior, and INP may grow as the user interacts more, so their accuracy cannot be guaranteed at any fixed time.

Based on the above, at least one of the following conditions must hold for collection to be meaningful: the onload event has fired, or the page has been open for 5 seconds; only then are the PerformanceEntry values or some of the web-vitals values guaranteed to be accurate. Pursuing the "more samples" principle, and given that the RUM SDK is loaded asynchronously after onload, we set three collection points in total, depending on whether the page is closed normally.
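A rough sketch of these collection points (collectMetrics() and report() are hypothetical helpers; the grouping of metrics per point is an illustration, not the SDK's exact behavior):

```typescript
// Hypothetical helpers: collectMetrics() gathers whatever is ready, report() sends it.
declare function collectMetrics(groups: string[]): Record<string, unknown>;
declare function report(payload: Record<string, unknown>, opts?: { useBeacon: boolean }): void;

// Collection point 1: onload — PerformanceEntry data plus TTFB / FCP / LCP are stable.
window.addEventListener('load', () => report(collectMetrics(['entries', 'TTFB', 'FCP', 'LCP'])));

// Collection point 2: 5 s after the page opens — aligned with the CLS session window.
setTimeout(() => report(collectMetrics(['CLS'])), 5000);

// Collection point 3: the page is hidden or closed — flush whatever remains via sendBeacon.
const flushOnLeave = () => report(collectMetrics(['remaining']), { useBeacon: true });
window.addEventListener('pagehide', flushOnLeave);
document.addEventListener('visibilitychange', () => {
  if (document.visibilityState === 'hidden') flushOnLeave();
});
```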

To avoid affecting page performance, the RUM JS SDK is loaded asynchronously. By default, web-vitals only exposes metric values through asynchronous callbacks; asynchronous loading plus asynchronous callbacks can mean that some values are not yet available at collection time. We therefore modified the web-vitals source code to support reading each metric's current value synchronously.

Reporting method

After collecting the indicators, you need to choose an appropriate reporting method to reliably send the indicators to the backend. The reporting method includes two parts: reporting mechanism and reporting timing.

A suitable reporting mechanism should: first, meet the functional requirements, have high browser compatibility and preferably no data size limit; second, be able to detect failed report requests, so that retries are possible and reporting reliability improves; finally, support a client-side timeout, to avoid long-held connections and extra pressure on backend services. There are four common reporting mechanisms: Image, XMLHttpRequest, sendBeacon, and the Fetch API. Their characteristics are as follows:


Image

  • Rationale: create a 1-pixel hidden img element and put the report address and payload in its src; a 200 status code indicates success.
  • Browser compatibility: high.
  • Data size limit: less than 8 KB. The limit varies by browser and is also subject to the CDN, backend proxy and web server; the default is usually 8 KB, measured after URI encoding.
  • Detect reporting failures: partially supported. The img onerror event fires for some responses (such as 404 or 204), but it cannot reliably distinguish error status codes such as 400, 500, 502 and 504.
  • Configurable timeout: no; depends on server settings.
  • Advantages: easy to use, high compatibility.
  • Disadvantages: data size is limited, and failures are hard to detect.
  • Applicable scenarios: small payloads with low reliability requirements.

XMLHttpRequest

  • Rationale: report data with the browser's built-in XMLHttpRequest object.
  • Browser compatibility: high.
  • Data size limit: uses a POST request, no limit.
  • Detect reporting failures: supported.
  • Configurable timeout: yes.
  • Advantages: powerful, flexible and easy to extend; no size limit; high compatibility.
  • Disadvantages: slightly more code to write; cross-origin handling needs care.
  • Applicable scenarios: when there are many functional requirements. It is recommended to POST in text/plain or application/x-www-form-urlencoded format to avoid CORS preflight requests.

sendBeacon

  • Rationale: use the navigator.sendBeacon() method, designed specifically for sending analytics data, to submit data to the backend via an asynchronous HTTP POST request.
  • Browser compatibility: IE is not supported.
  • Data size limit: yes; some browsers limit payloads to less than 64 KB.
  • Detect reporting failures: not supported. The return value of sendBeacon() only indicates whether the browser accepted (queued) the request.
  • Configurable timeout: no.
  • Advantages: easy to use and reliable; can still send data while the page is being closed.
  • Disadvantages: cannot detect reporting failures; the true/false return value only indicates whether the request was queued, not whether it succeeded.
  • Applicable scenarios: reporting data when the page is closed.

Fetch

  • Rationale: fetch is a modern, Promise-based API for making HTTP requests.
  • Browser compatibility: medium to low; IE is not supported.
  • Data size limit: uses a POST request, no limit.
  • Detect reporting failures: supported.
  • Configurable timeout: yes.
  • Advantages: compared with XMLHttpRequest, simpler to use and more powerful.
  • Disadvantages: same as XMLHttpRequest, plus the lowest browser compatibility of the four.
  • Applicable scenarios: same as XMLHttpRequest; better suited to user bases with newer browsers.

(The comparison above is based on Chrome 114.)

Compared with Fetch, XMLHttpRequest offers almost the same functionality with higher compatibility. Compared with Image, XMLHttpRequest has higher compatibility, no data size limit, can detect failures, and supports timeouts. XMLHttpRequest is therefore the preferred mechanism for sending metric data while the page is open (not being closed). As for sendBeacon, despite its shortcomings, it can still send data while the page is closing, with a relatively high success rate, so it is suitable for sending whatever metric data has not yet been sent when the page closes.

Since sendBeacon is chosen as the mechanism for sending metric data when the page closes, how do we determine that the page is closing? The traditional solution is to listen for the unload or beforeunload event, which has two shortcomings:

  • It does not meet the functional requirement. When mobile users leave a page, they are more likely to hide the browser than to close it, and the unload and beforeunload events do not fire when the page is merely hidden.
  • Performance loss. Some browsers disable bfcache once an unload or beforeunload listener is registered, reducing page performance.

A more appropriate, modern solution is to listen for the pagehide event, or for visibilitychange with visibilityState === 'hidden'. This avoids both shortcomings of the traditional approach and has higher compatibility. For SPA projects, we do not collect non-homepage performance metrics, and triggering History.replaceState() is also treated as leaving the page.

To summarize, the overall reporting mechanism is: 1) while the page is open (not being closed), the SDK collects performance metric data at each collection point and reports it via XMLHttpRequest; 2) by listening for pagehide or visibilitychange === 'hidden', when the page is being closed, if any collection-time condition is met, the metrics are collected immediately and reported via sendBeacon.
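A minimal sketch of this split (the endpoint URL is hypothetical; retry handling is trimmed to a simple queue):

```typescript
const REPORT_URL = 'https://perf.example.com/collect'; // hypothetical endpoint
const retryQueue: string[] = [];                        // retried later by the SDK (sketch)

function send(payload: string, onPageLeave: boolean) {
  if (onPageLeave && navigator.sendBeacon) {
    // Page is closing: fire-and-forget; the request survives page unload.
    navigator.sendBeacon(REPORT_URL, payload);
    return;
  }

  // Page still open: XMLHttpRequest with timeout and failure detection.
  const xhr = new XMLHttpRequest();
  xhr.open('POST', REPORT_URL, true);
  xhr.timeout = 5000;
  xhr.setRequestHeader('Content-Type', 'text/plain'); // "simple" type, avoids CORS preflight
  xhr.onerror = xhr.ontimeout = () => retryQueue.push(payload);
  xhr.send(payload);
}
```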

Overall Solution

To sum up, we have organized the RUM implementation outline as follows:

In the actual encoding process, the specific processing flow is as follows:

Pros and Cons

During the implementation process, we summarized the advantages and disadvantages of RUM as follows:

The RUM architecture is complex to design and costly to implement, but since it is a hard technical requirement, we can only invest the resources and do it well.

To compensate for the lack of diagnostic results and optimization suggestions, we combine RUM with SYN so the two complement each other, using SYN to diagnose where the performance bottlenecks of slow pages lie.

As for the inability to judge page performance without a score, we take three steps: first, set a slow-page standard to determine whether a single page is fast or slow (the thresholds were described earlier); second, compute the AVG, TP50, TP90 and TP99 values of each important metric to evaluate the performance distribution across all requests for the page; finally, score the application the page belongs to, telling R&D staff directly how the application performs. The specific approach to application scoring is detailed in the later section "Establishing an Evaluation System".

3. Overall Architecture Design

The previous sections analyzed in depth the characteristics, usage, advantages and disadvantages of SYN and RUM. SYN and RUM each have their own strengths and neither can replace the other. It is best to use both to build a systematic performance monitoring architecture, provide a performance monitoring and analysis tool chain, and support product and R&D colleagues in discovering and locating page performance problems at every stage of DevOps.

During the coding, building and testing phases, R&D colleagues can use SYN to run page performance tests and determine whether page performance meets the standard; if there are problems, SYN's diagnostics and optimization suggestions help fix them. This addresses long-standing pain points such as the difficulty of testing front-end page performance and the lack of standards for delivered page quality. After the application is deployed, RUM collects real users' page performance and evaluates real-world performance. If slow pages remain, SYN can still be used to locate their bottlenecks, and its diagnostic results and optimization suggestions improve optimization efficiency. In addition, SYN can be used to benchmark against competing products, so that we know both ourselves and our competitors and stay one step ahead.

With SYN and RUM we can build a closed loop of continuous online optimization for slow pages, as shown in the figure above. RUM collects the slow pages generated by real users; after storage and aggregation, the backend automatically creates SYN jobs for the top 10 slow pages. When a job completes, its diagnostic results and optimization suggestions are sent to R&D colleagues as alerts. R&D colleagues then use the SYN suggestions, together with tools such as DevTools and webpack, to improve the pages, deliver high-quality pages and reduce the frequency of slow pages. Repeating this cycle continuously optimizes application page performance toward its best possible level.

4. Establishing an Evaluation System

Having introduced SYN, developed our own RUM SDK and collected a large amount of SYN and RUM metric data, we set about establishing an evaluation system to score the performance of each application, team, department and even the whole company. It outputs a few key metrics and scores that tell employees at every level and in every role how their organization's pages perform compared with peer organizations, so they can decide whether page performance needs to be optimized.

When establishing the evaluation system, we followed the principle of highlighting the key indicators while still reflecting page performance comprehensively, objectively and truthfully. We therefore divided it into two parts: first, display the most critical performance metrics; second, output performance scores and grades at the metric, application, team, department and company levels.

Display the most critical performance indicators

From web-vitals and PerformanceNavigationTiming we each select the metric that best represents page performance: LCP and DCL respectively. LCP has lower compatibility and can be faked; DCL partially reflects first-screen performance in its place, has high compatibility and is hard to fake. The two complement each other and together best reflect page performance.

TP90 represents the lower bound of the experience of 90% of users. Compared with AVG, TP50 and TP75 it covers a wider range of users, and it also filters out the small amount of dirty data produced by pages in unusual mobile network environments, making it more representative.

Output ratings at all levels

Application performance rating

To evaluate application page performance comprehensively and objectively, we select representative metrics in each dimension, refer to the industry distributions published by HTTP Archive, and set different scoring baselines according to application characteristics such as application type (PC or mobile) and end users (C-end users, B-end customers or internal employees), then calculate a score for each metric. Next, we assign weights according to each metric's importance and obtain the application performance score by weighted average. The process is as follows:

Obtaining an application performance score involves several important steps: 1) selecting metrics and setting weights; 2) choosing a scoring algorithm; 3) establishing metric baselines; 4) calculating the application performance score with the weighted average algorithm. These are described below.

Indicator screening and weight setting

During this process, we mainly considered two points:

Comprehensive coverage. Page performance covers many aspects. Traditionally only a few metrics such as first byte, white screen and first screen time are measured, which is one-sided and not objective enough. We believe judging page performance should cover multiple dimensions, such as page loading, interaction experience and visual experience. We also introduced the slow-API ratio: the proportion of API requests issued after the page opens that take more than 3 s. Slow APIs may prolong first-screen loading and also hurt the interaction experience. In addition, to make R&D colleagues pay attention to slow pages and use the online SYN service as a daily performance evaluation tool, we include the SYN score as a weighted indicator: the backend counts the top 10 slow pages by visits every day and automatically creates scheduled SYN tasks; after a task runs, the optimization suggestions and diagnostic results are sent to the R&D colleagues. The SYN scoring items include the performance score and the best-practices score, both on a 100-point scale.

Highlight the key points. Increase the weight of important metrics. For example, loading metrics are the most critical for judging whether a page is usable, so they receive the largest share of the weight; LCP is the most important loading metric, so its weight is raised accordingly. Since LCP itself is not always reasonable and can be faked, loading metrics such as DCL, FCP, TTFB, WindowLoad and PageLoad are also included when judging loading performance. The advantage of this practice is that many metrics across many dimensions make the evaluation more comprehensive, objective and accurate; the disadvantage is that it increases the complexity of the evaluation system.

Based on the above two points, we set the index filtering results and weight ratio, as shown in the figure below:

Each indicator participates in the scoring operation with its TP90 statistical value.

Select the scoring algorithm

The scoring algorithm mainly follows the Lighthouse scoring curve model. The basic principle: set two control points, namely the metric values that should earn 50 points and 90 points (the median m and p10), then generate a log-normal curve through these two points; from this curve the score corresponding to any metric value can be obtained. The figure below is the scoring curve for mobile LCP:

m represents the median. In the figure, m is 4000 ms, meaning an LCP of 4000 ms scores 50 points; similarly, an LCP of 2520 ms scores 90 points. Once m and p10 are set, the scoring curve model on the right is generated, and from this model the score for any LCP value from 0 to positive infinity can be obtained.
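A sketch of this log-normal scoring model (our own re-implementation of the idea using a standard erfc approximation, not Lighthouse's exact code). The curve is parameterized so that the median value m scores 0.5 and the p10 value scores 0.9:

```typescript
// Complementary error function via the Abramowitz–Stegun 7.1.26 approximation.
function erfc(x: number): number {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
               t * (-1.453152027 + t * 1.061405429))));
  const erf = sign * (1 - poly * Math.exp(-ax * ax));
  return 1 - erf;
}

// erfc^-1(0.2), used to fit the curve so that score(p10) = 0.9.
const INVERSE_ERFC_ONE_FIFTH = 0.9061938024368232;

// Log-normal score in [0, 1]: score(median) = 0.5, score(p10) = 0.9; smaller values score higher.
function logNormalScore(median: number, p10: number, value: number): number {
  if (value <= 0) return 1; // treat non-positive values as a perfect score
  const sigma = (Math.log(median) - Math.log(p10)) / (Math.SQRT2 * INVERSE_ERFC_ONE_FIFTH);
  const score = 0.5 * erfc((Math.log(value) - Math.log(median)) / (sigma * Math.SQRT2));
  return Math.min(1, Math.max(0, score));
}

// Example: mobile LCP with m = 4000 ms and p10 = 2520 ms.
// logNormalScore(4000, 2520, 4000) ≈ 0.5, logNormalScore(4000, 2520, 2520) ≈ 0.9
```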

Establish a baseline for indicators

The metric baselines exist to provide the two control points for the scoring algorithm, namely the values of m and p10. There are three ways to establish them:

  • Borrow the Lighthouse configuration directly: the 50-point and 90-point thresholds of the corresponding Lighthouse metric become the m and p10 control points, for example for the web-vitals metrics.
  • Refer to HTTP Archive statistics: use the p75 and p10 values from the statistics as the m and p10 control points respectively, for example for the performanceNavigationTiming metrics.
  • Build our own baseline. A few metric baselines cannot be established by the above two methods, so we can only build them ourselves. Specifically: take the data already collected by the backend system as a sample, and use the sample's tp75 and tp95 values as the m and p10 control points. This applies to the slow-API ratio metric.

When establishing baselines, differences in application value and requirements should be taken into account, with targeted settings per application characteristics. First, PC and mobile applications should have different metric baselines; fortunately, Lighthouse and HTTP Archive both provide configuration references for PC and mobile. Second, adjust according to the application's actual end users: C-end and B-end applications generate business value directly, so their performance requirements should be higher than those of internal applications and the values of their two control points should be lower.

Use weighted average algorithm to calculate application performance scores

Based on each metric's TP90 value, the metric baseline and the scoring algorithm, the metric's score on a 100-point scale is obtained.

The application performance score is then obtained by weighted average: score = Σ(metric score × metric weight) / Σ(metric weight).
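Putting the pieces together, a sketch of the application score (hypothetical data structures; weights and baselines come from the configuration described above, and logNormalScore is the sketch from the previous section):

```typescript
interface MetricConfig {
  name: string;
  weight: number;
  median: number; // m control point
  p10: number;    // p10 control point
  tp90: number;   // observed TP90 value for this application
}

// Weighted average of per-metric scores -> application score on a 100-point scale.
function applicationScore(metrics: MetricConfig[]): number {
  const totalWeight = metrics.reduce((sum, m) => sum + m.weight, 0);
  const weighted = metrics.reduce(
    (sum, m) => sum + logNormalScore(m.median, m.p10, m.tp90) * 100 * m.weight, 0);
  return weighted / totalWeight;
}

// Example (illustrative numbers only):
// applicationScore([
//   { name: 'LCP', weight: 25, median: 4000, p10: 2520, tp90: 3100 },
//   { name: 'DCL', weight: 15, median: 3500, p10: 2000, tp90: 2400 },
// ]);
```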

Ratings of organizations at all levels

For the performance scores of organizations at all levels, including teams, departments and the company, the organization score is again obtained with the weighted average algorithm, based on the applications under the organization, their performance scores and their weights. The process is as follows:

The difficulty in computing organizational performance scores is how to set the application weights. We mainly refer to the application's PV range and application level: the higher the PV and the application level, the greater the weight. The reference configuration is as follows:

The PV value used for the PV range is not the real PV; it is estimated as the amount of RUM data collected divided by the sampling ratio.

After the above steps we obtain application performance scores and the performance scores of organizations at all levels, such as team, department and company-wide performance scores. The display looks like this:

Performance scores follow the Lighthouse standard: 50 points or above is considered qualified, and 90 points or above is considered excellent.

5. Conclusion

Through the construction of the page performance monitoring and evaluation system, we now have raw page performance data, aggregated statistics and final scores. Through scores, statistics and raw data we connect the steps of discovering, locating and analyzing performance problems. R&D colleagues can judge from the score whether an application needs optimization; if it does, the aggregated statistics reveal the application's bottlenecks; when pinpointing a specific bottleneck, the detailed data helps analyze its cause; after improvements, the statistics show the effect of the optimization, which is ultimately reflected in a higher score.

Due to space limitations, this article only covers the practices around building the page performance monitoring and evaluation system. A complete system should also include monitoring, alerting, optimization, governance and other modules. It is not only about metrics, which measure page performance, reveal bottlenecks and help R&D colleagues optimize more efficiently, but also about addressing root causes: building a front-end tool chain, improving the delivery process, and using specifications, tools and processes to improve page quality at the source, so that problems are avoided before going online and performance issues are found before users encounter them.

6. References

  • vitals-spa-faq
  • Web Vital Metrics for Single Page Applications
  • Beaconing In Practice
  • HTTP Archive: a project that collects and analyzes website performance data, helping web developers understand technology trends and best practices for performance optimization.