1. Introduction

Paying attention to user experience and improving page performance is part of every front-end developer's daily work. Although the business value of faster pages is hard to measure precisely, the effort is certainly more beneficial than harmful. How do we measure the performance of a page? How do we help developers quickly locate performance bottlenecks? These have always been key tasks for the front end. This article shares some of Autohome's work on building page performance monitoring, covering three aspects:

Technology selection
Overall architecture design: Integrate the selected technical solutions, build a systematic performance monitoring architecture, provide a tool chain for performance monitoring and performance analysis, and support product and R&D colleagues in discovering and locating page performance issues at every stage of DevOps.

Establishing a judging system: Only with data can we measure, and only with scores can we improve. Using the many indicators collected, we set different baselines and weights according to application characteristics and the importance of each indicator, and obtain an application score by weighted average. The score tells R&D colleagues intuitively whether the application's pages are fast or slow, whether performance is high or low, and whether improvement is needed. The application score only reflects the performance of a single application and is mainly used by R&D colleagues. A company has multiple departments, each department has multiple teams, and each team owns multiple applications. We also need performance scores at the company, department, and team levels, so that leaders at every level can intuitively understand the page performance of their teams and compare subordinate teams. Therefore, we again use a weighted average, based on application PV volume and application level, to obtain team, department, and company scores.

2. Technology Selection

Based on the environment in which page performance is monitored, we divide the technical solutions into two types: Synthetic Monitoring (SYN) and Real User Monitoring (RUM).

Synthetic Monitoring (SYN)

SYN refers to running a page in a simulated environment to evaluate its performance. Early representative tools include the well-known YSlow and PageSpeed. As the technology matured, the three most mature SYN tools became Lighthouse, WebPageTest, and SiteSpeed. Although Lighthouse only supports Chrome and has a relatively high implementation cost, it has many advantages: Google backing, easy extensibility, rich indicators, and built-in scoring. It has gradually replaced WebPageTest and become the preferred SYN tool. The following uses Lighthouse as an example to introduce the operation process, advantages, and disadvantages of SYN.

Operation process

In addition to outputting key performance indicator values and scores on the results page, Lighthouse also provides optimization suggestions and diagnostic results. Lighthouse 10.1.0 has 94 performance and 16 best-practice audits built in, including some that receive little attention in daily development but are quite meaningful, such as minimizing main-thread work (mainthread-work-breakdown), pages that block back/forward cache restoration (bf-cache), and reducing unused JavaScript (unused-javascript). We recommend running Lighthouse through the Node CLI or Node Module and outputting results in both html and json formats. The json data is more comprehensive and includes details such as largest-contentful-paint-element and long-tasks.
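For reference, a minimal sketch of driving Lighthouse from the Node Module (assuming the lighthouse and chrome-launcher npm packages; option names follow their documented APIs, the URL is illustrative) might look like this:

```ts
import lighthouse from 'lighthouse';
import * as chromeLauncher from 'chrome-launcher';

// Run a single audit and return the structured result plus the html/json reports.
async function runAudit(url: string) {
  const chrome = await chromeLauncher.launch({ chromeFlags: ['--headless'] });
  try {
    const result = await lighthouse(url, {
      port: chrome.port,
      output: ['html', 'json'],                       // generate both report formats
      onlyCategories: ['performance', 'best-practices'],
    });
    // result.lhr holds the full structured data; audits such as
    // 'largest-contentful-paint-element' and 'long-tasks' live under lhr.audits.
    return result;
  } finally {
    await chrome.kill();
  }
}

runAudit('https://www.example.com').then((r) =>
  console.log(r?.lhr.categories.performance.score),
);
```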
Pros and cons

In our practice, Lighthouse has the following advantages and disadvantages:

Improvements

To address the shortcomings above and our product needs, we made some improvements. Because there is no default benchmark environment, the same page produces different results on different user terminals with different operating environments and hardware resources. We solved this in two ways. First, we provide a SYN benchmark operating environment: a self-developed web-based SYN service built on the Lighthouse Node Module and deployed in containers. A queue strategy on the Node server ensures that only one SYN task runs in a single container at any time, and every container has the same hardware resources (4 cores + 4 GB) and network throttling (M-side applications use a unified 10 Mbps profile), so the results and final scores are relatively fair and reliable. Second, SYN tasks can be scheduled at intervals of 6, 12, or 24 hours; we compute the AVG and TP values of indicators across multiple runs, which eliminates the bias introduced by a few abnormal runs.
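As an illustration of the queue strategy described above, a minimal sketch (our own simplification, not the production service) that serializes audit tasks inside one container could look like this, reusing the runAudit helper sketched earlier:

```ts
// Serialize SYN tasks: only one audit runs in this process at any time.
class AuditQueue {
  private tail: Promise<unknown> = Promise.resolve();

  // Enqueue a URL; the returned promise resolves with that audit's result.
  enqueue(url: string) {
    const next = this.tail.then(() => runAudit(url));
    // Keep the chain alive even if one audit fails.
    this.tail = next.catch(() => undefined);
    return next;
  }
}

const queue = new AuditQueue();
queue.enqueue('https://www.example.com/a');
queue.enqueue('https://www.example.com/b'); // starts only after the first finishes
```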
Applicable scenarios
How to use

Summary

SYN has low implementation costs and makes it easy to unify standards. Compared with RUM, it is less affected by the runtime environment, and its results are more comparable and reproducible; it is an important part of performance monitoring. Quickly building a SYN web service that provides a benchmark environment, based on Lighthouse and container orchestration technologies such as K8s, is the first stage of building a page performance monitoring system. This stage focuses on key functions such as evaluating page performance and analyzing slow pages. Lighthouse still has two problems: it only supports Google Chrome, so it is not representative enough, and it cannot truly reflect performance on real user terminals. Even so, its advantages outweigh its disadvantages, and it serves as our preferred SYN solution. We address those two problems by adding another technical solution: Real User Monitoring (RUM).

Real User Monitoring (RUM)

As the name implies, RUM monitors the real user terminal (the browser) and collects the actual performance indicators produced when users run the page. There are two main technical approaches in the industry:

The W3C specifications, widely supported by browser vendors: PerformanceTiming and PerformanceNavigationTiming, which measure the time consumed by each node and stage of page loading from the perspective of browser processing.

Vendor-specific solutions developed for practical needs, of which Google's web-vitals is the best representative. It uses easier-to-understand indicators to describe page performance from the perspective of user experience.

In addition to these two common approaches, a few commercial front-end monitoring providers also offer custom performance indicators on top of W3C and web-vitals, such as FMP in Alibaba ARMS and SPA_LOAD in ByteDance WebPro. SPA_LOAD evaluates the performance of non-homepage SPA pages, which is quite innovative and will be mentioned later.

Technology selection

Technology selection mainly answers two questions: 1) Which of the two W3C specifications, PerformanceTiming and PerformanceNavigationTiming, should be the primary one? 2) How should the W3C Timing specifications and web-vitals work together?

Which of the two W3C specifications, PerformanceTiming and PerformanceNavigationTiming, should be the primary one?

PerformanceTiming: deprecated by the latest W3C standard, but still supported by current mainstream browsers and well supported by older browsers, so its compatibility is high.

PerformanceNavigationTiming: introduced by Navigation Timing Level 2 (the latest standard, 2019), which aims to replace Navigation Timing Level 1, the specification that defines PerformanceTiming. Changes:
Advantages:
Disadvantages:
Conclusion: According to Can I Use statistics, the compatibility of the two differs by only 2.67%. However, based on our actual user distribution, relying on PerformanceNavigationTiming alone would reduce user coverage by 12%, which is unacceptable. Therefore, we primarily use PerformanceNavigationTiming and fall back to PerformanceTiming when the browser does not support it. The data formats of the two differ little: when using PerformanceTiming we ignore the domLoading node and treat the workerStart node as unavailable.

How should the W3C Timing specifications and web-vitals work together?

PerformanceNavigationTiming (hereinafter referred to as Timing): based on the W3C specifications, it focuses on measuring page performance from the perspective of browser processing.

Advantages:
Disadvantages:
web-vitals: currently contains six indicators: TTFB, FCP, LCP, FID, INP, and CLS, of which FID will be replaced by INP. TTFB, FCP, and LCP reflect page loading performance, FID and INP represent the page interaction experience, and CLS represents visual stability. Just these six indicators are enough to evaluate page loading, interaction, and visual stability. Note that several web-vitals indicators are themselves derived from the W3C Largest Contentful Paint, Layout Shift, Event Timing, and Paint Timing specifications, but the library offers better compatibility handling and stronger completeness.

Advantages:
Disadvantages:
Requirement: a true, objective, and comprehensive measure of page performance.

Conclusion: collecting both Timing and web-vitals data at the same time brings the following benefits:
To summarize the RUM technology selection: collect PerformanceNavigationTiming and web-vitals at the same time, and fall back to PerformanceTiming if the browser does not support PerformanceNavigationTiming.

What indicators are collected

Our requirements are to measure page performance objectively and comprehensively, to affect page performance as little as possible, and to help R&D colleagues quickly locate performance bottlenecks. Specifically, there are three requirements: 1) the indicators must be comprehensive and objective; 2) the bottlenecks of slow pages can be found; 3) on the premise of meeting the first two, page performance should be affected as little as possible.

Indicators should be comprehensive and objective

First, we collect the six web-vitals indicators, with the following results: Google plans to replace FID with INP in 2024. FID reflects the delay of the first interaction, while INP represents the longest delay among all interactions. We believe FID and INP each have their own usage scenarios, and keeping both is not contradictory.

Secondly, we processed PerformanceNavigationTiming; the result differs from the W3C reference diagram below. The reason is:
So, building on the W3C reference diagram, we represent the actual page loading process in the form of points, segments, and lines:
The specific algorithms for the segments and lines are as follows (see also the simplified sketch below):
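As an illustration only (the stage boundaries below are our assumptions, not the full production algorithm), such segments can be derived from adjacent PerformanceNavigationTiming timestamps, falling back to PerformanceTiming when necessary:

```ts
// Derive coarse loading segments from the Navigation Timing entry.
// Falls back to the deprecated PerformanceTiming when Level 2 is unavailable.
function getLoadingSegments() {
  const nav = performance.getEntriesByType('navigation')[0] as
    | PerformanceNavigationTiming
    | undefined;
  // Legacy interface uses absolute epoch timestamps, so normalize them
  // against navigationStart to match the Level 2 time origin.
  const legacy = performance.timing;
  const t = nav
    ? nav
    : {
        domainLookupStart: legacy.domainLookupStart - legacy.navigationStart,
        domainLookupEnd: legacy.domainLookupEnd - legacy.navigationStart,
        connectStart: legacy.connectStart - legacy.navigationStart,
        connectEnd: legacy.connectEnd - legacy.navigationStart,
        requestStart: legacy.requestStart - legacy.navigationStart,
        responseStart: legacy.responseStart - legacy.navigationStart,
        responseEnd: legacy.responseEnd - legacy.navigationStart,
        domContentLoadedEventEnd: legacy.domContentLoadedEventEnd - legacy.navigationStart,
        loadEventEnd: legacy.loadEventEnd - legacy.navigationStart,
      };

  return {
    dns: t.domainLookupEnd - t.domainLookupStart,               // DNS lookup
    tcp: t.connectEnd - t.connectStart,                         // TCP (and TLS) connection
    ttfb: t.responseStart - t.requestStart,                     // request sent to first byte
    download: t.responseEnd - t.responseStart,                  // response download
    domParse: t.domContentLoadedEventEnd - t.responseEnd,       // DOM parsing until DCL
    resourceLoad: t.loadEventEnd - t.domContentLoadedEventEnd,  // sub-resources until onload
  };
}
```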
In addition, to reflect page performance more comprehensively, the following indicators are also collected and counted:
Bottlenecks of slow pages can be found

Referring to the Lighthouse 50-point line and HTTP Archive site statistics, we define slow-page standards according to application type (PC or M-side). The specific thresholds are as follows: For slow pages, in addition to the PerformanceNavigationTiming and web-vitals data, the complete PerformanceEntry data collected at a low sampling rate, and the statistical data mentioned in the previous section, we also collect:
Do not affect page performance

RUM is inherently invasive: a JS SDK must be introduced into the page, which inevitably affects page performance. As a tool for discovering and analyzing performance problems, it should not make them worse. To minimize the impact, we did two things:

Asynchronous loading of the JS SDK: the page only includes a small loader script with a single responsibility. After the DOMContentLoaded event, the full-featured main JS file is loaded asynchronously via a dynamically created script tag (see the sketch below).

Reduce bandwidth usage:
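A minimal sketch of the asynchronous loading approach described above (the file name and the global queue here are illustrative, not our actual SDK):

```ts
// Loader snippet inlined in the page: prepares a buffer and defers the real SDK.
(window as any).__rumQueue = (window as any).__rumQueue || [];

function loadRumSdk(): void {
  const script = document.createElement('script');
  script.src = 'https://cdn.example.com/rum-sdk.main.js'; // illustrative URL
  script.async = true;
  document.head.appendChild(script);
}

// Defer the full SDK until after DOMContentLoaded so it does not compete
// with first-screen resources.
if (document.readyState === 'loading') {
  document.addEventListener('DOMContentLoaded', loadRumSdk);
} else {
  loadRumSdk();
}
```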
To sum up, we collect two main categories of indicators: PerformanceEntry and web-vitals.

How to evaluate the performance of non-homepage SPA pages?

With the indicators above, we can objectively and comprehensively evaluate the performance of regular pages and the home pages of SPA applications. However, non-homepage SPA pages do not go through a standard browser navigation. During SPA route switching:
Therefore, non-homepage SPA pages cannot obtain the above indicators at all, or cannot obtain them accurately, so for the time being we do not evaluate their performance. Given how difficult this problem is across the industry, ByteDance WebPro creatively introduced the concept of SPA_LOAD. The basic logic is: starting from the history.replaceState() call that triggers the route change, use MutationObserver to monitor DOM changes, resource loading, request sending, and other change events, and take the moment the page reaches a stable state as the end point; the time between the start and end points measures the performance of the non-homepage SPA page. SPA_LOAD is similar to the onload event of a regular page, but its start time is later than the actual route switch, and by the end time the page may already have finished loading. It is slightly imperfect, but it is the best solution available, and we may introduce it later.

Timing of indicator collection

Only by collecting real and accurate indicator values can we truly reflect page performance; otherwise we may mislead product and R&D colleagues into misjudging it. We therefore follow several principles when choosing when to collect indicators:
So how do we determine the collection time? First we analyze exactly when the two types of indicator data, PerformanceEntry and web-vitals, are generated. For PerformanceEntry data, by the time the onload event fires, the page is almost fully loaded and most of the PerformanceEntry data affecting first-screen loading has already been generated; the data not yet generated, such as the loadEventEnd value in PerformanceNavigationTiming, has little impact on evaluating page performance. Therefore, PerformanceEntry indicators can be collected when the onload event fires. The indicators in web-vitals are generated on different principles. When the onload event fires:
Based on the above considerations, we require that at least one of two conditions be met before collection is meaningful: the onload event has fired, or the page has been open for 5 seconds; only then are the PerformanceEntry and most web-vitals values accurate. In pursuit of the "more samples" principle, and given that the RUM SDK is loaded asynchronously after the onload event, we defined three collection moments depending on whether the page is closed normally, with the following characteristics: To prevent collection itself from affecting page performance, the RUM JS SDK is loaded asynchronously. By default, web-vitals exposes its indicators only through asynchronous callbacks; asynchronous loading combined with asynchronous callbacks can mean the values are not yet available at collection time. We therefore modified the web-vitals source code to support reading the current value of each indicator synchronously.

Reporting method

After collecting the indicators, an appropriate reporting method is needed to deliver them reliably to the backend. The reporting method has two parts: the reporting mechanism and the reporting timing. A suitable reporting mechanism must, first, meet the functional requirements, have high browser compatibility, and preferably impose no limit on data size; second, be able to detect failed reporting requests so that retries are possible, improving reliability; and finally, support a client-side timeout to avoid long-lived connections that increase pressure on backend services. There are four common reporting mechanisms: Image, XMLHttpRequest, sendBeacon, and the Fetch API. Their features are as follows:
The conclusions above are based on Chrome 114. Compared with Fetch, XMLHttpRequest offers almost the same functionality with higher compatibility. Compared with Image, XMLHttpRequest has the advantages of high compatibility, no data size limit, the ability to detect failures, and configurable timeouts. XMLHttpRequest is therefore the preferred mechanism for reporting indicator data while the page is open. As for sendBeacon, despite its shortcomings, it can still report data when the page is being closed, with a relatively high success rate, so it is suitable for sending any indicator data not yet reported at page close. Since sendBeacon is used when the page closes, how do we determine that the page is closing? The traditional approach is to listen to the unload or beforeunload event, which has two shortcomings:
A more appropriate and modern solution is to listen for the pagehide event or for visibilitychange with a value of hidden. This avoids the two shortcomings of the traditional approach and has better compatibility. For SPA projects, where we do not collect non-homepage performance indicators, triggering the History.replaceState() method is also treated as leaving the page. To summarize, the overall reporting mechanism is: 1) while the page is open, the SDK collects performance indicator data when a collection moment arrives and reports it via XMLHttpRequest; 2) by listening for pagehide or visibilitychange === hidden, when the page is being closed and any of the collection conditions is met, the indicators are collected immediately and reported via sendBeacon.

Overall solution

To sum up, we organized the RUM implementation outline as follows: In the actual coding process, the specific processing flow is as follows:

Pros and cons

During implementation, we summarized the advantages and disadvantages of RUM as follows: The RUM architecture is complex to design and costly to implement, but since it is a hard technical requirement, we can only invest the resources and do it well. For the lack of diagnostic results and optimization suggestions, RUM and SYN complement each other: SYN is used to diagnose the distribution of slow-page bottlenecks. As for the problem that, without a score, a page's performance cannot be judged, we take three steps: first, we set a slow-page standard to determine whether a single page is fast or slow (the threshold values were described earlier); second, we compute the AVG, TP50, TP90, and TP99 values of each important indicator of a page to evaluate the performance distribution of all requests to that page; finally, we score the application the page belongs to and tell R&D colleagues directly how the application performs. The specific approach to application scoring is discussed in detail in the later section "Establishing a Judging System".

3. Overall Architecture Design

The previous sections analyzed in depth the characteristics, usage, advantages, and disadvantages of SYN and RUM. SYN and RUM each have their strengths and neither can replace the other; it is best to use both to build a systematic performance monitoring architecture, provide a performance monitoring and analysis tool chain, and support product and R&D colleagues in discovering and locating page performance problems at every stage of DevOps. During the coding, building, and testing phases, R&D colleagues can use SYN to run performance tests and determine whether a page meets the standard; if there are problems, SYN provides diagnostics and optimization suggestions. This addresses long-standing pain points: front-end page performance is hard to test, and there is no standard for the quality of delivered pages. After the application is deployed, RUM collects real user page performance and evaluates actual performance. If slow pages remain, SYN can still be used to locate their bottlenecks, and its diagnostic results and optimization suggestions improve optimization efficiency.
In addition, SYN can be used to benchmark against competitors' products, so that we know both ourselves and our competitors and stay one step ahead. With SYN and RUM together, we build a closed loop of continuous online optimization of slow pages, as shown in the figure above. RUM is responsible for collecting the slow pages generated by real users; after storage and aggregation in the backend, SYN jobs are created automatically for the top 10 slow pages. When a job completes, its diagnostic results and optimization suggestions are sent to R&D colleagues as alerts. R&D colleagues then use the SYN suggestions together with tools such as DevTools and webpack to improve the pages, deliver higher-quality pages, and reduce the frequency of slow pages. This cycle repeats, continuously optimizing application page performance until the application finally reaches its best possible performance.

4. Establishing a Judging System

Having introduced SYN, developed our own RUM SDK, and collected a large amount of SYN and RUM indicator data, we set about establishing an evaluation system: evaluate the performance of each application, team, department, and even the entire company, output a few key indicators and scores, and intuitively show employees at every level and in every role the page performance of their own organization and of peer organizations. By comparing scores with peers, teams can decide whether to optimize page performance. When establishing the evaluation system, we adhered to the principle of highlighting key indicators while still reflecting page performance comprehensively, objectively, and truthfully. We therefore divided the evaluation system into two parts: first, displaying the most critical performance indicators; second, outputting performance scores and grades at the indicator, application, team, department, and company levels.

Display the most critical performance indicators

We select the indicators that best represent page performance, one each from web-vitals and PerformanceNavigationTiming, namely LCP and DCL. LCP has limited compatibility and can be forged; DCL partially reflects first-screen performance, has high compatibility, and is hard to forge. The two complement each other and together best reflect page performance. We use the TP90 statistic, which represents the lower bound of the experience of 90% of users. Compared with AVG, TP50, and TP75, it covers a wider range of users, and it also shields a small amount of dirty data produced in unusual mobile network environments, so it is more representative.

Output ratings at all levels

Application performance rating

To evaluate application page performance comprehensively and objectively, we select representative indicators in each dimension, refer to the industry indicator distribution published by HTTP Archive, and set different scoring baselines according to application characteristics such as application type (PC or M-side) and end users (C-side users, B-side customers, or internal employees), then calculate a score for each indicator. We then set weights according to the importance of each indicator and obtain the application performance score through a weighted average.
The process is as follows. Obtaining the application performance score involves several important steps: 1) screening indicators and setting weights; 2) selecting a scoring algorithm; 3) establishing indicator baselines; and 4) using a weighted average to calculate the application performance score. These are described below.

Indicator screening and weight setting

In this process we mainly considered two points.

Comprehensive coverage. Page performance covers many aspects. Traditionally only a few indicators, such as first byte, white screen, and first screen time, are measured, which is one-sided and not objective enough. We believe that judging page performance should cover indicators from multiple dimensions, such as page loading, interaction experience, and visual experience. In addition, we introduced the concept of the slow API ratio: the proportion of API requests that take more than 3 s after the page is opened. Slow APIs can prolong first-screen loading and also hurt the user experience during interaction. To make R&D colleagues pay attention to slow pages and use the online SYN service as a routine performance evaluation tool during development, we also include the SYN scores as weighted indicators. Every day the backend counts visits to the top 10 slow pages and automatically creates scheduled SYN tasks; after each task runs and is analyzed, the optimization suggestions and diagnostic results are sent to R&D colleagues. The SYN scoring items include the performance score and the best-practices score, both on a 100-point scale.

Highlight the key points. Important indicators receive a higher weight. For example, loading indicators are the most critical in deciding whether a page is usable, so they receive the largest share of the weight. LCP is the most important loading indicator, so its weight is raised accordingly. Because LCP itself is not always entirely reasonable and can be forged, other loading indicators such as DCL, FCP, TTFB, WindowLoad, and PageLoad are also included when judging loading performance. The advantage of this practice is that many indicators across wide dimensions make the evaluation more comprehensive, objective, and accurate; the disadvantage is that it increases the complexity of the evaluation system. Based on these two points, we settled on the indicator selection and weight ratios shown in the figure below. Each indicator participates in scoring with its TP90 statistic.

Select the scoring algorithm

The scoring algorithm mainly follows the Lighthouse scoring curve model. The basic principle is to set two control points, the indicator values at which the score is 50 and 90 points (the median m and p10), and then generate a log-normal curve from these two control points. With this curve, the score corresponding to any indicator value can be obtained. The figure below shows the scoring curve for LCP on the M side: m represents the median. In the figure the m value is 4000 ms, meaning that an LCP of 4000 ms scores 50 points; similarly, an LCP of 2520 ms scores 90 points. Once m and p10 are set, the scoring curve model on the right is generated, from which a score can be obtained for any LCP value from 0 to positive infinity.
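A compact sketch of such a log-normal scoring curve (our own approximation of the published Lighthouse approach; the 4000 ms / 2520 ms control points are the M-side LCP values quoted above):

```ts
// Lighthouse-style log-normal scoring curve.
// The two control points are the median (score 0.5) and p10 (score 0.9).

const ERFC_INV_ONE_FIFTH = 0.9061938024368232; // erfc^-1(0.2)

// Abramowitz–Stegun style approximation of the error function.
function erf(x: number): number {
  const sign = x < 0 ? -1 : 1;
  x = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * x);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-x * x));
}

// Returns a score in [0, 1] for a metric value, given the two control points.
function logNormalScore(median: number, p10: number, value: number): number {
  if (value <= 0) return 1;
  const standardized =
    (Math.log(value / median) * ERFC_INV_ONE_FIFTH) / Math.log(median / p10);
  return Math.min(1, Math.max(0, (1 - erf(standardized)) / 2));
}

// Example: the M-side LCP curve described above.
console.log(logNormalScore(4000, 2520, 4000).toFixed(2)); // ≈ 0.50
console.log(logNormalScore(4000, 2520, 2520).toFixed(2)); // ≈ 0.90
```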
Establish indicator baselines

Indicator baselines provide the two control points for the scoring algorithm, namely the values of m and p10. There are three ways to establish them:
When establishing baselines, differences in application value and R&D requirements should be considered, and the settings should be tailored to application characteristics. First, PC and M-side applications should have different indicator baselines; fortunately, both Lighthouse and HTTP Archive provide reference configurations for PC and mobile. Second, adjustments should be made according to the actual end-user type: C-side and B-side applications directly generate business value, so their performance requirements should be higher than those of internal applications, and the values of their two control points should be lower.

Use a weighted average to calculate the application performance score

From the indicator's TP90 value, the indicator baseline, and the scoring algorithm, we obtain the indicator's score on a 100-point scale. The application performance score is then the weighted average of the indicator scores: score = (Σ (indicator score × indicator weight)) / (Σ weights), where Σ denotes summation.

Ratings of organizations at all levels

The performance scores of organizations at all levels, including teams, departments, and the company, are likewise obtained with a weighted average over the applications they own, based on the application performance scores and application weights. The process is as follows: The difficulty in computing organizational performance scores is how to set application weights. We mainly refer to the application's PV range and application level: the larger the PV and the higher the application level, the greater the weight. The reference configuration is as follows: The PV value here is an estimated PV rather than the real PV; it is calculated as the amount of RUM data collected divided by the sampling ratio. After the above steps, we obtain application performance scores and performance scores for organizations at all levels, such as team, department, and company scores. The display looks as follows: The score thresholds follow the Lighthouse standard: a score of 50 or above is considered qualified, and 90 or above is considered excellent.

5. Conclusion

Through building the page performance monitoring and evaluation system, we now have raw page performance data, aggregate statistics, and final scores. Together, ratings, statistics, and raw data complete the chain of discovering, locating, and analyzing performance problems. R&D colleagues can judge intuitively from the rating whether application performance needs to be optimized; if optimization is needed, aggregate statistics help analyze the application's bottlenecks; when pinpointing a specific bottleneck, detailed data helps analyze its concrete cause; after improvements, the effect can be seen in the statistics and is ultimately reflected in a higher score. Due to space limitations, this article only covers the practices around building the performance monitoring and evaluation system. A complete page performance monitoring system should also include monitoring, alerting, optimization, governance, and other modules.
Indicators alone are not enough: they can measure page performance, reveal bottlenecks, and help R&D colleagues optimize more efficiently, but we also want to address problems at the root. By building a series of front-end tool chains and improving the delivery process, we can improve the quality of delivered pages at the source through specifications, tools, and processes, avoid shipping problems to production, and discover performance problems before users do.

6. References

vitals-spa-faq
Web Vital Metrics for Single Page Applications
Beaconing In Practice
HTTP Archive: a project that collects and analyzes website performance data, helping web developers understand technology trends on the web and best practices for performance optimization.