Background

Why does Douyin need to continuously optimize its image capabilities?

As one of Douyin's most basic capabilities, image loading serves all of Douyin's businesses. With the growth of image-heavy services such as text posts, e-commerce, and IM, the volume of images loaded keeps increasing, and so do the corresponding bandwidth costs. To reduce image costs and improve the user's browsing experience, it is necessary to continuously explore and optimize image capabilities: improve loading speed and reduce overall cost while preserving display quality, making images "good, fast, and economical".

About BDFresco

BDFresco is a general-purpose network image framework for Android, extended and optimized by the veImageX team of Volcano Engine on top of the open-source Fresco. It mainly provides capabilities such as network image loading, image decoding, basic image processing and transformation, service quality monitoring and reporting, self-developed HEIF software decoding, memory caching strategies, and cloud-controlled configuration delivery. It currently covers almost all ByteDance apps. The following sections introduce, from Douyin's perspective, the image optimizations Douyin has made on top of BDFresco.

Optimization ideas

The complete loading process of a network image is as follows: the client obtains business data over the network, and the response includes the corresponding image data. Handing an image URL to BDFresco starts the image loading process. BDFresco checks whether the image is in the memory cache or disk cache; if it is, the cached data is decoded or rendered directly. If not, the image resource is downloaded from veImageX-CDN to the device and then decoded and rendered.
The image loading process not only occupies client memory, storage, CPU and other resources, but also consumes network traffic and server resources. It is essentially multi-level cache logic and can be divided into four core stages: memory cache, image decoding, disk cache, and network loading. Combined with the indicator monitoring system, each stage can be optimized separately:
Optimization process

Indicator construction

Before optimizing images, it is necessary to take stock of overall image quality data, and building an indicator system is a crucial step. With an indicator system we can understand the current status of images, determine the optimization direction, and evaluate the effect of each optimization. BDFresco provides log reporting; the reported image logs are cleaned by veImageX's cloud data pipeline, and the resulting image quality indicators can be viewed in the veImageX console. A complete monitoring system has been established covering the whole path from triggering an image load through the memory, decoding, disk, and network stages, with hundreds of indicators such as loading time, success rate, client and CDN cache hit rate, file size, memory usage, and abnormally large image monitoring at each stage.

Specific measures

1 Memory cache optimization

1.1 Memory lookup optimization

Memory cache principle

BDFresco implements the image loading pipeline with a Producer/Consumer interface: tasks such as network data acquisition, cache data acquisition, and image decoding are handled by different Producer implementation classes. Producers are nested one inside another, and results are consumed by Consumers. A simplified view of the memory cache logic: memory and disk cache reads are matched through a cache key, and the cache key is derived from the Uri, which can be simply understood as cacheKey == uri. Douyin previously ran an experiment to optimize the cache key: for different domain names pointing to the same resource, the host and query parameters are removed, i.e. the cacheKey is simplified to scheme://path/name.

Optimization plan

When loading an image, BDFresco supports passing in an array of Uris that all point to the same resource on different veImageX-CDN addresses. Internally, this batch of Uris (A, B, C) is recognized as the same cache key.
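The simplified cache-key scheme described above can be sketched as follows. This is a minimal illustration, not BDFresco's actual API; the class and method names are hypothetical. The key idea is that stripping the host and query string lets mirrors of the same resource on different CDN domains share one cache entry.

```java
import java.net.URI;

// Hypothetical sketch of the simplified cache-key scheme: strip the host
// and query so different CDN domains for the same resource share one key.
public class CacheKeys {

    // Default behaviour, cacheKey == uri: the full URI string is the key,
    // so mirrors of the same resource get distinct cache entries.
    public static String fullKey(String url) {
        return url;
    }

    // Simplified key: scheme://path/name (host and query removed), so
    // https://cdn-a.example/img/1.webp?sig=x and
    // https://cdn-b.example/img/1.webp?sig=y collapse to one key.
    public static String simplifiedKey(String url) {
        URI uri = URI.create(url);
        return uri.getScheme() + "://" + uri.getPath();
    }
}
```

With the simplified key, a resource cached after downloading from one CDN domain is a memory/disk cache hit when requested later via any other domain.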
As shown in the figure below, the three Uris A, B, C are not processed strictly in the order [A full-pipeline lookup -> B full-pipeline lookup -> C full-pipeline lookup]; instead, a memory cache lookup is first performed for each of A, B, and C, and only then is the full pipeline executed for them in order. Since A, B, and C are the same resource under different domain names, the cache keys generated on the client are identical, so the extra memory cache lookups are wasted work. Because this step runs on the UI thread, and Douyin has many image scenes where a single swipe triggers multiple image loads, some scenes suffered stutters and dropped frames. By removing the unnecessary memory lookups, the overall frame rate improved significantly.

1.2 Splitting the animated and static image caches

The memory cache size for Douyin images was configured based on the Java heap size, with a default of 1/8, i.e. 32 MB or 64 MB. Since Android 8, bitmap pixel data is no longer stored on the Java heap but on the native heap, so configuring the image memory cache from the Java heap size is no longer reasonable. We therefore tried doubling the memory cache size, hoping to reduce decoding work and improve the OOM and ANR indicators. The post-experiment stability indicators showed that although OOM decreased, the problem shifted into native crashes and ANR, which deteriorated significantly; the experiment did not meet expectations. An image cache's hit rate is positively correlated with its size: the larger the cache, the higher the hit rate, but the marginal improvement shrinks as the cache grows. Based on the experimental results, simply increasing the cache size raises the memory water level and causes ANR and native crash problems, so this approach is not feasible.
At the time, animated and static images shared the same memory cache block, managed by BDFresco with an LRU eviction strategy. If many animation frames are played, static image entries are easily evicted; switching back to a static image then requires re-decoding, which inevitably costs performance and degrades the user experience. Douyin has many such mixed scenes, for example IM and the personal page, where animated and static images appear together. Given that directly increasing the memory cache size leaves little room to improve the hit rate, we instead isolated the animated and static image caches. Giving animated and static images one memory cache each effectively improves the hit rate and reduces decoding work. Final experimental benefits:
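The cache isolation described above can be sketched with two independent LRU caches, so a burst of animation frames cannot evict still images. This is an illustrative sketch with hypothetical names and sizes, not BDFresco's real cache implementation; `byte[]` stands in for bitmap data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of isolating animated and static images into two independent
// LRU caches (names and sizes are illustrative, not BDFresco's real API).
public class SplitBitmapCache {

    // Minimal LRU built on LinkedHashMap's access-order mode.
    static class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;
        LruCache(int maxEntries) {
            super(16, 0.75f, true); // accessOrder=true gives LRU eviction order
            this.maxEntries = maxEntries;
        }
        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries;
        }
    }

    private final LruCache<String, byte[]> staticCache;
    private final LruCache<String, byte[]> animatedCache;

    public SplitBitmapCache(int staticEntries, int animatedEntries) {
        staticCache = new LruCache<>(staticEntries);
        animatedCache = new LruCache<>(animatedEntries);
    }

    public void put(String key, byte[] bitmap, boolean animated) {
        (animated ? animatedCache : staticCache).put(key, bitmap);
    }

    public byte[] get(String key, boolean animated) {
        return (animated ? animatedCache : staticCache).get(key);
    }
}
```

With the split, animation frames only compete with other animation frames for cache slots, so static covers stay decoded while an animation plays nearby.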
2 Image decoding optimization

2.1 Decoding format optimization

The memory occupied by a decoded bitmap is: memory size = image width × image height × bytes per pixel. The number of bytes per pixel is determined by the color mode Bitmap.Config, i.e. the ARGB color channels, of which there are six main types:
Currently, Douyin mainly uses two configurations, ARGB_8888 and RGB_565. ARGB_8888 supports transparent channels and has higher color quality. RGB_565 does not support transparent channels, but the overall memory usage is reduced by half. The optimization ideas of Douyin are as follows:
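The memory difference between the two configurations follows directly from the formula above. The sketch below computes the footprint for both; the enum is a stand-in for Android's `Bitmap.Config` so the example runs off-device, but the per-pixel byte counts match the real configurations.

```java
// Sketch of the bitmap memory formula: memory = width * height * bytesPerPixel.
// The enum mirrors Android's Bitmap.Config byte counts for the two
// configurations discussed; it is a stand-in, not the Android class.
public class BitmapMemory {

    public enum Config {
        ARGB_8888(4), // 8-bit alpha + R/G/B channels: transparency, best quality
        RGB_565(2);   // 5/6/5-bit R/G/B, no alpha: half the memory footprint
        final int bytesPerPixel;
        Config(int bpp) { this.bytesPerPixel = bpp; }
    }

    public static long bytes(int width, int height, Config config) {
        return (long) width * height * config.bytesPerPixel;
    }
}
```

For a full-screen 1080×1920 image, ARGB_8888 needs about 8.3 MB while RGB_565 needs about 4.1 MB, which is why opaque images are good candidates for RGB_565.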
2.2 HEIF decoding memory optimization

Optimization principle: BDFresco's original HEIC decoding logic called the decoder's interface through JNI, returned the decoded pixel data to the Java layer, and converted it into a Bitmap object there for display. This logic allocated very large temporary objects, causing Java memory overhead and GC pressure. After optimization, the creation of large objects is reduced and the Bitmap is constructed directly in the native layer, which is expected to shorten HEIF decoding time and improve fluency to some degree. The decoding flow was changed from the original process to the optimized process accordingly. Before the fix, two large arrays were used when decoding each HEIC image:
After the fix, no large arrays are used in the Java layer; only a 40 KB–700 KB DirectByteBuffer in the native layer is used. This eliminates the creation of two large Java-layer arrays, reduces the probability of GC and OOM problems caused by large allocations, and thus brings fluency and ANR benefits. Experiments on Douyin showed significant improvements in performance indicators: Java memory usage dropped, HEIC decoding time dropped, and Android ANR decreased, which noticeably grew image-and-text consumption and lifted overall usage-time metrics.

2.3 Adaptive control-size decoding

Earlier we mentioned that more than 15% of images waste over double the required size, so a large amount of memory is requested in the decoding stage even though such a large bitmap is not needed for display on the control. By resizing during decoding to the control size and producing a small-resolution bitmap, the decoding memory allocation can be minimized. However, since image waste is mainly caused by overly large images delivered by the server, limiting the size only at the decoding stage does not solve the problem of large images at the network stage: bandwidth waste and long network loading times remain. We therefore moved this optimization forward into the network loading stage; for the specific approach, see the on-demand scaling solution in Section 4.2.

3 Disk cache optimization

Optimizing the client's disk cache configuration improves the cache hit rate and reduces the number of image requests, cutting image bandwidth cost while increasing loading speed. There are three types of disk cache: main disk, small disk, and independent disk. Each disk has an upper size limit and uses LRU eviction. Douyin currently mainly uses the main disk and independent disks.
The overall process is as follows: images are stored on the main disk by default, where the probability of eviction is high. If a business specifies an independent-disk cacheName, those images use a separate disk and are much less likely to be evicted.
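The routing rule above can be sketched as follows. The class, method, and disk names are hypothetical, not BDFresco's real API; the point is only the default-vs-named routing decision.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of disk-cache routing: images go to the shared main
// disk by default; a business that passes an independent-disk cacheName
// gets its own disk, whose entries face far less eviction pressure.
public class DiskCacheRouter {

    private final Map<String, String> independentDisks = new HashMap<>();

    public String route(String cacheName) {
        if (cacheName == null || cacheName.isEmpty()) {
            return "main-disk"; // shared by all businesses, high eviction churn
        }
        // One separate disk per registered cacheName, created on demand.
        return independentDisks.computeIfAbsent(
                cacheName, name -> "independent-disk:" + name);
    }
}
```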
4 Network loading optimization

4.1 Image format optimization

Common image formats
HEIC format promotion

Currently, the best format supported by the veImageX platform is HEIC. However, at the beginning of 2022, HEIC coverage on Douyin's Android side was below 50%. Directly increasing the proportion of HEIC in the business can significantly reduce bandwidth costs and increase image loading speed.
When promoting the HEIF animated image experiment, we found that the frame rate of the personal page UI deteriorated significantly, dropping 6–8 frames on both high-end and low-end devices, so the experiment could not launch. To address this, we optimized the decoding cache logic of HEIF animated images and proposed an independent cache solution for them.

HEIF animated image independent cache

How animated images play: after the image file is downloaded and parsed into a byte stream, BDFresco pre-decodes frames before playback officially starts. During playback, Bitmaps are rendered to the screen in the order determined by the animation scheduler, and the next frame is pre-decoded while the current one plays; for example, when frame 5 is due to play, frame 6 is decoded concurrently. Pre-decoding runs on a worker thread. The core difference between schedulers is what happens when the worker thread decodes too slowly and the next frame's Bitmap does not yet exist: either keep showing the current frame and wait for the worker thread, or advance to the next frame and decode and render it directly on the main thread.
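The scheduler decision just described can be sketched as below. This is an illustrative model, not BDFresco's real scheduler classes; the names are hypothetical.

```java
import java.util.Set;

// Hypothetical sketch of the animated-image scheduler decision: while
// frame N plays, frame N+1 is pre-decoded on a worker thread; if it is
// not ready when due, the scheduler either repeats the current frame
// (waiting for the worker) or decodes on the main thread.
public class FrameScheduler {

    public enum Action { RENDER_NEXT, REPEAT_CURRENT, DECODE_ON_MAIN_THREAD }

    private final boolean preferRepeatOverMainThreadDecode;

    public FrameScheduler(boolean preferRepeatOverMainThreadDecode) {
        this.preferRepeatOverMainThreadDecode = preferRepeatOverMainThreadDecode;
    }

    // decodedFrames holds frame indices already pre-decoded by the worker.
    public Action nextAction(int currentFrame, Set<Integer> decodedFrames) {
        int next = currentFrame + 1;
        if (decodedFrames.contains(next)) {
            return Action.RENDER_NEXT; // worker kept up: render the ready frame
        }
        return preferRepeatOverMainThreadDecode
                ? Action.REPEAT_CURRENT          // smooth UI, slower animation
                : Action.DECODE_ON_MAIN_THREAD;  // on-time frame, UI jank risk
    }
}
```

The two fallback branches are exactly the trade-off the text describes: repeating the current frame keeps the main thread free but slows the animation, while main-thread decoding keeps the animation on time at the cost of possible jank.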
Independent cache: investigating the HEIF animation frame-drop problem revealed that HEIF animations used a new playback scheduling logic, FixedSlidingHeifFrameScheduler, with no pre-decoding at all: each frame is decoded directly on the main thread when it needs to be played, i.e. decode one frame, play one frame. This causes HEIF animations to occupy a lot of main-thread CPU for decoding during playback. Why must HEIF animations be decoded on the main thread? Unlike other animated formats that support decoding any frame independently, HEIF animations use inter-frame compression and introduce the concepts of I-frames and P-frames. An I-frame is a key frame containing the complete information of the current image and can be decoded independently. A P-frame is a difference frame that does not carry complete picture data, only the differences from the previous frame; it cannot be decoded independently, and its decoding depends on the previous frame's data. Since BDFresco's memory cache on Android uses LRU eviction, a Bitmap may be recycled at any time, so HEIF animation frames must be decoded strictly in playback order; otherwise artifacts such as corrupted or green frames appear during playback. Solution thinking:
After experimentation, we finally adopted the independent cache solution, which achieved the bandwidth benefits while keeping the personal page frame rate from degrading significantly.

4.2 Scaling on demand

Background

The image loading process ultimately renders the decoded bitmap on a control. When the bitmap is larger than the control, the extra resolution does not affect the user's perception: the pixels shown can never exceed the space occupied by the control. When the image size far exceeds the control size:
Solution

When an image is displayed, the corresponding bitmap and control sizes are reported. The reported data shows that a large number of business requests fetch images much larger than their controls, so a general solution is needed. On the premise of preserving image quality, the client provides a set of control specifications that converge each request to a fixed size based on the control size, keeping the image size basically consistent with the display control while also reducing size fragmentation. Double-column cover scenes exist in multiple businesses such as the personal page, the local tab, and recommendations. Taking a double-column cover as an example, the benefits:
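The convergence to a fixed set of control specifications can be sketched as follows. The bucket values and names here are illustrative assumptions, not Douyin's real specification: the requested width is rounded up to the nearest spec bucket so the CDN serves a handful of fixed sizes instead of arbitrary ones.

```java
// Sketch of converging request sizes to a fixed set of control specs
// (bucket values are illustrative, not Douyin's real specification):
// round the control width up to the nearest bucket so the image stays
// at least control-sized while size fragmentation collapses to 5 values.
public class SizeSpec {

    private static final int[] WIDTH_BUCKETS = {240, 360, 480, 720, 1080};

    public static int snapWidth(int controlWidthPx) {
        for (int bucket : WIDTH_BUCKETS) {
            if (controlWidthPx <= bucket) {
                return bucket; // smallest spec that still covers the control
            }
        }
        return WIDTH_BUCKETS[WIDTH_BUCKETS.length - 1]; // cap at the largest spec
    }
}
```

Snapping upward rather than downward preserves quality (the delivered image is never smaller than the control), while capping at the largest bucket bounds the worst-case download for oversized controls.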
5 Abnormal recovery

Although we have made a series of optimizations to the image loading process, Douyin simply has a huge number of images, and some businesses such as e-commerce and IM require high image clarity and support operations such as zooming and long-image display, so they load very large images directly into memory; a single image can occupy 100 MB+ of memory. Large allocations are requested in the disk IO stage, during decoding, and in Bitmap copies alike, eventually leading to stutters, ANR, and even OOM crashes. A backup solution is therefore needed to solve the frequent image OOM problem and improve the reliability of image loading. When system memory approaches its ceiling, Douyin relieves pressure by releasing image memory: it listens for the system's memory warning callbacks and releases image memory caches of different sizes according to the warning level, reducing the probability of OOM and ANR. However, because very large images still exist, a large number of OOMs remained.

OOM backup

Memory is a global indicator, and the cause of an exception cannot be determined directly from the OOM stack: when an OOM occurs the memory may already be at a high water level, and even a small allocation can trigger the exception. However, most of the top 5 crash stacks were image-related, so it is reasonable to suspect that the app's frequent large image allocations were the cause. Therefore, for the high-frequency image decoding and memory copy logic, backup logic was added: when the code hits an OOM, it is caught proactively, and some memory is released by clearing the image memory caches to lower the memory level:
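The backup logic can be sketched as a catch-and-retry wrapper. This is a minimal illustration with hypothetical names, not BDFresco's real implementation: the decode is attempted, and on OutOfMemoryError the image caches are cleared before one retry with the freed headroom.

```java
import java.util.function.Supplier;

// Hypothetical sketch of the OOM backup: catch OutOfMemoryError from a
// high-frequency decode/copy path, clear image memory caches to lower
// the memory level, then retry once. A second OOM propagates.
public class OomFallback {

    public static <T> T decodeWithFallback(Supplier<T> decode,
                                           Runnable clearImageCaches) {
        try {
            return decode.get();
        } catch (OutOfMemoryError oom) {
            clearImageCaches.run(); // drop cached bitmaps to free headroom
            return decode.get();    // single retry; do not loop on OOM
        }
    }
}
```

Catching `OutOfMemoryError` is only safe like this because the handler does nothing but release memory and retry; the single-retry limit avoids spinning when memory is genuinely exhausted.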
The experimental results show that although some OOMs were converted into native crashes, the overall impact on users was greatly reduced, and the experiment met expectations.

Summary

Overall, after building full-link image monitoring, Douyin made many data-driven optimizations to the image loading process.
From the perspective of benefits, they fall roughly into two areas: cost optimization and client experience optimization. The cost benefit is mainly the reduction of image bandwidth costs, while the experience benefit is reflected in daily active usage and OOM indicators. As the optimizations are rolled out to more business lines, the benefits keep growing. This article briefly introduced Douyin's best practices, experience, and business benefits in image optimization based on BDFresco. Due to space limitations, details such as the exploration process and concrete implementations were omitted, but we still hope it provides some inspiration or reference for colleagues in the industry. BDFresco has been integrated into Volcano Engine's veImageX product and is open to the industry; to experience the same image optimization capabilities as Douyin, you can apply on the Volcano Engine veImageX official website.