1. Introduction

Editing tools now generally offer one-click film-making, which spares users the difficulty of editing and packaging special effects themselves. The mainstream industry practice is to identify and extract highlights from the video material a user uploads, apply a special-effects template in post-production, and output the finished film. This processing crops the video so that each clip fits the length of the template slot it fills.

Bilibili started working on smart film-making in July 2022. The first version supported only "image to video": it added simple music packaging to the images selected by the user and converted them into a video. The basic process is as follows:

[Figure: basic process of the image-to-video feature]

In October 2022 we began the second version, which supports video material and extends the dimensions of special-effects packaging. In addition to the industry-standard template effects, it adds smart music and automatically converts the audio in user material into subtitles. The new smart film-making business process:

[Figure: business process of the second version]

Together, the first and second versions complete the basic special-effects packaging of a smart film, namely its three basic smart elements: template, soundtrack, and ASR subtitles.

For historical reasons, and under rapid business iteration, smart film-making only met the goal of launching quickly, that is, building from 0 to 1. No core observable metrics were defined for its performance, and basic experience problems accumulated, such as effect problems reported by internal users and long film-making times:

[Figure: problems reported by internal users]

This article discusses how we optimized Bilibili's smart film-making in practice along two dimensions: efficiency and effect.

2. Observability Data Construction

Based on the overall business flow of smart film-making, we sorted out its three core links and the sub-links under each. We first defined two key availability indicators.

Synthesis time: the total time from the start to the end of smart film-making after the user selects material. We use P90 as the reference.

Synthesis effect: this is harder to define and extract. Three dimensions can be optimized:

● Material application success rate: raise the success rate with which each smart film-making sub-link (basic template, soundtrack, ASR subtitles) applies its material. Each sub-link that succeeds makes the final restored effect richer. This definition is somewhat idealized, but the per-sub-link success rate is quantifiable in business terms.
● Richness of the template's restorable atomic capabilities: complete the atomic capabilities of the template (the special-effects package set) itself so that templates have richer sub-elements. This dimension supplements basic capabilities by business means to improve template effects and is not discussed further here.
● Accuracy of material recommendations: the recommended templates, music, and other packaging effects should closely match the content of the material the user selected. Template and music recommendation depends on the picture label recognition rate of the AI model. That rate is mainly evaluated manually and stands at 41% (68% for P0 picture labels). Improving it depends on upgrading the picture recognition model itself, which this article does not cover in detail.
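The two indicators can be computed from per-session records. The following is a minimal sketch, not the production pipeline: it assumes a nearest-rank P90 and hypothetical record fields (`template`, `music`, `asr_subtitle`), none of which are specified in this article.

```python
from typing import Dict, Iterable, List, Sequence

# Hypothetical field names for the three sub-links.
SUB_LINKS = ("template", "music", "asr_subtitle")


def p90(durations_s: Sequence[float]) -> float:
    """Nearest-rank 90th percentile of synthesis times, in seconds."""
    if not durations_s:
        raise ValueError("at least one sample is required")
    ordered = sorted(durations_s)
    rank = (9 * len(ordered) + 9) // 10  # ceil(0.9 * n) as a 1-based rank
    return ordered[rank - 1]


def application_success_rates(sessions: Iterable[Dict[str, bool]]) -> Dict[str, float]:
    """Share of sessions in which each sub-link's material was applied."""
    rows: List[Dict[str, bool]] = list(sessions)
    if not rows:
        raise ValueError("at least one session is required")
    return {
        link: sum(1 for row in rows if row.get(link, False)) / len(rows)
        for link in SUB_LINKS
    }
```

Reporting the rate per sub-link, rather than as one aggregate, keeps each link's contribution visible; how the sub-link rates combine into the overall figure is not defined in the text, so the sketch leaves that out.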
Of the three dimensions above, we chose "material application success rate" as the main quantitative indicator of synthesis effect, and this article focuses on optimizing it. With the basic indicators defined, we worked out, from a global view of smart film-making, the observation data that still had to be built:

[Figure: observation data to be built for smart film-making]

3. Performance Optimization

Initial data showed a P90 smart film-making time of 20 seconds (essentially hitting the timeout limit) and a material application success rate of 46%. Overall usability was poor. With this baseline quantified, we examined the three core links for optimization points and worked through them. The points investigated are as follows:

[Figure: overview of the points investigated]

3.1 Template link optimization

Initially the template download success rate was only 91%, and the P90 time from frame-extraction recommendation to download completion was 19 seconds. The overall business chain of template recommendation, from the material page to the smart film synthesis page, is as follows:

[Figure: template recommendation business chain]

We optimized the following key points.

3.1.1 Duplicate resource downloads

There are two main types of duplicate resource downloads:
[Figure: the two types of duplicate resource downloads]

These are two typical duplication problems. Removing the unnecessary downloads saves download time; the P90 time dropped by 2s.

3.1.2 Special resource transcoding issues

We collected smart film-making traces that timed out (the service is configured with a 20s timeout), analyzed 80+ bad cases, and checked the cause of each timeout. Two scenarios accounted for most of them:

[Figure: the two timeout scenarios]

1) On iOS, subtitle downloads often ran until the 120-second timeout. After repeated attempts, we found a bug in the business downloader: when several subtitle fonts were downloaded together, the download got stuck and returned a result only when the download task timed out after 120 seconds.

2) Templates whose sub-elements contain GIF material were prone to timeouts. Analysis showed that the third-party editing SDK used by the business defines a private material format: on the template consumer side, GIF material is transcoded into the SDK's custom CAF format. This transcoding is slow and easily times out.
[Figure]

Optimizing the second scenario requires changes across a series of links:

From the perspective of business iteration and of upstream template production and maintenance costs, the preferred solution is "the self-developed editing SDK supports the CAF format". From the perspective of material format standardization, "Meishe supports reverse conversion of CAF material to GIF" is preferred. In the end we chose "the self-developed editing SDK supports the CAF format", which solved the problem at low cost. After the material format conversion and the multi-font subtitle issue were fixed, the P90 time dropped significantly, to 12s.

[Figure]

3.1.3 Template resource size: production standardization and version compatibility

Smart film templates are generally produced by internal designers. Early on, materials were not compressed to any standard when they were added to the library, and template size was not limited at production time. Some templates were very large and took a long time to download. We optimized in two directions:

[Figures: the two optimization directions]

Template version compatibility

[Figure]

A template is a special-effects package set composed of multiple basic atomic capabilities, such as subtitles, fonts, transitions, filters, and picture-in-picture, plus a standard restore protocol. The atomic capabilities of templates grow with version iterations. How should version compatibility be designed? The simple approach is to version-control the atomic capabilities supported by different templates. The problems with this approach are:
A more reasonable approach is for the App to maintain the list of atomic capabilities it can restore. The cloud compares the atomic capabilities supported by the App with those each template requires, selects the matching templates, and sends that list to the App. This solves template distribution. However, some situations still require version compatibility handling:
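The capability-based distribution described above amounts to a subset check on the cloud side. A minimal sketch, assuming hypothetical capability names and a hypothetical `Template` record (the actual protocol is not shown in this article):

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Template:
    template_id: str
    # Atomic capabilities the template needs in order to be restored.
    required_capabilities: FrozenSet[str]


def select_templates(app_capabilities: Iterable[str],
                     templates: Iterable[Template]) -> List[Template]:
    """Return only the templates the App can fully restore."""
    supported = frozenset(app_capabilities)
    return [t for t in templates if t.required_capabilities <= supported]
```

With this shape, an older App that reports fewer capabilities simply receives a shorter template list, and no per-template version number has to be maintained for the common case.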
The problem with version isolation is that manual configuration is error-prone. Over past iterations, version isolation information was occasionally misconfigured because of long intervals between releases, frequent personnel changes on the template production side, and incomplete context. Such errors caused template sub-elements to fail to download, which lowered the template download success rate. We fixed this by using template download error messages to locate the affected template sub-elements and then calibrating the template version information one by one. Afterwards, the template download success rate rose to 96%.

[Figure]

3.1.4 Preloading and fallback for template resources

The common industry practice is to use preloading and a fallback to raise the success rate of material download and application. We added preloading logic in three places:

[Figure: the three preloading aspects]

We also optimized the template downloader itself. The legacy business downloader for templates supported only serial downloads; the business now integrates the new download component from the infrastructure layer, which supports concurrent downloads.

3.2 ASR Link Optimization

The second smart link relies on the ASR service, which analyzes audio data and outputs an audio classification: music, mixed sound, human voice, or no sound. The classification depends on the proportion of each type of content:

[Figure]

Its business links are as follows:

Points that can be optimized in the ASR link:

Problem 1: The ASR service is slow. Measured on its own, the ASR link's P90 time usually exceeded 20s, which made it unusable.

Problem 2: Before ASR runs, the audio file must be extracted and uploaded, and the upload can take a long time. The main cause is historical: the upload path passed through a business service and a file storage service that forwarded the file, which was slow and lossy.

[Figure]

Solution to Problem 1: Working with the AI server team on extreme cases, we found that the ASR service interface was being flooded with requests by another task. The resulting QPS was too high, so the business's ASR requests waited in the queue for a long time. We added the flooding task to a blacklist, after which the ASR link's P90 time fell by 50%.

[Figures]

Solution to Problem 2: We removed the intermediate upload business service. The client now calls the basic file storage service (BFS) interface directly and passes the returned storage address to the AI service, which shortens the link.

[Figure]

3.3 Smart Music Link

The third smart link is music recommendation. Its basic process is as follows.
[Figure: music recommendation process]

AI-based music recommendation uses three main feature dimensions: user characteristics, music characteristics, and picture characteristics:

Music is recommended by weighting these three features; of the three, the picture feature dimension best matches the effect of the film being made. During a component upgrade and replacement, the business side passed the wrong frame-extraction address to the AI service, so no picture labels could be produced. The AI side fell back to a downgraded strategy that recommends music from user and music characteristics only (poor match between music and picture, and homogeneous results), but the business side was unaware of it. The problem surfaced only because the AI team had monitoring and alarms on the picture labeling success rate: for a period of time that rate was significantly lower than expected.

[Figure]

Issues fixed:

Early warning: how can business testers and developers tell during delivery acceptance whether the recommended music is a downgraded result, and how can the business side detect a downgrade sooner once the feature is live? The AI server returns an error message stating that no picture features are available. The client takes two actions based on this error:
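Whatever the client does with the error, its first step is to recognize the downgrade signal in the response. A minimal sketch, assuming a hypothetical error code and response shape (the real interface is not shown in this article); the counter only illustrates how a downgrade rate could be tracked on the client:

```python
from dataclasses import dataclass
from typing import List, Optional

NO_PICTURE_FEATURE = "NO_PICTURE_FEATURE"  # hypothetical error code


@dataclass
class MusicRecommendation:
    track_ids: List[str]
    # Assumed to be set by the AI service when it falls back to the
    # user/music-only strategy because no picture features were available.
    error_code: Optional[str] = None


def is_downgraded(rec: MusicRecommendation) -> bool:
    """True if the recommendation was produced without picture features."""
    return rec.error_code == NO_PICTURE_FEATURE


class DowngradeCounter:
    """Tracks the share of downgraded recommendations seen by the client."""

    def __init__(self) -> None:
        self.total = 0
        self.downgraded = 0

    def record(self, rec: MusicRecommendation) -> None:
        self.total += 1
        if is_downgraded(rec):
            self.downgraded += 1

    def downgrade_rate(self) -> float:
        return self.downgraded / self.total if self.total else 0.0
```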
After this series of optimizations, the smart film-making P90 time is about 10 seconds and the material application success rate is 90%+.

[Figure]

4. Preventing Metric Degradation

The previous section covered how smart film-making performance was optimized. This section covers client-side monitoring and alarms on the metrics achieved, so that they do not degrade. We built the overall monitoring and alarm process along the following dimensions:

[Figures: dimensions of the monitoring and alarm process]

Two questions arise here: how to notify the on-duty engineer quickly after an alarm fires, and how to let that engineer find the error information quickly?

[Figure]

We configure a custom Webhook through the Fawkes alarm platform. When an alarm fires, the standard Webhook payload is parsed to filter out the alarm's key log information. The custom Webhook then packages the key logs together with that day's on-duty information and pushes them to the alarm handling group.

[Figure]
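The filtering and packaging step can be sketched as follows. The payload fields (`name`, `logs`), the keyword filter, and the date-based roster rotation are all assumptions made for illustration; they do not describe the Fawkes platform's actual Webhook format.

```python
import datetime as dt
from typing import Dict, List, Sequence


def on_duty(roster: Sequence[str], day: dt.date) -> str:
    """Pick the on-duty person by rotating through the roster daily."""
    return roster[day.toordinal() % len(roster)]


def build_alert_message(alarm: Dict, roster: Sequence[str], day: dt.date,
                        keywords: Sequence[str] = ("error", "timeout"),
                        max_logs: int = 5) -> Dict:
    """Keep only key log lines and attach the day's on-duty person."""
    key_logs: List[str] = [
        line for line in alarm.get("logs", [])
        if any(word in line.lower() for word in keywords)
    ]
    return {
        "title": alarm.get("name", "unknown alarm"),
        "on_duty": on_duty(roster, day),
        "key_logs": key_logs[:max_logs],
    }
```

Capping the number of log lines keeps the group message short enough to read at a glance; the full logs remain available on the alarm platform.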
The above makes up the SOP for real-time alarm monitoring, which covers daily inspection of the three main links of smart film-making. We regularly collect, analyze, and adjust alarm information to make alarms more accurate and daily on-duty work more efficient.

5. Summary and Outlook

5.1 Summary

We first defined the core availability indicators of smart film-making and, based on them, refined the observable data of the key link nodes. Guided by that data, we optimized the time consumption and success rate of the three links: template, ASR subtitles, and music. Finally, we established real-time monitoring, alarms, and an on-duty mechanism for the core links to prevent the metrics from degrading. Data, optimization, and alarms will all continue to evolve:

Data: finer granularity and calibrated data.
Optimization: material size monitoring on the smart template production side, standardized template material storage, and better image recognition accuracy.
Monitoring and alarms: strategy alarms (smart music matching strategy), alarms on smart film-making time, finer alarm granularity, and alignment of the alarm items that differ between the two client platforms.

5.2 Future Directions

Smart Film 1.0 packages special effects from three basic elements (templates, ASR subtitles, and music) and does not process the user material itself (past).
Smart Film 2.0 is benchmarked against competing products in the industry: it uses image recognition to extract highlights and edit them intelligently (in progress).
Smart Film 3.0 will be based on AIGV large models, generating video content with AI in one click (future).
Author of this issue: Xu Huiyu, Senior Development Engineer at Bilibili