Application of Image Technology in Live Broadcasting (Part 1) - Beauty Technology

2016 has been called the breakout year of live video streaming. Rising valuations on the capital side, astronomical pay for platform anchors, and mainstream figures such as the "bureau chief" rushing to stream all attest to its popularity. Now that the major platforms have moved past building the concept of live streaming from scratch, how to differentiate themselves, and how to handle the pornography problems that arose during the period of wild growth, have become the "growing pains" of almost every platform. Setting aside policy and content, at the purely technical level, innovation based on image technology has become the most practical way to address these problems. For the first installment of this sharing series we chose the hot topic of live streaming and give an introductory overview of how image technology is applied to it. The series is compiled from a talk given by the Tutu CTO at an Architect Salon.

1. Technical Framework

Everyone loves beauty. Early image-beautification apps educated the market, and in the live-streaming era beautification has become a standard feature of every platform. The mainstream technology behind live-stream beautification today is OpenGL ES. Its first advantage is that it runs directly on the GPU, giving high performance and low power consumption, which matters for live streaming. Second, it is cross-platform: both iOS and Android support it, so the same beautification effects can be implemented on both. A further advantage is the large number of ready-made open-source libraries, such as GPUImage and Google's grafika, plus various practical Android-based libraries, all widely used. Some enthusiastic developers have even open-sourced complete beautification solutions covering the whole pipeline, from capture through processing to output.

2. Principle of beauty: Mixing is essential

The general principle of most beauty products on the market is similar: the camera captures a frame, the frame is processed in some way, and a beautified picture is output.

Specifically, the original image first goes through skin smoothing, which removes blemishes such as acne and spots. The smoothed image is then blended with the original. This blending step is indispensable: using only the smoothed image easily loses detail, and blending the two lets you control the strength of the smoothing by adjusting the blend weights, yielding different levels of effect. The final step, skin tone adjustment, is also important: the skin can be made whiter or rosier, or other special requirements can be met. Most beautification pipelines follow roughly this flow.
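As a minimal illustration of the blending step, here is a sketch in Python that linearly mixes a smoothed row of pixels with the original, with a weight controlling smoothing strength. The function name and sample values are invented for illustration; real pipelines do this per pixel in a GPU shader.

```python
# Sketch of the blending step in a typical beauty pipeline (illustrative,
# not any specific platform's implementation). Pixel values are 0-255
# grayscale for simplicity; `weight` controls the smoothing strength.

def blend(original, smoothed, weight):
    """Linear blend: result = (1 - weight) * original + weight * smoothed.

    weight = 0.0 keeps the original (all detail preserved); weight = 1.0
    uses only the smoothed image (maximum smoothing, detail lost).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return [
        round((1.0 - weight) * o + weight * s)
        for o, s in zip(original, smoothed)
    ]

original = [120, 200, 90, 255]   # row of pixels with a bright "blemish"
smoothed = [120, 150, 110, 160]  # the same row after denoising
print(blend(original, smoothed, 0.5))  # half-strength smoothing
```

Adjusting `weight` is exactly the "beauty level" slider exposed by many apps.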

3. Skin smoothing algorithm: blemish removal is denoising

Essentially, a picture is two-dimensional data. If the grayscale values of two adjacent regions differ greatly, that indicates a noise point. A pimple on a face, for instance, produces a local jump in grayscale value, which in this abstract sense is noise. The core of skin smoothing is therefore denoising. There are many denoising approaches, with plenty of algorithms and ready-made papers available online. Whatever algorithm is chosen, however, a denoising algorithm for beautification must have one property: it must preserve edges while smoothing, i.e. it must be an edge-preserving filter.

The most common such filter is the bilateral filter. Its advantage is high efficiency, which makes it well suited to mobile platforms. Other, more complex algorithms can achieve a similar effect but are not efficient enough for mobile use: although GPUs excel at this kind of parallel computation, their capability is limited, and power consumption rises sharply beyond it. Any filtering algorithm, bilateral filtering included, has many possible implementations. For mobile platforms, special optimizations can be applied, such as appropriately reducing numerical precision to strike a balance between quality and efficiency.
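To make the edge-preserving idea concrete, here is a toy one-dimensional bilateral filter in pure Python. The parameters and signal are invented for illustration; production versions run as optimized GPU shaders over 2D images.

```python
import math

def bilateral_1d(signal, radius=2, sigma_space=2.0, sigma_range=30.0):
    """Edge-preserving smoothing: each sample becomes a weighted average of
    its neighbours, where the weight falls off with both spatial distance
    and intensity difference. Large intensity jumps (edges) get tiny range
    weights, so edges survive while small noise is averaged away."""
    out = []
    n = len(signal)
    for i in range(n):
        acc, norm = 0.0, 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            w = math.exp(
                -((i - j) ** 2) / (2 * sigma_space ** 2)
                - ((signal[i] - signal[j]) ** 2) / (2 * sigma_range ** 2)
            )
            acc += w * signal[j]
            norm += w
        out.append(acc / norm)
    return out

# Noisy flat region followed by a sharp edge (think skin next to hair):
sig = [100, 104, 98, 102, 100, 200, 202, 198, 201, 200]
print([round(v) for v in bilateral_1d(sig)])
```

Running this, the small wiggles around 100 are flattened while the 100-to-200 step stays sharp, which is precisely the property a beautification filter needs: blemishes go, facial contours stay.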

4. Skin color adjustment - detection is the hard part

After smoothing comes skin color adjustment. The adjustment techniques themselves are mature; the harder part is skin color detection. Why is detection needed? Some early live-beautification apps skipped it and simply applied the target skin tone to the whole image, producing an overall color cast that looked worse than doing nothing. So before adjusting, the pixels that fall within the skin color range must first be found among all the pixels in the image, and only those pixels processed.

What makes skin color detection in live streaming special is color-space conversion. Three color spaces matter in image processing (RGB, YUV, and HSV), and all three are used in live streaming.

RGB is the most common color space. The display devices we use daily are based on RGB, so no explanation is needed here.

YUV is an older color space, first used for transmitting television signals and used today in the sampling and transmission stages of live streaming. Because the human eye is far more sensitive to luminance (Y) than to chrominance (U, V), YUV compresses better than RGB, which saves transmission bandwidth.
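For reference, a common RGB-to-YUV conversion can be sketched as below. This uses the BT.601 full-range matrix as an assumption; live pipelines may use other variants such as BT.709 or studio-range scaling.

```python
def rgb_to_yuv(r, g, b):
    """BT.601 full-range RGB -> YUV. Y carries brightness; U and V carry
    colour differences, which the eye resolves less finely, so U and V
    can be subsampled to save bandwidth."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.14713 * r - 0.28886 * g + 0.436 * b    # 0.492 * (B - Y)
    v = 0.615 * r - 0.51499 * g - 0.10001 * b     # 0.877 * (R - Y)
    return y, u, v

print(rgb_to_yuv(255, 255, 255))  # white: full luminance, near-zero chroma
print(rgb_to_yuv(255, 0, 0))     # pure red: strong positive V
```

The bandwidth saving comes from subsampling: formats such as NV21 or I420 store one U and one V sample per four pixels (4:2:0), cutting the data from 3 bytes per pixel to 1.5.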

HSV is the color space used for skin color detection. With RGB, detection would have to check whether all three of R, G, and B fall within the skin color range at once, and the same goes for YUV. Of HSV's three components, hue (H), saturation (S), and value (V), only H relates to skin color, so only H needs to be examined (an H value between 25 and 50 can be judged as skin), which requires far less computation than RGB.
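The H-only test described above can be sketched as follows. The 25-50 degree range comes from the text; the helper name and sample pixel values are invented, and real detectors usually also bound S and V to reject very grey or very dark pixels.

```python
import colorsys

def is_skin(r, g, b):
    """Rough skin test in HSV: convert an 8-bit RGB pixel and check
    whether its hue falls in the 25-50 degree band mentioned above."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return 25.0 <= h * 360.0 <= 50.0   # colorsys returns h in [0, 1)

print(is_skin(230, 180, 130))  # warm, skin-like tone -> True
print(is_skin(50, 90, 200))    # blue background pixel -> False
```

In a live pipeline this predicate would be evaluated per pixel in a shader to build a skin mask, and only masked pixels would receive the tone adjustment.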

Therefore the three color spaces are each used at different stages of a live stream, with constant conversion between them.

5. Details — Beyond the Algorithm

Beauty algorithms matter, but beauty is highly subjective. Even a beautifully written, efficient algorithm does not guarantee a perfect result. After applying the standard algorithms, designers still need to tune by experience. Many platforms use similar algorithms, yet their final results feel different; the difference lies in many small details, which take time to optimize, especially in understanding what users actually want and what "more beautiful" means to them.

Another example: on many platforms the beautification effect varies widely under different lighting conditions, such as daytime, nighttime, indoors, outdoors, natural light, and artificial light. The likely reason is that the algorithm does not account for lighting, so a seemingly small factor degrades the result.

This therefore requires extensive testing, combining technical means with manual tuning to guarantee the best effect. As the saying goes: the devil is in the details.

6. Performance - not convinced? Run a benchmark

On performance, the iOS platform generally has few problems. GPUImage, for example, is a long-established third-party library on iOS. It implements many effects, including some of the algorithms mentioned above; in GPUImage you can find simple reference implementations, from writing shaders to running them to bilateral filtering, and even the simple versions give decent results. For live streaming, GPUImage serves as a good client-side foundation: the only thing left to add is a push stream, since it already covers capture and per-frame processing and can output either YUV or RGB. Hence the iOS side is relatively painless.

The Android platform is harder. By its nature, Android has many manufacturers, many devices, and many system versions, which makes mutual compatibility difficult.

First, device fragmentation. When the same beauty algorithm runs on different phones, performance can vary greatly even with the same GPU. One way to make a shader adapt to different devices is to grade devices by GPU performance: high-scoring devices get the most complex algorithm, while low-scoring ones get a reduced effect, ensuring the feature works in most environments.
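The grading idea might be sketched like this; the thresholds, tier names, and benchmark figure are entirely hypothetical, and a real client would measure the filter's actual per-frame time on a test frame at first launch.

```python
# Hypothetical sketch of device grading: benchmark the filter once on a
# test frame, then pick an effect tier. All thresholds and tier names
# here are invented for illustration.

def pick_tier(benchmark_ms):
    """Map a measured per-frame filter time (milliseconds) to a tier."""
    if benchmark_ms <= 8.0:
        return "high"    # full bilateral filter + skin tone adjustment
    if benchmark_ms <= 16.0:
        return "medium"  # smaller kernel, reduced precision
    return "low"         # cheapest smoothing only

for ms in (5.0, 12.0, 30.0):
    print(ms, "->", pick_tier(ms))
```

The point of the tiering is graceful degradation: every user gets some beautification, and only the devices that can afford it pay for the expensive version.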

Second, the version issue. For example, only Android 4.0 and above can obtain a texture directly from camera capture via GL_TEXTURE_EXTERNAL_OES: the camera hands the captured image straight to the GPU, and everything is GPU-accelerated. From 4.3 onward, the whole chain from camera capture through processing to encoding can run on the GPU, which is the best and fastest path, but it also places the highest compatibility demands on the system, because some manufacturers' implementations are incompatible, making GPU acceleration hard to achieve.

Then there is YUV output, which many live platforms need to support. These YUV data involve transfers between GPU and CPU: since processing may finish on the GPU, and the GPU cannot output the data directly, the result must be read back from GPU to CPU. There is no great solution for this yet, as parts of Android's GPU stack are not open. It can sometimes be done through GraphicBuffer, but Android does not expose it publicly, so the only option is to pull the relevant code from the Android source and link against it to get better results. Uploading from CPU to GPU can be done at the millisecond level, but a direct GPU-to-CPU readback may take around 20 milliseconds even on a good device, which can cause dropped frames at 24 fps. For mainstream use the impact is usually acceptable, though it depends on the application scenario.
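The numbers above can be sanity-checked with a quick frame-budget calculation; the 20 ms readback figure is the one quoted above, and the rest is simple arithmetic.

```python
# Back-of-envelope check of the readback cost: at 24 fps the per-frame
# budget is about 41.7 ms, so a ~20 ms GPU->CPU readback consumes roughly
# half the budget, leaving the remainder for capture, filtering, and
# encoding. Tight, but workable if the rest of the pipeline is fast.

FPS = 24
budget_ms = 1000.0 / FPS
readback_ms = 20.0   # GPU->CPU readback time quoted for a good device

print(f"budget per frame: {budget_ms:.1f} ms")
print(f"readback share:   {readback_ms / budget_ms:.0%}")
```

This is why the GPU-only path (capture to processing to encoding, with no readback) described above is so much faster: it removes the single largest per-frame cost entirely.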

*** A question that is often asked: both iOS and Android have built-in face detection APIs, so why not use them?

First, the system APIs run at a low frequency and are slow. Apple presumably throttles them so as not to interfere with normal camera use, so results arrive very infrequently: it may take on the order of 3 seconds before the API hands back data telling you whether the photo contains a face. For a real product running at, say, 24 frames per second, detection must run at least a dozen times per second to meet real-time requirements, otherwise it cannot keep up with the frame rate. The problem is worse on Android, because it depends on the device: some devices lack the capability entirely, and the manufacturer simply removes the feature. Another problem is feature points: iOS provides landmarks such as eyes, mouth, and nose, but Android does not.

7. Beauty 2.0 – From makeup to plastic surgery

Everything above belongs to the concept of beauty 1.0, while the most advanced beautification technology has now moved to 2.0. A simple analogy: if beauty 1.0 is makeup, beauty 2.0 can essentially achieve the effect of plastic surgery, making eyes bigger or reshaping a round face into an oval one. The basis of these effects is face recognition, which is easy to understand: only by determining whether a face is present and where the facial features are can we make them more beautiful.

Face recognition is another big topic in its own right. For reasons of space, we will discuss it in detail in the next issue.
