Live Broadcast with Live Broadcasting X WeChat Mini Program

In December 2017, WeChat Mini Program opened up real-time audio and video capabilities to developers, bringing a broad space for imagination to the industry. Lianmai interactive live broadcast technology became the standard for video live broadcast in the live broadcast boom in 2016, but only on the native APP can a good user experience be guaranteed. At that time, real-time audio and video interaction was not possible in WeChat Mini Program. WeChat Mini Program announced the opening of real-time audio and video capabilities in December last year, and Apple announced last June that it would support WebRTC. The industry suddenly saw thousands of trees in bloom, and the future was bright. What kind of chemical reaction can Lianmai interactive live broadcast technology produce with WeChat Mini Program and WebRTC? What do developers need to know and consider when implementing Lianmai interactive live broadcast technology on WeChat Mini Program or browser WebRTC?

On Saturday, March 17, 2018, at the Zego Meetup Beijing stop, a technology salon hosted by Zego Technology, Xian Niu, a senior technical expert and architect at Zego Technology, shared with participants the Zego team’s thoughts and practices on the combination of live broadcasting technology and WeChat mini-programs.

On that day, Beijing was covered with heavy snow early in the morning, but it could not stop the enthusiasm of the participants to learn and communicate, and the event site was packed. Auspicious snow means a good harvest, and 2018 will be a good year for entrepreneurs.

Technical difficulties and solutions for live broadcasting

Let’s first review the interactive live broadcast technology, starting with the application scenarios.

The first type of application scenario is the most common multi-host connection scenario in live video broadcasts. Since 2016, it has developed from one-way live broadcasts to two-person connection, three-person connection, and gradually to multi-person connection. Two-person connection refers to the two hosts in the live video broadcast scene interacting with each other. Specific program formats include conversations, talk shows, karaoke or chorus. In live video broadcasts, it is very common for two to three hosts to connect with each other, and sometimes the audience is allowed to connect with each other. The application scenarios of multi-host connection include werewolf killing, multi-person video group chats, and group live quiz. On the mobile terminal, there are often more than a dozen or twenty users interacting with each other in the same room.

The second application scenario is online claw machine, or live claw machine, which is also a product form of live video broadcasting, a combination of live video broadcasting and the Internet of Things. In addition to live video broadcasting, online claw machine technology also adds signaling control, which can realize remote viewing of the claw machine and control of the doll crane. At the same time, the anchor and the audience can interact through text, as well as voice and video live broadcasting. This is a trend at the end of 2017, bringing the live broadcasting technology of live broadcasting to the combination of live video broadcasting and the Internet of Things. I believe that more application scenarios combining live video broadcasting and the Internet of Things will emerge this year.

The third application scenario is live answering, which is a trend that emerged in January 2018. It is an exploration of answering programs in the video live broadcast scenario. In addition to the basic requirements of low latency, smoothness and high definition, this application scenario also requires that the answering questions and the video screen must be synchronized. In addition, the number of users in the live answering room of Huajiao Live once exceeded five million, so the live answering technology must support millions of concurrent users. Although the entry threshold has been increased due to regulatory reasons during the Spring Festival, I believe that there will be other new ways of playing. Some new ways of playing discussed in the industry are also shared with you here: the host can invite guests to answer questions through the microphone, and users participating in the live answering can set up sub-rooms to answer questions in groups. These innovative ways of playing are technically feasible. In essence, this is the combination of live answering technology and interactive live broadcast technology through the microphone.

What are the requirements for live video technology in these three application scenarios? The first is that the delay must be low enough. If the one-way delay cannot be less than 500 milliseconds, the interactive experience of the video call cannot be guaranteed. The second is echo cancellation. Because when user A and user B make a video call, user A's voice is collected and fed back when it is transmitted to user B. User A will hear an echo after a certain delay, which has a great impact on the call experience. Therefore, echo cancellation must be performed. The third is to be smooth and not stuck. Why is smoothness necessary? Because there is a requirement for ultra-low latency, smoothness and latency are themselves a pair of contradictory technical requirements. If the latency is low enough, the jitter buffer must be small enough, so that network jitter can easily appear, resulting in the picture being too fast, too slow, or stuck.

Let’s take a closer look at how to solve these three core technical requirements of live video streaming.

1. Ultra-low latency architecture

The system architecture of the live broadcast solution on the market is generally like this. On the left is a low-latency network, which provides live broadcast services for users who need low latency, but the cost is relatively high. On the right is a content distribution network, which provides video live broadcast services for onlookers. Although the latency is slightly higher, the cost is relatively low and it supports higher concurrency. The middle is connected by a bypass service. The bypass server pulls the audio and video streams from the low-latency real-time network, selectively performs mixing, format conversion or protocol conversion, and then forwards them to the content distribution network, and then distributes them to onlookers through the content distribution network.

To build an ultra-low latency real-time system architecture, the following points need to be considered:

Load balancing - Ultra-low latency architecture must be load balanced, which means that any network node must load users evenly. If the number of users accessing a network node exceeds the upper limit it can carry, a large number of packets are likely to be lost, which will trigger network congestion, causing more packet loss and poor user experience.
Nearby access - "Near" on the Internet is different from what we understand as near on a straight line. This can be compared to a traffic network. Suppose you see another point close to you while driving, but it may not be close in fact. You need to consider two points: The first point is connectivity. Although points A and B seem to be close, there is no direct road from point A to point B, which is equivalent to network disconnection. The second point is congestion. If the road is short but congested, it may not be close. For example, when a user in Dubai and a user in Beijing are connected, it seems that the direct streaming from Dubai to Beijing is the closest, but in fact this direct path may not be accessible, so it is necessary to detour to Hong Kong to continue the transmission. Taking a detour may make the distance on the network "closer".
Quality assessment - The static method of quality assessment is post-evaluation, which is to review past data, analyze the data of users in a certain area pushing traffic to a certain area at various time points, summarize the solution of which path is better at which time point, and then manually configure the relevant data to be transmitted to the network in real time, which can improve the transmission quality.
Dynamic routing - Another method of quality assessment is dynamic assessment, which is to dynamically assess quality based on historical data. After a period of operation, the transmission network will accumulate a lot of user data. For example, the optimal path for Shenzhen users to push traffic to Beijing under different network conditions in the morning, noon, and evening. The accumulation of this data can serve as a basis for dynamically formulating routing strategies. This is dynamic routing.
Algorithmic flow control - In a real-time transmission network, we need to select an optimal path for streaming. If this optimal path still cannot meet the requirements of ultra-low latency, we need to make some compensation in the algorithm, such as channel protection, by increasing redundancy to protect the data in the channel. There are also some flow control strategies when pushing the stream. In the upstream network, if network jitter or weak network conditions are detected, the bit rate will be reduced. If the network conditions improve, the bit rate will be increased. In the downstream network, layered encoding can be used to select video streams with different bit rates for users in different network environments.

2. Echo Cancellation

What is an echo? For example, if you are a near-end user and receive the voice of a far-end user, this sound is played through the speaker and will propagate in the room. After being reflected by the ceiling, the ground, and the windows, it will be collected by the microphone together with your voice and then transmitted to the far end. After a delay of one or two seconds, the far-end user will hear his or her own voice again, which is an echo for the far-end user. In order to ensure the user experience, echo cancellation must be performed. For the audio and video engine, the sound collected by the microphone contains the echo of the far-end user and the real voice of the near-end user, which is difficult to distinguish: these two sound waves are indistinguishable sounds collected from the air, a bit like blue ink and red ink mixed together, which are difficult to separate.

Is there no way out? Actually, we still have some solutions. The original sound transmitted from the far end is the reference signal. Although it is related to the echo signal, it is not exactly the same. It is wrong to directly subtract the original sound from the sound collected by the microphone. Because the echo is formed after the reference signal is played, bounced and superimposed in the air, it is related to the reference signal, but not the same. We can understand that there is a certain functional relationship between the echo signal and the reference signal, and what we need to do is to solve this functional relationship. Use the reference signal as the input of the function to simulate the echo signal, and then subtract the simulated echo signal from the sound signal collected by the microphone to finally achieve the purpose of echo cancellation. We implement this function through a filter. The filter will continuously learn and converge, simulate the echo signal, make the simulated echo as close to the echo signal as possible, and then subtract the simulated echo signal from the sound signal collected by the microphone to achieve the purpose of echo cancellation. This step is also called linear processing.

There are three types of echo scenarios: mute, single talk and double talk. For single talk (that is, one person talking), the suppression effect after linear processing will be better, and the echo elimination will be clean. For double talk (that is, multiple people talking at the same time), the suppression effect after linear processing is not so good. At this time, the second step is needed: non-linear processing, to eliminate the remaining echo. There are not many open source things for non-linear processing as a reference. It depends on each manufacturer to study it, which can fully reflect the technical accumulation of each manufacturer.

3. Jitter Buffer

There are congestion, packet loss, disorder and jitter in the network, so network transmission will cause data damage. In particular, when using a private protocol based on UDP to transmit voice and video data, jitter buffering is required. Taking WebRTC as an example, the jitter buffer for audio data is called NetEQ, and the buffer for video data is called JitterBuffer, both of which are very valuable parts of the WebRTC open source project. Jitter buffering is to buffer and sort data packets, and compensate for network conditions such as packet loss and disorder to ensure smoothness. The queue length of the jitter buffer is essentially the queue delay time. If it is too long, the delay will be large. If it is too short, the jitter will be revealed, and the user experience will be poor. Regarding the setting of the jitter buffer length, each manufacturer has different practices. Some use the maximum equation of the jitter time of the network message as the length of the buffer queue. This is an open topic that requires each manufacturer to think about it.

Let's make a summary here. From the push end to the pull end, the whole process includes seven links: acquisition, pre-processing, encoding, push, pull, decoding and rendering. Let's take a look at the three technical difficulties mentioned above.

1) Low latency. Basically, there are three types of links that introduce latency: acquisition and rendering, encoding and decoding, and network transmission. The first type is the acquisition and rendering link, which brings relatively large delays, especially rendering. Almost no mobile system can guarantee a 100% delay of 50 milliseconds. This is caused by some hardware limitations. The second type is the encoding and decoding link. In particular, the audio codec is encoded forward, which itself will cause delays. Some audio codecs can even cause a delay of 200 milliseconds. The third type is network transmission. In the real-time transmission network of iQiyi Technology, the round-trip transmission delay can be less than 50 milliseconds. Among them, acquisition, rendering, and encoding and decoding are all implemented on the terminal.

2) Echo cancellation belongs to voice pre-processing 3A and needs to be performed in the pre-processing stage, that is, implemented in the terminal.

3) Jitter buffering is implemented at the receiving end. The jitter buffer at the receiving end determines the time interval at which the sending end should send data packets.

In summary, the three technical difficulties mentioned above are all implemented on the terminal, so the terminal is very important. Next, we will focus on comparing the implementation of the live broadcast technology on various terminals.

Comparison of live broadcasting on various terminals

The terminals for live broadcasting mainly include: native APP, browser H5, browser WebRTC, and WeChat applet. The applications on the browser include H5 and WebRTC. The former can be used for streaming, while the latter can achieve both push and pull streaming.

Live broadcast mobile terminal-Native APP

The structural diagram of the native APP terminal audio and video engine is as follows, which basically includes the audio engine, video engine and network transmission, collectively known as the real-time audio and video terminal engine. It also includes the underlying audio and video acquisition and rendering, as well as the network input and output capabilities, which are the capabilities opened by the operating system.

Native apps have a natural advantage. They interact directly with the operating system and can directly use the resources and capabilities opened by the operating system, such as audio and video acquisition and rendering, as well as network input and output. To use a fashionable advertising slogan: "No middlemen to make a profit from the price difference", directly connecting with the operating system can provide a better user experience.

The advantage of implementing live broadcasting with microphones on native APP is that it has better control over the seven links mentioned above, can achieve relatively low latency, can independently develop and implement the 3A algorithm for voice pre-processing, including echo cancellation, and has better control over the jitter buffer strategy and bitrate adaptation strategy. In addition, you can choose to use the RTMP protocol or a private protocol based on UDP, which is more secure against weak network environments.

The more popular pre-processing technologies on the market, such as beauty, widgets, voice change, etc., can be implemented or connected to these technologies by native apps through open pre-processing interfaces. Why do we emphasize this? Because browser WebRTC and WeChat applets do not have open pre-processing interfaces, developers cannot implement or connect to third-party beauty or widget technology modules by themselves.

On native apps, developers can have full control over the experience, allowing users to have a better experience. Mainstream live video platforms all have their own native app platforms, while browsers and WeChat applets are relatively auxiliary. Native apps have the best user experience and are the most controllable for developers.

What are the disadvantages of implementing live broadcasting on native apps? The development threshold is high, the development cycle is long, and the labor cost is high. In addition, from the perspective of acquiring users and spreading, it is not as convenient as browsers and WeChat applets.

Live broadcast mobile terminal - browser (H5)

Browser H5 is like a coin with two sides, with advantages and disadvantages. The advantages are low development cost and easy dissemination, while the disadvantages are that it can only pull streams, not push streams, and cannot achieve live broadcast with multiple users. In addition, the delay on browser H5 is also relatively large. If RTMP or HTTP-FLV is used, the delay will be between 1 second and 3 seconds. If HLS is used, the delay will be greater than 8 seconds or even 10 seconds. Such a large delay does not allow live broadcast with multiple users.

All three protocols are played through the player in the browser H5. In the scenario of multiple hosts interacting with each other, a player can only play one video stream. Three hosts need three players, so you cannot see multiple hosts interacting with each other in the same frame. If you want to see multiple hosts interacting with each other in the same frame, you must mix the multiple streams into one stream and play it in a single player.

In addition, the source code of the browser H5 is open. If the audio and video terminal engine is implemented on the browser, it is equivalent to making all the core source code public. Therefore, no manufacturer has ever made a complete audio and video engine on the browser H5. Even if you are willing to do it, the browser will not allow you to do so. There is a browser between the developer and the operating system. If the browser does not open the core capabilities of the operating system to the developer, the developer cannot collect and render independently, cannot control network input and output, and functions such as flow control and code control cannot be realized.

In the browser H5, it can also be transmitted through websocket, played with jsmpeg, and the video encoding and decoding format is mpeg1.

mpeg1 is an older media format that is supported by all browsers. Use jsmpeg player to play mpeg1 in the browser, which is also supported by all browsers. This can achieve a relatively low latency, but it still cannot push the stream and cannot achieve live broadcast with microphone.

Example: Online claw machine H5 version

Below, we will use the H5 version of iQiyi's online doll-grabbing app as an example to briefly introduce the application of websocket on the browser H5. As you can see from the upper left corner of the figure below, when the browser H5 terminal accesses the iQiyi real-time transmission network, we have added a video access server. On the right is the iQiyi real-time transmission network, which uses a private protocol based on UDP. The access server is used to achieve protocol conversion and media format conversion: conversion between websocket and private protocol based on UDP, and conversion between mpeg1 and H.264. If the native APP is accessed, no conversion is required. Although there is an access server, no conversion will be performed.

In addition, the H5 version of the online claw machine has no sound. In addition to the application scenario requirements, the audio engine must be implemented in H5 to produce sound. If the audio engine is implemented on the browser H5, it is equivalent to open source technology. So far, no manufacturer has done this.

Mobile terminal for live broadcasting with microphone - browser (WebRTC)

You may feel sorry that although browser H5 is easy to spread and simple to develop, the experience is poor and it cannot be live broadcasted with microphone. So can it be pushed on the browser and can live broadcasting with microphone be realized? The answer is yes, and that requires the use of WebRTC.

The WebRTC mentioned here refers to the WebRTC that has been embedded in the browser and supported by the browser, rather than the source code of WebRTC. Some mainstream browsers have embedded WebRTC, opening up the browser's real-time audio and video capabilities to developers.

The above figure is the structure diagram of WebRTC. We can see that WebRTC includes audio engine, video engine, transmission engine, etc. The bottom dotted box indicates that it can be overloaded, which means that the browser opens the bottom-level audio and video rendering and network transmission capabilities to developers, and developers can choose whether to overload according to their needs. The audio engine includes two codecs: iSAC and iLBC. The former is for broadband and ultra-wideband audio codecs, and the latter is for narrowband audio codecs. The audio engine also includes audio jitter buffer, echo cancellation and noise suppression modules. The NetEQ algorithm in the jitter buffer can be said to be one of the essences of WebRTC. The video engine includes VP8 and VP9 video codecs, and even the upcoming AV1. The video engine also includes video jitter buffer and image quality enhancement modules. For the transmission engine, WebRTC uses SRTP (Secured Realtime Transport Protocol) secure real-time transmission protocol. Finally, WebRTC adopts P2P communication mode, and there is no back-end implementation such as media server. The above is a brief introduction to WebRTC.

I will not repeat the general advantages and disadvantages of browser WebRTC here. Please search on Baidu. I will only mention the key points here. The advantage of browser WebRTC is that it implements a relatively complete audio and video terminal engine, allowing streaming on the browser and live broadcasting with microphones. However, browser WebRTC also has shortcomings:

There is no open pre-processing interface, and modules such as beauty and pendants cannot be connected to third-party or self-developed solutions.
The media server backend is not implemented. Developers need to implement the media server and then access it through an open source WebRTC gateway (such as janus).
Capabilities such as codec, jitter buffer, and voice pre-processing 3A can only rely on WebRTC and cannot be customized.
Some mainstream browsers do not support WebRTC, especially Apple's browser. Although Apple announced support for WebRTC last year, the latest version of iOS Safari does not support WebRTC well. The mainstream version of iOS Safari does not support WebRTC, and the WeChat browser on iOS does not support WebRTC either.

As shown in the figure above, since WebRTC does not provide the implementation of the media server, it is necessary to connect the browser WebRTC to the media server backend, which can be self-developed or a third-party service. The protocols and media formats between the browser WebRTC and the media server backend are different, so protocol and format conversion is required. The UDP-based SRTP used by WebRTC needs to be converted into the UDP-based private protocol of the media server. In addition, the media format also needs to be converted, because the default voice and video format in WebRTC is VP8 or VP9. At the same time, some adjustments need to be made to the signaling scheduling in the real-time transmission network. The access layer between the browser WebRTC and the media server backend can also be implemented using an open source WebRTC Gateway (such as janus).

The browser is a super application similar to the operating system. It has an important traffic entrance, but it is also the "middleman" between developers and the operating system. Developers obtain the real-time audio and video capabilities of the browser through WebRTC, but they must also endure the pain brought by WebRTC.

Live broadcast mobile terminal - WeChat applet

The title of this speech is "Lianmai Interactive Live Broadcasting x WeChat Mini Programs". Why do we only start discussing mini programs here? Please allow me to explain the reason. What is WeChat Mini Program? It is a lightweight application running on WeChat. What is WeChat? It is a super application similar to an operating system. Are these features very similar to browsers and H5? H5 is a lightweight application supported by browsers, while browsers are super applications similar to operating systems. Browsers are backed by major international technology giants, unlike WeChat, which is backed by only Tencent, an Internet giant. Therefore, from this perspective, WeChat Mini Programs, browser WebRTC and H5 have something in common.

WeChat Mini Program can be compared to the client and server structure of browser H5. HTML corresponds to WeChat Mini Program's WXML, CSS corresponds to WeChat Mini Program's WXSS, and the scripting language of the Mini Program is the same as JS, but the framework is different. WeChat Mini Program provides two tags, one is <live-pusher> and the other is <live-player>. <live-pusher> is for pushing streams, and <live-player> is for pulling streams, which can achieve one-way live broadcast or live broadcast with microphones. Mini Program provides two modes: LIVE and RTC. LIVE supports one-way live broadcast, and RTC supports low-latency live broadcast with microphones. Currently, WeChat Mini Program uses RTMP protocol for pushing streams. If you want to communicate with private protocols, protocol conversion is required.

WeChat Mini Programs have opened up real-time audio and video capabilities, which is a major benefit to the industry. However, based on the above information and logic, we also see the advantages and disadvantages of using WeChat Mini Programs to achieve interactive live broadcasts.

There are three benefits :

1) Low development cost and short development cycle, which is basically the same as the development difficulty of H5;

2) It is easy to spread and acquire customers, making full use of WeChat’s high-quality traffic;

3) It can push and pull streams, allowing for live broadcasts with microphones and real-time voice and video calls.

There are four shortcomings :

You will be limited by the real-time audio and video capabilities of the WeChat Mini Program. For example, if its echo cancellation has some problems, you can only wait for the WeChat team to optimize it at their own pace, and you have no way to optimize it yourself.
The mini program does not have an open pre-processing interface. You can only use the mini program’s built-in beauty or voice-changing functions (if any). You cannot connect to self-developed or third-party beauty or voice-changing modules.
Pushing and pulling streams through the RTMP protocol cannot interoperate with private protocols based on UDP. If you want to achieve interoperability with private protocols based on UDP, you must add an access layer to convert the protocol format or even the media format.
Without implementing a backend media server, developers must implement the media server themselves or connect the WeChat mini program to a third-party real-time communication network.

The browser opens up the real-time audio and video capabilities of the browser through WebRTC, and WeChat opens up the real-time audio and video capabilities of WeChat through mini-programs, allowing developers to implement live broadcasts and real-time audio and video calls on two operating system-like platforms. However, whether WebRTC or mini-programs only get you started on the terminal, for developers to truly implement the entire system, there is still a lot of work to be done.

The following figure shows how WeChat Mini Programs connect to the real-time audio and video transmission network. The audio and video terminal engine of WeChat Mini Programs also includes an audio engine, a video engine, and a transmission engine. The audio engine is responsible for acquisition and rendering, audio jitter buffering, voice pre-processing, and encoding and decoding. The video engine is responsible for acquisition and rendering, video jitter buffering, video pre-processing, and encoding and decoding. Regarding the transmission engine, WeChat Mini Programs use the RTMP protocol to push and pull streams. It is not clear whether the lower layer of its RTMP protocol is the TCP protocol, or whether it uses a private protocol based on UDP through QUIC. If the lower layer of RTMP is a private protocol based on UDP, then the resistance in a weak network environment will be relatively better, while the TCP protocol is a fair protocol with poor controllability in each link, and the experience in a weak network environment will be relatively poor.

If you want to connect the WeChat applet to the real-time audio and video transmission network, there must be an access server in the middle, which we call the access layer. In the access layer, we need to convert the protocol. For example, if the real-time audio and video transmission network uses a private protocol based on UDP, then the RTMP protocol must be converted to a private protocol based on UDP. There is also the conversion of media formats. If the media format is different from that of the real-time transmission network, it also needs to be converted.

Live broadcast mobile terminal - WebRTC access to applet through WebView

Are there other ways to do live broadcasting with microphones on mini programs? Do I have to use the voice and video capabilities of WeChat mini programs? Not necessarily. The following figure shows a technical solution I have seen on the market. It bypasses the real-time voice and video capabilities of WeChat mini programs and implements live broadcasting with microphones through the WeChat mini program WebView component. I would like to share it with you here.

The basic idea of this solution is to take advantage of the browser characteristics of WebView and use the Web API of WebRTC in WebView to obtain real-time audio and video capabilities on the mini program. The above figure is a topology diagram of this solution. The bottom layer is the basic capabilities of the WeChat mini program. The upper layer is WebView, which is a control of the WeChat mini program. It can be simply regarded as a component similar to a browser. It provides some features of the browser, but it is not a complete browser. The WebView of the WeChat mini program is similar to a browser, so it may support WebRTC. However, it must be noted that the WebView of the WeChat mini program supports WebRTC on the Android platform, but does not support WebRTC on the iOS platform. Although this solution can theoretically realize live broadcasting on the WeChat mini program, it has the following limitations:

1) On the iOS platform, WeChat Mini Programs do not support this solution, as mentioned above.

The mini-program WebView is not a complete browser. It performs worse than ordinary browsers and has many limitations.
There are several layers between developers and the operating system: WeChat bottom layer, mini-programs, WebView, WebRTC, and then the developer's mini-program application. Each layer of abstraction will bring performance consumption and affect the final experience.

This solution is essentially a WebRTC-based solution. It does not use the real-time audio and video capabilities open to the WeChat Mini Program. Instead, it quickly uses the WebView component and takes a different approach, using WebRTC in the WeChat Mini Program in a very pleasing way.

Interoperability of live broadcast on various terminals

As the technology of interactive live broadcasting with microphones is gradually realized on various terminals, we will face a question: Can microphones be connected and communicated on various terminals? For example, can user A on WeChat mini program communicate with user B on native APP?

Let's start with the scenario mentioned above. User A uses the RTMP protocol to push and pull streams on the WeChat applet. If user B uses the RTMP protocol for both pushing and pulling streams on the native APP, then the two can naturally communicate with each other. If the native APP pushes and pulls streams using a private protocol based on UDP, then they cannot communicate directly with each other. The protocol and format must be converted at the access layer before interactive communication can be achieved. This scenario can also be extended: Can user A on the WeChat applet communicate with user C on the browser WebRTC? The logic behind it is the same.

Taking ZEGO's solution as an example, ZEGO's native APP SDK has two versions: supporting RTMP protocol and private protocol based on UDP. If you use the native APP SDK that supports RTMP protocol, you can directly interact with the mini program through the microphone. If you use the native APP SDK based on UDP private protocol, you need to go through the access server to convert the protocol and format.

Private protocols based on UDP will perform better in weak network environments, while RTMP protocols also perform quite well in non-weak network environments and are well compatible with CDN content distribution networks. For example, Huajiao Live's live broadcast solution has always used the RTMP version of the technical solution provided by Zego Technology. It has been running online for two years and has always maintained a good user experience.

Conclusion

The technology of live broadcasting with microphones has been gradually extended to native APP, browser H5, browser WebRTC, and WeChat applet, which has led to a richer ecosystem and provided a more convenient and good user experience, which is good news for video live broadcast platforms and users. However, if you want to wear the crown, you must bear its weight. Especially in browser WebRTC and WeChat applet, developers must fully understand the characteristics and limitations of these types of terminals in order to better use the technology of live broadcasting with microphones to innovate and serve users.

【Welfare】

ZEGO Meetup Shanghai | Technical Practice of Video Live Streaming+

From 2016 to 2017, the live video industry went from explosive growth to maturity. At the turn of 2017 and 2018, the live video industry experienced a second spring, which made people wonder whether there would be a third spring.

What goes up and down are the trends and openings of the industry, while what advances steadily is the practical application of technology.

After just concluding the ZEGO Meetup in Beijing, we decided to continue to Shanghai to discuss the technology and best practices of live video streaming with our partners in Shanghai.

In this event, we invited four audio and video technology experts from Zego Technology, Momo Live, Hujiang CCtalk, and TuSDK. They will bring:

"Momo live audio and video practice and optimization"

《Live Interactive Broadcast with MC X WebRTC》

《Using RTC technology to build an excellent online education platform》

Technical Fusion of Deep Learning and Video Effects

Full of useful information, it’s this Saturday!

Click " Read original text " at the end of the article to quickly sign up~

Friends who have successfully registered can add the WeChat of Zego staff at zego_tech_consulting, and note "name-company-position" to be added to the event group in advance~

Time: 13:00-17:30, March 31, 2018

Location: 1F, LANXESS Innovation Center, Building C3, No. 700, Yishan Road, Xuhui District, Shanghai Unicorn Global Parallel Accelerator (Guilin Road Subway Station Exit 1 or 5)

Guest Agenda

Topic 1: "Momo live audio and video practice and optimization"

[[224488]]

Huang Mingxin, head of the front-end team of Momo Live

Guest profile: He used to work for Tiantian Dongting and English Liulishuo, and is now responsible for the team leader of the front-end development team at Momo Live. He is mainly technically inclined to JS and Node full-stack development, and is mainly responsible for the development and management of the company's main site, internal services, and some microservices.

Contents:

1. Some pitfalls encountered in the early stage of the Momo live broadcast project and their solutions, such as large delays, obvious freezes, slow loading speeds, and poor performance on mobile phones.

2. When the business needs to use a third-party CDN or SDK, how to integrate resources and call different SDKs to each other?

3. Momo's performance optimization and front-end technology solutions on Web/H5 - how to reduce the size of flash, reduce latency, play flv without flash, etc.

Topic 2: "Interactive Live Broadcasting with MCs X WebRTC"

[[224489]]

Xian Niu, senior technical expert and architect at Jigoo Technology

Guest Profile: Master of Computer Science from Beijing University of Posts and Telecommunications, Master of Business Administration from the University of Hong Kong. Responsible for the development and research of iQiyi's real-time audio and video engine. His current work focuses on the comprehensive adaptation of live broadcast technology on mobile terminals, focusing on industries such as live video broadcasting, audio and video social networking, the Internet of Things, and online education.

Content introduction: The interactive live broadcast technology with microphones must meet the demands of application scenarios, and the application scenarios must consider the technical characteristics of mobile terminals: native iOS and Android APPs have good experience but are not easy to spread, and browser WebRTC is easy to spread but is subject to technical limitations. The interactive live broadcast technology with microphones must fully consider the technical characteristics of various mobile terminals to allow the powerful capabilities of real-time audio and video cloud services to fully cover various application scenarios. This technical speech will start with WebRTC, while covering mobile terminals such as WeChat applets, browser H5 and native APPs, and combine the technical characteristics of these terminals to share the thinking and practice of live broadcast technology in the process of adapting to these terminals.

Topic 3: "Using RTC technology to build an excellent online education platform"

[[224490]]

Wu Haibin, CCTalk Audio and Video Architect at Hujiang

Guest profile: He has worked in the Multimedia Theme Institute of Shanda Innovation Academy, Alibaba and other companies. He is currently a CCtalk audio and video architect in Hujiang, engaged in the research and development of CCtalk audio and video related.

Content introduction: Hujiang has been committed to building CCtalk into an excellent education platform to meet the needs of all kinds of people and institutions to start classes and teach. This requires CCtalk not only to achieve complete functions and low cost, but also to achieve low latency in interaction and high stability in operation. This sharing mainly introduces what problems we have encountered recently in the development of CCtalk, what technologies and strategies to solve these problems.

Topic 4: "Technical Fusion of Deep Learning and Video Special Effects"

[[224491]]

Wang Sheng, Director of R&D Technology of Tutu Tutu

Guest profile: 5 years of experience in deep learning, and has served Internet companies such as Meitu, Yufan, and Yan Jian. The research scope mainly focuses on mobile face recognition, face detection alignment and tracking. Currently, TuSdk is responsible for the product planning and implementation of facial recognition algorithms for various industries, and the person in charge of mobile unlocking project R&D.

Content introduction : With the popularity of live broadcast technology and deep learning technology, deploying deep learning on the mobile phone has become an increasingly common choice. FaceU's face stickers, Youtube's video background segmentation, and various special effects derived from it have made it more reasonable to design deep learning that can run efficiently on mobile phones a technical pain point. Here we share the application of deep learning in live broadcast through the optimization of a series of algorithms and the use of GPUs to accelerate the application of deep learning in live broadcast.

The event is currently registering for the hottest. Click "Read original text" at the bottom to register quickly !

Read the original link: http://www.huodongxing.com/event/6432925190011

About ZEGO

Jigou Technology was founded by Lin Youyao, former general manager of QQ in 2015. It received IDG investment in the A round. The core team comes from Tencent QQ, bringing together top voice and video talents from manufacturers such as YY and Huawei. Jigou ZEGO is committed to providing the world's clearest and most stable real-time voice and video cloud services to help corporate business innovation and change users' online communication methods. Jigou ZEGO has been deeply engaged in the fields of video live broadcast, video social networking, game voice, online baby grabbing and online education, and has won the trust of top manufacturers such as Yingke, Huajiao Live, Yibo, Himalaya FM, Momo Games, Freedom Battle 2, and Good Future.

<<: How to correctly implement line spacing and line height in iOS

>>: Apart from the price, the new iPad may not be as good as you think

How to build user portraits, 4 methods!

Blog

Li Min: How to create a complete version of a high-quality concentration course for children to improve learning efficiency

Blog

Qiu Ma said, Guangdiantong optimization basics - 5 steps to play with Tencent resources

The cost of acquiring customers on online lending platforms is over a thousand yuan, but the conversion rate is less than 1%.

Blog

How to build a membership system from 0 to 1?

Blog

Don't bend: Fighting nation boycotts iPhone 6

Blog

Recommend

The latest national maternity leave map is out! The provinces with the longest paternity leave for men are actually these provinces...

Recently, many places across the country have ann...

Tea polyphenols may cause liver toxicity. The EU has introduced new regulations to limit intake. Can we still drink more green tea?

First of all, green tea is a good thing. Green te...