In 1882, when Helen Keller was just 19 months old, a sudden illness took all the color, sound, and movement out of her life. Like her, the family in the small town of Tuscumbia entered a dark and silent world. It was not until 1887, five years later, that her teacher Anne Sullivan entered Helen's life and things took a turn for the better. With Sullivan's help, Helen Keller not only learned to read Braille but, as an adult, also wrote the line "Only the deaf appreciate hearing." Almost everyone knows this story from elementary school, because Chinese-language teachers always assign an essay titled "Thoughts after Reading 'Three Days to See'".
However, not all hearing-impaired people are as lucky as Helen Keller, with a Sullivan of their own to help them. Of the 466 million people in the world who are deaf or hard of hearing, only a few can afford human transcription services such as CART in the United States, Palantypist in the United Kingdom, or STTR in other countries to communicate with others in real time. For most, conversations remain silent. To change this, on February 4, 2019, Google released a beta version of Live Transcribe, a new app that transcribes real-world speech on the spot and turns it into real-time captions using nothing but the phone's microphone. In March, it was officially launched on the Play Store.
Behind the Design: Real-Time Transcription with ASR
Dimitri, a Russian who lost his hearing at the age of one, is now a scientist at Google. He has difficulty speaking, and when he told the clerk "I had a good day," there were noticeable pauses between the words. But he no longer needs any human help. Live Transcribe on his phone displays every word the clerk says, in real time, on a black background: What would you like to drink? A small circle in the upper right corner keeps changing size to indicate the noise level of the surrounding environment.
"Live Transcribe's transcription delay is less than 200 milliseconds, which is close to real time," Sagar Salva, the app's product manager, told Geek Park. Like the reversal of direction in 50 Hz alternating current, the delay is hard to notice, which keeps the conversation between the two parties interactive. According to Salva, the app also supports more than 70 languages and dialects, covering 80% of the world's population. For bilingual families, it has a button for quickly switching between two languages.
Two years ago, when Dimitri joined Google's AI research group with 30 years of experience in speech recognition, this product did not yet exist. Every time he had a meeting, he had to book a CART service in advance and rely on a captioner who joined the meeting remotely and typed the spoken conversation onto a screen. Salva and his colleagues began to imagine how Google's existing technology could spare him that preparation.
Now, from Mountain View to Taipei, that idea has been continuously refined and has eventually evolved into Live Transcribe. That a new app could be developed in such a short time owes much to Google's accumulated technology. According to Salva, the core technology behind Live Transcribe is automatic speech recognition (ASR), which Google already uses in its various voice search applications. An ASR system has four main parts: feature extraction, an acoustic model, a language model, and a dictionary with a decoder. In short, its task is to convert speech signals into text accurately and efficiently. The highly accurate real-time captions on YouTube are currently powered by the same Google technology.
Live Transcribe: Everyone's Teacher Sullivan
But development was not all smooth sailing. Salva said one difficulty was choosing the actual usage scenario for users. The transcription could be displayed on hardware such as computers, tablets, or mobile phones, or the design could be bolder. For example, there was even an attempt to use a small projector to cast the transcribed captions onto Salva's T-shirt.
But people with hearing loss tend to earn relatively little for their work. According to the 2018 Statistical Bulletin on the Development of the Disabled, released by the China Disabled Persons' Federation, 9.484 million certified disabled people were employed in urban and rural areas nationwide. Of these, 2.546 million were in flexible employment (including community-based and home-based work) and 4.801 million worked in crop and livestock farming, together accounting for the large majority. The per capita disposable income of households with disabled members also lags far behind the average for society as a whole.
For these reasons, Salva and his team ultimately chose the smartphone over every other smart device: "There are already 2 billion people using Android phones around the world, and this hardware platform is a low-cost choice."
So that low-end phones can also run Live Transcribe, Salva and his team put two different neural networks behind the app. The first runs on the device and mainly handles sound classification, for sounds such as a baby crying or glass breaking, so that these sounds can be quickly classified and analyzed during real-time transcription. The second is a cloud-based neural network model that transcribes speech into text. "The fact that it's in the cloud, on Google's servers, using machine learning and these neural network models for speech recognition is really important, which means that this product can run on some low-end phones," Salva said. "When it's running, it only consumes about 4MB of memory. We've optimized the battery usage, so it can last about 10 hours on a single charge." What Google wants is to make this free app genuinely accessible to everyone with hearing loss. In fact, the idea did not come out of nowhere.
A traceable path: the 20% innovation project
In March 2016, Google launched Accessibility Scanner, an automated tool that evaluates apps and suggests ways to improve them for users with visual and hearing impairments, for example by enlarging small touch targets or changing contrast. In August 2018, Google released a new open specification aimed at jumpstarting the development of hearing aids that work with Android phones over Bluetooth Low Energy (LE), with sufficiently low latency and minimal impact on battery life.
The starting point of this path was an accident. Salva said that Live Transcribe was originally just a 20% innovation project. The 20% rule is a famous part of Google's culture: employees are encouraged to spend 20% of their time on innovation. In a five-day work week, for example, an employee can spend one day on projects of personal interest outside their job duties. If an idea proves itself, it gets the chance to be developed further and rolled out.
For example, the well-known Google News and Gmail both grew out of 20% projects. When Live Transcribe was born, it was well received by other hearing-impaired colleagues at Google, and so it gradually made its way to the Play Store.
During development, to reduce the impact of ambient noise and improve transcription quality, Google also launched a related app: Sound Amplifier. Used with wired headphones on an Android smartphone, it filters, enhances, and amplifies the sounds in the environment so that audio is clearer and easier to hear.
The World Health Organization estimates that by 2055, the number of people with hearing loss worldwide will reach 900 million. It is uncertain whether medicine will be able to prevent hearing loss by then, but it is certain that today, as Google hopes, Live Transcribe and Sound Amplifier are helping hundreds of millions of deaf and hard-of-hearing people communicate more clearly.