Modern life is noisy. If you don't like the hustle and bustle around you, you can put on noise-canceling headphones to block it out. One problem, however, is that current noise-canceling headphones filter out all sounds indiscriminately, including some you actually want to hear. While Apple's second-generation AirPods Pro can automatically adjust the volume for the wearer, for example by sensing when the wearer is talking, they give the wearer little control over whom they listen to or when. Now, a new artificial intelligence (AI) technology may produce headphones that break with tradition: with just one glance, the only voice filling the wearer's world is that of the person they looked at.

A research team from the University of Washington has developed an AI headphone system called Target Speech Hearing (TSH). The wearer only needs to look at the target speaker for 3 to 5 seconds to "lock on" to that speaker; the system then suppresses all other sounds in the environment and plays only the "locked" speaker's voice. The TSH system keeps working even when the wearer walks around a noisy place and is no longer facing the speaker.

"We usually think of AI today as just web-based chatbots that answer questions," said Shyamnath Gollakota, the paper's corresponding author and a professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington. "But in this project, we developed AI that can change the wearer's auditory perception based on their preferences."

The research team says the TSH system can be used not only to listen to a particular person's voice but also to remove it. This is helpful in cases where, for example, you want to filter out one person's disruptive speech while still hearing everyone else.

The team previously presented this work at the ACM CHI Conference on Human Factors in Computing Systems, the premier international conference in the field of human-computer interaction. The code for this proof-of-concept device is available for others to build on. It has not yet been commercialized, but the team is in talks to embed the technology in a popular brand of noise-canceling headphones. In future work, they hope to extend the TSH system to earbuds and hearing aids.

The sound of being "locked"

According to the paper, to use the TSH system the wearer only needs to aim their head at the target speaker and tap a button to complete the "lock". The work builds on the team's previous research on semantic hearing, which lets users select specific categories of sound they want to hear (such as bird calls or voices) and cancel out all other sounds in the environment.

Because the wearer is facing the target, the "locked" speaker's sound waves reach the microphones on both sides of the headset at the same time. The headset sends this signal to an embedded computer, where machine learning software learns the vocal patterns of the "locked" speaker. The system then extracts that voice and plays it back to the wearer continuously, even as the wearer moves around with the headphones. As the "locked" person keeps speaking, the system receives more training data and its ability to focus on that voice improves.

The team tested the system on 21 subjects, who on average rated the clarity of the "locked" voice nearly twice as high as that of the unfiltered audio.
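To make the "enroll, then extract" flow described above more concrete, here is a minimal sketch in Python/PyTorch. It assumes a hypothetical EnrollmentEncoder that turns the 3 to 5 second binaural "lock" clip into a speaker embedding, and a hypothetical ExtractionNet that uses that embedding to pull the locked voice out of each incoming audio chunk; the module names, layer shapes, and streaming loop are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of the TSH "lock on, then extract" flow described above.
# Module names, shapes, and the chunked loop are illustrative assumptions,
# not the authors' released implementation.
import torch
import torch.nn as nn

class EnrollmentEncoder(nn.Module):
    """Turns a few seconds of binaural audio (recorded while the wearer
    faces the target) into a fixed-size speaker embedding."""
    def __init__(self, emb_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(2, 64, kernel_size=16, stride=8), nn.ReLU(),
            nn.Conv1d(64, emb_dim, kernel_size=16, stride=8), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool over time -> one vector per clip
        )

    def forward(self, binaural: torch.Tensor) -> torch.Tensor:
        # binaural: (batch, 2 channels, samples)
        return self.net(binaural).squeeze(-1)  # (batch, emb_dim)

class ExtractionNet(nn.Module):
    """Extracts the enrolled speaker from a noisy binaural mixture,
    conditioned on the stored speaker embedding."""
    def __init__(self, emb_dim: int = 256):
        super().__init__()
        self.encode = nn.Conv1d(2, 256, kernel_size=16, stride=8)
        self.cond = nn.Linear(emb_dim, 256)
        self.decode = nn.ConvTranspose1d(256, 2, kernel_size=16, stride=8)

    def forward(self, mixture: torch.Tensor, emb: torch.Tensor) -> torch.Tensor:
        feats = self.encode(mixture)                        # (batch, 256, frames)
        gate = torch.sigmoid(self.cond(emb)).unsqueeze(-1)  # (batch, 256, 1)
        return self.decode(feats * gate)                    # masked features back to waveform

# Usage: "lock" once with a 3-5 s look, then stream chunk by chunk.
enroller, extractor = EnrollmentEncoder(), ExtractionNet()
lock_clip = torch.randn(1, 2, 16000 * 4)     # ~4 s of binaural audio while facing the speaker
speaker_emb = enroller(lock_clip)            # one-time "lock"
for _ in range(10):                          # stands in for the live microphone stream
    chunk = torch.randn(1, 2, 1600)          # next 100 ms of noisy binaural input
    cleaned = extractor(chunk, speaker_emb)  # only the locked speaker, played to the wearer
```

The design point this sketch tries to mirror is the one the article describes: the direction the wearer faces matters only once, to capture the enrollment sample; after that, extraction depends only on the stored embedding, which is why the system keeps working when the wearer stops facing the speaker.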
Shortcomings and Prospects

However, the study also has some limitations. For example, the current TSH system can only "lock on" to one speaker at a time, and it can only do so when no other, louder voice is coming from the same direction as the target. In future work, the research team hopes to extend the TSH system to "lock" multiple target speakers at once. They propose two possible approaches:

1) Run a separate network instance for each speaker. The drawback is that this requires more computing resources, because each speaker needs its own processing pipeline.

2) Train a single network that handles multiple speakers at once, using some form of "aggregate multi-speaker embedding". This avoids running a separate instance per speaker and instead separates all target speakers' speech in one pass, making it more efficient.

Furthermore, a person's voice characteristics can change with aging, health, and emotional state, which could prevent the TSH system from recognizing subtle differences in the voice and thus from "locking on" to the target speaker. The team notes that, because the wearer captures the enrollment sample with the binaural hearables just before extraction begins, the voice is unlikely to change much over that short interval. At the same time, the more similar the target speaker and an interfering speaker sound, the harder it is to eliminate the interference completely. To make the system more robust, several "lock" recordings captured at different times can be used instead of just one.

In addition, although the model was trained on synthetic data and still generalized to real-world speakers it had never seen, to indoor and outdoor environments, and to a moving wearer, its ability to generalize to different environments and speakers may need further verification and improvement in practical applications.

Finally, the team also explored more reliable ways to "lock" the target speaker. For example, the target speaker can be allowed to move, which reduces the probability that another strong interfering speaker remains in the same direction; even in static scenes, the network can be trained to focus only on the closest or loudest speaker in the direction the wearer is looking.

Reference Links:
https://dl.acm.org/doi/10.1145/3613904.3642057
https://www.washington.edu/news/2024/05/23/ai-headphones-noise-cancelling-target-speech-hearing/