Speaking is the most basic means of interpersonal communication, yet there are many people in the world who cannot express themselves. For people with aphasia, most commonly caused by stroke, their voices are lost, their needs go unheard, and they suffer social isolation; their silence is deafening. Everyone who has lost the ability to speak after a stroke longs to communicate completely and naturally again. Paralysis itself still cannot be cured, but with the help of AI, paralyzed patients who have lost the ability to speak can now regain their voices and communicate with others in real time, with rich expressions and movements.

Author | Tower   Editor | Sanyang

This article was first published on the HyperAI WeChat public platform.

Zweig once said, "The greatest luck in a person's life is to discover his mission in the middle of his life, when he is young and strong." And what is a person's greatest misfortune? In my opinion, it is to suddenly lose all ability to speak and move in the prime of life: overnight, dreams, careers and aspirations burst like bubbles, and life is turned upside down.

Ann is one such unfortunate example.

30 years old, aphasia due to stroke

One day in 2005, Ann, who had always been in good health, suddenly developed dizziness, slurred speech, quadriplegia and muscle weakness. She was diagnosed with a brainstem infarction (what we commonly call a "stroke"), accompanied by a left vertebral artery dissection and basilar artery occlusion.

The stroke left Ann with a condition called "locked-in syndrome": patients retain all their senses and awareness but cannot voluntarily move any muscles in the body. They can neither move nor speak on their own, and some cannot even breathe unaided. As the word "locked" suggests, the body that carries ordinary people across thousands of mountains and rivers becomes a cage that seals in the patient's mind.

At the time, Ann was only 30 years old. She had been married for 2 years and 2 months, her daughter was just 13 months old, and she was a high-school math teacher in Canada. "Everything was taken away from me overnight," Ann later typed slowly on a computer with an assistive device.

Ann, who participated in the study

After years of physical therapy, Ann could breathe, move her head slightly, blink and say a few words, but that was all. In everyday life, the average person speaks at 160-200 words per minute, and a 2007 study from the Department of Psychology at the University of Arizona found that men speak an average of 15,669 words per day and women an average of 16,215 words per day (one English word corresponds to roughly 1.5-2 Chinese characters on average). In a world where language is the main channel of interpersonal communication, it is easy to imagine how many of Ann's needs, with her expression so limited, simply went unvoiced. What is lost with aphasia is not only quality of life, but also personality and identity. And how many paralyzed, aphasic people around the world are in the same situation as Ann?

Paralyzed for 18 years, she speaks again

Regaining the ability to communicate completely and naturally is the greatest wish of everyone who has lost their voice to paralysis. With today's highly developed science and technology, is there a way to restore interpersonal communication to these patients?

There is.
Recently, a research team from the University of California, San Francisco (UCSF) and the University of California, Berkeley used AI to develop a new brain-computer interface that allowed Ann, who had been unable to speak for 18 years, to "speak" again. The system also drives a digital avatar with vivid facial expressions, helping her communicate with others in real time at a speed and quality close to normal conversation.

Ann uses a digital avatar to talk to people

This is the first time that speech and facial expressions have been synthesized directly from brain signals. The UC team's earlier work had already shown that language can be decoded from the brain activity of paralyzed people, but only as text output, and with limited speed and vocabulary. This time the researchers wanted to go a step further: to enable faster, large-vocabulary text communication while also restoring the voice and facial movements that accompany speaking.

Combining machine learning with brain-computer interface technology, the team achieved the following results, published in Nature on August 23, 2023:

► Text: the subject's brain signals were decoded into text at 78 words per minute with an average word error rate of 25%, more than four times faster than the assistive communication device she normally uses (about 14 words per minute);
► Speech audio: brain signals were rapidly synthesized into intelligible, personalized speech consistent with the subject's pre-injury voice;
► Facial digital avatar: the system drives a virtual face for both speech movements and non-speech communicative gestures.

Paper link: https://www.nature.com/articles/s41586-023-06443-4
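A quick aside on the metric: word error rate (WER) is the standard measure used in speech recognition. It is the minimum number of word substitutions, insertions and deletions needed to turn the decoded sentence into the reference sentence, divided by the number of words in the reference. Below is a minimal sketch of that computation (illustrative only, not the authors' evaluation code):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming (Levenshtein) edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                       # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                       # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# Hypothetical example: one wrong word out of four gives a 25% WER,
# the average error rate reported for the text decoder.
print(word_error_rate("how are you today", "how are you doing"))  # 0.25
```

A 25% WER means that, on average, roughly one word in four still has to be inferred or corrected from context, which is part of why the restricted 1,024-word vocabulary and the phoneme-level decoding described below matter for accuracy.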
You must be curious how this breakthrough was achieved. Next, let's take a closer look at the paper and see how the researchers gave Ann back her voice.

1. Underlying logic: brain signals → speech + facial expressions

The human brain outputs information through the peripheral nerves and muscles, and language is controlled by the "language centers" of the cerebral cortex. Stroke patients develop aphasia because blocked blood circulation deprives the brain's language areas of oxygen and essential nutrients; one or more of the mechanisms of language communication can then no longer work properly, resulting in language dysfunction.

In response, the research team from UCSF and UC Berkeley designed a "multimodal speech neural prosthesis": a large-scale, high-density electrocorticography (ECoG) array records the vocal-tract representations distributed across the sensorimotor cortex (SMC), and these signals are decoded into text, audible speech and facial animation. In other words, the system captures brain signals at their source and "translates" them into the corresponding text, voice and even facial expressions.

Multimodal speech decoding in a patient with vocal-tract paralysis

2. Process and implementation: brain-computer interface + AI algorithms

The first step is the hardware. The researchers implanted a high-density ECoG electrode array, together with a percutaneous connector, over the surface of the left hemisphere of Ann's brain, covering areas associated with speech production and speech perception. The array consists of 253 disc-shaped electrodes that intercept the brain signals that would otherwise travel to Ann's tongue, jaw, larynx and facial muscles. A cable plugs into a port fixed to Ann's head and connects the electrodes to a bank of computers.

The electrode array was implanted over the language-control areas on the surface of the subject's cerebral cortex

The second step is building the algorithm. To identify Ann's individual brain signatures for speech, the research team spent several weeks training and evaluating a deep learning model with her. The researchers assembled a set of general-purpose sentences drawn from a 1,024-word conversational vocabulary, based on the NLTK Twitter corpus and the Cornell movie dialogue corpus, and asked Ann to say them silently at a natural pace. She repeated different phrases from this vocabulary over and over until the computer recognized the patterns of brain activity associated with the corresponding sounds.

It is worth noting that the model does not learn to recognize whole words; instead, it decodes words from "phonemes". For example, "Hello" contains four phonemes: "HH", "AH", "L" and "OW". With this approach the computer only needs to learn 39 phonemes to decipher any English word, which according to the researchers not only improves accuracy but also makes decoding about 3 times faster.

Note: a phoneme is the smallest sound unit of a language and captures how speech is articulated, including the place of articulation, the manner of articulation and vocal-cord vibration. For example, the word "an" is made up of the phonemes /ə/ and /n/.

This kind of phoneme-level decoding resembles the way babies learn to speak. According to the widely accepted view in developmental linguistics, newborns can distinguish around 800 phonemes across the world's languages. Preschool children may not yet understand written words and sentences, but by perceiving, distinguishing and imitating phonemes they gradually learn to pronounce and understand language.
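To make the phoneme idea concrete, here is a toy sketch of the final lookup step: suppose a decoder has already produced a sequence of ARPAbet phoneme symbols, and a pronunciation lexicon then maps runs of phonemes back to words. The four-entry lexicon below is a hypothetical fragment written for this example (real systems use full pronunciation dictionaries such as CMUdict); it only illustrates the principle of decoding 39 phoneme classes instead of thousands of whole words, and is not the neural decoder used in the paper.

```python
# Toy illustration of phoneme-level decoding: a classifier predicts ARPAbet
# phoneme symbols, and a pronunciation lexicon maps them back to words.
# The lexicon below is a tiny hypothetical fragment for demonstration only.
LEXICON = {
    ("HH", "AH", "L", "OW"): "hello",
    ("HH", "AW"): "how",
    ("AA", "R"): "are",
    ("Y", "UW"): "you",
}

def phonemes_to_words(phonemes: list[str]) -> list[str]:
    """Greedily match the longest known phoneme run at each position."""
    words, i = [], 0
    while i < len(phonemes):
        for j in range(len(phonemes), i, -1):        # try the longest span first
            candidate = tuple(phonemes[i:j])
            if candidate in LEXICON:
                words.append(LEXICON[candidate])
                i = j
                break
        else:
            # Unknown phoneme run: skip one symbol (a real decoder would use
            # context or a language model to resolve the ambiguity instead).
            i += 1
    return words

decoded = ["HH", "AH", "L", "OW", "HH", "AW", "AA", "R", "Y", "UW"]
print(" ".join(phonemes_to_words(decoded)))  # hello how are you
```

In the actual system the phoneme-to-text mapping is learned rather than table-driven, and context is used to disambiguate similar-sounding words, but the benefit is the same one the article describes: the network only has to tell 39 phoneme classes apart rather than an open-ended vocabulary.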
Finally, there is speech and facial expression synthesis. With the decoding foundation in place, the next step is to make speech and facial expressions tangible, which the researchers solve through speech synthesis and a digital avatar.

For the voice, the researchers developed a speech synthesis algorithm and personalized it with recordings of Ann's voice from before her stroke, so that the digital avatar sounds as much like her as possible. For the face, Ann's digital avatar was created with software from Speech Graphics and appears on screen as an animated female face. The researchers customized the machine learning pipeline so that the software is driven by the signals from Ann's brain as she tries to speak: the avatar's jaw opens and closes, its lips protrude and purse, its tongue moves up and down, and its face makes the movements and gestures that convey happiness, sadness and surprise.

Ann is working with researchers on algorithm training

Future outlook

"Our goal is to restore a full, embodied form of communication, which is the most natural way for us to converse with others," said Edward Chang, MD, chief of neurosurgery at UCSF. "The goal of combining audible speech with a live avatar is to bring the full range of human communication to life, which is much more than just language."

The research team's next step is a wireless version that does away with the physical cable of the brain-computer interface, so that paralyzed people can use the technology to freely control their own phones and computers, with profound implications for their independence and social interaction.

From voice assistants on mobile phones and face-scanning payments to robotic arms in factories and sorting robots on production lines, AI is extending human limbs and senses and gradually penetrating every aspect of how we work and live. Here, researchers have focused on the special group of people living with paralysis and aphasia, using AI to help them recover natural communication. This promises to ease communication between patients and their families and friends, expand their opportunities for interpersonal interaction, and ultimately improve their quality of life. We are excited about this achievement and look forward to more good news about AI benefiting humanity.

Reference links:
[1] https://www.sciencedaily.com/releases/2023/08/230823122530.htm
[2] http://mrw.so/6nWwSB