With just 15 seconds of audio, can AI help people with aphasia “regain their voice”?

OpenAI has shared some of its progress in AI speech synthesis on its official website. It presented early insights and results from a small-scale preview of a model called "Voice Engine."

According to the post, the model takes text input and a single 15-second audio sample and generates natural-sounding speech that closely resembles the original speaker. Notably, even a small model working from only a 15-second sample can produce emotive, realistic voices.

OpenAI first developed Voice Engine in late 2022. It has since used the model to power the preset voices in its text-to-speech API, as well as ChatGPT's voice and read-aloud features.

OpenAI has now shared several real-world examples of early Voice Engine applications.

For example, Voice Engine was used to help restore the voice of a young patient who had lost the ability to speak fluently because of a vascular brain tumor.

Voice Engine can also be used to provide reading assistance, translate content, and support people who cannot speak.

1) Providing reading assistance to non-readers and children through natural-sounding, emotive voices

These voices represent a wider range of speakers than is possible with preset voices. Age of Learning, an education technology company, has been using Voice Engine to generate pre-scripted voice-over content. It is also combining Voice Engine with GPT-4 to create real-time, personalized responses for interacting with students.

2) Translating content such as videos and podcasts

Voice Engine lets creators and businesses reach more people around the world fluently, in their own voices. According to OpenAI, HeyGen is one of the early adopters in this area. HeyGen, an AI visual storytelling platform, uses Voice Engine for video translation, rendering a speaker's voice in multiple languages to reach a global audience. When used for translation, Voice Engine preserves the original speaker's native accent. For example, generating English speech from an audio sample of a French speaker produces English with a French accent.

3) Supporting people who cannot speak

Voice Engine can support therapeutic applications for people with conditions that affect speech, educational enhancements for people with learning needs, and more. Livox, an AI-powered alternative communication app, supports augmentative and alternative communication (AAC) devices that enable people with disabilities to communicate. Voice Engine can give people who cannot speak unique, non-robotic voices in many languages. Users can choose the voice that best represents them, and multilingual users can keep a consistent voice across every language they speak.

Voice Engine is also helping reach communities around the world by improving the delivery of essential services in remote areas. For example, Dimagi is building tools that help community health workers provide a range of essential services, such as counseling for breastfeeding mothers. To help these workers develop their skills, Dimagi uses Voice Engine and GPT-4 to give interactive feedback in each worker's primary language, including Swahili and more informal languages.

OpenAI said that because synthetic speech can be misused, it is taking a cautious and informed approach to any broader release. For now, it is previewing the technology rather than releasing it widely.

OpenAI's terms with these partners require explicit and informed consent from the original speaker, and they do not allow developers to build ways for individual users to create their own voices. Partners must also clearly disclose to their audiences that the voices they hear are AI-generated.

OpenAI has also implemented a number of safety measures, including watermarking to trace the origin of any audio generated by Voice Engine and proactive monitoring of how it is used.

OpenAI said it encourages faster development and adoption of techniques for tracking the origin of audiovisual content. The goal is that people always know whether they are interacting with a real person or with AI. OpenAI also wants to help the public understand what AI technology can and cannot do, including its potential to produce deceptive content.

References:

https://openai.com/blog/navigating-the-challenges-and-opportunities-of-synthetic-voices
