With just 15 seconds of audio, can AI help people who have lost their speech “regain their voice”?

OpenAI has shared progress on AI speech synthesis on its official website, announcing early insights and results from a small-scale preview of a model called "Voice Engine."

According to the report, the model takes text input and a single 15-second audio sample and generates natural-sounding speech that closely resembles the original speaker. Notably, it is a small model, and a 15-second sample is enough for it to produce emotive, realistic voices.

OpenAI first developed Voice Engine in late 2022 and has used it to power the preset voices in its text-to-speech API as well as ChatGPT Voice and Read Aloud.

OpenAI has now shared several early applications of Voice Engine, illustrated with real-world cases.

For example, Voice Engine was used to help restore the voice of a young patient who had lost the ability to speak fluently because of a vascular brain tumor.

Voice Engine can also be used to provide reading assistance, translate content, and support people who cannot speak, among other uses.

1) Provide reading assistance to non-readers and children through natural-sounding, emotive voices

These voices can represent a wider range of speakers than preset voices allow. Age of Learning, an educational technology company, has been using Voice Engine to generate pre-scripted voice-over content. It is also using Voice Engine together with GPT-4 to create real-time, personalized responses for interacting with students.

2) Translate content such as videos and podcasts

Voice Engine allows creators and businesses to reach more people around the world, fluently and in their own voices. According to OpenAI, HeyGen is one of the early adopters in this area. HeyGen is an AI visual storytelling platform that uses Voice Engine for video translation, rendering the speaker's voice in multiple languages to reach a global audience. When used for translation, Voice Engine preserves the native accent of the original speaker: for example, generating English from an audio sample of a French speaker produces speech with a French accent.

3) Provide support for people who are non-verbal

Voice Engine can support therapeutic applications for people with conditions that affect speech, educational enhancements for people with learning needs, and more. Livox is an AI alternative communication app that powers augmentative and alternative communication (AAC) devices, which enable people with disabilities to communicate. With Voice Engine, it can offer people who cannot speak unique, non-robotic voices in multiple languages. Users can choose the voice that best represents them, and multilingual users can keep a consistent voice across each language they speak.

Voice Engine is also reaching global communities by improving the delivery of essential services in remote areas. For example, Dimagi is developing tools for community health workers who provide a range of essential services, such as counseling for breastfeeding mothers. To help these workers build their skills, Dimagi uses Voice Engine and GPT-4 to give interactive feedback in each worker's primary language, including Swahili or more informal languages.

OpenAI said that because synthetic speech can be misused, it is taking a cautious and informed approach to a broader release, choosing for now to preview the technology rather than release it widely.

The terms that OpenAI's preview partners have agreed to require explicit and informed consent from the original speaker and do not allow developers to build ways for individual users to create their own voices. Partners must also clearly disclose to their audience that the voices they hear are AI-generated.

OpenAI has also implemented a set of safety measures, including watermarking to trace the origin of any audio generated by Voice Engine and proactive monitoring of how the model is used.

OpenAI said it encourages faster development and adoption of technologies that trace the origin of audiovisual content, so that people always know whether they are interacting with a real person or with AI. It also wants to help the public understand the capabilities and limitations of AI, including the possibility of deceptive AI-generated content.

References:

https://openai.com/blog/navigating-the-challenges-and-opportunities-of-synthetic-voices
