OpenAI has shared some of its progress in AI speech synthesis on its official website, announcing early insights and results from a small-scale preview of a model called "Voice Engine." According to the announcement, the model uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker. Notably, a small model working from only a 15-second sample can produce emotive and realistic voices.
OpenAI first developed Voice Engine in late 2022 and has used it to power the preset voices in its text-to-speech API, as well as ChatGPT Voice and Read Aloud. OpenAI has now shared several early real-world applications of Voice Engine. For example, it was used to help restore the voice of a young patient who had lost fluent speech due to a vascular brain tumor. Voice Engine can also be used to provide reading assistance, translate content, support people who cannot speak, and more.
1) Provide reading assistance to non-readers and children through natural-sounding, emotive voices
These voices represent a wider range of speakers than preset voices can. Age of Learning, an educational technology company, has been using Voice Engine to generate pre-scripted voice-over content. It is also using Voice Engine together with GPT-4 to create real-time, personalized responses for interacting with students.
2) Translate content such as videos and podcasts
Voice Engine lets creators and businesses reach more people around the world, fluently and in their own voices. According to OpenAI, HeyGen is one of the early adopters here. HeyGen, an AI visual storytelling platform, uses Voice Engine for video translation, rendering a speaker's voice in multiple languages to reach a global audience.
When used for translation, Voice Engine preserves the native accent of the original speaker. For example, generating English from an audio sample of a French speaker produces speech with a French accent.
3) Support people who cannot speak
Voice Engine can enable therapeutic applications for people with conditions that affect speech, educational enhancements for people with learning needs, and more. Livox, an AI alternative communication app, supports augmentative and alternative communication (AAC) devices that help people with disabilities communicate. With Voice Engine, it can offer people who cannot speak unique, non-robotic voices in many languages. Users can choose the voice that best represents them, and multilingual users can keep a consistent voice across every language they speak.
Voice Engine also helps reach global communities by improving the delivery of essential services in remote settings. For example, Dimagi is building tools for community health workers who provide a range of essential services, such as counseling for breastfeeding mothers. To help these workers develop their skills, Dimagi uses Voice Engine and GPT-4 to give interactive feedback in each worker's primary language, including Swahili and more informal languages.
OpenAI said that because synthetic speech can be misused, it is taking a cautious and informed approach to a broader release. It is offering only a preview and is not releasing the technology widely for now. Its terms with these partners require explicit and informed consent from the original speaker, and developers are not allowed to build ways for individual users to create their own voices. Partners must also clearly disclose to their audiences that the voices they hear are AI-generated.
In addition, OpenAI has implemented a number of safety measures, including watermarking to trace the origin of any audio generated by Voice Engine and proactive monitoring of how it is used. OpenAI said it encourages faster development and adoption of techniques for tracing the origin of audiovisual content. The goal is for it to always be clear whether people are interacting with a real person or an AI. OpenAI also hopes to help the public understand the capabilities and limitations of AI technologies, including the possibility of deceptive AI-generated content.
References: https://openai.com/blog/navigating-the-challenges-and-opportunities-of-synthetic-voices