Most people are familiar with voice assistants, and many have talked with Siri, the voice assistant in iOS, whether just playing with it or genuinely needing its help. Siri has little trouble understanding what you say, but if you try to hold a real conversation with it, something feels off. Regardless of whether it answers your questions correctly, the voice it replies with makes it clear you are not chatting with a person.

Indeed, in speech recognition, the leading companies in China and abroad have reached accuracy rates of around 95%. In speech generation, however, almost no company can make a machine sound just like a human. Even with simple phrases, you can tell whether a voice is machine-generated or a real person. Yet as more and more people use voice interaction, making computers sound more human has become a major challenge for many software companies and programmers.

According to The New York Times, IBM spent 18 months at the turn of the century teaching its Watson system to speak. Despite its intelligence, Watson's speaking ability was poor, because it did not sound like a human voice at all.
Michael Picheny, senior manager at IBM Research. Image from The New York Times.

Today, computer voices are synthesized by machines (except for some weather forecasts and navigation prompts, which are fully pre-recorded by humans). The database of real recordings used to synthesize the final voice is usually very large: it contains a word's actual pronunciation, its pronunciation in different tones, and even partial pronunciations of the word. A voice actor typically needs at least 10 hours to record such a database.

Even with a voice database this large, it is still hard to synthesize speech that sounds close to a real person. The biggest difficulty is giving the synthesized voice human emotion. Alan Black, a computer scientist at Carnegie Mellon University's Language Technologies Institute, told The New York Times that there is no way to tell a speech synthesizer that a given passage should be read with feeling.

Of course, designers often stress that they do not want synthetic voices to deceive people into thinking they are real. But they still hope that voice interaction between machines and people can become more natural, more like communication between people.

In fact, if a machine's pronunciation gets too close to a real person's, it makes people uncomfortable. In 1970, the Japanese roboticist Masahiro Mori published an article titled "The Uncanny Valley," whose core idea is that when a robot is too similar to a human, even the slightest flaw will make people uneasy. According to Mori's hypothesis, as an object's degree of human likeness increases, people's emotional response to it follows a rise-fall-rise curve. The uncanny valley is the point where a robot reaches "close to human" similarity and human affinity suddenly plunges into revulsion.
Moving humanlike figures produce larger swings in emotional response than still ones. Image from Wikipedia.

ToyTalk is a company that makes human voices for children's toys. Its CEO, Brian Langner, says that when a machine gets some things right, people assume it can get everything right. So in his products he deliberately lets the machine make some mistakes. After all, he makes toys, and there is nothing wrong with a few mistakes that make people laugh.

The reality is that, despite the efforts of so many scientists, synthesized speech is still far enough from human speech that we need not yet worry about falling into the uncanny valley. To make Watson "speak properly," IBM recruited 25 voice actors. After extensive experiments and adjustments, they finally synthesized a voice that was more comfortable to listen to, although listeners could still clearly tell it was not a real person speaking.

If voice interaction is to develop rapidly, synthetic speech must become more comfortable to hear. Otherwise, this kind of interaction can only be described as voice input and machine execution, with no real communication between humans and machines.