Why are natural language interaction tools more likely to disappoint people the more human-like they become?

Since Siri set the precedent, anthropomorphism has become a standard feature of natural language interaction tools. AI voice assistants for individual users, smart customer service run by enterprises, and even voice-enabled home appliances are all given a branded character and a persona, as if the machines were becoming spirits.

Most of the time, we assume that anthropomorphism reduces the "uncanny valley" effect that users feel and makes them more willing to talk to these tools. Recent research, however, suggests that this may not be the case.

The many tricks of becoming human

First, let's look at the many tricks used to make natural language interaction tools seem human.

The first step is to give the tool a harmless name.

We often say that once you pick up a small animal and give it a name, it will most likely end up as your pet. The same is true for AI: once a natural language interaction tool has a name, it is all but destined to go further and further down the road to becoming a spirit. These names usually carry a "little" prefix (the "Xiao" in Xiaoice, for example), which makes the tool seem weak and harmless, and they are gender-neutral, which keeps them politically correct.

The second step is to use speech generation technology to imitate a human tone of voice.

Once the tool has a name, a cold electronic voice will no longer do. Even the earlier approach to speech generation, which combined recordings of real people with rule matching, sounds somewhat stiff. This is where neural network speech generation, represented by Google's WaveNet, comes in. By capturing many features of how real people speak and taking into account semantics, part of speech, grammar, context, and other parameters, it produces a human-sounding voice complete with pauses and moments of apparent thought, as heard in Google Assistant.

The third step is to make the conversation itself more human.

In natural language interaction, speech generation works from text content. Beyond a human-sounding "tone of voice", the "content of what is said" must also feel human. This is where the maturity of technologies such as semantic understanding, multi-turn dialogue, and natural language generation becomes important. For example, the full-duplex natural language interaction that Microsoft applies in Xiaoice supports "thinking while listening" and "rhythm control": it tracks the user's intent across the whole conversation, shortens the user's waiting time, proactively raises new topics to break a silence, and adjusts the content and timing of its replies on its own. When such dialogue is voiced through speech generation technology, it can pass for the real thing and make people believe they are talking to a human.

The last step is to put on a "human skin".

Beyond the core technology, a few peripheral touches make natural language interaction tools seem even more human: designing a cute cartoon avatar for them, adding scripted responses so they can act cute and playful, and adding details to the interface so that people do not notice they are talking to a machine.

With these steps, you can basically create a natural language interaction tool that "takes human form".

The more human, the more likable?

Managing expectations with natural language interaction tools

But one question has rarely been asked: in real applications, is a natural language interaction tool really better the more human-like it is? The Media Effects Research Laboratory at Pennsylvania State University recently ran an experiment to find out.

The researchers told volunteers that they would be buying a digital camera on an e-commerce platform and would need to consult online customer service first. Every customer service agent was in fact an intelligent natural language interaction system, but the researchers varied how human-like and how responsive each one was, and different groups of volunteers were assigned to different systems. Some agents told the user outright that they were machines, some showed nothing but the dialog box, and some "disguised" themselves as humans with real-person avatars and names.

At the same time, agents at each level of anthropomorphism also differed in responsiveness. Some answered questions quickly and accurately, while others failed to understand what the user said and dodged the question.

When the participants were surveyed about their satisfaction after the interaction, the results were surprising.

Intuitively, we would expect satisfaction to rise with the agent's responsiveness. In practice, at the same level of responsiveness, satisfaction also depended on how human-like the agent appeared. For example, for the same conversation content, participants who knew they were talking to a machine would give a satisfaction score of 80, while an agent disguised as a human would get only 60. The reason is that when a machine agent displays more human characteristics, users expect more of it and hope it will solve their problems the way a person would. When they do not get the answers they want, their disappointment is magnified.
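The reasoning above can be sketched as a toy model. This is only an illustration of the argument, not the researchers' method: the scoring rule, the `penalty` weight, and the expectation values in `EXPECTATION` are all hypothetical assumptions chosen to show the direction of the effect.

```python
# Toy model (an assumption for illustration, not taken from the study):
# satisfaction equals perceived performance minus a penalty for the part
# of the user's expectation that the agent failed to meet.

# Hypothetical expectation levels on a 0-100 scale, one per identity cue.
EXPECTATION = {
    "disclosed_machine": 50.0,  # user was told the agent is a machine
    "dialog_only": 65.0,        # no identity cue, only the dialog box
    "disguised_human": 90.0,    # human avatar and name
}


def satisfaction(performance: float, expectation: float, penalty: float = 1.0) -> float:
    """Return performance minus a penalty for any unmet expectation."""
    shortfall = max(0.0, expectation - performance)
    return performance - penalty * shortfall


def score_all(performance: float) -> dict[str, float]:
    """Score the same performance under every identity cue."""
    return {cue: satisfaction(performance, exp) for cue, exp in EXPECTATION.items()}


if __name__ == "__main__":
    # Identical responsiveness for all three agents.
    for cue, score in score_all(70.0).items():
        print(cue, score)
```

With identical performance, the agent that raises expectations the most is the one that ends up with the lowest score, which is the pattern the experiment describes.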

We have the same feeling when we use natural language interaction ourselves. When a voice assistant or an intelligent customer service agent cannot solve the problem and instead forces itself to act cute or tell a joke, our irritation tends to spike.

Ultimately, how human-like natural language interaction should be is a question of "user expectation management". Raising user expectations too high can backfire.

It is easy to be a person, but difficult to be a tool

One important trend at present, however, is that the "humanity" and the "instrumentality" of natural language interaction are developing unevenly.

In terms of technical difficulty, making natural language interaction tools more human-like is much easier than making them more effective.

Technologies such as Google's WaveNet and Microsoft's full-duplex natural language interaction are already enough to bring pronunciation patterns, conversational rhythm, and other details as close to human as possible. In the future, combined with computer vision and even robotics, they may make it possible to build a conversational partner that seems no different from a human.

In fact, we can already see visually humanized "AI speakers" today, such as AI news anchors and Sophia, the robot developed by Hanson Robotics.

However, the problem-solving ability of these natural language interaction tools has not kept up. Specifically, there is still a gap in understanding human language corpora, especially less common ones such as minority languages and the speech of the elderly and children, and coverage of domain-specific vocabulary is incomplete. In vertical industries in particular, AI often runs into knowledge blind spots.

Helping the "instrumentality" of natural language interaction catch up with its "humanity" may therefore remain an industry trend for a long time to come, for example by building knowledge graphs for individual industry segments, accumulating vocabulary libraries, and collecting corpora in different dialects and languages from different groups of people for AI training.

As the technology continues to catch up, people's expectations of natural language interaction tools will inevitably keep rising. To avoid the "short board" effect, in which the weakest capability limits the whole, we should perhaps devote more energy to pursuing things other than "human nature".
