Why do natural language interaction tools disappoint people more the more human-like they become?

Since Siri set the precedent, anthropomorphism has become a must-have capability for natural language interaction tools. Whether it is an AI voice assistant for individual users, intelligent customer service run by an enterprise, or a home appliance with voice functions, each one is expected to have its own brand character and persona, as if the machine had come to life.

Most of the time, we assume that anthropomorphizing natural language interaction tools reduces the "uncanny valley effect" and makes users more willing to talk to them. However, recent research suggests that this may not be the case.

The Thousand Routines of Becoming Human

First, let's take a look at the many routines used to personify natural language interaction tools.

The first step is to give the tool a harmless-sounding name.

We often say that if you pick up a small animal and give it a name, it will most likely become your pet. The same is true for AI. Once a natural language interaction tool has a name, it is all but destined to go further and further down the road of personification. These names usually start with "Xiao" ("little"), as in Xiaoice, which makes the tool seem weak and harmless, and they tend to be gender-neutral and inoffensive.

The second step is to use speech generation technology to imitate human tone.

Once the tool has a name, a cold electronic voice will no longer do. Even the earlier approach to speech generation, which combined recordings of real people with rule-based matching, sounds somewhat rigid. This is where neural network speech generation, represented by Google's WaveNet, comes in. By capturing many features of how real people speak and taking into account semantics, part of speech, grammar, context and other parameters, it generates a lifelike speaking tone complete with pauses and hesitations, as heard in Google Assistant.

The third step is to make the conversation itself more human-like.

In natural language interaction, speech is generated from text content. Beyond a human-like "speaking tone", the "speaking content" must also feel human. This is where the maturity of technologies such as semantic understanding, multi-turn dialogue, and natural language generation becomes important. For example, the full-duplex natural language interaction that Microsoft uses in Xiaoice achieves "thinking while listening" and "rhythm control": it tracks the user's intentions across the entire dialogue, shortens the user's waiting time, actively raises new topics to break a silence, and adjusts the content and timing of its answers on its own. When such dialogue content is delivered through speech generation technology, it can pass for the real thing and make people think they are talking to a human.

The last step is to put on "human skin".

Beyond the core technology, some peripheral touches are used to make natural language interaction tools feel more human. For example, designers give them a cute cartoon image, add a few scripted responses so that they can be cute and playful, and add details to the interface so that people do not realize they are talking to a machine.

With these steps, you can basically create a natural language interaction tool that "takes human form".

The more human, the cuter?

Managing expectations with natural language interaction tools

But one question we have rarely asked is whether, in actual use, natural language interaction tools really get better the more human-like they are. Recently, the Media Effects Research Laboratory at Pennsylvania State University ran an experiment on exactly this.

The researchers told volunteers that they would be buying a digital camera on an e-commerce platform and would need to consult online customer service. Every customer service agent was in fact an intelligent natural language interaction system, but the researchers varied the agents in how human-like and how responsive they were. Different groups of volunteers were assigned to different agents. Some agents stated outright during the conversation that they were machines, some showed only the content of the dialog box, and some "disguised" themselves as humans with a real person's avatar and name.

These agents, which differed in their level of anthropomorphism, also differed in their level of responsiveness. Some answered user questions quickly and accurately, while others failed to understand what the user said and dodged the question.
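The design described above can be sketched as a small grid of conditions. This is only an illustration: the article does not say how many groups the study used or whether every combination was tested, so the fully crossed layout, the label strings, and the `build_conditions` helper are all assumptions.

```python
from itertools import product

# Labels paraphrase the article; they are not the study's own terminology.
IDENTITY_CUES = ("declared_machine", "dialog_box_only", "human_avatar_and_name")
RESPONSIVENESS = ("high", "low")


def build_conditions():
    """Cross every identity cue with every responsiveness level.

    Assumes a fully crossed design, which the article does not confirm.
    """
    return [
        {"identity_cue": cue, "responsiveness": level}
        for cue, level in product(IDENTITY_CUES, RESPONSIVENESS)
    ]


if __name__ == "__main__":
    for condition in build_conditions():
        print(condition["identity_cue"], condition["responsiveness"])
```

Laying the conditions out this way makes the comparison in the study easier to follow: satisfaction is compared across identity cues while the responsiveness level is held fixed.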

When the subjects were surveyed about their satisfaction after the interaction, the results were surprising.

Intuitively, we would expect that the more responsive the intelligent customer service is, the more satisfied people will be. In practice, at the same level of responsiveness, user satisfaction depended on how human-like the agent appeared. For example, for the same conversation, participants who clearly knew they were dealing with a machine might give a satisfaction rating of 80 points, while an agent disguised as a human would get only 60. The reason is that when a machine agent shows more human characteristics, users expect more of it and hope it will solve their problems the way a human would. If they do not get the answers they want, their disappointment is magnified.

We have the same feeling when we use natural language interaction ourselves. When a voice assistant or an intelligent customer service agent cannot solve the problem and instead insists on being cute or telling jokes, our irritation tends to rise sharply.

Ultimately, whether natural language interaction should be human-like is a question of "user expectation management". Raising user expectations too far can backfire.

It is easy to be a person, but difficult to be a tool

Yet one important trend is already visible: the human-like side and the tool-like side of natural language interaction are developing unevenly.

In terms of technical difficulty, making natural language interaction tools more human-like is much easier than making them more effective.

Technologies such as Google's WaveNet and Microsoft's full-duplex natural language interaction are already enough to bring pronunciation patterns, conversational rhythm and other details close to those of humans. In the future, combined with computer vision and even robot manufacturing technology, they could produce a conversational partner that is hard to tell apart from a human.

In fact, we can already see visually human-like "AI speakers" today, such as AI news anchors or Sophia, the humanoid robot built by Hanson Robotics.

However, the ability of these systems to solve problems has not improved at the same pace. Specifically, there are still gaps in their understanding of human corpora, especially less common ones such as minority languages and the speech of the elderly and children, and their grasp of domain-specific vocabulary is not comprehensive enough. In vertical industries, AI often runs into knowledge blind spots.

As a result, helping the "instrumentality" of natural language interaction catch up with its "humanity" may remain an industry trend for a long time to come. Examples include building knowledge graphs for individual industry segments, accumulating vocabulary libraries, and collecting corpora of different dialects and languages from different groups of people for AI training.

As the technology keeps catching up, people's expectations of natural language interaction tools will inevitably keep rising. To avoid the "short board effect", in which the weakest capability limits the whole, we should perhaps devote more energy to things other than "human nature".