How to make voice interaction more natural? Master these 6 key knowledge points first!

I recently read a few very good articles about robots and decided to translate them while writing down my own thoughts. My daily work keeps me immersed in the robot platform, multi-turn scenarios, and various parsers, so I need stimulation from different angles; perhaps some new ideas will come out of it.

△ Photo by Franck V. on Unsplash

Today’s article is Anna Prist’s Medium post “How to make your Chatbot Sound Natural”.

Let’s first summarize the six points that Anna mentioned when designing robot dialogues.

  • Context (the robot needs to understand the context during the conversation)
  • Personality (robots need to have their own personality)
  • Concise (the robot's wording needs to be concise and clear)
  • Flexibility (need to take into account the diversity of user expressions)
  • Naturalness (using natural expressions in human conversation, such as polite expressions)
  • Initiative (lead the conversation, don’t let it die)

We are so used to rapid technological innovation that we can’t even imagine a future without it. As we move forward, interactive devices and interaction design keep improving. Thanks to novels and movies, we already have a picture of how to interact with machines: voice commands, gestures, and virtual screens, just like Tom Cruise in Minority Report (search YouTube for "Minority Report's gesture-based user interface" to watch it).

In everyday life we interact through touch, voice, and gestures, which come easily to us and require no learning. These methods are called "natural" because interaction is a basic human behavior. From the first day of our lives we interact with everything around us: we try to grab or move things, and we try to speak and communicate. The same interaction methods feel natural in human-computer interaction too.

Bill Buxton, a principal researcher at Microsoft, once said that the voice user interface (VUI) may be the most natural user interface, especially when driving a car. With your hands on the steering wheel and your eyes on the road ahead, voice becomes the most effective way to deliver information to you. Thanks to advances in technology, we can now talk to and interact with machines.

Speech is a skill that nearly all humans share, so you can assume your users already have it. The challenge for VUI developers is to create conversations, skills, and behaviors, and to train chatbots and virtual assistants to communicate in ways that make them useful.

This challenge is daunting: to understand our intentions, a machine must also connect with and understand the context of the conversation. To sound natural, it should also have a personality, among other things. Below are some tips for creating chatbots and virtual assistants.

Context

As humans, we use context so naturally that we don't even have to think about it. We know how to talk to different people and in different places. We use different tones and different ways of speaking to our children, parents, friends, and coworkers. We can be loud and forthright at home, but in public, we have to keep up appearances and be mindful of our tone and words.

Chatbots and virtual assistants have no such situational knowledge or awareness, which is why we need to give them “context”. Context includes basic data such as the user’s query history and previous answers, information obtained with the user’s authorization, and information the user has stated during the conversation. Don’t ask about things that you (the robot) already know, and don’t keep giving novice guidance to experienced users.
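The two rules in this paragraph (don't re-ask what is already known, don't keep coaching experienced users) can be sketched as a small prompt selector. This is only an illustration: `UserContext`, `next_prompt`, the slot names, and the `novice_threshold` value are all hypothetical and do not come from the original post.

```python
from dataclasses import dataclass, field


@dataclass
class UserContext:
    """What the bot already knows about the user (illustrative fields only)."""
    known_slots: dict = field(default_factory=dict)  # e.g. {"city": "Berlin"}
    completed_sessions: int = 0                      # rough proxy for experience


def next_prompt(ctx, required_slots, novice_threshold=3):
    """Pick the next bot utterance, skipping anything the bot already knows."""
    missing = [slot for slot in required_slots if slot not in ctx.known_slots]
    if not missing:
        return "Got everything I need."
    question = f"What is your {missing[0]}?"
    # Only newcomers get the extra guidance; experienced users get the bare question.
    if ctx.completed_sessions < novice_threshold:
        return "Tip: you can answer in your own words. " + question
    return question
```

With a returning user whose city is already stored, the bot moves straight to the next missing slot and leaves out the tip.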

Personality

When a chatbot or virtual assistant has a personality, it sounds natural. For example, Alexa is funny and has opinions on all sorts of things, and its opinions and preferences even vary by country: asking it what kind of beer it likes gets a different answer in the United States than in Germany. When Amazon's developers started creating Alexa, they simply wanted it not to sound like an emotionless machine; they did not expect so many people to fall in love with its personality. That personality also increases users' trust in the interaction. This principle limits, to a certain extent, the ability to generate responses automatically, but it is crucial to the user experience.

Concise

Short responses reduce cognitive load, save time, and sound more natural. Trim the text down to the information that really matters, and omit facts and instructions the user already knows. If your robot has a display, you can also move some information to the screen, where it can be summarized or collapsed.

Flexibility

Expect users to change information at any point in the conversation, and to answer your questions in many different ways.
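As a rough sketch of both halves of this tip, the snippet below accepts several ways of saying yes or no and lets a later answer overwrite an earlier one. The word lists and the function names `interpret_confirmation` and `update_slots` are assumptions made for illustration; a real system would more likely rely on a trained language-understanding model than on fixed word sets.

```python
# Hypothetical, deliberately tiny vocabularies for illustration.
YES_WORDS = {"yes", "yeah", "yep", "sure", "ok", "okay"}
NO_WORDS = {"no", "nope", "nah"}


def interpret_confirmation(utterance):
    """Map varied phrasings to True / False, or None when the bot is unsure."""
    text = utterance.strip().lower().rstrip(".!")
    if text in YES_WORDS:
        return True
    if text in NO_WORDS:
        return False
    return None  # unknown phrasing: ask again instead of guessing


def update_slots(slots, new_values):
    """Return a copy of the slots in which the user's latest values win."""
    merged = dict(slots)
    merged.update(new_values)
    return merged
```

Returning `None` for an unrecognized answer keeps the bot from silently treating "maybe" as a yes or a no.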

Natural

The robot's speech must sound natural, so avoid repetition and stiff, formal language. Where possible, use implicit confirmation and active listening techniques to reflect back to users the key information you have captured. And don't forget polite expressions such as goodbye, thank you, and please.
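To make "implicit confirmation" concrete, here is a minimal sketch contrasting it with explicit confirmation. The function names and wording are illustrative assumptions, not taken from the original post.

```python
def explicit_confirmation(city):
    """Explicit: stop and ask the user to verify before moving on."""
    return f"You said {city}. Is that correct?"


def implicit_confirmation(city, next_question):
    """Implicit: echo what was understood while moving the dialogue forward."""
    return f"Okay, {city}. {next_question}"
```

The implicit form still gives the user a chance to correct a misunderstanding ("No, I said Bern") without spending an extra turn on a yes/no question.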

Proactive

To keep users from getting stuck not knowing what to do next, keep track of the dialogue and always offer a way forward: move the conversation along with a question or a guiding prompt, provide buttons for the relevant actions, and so on.

Although conversation design is limited by the current state of technology, these techniques can help make conversations more relaxed and natural. This field is still relatively new, and we all learn through constant trial and error, so don't be afraid to make mistakes.

Remember, good conversations are natural conversations.

That is the end of the translation.

Other

Next, I would like to talk about a few more points.

About robot personality

When we talk to robots, four types of conversation are generally involved: open-domain chat, task-driven conversation, question answering (FAQ), and recommendation.

In practice, though, these conversation types are often built by different teams, each giving the robot different capabilities, so the conversation feels noticeably unnatural to the user. Think of chatting with a friend online: if your friend's boyfriend or girlfriend suddenly takes over the typing, you can usually tell.

About flexibility

Human language is extremely flexible and rich. The same word can mean different things in different contexts, or even in different tones of voice, which makes it much harder for robots to understand human language.

About the scenario

Humans have different requirements for the same function in different scenarios. Here is an example from my recent experience with my Tmall Genie.

Sometimes I go to bed late, around 2 a.m., and say "Tmall Genie, set an alarm for 8 a.m.". The Tmall Genie answers at whatever volume it was last set to. If I had been happily listening to loud music during the day, its loud voice startles me (this has happened many times).

So I turn the volume down each time. Yet in the morning, at about 7:59, Tmall Genie suddenly announces "Your alarm is about to go off" at normal volume, even though I had already lowered it, and I am jolted awake by that sentence...

So for a function like setting an alarm, we hope it can be more natural, considerate and smarter in different scenarios.
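One possible reading of "smarter in different scenarios" is that spoken replies are capped at a low volume during night hours and never exceed what the user has already set. The sketch below only illustrates that idea; the quiet-hours window, the cap of 20, and the name `response_volume` are assumptions, and nothing here describes how Tmall Genie actually works.

```python
from datetime import time


def response_volume(now, user_volume,
                    quiet_start=time(22, 0), quiet_end=time(8, 0), quiet_cap=20):
    """Return the volume (0-100) for spoken replies at clock time `now`.

    Assumes the quiet window wraps past midnight (quiet_start is later in the
    day than quiet_end), e.g. 22:00 to 08:00.
    """
    in_quiet_hours = now >= quiet_start or now < quiet_end
    if in_quiet_hours:
        # Never louder than the cap, and never louder than the user's own setting.
        return min(user_volume, quiet_cap)
    return user_volume
```

Under these assumptions, a request at 2 a.m. with the volume left at 70 from the daytime would be answered at 20, and the 7:59 announcement would respect a volume the user had already lowered to 10.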

Well, that’s all for today.
