He studies large models at Westlake University, hoping to make AI understand human happiness and sadness

Two "iceberg" diagrams of conversations between men and women are circulating online. In one, what the woman says is only the tip of the iceberg of her thoughts, and the man grasps only that small visible part. In the other, what the man says is just the tip of the iceberg, yet the woman imagines undercurrents beneath the water.

Like many blunt, practical-minded men, the reserved Lan Zhenzhong often finds himself caught in the gap between how men and women think. When talking with his wife, his instinct is to help her solve the problem, but the results are often not ideal. "Later I realized that most of the time, all she needs is someone to listen and empathize. She is perfectly capable of solving the problem herself."

Can AI help people empathize and communicate more effectively?

While studying for his doctorate at Carnegie Mellon, Lan Zhenzhong began thinking about how to make AI more human. His wife, a fellow student at Carnegie Mellon, was also a top student. "She did better than me in school."

After graduation, he joined Google, where he witnessed and took part in wave after wave of AI development. Yet he never forgot the idea he had while working on his doctorate.

He firmly believes that AI can have both IQ and EQ, like the AI assistant Samantha in the Hollywood film "Her" or the robot Baymax in "Big Hero 6".

Baymax ("Dabai" in Chinese) is a warm-hearted robot

A large model that can understand the meaning behind words

Westlake Xinchen's office is in Yunchuang Gallium Valley, less than 500 meters from the Yungu Campus of Westlake University. The site also serves as the pilot park of the technology-transfer base of Westlake University and Westlake Laboratory.

After returning to China from Google, Lan Zhenzhong first joined Westlake University and then founded Westlake Xinchen, where he leads a team of young people with an average age of 25 researching how to make AI better understand human emotions and intentions. The team includes AI engineers from companies such as Google, Meta and Amazon, as well as talented students of psychological counseling from New York University, Emory University and Capital Medical University.

The general-purpose end-to-end speech model "Xinchen Lingo", first unveiled at the 2024 Bund Conference on September 5, is the team's latest achievement. Lan Zhenzhong also won the first Ant InTech Technology Award, a non-profit award from Ant Group that honors young Chinese scholars who have played a key role in advancing research in computer science.

You (playing Black Myth: Wukong): "I've reached the Big-Headed Monk. I've fought this level more than 10 times."

Friend A: "Wow, even the Big-Headed Monk has you stuck? That skill level is truly touching."

You (feeling down): "I'm a little unhappy. I had some friction with my colleagues today."

Friend B: "Oh, friction at work is really unpleasant. What happened? Was it a misunderstanding or a miscommunication?"

The teasing friend A and the sympathetic confidante B are two of the application scenarios shown in a demonstration of Lingo interacting with people in real time.

"Compared with other AI, the end-to-end Lingo can fully simulate human behavior, emotions and reaction patterns, and can be very human-like," Lan Zhenzhong said. Users can interrupt it at any time, or change its character settings (voice, professional role) while talking to it.

What is end-to-end?

Some of the AI voice tools we have used before rely on TTS (text-to-speech), a technology that converts written text into spoken audio. TTS lets machines speak and solves the problem of voice output, but it does not involve intent recognition or dialogue understanding. The advantages of an end-to-end large speech model are ultra-low latency and controllability. It can also pick up information beyond the text, such as emotion, tone and background noise, which helps the model understand spoken content more comprehensively.
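To make the contrast concrete, here is a toy sketch of why a cascaded pipeline (speech recognition, then a text model, then TTS) loses information that an end-to-end model can keep. It is not Lingo's design, which the article does not describe; the `Utterance` fields and both functions are hypothetical. The only point is that in the cascaded path, everything after speech recognition sees nothing but the transcript.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """Hypothetical stand-in for what a microphone captures."""
    transcript: str  # the words themselves
    emotion: str     # e.g. "tearful", carried by tone of voice
    noise: str       # e.g. "street traffic" in the background


def cascaded_context(u: Utterance) -> dict:
    # Speech recognition outputs text only; every later stage
    # (language model, dialogue manager, TTS) sees just this string.
    return {"transcript": u.transcript}


def end_to_end_context(u: Utterance) -> dict:
    # A single model consuming the audio directly can, in principle,
    # condition its reply on everything in the signal, not just the words.
    return {"transcript": u.transcript, "emotion": u.emotion, "noise": u.noise}


if __name__ == "__main__":
    u = Utterance("I'm fine.", emotion="tearful", noise="quiet room")
    print(cascaded_context(u))
    print(end_to_end_context(u))
    # Only the end-to-end context still carries the cue that
    # "I'm fine." was said tearfully.
```

In a real system the extra cues would be learned representations of the audio rather than labeled fields; the dictionaries here only stand in for "what the next stage gets to see".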

"Whether you want it to turn up the volume or imitate a specific timbre, it is relatively easy to control," Lan Zhenzhong explained. The end-to-end speech model integrates several stages, including speech recognition, natural language processing, intent recognition, dialogue management and speech synthesis, covering the complete interaction from speech input to spoken response.

With this underlying capability, smart devices equipped with Lingo can read and respond to the real intention behind a user's words. For example, if a sweeping robot hears "the balcony floor is a bit dirty", it will take the initiative to clean it; if a smart curtain controller hears "the sun is a bit dazzling", it will automatically adjust the blackout curtain.
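The balcony and curtain examples amount to mapping an indirect remark to a device action. The sketch below uses hand-written keyword rules purely as a stand-in: a model like Lingo would infer intent from the utterance itself, and every device name and action string here is hypothetical.

```python
# Hypothetical rules: (words that must all appear, device, action).
# A real speech model infers intent; keyword matching is only a toy stand-in.
RULES = [
    ({"floor", "dirty"}, "sweeping_robot", "clean_area"),
    ({"sun", "dazzling"}, "curtain_controller", "close_blackout_curtain"),
]


def infer_action(utterance: str):
    """Return (device, action) for an indirect remark, or None if no rule fits."""
    words = set(utterance.lower().replace(",", " ").replace(".", " ").split())
    for keywords, device, action in RULES:
        if keywords <= words:  # all rule keywords present in the utterance
            return device, action
    return None


print(infer_action("The balcony floor is a bit dirty."))
# -> ('sweeping_robot', 'clean_area')
print(infer_action("The sun is a bit dazzling."))
# -> ('curtain_controller', 'close_blackout_curtain')
print(infer_action("What time is it?"))
# -> None
```

The brittleness of such rules (a user who says "the floor looks grubby" matches nothing) is exactly why inferring intent with a learned model is attractive.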

From machine vision to natural language processing

Lan Zhenzhong is from Chaozhou, Guangdong Province, and was born in 1986 into a family of teachers. In 2007, while studying software engineering and statistics at Sun Yat-sen University, he began working on artificial intelligence. In 2012, he was admitted to the Language Technologies Institute (LTI) of the School of Computer Science at Carnegie Mellon University, one of the top computer science schools in the United States, where he specialized in computer vision and multimedia analysis. In 2018, he joined Google AI and was responsible for research and development on several computer vision and natural language processing projects. The results have been applied in products such as Google News and Google Assistant...

His résumé alone does not explain why he "switched tracks" from vision to language and came to focus on AI emotional companionship.

For an introvert (an "I" in MBTI terms), socializing is draining. He can neatly sort everyday communication into three categories by purpose: problem-solving, emotional expression and relationship-building. But he also knows he is good only at the first and needs help with the other two.

The more direct reason was that, as he was about to finish his doctorate, he learned that a classmate had taken his own life because of depression.

The incident affected him deeply. If outside help arrived in time, even just psychological companionship and basic services, might people seriously troubled by psychological problems feel a little warmth and beauty in this world again?

In 2020, Lan Zhenzhong left Google and returned to China to join Westlake University as head of its deep learning laboratory and a doctoral supervisor. He wanted to build a conversational robot that could keep people company and assist with psychological counseling anytime and anywhere, and language processing is the core of any conversational system.

In July of the following year, Westlake Xinchen was founded. That same year, Lan Zhenzhong was named one of MIT Technology Review's "Innovators Under 35" for the Asia-Pacific region.

Image source: Westlake University official website

Looking back, Lan Zhenzhong feels "very lucky": at Google in 2018, he happened to witness a shift in the machine learning paradigm, from supervised learning to self-supervised learning. Self-supervised learning needs no manual labeling. Machines learn by reading large amounts of text and images, which greatly improves their ability to understand language and visual content.
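The key idea in self-supervised learning is that the "labels" come from the raw data itself. One common recipe for text, masked-token prediction in the style of BERT, can be sketched as follows. This is a simplified, generic illustration (real BERT training also sometimes swaps in random tokens or leaves chosen tokens unchanged); the article does not say which objective Lan's own work used, and the function name is hypothetical.

```python
import random


def make_masked_examples(sentence: str, mask_rate: float = 0.15, seed: int = 0):
    """Turn raw, unlabeled text into (inputs, targets) for masked-token prediction.

    Some tokens are hidden behind [MASK]; the hidden tokens become the
    targets the model must predict. No human annotation is involved.
    """
    rng = random.Random(seed)
    tokens = sentence.split()
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            inputs.append("[MASK]")
            targets.append(tok)    # the label is taken from the text itself
        else:
            inputs.append(tok)
            targets.append(None)   # nothing to predict at this position
    return inputs, targets


inputs, targets = make_masked_examples(
    "machines can learn language by reading large amounts of text", mask_rate=0.3
)
print(inputs)
print(targets)
```

Because any sentence can be turned into training pairs this way, the amount of usable training data is limited only by how much text is available, not by how much people have labeled.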

I prefer to work in the laboratory to develop technology

Westlake Xinchen's first product is the free psychological counseling platform “Liaohui Xiaotian”.

Lan Zhenzhong and his team consulted psychology experts, psychiatrists and other doctors, and also interviewed patients. By building up a large corpus, studying real psychological counseling cases and adding in-house emotion computing and empathy modules, they enabled Xiaotian to listen and communicate with empathy.

On the company's second anniversary, Westlake Xinchen released a general-purpose multimodal model, the "Xihu Big Model", with long-term memory, emotion perception and proactive chat capabilities. Built on this model, Xiaotian now performs at the level of an intermediate psychological counselor.

A month ago, the AI psychologist "Shiyi Xiaoxi", developed in cooperation with Hangzhou First People's Hospital, was launched. Besides online psychological counseling, it can interpret medical reports using a medical knowledge base.

Jinke Tom Cat, which invested in Westlake Xinchen twice last year, recently used Xinchen Lingo to upgrade its "Talking Tom Cat" into a "Chatting Tom Cat" robot...

The scenarios for technology implementation are constantly expanding.

He is both an entrepreneur and an academic researcher, switching between the two roles every day, but he seems to prefer working in the lab and deepening his technical research. "That is still my main focus. Technology iterates so fast. Looking back, not many works really leave something lasting behind." So he wants to keep working on problems that touch the "essence" and can advance the discipline.

Lan Zhenzhong admires fellow AI scientist He Kaiming, whose work he considers truly "essential". He Kaiming's ResNet is a widely used architecture in computer vision.

In March last year, Lan Zhenzhong posted a "call for heroes" on his WeChat Moments, recruiting a CEO to handle commercializing research results, integrating resources, reading the market and winning customers...

The position is now held by Xingchen, who previously worked at Alibaba Group and was a founding team member of the Lakeside Innovation and Research Center.

This frees Lan Zhenzhong to concentrate more on scientific research. Pinned on his WeChat Moments is a group photo of his family of four, all smiling brightly. For him, family always comes first. In his spare time he loves sports: running, yoga, basketball, swimming... He took up yoga as a student, "which helps me relax and relieve stress."

Dialogue with the “New Youth”

Trying to launch an AI mental health hotline around October

Nine Thousand Light Years: How did you and your team come up with the idea of developing a large speech model? What was the biggest challenge you ran into along the way?

Lan Zhenzhong: We started with text, but soon found it was far from enough. Text loses a lot of information, and in psychological counseling many people prefer talking on the phone to typing. Typing usually means organizing your words in advance, and that process itself can add to the mental burden. When people are tired or emotionally unstable, they are even more eager to vent freely by speaking.

Last year, seven or eight people on the team formed a project group and started training the speech model. How to obtain data, how to keep pre-training stable, how to tune the voice... these were all problems. The hardest part was integrating it with the "brain", that is, turning the text model into a speech model. In addition, Lingo generates content, so we must make sure its interactions are safe and prevent it from saying anything inappropriate.

Nine Thousand Light Years: Besides psychological counseling, where else can Xinchen Lingo be used?

Lan Zhenzhong: It can provide general-purpose basic voice services for many fields, such as everyday sales, education and training, medical consultation, smart device interaction and child companionship. Around October, we will try to launch an AI mental health hotline.

AI is like a smart tree hole

Nine Thousand Light Years: Human emotions are so complex. Can AI have both IQ and EQ and provide enough emotional support?

Lan Zhenzhong: You can think of AI as a tool that can simulate everything in the world, and it has already surpassed humans in many respects. Given enough data, AI can keep learning by imitation without limit.

Many lines in the sitcom "Wulin Wai Zhuan" (My Own Swordsman) carry hidden meanings. We used some of them to test how well large models understand Chinese metaphor, and concluded that they can basically reach human level.

AI also has unique advantages in providing emotional support, such as unlimited patience. Listening takes a lot of energy for humans, but AI can keep people company tirelessly.

Nine Thousand Light Years: Aren't there situations where people need genuine human support, and being offered AI instead would leave them disappointed?

Lan Zhenzhong: It is true that in some situations, emotional exchange between people is irreplaceable. AI is better suited to times when a person wants to be alone, or has troubles that are hard to confide to others. It is like an intelligent "tree hole", a place to pour out your secrets: you can simply unburden yourself or talk with it, and find release and comfort.

There is a complete set of assessment and intervention processes behind Xiaotian

Nine Thousand Light Years: Xiaotian took part in the Future Life Festival of Express two years ago. In that early version, some of its replies still needed guidance from a psychological counselor. Is there still a human behind it now?

Lan Zhenzhong: After several iterations, Xiaotian now runs 100% autonomously. Connected to Lingo, it can also handle voice and phone calls. So far this year, it has served 100,000 registered users. You can find it on Alipay, WeChat and university apps such as those of Zhejiang University and University of Science and Technology.

Nine Thousand Light Years: Among the people who chat with Xiaotian, are there users with serious psychological problems or even suicidal tendencies? How do you handle such cases?

Lan Zhenzhong: Xiaotian is good at psychological companionship and support. It continuously evaluates how the conversation is going and decides the direction of guidance accordingly. When it finds that a visitor may have serious psychological problems or a mental disorder, it recommends that they go to a relevant hospital for diagnosis and treatment. Once it detects suicidal tendencies, it refers the visitor to a suicide intervention hotline. If the person expresses such thoughts repeatedly, a human steps in. We have a complete set of assessment and intervention processes.
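The escalation order described in this answer can be read as a simple decision rule. The sketch below shows only that ordering. The risk flags, the `RiskAssessment` type and the threshold of repeated expressions that triggers human intervention are all hypothetical, since the article does not say how Xiaotian scores a conversation or what "many times" means in practice.

```python
from dataclasses import dataclass

# Hypothetical threshold: the interview only says "many times".
HUMAN_REVIEW_THRESHOLD = 3


@dataclass
class RiskAssessment:
    # Hypothetical risk flags; the article does not describe Xiaotian's scoring.
    serious_disorder_suspected: bool  # may need clinical diagnosis
    suicidal_mentions: int            # times suicidal intent was expressed


def next_step(r: RiskAssessment) -> str:
    """Map one assessment to the escalation steps described in the interview."""
    if r.suicidal_mentions >= HUMAN_REVIEW_THRESHOLD:
        return "human intervention"
    if r.suicidal_mentions > 0:
        return "refer to suicide intervention hotline"
    if r.serious_disorder_suspected:
        return "recommend hospital diagnosis and treatment"
    return "continue supportive conversation"


for case in [RiskAssessment(False, 0), RiskAssessment(True, 0),
             RiskAssessment(True, 1), RiskAssessment(False, 3)]:
    print(case, "->", next_step(case))
```

Checking the most severe condition first means a conversation that meets several criteria always lands on the strongest response.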

Starting a business is a bit like rowing a leaky boat

Nine Thousand Light Years: What insights can you share from your entrepreneurial experience over the past few years?

Lan Zhenzhong: We often say that people naturally seek a sense of order. Without order, they feel uneasy and uncertain. Starting a business is a bit like rowing a leaky boat: only by rowing fast enough can you reach your destination safely.

Studying for a doctorate also comes with uncertainty, and many people manage to overcome it. Starting a business is even harder because you are always "burning money", so entrepreneurs must learn to find certainty in a constantly changing environment.

Nine Thousand Light Years: Can you tell us what your next research focus will be?

Lan Zhenzhong: It’s still about the “brain”: how to accurately capture human emotions, what words to respond with, and so on. In fact, this has always been our focus.
