To teach AI agents new skills, such as opening a kitchen cabinet, researchers often use reinforcement learning, a trial-and-error process in which an AI agent is rewarded for taking actions that move it closer to a goal. In most cases, human experts must carefully design a reward function that encourages the AI agent to explore, and they must keep updating that reward function as the agent explores and experiments. This process is time-consuming, inefficient, and very difficult to scale, especially when the task is complex and involves many steps.

Recently, a research team from the Massachusetts Institute of Technology (MIT), Harvard University, and the University of Washington developed a new reinforcement learning method that does not rely on reward functions designed by experts. Instead, it uses crowdsourced feedback from many non-expert users to guide the AI agent toward its learning goal. Unlike other approaches that try to use non-expert feedback and often fail because crowdsourced data is error-prone, the new method enables AI agents to learn more quickly despite the noise. It also supports asynchronous feedback collection, allowing non-expert users around the world to participate in teaching the AI agent.

“One of the most time-consuming and challenging parts of designing an AI agent is setting the reward function,” said Pulkit Agrawal, assistant professor of electrical engineering and computer science at MIT and director of the Improbable AI Lab. “Currently, reward functions are designed primarily by experts, which is difficult to scale if we want robots to learn a variety of tasks. Our research proposes a way to design reward functions through crowdsourcing and to involve non-experts in providing effective feedback, thereby expanding the scope of robot learning.”

In the future, this approach could help robots quickly learn specific tasks in people’s homes without requiring people to demonstrate each task in person; the robot could explore independently, guided by crowdsourced, non-expert feedback.

“In our approach, the reward function does not directly tell the AI agent how to complete the task; instead, it guides the direction in which the agent should explore. So even if human supervision is somewhat inaccurate and noisy, the AI agent can still explore effectively and thus learn better,” explained Marcel Torne, a research assistant at the Improbable AI Lab and one of the lead authors of the paper.

Complete the task even if the answer received is incorrect

One way to collect user feedback for reinforcement learning is to show the user photos of two states reached by the AI agent and ask which state is closer to the goal. For example, imagine a robot whose goal is to open a kitchen cabinet: one photo might show it successfully opening the cabinet, while another might show it opening the microwave instead. The user chooses the photo that represents the better state.

Some earlier approaches have tried to use this kind of crowdsourced binary feedback to optimize the reward function the AI agent uses to learn the task. The problem is that non-experts are prone to making mistakes, and those mistakes can make the reward function so misleading that the agent never reaches its goal.

“In reality, the AI agent will take the reward function too seriously and try to fit it perfectly,” Torne said. “So instead of optimizing the reward function directly, we use it to guide where the robot should explore.”
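To make this comparison-based feedback concrete, here is a minimal sketch in Python of how noisy "which state is closer to the goal?" answers could train a state scorer from pairwise labels. All names, the network architecture, and the stand-in crowd labeler are assumptions for illustration, not the paper's code; in HuGE, a scorer learned this way would only steer exploration rather than serve as the reward the agent optimizes.

```python
import random
import torch
import torch.nn as nn

class StateScorer(nn.Module):
    """Scores a state by how close it appears to be to the task goal."""
    def __init__(self, state_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state).squeeze(-1)

def crowd_label(state_a, state_b) -> int:
    """Stand-in for a human answer: 0 if state_a looks closer to the goal, else 1.
    Real answers arrive asynchronously from a web interface and are often noisy."""
    return random.randint(0, 1)

def update_from_comparison(scorer, optimizer, state_a, state_b, label):
    """Bradley-Terry style comparison loss: the state the human preferred should score higher."""
    logits = torch.stack([scorer(state_a), scorer(state_b)]).unsqueeze(0)  # shape (1, 2)
    loss = nn.functional.cross_entropy(logits, torch.tensor([label]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with random stand-in states (the 8-dimensional state is an arbitrary choice):
scorer = StateScorer(state_dim=8)
opt = torch.optim.Adam(scorer.parameters(), lr=1e-3)
s_a, s_b = torch.randn(8), torch.randn(8)
update_from_comparison(scorer, opt, s_a, s_b, crowd_label(s_a, s_b))
```

The key point the article makes is not the loss itself but what the learned score is used for: fit it directly as a reward and mistaken labels get taken "too seriously"; use it only to pick where to explore and the same mistakes merely slow things down.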
The research team divided the process into two independent parts, each driven by its own algorithm. They named the new reinforcement learning method Human Guided Exploration (HuGE).

On one hand, a goal selector algorithm is continuously updated with crowdsourced human feedback. This feedback is not used as a reward function; it is used to guide the direction of the AI agent's exploration. In short, the guidance provided by non-expert users acts like breadcrumbs scattered along the way, gradually leading the AI agent toward the goal.

On the other hand, the AI agent explores on its own, in a self-supervised process guided by the goal selector. It collects images or videos of the actions it attempts, which are then sent to humans to update the goal selector. This narrows the area the AI agent needs to explore, steering it toward promising regions that are closer to the goal. If there is no feedback, or the feedback is delayed, the agent simply continues learning on its own, albeit more slowly. As a result, feedback can be collected less frequently and asynchronously.

“The exploration process can be autonomous and continuous, because the agent is constantly exploring and learning new things. As it receives more accurate signals, it can explore in a more specific way. They can operate at their own pace,” Torne added.

Because the feedback only gently steers the AI agent's behavior, the agent will eventually learn how to complete the task even if some of the answers provided by users are incorrect.

Faster learning

The research team tested the approach on a range of tasks in both simulated and real-world environments. In simulation, they used HuGE to efficiently learn a series of complex behaviors, such as stacking blocks in a specific order or navigating a maze. In real-world tests, they used HuGE to train a robot arm to draw the letter “U” and to pick up and place objects. The tests drew on data from 109 non-expert users in 13 countries across three continents.

In both the real world and simulation, HuGE enabled AI agents to learn to complete tasks faster than other methods. Moreover, the crowdsourced data from non-experts performed better than synthetic data that was generated and labeled for comparison. For a non-expert user, annotating 30 images or videos took less than two minutes.

“This shows the huge potential of this approach for broader applications,” Torne added.

In a related study presented at a recent conference on robot learning, the team described how they improved HuGE so that an AI agent can not only learn to complete a task but also autonomously reset the environment to continue learning. For example, if the agent learns to open a cupboard, the method can also teach it to close the cupboard again. “Now we can enable it to learn completely autonomously, without human intervention,” he said.

The research team also emphasizes that, in this and other learning methods, it is crucial to ensure that AI agents remain aligned with human values.

In the future, the team plans to extend HuGE so that AI agents can learn from other forms of input, such as natural language and physical interaction with robots. They are also interested in applying the method to training multiple AI agents at the same time.
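As a rough, self-contained illustration of the two-part loop described above, the toy below simulates a 2-D explorer: a "crowd" that answers comparisons incorrectly 25 percent of the time still steers reward-free random exploration toward a goal it never reveals to the explorer. The setting, the numbers, and every name here are assumptions chosen for illustration, not the authors' implementation.

```python
import math
import random

GOAL = (8.0, 8.0)     # known only to the simulated crowd, never to the explorer
NOISE = 0.25          # fraction of crowd answers that are simply wrong

def crowd_prefers(a, b):
    """Simulated non-expert: usually names the point closer to the goal, sometimes errs."""
    closer, farther = (a, b) if math.dist(a, GOAL) < math.dist(b, GOAL) else (b, a)
    return closer if random.random() > NOISE else farther

def explore_from(state, steps=20, step_size=1.0):
    """Self-supervised exploration: a short random walk from the chosen frontier state."""
    x, y = state
    out = []
    for _ in range(steps):
        x += random.uniform(-step_size, step_size)
        y += random.uniform(-step_size, step_size)
        out.append((x, y))
    return out

visited = [(0.0, 0.0)]
frontier = visited[0]                       # where the agent will explore from next
for _ in range(300):
    # Part 2: goal-directed but reward-free exploration around the current frontier.
    visited.extend(explore_from(frontier))
    # Part 1: one asynchronous, possibly wrong comparison updates the "goal selector"
    # (here just a running tournament winner); the agent keeps exploring while waiting.
    challenger = random.choice(visited)
    frontier = crowd_prefers(frontier, challenger)

print("closest approach to the goal:", round(min(math.dist(s, GOAL) for s in visited), 2))
```

The crude running-tournament winner stands in for HuGE's learned goal selector; the property it shares with the real method is that comparisons only decide where to explore from, so a wrong answer wastes some exploration instead of corrupting a reward function that the agent would then try to fit perfectly.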
Reference Links:
https://news.mit.edu/2023/method-uses-crowdsourced-feedback-help-train-robots-1127
https://arxiv.org/pdf/2307.11049.pdf
https://human-guided-exploration.github.io/HuGE/
<<: "Looking up at the sky" through time and space: the past and present of the planetarium
>>: It is the "holy grail" of contemporary medicine and the "hidden weapon" of deep-sea assassins.
In recent years, with the development of domestic...
According to local media reports in the United St...
I believe that every time there is a more intuiti...
Recently, many friends have been discussing Julia...
How much does it cost to be a Jixi agent for a wa...
1. For families with children studying, placing t...
Many users who have used the updated version 6.7....
On September 24, a commemorative event for the 11...
Zbrush course advanced animation full process case...
Many of the earliest Chinese students studying ab...
iResearch predicts that in the next two years, in...
This is a very common scenario. When you take on ...
On December 13, 2018, Baidu Smart Mini Program Op...
On the day JWT was merged with digital marketing ...