To teach AI agents new skills, such as opening a kitchen cabinet, researchers often use reinforcement learning, a trial-and-error process in which an AI agent is rewarded for taking actions that move it closer to a goal. In most cases, human experts must carefully design a reward function that encourages the AI agent to explore, and they must keep updating that reward function as the agent explores and experiments. This process is time-consuming, inefficient, and very difficult to scale, especially when the task is complex and involves many steps.

Recently, a research team from the Massachusetts Institute of Technology (MIT), Harvard University, and the University of Washington developed a new reinforcement learning method that does not rely on reward functions designed by experts. Instead, it uses crowdsourced feedback from many non-expert users to guide the AI agent toward its learning goal. Unlike other approaches that try to use non-expert feedback and often fail because crowdsourced data is error-prone, the new method enables AI agents to learn more quickly despite the noise. It also supports asynchronous feedback collection, allowing non-expert users around the world to participate in teaching the AI agent.

“One of the most time-consuming and challenging parts of designing an AI agent is setting the reward function,” said Pulkit Agrawal, assistant professor of electrical engineering and computer science at MIT and director of the Improbable AI Lab. “Currently, reward functions are designed primarily by experts, which is difficult to scale if we want robots to learn a variety of tasks. Our research proposes a way to design reward functions through crowdsourcing and to involve non-experts in providing effective feedback, thereby expanding the scope of robot learning.”

In the future, this approach could help robots quickly learn specific tasks in people’s homes without requiring people to demonstrate each task in person; the robot could explore independently, guided by crowdsourced, non-expert feedback.

“In our approach, the reward function does not directly tell the AI agent how to complete the task; instead, it guides the direction in which the agent should explore. So even if human supervision is somewhat inaccurate and noisy, the AI agent can still explore effectively and thus learn better,” explained Marcel Torne, a research assistant at the Improbable AI Lab and one of the lead authors of the paper.

Complete the task even if the answer received is incorrect

One way to collect user feedback for reinforcement learning is to show the user photos of two states reached by the AI agent and ask which state is closer to the goal. For example, imagine a robot whose goal is to open a kitchen cabinet: one photo might show it successfully opening the cabinet, while another might show it opening the microwave instead. The user chooses the photo that represents the better state.

Some earlier approaches have tried to use this kind of crowdsourced binary feedback to optimize the reward function the AI agent uses to learn the task. The problem is that non-experts are prone to making mistakes, and those mistakes can make the reward function so misleading that the agent never reaches its goal.

“In reality, the AI agent will take the reward function too seriously and try to fit it perfectly,” Torne said. “So instead of optimizing the reward function directly, we use it to guide where the robot should explore.”
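To make this comparison-based feedback concrete, here is a minimal sketch in Python of how noisy "which state is closer to the goal?" answers could train a state scorer from pairwise labels. All names, the network architecture, and the stand-in crowd labeler are assumptions for illustration, not the paper's code; in HuGE, a scorer learned this way would only steer exploration rather than serve as the reward the agent optimizes.

```python
import random
import torch
import torch.nn as nn

class StateScorer(nn.Module):
    """Scores a state by how close it appears to be to the task goal."""
    def __init__(self, state_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state).squeeze(-1)

def crowd_label(state_a, state_b) -> int:
    """Stand-in for a human answer: 0 if state_a looks closer to the goal, else 1.
    Real answers arrive asynchronously from a web interface and are often noisy."""
    return random.randint(0, 1)

def update_from_comparison(scorer, optimizer, state_a, state_b, label):
    """Bradley-Terry style comparison loss: the state the human preferred should score higher."""
    logits = torch.stack([scorer(state_a), scorer(state_b)]).unsqueeze(0)  # shape (1, 2)
    loss = nn.functional.cross_entropy(logits, torch.tensor([label]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with random stand-in states (the 8-dimensional state is an arbitrary choice):
scorer = StateScorer(state_dim=8)
opt = torch.optim.Adam(scorer.parameters(), lr=1e-3)
s_a, s_b = torch.randn(8), torch.randn(8)
update_from_comparison(scorer, opt, s_a, s_b, crowd_label(s_a, s_b))
```

The key point the article makes is not the loss itself but what the learned score is used for: fit it directly as a reward and mistaken labels get taken "too seriously"; use it only to pick where to explore and the same mistakes merely slow things down.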
The research team divided the process into two independent parts, each driven by its own algorithm. They named the new reinforcement learning method Human Guided Exploration (HuGE).

On one hand, a goal selector algorithm is continuously updated with crowdsourced human feedback. This feedback is not used as a reward function; it is used to guide the direction of the AI agent's exploration. In short, the guidance provided by non-expert users acts like breadcrumbs scattered along the way, gradually leading the AI agent toward the goal.

On the other hand, the AI agent explores on its own, in a self-supervised process guided by the goal selector. It collects images or videos of the actions it attempts, which are then sent to humans to update the goal selector. This narrows the area the AI agent needs to explore, steering it toward promising regions that are closer to the goal. If there is no feedback, or the feedback is delayed, the agent simply continues learning on its own, albeit more slowly. As a result, feedback can be collected less frequently and asynchronously.

“The exploration process can be autonomous and continuous, because the agent is constantly exploring and learning new things. As it receives more accurate signals, it can explore in a more specific way. They can operate at their own pace,” Torne added.

Because the feedback only gently steers the AI agent's behavior, the agent will eventually learn how to complete the task even if some of the answers provided by users are incorrect.

Faster learning

The research team tested the approach on a range of tasks in both simulated and real-world environments. In simulation, they used HuGE to efficiently learn a series of complex behaviors, such as stacking blocks in a specific order or navigating a maze. In real-world tests, they used HuGE to train a robot arm to draw the letter “U” and to pick up and place objects. The tests drew on data from 109 non-expert users in 13 countries across three continents.

In both the real world and simulation, HuGE enabled AI agents to learn to complete tasks faster than other methods. Moreover, the crowdsourced data from non-experts performed better than synthetic data that was generated and labeled for comparison. For a non-expert user, annotating 30 images or videos took less than two minutes.

“This shows the huge potential of this approach for broader applications,” Torne added.

In a related study presented at a recent conference on robot learning, the team described how they improved HuGE so that an AI agent can not only learn to complete a task but also autonomously reset the environment to continue learning. For example, if the agent learns to open a cupboard, the method can also teach it to close the cupboard again. “Now we can enable it to learn completely autonomously, without human intervention,” he said.

The research team also emphasizes that, in this and other learning methods, it is crucial to ensure that AI agents remain aligned with human values.

In the future, the team plans to extend HuGE so that AI agents can learn from other forms of input, such as natural language and physical interaction with robots. They are also interested in applying the method to training multiple AI agents at the same time.
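As a rough, self-contained illustration of the two-part loop described above, the toy below simulates a 2-D explorer: a "crowd" that answers comparisons incorrectly 25 percent of the time still steers reward-free random exploration toward a goal it never reveals to the explorer. The setting, the numbers, and every name here are assumptions chosen for illustration, not the authors' implementation.

```python
import math
import random

GOAL = (8.0, 8.0)     # known only to the simulated crowd, never to the explorer
NOISE = 0.25          # fraction of crowd answers that are simply wrong

def crowd_prefers(a, b):
    """Simulated non-expert: usually names the point closer to the goal, sometimes errs."""
    closer, farther = (a, b) if math.dist(a, GOAL) < math.dist(b, GOAL) else (b, a)
    return closer if random.random() > NOISE else farther

def explore_from(state, steps=20, step_size=1.0):
    """Self-supervised exploration: a short random walk from the chosen frontier state."""
    x, y = state
    out = []
    for _ in range(steps):
        x += random.uniform(-step_size, step_size)
        y += random.uniform(-step_size, step_size)
        out.append((x, y))
    return out

visited = [(0.0, 0.0)]
frontier = visited[0]                       # where the agent will explore from next
for _ in range(300):
    # Part 2: goal-directed but reward-free exploration around the current frontier.
    visited.extend(explore_from(frontier))
    # Part 1: one asynchronous, possibly wrong comparison updates the "goal selector"
    # (here just a running tournament winner); the agent keeps exploring while waiting.
    challenger = random.choice(visited)
    frontier = crowd_prefers(frontier, challenger)

print("closest approach to the goal:", round(min(math.dist(s, GOAL) for s in visited), 2))
```

The crude running-tournament winner stands in for HuGE's learned goal selector; the property it shares with the real method is that comparisons only decide where to explore from, so a wrong answer wastes some exploration instead of corrupting a reward function that the agent would then try to fit perfectly.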
Reference Links:
https://news.mit.edu/2023/method-uses-crowdsourced-feedback-help-train-robots-1127
https://arxiv.org/pdf/2307.11049.pdf
https://human-guided-exploration.github.io/HuGE/
<<: "Looking up at the sky" through time and space: the past and present of the planetarium
>>: It is the "holy grail" of contemporary medicine and the "hidden weapon" of deep-sea assassins.
In recent years, with the development of domestic...
According to local media reports in the United St...
I believe that every time there is a more intuiti...
Recently, many friends have been discussing Julia...
How much does it cost to be a Jixi agent for a wa...
1. For families with children studying, placing t...
Many users who have used the updated version 6.7....
On September 24, a commemorative event for the 11...
Zbrush course advanced animation full process case...
Many of the earliest Chinese students studying ab...
iResearch predicts that in the next two years, in...
This is a very common scenario. When you take on ...
On December 13, 2018, Baidu Smart Mini Program Op...
On the day JWT was merged with digital marketing ...