In just a few minutes, this AI agent can learn human expert behavior

It takes only a few minutes for the agent to successfully imitate expert behavior and remember what it has learned. The AI agent, developed by Google DeepMind, was described in a journal of the Nature family.

Reportedly, in 3D simulation the agent can imitate experts in real time on tasks it is seeing for the first time, and can reliably acquire knowledge from human partners from a third-person perspective.

Although the agent has never seen a human before, it can quickly learn from both human and AI experts across a variety of challenging navigation problems. For example, it can navigate complex terrain with a large number of obstacles.

The related research paper, titled “Learning few-shot imitation as cultural transmission”, has been published in Nature Communications, a journal in the Nature portfolio.

The research team believes that the results of this study are a proof of concept for the rapid transmission of knowledge to embodied AI and a first step towards the evolution of an open culture of human-AI interaction.

AI practitioners can also draw inspiration from human social learning to build embodied AI agents that adapt to their current human partners while properly protecting privacy. In addition, AI agents with social learning capabilities may provide new modeling tools for studying the development of human cultural capabilities.

Real-time cultural transmission capabilities

Cultural transmission is a general-purpose social skill that enables individuals to acquire and use information from one another in real time, with high fidelity and recall. In human society, cultural evolution enables skills, tools, and knowledge to be passed down from generation to generation, accumulating and improving along the way.

In this work, the research team successfully produced an AI agent with real-time cultural transmission capabilities by applying an agent-environment co-adaptation method.

To achieve this goal, they introduced GoalCycle3D, a virtual 3D task space in which each task contains procedurally generated terrain, obstacles, and goal spheres.

In each task, the AI agent needs to visit the goal spheres in a specific cyclic order to earn reward, and this order is randomly determined at the beginning of the task. The agent does not know the correct order, so it must figure it out either through trial and error or by learning from an expert. The task space is deliberately complex, and the difficulty of navigation can be changed by adjusting parameters such as the size of the world, the number of obstacles, the ruggedness of the terrain, and the number of goal spheres.
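The task rule described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the GoalCycle3D implementation: the class name GoalCycleTask, the reward values (1 for the next goal in the hidden cycle, 0 otherwise), and the rule that any goal may start the cycle are all hypothetical choices made for illustration.

```python
import random


class GoalCycleTask:
    """Toy stand-in for a task with a hidden cyclic goal order.

    Assumption: rewards are 1 (correct next goal) or 0 (anything else).
    The article does not specify the actual reward values.
    """

    def __init__(self, num_goals, seed=None):
        rng = random.Random(seed)
        # Hidden cyclic order, drawn at random once per task.
        self.order = list(range(num_goals))
        rng.shuffle(self.order)
        # Index (within self.order) of the last correctly visited goal.
        self.position = None

    def visit(self, goal):
        """Return the reward for visiting `goal`."""
        idx = self.order.index(goal)
        if self.position is None:
            # Assumption: any goal may start the cycle, with no reward.
            self.position = idx
            return 0
        expected = (self.position + 1) % len(self.order)
        if idx == expected:
            self.position = idx
            return 1
        return 0


def follow_expert(task, laps):
    """An 'expert' that knows the hidden order and walks it `laps` times."""
    return sum(task.visit(goal) for goal in task.order * laps)
```

An agent without access to the order would have to discover it by trying goals and observing rewards, whereas an agent that copies an expert such as follow_expert collects reward on almost every visit.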

Through carefully designed experiments, the researchers found that cultural transmission emerges in AI agents from a minimal sufficient set of training ingredients, which they named MEDAL-ADR: function approximation, memory, an expert co-player, expert dropout, an attentional bias towards the expert, and automatic domain randomization.

Figure|MEDAL-ADR elements

Memory is implemented with an LSTM network, the expert co-players are hard-coded bots, and automatic domain randomization helps train AI agents that behave well across a wide variety of tasks.
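The general idea behind automatic domain randomization can be sketched as follows. This is a simplified illustration, not the procedure used in the paper: the class ParamRange, the thresholds 0.8 and 0.4, and the rule of moving only the upper bound are assumptions made for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class ParamRange:
    """Sampling range for one task parameter (e.g. number of obstacles)."""
    low: float
    high: float
    step: float       # how far the upper bound moves per update
    max_high: float   # hard limit on the upper bound

    def sample(self, rng):
        """Draw a parameter value for the next training task."""
        return rng.uniform(self.low, self.high)

    def update(self, score, expand_above=0.8, shrink_below=0.4):
        """Widen the range when the agent does well, narrow it when it struggles.

        Assumption: `score` is a normalized performance measure in [0, 1];
        both thresholds are hypothetical values.
        """
        if score > expand_above:
            self.high = min(self.high + self.step, self.max_high)
        elif score < shrink_below:
            self.high = max(self.high - self.step, self.low)


# Usage: the task distribution grows harder only as the agent improves.
rng = random.Random(0)
obstacles = ParamRange(low=1.0, high=2.0, step=0.5, max_high=3.0)
for score in (0.9, 0.9, 0.3):
    value = obstacles.sample(rng)
    obstacles.update(score)
```

The design intent is to keep training tasks neither too easy nor too hard, so that the difficulty of the sampled worlds tracks the agent's current ability.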

The clever combination of these components forms a powerful AI agent whose cultural transmission capabilities excel in three aspects: recall, generalization, and fidelity.

Recall assesses the agent’s ability to reproduce a demonstration after the expert is no longer present, generalization measures whether the agent can perform cultural transmission on tasks it has not seen before, and fidelity measures how consistent the agent’s choices are with those of the expert demonstrator.

Most strikingly, the neurons in the "brain" of this AI agent turned out to be highly interpretable, with specific neurons responsible for encoding social information and goal states. This approach not only enables the AI agent to generalize beyond the training distribution, but also lets it recall demonstrations within the same episode after the expert leaves. That opens up more possibilities for practical applications, especially when human data is difficult to collect, tasks vary, and privacy is critical.
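As a rough illustration of the fidelity idea, one could compare the sequence of goals chosen by the agent with the sequence chosen by the expert. The function below is a hypothetical simplification, not the metric defined in the paper: it assumes both sequences are aligned step by step and simply reports the fraction of expert choices that the agent matched.

```python
def fidelity(agent_choices, expert_choices):
    """Fraction of the expert's choices that the agent matched, step by step.

    Assumption: choices are aligned by position; steps the agent never
    reached count as mismatches.
    """
    if not expert_choices:
        raise ValueError("expert_choices must not be empty")
    # zip stops at the shorter sequence, so missing agent steps add no matches.
    matches = sum(a == e for a, e in zip(agent_choices, expert_choices))
    return matches / len(expert_choices)
```

For instance, an agent that visits goals 0, 1, 2, 3 while the expert visits 0, 1, 3, 2 agrees with the expert on the first two steps only, so this toy score is 0.5.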

Some limitations

Although the MEDAL-ADR method proposed in this study enables AI agents to adapt to diverse cultural environments through open-ended learning, the research team also pointed out some limitations in the evaluation method.

First, the study did not test cultural transmission from multiple individuals, relying instead on a single human participant from within the research team. Therefore, the study cannot make statistically significant claims about robustness across a population.

Second, navigation tasks limit the diversity of plausible human behaviors. Gaining a deeper understanding of general cultural transmission will require research on tasks that admit a wider and deeper range of strategies.

Finally, the researchers did not clearly distinguish between the trained agent memorizing the geographical path and memorizing the correct order of the spheres.

Does MEDAL-ADR apply more generally outside the GoalCycle3D task space? The answer is probably a qualified "no".

GoalCycle3D is a large, procedurally generated task space that serves as a navigational proxy for a broader class of tasks requiring repeated sequences of strategic choices, such as cooking, navigation, and problem solving.

However, the method depends on certain environmental conditions, including visibility of the expert, expert dropout, and procedural generation. If these conditions cannot be approximated in a given environment, the method cannot be applied.

In addition, the researchers do not believe that the MEDAL-ADR method is a direct model of the development of human cultural transmission. However, they encourage future researchers to conduct more experimental demonstrations, such as comparing the MEDAL-ADR model with the behavior of children or non-human animals at different stages, and studying the cultural accumulation of humans and AI in laboratory settings. Such empirical research is expected to deepen the understanding of issues related to cultural transmission, meta-learning, and open-ended learning.

The research team said they look forward to future interdisciplinary exchanges in the fields of AI and cultural evolutionary psychology.

Paper link:

https://www.nature.com/articles/s41467-023-42875-2

Author: Yan Yimi

Editor: Academic
