Video games are an important testing ground for artificial intelligence (AI) systems. Like the real world, games are rich learning environments with responsive, real-time settings and constantly changing goals. Google DeepMind has a long history in AI and gaming, from its early work on Atari games to AlphaStar, which plays StarCraft II at a human grandmaster level.

Recently, Google DeepMind announced a new milestone: shifting its focus from individual games to general, instructable game-playing AI agents. In a new technical report, it introduced SIMA (Scalable Instructable Multiworld Agent), a generalist AI agent for 3D virtual environments. Google DeepMind worked with game developers to train SIMA on a variety of video games. According to Google DeepMind, this research marks the first time an AI agent has shown that it can understand a broad range of game worlds and follow natural language instructions to carry out tasks within them, as a human might.

The work isn't about achieving high scores. Learning to play even one video game is a technical feat for an AI system, but learning to follow instructions across a variety of game settings could make AI agents more useful in any environment. The research shows how the capabilities of advanced AI models can be translated into useful, real-world actions through a language interface. The team hopes that SIMA and other agent research can use video games as sandboxes to better understand how AI systems may become more helpful.

Learning from Video Games

To expose SIMA to many environments, Google DeepMind established research partnerships with game developers. It worked with eight game studios to train and test SIMA on nine different video games, such as No Man's Sky from Hello Games and Teardown from Tuxedo Labs.
Each game in the SIMA portfolio opens up a new interactive world and a range of skills to learn, from simple navigation and menu use to mining resources, flying a spaceship, or crafting a helmet. Google DeepMind also used four research environments, including a new environment built with Unity called the Construction Lab, in which agents must build sculptures from building blocks. This tests their object manipulation and their intuitive understanding of the physical world.

To link language to gameplay behavior, SIMA learned from human play across these game worlds. In one approach, Google DeepMind recorded pairs of human players, with one player watching and instructing the other. In another, players played freely, then rewatched their sessions and wrote down the instructions that would have led to their in-game actions.

Figure | SIMA consists of pre-trained vision models and a main model that includes a memory and outputs keyboard and mouse actions.

SIMA: Multifunctional AI Agent

SIMA is an AI agent that can perceive and understand a variety of environments, then take actions to achieve an instructed goal. It comprises a model designed for precise image-language mapping and a video model that predicts what will happen next on screen. Google DeepMind fine-tuned these models on training data specific to the 3D settings in the SIMA portfolio.

According to the report, SIMA does not need access to a game's source code, nor does it require a custom application programming interface (API). It needs only two inputs: the images on screen and simple natural language instructions provided by the user. SIMA uses keyboard and mouse outputs to control the game's central character and carry out these instructions. This is the same simple interface that humans use, which means SIMA can potentially interact with any virtual environment.
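The interface described above (screen images and a text instruction in, keyboard and mouse actions out) can be illustrated with a small Python sketch. This is not SIMA's implementation: the class names, the `run_episode` helper, and the key bindings are all hypothetical, and a toy keyword lookup stands in for the learned model.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence, Tuple


@dataclass(frozen=True)
class Observation:
    """The two inputs the article describes: a screen image and an instruction."""
    frame: Sequence[Sequence[int]]  # placeholder for screen pixels
    instruction: str


@dataclass(frozen=True)
class Action:
    """Keyboard and mouse output, the same controls a human player uses."""
    keys: Tuple[str, ...] = ()
    mouse_dx: int = 0
    mouse_dy: int = 0
    click: bool = False


class Policy(Protocol):
    """Anything that maps an observation to an action."""
    def act(self, obs: Observation) -> Action: ...


class KeywordPolicy:
    """Toy stand-in for a learned model; the key bindings are made up."""
    KEYMAP = {"left": ("a",), "right": ("d",), "forward": ("w",), "map": ("m",)}

    def act(self, obs: Observation) -> Action:
        text = obs.instruction.lower()
        for word, keys in self.KEYMAP.items():
            if word in text:
                return Action(keys=keys)
        return Action()  # no matching keyword: do nothing


def run_episode(policy: Policy, frames, instruction: str) -> List[Action]:
    """Ask the policy for one action per screen frame."""
    return [policy.act(Observation(frame, instruction)) for frame in frames]


if __name__ == "__main__":
    dummy_frames = [[[0]], [[0]]]  # two fake 1x1 frames
    for action in run_episode(KeywordPolicy(), dummy_frames, "Turn left"):
        print(action)
```

Because the policy only ever sees pixels and text and only ever emits key presses and mouse movement, nothing in the loop depends on a particular game's internals, which is the property the article highlights.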
The current version of SIMA is evaluated on 600 basic skills, spanning navigation (such as "turn left"), object interaction ("climb a ladder"), and menu use ("open a map"). SIMA has been trained to perform simple tasks that can be completed within about 10 seconds.

Google DeepMind hopes that future agents will be able to handle tasks that require high-level strategic planning and multiple subtasks, such as "find resources and build a camp." This is an important goal for AI in general: large language models (LLMs) have produced powerful systems that can capture knowledge about the world and generate plans, but they currently lack the ability to act on our behalf.

Generalization across games

Google DeepMind found that an agent trained on many games performs better than one that learned to play just one. In evaluations, SIMA agents trained on the nine 3D games in the portfolio significantly outperformed all specialized agents trained on a single game. In addition, an agent trained on all but one game performed, on average, nearly as well on that unseen game as an agent trained specifically on it. This ability to function in entirely new environments highlights SIMA's capacity to generalize beyond its training. It is a promising initial result, but more research is needed before SIMA reaches human-level performance on both seen and unseen games.

The results also showed that SIMA's performance depends on language. In a control test in which the agent received no language training or instructions, it behaved appropriately but aimlessly. For example, the agent might gather resources, a frequent behavior, instead of walking where it was instructed to go.

Figure | Google DeepMind evaluated SIMA's ability to follow instructions on nearly 1,500 unique in-game tasks, some of which were assessed by human judges.
As a baseline, the team used the performance of environment-specific SIMA agents (trained and evaluated to follow instructions in a single environment) and compared it with three types of generalist SIMA agents, each trained across multiple environments.

Advancing AI agent research

Google DeepMind said that SIMA's results show the potential to develop a new wave of generalist, language-driven AI agents. This is early-stage research, and the team looks forward to building on SIMA with more training environments and more capable models.

As SIMA is exposed to more training worlds, Google DeepMind expects it to become more general and versatile. With more advanced models, the team hopes to improve SIMA's understanding of higher-level language instructions and its ability to act on them, so that it can achieve more complex goals.

Ultimately, Google DeepMind's research is building toward more general AI systems and agents that can understand and safely carry out a wide range of tasks to help people online and in the real world.

Original link: https://deepmind.google/discover/blog/sima-generalist-ai-agent-for-3d-virtual-environments/