Written by | Ma Xuewei

Preface

Robots can now play table tennis, and they have reached the level of an intermediate human player! Without further ado, let's see how it handles human novices.

According to reports, the robot was built by the Google DeepMind research team and won 45% (13 of 29) of its matches against human players. Notably, every human player was an opponent the robot had never faced before. While the robot lost all of its matches against the top players, it won 100% of its matches against beginners and 55% against intermediate players.

Photo | Playing table tennis with a professional coach.

In response, professional table tennis coach Barney J. Reed said, "It's amazing to watch the robot compete with players of all levels and styles. Our goal is to get the robot to an intermediate level. I think this robot even exceeded my expectations."

The related research paper, titled "Achieving Human Level Competitive Robot Table Tennis", has been published on the preprint website arXiv.

How to make a robot play table tennis?

Table tennis is currently one of the highlights of the Paris Olympics. In competition, table tennis players demonstrate extremely high physical fitness, high-speed movement, precise control over many kinds of shots, and superhuman reflexes. That is why researchers have used table tennis as a benchmark for robots since the 1980s, developing many table tennis robots and making progress on key aspects of the game such as returning the ball to the opponent's half of the table, hitting a target, smashing, and cooperative rallying. However, no robot had yet played a full table tennis match against a previously unseen human opponent.

In this study, through techniques such as a hierarchical and modular policy architecture, iterative definition of the task distribution, a sim-to-sim adaptation layer, domain randomization, real-time adaptation to unknown opponents, and hardware deployment, the Google DeepMind team achieved amateur human-level performance in competitive table tennis between a robot and human players.

Figure | Overview of the method.

1. Hierarchical and modular policy architecture based on a skill library

Low-Level Controllers (LLCs): This library contains various table tennis skills, such as forehand attack, backhand positioning, and forehand serve. Each LLC is an independent policy that focuses on a specific skill. The LLCs are neural-network policies trained in simulation with the MuJoCo physics engine.

Figure | LLC training library.

High-Level Controller (HLC): The HLC is responsible for selecting the most appropriate LLC based on the current game situation and the opponent's capabilities. It consists of the following modules:

Style selection policy: chooses forehand or backhand depending on the type of incoming ball (serve or attack).
Spin classifier: determines whether the incoming ball carries topspin or backspin.
LLC skill descriptors: record each LLC's performance metrics under different ball conditions, such as hit rate and ball placement.
Strategy selection module: generates a shortlist of candidate LLCs based on the skill descriptors, match statistics, and the opponent's capabilities.
LLC preferences (H-values): uses the gradient bandit algorithm to learn a preference value for each LLC online and selects the final LLC based on these values; a minimal sketch of this update follows the list below.
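To make the H-value mechanism concrete, here is a minimal sketch of a gradient-bandit preference learner over a shortlist of LLCs. It illustrates the standard gradient bandit update, not DeepMind's actual code; the class name, the candidate shortlist, and the +1/-1 per-point reward are assumptions made for the example.

```python
import numpy as np

class LLCPreferenceBandit:
    """Gradient-bandit learner over a shortlist of LLC skills (illustrative)."""

    def __init__(self, num_llcs, step_size=0.1):
        self.h = np.zeros(num_llcs)   # one preference (H) value per LLC
        self.baseline = 0.0           # running mean of observed rewards
        self.n = 0
        self.step_size = step_size

    def _softmax(self, candidate_ids):
        prefs = self.h[candidate_ids]
        p = np.exp(prefs - prefs.max())
        return p / p.sum()

    def select(self, candidate_ids):
        """Sample one LLC from the candidate shortlist via softmax over H-values."""
        probs = self._softmax(candidate_ids)
        i = np.random.choice(len(candidate_ids), p=probs)
        return candidate_ids[i]

    def update(self, candidate_ids, chosen_id, reward):
        """After the point is played, reinforce or penalize the chosen LLC
        relative to the running reward baseline (standard gradient-bandit rule)."""
        self.n += 1
        self.baseline += (reward - self.baseline) / self.n
        probs = self._softmax(candidate_ids)
        for p, llc_id in zip(probs, candidate_ids):
            grad = (1.0 if llc_id == chosen_id else 0.0) - p
            self.h[llc_id] += self.step_size * (reward - self.baseline) * grad


# Hypothetical usage: reward is +1 if the robot won the point with this LLC, -1 otherwise.
bandit = LLCPreferenceBandit(num_llcs=17)
shortlist = [2, 5, 9]          # produced by the strategy-selection module
llc = bandit.select(shortlist)
bandit.update(shortlist, llc, reward=+1.0)
```

Updating after every point with such a rule gradually shifts probability toward the skills that are actually working against the current opponent, which is what lets the HLC adapt online.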
Figure | Once the opponent hits the ball, the HLC first applies the style policy to the current ball state to decide between forehand and backhand (forehand is chosen in this example), and then determines which LLC should return the ball.

2. Techniques for zero-shot simulation-to-real transfer

Iterative definition of the task distribution: Initial ball-state data is collected from human-versus-human play, and the LLCs and HLC are trained on it in simulation. Ball-state data gathered when the trained policies play in the real world is then added to the dataset, and the process is repeated to gradually refine the training task distribution.

Sim-to-sim adaptation layer: Because topspin and backspin balls require different dynamics-model parameters in the simulation environment, the paper proposes two remedies: adjusting the LLC training data to account for spin, and a sim-to-sim adaptation layer that uses FiLM layers to learn the mapping between the topspin and backspin regimes.

Domain randomization: During training, parameters such as observation noise, latency, table and racket damping, and friction are randomized in the simulated environment to mimic the uncertainty of the real world.

Figure | Zero-shot simulation-to-real transfer.

3. Adapting to unknown opponents in real time

Real-time tracking of match statistics: The HLC tracks match statistics in real time, such as the points won and errors made by the robot and its opponent, and adjusts the LLC preference values based on this data to adapt to the opponent.

Online learning of LLC preferences: Using the gradient bandit algorithm, the HLC learns a preference value for each LLC online and selects LLCs better suited to exploiting the opponent's weaknesses.

Figure | Hierarchical control.

The research team collected a small amount of human-versus-human play data to initialize the task conditions. They then used reinforcement learning (RL) to train the agent in simulation and applied a variety of techniques to deploy the policy zero-shot on real hardware. The agent played against human players to generate more training task conditions, and the training-deployment cycle was repeated. As the robot improved, the conditions it trained on became progressively more challenging while remaining grounded in real-world play. This hybrid sim-real cycle created an automated task curriculum that allowed the robot's skills to improve over time; a sketch of the loop follows.
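As a rough illustration of that loop, the sketch below combines per-episode domain randomization with the alternating train-in-simulation / deploy-on-robot cycle. The function names (`train_in_sim`, `play_on_robot`), the randomized parameters, and their ranges are placeholders for the example, not the paper's implementation.

```python
import random

def randomize_sim_params():
    """Sample one domain-randomized simulator configuration. The parameters
    and ranges are illustrative; the paper randomizes observation noise,
    latency, table/racket damping, friction, and similar quantities."""
    return {
        "obs_noise_std_m": random.uniform(0.0, 0.005),
        "latency_ms": random.uniform(0.0, 40.0),
        "table_damping_scale": random.uniform(0.8, 1.2),
        "racket_friction_scale": random.uniform(0.8, 1.2),
    }

def hybrid_sim_real_curriculum(initial_ball_states, train_in_sim, play_on_robot,
                               num_cycles=5):
    """Alternate simulated RL training with zero-shot hardware deployment.
    `train_in_sim(task_distribution, sample_params)` and `play_on_robot()`
    stand in for the heavy components and are assumptions of this sketch."""
    task_distribution = list(initial_ball_states)
    for _ in range(num_cycles):
        # 1. Train the LLCs/HLC in simulation on ball states drawn from real
        #    play, resampling randomized sim parameters every episode.
        train_in_sim(task_distribution, randomize_sim_params)
        # 2. Deploy zero-shot against human opponents and record the ball
        #    states the current policy actually faces.
        new_states = play_on_robot()
        # 3. Fold them back into the dataset so the task distribution (and
        #    its difficulty) grows as the robot improves.
        task_distribution.extend(new_states)
    return task_distribution
```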
How was the fight?

To evaluate the agent's skill level, the robot played competitive matches against 29 table tennis players of varying ability (beginner, intermediate, advanced, and advanced+), as assessed by a professional table tennis coach.

Against all opponents, the robot won 45% of its matches and 46% of its games. Broken down by skill level, it won all of its matches against beginners, lost all of its matches against advanced and advanced+ players, and won 55% of its matches against intermediate players. This strongly suggests that the agent has reached the level of an intermediate amateur human player.

Figure | Against all opponents, the robot won 45% of matches and 46% of games, winning 100% of matches against beginners and 55% of matches against intermediate players.

Study participants enjoyed playing with the robot, rating it highly for being "fun" and "engaging." These ratings were consistent across skill levels and regardless of whether participants won or lost. They also overwhelmingly responded that they would "definitely" play with the robot again. When given free time to play with the robot, they played for an average of 4 minutes and 6 seconds out of the 5 minutes available. Advanced players were able to exploit weaknesses in the robot's strategy, but they still enjoyed playing with it; in post-match interviews they described it as a more dynamic practice partner than a ball-serving machine.

Figure | Participants enjoyed playing with the robot and rated it highly for being "fun" and "engaging."

Shortcomings and Prospects

The research team noted that this robot learning system still has limitations, such as a limited ability to respond to very fast and very low balls, low spin-detection accuracy, and a lack of multi-ball strategy and tactics. Future research directions include improving the robot's ability to handle a wider variety of balls, learning more complex strategies, and improving the motion capture technology.

The team also stated that the hierarchical policy architecture and zero-shot sim-to-real transfer method proposed in this study can be applied to other robot learning tasks, and that real-time adaptation techniques can help robots better cope with changing environments and tasks. Finally, sound system design principles are crucial for building high-performance, robust robot learning systems.