Written by | Ma Xuewei

Preface

Robots can now play table tennis, and they have reached the level of an intermediate human player! Without further ado, let's see how it handles human novices.

According to reports, the robot was built by the Google DeepMind research team and won 45% (13 of 29) of its matches against human players. Notably, every human player was an opponent the robot had never faced before. While the robot lost all of its matches against the top players, it won 100% of its matches against beginners and 55% against intermediate players.

Photo | Playing table tennis with a professional coach.

In response, professional table tennis coach Barney J. Reed said, "It's amazing to watch the robot compete with players of all levels and styles. Our goal is to get the robot to an intermediate level. I think this robot even exceeded my expectations."

The related research paper, titled "Achieving Human Level Competitive Robot Table Tennis", has been published on the preprint website arXiv.

How to make a robot play table tennis?

Table tennis is currently one of the highlights of the Paris Olympics. In competition, table tennis players demonstrate extremely high physical fitness, high-speed movement, precise control over many kinds of shots, and superhuman reflexes. That is why researchers have used table tennis as a benchmark for robots since the 1980s, developing many table tennis robots and making progress on key aspects of the game such as returning the ball to the opponent's half of the table, hitting a target, smashing, and cooperative rallying. However, no robot had yet played a full table tennis match against a previously unseen human opponent.

In this study, through techniques such as a hierarchical and modular policy architecture, iterative definition of the task distribution, a sim-to-sim adaptation layer, domain randomization, real-time adaptation to unknown opponents, and hardware deployment, the Google DeepMind team achieved amateur human-level performance in competitive table tennis between a robot and human players.

Figure | Overview of the method.

1. Hierarchical and modular policy architecture based on a skill library

Low-Level Controllers (LLCs): This library contains various table tennis skills, such as forehand attack, backhand positioning, and forehand serve. Each LLC is an independent policy that focuses on a specific skill. The LLCs are neural-network policies trained in simulation with the MuJoCo physics engine.

Figure | LLC training library.

High-Level Controller (HLC): The HLC is responsible for selecting the most appropriate LLC based on the current game situation and the opponent's capabilities. It consists of the following modules:

Style selection policy: chooses forehand or backhand depending on the type of incoming ball (serve or attack).
Spin classifier: determines whether the incoming ball carries topspin or backspin.
LLC skill descriptors: record each LLC's performance metrics under different ball conditions, such as hit rate and ball placement.
Strategy selection module: generates a shortlist of candidate LLCs based on the skill descriptors, match statistics, and the opponent's capabilities.
LLC preferences (H-values): uses the gradient bandit algorithm to learn a preference value for each LLC online and selects the final LLC based on these values; a minimal sketch of this update follows the list below.
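To make the H-value mechanism concrete, here is a minimal sketch of a gradient-bandit preference learner over a shortlist of LLCs. It illustrates the standard gradient bandit update, not DeepMind's actual code; the class name, the candidate shortlist, and the +1/-1 per-point reward are assumptions made for the example.

```python
import numpy as np

class LLCPreferenceBandit:
    """Gradient-bandit learner over a shortlist of LLC skills (illustrative)."""

    def __init__(self, num_llcs, step_size=0.1):
        self.h = np.zeros(num_llcs)   # one preference (H) value per LLC
        self.baseline = 0.0           # running mean of observed rewards
        self.n = 0
        self.step_size = step_size

    def _softmax(self, candidate_ids):
        prefs = self.h[candidate_ids]
        p = np.exp(prefs - prefs.max())
        return p / p.sum()

    def select(self, candidate_ids):
        """Sample one LLC from the candidate shortlist via softmax over H-values."""
        probs = self._softmax(candidate_ids)
        i = np.random.choice(len(candidate_ids), p=probs)
        return candidate_ids[i]

    def update(self, candidate_ids, chosen_id, reward):
        """After the point is played, reinforce or penalize the chosen LLC
        relative to the running reward baseline (standard gradient-bandit rule)."""
        self.n += 1
        self.baseline += (reward - self.baseline) / self.n
        probs = self._softmax(candidate_ids)
        for p, llc_id in zip(probs, candidate_ids):
            grad = (1.0 if llc_id == chosen_id else 0.0) - p
            self.h[llc_id] += self.step_size * (reward - self.baseline) * grad


# Hypothetical usage: reward is +1 if the robot won the point with this LLC, -1 otherwise.
bandit = LLCPreferenceBandit(num_llcs=17)
shortlist = [2, 5, 9]          # produced by the strategy-selection module
llc = bandit.select(shortlist)
bandit.update(shortlist, llc, reward=+1.0)
```

Updating after every point with such a rule gradually shifts probability toward the skills that are actually working against the current opponent, which is what lets the HLC adapt online.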
Figure | Once the opponent hits the ball, the HLC first applies the style policy to the current ball state to decide between forehand and backhand (forehand is chosen in this example), and then determines which LLC should return the ball.

2. Techniques for zero-shot simulation-to-real transfer

Iterative definition of the task distribution: Initial ball-state data is collected from human-versus-human play, and the LLCs and HLC are trained on it in simulation. Ball-state data gathered when the trained policies play in the real world is then added to the dataset, and the process is repeated to gradually refine the training task distribution.

Sim-to-sim adaptation layer: Because topspin and backspin balls require different dynamics-model parameters in the simulation environment, the paper proposes two remedies: adjusting the LLC training data to account for spin, and a sim-to-sim adaptation layer that uses FiLM layers to learn the mapping between the topspin and backspin regimes.

Domain randomization: During training, parameters such as observation noise, latency, table and racket damping, and friction are randomized in the simulated environment to mimic the uncertainty of the real world.

Figure | Zero-shot simulation-to-real transfer.

3. Adapting to unknown opponents in real time

Real-time tracking of match statistics: The HLC tracks match statistics in real time, such as the points won and errors made by the robot and its opponent, and adjusts the LLC preference values based on this data to adapt to the opponent.

Online learning of LLC preferences: Using the gradient bandit algorithm, the HLC learns a preference value for each LLC online and selects LLCs better suited to exploiting the opponent's weaknesses.

Figure | Hierarchical control.

The research team collected a small amount of human-versus-human play data to initialize the task conditions. They then used reinforcement learning (RL) to train the agent in simulation and applied a variety of techniques to deploy the policy zero-shot on real hardware. The agent played against human players to generate more training task conditions, and the training-deployment cycle was repeated. As the robot improved, the conditions it trained on became progressively more challenging while remaining grounded in real-world play. This hybrid sim-real cycle created an automated task curriculum that allowed the robot's skills to improve over time; a sketch of the loop follows.
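As a rough illustration of that loop, the sketch below combines per-episode domain randomization with the alternating train-in-simulation / deploy-on-robot cycle. The function names (`train_in_sim`, `play_on_robot`), the randomized parameters, and their ranges are placeholders for the example, not the paper's implementation.

```python
import random

def randomize_sim_params():
    """Sample one domain-randomized simulator configuration. The parameters
    and ranges are illustrative; the paper randomizes observation noise,
    latency, table/racket damping, friction, and similar quantities."""
    return {
        "obs_noise_std_m": random.uniform(0.0, 0.005),
        "latency_ms": random.uniform(0.0, 40.0),
        "table_damping_scale": random.uniform(0.8, 1.2),
        "racket_friction_scale": random.uniform(0.8, 1.2),
    }

def hybrid_sim_real_curriculum(initial_ball_states, train_in_sim, play_on_robot,
                               num_cycles=5):
    """Alternate simulated RL training with zero-shot hardware deployment.
    `train_in_sim(task_distribution, sample_params)` and `play_on_robot()`
    stand in for the heavy components and are assumptions of this sketch."""
    task_distribution = list(initial_ball_states)
    for _ in range(num_cycles):
        # 1. Train the LLCs/HLC in simulation on ball states drawn from real
        #    play, resampling randomized sim parameters every episode.
        train_in_sim(task_distribution, randomize_sim_params)
        # 2. Deploy zero-shot against human opponents and record the ball
        #    states the current policy actually faces.
        new_states = play_on_robot()
        # 3. Fold them back into the dataset so the task distribution (and
        #    its difficulty) grows as the robot improves.
        task_distribution.extend(new_states)
    return task_distribution
```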
How was the fight?

To evaluate the agent's skill level, the robot played competitive matches against 29 table tennis players of varying ability (beginner, intermediate, advanced, and advanced+), as assessed by a professional table tennis coach.

Against all opponents, the robot won 45% of its matches and 46% of its games. Broken down by skill level, it won all of its matches against beginners, lost all of its matches against advanced and advanced+ players, and won 55% of its matches against intermediate players. This strongly suggests that the agent has reached the level of an intermediate amateur human player.

Figure | Against all opponents, the robot won 45% of matches and 46% of games, winning 100% of matches against beginners and 55% of matches against intermediate players.

Study participants enjoyed playing with the robot, rating it highly for being "fun" and "engaging." These ratings were consistent across skill levels and regardless of whether participants won or lost. They also overwhelmingly responded that they would "definitely" play with the robot again. When given free time to play with the robot, they played for an average of 4 minutes and 6 seconds out of the 5 minutes available. Advanced players were able to exploit weaknesses in the robot's strategy, but they still enjoyed playing with it; in post-match interviews they described it as a more dynamic practice partner than a ball-serving machine.

Figure | Participants enjoyed playing with the robot and rated it highly for being "fun" and "engaging."

Shortcomings and Prospects

The research team noted that this robot learning system still has limitations, such as a limited ability to respond to very fast and very low balls, low spin-detection accuracy, and a lack of multi-ball strategy and tactics. Future research directions include improving the robot's ability to handle a wider variety of balls, learning more complex strategies, and improving the motion capture technology.

The team also stated that the hierarchical policy architecture and zero-shot sim-to-real transfer method proposed in this study can be applied to other robot learning tasks, and that real-time adaptation techniques can help robots better cope with changing environments and tasks. Finally, sound system design principles are crucial for building high-performance, robust robot learning systems.