Oxylabs experts discuss whether reinforcement (machine) learning is overhyped

Suppose you sit down to play chess with a friend. This friend is not a person but a computer program that does not know the rules of the game. It does, however, have one goal: to win.

Because the friend doesn't know the rules, it will start by making random moves. Some of these moves make no sense, and it will be easy for you to win. But let's assume that you enjoy playing chess with this friend so much that you decide to do nothing but play chess for the rest of your life (or even the afterlife, if you believe in it).

This digital friend will eventually win, because it will gradually learn the moves it needs to defeat you. The scenario may seem far-fetched, but it gives a basic picture of how reinforcement learning (RL), a field of machine learning (ML), works.

How smart is reinforcement learning?

Intelligence has many characteristics, including knowledge acquisition, the desire to expand one's knowledge, and intuitive thinking. When chess champion Garry Kasparov lost to an IBM computer called Deep Blue, however, many people began to question whether human intelligence was still superior. Doomsday scenarios depicting a future where robots rule the world not only captured public attention but also entered the mainstream.

However, Deep Blue was no ordinary opponent. Playing chess against this program was like playing against a thousand-year-old man who had spent his entire life playing chess. As a result, Deep Blue was proficient at playing a particular type of chess, but not at other intellectual activities, such as playing an instrument, writing a book, conducting scientific experiments, raising children, or repairing a car.

Oxylabs has absolutely no intention of diminishing the greatness of Deep Blue's achievement. Our point is that, when considering whether computers can surpass human intelligence, we need to take a closer look, starting with a detailed understanding of how RL works.

How reinforcement learning works

As mentioned earlier, RL is a subset of ML concerned with how intelligent agents should act in an environment to maximize their cumulative reward.

In layman's terms, RL agents are trained with a reward-penalty mechanism: they are rewarded for correct actions and penalized for incorrect ones. RL agents do not "think" about the best action in the way a human would. Instead, they try many actions through trial and error and gradually favor the ones that have earned the most reward.
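The reward-penalty mechanism described above can be sketched with tabular Q-learning, one common RL algorithm. This is a minimal illustration, not a method taken from the article: the five-cell corridor, the reward values, and every name in the code (`step`, `train`, `greedy_policy`) are assumptions made up for the example.

```python
import random

N_STATES = 5          # hypothetical corridor with cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # move one cell left or one cell right


def step(state, action):
    """Apply one move and return (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    if nxt == N_STATES - 1:
        return nxt, 1.0, True       # reward for reaching the goal
    return nxt, -0.1, False         # small penalty for every other move


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)                        # try a random move
            else:
                a = max((0, 1), key=lambda i: q[state][i])  # use what was learned
            nxt, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward the reward plus the discounted future value.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q


def greedy_policy(q):
    """Best known move for each non-goal cell."""
    return [ACTIONS[max((0, 1), key=lambda i: row[i])] for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

The agent starts with no knowledge of the corridor, much like the chess-playing friend who does not know the rules. After enough episodes its table of values favors moving right in every cell, because that is the behavior the rewards reinforced.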

Disadvantages of Reinforcement Learning

The main drawback of reinforcement learning is that it requires a great deal of resources to achieve its goals. This is well illustrated by the success of RL in Go, a popular two-player game in which the goal is to surround more territory on the board with your stones than your opponent does.

AlphaGo Master is the computer program that beat top human players at Go. Its success reportedly required a massive investment, including many engineers, the equivalent of thousands of years of Go-playing experience, and an astonishing 256 GPUs and 128,000 CPU cores.

That is a lot of effort just to learn to win a game. It raises the question of whether it is rational to design AI that cannot think intuitively. Shouldn't AI research be modeled on human intelligence?

One view in favor of RL is that one should not expect AI agents to behave like humans, and that their usefulness in solving complex problems is worth further development. On the other hand, one view against RL is that AI research should focus on enabling machines to do things that currently only humans and animals can do. From this perspective, the comparison between AI and human intelligence is appropriate.

Quantum reinforcement learning

There is an emerging field of reinforcement learning that claims to address some of these issues. Quantum reinforcement learning (QRL) has been studied as a way to speed up computation.

QRL is expected to speed up learning primarily by optimizing the exploration (finding a strategy) and exploitation (picking the best strategy) phases. Advocates of quantum computing point to applications such as faster database searches and the factoring of large numbers into primes.

Although QRL is not yet a breakthrough, it is expected to address some of the major challenges facing conventional reinforcement learning.
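The exploration and exploitation trade-off mentioned above can be illustrated with a classical (not quantum) epsilon-greedy strategy on a toy problem with three options. This is only a sketch under stated assumptions: the win rates, the number of pulls, and the function name `run_bandit` are hypothetical and do not come from any QRL system.

```python
import random


def run_bandit(epsilon, win_rates=(0.3, 0.5, 0.8), pulls=5000, seed=1):
    """Epsilon-greedy choice among options with hidden (made-up) win rates."""
    rng = random.Random(seed)
    counts = [0] * len(win_rates)     # how often each option was chosen
    values = [0.0] * len(win_rates)   # running average reward per option
    for _ in range(pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(len(win_rates))                       # explore
        else:
            arm = max(range(len(win_rates)), key=lambda i: values[i])  # exploit
        reward = 1.0 if rng.random() < win_rates[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]   # update average
    return counts, values


if __name__ == "__main__":
    print(run_bandit(epsilon=0.0)[0])   # pure exploitation
    print(run_bandit(epsilon=0.1)[0])   # a little exploration
```

With `epsilon=0.0` the agent never explores, so it keeps choosing the first option it tried and never discovers the better ones. With a small `epsilon` it occasionally samples the alternatives and should end up spending most of its pulls on the option with the highest win rate.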

The Business Case for RL

As mentioned before, I have absolutely no intention of downplaying the importance of RL research and development. In fact, Oxylabs has been developing RL models to optimize resource allocation for web scraping.

Here are some real-world use cases for RL, picked out from a McKinsey report that highlights current use cases across various industries:

• Optimize silicon and chip designs, optimize manufacturing processes, and increase yields in the semiconductor industry.

• Increase yields, optimize logistics to reduce waste and costs, and improve agricultural profitability.

• Accelerate time to market for new systems in the aerospace and defense industries.

• Optimize design processes and increase manufacturing yields in the automotive industry.

• Increase revenue, improve customer experience, and deliver advanced personalization to customers in the financial services sector through real-time trading and pricing strategies.

• Optimize mine design, manage power generation and apply overall logistics scheduling to optimize operations, reduce costs and increase mining production.

• Increase production through real-time monitoring and precision drilling, optimize tanker routes and support predictive maintenance to prevent equipment failure and downtime in the oil and gas industry.

• Accelerate new drug development, optimize research processes, automate production and optimize biological methods in the pharmaceutical industry.

• Optimize supply chains, enable advanced inventory modeling and provide advanced personalization to retail customers.

• Optimize and manage networks and apply customer personalization in the telecommunications industry.

• Optimize routes, network planning, and warehouse operations in transportation and logistics.

• Use next-generation agents to extract data from websites.

Rethinking reinforcement learning

Reinforcement learning may not be powerful enough yet, but it is far from overrated. Moreover, as RL R&D increases, so do the potential use cases in nearly every economic sector.

Large-scale adoption depends on many factors, including optimal algorithm design, configuration of the learning environment, and the availability of computing power.

Oxylabs is dedicated to using AI and ML to optimize web scraping, the process of extracting data from websites to gain specialized insights.

Author: Aleksandras Sulzenko, Product Manager at Oxylabs.io
