A brief discussion on the deep learning technology behind AlphaGo


Introduction: There are many commentaries on AlphaGo, but few of their authors have actually communicated with the development team. I would like to thank friends at DeepMind, the team behind AlphaGo, for reading and discussing this article. They pointed out that some wording in an earlier version was not precise enough, so I have made adjustments here. The "global" evaluation mentioned in my earlier version refers to the whole game across points in time, which could easily be mistaken for the whole board at a single point in time, so I have changed it throughout to "overall chess game." In addition, besides the evaluation network learned from offline data, the overall game can also be assessed by comparing, in real time, the value of different strategies computed from the current state (a technique called Rollouts); by caching those computation results, it can also account for the effect on the overall game locally. Thanks again to the DeepMind friends for their corrections.

Today, after humans have lost three games to AlphaGo, is a good time to help everyone better understand the deep learning technology behind it (rather than imagining the panic of Ultron arriving, as in the Avengers). Before explaining AlphaGo's deep learning technology, let me first clear up the most common misunderstandings with a few simple facts:

●The technology AlphaGo uses this time is fundamentally different from Deep Blue's: it no longer relies on brute-force search to defeat humans.

●Yes, AlphaGo is able to grasp more abstract concepts through deep learning, but computers still have no self-awareness or thought.

●AlphaGo does not understand the aesthetics or strategy of Go; it has simply found two beautiful and powerful functions with which to decide its moves.

What are Neural Networks?

In fact, neural network technology is quite old. Warren McCulloch and Walter Pitts first proposed a mathematical model of the neuron in 1943. Then, in 1958, psychologist Frank Rosenblatt proposed the perceptron, adding to the earlier neuron model a mechanism for training and modifying parameters (in other words, learning); at that point, the basic theoretical framework of neural networks was complete. A neuron in a neural network collects signals from the front end (like a nerve cell's dendrites), sums them according to each signal's weight, then passes the result through an activation function and transmits the new signal onward (like the neuron's axon).

In a neural network, these neurons are connected together. We distinguish the input layer (the input variables), the output layer (the variables to be predicted), and the hidden layers in between, which add neurons so the network can model more complex function mappings. Each connection between neurons carries its own weight, used to weight the signal passing through it.
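The neuron just described (a weighted sum of incoming signals pushed through an activation function) can be sketched in a few lines. This is a minimal illustration, not any particular library's API; the weights below are arbitrary numbers chosen only for the demonstration, and a sigmoid is used as the activation function.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of incoming signals (the dendrites), squashed by an
    activation function and passed onward (the axon)."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))   # sigmoid activation

def layer(inputs, weight_matrix, biases):
    """One fully connected layer: every neuron sees every input."""
    return [neuron(inputs, w, b) for w, b in zip(weight_matrix, biases)]

# A tiny 2-input -> 2-hidden -> 1-output network (weights are arbitrary).
x = [0.5, -1.0]
hidden = layer(x, [[0.8, -0.2], [0.4, 0.9]], [0.1, -0.3])
output = layer(hidden, [[1.5, -1.1]], [0.0])
```

The sigmoid keeps every transmitted signal between 0 and 1, mimicking a neuron that either fires weakly or strongly.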

Traditional neural network training assigns weights at random, then iteratively corrects them against the input training data to minimize the overall error rate. With the development of technologies such as backpropagation and unsupervised learning, neural networks became popular for a time, but researchers soon ran into a difficulty: insufficient computing power. With only one hidden layer, the classification accuracy of a neural network is in most cases not much better than traditional statistical logistic regression, yet it consumes far more computation; and as hidden neurons or hidden layers are added, the number of weights to be computed grows sharply. So in the late 1980s, neural network research entered a long winter, and you might only have encountered its modest power in a washing machine (many washing machines use neural networks to estimate the water level and running time from the load of clothes). Frankly, neural networks were not considered powerful at all.
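The "start random, then correct the weights example by example" idea can be illustrated with Rosenblatt's original perceptron rule. This is a sketch of the classic rule, not any modern training procedure; the learning task (the logical AND function) and all constants are chosen only so the example converges quickly.

```python
import random

def train_perceptron(data, epochs=20, lr=0.1):
    """Rosenblatt's rule: weights start random and are nudged whenever
    a training example is misclassified, shrinking the overall error."""
    random.seed(0)
    w = [random.uniform(-1, 1) for _ in range(len(data[0][0]))]
    b = random.uniform(-1, 1)
    for _ in range(epochs):
        for x, target in data:
            pred = 1 if sum(xi * wi for xi, wi in zip(x, w)) + b > 0 else 0
            err = target - pred            # zero when already correct
            w = [wi + lr * err * xi for xi, wi in zip(x, w)]
            b += lr * err
    return w, b

# Learn logical AND, a linearly separable problem one neuron can solve.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(data)
preds = [1 if sum(xi * wi for xi, wi in zip(x, w)) + b > 0 else 0
         for x, _ in data]
```

A single neuron suffices here; the computational pain described above begins when many such weights, across many layers, must all be corrected together.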

This cold winter lasted until 2006, when Hinton's team published the paper "A fast learning algorithm for deep belief nets," and there was finally hope of a revival. Their idea was that if a neural network's weights were not assigned randomly, its training time should be greatly shortened; the method they proposed was to use unsupervised learning to produce the network's initial weights. At the time, journals basically treated any paper containing the words "neural network" as garbage and refused to publish it, so they coined the new term "deep learning" to break through. Besides Hinton's efforts, Moore's law also gave us ever faster computation; in 2010, Hinton combined this method with GPU computing to speed up speech recognition by more than 70 times. A new wave of deep learning arrived in 2012, when deep learning entered the ImageNet competition for the first time (1.2 million photos as the training set, 50,000 photos as the test set, with 1,000 categories to predict): the error rate, which had barely moved in the preceding years, dropped from 26% to 15%. A later paper from a Microsoft team showed that, through deep learning, they had reduced the error rate on the ImageNet 2012 dataset to 4.94%, below the human error rate of 5.1%. Last year (2015), Microsoft won the ImageNet championship again with the error rate down to an ultra-low 3.57%, using a 152-layer deep network (I was scared to death when I saw that number)...

Convolutional Neural Network

In image recognition, we are dealing with a two-dimensional neural network structure. For a 100*100-pixel image, the input is actually a vector of 10,000 pixel values (and that is for a grayscale image; a color image gives 30,000). If the hidden layer has as many neurons as the input layer, we would have to compute on the order of 10^8 weights. That number is a headache, and it would probably be hard to handle even with parallel or distributed computing. Convolutional neural networks therefore introduce two very important ideas:

1. Local receptive fields: From a human perspective, when our vision focuses on one corner of an image, pixels far away should not affect what we see there. The idea of the local receptive field is therefore that a neuron only needs to be connected to its neighboring pixels, which greatly reduces the number of neural connections we need to compute. For example, if each neuron only connects to a neighboring 10*10 patch of pixels, the computation drops from 10^8 weights to 100*100*(10*10) = 10^6.

2. Weight sharing: But 10^6 is still a lot, so the second idea is weight sharing. Human vision does not depend on the absolute position of pixels in an image: when the image is translated or shifted, we still understand it. This means the weights learned over one local area (such as a 10*10 convolution kernel) should apply at every position in the photo; in other words, the features learned in that 10*10 window can become a filter applied across the entire image. Weight sharing means the same weights are reused by the 10*10 kernel wherever it is applied. A convolution kernel can be understood as one feature, so multiple kernels can be designed into the network to extract multiple features. The figure below is a schematic of a 3*3 kernel extracting features from a 5*5 photo.
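The figure's setup (a 3*3 kernel sliding over a 5*5 image) can be sketched directly. This is a bare-bones illustration of weight sharing, not an optimized implementation; the image and the edge-detecting kernel values are invented for the example.

```python
def convolve2d(image, kernel):
    """Slide one shared set of 3x3 weights (the kernel) over the image.
    The same 9 weights are reused at every position -- weight sharing."""
    k = len(kernel)
    out_size = len(image) - k + 1          # 5 - 3 + 1 = 3
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(k) for b in range(k))
             for j in range(out_size)]
            for i in range(out_size)]

# A 5x5 image containing a vertical edge, and a kernel that detects it.
image = [[0, 0, 1, 1, 1]] * 5
edge_kernel = [[-1, 0, 1]] * 3             # responds to a left-to-right rise
feature_map = convolve2d(image, edge_kernel)
```

The output (the feature map) is strongest where the edge sits, and only 9 weights were needed for the whole image, instead of one weight per input-pixel-to-neuron connection.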

After the convolution layers find features, these can be fed as input variables into an ordinary neural network for classification. However, as the network structure grows more complex, unless the number of samples is extremely large, over-fitting easily occurs (the network memorizes the structure of the training data instead of finding general rules). The concept of pooling (or subsampling) was therefore introduced: each small n*n area of a feature map is summarized to highlight its most significant feature, which helps avoid over-fitting.
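Pooling's "summarize each small area by its strongest response" can be sketched as 2*2 max pooling; this is an illustrative snippet with a made-up feature map, not a library routine.

```python
def max_pool(feature_map, n=2):
    """Summarize each n x n region by its strongest response, discarding
    exact positions -- this smooths the map and curbs over-fitting."""
    size = len(feature_map)
    return [[max(feature_map[i + a][j + b]
                 for a in range(n) for b in range(n))
             for j in range(0, size - n + 1, n)]
            for i in range(0, size - n + 1, n)]

fmap = [[1, 3, 0, 2],
        [4, 2, 1, 0],
        [0, 1, 5, 6],
        [2, 0, 7, 3]]
pooled = max_pool(fmap)    # a 4x4 feature map becomes a 2x2 summary
```

Each 2*2 block collapses to a single number, so the next layer sees one quarter of the values while the strongest features survive.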

Common image recognition technology (such as for ImageNet) therefore combines multiple stages of convolution layer + pooling layer, finally connected to an ordinary neural network for classification. In the figure below, which shows an image-recognition example, C2, C4, and C6 are convolution layers, while S3 and S5 are pooling layers. Convolutional neural networks solve an abstract problem through a two-dimensional matrix structure: image recognition no longer requires humans to hand-craft image features for the network to learn, as in the past; through the convolutional structure, the network finds features in the data by itself, and the more convolution layers, the higher-level and more abstract the features it can identify. So to train a network to recognize cats or dogs in photos, you no longer need to annotate cat or dog features yourself; just give it a large number of cat and dog photos, and it will find abstract definitions of cat and dog on its own.

At this point, have you noticed the similarity between convolutional networks for image recognition and Go? Yes: the Go board is a 19*19 grid, and the principles for judging a Go move are not as explicit as in chess or checkers; placement depends heavily on intuition. This is where deep learning can play an extremely good role, because programmers do not need to feed these principles into the computer themselves; it can find the corresponding logic and abstract concepts from a large number of game records.

Why is Go difficult?

Why could Deep Blue beat humans at chess but not at Go? Because Deep Blue uses its massive computing power to roll out a tree of future positions and deduce the chances of winning or losing. But note that chess (or Chinese chess) has a branching factor of about 40, so predicting the next 20 plies means examining on the order of 40^20 positions (how big is that? Even a 1 GHz processor would need 3,486,528,500,050,735 years; and note this is still the relatively simple game). Deep Blue therefore uses the MinMax search algorithm with Alpha-Beta pruning to cut down the search: guided by the win rates at the layer above, it searches promising branches more deeply and losing branches less, skipping branches irrelevant to the outcome, brute-forcing its way to the best strategy. Unfortunately, Go's branching factor is about 250. On the 19*19 board there are 361 points to play, and the total number of board configurations is as high as 10^171. Many reports say this exceeds the number of atoms in the universe, based on an old estimate that the universe holds 10^75 atoms. I just laughed at this; I think it also underestimates the size of the universe.
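The arithmetic above is easy to check. The sketch below assumes the processor examines one position per clock cycle and uses 365-day years; the 3^361 figure is only an upper bound on board configurations (each of the 361 points is black, white, or empty), not the count of legal positions.

```python
# Chess-like game: branching factor ~40, looking 20 plies ahead.
positions = 40 ** 20                       # ~1.1e32 positions to examine

# A 1 GHz processor examining one position per cycle, 365-day years.
ops_per_year = 10 ** 9 * 365 * 24 * 3600
years = positions // ops_per_year          # matches the figure in the text

# Go: 361 intersections, each black/white/empty, giving an upper bound
# of 3^361 board configurations (~1.7e172, a 173-digit number).
go_upper_bound_digits = len(str(3 ** 361))
```

The point is the growth rate, not the exact constants: each extra ply multiplies the work by the branching factor, which is why pruning alone cannot tame Go.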

The main mechanism of AlphaGo

In terms of architecture, AlphaGo can be said to have two brains: two independent networks with almost identical structures, the policy network and the evaluation network. Both are essentially 13-layer convolutional neural networks with 5*5 convolution kernels, so they are basically the same as image-recognition networks that take input of fixed width and height, except that the input matrix is replaced by the occupancy state of each coordinate on the board.

The first brain, the "policy network," is at bottom simple supervised learning used to predict the opponent's most likely next move. It is fed a huge number of game records from professional players worldwide in order to predict where the opponent is most likely to play. This network does not need to think about "winning" at all; it only has to predict the opponent's move. AlphaGo currently predicts the opponent's move with 57% accuracy (the figure when the Nature article was published; it must be higher now). You might think AlphaGo's weakness lies in the policy network: on the one hand the accuracy is not high, and on the other, if it meets a line of play it has never seen, might a human have a chance to beat it? Unfortunately not, because the policy network has been enhanced on two levels. The first is a technique called the reinforcement-learning (RL) policy network: AlphaGo first trains a basic version of the policy network on part of the samples and an advanced version on the full samples, then lets the two play each other. The advanced version acts as the "expert" facing the basic version, so the basic network quickly accumulates data on where the expert is likely to play, producing an enhanced version; this enhanced version then becomes the expert for the original advanced version, and cycling through this correction process continuously improves the prediction of the opponent's (the expert's) moves. The second level is that the policy network no longer needs to search the entire 19*19 grid for the most likely move: the improved network can first use its convolution kernels to exclude some areas from calculation, then find the most likely position within what remains. Although this may somewhat weaken the policy network, the mechanism speeds up AlphaGo's calculation by more than 1,000 times. And precisely because AlphaGo is always guessing the opponent's possible moves from the overall situation, human tricks, such as deliberately playing a few odd moves in the hope of disrupting the computer, are actually meaningless.
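The expert/apprentice cycle described above can be caricatured in a few lines. To be clear, this is a toy cartoon under heavy assumptions, not DeepMind's actual training: a "policy" here is just the probability of finding the best move in a one-move game, and `play_game` and `reinforce` are invented stand-ins for self-play and the policy-gradient update.

```python
import random

def play_game(p_a, p_b):
    """Invented stand-in for a Go game: each side independently finds
    the best move with its own probability; the better policy wins more."""
    a = random.random() < p_a
    b = random.random() < p_b
    if a == b:
        return random.choice("AB")      # tie broken at random
    return "A" if a else "B"

def reinforce(apprentice, expert, games=2000, lr=0.05):
    """One cycle of the loop in the text: the apprentice plays the
    current expert and is nudged toward it in proportion to its losses."""
    wins = sum(play_game(apprentice, expert) == "A" for _ in range(games))
    return apprentice + lr * (1 - wins / games) * (expert - apprentice)

random.seed(42)
policy = 0.30                           # basic version (partial samples)
expert = 0.60                           # advanced version (full samples)
for _ in range(10):                     # repeated correction cycles
    policy = reinforce(policy, expert)
    expert = max(expert, policy)        # the improved net becomes the expert
```

Each cycle, losses against the stronger version pull the weaker one upward; in the real system, the improved network genuinely becomes the new expert, so strength keeps compounding.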

The second brain is the evaluation network. Its focus is the "final" win rate of each position under the current situation (this is what I call the overall chess game), rather than short-term captures. In other words, the policy network answers a classification problem (where will the opponent play?), while the evaluation network answers an estimation problem (if I play here, what is my win rate?). The evaluation network does not compute an exact solution, which would consume enormous computing power; it is an approximation network, and through the convolutional structure it computes the average win rate over the range of a kernel (mainly to smooth the evaluation function and avoid over-fitting), leaving the final answer to the Monte Carlo search tree. Of course, the win rate depends on how many moves ahead are predicted: the deeper the prediction, the more complex the calculation, and AlphaGo can currently judge how many predicted moves need to be expanded. But how can we ensure that past samples reflect the true win rate, uncontaminated by prior differences in the players' strength (perhaps playing a certain point won not because it should win, but because that player was simply stronger)? This is solved by having two AlphaGos play each other: since their strengths are identical, the final result must depend on the positions played rather than on the players, which is also why the evaluation network is not trained on the world's known game records, human games being affected by the strength of both sides. Through this self-play method, the training samples were only 30 million positions when it played the European champion, rising to 100 million by the match against Lee Sedol. A human game may take several hours, but an AlphaGo self-play game can finish in about a second, so correct evaluation samples accumulate quickly. Thus the evaluation mechanism, mentioned earlier as the greatest difficulty in computer Go, was solved in this way through convolutional neural networks.

The final link in AlphaGo's technology is the Monte Carlo search tree. Compared with the search Deep Blue used (MinMax with Alpha-Beta pruning, not described again here), we do not have unlimited computing power (note that with a finite number of combinations, a Monte Carlo tree could indeed evaluate every combination exhaustively, but there is no way to do this in the Go scenario, and even attempting it would make computing time explode), so the old method cannot be applied. However, the policy network and evaluation network have already narrowed the possible next moves (including the opponent's) down to a controllable range, so AlphaGo can quickly use the Monte Carlo search tree to compute the best solution among the limited combinations. Broadly, the Monte Carlo search tree involves four steps:

1. Selection: First, based on the current situation, choose several likely move patterns for the opponent.

2. Expansion: Based on the opponent's moves, expand toward the move patterns with the highest chance of winning (what we call the first-order Monte Carlo tree). So AlphaGo's search tree does not actually expand all combinations.

3. Evaluation: Evaluate which move is best (where should AlphaGo play?). One way is to feed the position after the move into the evaluation network to estimate the win rate; the second is to run a deeper Monte Carlo rollout (predicting several more moves ahead). The two methods can give completely different results, so AlphaGo integrates them with a mixing coefficient; the coefficient published in Nature is 50%-50% (though I suspect it is not in practice).

4. Backup: After deciding on the best move, AlphaGo uses the policy network to quickly anticipate the opponent's likely reply to that position, along with the corresponding search evaluations. The most terrifying thing about AlphaGo is that while Lee Sedol was thinking about where to play, it had not only already guessed his likely positions, it was using his thinking time to keep calculating its own next move.
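The four steps above can be sketched on a toy game. This is an illustrative skeleton, not AlphaGo's actual algorithm: a simple take-1-or-2 stone game stands in for Go, `value_net` and `rollout` are invented stand-ins for the evaluation network and the Rollouts estimate, and only the 50%-50% mixing from the Nature paper is taken from the text.

```python
import random

# Toy game standing in for Go: a pile of stones, each turn take 1 or 2,
# and whoever takes the last stone wins.
def legal_moves(pile):
    return [m for m in (1, 2) if m <= pile]

def rollout(pile):
    """Random playout to the end of the game: 1.0 if the player about
    to move wins -- the real-time Rollouts estimate."""
    my_turn = True
    while True:
        pile -= random.choice(legal_moves(pile))
        if pile == 0:
            return 1.0 if my_turn else 0.0
        my_turn = not my_turn

def value_net(pile):
    """Stand-in for the offline evaluation network: estimated win rate
    for the player about to move (here simply the exact value)."""
    return 1.0 if pile % 3 != 0 else 0.0

def evaluate(pile, simulations=200):
    """Step 3 (Evaluation): mix the offline value with averaged
    rollouts, 50%-50% as published in Nature."""
    if pile == 0:
        return 0.0                      # the previous player just won
    avg_roll = sum(rollout(pile) for _ in range(simulations)) / simulations
    return 0.5 * value_net(pile) + 0.5 * avg_roll

def best_move(pile):
    random.seed(0)                      # reproducible rollouts
    # Steps 1-2 (Selection/Expansion): only candidate moves are tried,
    # never the full tree. Step 4 (Backup): keep the best-scoring action.
    scores = {m: 1.0 - evaluate(pile - m) for m in legal_moves(pile)}
    return max(scores, key=scores.get)

# In this game, leaving the opponent a multiple of 3 is the winning play.
move = best_move(7)
```

The real system replaces the toy pieces with the two networks and tracks visit statistics across many simulations, but the division of labor is the same: the networks narrow and score the candidates, and the tree search picks among them.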

According to the AlphaGo team's own tests, any single brain, or the Monte Carlo search tree alone, reaches only amateur level (European champion Fan Hui's rating is about 2500~2600; Lee Sedol is above 3500). But integrated, these technologies show far greater power. At the time of the Nature paper, its estimated strength was still only about professional 3~4 dan (Lee Sedol is 9 dan), but as described above, the reinforcement technique strengthened the policy network and self-play between two AlphaGos optimized the evaluation network, so it can grow much stronger in a short time. Moreover, a computer has no emotions: it is not afraid of pressure and will not underestimate an opponent based on their play (the policy network has always been trained only to predict the strong), so even a human with greater strength may not withstand the pressure of winning and losing well enough to play their best.

Does Lee Sedol have a chance to win?

Among the many commentaries, I think there are many incorrect speculations about AlphaGo. The first concerns whether AlphaGo can evaluate the "whole chess game." It must be said that AlphaGo as a whole does have this ability, mainly through the output of the evaluation network (since it estimates the final win rate), though what it obtains is a smoothed average win rate over pooled areas. Within AlphaGo, the policy network mainly predicts the opponent's next move, while the Monte Carlo search tree combines the evaluation network's parameters (the result of offline training) with the Rollouts technique that computes value differences in real time from the current state, so its simulations do take the whole game into account. Humans, however, grasp the "overall chess game" through intuition, which should still be stronger than the computer's. Moreover, since AlphaGo evaluates the average win rate through pooled convolution results (mainly for smoothing and to avoid over-fitting), if Lee Sedol could exploit AlphaGo's prediction of his behavior to make decisions, set traps, and induce errors in its win-rate evaluation (the average win rate within a pooled range is high, but one wrong move at a particular point overturns the "overall chess game"; that is the error in the win-rate prediction), then a human might win. (Of course, I am only raising the possibility; knowing is easier than doing, and actually executing such a plan is unlikely.) The reason Lee Sedol keeps losing is that he has been trying to guess AlphaGo's line of play, when in fact AlphaGo makes its decisions by guessing his next move. He should change his thinking and deceive AlphaGo with feints of his own; that may offer a chance of winning.

Weak AI and Strong AI

Now that computers have defeated humans at Go, known as the last bastion of mankind, should we worry about the day artificial intelligence rules over humans? In fact, there is no need to worry too much, because artificial intelligence is commonly divided into weak AI (Artificial Narrow Intelligence) and strong AI (Artificial General Intelligence). (Some also propose Artificial Super Intelligence, held to exceed human intelligence and possess creativity and social skills, but I think that is too science-fictional to discuss further.) The biggest difference is that weak AI has no self-awareness, no real understanding of problems, and no ability to think and plan solutions. You may ask how AlphaGo can play so well without understanding Go. Note that AlphaGo is essentially a deep learning neural network: through its network architecture and a huge number of samples, it found a function that predicts the opponent's moves (the policy network), a function that estimates win rates (the evaluation network), and a Monte Carlo search tree that computes the best choice among limited options. In other words, it chooses the best action based on these three functions rather than truly understanding what Go is. The difference between AlphaGo and Microsoft's Cortana or the iPhone's Siri is merely that it specializes in playing Go; it has no additional thinking mechanism. I have also seen reports claiming that AlphaGo is a general-purpose network and so could quickly learn to play World of Warcraft or study medicine; this too is a great fallacy. If you have read the explanation above, you know AlphaGo is an artificial intelligence designed for playing Go. To solve other problems, its neural structure and algorithms would have to be redesigned. So rather than saying Lee Sedol lost to AlphaGo, say he lost to mathematics: intuition proved no match for rational mathematical judgment. Some think humanity has lost its last bastion and the art of Go will be destroyed... Really, you need not worry so much. Why did humans not panic when they could no longer outrun cars? Running is still a fine sport, and not all Olympic gold medals were taken by Ferrari... So there is really no need to be too nervous.

Will strong artificial intelligence appear one day? In 2013, Bostrom surveyed hundreds of the world's leading artificial intelligence experts, asking when they expected strong AI to arrive. From the results he derived three answers: the optimistic estimate (median of the 10% responses) is 2022, the normal estimate (median of the 50% responses) is 2040, and the pessimistic estimate (median of the 90% responses) is 2075. So it is still a long way off. However, as weak artificial intelligence enters the stage of falling costs and commercialization, rather than worrying about whether AI will rule the earth, it is more practical to worry about whether your work skills will be replaced by computers.
Source: Huayuan Data
