Since last year I have been doing a lot of popular-science work around AI. For a long time I have been thinking about how to explain deep learning to people without a computer-science background, so that non-technical investors, business managers, industry experts, journalists, and even the general public can understand why deep learning is so effective and how AI helps people solve concrete problems. Inspired by a short answer on Quora, I gradually formed the idea of comparing a neural network to water flowing through a network of pipes. I have tried this metaphor in talks to audiences from banking, education, and investment, and it has worked very well. Bit by bit it grew into this article, which was recently included in the popular-science book "Artificial Intelligence" that Kai-Fu Lee and I co-authored.

[Note] Please keep in mind that this article deliberately avoids mathematical formulas and mathematical arguments in explaining deep learning. The water-pipe picture is meant only for a general audience; to anyone trained in mathematics or computer science it is both incomplete and imprecise. The flow-control valves of the metaphor are not exactly equivalent to the weights attached to each neuron in a deep neural network, and the description of the pipe network deliberately ignores essential concepts in deep learning algorithms such as cost functions, gradient descent, and backpropagation. Readers who want to seriously learn deep learning should still start from proper textbooks and tutorials.

Fundamentally, deep learning, like all machine learning methods, builds a mathematical model of a specific real-world problem so that similar problems in that domain can be solved.

First of all, deep learning is a kind of machine learning. Since it is called "learning", it naturally resembles, to some degree, the way humans learn. So think back: how does a human child learn, and how does a machine learn?

Many children, for example, learn Chinese characters with flashcards. From the tracing primers of earlier times, such as "Shang Daren, Kong Yiji", to the character-learning apps on today's phones and tablets, the basic idea is the same: let children look, over and over, at the many ways each character can be written, moving from simple characters to complex ones (older children even learn to recognize different calligraphic styles). After seeing a character enough times, they simply remember it, and the next time they meet the same character they recognize it with ease.

This process of learning characters looks simple, but it is full of subtlety. As the child learns, the brain is stimulated by similar images again and again, and for each character it gradually distills some regularity. The next time the brain sees a pattern that fits that regularity, it knows which character it is.

Teaching a computer to recognize Chinese characters works on the same principle. The computer must first look at each character many, many times, and then summarize a pattern in its "brain" (the processor plus memory).
When the computer later sees a similar image, as long as it fits the pattern summarized earlier, the computer knows which character it is. In professional terms: the pictures the computer looks at over and over in order to learn are called the "training data set"; within the training data set, the properties that distinguish one class of data from another are called "features"; the process by which the computer summarizes regularities in its "brain" is called "modeling"; the regularities it ends up with are what we usually call a "model"; and the whole procedure of learning to recognize characters by repeatedly looking at pictures and summarizing their regularities is "machine learning".

How exactly does the computer learn? What kind of regularities does it summarize? That depends on which machine learning algorithm we use. One very simple algorithm imitates the way children learn to read. Parents and teachers know the experience: when children first learn to distinguish "一", "二", and "三", we tell them that the character written with one stroke is "一", the one with two strokes is "二", and the one with three strokes is "三". The rule is easy to remember and easy to apply. But as soon as new characters arrive, it breaks down: "口" is also written with three strokes, yet it is not "三". So we tell the children that the character forming a closed square is "口", while the one made of three horizontal strokes is "三". The rule has gained another layer, but it still cannot keep up with the growing number of characters. Soon the children find that "田" is also a square, yet it is not "口". Now we tell them that the square with a "十" inside is "田". Later still, we will probably tell them that the character where the stroke sticks out of the top of "田" is "由", the one where it sticks out of the bottom is "甲", and the one where it sticks out of both the top and the bottom is "申". Guided by such step-by-step enrichment of feature rules, many children gradually learn to summarize rules for new characters on their own, and go on to master thousands of them.

There is a machine learning method called the decision tree, which closely resembles this rule-by-rule way of recognizing characters. When the computer only needs to recognize the three characters "一", "二", and "三", it can tell them apart simply by counting the strokes of the character to be recognized. When we add "口" and "田" to the set of characters to be recognized (the training data set), the old test no longer suffices, and additional conditions must be introduced. Step by step, the computer comes to recognize more and more characters.

The attached figure shows how the decision tree inside the computer differs before and after the computer learns the three new characters "由", "甲", and "申". After we "show" the computer the new characters and their features, the computer, like a child, summarizes and remembers new rules and "recognizes" more characters. That, in essence, is basic machine learning. Of course, this decision-tree approach is far too simple to scale up and adapt to the varied situations of the real world.
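For readers who like to see things in code, here is a minimal sketch of the hand-built decision tree described above, written in Python. The feature names (stroke_count, is_enclosed, and so on) are invented purely for this illustration; a real character recognizer would never receive such tidy, ready-made features.

```python
# A minimal sketch of the hand-built decision tree described above.
# The feature names are illustrative inventions, not features any real
# recognizer would be handed in this form.

def classify(stroke_count, is_enclosed, has_cross_inside,
             sticks_out_top, sticks_out_bottom):
    """Walk the feature rules from simple to complex, the way a parent
    adds one more rule each time a new character is taught."""
    if not is_enclosed:
        # Characters made only of separate horizontal strokes.
        if stroke_count == 1:
            return "一"
        if stroke_count == 2:
            return "二"
        if stroke_count == 3:
            return "三"
        return "unknown"
    # Enclosed, box-like characters.
    if not has_cross_inside:
        return "口"
    if sticks_out_top and sticks_out_bottom:
        return "申"
    if sticks_out_top:
        return "由"
    if sticks_out_bottom:
        return "甲"
    return "田"

print(classify(stroke_count=5, is_enclosed=True, has_cross_inside=True,
               sticks_out_top=False, sticks_out_bottom=False))  # -> 田
```

Notice that every new character the computer must learn forces another branch into the tree, which is exactly why this approach struggles to scale.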
So scientists and engineers have invented many other machine learning methods. For example, we can map the features of the characters "由", "甲", and "申", such as whether a stroke sticks out and how the strokes are positioned relative to one another, to points in some space. (Yes, a mathematical term has crept in, but it doesn't matter: whether or not you know the precise meaning of "map" will not affect the rest of the article.) In other words, within the training data set, the many different ways of writing these three characters become, in the computer's eyes, a large number of points in space. As long as the features we extract for each character are good enough, those points will fall roughly into three different regions.

Now let the computer look at how the points are distributed and see whether some simple partition, say drawing straight lines through the space, can divide it into separate regions so that the points belonging to each character in the training data set all lie in the same region. If such a partition exists, the computer has "learned" how these characters are distributed in space; it has built a model for them. Afterwards, when it sees a new character image, the computer simply converts the image into a point in the same space and checks which character's region the point falls in. Then it knows which character the image shows. (A small code sketch of this idea appears at the end of this section.)

Many readers will already have noticed that dividing a plane with straight lines (as shown in the attached figure) can hardly cope with thousands of Chinese characters and at least tens of thousands of ways of writing them. If every variant of every character is mapped to a point in space, it is extremely hard to find straight, clean mathematical cuts that fence the points of each character into its own region. Problems like this troubled mathematicians and computer scientists for many years, and people kept improving their machine learning methods: using complicated higher-order functions to draw endlessly varied curves that separate intertwined points, or simply lifting the two-dimensional space into three, four, or even hundreds, thousands, or tens of thousands of dimensions.

Before deep learning became practical, people invented many traditional, non-deep machine learning methods. These achieved real results in particular domains, but the world is so complex and varied that however elegant a modeling approach people chose for the computer, it struggled to truly capture the characteristic regularities of everything in the world. It is like a painter trying to render the true face of the world with a limited palette: however skilled, he will find genuine "realism" out of reach.

So how can we greatly expand the basic vocabulary with which computers describe the world's regularities? Could we design an extremely flexible form of expression, and then let the computer search and experiment on a massive scale, summarizing the regularities by itself, until it finds an expression that fits the features of the real world? Now, at last, we come to deep learning!
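Before we dive into deep learning itself, here is the small sketch promised above of "turning characters into points and separating the regions with straight lines". It is a toy illustration only: the two features and the synthetic point clouds are made up, and the linear model (scikit-learn's LogisticRegression, whose decision boundaries in a plane are straight lines) merely stands in for the idea of drawing straight lines in space.

```python
# A toy sketch of "map each written character to a point in space and
# separate the regions with straight lines". Features and data are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
chars = ["由", "甲", "申"]
# Invented 2-D features: (how far a stroke sticks out of the top,
#                         how far a stroke sticks out of the bottom)
centers = np.array([[0.8, 0.2],   # 由: sticks out of the top only
                    [0.2, 0.8],   # 甲: sticks out of the bottom only
                    [0.8, 0.8]])  # 申: sticks out of both

# 100 noisy synthetic "handwriting samples" per character -> 300 points in the plane.
X = np.vstack([c + 0.05 * rng.standard_normal((100, 2)) for c in centers])
y = np.repeat(np.arange(3), 100)

# A linear model: its decision boundaries in this plane are straight lines.
clf = LogisticRegression().fit(X, y)

new_point = np.array([[0.78, 0.22]])          # a new, unseen writing sample
print(chars[int(clf.predict(new_point)[0])])  # lands in the "由" region
```

With only three well-separated characters this works nicely; the trouble, as described above, begins when thousands of characters and all their variants crowd into the same space.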
Deep learning is a machine learning method whose form of expression is extremely flexible, and which lets the computer keep trying until it finally approaches the target. In its mathematical essence, deep learning is not fundamentally different from the traditional machine learning methods above: both hope to tell different categories of objects apart by their features in a high-dimensional space. In expressive power, however, deep learning and traditional machine learning are worlds apart.

Simply put, deep learning treats whatever the computer needs to learn as a large amount of data, pours that data into a complex, multi-layer data-processing network (a deep neural network), and then checks whether the output that comes out of the network meets the requirement. If it does, the network is kept as the target model; if it does not, the network's parameters are adjusted again and again, stubbornly, until the output does meet the requirement.

That is still too abstract, so let's put it more concretely. Imagine that the data to be processed is a "flow" of information, like water, and that the deep learning network processing it is a gigantic plumbing system built from pipes and valves. The network's entrance is a row of pipe openings, and so is its exit. The pipe network has many layers, and each layer has many adjustable valves that control the direction and volume of the flow. Depending on the task, the number of layers and the number of valves per layer can be combined in different ways; for complex tasks the total number of valves can reach tens of thousands or more. In this pipe network, every valve in one layer is connected by pipes to every valve in the next layer, forming a flow system that is fully connected, layer by layer, from front to back. (This is the most basic arrangement; different deep learning models lay out and connect their pipes differently.)

So how does the computer use this vast pipe network to learn to read? Suppose the computer sees a picture of the character "田". It simply converts all the numbers that make up the picture (in a computer, every colored dot in an image is represented by numbers made of 0s and 1s) into a flow of information and pours it into the pipe network through the entrance. In advance, we have planted a sign at each exit of the network, one for each Chinese character we want the computer to recognize. Because the input here is "田", once the water has run through the whole network the computer goes to the exits and checks whether the most water is flowing out of the exit marked "田". If it is, the network meets the requirement. If not, we give the computer an order: adjust every flow-control valve in the network so that the most "digital water" flows out of the "田" exit. The computer now has plenty of work to do; there are an awful lot of valves to adjust. Fortunately, computers calculate very quickly.
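For the curious, here is a minimal sketch of the "pour the water in and read the exits" step: a single forward pass through a tiny, fully connected "pipe network" written with NumPy. The layer sizes, the random valve settings, and the three-character exit list are all placeholders chosen for illustration.

```python
# A minimal sketch of one forward pass through a small "pipe network".
import numpy as np

rng = np.random.default_rng(0)
outlets = ["田", "由", "申"]                 # one labelled exit per character

# The "valves": each layer's weights decide how much flow each pipe passes on.
W1 = rng.standard_normal((256, 64))          # entrance pipes -> hidden layer 1
W2 = rng.standard_normal((64, 32))           # hidden layer 1 -> hidden layer 2
W3 = rng.standard_normal((32, 3))            # hidden layer 2 -> the 3 exits

def forward(pixels):
    """Send the image's numbers flowing through every layer in turn."""
    h1 = np.maximum(0, pixels @ W1)          # a valve only lets flow pass one way
    h2 = np.maximum(0, h1 @ W2)
    return h2 @ W3                           # how much "water" reaches each exit

image = rng.integers(0, 2, 256).astype(float)   # a fake 16x16 black-and-white image
flow = forward(image)
print(outlets[int(np.argmax(flow))])             # the exit with the most water
```

With random valves the answer is of course meaningless; the whole point of training, described next, is to adjust those valves until the right exit wins.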
Through brute-force computation plus algorithmic optimization (in reality a rather sophisticated mathematical method, but we will not bring up formulas here; just picture the computer calculating furiously), the computer can always find a solution quickly, adjusting all the valves until the flow at the exits meets the requirement.

Next, to learn the character "申", we do the same thing: turn every picture of "申" into a large flow of numbers and pour it into the pipe network, then check whether the exit marked "申" releases the most water. If it does not, we adjust all the valves again, this time making sure that the "田" we just learned is not disturbed while the new "申" is handled correctly. The process repeats until the water for every Chinese character flows through the network in the desired way. At that point, we say the pipe network is a trained deep learning model.

The attached figure, for example, shows the information flow of the character "田" being poured into the pipe network. To make more water come out of the exit marked "田", the computer adjusts all the flow-control valves almost frantically, experimenting and exploring until the flow meets the requirement. Once a large number of flashcards have been run through the network and every valve has been adjusted into place, the whole pipe network can be used to recognize Chinese characters. At that point we "weld" all the valves in their adjusted positions and wait for new water to arrive.

Just as during training, an unknown image is converted by the computer into a stream of data and poured into the trained pipe network. The computer then only has to check which exit releases the most water: that is the character written in the image. Simple, isn't it? And a little magical?

Is deep learning, then, a learning method that cobbles together the best model by madly twiddling valves? Why must each valve in the network be adjusted exactly this way, and exactly this far? Is everything really determined by nothing more than the final flow at each exit? Is there truly no deeper reason behind it? Deep learning is essentially a half-theoretical, half-empirical modeling method: human mathematical knowledge and computer algorithms set up the overall architecture, and then as much training data as possible, together with large-scale computing power, is used to adjust the internal parameters until the result comes as close to the goal as possible.
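And here, for readers who do want a peek at the mathematics this article skips, is a minimal sketch of the whole train-then-recognize cycle. The "adjust all the valves" step is implemented with plain gradient descent and backpropagation, exactly the concepts the note at the top says are being left out, and the "flashcards" are synthetic stand-in data rather than real character images.

```python
# A minimal sketch of the training loop: pour every training image in, check
# which exit gets the most water, and nudge all the valves when the answer is
# wrong (here via gradient descent, the standard method).
import numpy as np

rng = np.random.default_rng(0)
n_chars, n_pixels, hidden = 3, 64, 32

# Synthetic "flashcards": 200 noisy samples per character class.
prototypes = rng.integers(0, 2, (n_chars, n_pixels)).astype(float)
X = np.vstack([p + 0.3 * rng.standard_normal((200, n_pixels)) for p in prototypes])
y = np.repeat(np.arange(n_chars), 200)

W1 = 0.1 * rng.standard_normal((n_pixels, hidden))     # the adjustable valves
W2 = 0.1 * rng.standard_normal((hidden, n_chars))
onehot = np.eye(n_chars)[y]

for step in range(300):                                 # pour, check, adjust, repeat
    h = np.maximum(0, X @ W1)                           # water flows forward
    logits = h @ W2
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)                   # share of water per exit

    grad_logits = (p - onehot) / len(X)                 # how far off each exit is
    grad_W2 = h.T @ grad_logits                         # nudge the later valves...
    grad_h = grad_logits @ W2.T
    grad_h[h <= 0] = 0
    grad_W1 = X.T @ grad_h                              # ...and the earlier valves
    W1 -= 0.5 * grad_W1
    W2 -= 0.5 * grad_W2

# "Weld the valves shut" (stop updating) and recognize a new, unseen image.
new_image = prototypes[1] + 0.3 * rng.standard_normal(n_pixels)
outlet_flow = np.maximum(0, new_image @ W1) @ W2
print(int(np.argmax(outlet_flow)))                      # expected: 1, the second class
```

Real systems differ mainly in scale: far more layers and valves, far more data, and far more careful mathematics, but the rhythm of pour, check, adjust, repeat is the same.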
The basic idea guiding deep learning is a pragmatic one. Want to capture more complex regularities of the world? Then keep adding adjustable valves to the pipe network: more layers, or more valves per layer. Have enormous amounts of training data and large-scale computing power? Then wire up many CPUs and many GPUs (graphics processing units, commonly known as graphics chips, originally built for rendering graphics and games, which happen to be especially well suited to deep learning computation) into a huge computing array, and let the computer learn the hidden regularities in the training data while adjusting its countless valves as fast as it can. It is probably exactly this pragmatism that makes the perceptual (modeling) power of deep learning far stronger than that of traditional machine learning methods.

Pragmatism also means not insisting on understanding everything. Even when a deep learning model has been trained to be very "clever" and solves its problem well, in many cases even the person who designed the pipe network cannot explain why each valve ended up adjusted the way it is. In other words, people usually know only whether a deep learning model works; it is hard to say what causal relationship a particular parameter value in the model has to the model's final perceptual ability. This is genuinely interesting: the most effective machine learning method in history is, in many people's eyes, a "black box" that can be used but not explained.

This prompts a philosophical question: if people only know what the computer has learned to do, but cannot explain what regularities it has actually mastered, could the learning itself slip out of control? Many people worry, for instance, that if we keep going down this path, computers might quietly learn things we do not want them to learn. And in principle, if the number of layers in a deep learning model keeps growing without limit, could the computer's modeling power match the ultimate complexity of the real world? If the answer were yes, then given enough data, computers could learn all possible knowledge in the universe, and what then? Does the prospect of computer intelligence surpassing human intelligence worry you a little? Fortunately, experts have not reached a consensus on whether deep learning is capable of expressing knowledge of universe-level complexity, so humanity is relatively safe for the foreseeable future.

One more thing. Visualization tools have appeared that help us "see" what deep learning looks like while it runs its large-scale computations. For example, Google's well-known deep learning framework TensorFlow offers a web-based tool (TensorFlow — Neural Network Playground) that uses easy-to-understand diagrams to show, in real time, what the whole network is doing as it learns. The attached figure shows what a deep neural network with 4 intermediate (hidden) layers looks like while learning a training data set. In the figure we can see, intuitively, the direction and magnitude of the data "flow" between each layer and the next. We can also change the basic settings of the framework on the webpage at any time and observe the learning algorithm from different angles. This is very helpful for learning and understanding deep learning.
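If you would like to rebuild something like the Playground's network on your own machine, here is a minimal sketch using TensorFlow's Keras API (assuming TensorFlow 2.x is installed). The four hidden layers echo the figure described above, but the layer sizes, activations, and the synthetic "circle" data are stand-ins, not the Playground's exact configuration.

```python
# A minimal sketch of a Playground-style network in TensorFlow 2.x / Keras:
# two input features, four hidden layers, one output. The synthetic
# "inner circle vs. outer ring" data stands in for the Playground's datasets.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
n = 1000
radius = rng.uniform(0.0, 2.0, n)
angle = rng.uniform(0.0, 2 * np.pi, n)
X = np.stack([radius * np.cos(angle), radius * np.sin(angle)], axis=1)
y = (radius > 1.0).astype("float32")          # inner circle = 0, outer ring = 1

model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(8, activation="tanh"),
    tf.keras.layers.Dense(8, activation="tanh"),
    tf.keras.layers.Dense(8, activation="tanh"),
    tf.keras.layers.Dense(8, activation="tanh"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=50, verbose=0)
print(model.evaluate(X, y, verbose=0))        # [loss, accuracy] on the training set
```

Running this and watching the accuracy climb is the command-line cousin of watching the Playground's picture sharpen in the browser.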
Note: This article is excerpted from the book "Artificial Intelligence" by Kai-Fu Lee and Yong-Gang Wang.