Understand the four deep learning methods of supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning in one article

Generally speaking, there are four main ways to train deep learning networks: supervised, unsupervised, semi-supervised, and reinforcement learning. In the sections below, the Computer Vision Team explains the theory behind each of these methods, defines terms you will often encounter in the literature, and points to further mathematical resources.

Supervised Learning

Supervised learning uses examples for which the correct answer is already known. Imagine we want to train a network to recognize your parents in a photo gallery that contains pictures of them. Here are the steps we would take in this hypothetical scenario.

Step 1: Creation and classification of dataset

We start the process by going through your photos (the dataset), identifying all the photos that contain your parents, and labeling them. We then split the entire pile in two. We use the first pile to train the network (the training data), and the second pile to measure how accurately the model picks out photos of your parents (the validation data).

Once the dataset is ready, we will feed the photos to the model. Mathematically, our goal is to find a function in the deep network that takes a photo as input and outputs 0 if your parents are not in the photo and 1 otherwise.
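As a rough sketch of this split, here is what it might look like in plain Python; the photo filenames and labels are entirely made up for illustration:

```python
import random

# Hypothetical dataset: (photo_id, label) pairs, where label is 1 if
# your parents appear in the photo and 0 otherwise.
dataset = [(f"photo_{i}.jpg", random.randint(0, 1)) for i in range(100)]

random.shuffle(dataset)          # avoid ordering bias before splitting
split = int(0.8 * len(dataset))  # a common 80/20 split
training_data = dataset[:split]
validation_data = dataset[split:]

print(len(training_data), len(validation_data))  # 80 20
```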

This step is often called a classification task: here the network learns a yes-or-no answer. But supervised learning can also be used to output values other than just 0 or 1. For example, we can train a network to output the probability that a person will repay a credit card loan, expressed as a percentage between 0 and 100. Such tasks are called regression.
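To make the classification/regression distinction concrete, here is a minimal sketch using a single sigmoid output; the raw score and the credit-card figure are invented for illustration:

```python
import math

def sigmoid(x):
    # squashes any real-valued score into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

raw_score = 2.5                 # hypothetical raw network output
prob = sigmoid(raw_score)

# Classification: threshold the probability into a 0/1 answer.
label = 1 if prob >= 0.5 else 0

# Regression-style reading: the same probability expressed as a
# 0-100 value, e.g. the chance a person repays a credit card loan.
repayment_percent = prob * 100
print(label, round(repayment_percent, 1))
```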

Step 2: Training

To continue the process, the model makes a prediction for each photo, using a rule (the activation function) to decide whether or not a particular node in the network lights up. The model operates one layer at a time, from left to right; we will ignore more complex architectures for now. Once the network has computed this for every node, we reach the rightmost (output) node, which is either lit or not.
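The layer-by-layer computation above can be sketched as follows; the tiny network, its weights, and its inputs are all hypothetical:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, layers):
    """Run a forward pass one layer at a time (left to right).
    `layers` is a list of (weights, biases) per layer, where
    weights[j] holds the incoming weights of node j."""
    activations = inputs
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(node_w, activations)) + b)
            for node_w, b in zip(weights, biases)
        ]
    return activations

# Tiny made-up network: 2 inputs -> 2 hidden nodes -> 1 output node.
layers = [
    ([[0.5, -0.4], [0.3, 0.8]], [0.1, -0.2]),  # hidden layer
    ([[1.2, -0.7]], [0.05]),                   # output layer
]
output = forward([0.9, 0.6], layers)
print(output)  # one value in (0, 1): the node is "lit" if >= 0.5
```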

Now that we know which pictures have pictures of your parents, we can tell the model whether its prediction was right or wrong. We then feed this information back to the network.

This feedback is the result of a function that quantifies how far the model's prediction deviates from the true answer. The function is called a cost function, also known as an objective function, utility function, or fitness function. Its result is used to adjust the weights and biases of the connections between nodes, in a process called backpropagation, because the information propagates "backwards" from the output nodes.
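As a minimal example of a cost function, here is mean squared error over a small batch of made-up predictions:

```python
# Mean squared error: the average squared gap between prediction and truth.
def cost(predictions, truths):
    return sum((p - t) ** 2 for p, t in zip(predictions, truths)) / len(truths)

preds = [0.9, 0.2, 0.7]   # hypothetical model outputs for three photos
labels = [1, 0, 1]        # ground truth: parents present or not
print(round(cost(preds, labels), 4))  # 0.0467
```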

We repeat this for each image, and in each case the algorithm tries to minimize the cost function.

There are many mathematical techniques for minimizing the cost function, but by far the most common is gradient descent. There is a "layman's" explanation on Algobeans that describes how it works very well, and Michael Nielsen covers the underlying mathematics, including the calculus and linear algebra.

http://neuralnetworksanddeeplearning.com/chap2.html
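A toy illustration of gradient descent on a one-parameter cost function, assuming the simple quadratic C(w) = (w - 3)^2, whose gradient is 2(w - 3):

```python
# Gradient descent on a toy one-parameter cost: C(w) = (w - 3)^2.
def gradient(w):
    return 2 * (w - 3)

w = 0.0             # arbitrary starting weight
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * gradient(w)  # step against the gradient

print(round(w, 6))  # 3.0 -- converges to the minimum at w = 3
```

Real networks apply the same update rule to millions of weights at once, with the gradients supplied by backpropagation.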

Step 3: Verification

Once we have processed all the photos in the first pile, we are ready to test the model: we use the second pile of photos to verify whether the trained model can accurately pick out the photos that include your parents.

We usually repeat steps 2 and 3 while tweaking various aspects of the model (the hyperparameters): how many nodes it has, how many layers, which mathematical functions decide whether a node lights up, how aggressively the weights are updated during backpropagation, and so on. This post on Quora gives a good explanation.
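A sketch of such a hyperparameter sweep; `train_and_score` here is a made-up stand-in for actually repeating steps 2 and 3 and returning validation accuracy:

```python
import itertools

# Toy surrogate for "train the model and score it on validation data".
# Purely illustrative: it peaks at 32 hidden nodes and learning rate 0.1.
def train_and_score(hidden_nodes, learning_rate):
    return 1.0 - abs(hidden_nodes - 32) / 100 - abs(learning_rate - 0.1)

# Try every combination of two hyperparameters and keep the best.
grid = itertools.product([8, 16, 32, 64], [0.01, 0.1, 0.5])
best = max(grid, key=lambda hp: train_and_score(*hp))
print(best)  # (32, 0.1)
```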

Step 4: Use

Finally, once you have an accurate model, you can deploy that model into your application. You can define the model as an API call, such as ParentsInPicture(photo), and you can call that method from your software, causing the model to perform inference and give the appropriate results.
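A hypothetical sketch of such a wrapper; `ParentsInPicture` is not a real API, just the kind of call your application code might expose:

```python
# Hypothetical deployment wrapper around a trained model.
def ParentsInPicture(photo, model=None, threshold=0.5):
    """Run inference and return True if the model believes your
    parents are in the photo. `model` maps a photo to a probability."""
    score = model(photo) if model else 0.0
    return score >= threshold

fake_model = lambda photo: 0.92   # stand-in for a trained network
print(ParentsInPicture("family.jpg", model=fake_model))  # True
```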

We'll look at this exact process in detail later, writing an iPhone app that recognizes business cards.

It can be hard (aka expensive) to get a labeled dataset, so you need to make sure the value of the predictions justifies the cost of getting the labeled data and training the model in the first place. For example, getting labeled X-rays of people who are likely to have cancer is very expensive, but the value of getting an accurate model that produces few false positives and few false negatives is obviously very high.

Unsupervised Learning

Unsupervised learning is for situations where you have a dataset but no labels. It takes a set of inputs and tries to find patterns in the data, such as organizing it into groups (clustering) or finding outliers (anomaly detection). For example:

•Imagine if you were a t-shirt manufacturer and had a bunch of people's body measurements. You might want to have a clustering algorithm that could group those measurements into a set of clusters to decide how big your XS, S, M, L, and XL shirts should be.
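A minimal 1-D k-means sketch of the t-shirt example; the chest measurements and the deterministic initialization are illustrative choices, not a production implementation:

```python
def kmeans_1d(values, k, iters=20):
    """Minimal 1-D k-means: cluster measurements into k groups."""
    values = sorted(values)
    # Deterministic init: spread starting centers across the sorted data.
    centers = [values[i * (len(values) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            # assign each value to its nearest center
            idx = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # move each center to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Hypothetical chest measurements (cm) clustered into S / M / L sizes.
chests = [86, 88, 90, 96, 98, 100, 108, 110, 112]
print(kmeans_1d(chests, k=3))  # [88.0, 98.0, 110.0]
```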

Some of the unsupervised learning techniques you will read about in the literature include:

• Autoencoders

http://ufldl.stanford.edu/tutorial/unsupervised/Autoencoders/

•Principal components analysis

https://www.quora.com/What-is-an-intuitive-explanation-for-PCA

• Random forests

https://en.wikipedia.org/wiki/Random_forest

• K-means clustering

https://www.youtube.com/watch?v=RD0nNK51Fp8

One of the most promising recent developments in unsupervised learning is an idea from Ian Goodfellow (while working in Yoshua Bengio’s lab) called “generative adversarial networks,” in which we connect two neural networks to each other: one network, which we call the generator, is responsible for generating data designed to try to fool the other network, which we call the discriminator. This approach has led to some amazing results, such as AI that can generate photorealistic images from strings of text or hand-drawn sketches.

Semi-supervised Learning

Semi-supervised learning combines a large amount of unlabeled data with a small amount of labeled data during training. Compared to models trained on fully labeled data, models trained on this mixed training set can be just as accurate while being much cheaper to build, since most of the data does not need labeling.

The intuition for why unlabeled data can sometimes make a model more accurate is that even without knowing the answers, you can learn something about the range of possible values and how often particular values occur.
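One common semi-supervised recipe is self-training: fit on the labeled points, pseudo-label the unlabeled points, and refit on the enlarged set. A toy 1-D sketch with made-up numbers:

```python
# Labeled points: (value, class). Unlabeled points: values only.
labeled = [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)]
unlabeled = [1.5, 2.5, 3.0, 7.5]

def fit_threshold(points):
    """Tiny 1-D classifier: decision boundary at the midpoint between
    the highest class-0 value and the lowest class-1 value."""
    zeros = [x for x, y in points if y == 0]
    ones = [x for x, y in points if y == 1]
    return (max(zeros) + min(ones)) / 2

t = fit_threshold(labeled)                     # boundary from labels alone
pseudo = [(x, int(x > t)) for x in unlabeled]  # pseudo-label unlabeled data
t = fit_threshold(labeled + pseudo)            # refit on the enlarged set
print(t)  # 5.25 -- the unlabeled data shifted the boundary
```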

A bonus for math enthusiasts: if you are interested in semi-supervised learning, read Professor Xiaojin Zhu's slide tutorial and his 2008 literature survey. (We will share both in the platform's shared-files section.)

Reinforcement Learning

Reinforcement learning is for situations where you again don't have a labeled dataset, but you still have a way to tell whether you're getting closer to a goal (a reward function). The classic children's game "hotter or colder" (a variation of Huckle Buckle Beanstalk) is a great illustration of this concept. Your task is to find a hidden goal object, and your friends shout out whether you're getting hotter (closer to) or colder (farther from) the goal object. "Hotter/colder" is the reward function, and the goal of the algorithm is to maximize the reward function. You can think of the reward function as a delayed and sparse form of labeled data: instead of getting a specific "right/wrong" answer at each data point, you get a delayed response that only hints at whether you're moving in the direction of your goal.
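The hotter/colder idea can be sketched as a toy agent walking a number line: it never sees the hidden goal, only the reward signal telling it whether each move was hotter or colder:

```python
def play(goal, start=0, steps=50):
    """Walk the number line guided only by a hotter/colder reward."""
    pos, direction = start, 1
    for _ in range(steps):
        old_dist = abs(goal - pos)
        pos += direction
        # reward: +1 if "hotter" (closer), -1 if "colder" (or no closer)
        if abs(goal - pos) >= old_dist:
            direction = -direction  # colder, so turn around
    return pos

print(play(goal=17))  # ends within one step of the hidden goal
```

Real reinforcement learning agents face the same problem in vastly larger state spaces, which is why sparse rewards make learning so hard (see the Montezuma's Revenge discussion below).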

•DeepMind published a paper in Nature describing a system that combined reinforcement learning with deep learning to learn how to play a suite of Atari video games, some with great success (such as Breakout) and others less so (such as Montezuma's Revenge).

•The Nervana team (now at Intel) published a great explanatory blog post that goes into detail on these technologies, which you may want to read if you are interested.

https://www.nervanasys.com/demystifying-deep-reinforcement-learning/

•A very creative Stanford student project by Russell Kaplan, Christopher Sauer, and Alexander Sosa illustrates one of the challenges of reinforcement learning and proposes a clever solution. As seen in the DeepMind paper, the algorithm failed to learn how to play Montezuma's Revenge. Why? As the Stanford students describe it, "in an environment with sparse reward functions, reinforcement learning agents still struggle to learn." It's hard to find the hidden "key" when you rarely get a "hotter" or "colder" prompt. In essence, the Stanford students taught the system to understand and respond to natural-language prompts such as "climb down the ladder" or "get the key", making their system the highest-scoring algorithm on OpenAI Gym. You can watch a demonstration of the algorithm at the link below.

(http://mp.weixinbridge.com/mp/wapredirect?url=https%3A%2F%2Fdrive.google.com%2Ffile%2Fd%2F0B2ZTvWzKa5PHSkJvQVlsb0FLYzQ%2Fview&action=appmsg_redirect&uin=Nzk3MTk3MzIw&biz=MzA5MzQwMDk4Mg==&mid=2651042109&idx=1&type=1&scene=0)

•Watch this reinforcement learning algorithm in action, study it well, and you too can play Super Mario like a boss.

Richard Sutton and Andrew Barto wrote the standard textbook on reinforcement learning. You can read a draft of it online here: http://incompleteideas.net/sutton/book/the-book-1st.html
