On November 9, 2015, Google released its artificial intelligence system TensorFlow and announced that it would be open source. The move had a huge impact on the field of deep learning and attracted enormous attention from deep learning developers. There are, of course, still many doubts about artificial intelligence, but it is undeniably the trend of future development. TensorFlow became the most watched project on GitHub on the day it launched; as a leading framework for building deep learning models, it easily collected more than 10,000 stars in its first week. This is mainly due to Google's remarkable research achievements in artificial intelligence and its deep pool of technical talent. Another factor is AlphaGo, which defeated top human Go players and then maintained an unbeaten streak of 60 consecutive games, and whose reinforcement learning framework was built on TensorFlow's high-level API.

TensorFlow: why it?
As Google's second-generation deep learning framework, TensorFlow, which uses data flow graphs for computation, has become one of the most popular frameworks in machine learning and deep learning. Since its release it has been constantly improving and adding new features. On February 26 this year, TensorFlow 1.0 was officially released at the first annual TensorFlow Developer Summit in Mountain View; its biggest highlight is the speed gained from optimizing models, and it is remarkably fast. Even more unexpectedly, many supporters took the release of TensorFlow 1.0 to mark the first year of AI.
TensorFlow has achieved the following results in the past:
Google's first-generation distributed machine learning framework, DistBelief, no longer met Google's internal needs, so Google's engineers redesigned it into TensorFlow: it supports a variety of computing devices (CPU/GPU/TPU), runs well on mobile platforms such as Android, iOS, and Raspberry Pi, and supports multiple languages (because of the various high-level APIs, training is only supported in Python, while inference supports C++, Go, Java, and more). It also includes excellent tools like TensorBoard, which noticeably improve the efficiency of deep learning researchers. TensorFlow's use in Google's internal projects has also grown rapidly: it powers many Google products such as Gmail, Google Play recommendations, Search, Translate, and Maps, and nearly 100 internal projects and papers use it. In the 14 months before the official 1.0 release, TensorFlow accumulated 475+ non-Google contributors, 14,000+ commits, more than 5,500 GitHub projects with TensorFlow in the title, 5,000+ answered questions on Stack Overflow, and an average of 80+ issue submissions per week, and it is used by top academic research projects such as:
– Neural Machine Translation
– Neural Architecture Search
– Show and Tell
Deep learning, in essence, uses unsupervised or semi-supervised feature learning and hierarchical feature extraction to replace hand-crafted features. Researchers and developers working on deep learning use not only TensorFlow but also many other excellent frameworks in vision, language, natural language processing, and bioinformatics, such as Torch, Caffe, Theano, and Deeplearning4j. Below, the editor has compiled some of Duan Shishi's blog posts that analyze network models and algorithms in depth, to help you appreciate the power of TensorFlow as an open-source deep learning framework.

Deep understanding of Neural Style
This article uses TensorFlow to apply CNNs to Neural Style work on artistic photos. The author first explains in detail how the paper A Neural Algorithm of Artistic Style works, and then walks through an open-source TensorFlow implementation of Neural Style to appreciate the approach.

A Neural Algorithm of Artistic Style
In art, especially painting, artists create different content and styles and blend them into independent visual experiences. Given two images, current technology is fully capable of letting a computer identify their specific content. Style is a much more abstract thing: to a computer it is, of course, just pixels, yet the human eye can effectively distinguish the styles of different painters, which suggests that some more complex features constitute it. The essence of a multi-layer network is to find more complex and more intrinsic features, so the style of an image should, in theory, be extractable by a multi-layer network. This article uses a convolutional neural network (the pre-trained VGG model) to do content reconstruction and style reconstruction separately, and during synthesis minimizes content loss and style loss (the open-source implementation also includes a denoising loss), so that the synthesized image reconstructs both content and style accurately.
Methods
After understanding the above two points, the remaining issue is how to model the losses. Here the loss is computed separately for Content and Style. Content loss is relatively simple: the squared-error between two feature representations,

L_content(p, x, l) = 1/2 * sum_{i,j} (F^l_{ij} - P^l_{ij})^2

where F^l is the feature representation of the generated image at the lth layer and P^l is the feature representation of the original content image at the lth layer. Style loss has basically the same form, except that it sums the errors over several layers and compares Gram-matrix representations (G^l_{ij} = sum_k F^l_{ik} * F^l_{jk}) rather than raw features:

E_l = 1/(4 * N_l^2 * M_l^2) * sum_{i,j} (G^l_{ij} - A^l_{ij})^2
L_style = sum_l w_l * E_l

where A^l is the Gram-matrix representation of the original style image at the lth layer, G^l is the Gram-matrix representation of the generated image at the lth layer, N_l is the number of filters at layer l, and M_l is the size (height times width) of each feature map. After defining the loss, an optimization method is used to minimize it (note that the paper uses only content loss and style loss; the source code also adds a denoising loss). The optimization itself needs no special discussion, as TensorFlow has built-in optimizers such as Adam to handle it.
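To make the objective concrete, here is a minimal sketch of these losses, assuming TensorFlow 1.x; the function names and the weighted sum at the end are illustrative, not code from the paper or from the open-source implementation:

```python
import tensorflow as tf

def content_loss(F, P):
    # Squared-error between the generated image's features F^l and the
    # content image's features P^l at one layer.
    return 0.5 * tf.reduce_sum(tf.square(F - P))

def gram_matrix(feats):
    # Flatten a (height, width, channels) feature map to (M_l, N_l)
    # and take inner products between channels.
    shape = tf.shape(feats)
    F = tf.reshape(feats, (shape[0] * shape[1], shape[2]))
    return tf.matmul(F, F, transpose_a=True)

def style_layer_loss(gen_feats, style_feats):
    # G^l and A^l are the Gram matrices of the generated and style images.
    G = gram_matrix(gen_feats)
    A = gram_matrix(style_feats)
    nm = tf.cast(tf.size(gen_feats), tf.float32)  # N_l * M_l
    return tf.reduce_sum(tf.square(G - A)) / (4.0 * nm ** 2)

# The total objective is a weighted sum over layers, plus (in the
# open-source code) a total-variation denoising term on the image:
#   loss = alpha * content + beta * style
#          + tv_weight * tf.reduce_sum(tf.image.total_variation(image))
#   train_op = tf.train.AdamOptimizer(lr).minimize(loss, var_list=[image])
```

Note that, as in the paper, it is the generated image itself (a variable), not any network weights, that the optimizer updates.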
Deep understanding of AlexNet
Having read some TensorFlow documents and some interesting projects, the author found he needed to spend more time understanding things from the beginning, especially the computer vision part, which he is particularly interested in. Over the next period he will study the models that achieved good results in the ImageNet competition: AlexNet, GoogLeNet, VGG (yes, the pretrained model used in Neural Style above), and deep residual networks.

ImageNet Classification with Deep Convolutional Neural Networks
ImageNet Classification with Deep Convolutional Neural Networks describes the model that Hinton and his student Alex Krizhevsky used in the 2012 ImageNet Challenge, which set a new record for image classification. Since then, deep learning has surpassed the state of the art in vision again and again, even to the point of beating humans. While reading the paper, the author found many optimization techniques he had seen only sporadically before without deeply understanding them. This article explains how AlexNet achieved such good results. Without further ado, let's start reading.

The basic structure of AlexNet
The figure in the paper shows the basic network structure of AlexNet, but it is fairly abstract, so the author used caffe's draw_net to draw the structure of the AlexNet shipped with caffe. AlexNet consists of 8 layers in total: the first 5 are convolutional and the last 3 are fully connected. The article notes that removing any convolution layer makes the results much worse. The paper also points out that its figure was drawn for a two-GPU setup, so there may be some differences from the AlexNet in caffe, but that may not be the point; in practice you can refer directly to the caffe version of AlexNet, where every layer is described in detail and the basic structure matches the description above.

Why AlexNet achieved better results
The basic network structure of AlexNet was covered above; you may have questions about some pieces of it, such as LRN, ReLU, and dropout. Anyone who has been exposed to DL has heard of or studied these. Here, following the paper, the author explains in detail why these things improve the final performance of the network.

ReLU Nonlinearity
Generally speaking, newcomers to neural networks who have not yet gained a deep understanding of deep learning are more familiar with the other two activation functions (activation functions introduce nonlinearity so that neural networks can effectively fit nonlinear functions): tanh(x) and the sigmoid (1+e^(-x))^(-1). ReLU (Rectified Linear Units) is f(x)=max(0,x). A deep convolutional network based on ReLU trains several times faster than the same network based on tanh. The figure in the paper shows, for a four-layer convolutional network on CIFAR-10, the number of iterations needed to reach a 25% training error: the solid line is ReLU and the dashed line is tanh, and ReLU clearly converges faster than tanh.

Local Response Normalization
After applying ReLU f(x)=max(0,x), you will notice that the activations are not bounded to a range the way tanh and sigmoid outputs are, so a normalization is generally applied after ReLU. LRN is a method proposed in the paper (the author is not certain it was first proposed there). In neuroscience there is a concept called "lateral inhibition", which describes the influence of active neurons on their neighbors.

Dropout
Dropout is also a frequently mentioned concept; it effectively prevents overfitting of neural networks. Compared with regularization, which is how ordinary linear models prevent overfitting, Dropout works by modifying the structure of the neural network itself: for a given layer, neurons are randomly deleted with a defined probability while the input and output layers stay unchanged, and the parameters are then updated with the usual learning procedure; in the next iteration a different random set of neurons is deleted, and so on until training completes.

Data Augmentation
In fact, the simplest way to improve performance and prevent overfitting is to add data, but there are strategies for adding it. The paper randomly crops 227*227 patches from the 256*256 images (the paper itself says 224*224) and also expands the dataset with PCA-based color augmentation. This effectively enlarges the dataset. Depending on your scenario there are many more options, such as basic image transformations like increasing or decreasing brightness, or various filtering algorithms. It is a particularly effective approach when the amount of data is not large enough.
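As a small illustration of the cropping strategy just described, here is a hedged sketch assuming TensorFlow 1.x; the brightness jitter at the end stands in for the paper's PCA color augmentation, which is not reproduced here:

```python
import tensorflow as tf

def augment(image):
    # Randomly crop a 227x227 patch from a 256x256 training image,
    # as described above (the paper itself says 224x224).
    image = tf.random_crop(image, size=[227, 227, 3])
    # Horizontal reflections double the data again.
    image = tf.image.random_flip_left_right(image)
    # Simple photometric jitter; the paper instead shifts pixels along
    # the principal components of the RGB values (PCA augmentation).
    image = tf.image.random_brightness(image, max_delta=0.2)
    return image
```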
Deep understanding of GoogLeNet
GoogLeNet is the champion of ILSVRC 2014; its name is mainly a tribute to the classic LeNet-5. It was completed mainly by Google team members; see the paper Going Deeper with Convolutions. Related work includes LeNet-5, Gabor filters, and Network-in-Network. Network-in-Network improved the traditional CNN and easily beat AlexNet with a small number of parameters; the final Network-in-Network caffe model is only about 29M. GoogLeNet borrowed ideas from Network-in-Network, which is described in detail below.

1) Network-in-Network
On the left is the linear convolution layer of an ordinary CNN. Generally speaking, a linear convolution layer extracts linearly separable features; when the features to extract are highly nonlinear, more filters are needed to capture all the potential variations. This leads to a problem: too many filters mean too many network parameters and an overly complex network, which puts too much pressure on computation. The article makes improvements in two main ways: it replaces the linear convolution layer with an "mlpconv" layer, a micro multi-layer perceptron that mixes information across channels (equivalent to 1*1 convolutions), and it replaces the final fully connected layers with global average pooling.
Finally, the author designed a 4-layer Network-in-Network with a global average pooling layer to solve the ImageNet classification problem.
The basic network structure is as above, and the code can be found at https://github.com/ethereon/caffe-tensorflow. Due to a recent job change, the author has no machine to run this on and cannot draw the basic network structure diagram; it will be added later. Note that the middle cccp1 and cccp2 layers (cascaded cross channel parametric pooling) are equivalent to convolution layers with 1*1 kernels. The implementation of NIN in caffe is omitted here; please read the original post. The introduction of NIN can be seen as deepening the network: by deepening the network (increasing the representational power of each NIN unit) and replacing the original fully connected layers with an average pooling layer, the number of filters required and the parameters of the model are greatly reduced. The experiments in the paper show performance on par with AlexNet, with a final model size of only 29M. Once you understand NIN, GoogLeNet will no longer look confusing.
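To see why a cccp layer is just a 1*1 convolution, consider this sketch of one mlpconv stage, assuming TensorFlow 1.x; the names and filter counts are illustrative:

```python
import tensorflow as tf

def mlpconv(x, num_filters, kernel_size):
    # An ordinary convolution followed by two "cccp" layers. A 1x1
    # convolution is a per-pixel fully connected layer across channels,
    # which is exactly what cross channel parametric pooling computes.
    x = tf.layers.conv2d(x, num_filters, kernel_size,
                         padding='same', activation=tf.nn.relu)
    x = tf.layers.conv2d(x, num_filters, 1, activation=tf.nn.relu)  # cccp1
    x = tf.layers.conv2d(x, num_filters, 1, activation=tf.nn.relu)  # cccp2
    return x

# NIN replaces the fully connected classifier with global average pooling:
#   logits = tf.reduce_mean(mlpconv(x, num_classes, 3), axis=[1, 2])
```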
Pain points: simply making the network bigger brings many more parameters, which easily overfit when labeled data is limited, and a dramatic increase in computation, much of which is wasted if many weights end up near zero.

Inception module
The Inception module was proposed mainly because multiple convolution kernels of different sizes can capture information at different scales in the image. For computational convenience the paper uses 1*1, 3*3, and 5*5 kernels, plus a 3*3 max pooling branch. However, this naive version carries a big computational hazard: the output filter count of each Inception module is the sum of the filter counts of all its branches, so after several layers the number of filters in the model becomes huge, and the naive Inception depends heavily on computing resources. As mentioned in the Network-in-Network discussion, 1*1 convolutions can effectively reduce dimensionality (expressing as much information as possible with less), so the article proposes the "Inception module with dimension reduction": without losing the model's representational power, the number of filters is reduced to cut the model's complexity.

Overall of GoogLeNet
The basic code for constructing GoogLeNet in TensorFlow is at https://github.com/ethereon/caffe-tensorflow (shown in the original post if you do not want to look it up). The author encapsulates some basic operations; once you understand the network structure, constructing GoogLeNet is easy. After starting at the new company, the author will try to write GoogLeNet based on tflearn.

GoogLeNet on Tensorflow
For convenience of implementation, the author rewrote GoogLeNet using tflearn. The only difference from the caffe model is the position of some padding: the concat of the inception branches must stay consistent, and since the author did not know how to translate the pad values in the caffe prototxt, padding is simply set to 'same' everywhere. The specific code is omitted here; the original post shows it. If you are interested, compare it against the caffe model prototxt and help check for problems. The author has submitted the code to the official tflearn repository ("add GoogLeNet(Inception) in Example"); if you have TensorFlow, install tflearn and help verify it. Since there is no GPU machine here it runs slowly, and the TensorBoard graph is not as clear as the earlier AlexNet one (mainly because it was not run for as many epochs; while writing this the author found the host had no disk space left, embarrassingly, so restore was rewritten to continue the run). The TensorBoard graph also seems to have some problems and looks different on each load, but the basic logs show it gradually converging. One more bug, possibly in TensorBoard itself: the googlenet graph is about 1.3M and cannot be downloaded in Chrome, though Firefox seems to manage it.
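The core building block of such a tflearn rewrite looks roughly like the following sketch of the dimension-reduced Inception module; the parameter names are illustrative, not the repository's:

```python
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.merge_ops import merge

def inception(net, f1, f3r, f3, f5r, f5, fp):
    # Four parallel branches; the 1x1 convolutions (f3r, f5r, fp) do the
    # dimension reduction before the expensive 3x3/5x5 convolutions.
    b1 = conv_2d(net, f1, 1, activation='relu')
    b3 = conv_2d(conv_2d(net, f3r, 1, activation='relu'),
                 f3, 3, activation='relu')
    b5 = conv_2d(conv_2d(net, f5r, 1, activation='relu'),
                 f5, 5, activation='relu')
    bp = conv_2d(max_pool_2d(net, 3, strides=1), fp, 1, activation='relu')
    # Concatenate along the channel axis: the module's output width is
    # the sum of the four branches, which the 1x1 reductions keep small.
    return merge([b1, b3, b5, bp], mode='concat', axis=3)

# e.g. the 3a module of GoogLeNet: inception(net, 64, 96, 128, 16, 32, 32)
```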
Deep understanding of VGG / Residual Network
The author has just joined a new company and started to work on DeepLearning and TensorFlow at work, and has been very busy. He read the VGG and deep residual papers some time ago but had no time to write about them; today the plan is to reread these two papers carefully.

VGGnet
VGGnet is work by the Visual Geometry Group of Oxford at ILSVRC 2014. Its main contribution is demonstrating that increasing the depth of a network can, to a certain extent, improve the network's final performance. As shown in the figure below, the article improves performance by progressively increasing network depth. Although this looks a little brute-force and there are not many tricks, it is genuinely effective. Many pretrained pipelines use VGG models (mainly VGG-16 and VGG-19). Compared with other methods VGG has a large parameter space: the final model is more than 500M, while AlexNet is only about 200M and GoogLeNet even less, so training a VGG model usually takes longer. Fortunately there are public pretrained models that are very convenient to use, such as the pretrained model used in the Neural Style article above. The figure shows that from configuration A to the final E, the number of convolution layers in each convolution group increases; D and E are the familiar VGG-16 and VGG-19 models. For configuration C, the author explains that the 1*1 convolutions are introduced as linear transformations (channel counts stay consistent; no dimensionality reduction). In the final analysis, C does improve somewhat over B but is not as good as D. The main advantage of VGG is its simple, uniform structure: stacking small 3*3 kernels gives the same receptive field as larger kernels with fewer parameters and more nonlinearity.
VGG-16 tflearn implementation
The official tflearn GitHub provides an implementation of VGG-16 based on tflearn; the full listing is omitted here.
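A condensed sketch in the spirit of that example follows; the layer sizes follow configuration D, while the training settings are illustrative, not the official example's:

```python
from __future__ import division, print_function, absolute_import

import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression

def vgg_block(net, n_convs, n_filters):
    # Each VGG "convolution group": stacked 3x3 convs, then 2x2 pooling.
    for _ in range(n_convs):
        net = conv_2d(net, n_filters, 3, activation='relu')
    return max_pool_2d(net, 2, strides=2)

net = input_data(shape=[None, 224, 224, 3])
# Configuration D: 13 convolution layers in 5 groups, then 3 FC layers.
for n_convs, n_filters in [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]:
    net = vgg_block(net, n_convs, n_filters)
net = fully_connected(net, 4096, activation='relu')
net = dropout(net, 0.5)
net = fully_connected(net, 4096, activation='relu')
net = dropout(net, 0.5)
net = fully_connected(net, 1000, activation='softmax')
net = regression(net, optimizer='adam', loss='categorical_crossentropy')
model = tflearn.DNN(net)
```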
The VGG-16 graph is as follows:

Regarding VGG, the author personally feels it does not have many highlights; the pretrained models are very usable, but it is not as eye-catching as GoogLeNet.

Deep Residual Network
Generally speaking, the deeper the network, the harder it is to train. Deep Residual Learning for Image Recognition proposes a residual learning framework that greatly eases the training of deep networks, allowing models to be much deeper within an acceptable time (152 layers, with experiments even trying over 1000). This method achieved the best results at ILSVRC 2015. As the depth of the model increases, the following problems arise: vanishing/exploding gradients, which hamper convergence (largely addressed by normalized initialization and intermediate normalization layers), and the degradation problem, where accuracy saturates and then degrades as depth keeps increasing. To solve the degradation caused by increasing depth, the author proposes the residual learning structure: assuming the desired underlying mapping is H(x), let the stacked nonlinear layers fit F(x) := H(x) - x; optimizing the residual is easier than optimizing H(x) directly, and F(x) + x can easily be realized through "shortcut connections". The main contribution of this article is adding residual learning to the traditional convolutional model, finding approximately optimal identity mappings through residual optimization. The paper shows the full network structure; a tflearn implementation of the Deep Residual Network is described in detail in the original post.
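Here is a minimal sketch of one residual block, assuming TensorFlow 1.x; the paper's blocks also use batch normalization, and when the dimensions change the shortcut needs a projection, both of which are omitted here:

```python
import tensorflow as tf

def residual_block(x, filters):
    # The stacked nonlinear layers fit the residual F(x) := H(x) - x ...
    shortcut = x
    y = tf.layers.conv2d(x, filters, 3, padding='same',
                         activation=tf.nn.relu)
    y = tf.layers.conv2d(y, filters, 3, padding='same')
    # ... and the shortcut connection adds x back, giving H(x) = F(x) + x.
    # This identity shortcut assumes x already has `filters` channels.
    return tf.nn.relu(y + shortcut)
```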
Understanding Fast Neural Style
The previous articles described models commonly used in Computer Vision. Over the next period, the author will spend time studying applications of TensorFlow in Computer Vision, mainly by analyzing the related papers and source code. Today the topic is fast neural style. An earlier article analyzed neural style; that work is the origin of neural style, but it cannot be applied in practical work. Why? It needs the content image and style image specified every time, and then minimizes content loss and style loss to generate an image. That takes a lot of time, and there is no way to save a model for a given style, so generating each image amounts to training a model. With fast neural style, the trained model for a given style can be saved and then used to transform any content image. The article also mentions another application direction of image transformation: Super-Resolution, which uses deep learning to convert low-resolution images into high-resolution ones and is now used in many large Internet companies, especially video websites.

Paper Principle
A few months ago the author wrote about Neural Style (TensorFlow: In-depth Understanding of Neural Style). A Neural Algorithm of Artistic Style constructs a multi-layer convolutional network and generates an image combining content and style by minimizing the defined content loss and style loss, which is very interesting. Perceptual Losses for Real-Time Style Transfer and Super-Resolution uses perceptual loss instead of per-pixel loss, uses a pretrained VGG model to simplify the loss calculation, adds a transform network, and directly generates the styled version of a content image. How is that achieved? See the figure below: the whole system is composed of two parts, an image transformation network and a loss network. The image transformation network is a deep residual conv network used to directly transform the input (content) image into an image with style, while the loss network's parameters are fixed. The loss network here has the same structure as in A Neural Algorithm of Artistic Style, but its parameters are never updated; it is only used to compute content loss and style loss. This is the so-called perceptual loss. The author explains it this way: a convolution model pretrained for image classification has already learned perceptual and semantic information (scene and semantic information) well, so the whole loss network at the back serves only to compute content and style losses; unlike A Neural Algorithm of Artistic Style, nothing in that network is updated, and it is the parameters of the preceding transform network that are trained.
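Schematically, the training setup looks like the following sketch, assuming TensorFlow 1.x; both networks here are tiny stand-ins, since the real transform network is the deep residual conv network above and the real loss network is a pretrained VGG with several loss layers:

```python
import tensorflow as tf

def transform_net(x):
    # Stand-in for the image transformation network (the real one is a
    # deep residual conv network; see the paper for its architecture).
    with tf.variable_scope('transform', reuse=tf.AUTO_REUSE):
        y = tf.layers.conv2d(x, 32, 3, padding='same',
                             activation=tf.nn.relu)
        return tf.layers.conv2d(y, 3, 3, padding='same')

def loss_net_features(x):
    # Stand-in for the frozen, pretrained loss network: its variables
    # live outside the 'transform' scope and are marked non-trainable,
    # so they are never updated.
    with tf.variable_scope('loss_net', reuse=tf.AUTO_REUSE):
        return tf.layers.conv2d(x, 64, 3, padding='same', trainable=False)

content = tf.placeholder(tf.float32, [None, 256, 256, 3])
y_hat = transform_net(content)

# Perceptual loss is computed in the loss network's feature space; the
# real objective adds style losses via Gram matrices, as defined earlier.
loss = tf.reduce_mean(
    tf.square(loss_net_features(y_hat) - loss_net_features(content)))

# Only the transform network's variables are optimized.
train_op = tf.train.AdamOptimizer(1e-3).minimize(
    loss, var_list=tf.trainable_variables('transform'))
```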
So from the perspective of the whole network structure, the input image is transformed by the transform network to obtain the converted image, the corresponding losses are computed, and the whole system updates the preceding transform network by minimizing this loss. Simple, isn't it? The loss calculation is also very similar to before, with a content loss and a style loss, and a Gram matrix inside the style loss. The Gram Matrix is very important here: it guarantees that y^hat and y can be compared even when their feature maps differ in shape. For the specifics of Gram, the relevant part of the paper explains it better than the author can; readers will understand it at a glance. I believe that after reading this, you basically understand how fast neural style works. To summarize: a feed-forward transform network is trained once per style; a fixed, pretrained loss network supplies the perceptual content and style losses; and once trained, stylizing an image is a single forward pass, fast enough for practical use.
Note: The technical content of this article is published with the authorization of deep learning engineer Duan Shishi. For the sake of reading experience, the content has been slightly edited and integrated, and some hands-on content has been streamlined. If you want to learn more about deep learning practice, please go to Xiao Shishi's Code Crazy Camp.