Using convolutional autoencoders to reduce noise in images

Preface

I was busy at work this week and couldn't find the time to write the Attention translation I had planned; I will get to it later. This week, let's look at some simple, practical autoencoder code instead. I won't go into autoencoder theory in detail here, as plenty of material is available online. The simplest autoencoder just reproduces its input through an encoder and a decoder: we feed an image into the network, the encoder compresses it into a compact representation, and the decoder then decodes that representation to reproduce the original image.

An autoencoder is trained by minimizing the difference between its output and its input, that is, by making the output layer reproduce the original information as faithfully as possible. Because its basic form is so simple, many variants exist, including DAE, SDAE, and VAE; if you are interested, you can look them up online.

This article implements two demos. The first part builds a simple input-hidden-output autoencoder; the second part builds on it with a convolutional autoencoder for image denoising.

Tool Description

  • TensorFlow 1.0

  • Jupyter Notebook

  • Data: MNIST handwriting dataset

  • Complete code address: NELSONZHAO/zhihu

Part 1

First, we will implement the simplest AutoEncoder with the structure shown above.

Loading data

Here, we use the MNIST handwriting dataset for experiments. First, we need to import the data. TensorFlow has encapsulated this experimental dataset, so it is also very simple for us to use.

If you want to display the data in grayscale, use the code plt.imshow(img.reshape((28,28)), cmap='Greys_r').

We can load the dataset through input_data. If you already have the MNIST dataset (four compressed archives) locally, put them in the MNIST_data directory so that TensorFlow reads them directly instead of downloading them again. We can use imshow to view any image; since the loaded data has already been flattened into 784-dimensional vectors, it must be reshaped before display.

Build the model

After loading the data, we can start with the simplest model. Before that, let's get the size of the input data. Each loaded image is 28 x 28 pixels, which TensorFlow has already flattened into a 784-dimensional vector for us. We also need to specify the size of the hidden layer.

Here I set it to 64. The smaller hidden_units is, the more information is lost; you can try other sizes and compare the results.

AutoEncoder contains three layers: input, hidden and output.

In the hidden layer, we use ReLU as the activation function.

At this point, a simple AutoEncoder has been constructed. Next, we can start the TensorFlow graph for training.

Visualization of training results

After the above steps, we have constructed a simple AutoEncoder. Now we will visualize the results to see its performance.

Here, I selected 5 samples from the test dataset for visualization. Similarly, if you want to observe grayscale images, specify the cmap parameter as 'Greys_r'. The first row above is the original image in the test dataset, and the second row is the image after being reproduced by AutoEncoder. You can clearly see the loss of pixel information.

Similarly, we can also visualize the hidden layer compressed data, the results are as follows:

These five images are the hidden-layer compressed representations of the five test images above.

Through the simple example above, we understand the basic working principle of AutoEncoder. Next, we will further improve our model and convert the hidden layer into a convolutional layer to perform image denoising.

Some code is omitted in the above process. Please go to my GitHub to view the complete code.

Part 2

Based on the understanding of the working principle of AutoEncoder above, in this section we will add multiple convolutional layers to AutoEncoder to perform image denoising.

We still use the MNIST dataset for the experiment. The steps of data import are not described here. Please download the code to view it. Before we start, let's take a look at our entire model structure through a picture:

We input a noisy image into the model and give the model a noise-free image at the output, allowing the model to learn the denoising process through the convolutional autoencoder.

Input Layer

The input layer here differs from the one in the previous part because we are going to use convolution operations. The input should therefore be an image of shape height x width x depth. A typical RGB image has a depth of 3; the MNIST images here have a depth of only 1.

Encoder Convolutional Layer

On the Encoder side, we set up three convolutional layers, each followed by a pooling layer, to process the image.

In the first convolutional layer, we use 64 filters of size 3 x 3. The stride defaults to 1, and with padding set to 'same' the height and width are unchanged. So after the first convolutional layer, the data goes from the original 28 x 28 x 1 to 28 x 28 x 64.

Next, the convolution result is subjected to max pooling. Here, I set the size and stride to 2 x 2. The pooling operation does not change the depth of the convolution result, so the size after pooling is 14 x 14 x 64.

I will not go into detail about other convolutional layers. The activation function of all convolutional layers is ReLU.

After three layers of convolution and pooling operations, the conv3 we get is actually equivalent to the hidden layer of AutoEncoder in the previous part. The data of this layer has been compressed to a size of 4 x 4 x 32.

At this point, we have completed the convolution operation on the Encoder side, and the data dimension has changed from the initial 28 x 28 x 1 to 4 x 4 x 32.
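The shrinking spatial size can be checked with a little arithmetic: a 'same'-padded convolution preserves height and width, while each 2 x 2 pool with stride 2 halves them, rounding up (which is why 7 pools down to 4, not 3):

```python
import math

def pooled(size, stride=2):
    # 'same'-padded pooling output size: ceil(size / stride)
    return math.ceil(size / stride)

size = 28
for _ in range(3):          # conv keeps the size; each pooling halves it
    size = pooled(size)     # 28 -> 14 -> 7 -> 4
print(size)  # 4
```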

Decoder Convolutional Layer

Next, we move on to the convolutions on the Decoder side. Before that, some readers may ask: since the Encoder has already compressed the image down to 4 x 4 x 32, if we keep convolving in the Decoder, won't the data just keep getting smaller? That is why, on the Decoder side, we do not simply perform convolutions, but use a combination of Upsample (upsampling) plus convolution.

We know that a convolution operation scans the image patch by patch with a filter, takes a weighted sum of the pixels in each patch, and applies a nonlinearity. For example, if a patch in the original image is 3 x 3 (in plain terms, we take a 3 x 3 block of pixels from the picture) and we process it with a 3 x 3 filter, that patch becomes a single pixel after convolution. In Deconvolution (or transposed convolution) this process is reversed: a single pixel is expanded into a 3 x 3 block of pixels.

However, Deconvolution has a drawback: it causes checkerboard artifacts in the image, because the filters overlap heavily during the Deconvolution process. To solve this problem, it has been proposed to use Upsample plus convolutional layers instead.

There are two common ways of upsampling, one is nearest neighbor interpolation and the other is bilinear interpolation.
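Nearest neighbor interpolation is easy to see in plain numpy: every pixel simply becomes a factor x factor block of identical values (this helper is my own illustration, not the code used in the model):

```python
import numpy as np

def nearest_upsample(x, factor=2):
    # repeat rows, then columns: each pixel becomes a factor x factor block
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

a = np.array([[1, 2],
              [3, 4]])
print(nearest_upsample(a))
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```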

This article will also use the Upsample plus convolution method to perform processing on the Decoder side.

TensorFlow also provides the Upsample operation. We use resize_nearest_neighbor to resize the Encoder's convolution result and then convolve again. After three Upsample steps, we obtain data of size 28 x 28 x 64. Finally, we apply one more convolution to bring this result back to the size of the original image.

Define loss and optimizer

We use cross-entropy as the loss function, and the optimizer's learning rate is 0.001.

Constructing Noise Data

Through the above steps, we have constructed the entire convolutional autoencoder model. Since we want to use this model to reduce the noise of the image, we also need to construct our noise data based on the original data before training.

Let's look at how to add noise through a simple example. We take an image img (a 784-dimensional vector) and add a noise factor multiplied by random numbers, which perturbs the pixels. Then, since each MNIST pixel value has been processed into a number between 0 and 1, we use numpy.clip to clip the noisy image so that every pixel stays between 0 and 1.

The operation of np.random.randn(*img.shape) is equivalent to np.random.randn(img.shape[0], img.shape[1])
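The noise-adding step can be sketched as follows (the noise factor of 0.5 and the random stand-in image are my choices for illustration):

```python
import numpy as np

noise_factor = 0.5
img = np.random.rand(784)   # stand-in for one flattened MNIST image in [0, 1]

# add scaled Gaussian noise, then clip every pixel back into [0, 1]
noisy_img = img + noise_factor * np.random.randn(*img.shape)
noisy_img = np.clip(noisy_img, 0., 1.)
```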

Let's take a look at the image comparison before and after adding noise.

Training the model

After introducing model construction and noise processing, we can then train our model.

When training the model, the input becomes the noise-added data, while the target is the original noise-free data. The main point is to reshape the original data into the same shape as inputs_. Because of the depth of the convolutional model, training is a bit slow, so a GPU is recommended.
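The reshape in question turns a batch of flat 784-vectors into the 28 x 28 x 1 format that the convolutional input placeholder expects (a dummy batch of 2 images here):

```python
import numpy as np

batch = np.random.rand(2, 784)          # stand-in for mnist.train.next_batch(...)
imgs = batch.reshape((-1, 28, 28, 1))   # -1 lets numpy infer the batch dimension
print(imgs.shape)  # (2, 28, 28, 1)
```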

Remember to close the session with sess.close() when training is done.

Results Visualization

After the long training above, our model is finally trained. Next, let's take a look at the effect of the model through visualization.

It can be seen that through the convolutional autoencoder, our noise reduction effect is still very good. The final generated image looks very smooth and the noise is almost invisible.

Some of you may think that we can also use the basic input-hidden-output structure of AutoEncoder to achieve noise reduction. Therefore, I also implemented a model that uses the simplest input-hidden-output structure for noise reduction training (the code is on my GitHub). Let's take a look at its results:

It can be seen that, compared with the convolutional autoencoder, its noise reduction effect is worse, and some noise shadows can still be seen in the reconstructed images.

Conclusion

So far, we have completed the basic version of the AutoEncoder model and added convolutional layers to perform image denoising. I hope you now have a preliminary understanding of AutoEncoders.

The complete code is available on my GitHub (NELSONZHAO/zhihu), which contains six files:

  • BasicAE, the basic version of AutoEncoder (including jupyter notebook and html files)

  • EasyDAE, a basic version of the denoising AutoEncoder (including jupyter notebook and html files)

  • ConvDAE, Convolutional Denoising AutoEncoder (including jupyter notebook and html files)

If you think it’s good, you can give my GitHub a star!
