Making machines creative has long been one of the highest ideals of artificial intelligence, and generation tasks have become a standard way to measure machine creativity. These tasks include generating text (question answering, dialogue, poetry, novels) as well as generating pictures and videos. In this issue, we introduce the technology behind the AI painting that has recently flooded our screens.

1. The birth of the generative adversarial network (GAN)

Until a few years ago, AI performed very poorly on generative tasks. Models either strictly repeated templates and rules or could only produce nonsense. It was not until the generative adversarial network appeared that we could let artificial intelligence create while also controlling the quality of what it creates.

The generative adversarial network is abbreviated as GAN. It consists of a generator (the G in GAN) and an adversary (the A), which is a classifier usually called the discriminator. The generator keeps training itself to produce realistic pictures that fool the classifier, while the classifier tries to tell the generated pictures apart from real ones.

Ian Goodfellow, the author of GAN, used an imaginary policeman and a counterfeiter as an example in his original paper. The classifier is the policeman, and the generator is the counterfeit factory. At first, the policeman could only tell real banknotes from fake ones by their rough pattern, so a counterfeit factory could get away with printing only the main pattern of the banknote. To crack down on counterfeit banknotes, the police began to check the shading, microtext and watermarks on the notes. As a result, many counterfeit factories that could not imitate these features had to close down, while the remaining ones were clearly more skilled and could produce more realistic banknotes.
To keep telling these counterfeit banknotes apart, the police then noticed that the paper used for them differed from that of real banknotes and felt different to the touch, so the counterfeiters had to work out how to mix compounds in the right proportions to imitate the pulp of real banknotes and make notes that felt the same... In this way, the banknote detector grew stronger and stronger, and in doing so it also made the counterfeiters better at imitation.

2. “Learning” in confrontation

Specifically, the generator and the classifier in a GAN are two neural networks. The neural network is a basic technology in machine learning. It can be understood as a "machine": it takes in a piece of data to be processed and outputs the result we want. If we want to judge whether a picture shows a dog or a cat, the output is a number: 1 (representing a dog) or 0 (representing a cat). If we want to use artificial intelligence to translate, the input is text in one language and the output is text in another language.

Without training, a neural network can only output unreliable or random results. But if we can collect a lot of data whose correct outputs are known, we can train the neural network to give the correct answer. We call this process "learning".

The GAN's classifier works the same way: it learns to tell whether a picture is real or generated. The generator produces an image from an input number or vector. At the beginning, the generator has no goal and can only produce random data, such as blurry images or even pure noise. After a little training, the classifier can easily distinguish these poor pictures from real ones. At this point, the generator has to train itself to try to fool this simple classifier. Then the process repeats.
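The back-and-forth described above can be sketched with a deliberately tiny example. This is a minimal illustration under stated assumptions, not the setup of the original GAN paper: instead of images, the "real data" are single numbers drawn around a hypothetical mean of 4, the generator is just `a * z + b`, and the classifier is a single logistic unit. The names `train_toy_gan`, `a`, `b`, `w` and `c` are made up for this sketch.

```python
import math
import random


def sigmoid(t):
    """Logistic function, written so that math.exp never overflows."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(steps=2000, lr=0.01, real_mean=4.0, seed=0):
    """Alternate classifier and generator updates on 1-D toy data.

    Assumed toy setup: real samples ~ N(real_mean, 1), noise z ~ N(0, 1).
    Returns the generator parameters (a, b) and classifier parameters (w, c).
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator: fake = a * z + b
    w, c = 0.0, 0.0  # classifier: p(real) = sigmoid(w * x + c)

    for _ in range(steps):
        real = rng.gauss(real_mean, 1.0)
        z = rng.gauss(0.0, 1.0)
        fake = a * z + b

        # Classifier step: minimise -log D(real) - log(1 - D(fake)),
        # i.e. push D(real) toward 1 and D(fake) toward 0.
        p_real = sigmoid(w * real + c)
        p_fake = sigmoid(w * fake + c)
        grad_w = (p_real - 1.0) * real + p_fake * fake
        grad_c = (p_real - 1.0) + p_fake
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator step: minimise -log D(fake) against the updated
        # classifier, i.e. try to make the fake sample look real.
        p_fake = sigmoid(w * fake + c)
        grad_fake = (p_fake - 1.0) * w  # d(loss) / d(fake)
        a -= lr * grad_fake * z
        b -= lr * grad_fake

    return (a, b), (w, c)


if __name__ == "__main__":
    (a, b), (w, c) = train_toy_gan()
    print("generator:", a, b)
    print("classifier:", w, c)
```

In each step the generator moves its samples in whichever direction the classifier currently rates as more real, and the classifier then has to adjust again. Even this toy version is not guaranteed to settle down, which matches the training difficulties the article mentions.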
In each round, the classifier must first learn to distinguish the images produced by the improved generator, and the generator must then improve itself again to deceive the stronger classifier. After tens of thousands or even hundreds of thousands of iterations, the generator becomes powerful and produces more realistic images.

Although the principle is relatively simple, a GAN is very difficult to train. In the original GAN paper, the quality of the generated images was not very high. But a large number of research teams then made improvements in different directions, giving GAN many variants. Among them, StyleGAN is especially well known: it can generate extremely realistic faces. These faces are different from any existing face and are created entirely by the computer.

As a model for image generation, GAN still has many shortcomings. First, GAN training is very unstable, and sometimes the entire model collapses during training. Second, different scenarios require training different GAN models: if you want to generate cat pictures, you need to find a lot of cat pictures for training; if you want to generate pictures of human faces, you need to find a way to get a lot of pictures of human faces. But the range of possible requests is unlimited, and some scenes can be very complex, such as "I want to generate a cat chasing a dog", which is difficult to handle with a GAN. In other words, a GAN can only learn from image data collected for one specific kind of scene and cannot understand human language, so the generated image cannot be controlled through text.

These two problems are largely solved by OpenAI's DALL·E model. We will introduce it in detail in the next video.

The article is produced by Science Popularization China-Starry Sky Project (Creation and Cultivation). Please indicate the source when reprinting.

Author: Guan Xinyu, popular science author
Reviewer: Yu Yang, Head of Tencent Xuanwu Lab