From brushes to pixels: A brief introduction to the past and present of AI painting

Some things you need to know about AI painting.

Last week, the highly anticipated Midjourney V5 AI art generator was officially released, once again reshaping AI-driven art creation. It offers significantly improved image quality, more diverse outputs, a wider range of styles, support for seamless textures and wider aspect ratios, better prompt interpretation, an extended dynamic range, and more.

The following images were generated by Midjourney V4 and Midjourney V5 from the prompt "Elon Musk introduces Tesla, a commercial from the 90s".

Living up to expectations, Midjourney V5 delivers more photorealistic output, more expressive angles and scene compositions, and, at last, correct "hands". A joke once widely circulated in the AI painting community went: "Never ask a woman her age or why an AI model hides its hands."

That joke exists because AI art generators are notoriously bad at drawing hands. Although they can pick up visual patterns, they do not grasp the underlying anatomy. In other words, an AI art generator can work out that hands have fingers, but it struggles to learn that a hand should have exactly five of them, or that those fingers have consistent lengths and fixed proportions relative to one another.

Over the past year, this inability to render hands correctly has become a cultural trope. The hand problem is partly a matter of what these models can infer from the massive image datasets they are trained on: hands appear in countless poses, occlusions, and partial views, which makes their underlying structure hard to learn.

It is worth noting that Midjourney V5 does a much better job of generating realistic human hands. Most of the time the hands are correct, with five fingers per hand rather than seven to ten.

The release of Midjourney V5 sparked a surge of interest among users around the world. The flood of traffic briefly crashed Midjourney's servers, leaving many users unable to access the service. Meanwhile, OpenAI's DALL·E 2, Stability AI's Stable Diffusion, and other text-to-image models have also been hot topics of discussion in the industry.

Given any text input, these text-to-image models can generate reasonably accurate images matching the description, in any requested style: oil painting, CGI rendering, photograph, and so on. In many cases, the only limit is human imagination.

Previous life: A dream that began with DeepDream

In 2018, the first AI-generated portrait to be sold at a major auction, Edmond de Belamy, was created with a generative adversarial network (GAN) by the art collective Obvious as part of its "La Famille de Belamy" series, and eventually fetched $432,500 at a Christie's art auction.

In 2022, Jason Allen's AI-generated work "Théâtre D'opéra Spatial" took first place in the digital arts category of the Colorado State Fair's annual art competition.

In recent years, text-to-image models have appeared one after another, amid great anticipation. Once neural networks achieved solid results in image processing, researchers began developing visualization techniques to better understand how these networks see and classify the world, and that line of work paved the way for one generative model after another.

DeepDream generates images from the representations a neural network has learned. Given an input image, it effectively runs a trained convolutional neural network (CNN) in reverse, applying gradient ascent to the input so as to maximize the activation of an entire layer. The figure below (left) shows an original input image and its DeepDream output.
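
A minimal sketch of that gradient-ascent loop, assuming a torchvision classifier as a stand-in for the Inception network the original DeepDream used; the layer index, step count, learning rate, and file name are all illustrative, and input normalization is omitted for brevity:

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Pretrained ImageNet classifier; VGG16 is a stand-in for this sketch.
cnn = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()

def deep_dream(image, layer_index=20, steps=30, lr=0.05):
    x = image.clone().requires_grad_(True)
    for _ in range(steps):
        act = x
        for i, layer in enumerate(cnn):
            act = layer(act)
            if i == layer_index:
                break
        # Gradient ASCENT: push the input toward a higher layer activation.
        act.norm().backward()
        with torch.no_grad():
            x += lr * x.grad / (x.grad.abs().mean() + 1e-8)
            x.grad.zero_()
    return x.detach()

preprocess = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])
img = preprocess(Image.open("input.jpg")).unsqueeze(0)  # hypothetical file
dream = deep_dream(img)
```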

Surprisingly, the output images contain many animal faces and eyes. This is because the network behind DeepDream was trained on the ImageNet database, which is full of examples of different dog breeds and bird species. To some people, DeepDream's images resemble dreamlike psychedelic experiences. Even so, DeepDream accelerated the use of AI as a tool for artistic image creation.

Neural Style Transfer is a deep learning technique that combines the content of one image with the style of another, as in the image above (right), where Van Gogh's "Starry Night" is applied to the target image. It achieves this by redefining the loss function over CNN features: high-level activations preserve the content of the target image, while activations across multiple layers capture the style of the reference image. The output therefore retains the content of one input and the style of the other.
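
A condensed sketch of the two losses, assuming feature maps have already been extracted from a CNN such as VGG; the function and variable names here are illustrative:

```python
import torch

def gram_matrix(features):
    # features: (channels, height, width) -> channel-correlation matrix
    c, h, w = features.size()
    f = features.view(c, h * w)
    return f @ f.t() / (c * h * w)

def content_loss(content_feat, generated_feat):
    # High-level activations of the generated image should match those
    # of the content (target) image.
    return torch.mean((generated_feat - content_feat) ** 2)

def style_loss(style_feats, generated_feats):
    # Gram matrices over several layers capture the style image's texture.
    return sum(torch.mean((gram_matrix(g) - gram_matrix(s)) ** 2)
               for s, g in zip(style_feats, generated_feats))

# The total loss, content + style_weight * style, is then minimized by
# gradient descent on the pixels of the generated image itself.
```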

In 2017, Wei Ren Tan et al. proposed ArtGAN. Although its output images do not look like painters' works at all, the model still captures low-level features of artworks, and it inspired more researchers to use GANs for artistic image generation.

Soon after, Ahmed Elgammal et al. proposed the Creative Adversarial Network (CAN), which trains a GAN to generate images that the discriminator judges to be art but that do not conform to any existing artistic style. CAN's outputs mostly resemble abstract paintings, which gives them a distinctive feel.

In 2017, Phillip Isola et al. created pix2pix, a conditional GAN that takes an input image and generates a transformed version of it. For example, converting an RGB photo to a black-and-white image is trivial, but colorizing a black-and-white image by hand is time-consuming; pix2pix can automate the process, and it can be applied to any dataset of image pairs without changing the training procedure or loss function.
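
A minimal sketch of the pix2pix training objective: a conditional GAN loss plus an L1 term that keeps the output close to the paired target. `G` and `D` stand in for any image-to-image generator and patch discriminator:

```python
import torch
import torch.nn.functional as F

def pix2pix_losses(G, D, x, y, l1_weight=100.0):
    """x: input image (e.g. black-and-white); y: paired target (e.g. color)."""
    fake = G(x)
    # The discriminator always sees the input condition alongside the output.
    d_real = D(torch.cat([x, y], dim=1))
    d_fake = D(torch.cat([x, fake.detach()], dim=1))
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    # The generator tries to fool D while staying close to the target pixel-wise.
    g_fake = D(torch.cat([x, fake], dim=1))
    g_loss = (F.binary_cross_entropy_with_logits(g_fake, torch.ones_like(g_fake))
              + l1_weight * F.l1_loss(fake, y))
    return d_loss, g_loss
```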

pix2pix was a major breakthrough in generative AI, but it requires corresponding image pairs for training, which is not feasible for every application. For example, since there is no matching photo for every painting Monet created, pix2pix cannot learn to turn an input photo into a Monet painting.

To address this, Jun-Yan Zhu, Taesung Park, and others proposed CycleGAN, which extends pix2pix by coupling two conditional GANs with a "cycle" between them. The model can translate images into another domain without ever seeing paired examples in the training set.
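
The key idea is the cycle-consistency loss: translating an image to the other domain and back should reproduce the original. A sketch, with `G_ab` and `G_ba` as the two hypothetical generators:

```python
import torch.nn.functional as F

def cycle_consistency_loss(G_ab, G_ba, real_a, real_b, weight=10.0):
    # A -> B -> A (and B -> A -> B) should return the original image,
    # which is what lets CycleGAN train without paired examples.
    recon_a = G_ba(G_ab(real_a))
    recon_b = G_ab(G_ba(real_b))
    return weight * (F.l1_loss(recon_a, real_a) + F.l1_loss(recon_b, real_b))
```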

This life: The battle between Transformer and Diffusion

A major turning point came in 2021, when the first of the modern text-to-image models arrived. OpenAI released DALL·E, named after Pixar's WALL·E and the surrealist painter Salvador Dalí. DALL·E combines a discrete variational autoencoder (dVAE), which learns to map images to a low-dimensional sequence of tokens, with a Transformer that autoregressively models text and image tokens together. Given a text input, DALL·E predicts the image tokens at inference time and decodes them into an image.
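
A schematic sketch of that core idea: text tokens and dVAE image tokens form one sequence, and a causally masked Transformer is trained to predict the next token. Everything below is an illustrative stub (including the vocabulary size), not OpenAI's implementation:

```python
import torch
import torch.nn as nn

class TextToImageLM(nn.Module):
    def __init__(self, vocab_size=16384, d_model=512, n_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # tokens: text tokens followed by dVAE image tokens, one sequence.
        n = tokens.size(1)
        causal = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
        h = self.transformer(self.embed(tokens), mask=causal)
        return self.head(h)  # next-token logits over the shared vocabulary

# At inference, image tokens are sampled one by one after the text tokens,
# then decoded back into pixels by the dVAE decoder (not shown).
```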

DALL·E can also combine concepts that it learned separately but never saw together in a single image. For example, the training set contains illustrations of robots and of dragons, but no dragon-shaped robots; prompted with "robot dragon", the model can still generate a matching image.

However, while DALL·E generates comics and stylized artwork well, it cannot reliably produce realistic photos. OpenAI therefore invested heavily in an improved text-to-image model: DALL·E 2.

DALL·E 2 uses the text encoder from CLIP (Contrastive Language-Image Pre-training), a model trained on a huge dataset of image-text pairs. It exploits the learned relationship between text descriptions and images to give the diffusion model an embedding that reflects the text input and is better suited to image generation. Compared with DALL·E, DALL·E 2 improves image quality and also lets users extend the background of existing or computer-generated images; characters from famous works can, for example, be placed against a custom backdrop.
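
A brief example of the image-text matching CLIP provides, using the public "openai/clip-vit-base-patch32" checkpoint via the Hugging Face transformers library; the file name and captions are placeholders:

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # placeholder file
texts = ["a painting of a dragon", "a photo of a cat"]
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
# Higher scores mean the caption matches the image better; DALL·E 2
# conditions its diffusion decoder on embeddings from this joint space.
print(outputs.logits_per_image.softmax(dim=-1))
```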

Soon after, Google released its own text-to-image model, Imagen, which feeds embeddings from the pre-trained T5-XXL language encoder into a diffusion model. As a result, Imagen can more accurately generate images that contain text, a problem OpenAI's models struggled with.

The biggest revolution in text-to-image generation, however, may be the fully open-source Stable Diffusion released by Stability AI. Stable Diffusion is far more computationally efficient than earlier text-to-image models: where its predecessors required hundreds of GPU-days, it needs much less compute, putting it within reach of people with limited resources. It also lets users modify existing images through image-to-image translation (such as turning a sketch into digital art) or inpainting (removing something from, or adding something to, an existing image).
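
Because the model is open source, generating an image takes only a few lines with the `diffusers` library. A minimal sketch, assuming a CUDA GPU; the model ID, prompt, and output file name are illustrative:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# One prompt in, one image out; styles can be requested directly in text.
image = pipe("an oil painting of a lighthouse at dawn").images[0]
image.save("lighthouse.png")
```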

Deep learning and its image-processing applications are now at a completely different stage than they were a few years ago. A decade ago, deep neural networks broke ground simply by classifying natural images; today, landmark models built on Transformers or diffusion can generate highly realistic, complex images from simple text prompts, making text-to-image generation shine as a new brush in the art world.

"Threat" or "Symbiosis", where will human painters go?

AI art has been controversial since its inception. Copyright disputes, misinformation, and algorithmic bias have put text-to-image applications in the eye of the storm again and again. In January this year, for example, three artists filed a lawsuit against Stability AI and Midjourney, the makers of Stable Diffusion and Midjourney, and against DeviantArt, the artist portfolio platform behind DreamUp. They claimed these organizations violated the rights of "millions of artists" by training AI models on 5 billion images scraped from the Internet "without the consent of the original artists".

Many artists fear being replaced and losing their livelihoods as AI models imitate their unique styles. Last December, hundreds of artists flooded ArtStation, one of the largest art communities on the Internet, with images saying "no to AI-generated images". Some artists pessimistically believe that "we are watching the death of art unfold". The copyright status of the images used as training data remains contested.

Of course, some artists actively embrace AI, using text-to-image models as painting assistants that take over repetitive, tedious work. Others treat AI as an "engine" of imagination: in the interplay among users of software and communities like Midjourney, people spark off one another, producing new and intriguing aesthetics that then spill over into the real world. As Midjourney describes it: "AI is not a replica of the real world, but an extension of human imagination."

Meanwhile, regulators are catching up with AI art. The U.S. Copyright Office recently stated in a letter that images in a graphic novel created with the AI system Midjourney should not receive copyright protection, one of the first decisions by a U.S. court or agency on the scope of copyright for AI-created works. Separately, scholars have proposed Glaze, a system that lets artists apply carefully computed perturbations to their art to protect it from style mimicry by text-to-image diffusion models.

A wave of text-to-image applications now lets artists, and members of the public with no programming knowledge, use these powerful models to generate visually stunning images. Whether in painting or other fields, "giving AI the power to create" can help artists express their creativity and may shape the future of art.

The role of AI in the arts will depend on how it is used and on the goals and values of those who use it; it is important to remember that these models should be used ethically and responsibly.

Reference Links:

https://arxiv.org/abs/2302.10913

https://arxiv.org/abs/2302.04222

https://tech.cornell.edu/news/ai-vs-artist-the-future-of-creativity/

https://www.taipeitimes.com/News/biz/archives/2023/02/24/2003794928

https://www.buzzfeednews.com/article/pranavdixit/ai-art-generators-lawsuit-stable-diffusion-midjourney

https://www.theverge.com/2023/1/16/23557098/generative-ai-art-copyright-legal-lawsuit-stable-diffusion-midjourney-deviantart

https://arstechnica.com/information-technology/2023/03/ai-imager-midjourney-v5-stuns-with-photorealistic-images-and-5-fingered-hands/
