New AI image processing method! Tsinghua team proposes a method for generating “high-resolution” images

In the fast-moving field of AI image generation, a diffusion model that works seamlessly across arbitrary resolutions has emerged.

Recently, a research team from Tsinghua University and Zhipu AI jointly proposed a new cascaded model, Relay Diffusion (RDM). According to the team, this model lets the diffusion process continue seamlessly at any new resolution or model without having to start over from pure noise.

The research paper, titled "Relay Diffusion: Unifying diffusion process across resolutions for image synthesis", has been published on the preprint server arXiv, and the code has been released on GitHub.

In recent years, diffusion models have achieved great success in image synthesis and significantly improved the quality of generated images. However, they still face major challenges when synthesizing high-resolution images. First, a noise schedule designed for low resolutions is difficult to reuse directly at high resolutions. Researchers have to tune the noise schedule carefully for high-resolution settings, and good results remain hard to obtain. Second, training at high resolution requires a lot of resources and has a high computational cost.

Currently, one commonly used solution is to train in latent space and then map the result back to pixel space, as in latent diffusion (Stable Diffusion). However, this approach is inevitably affected by low-level artifacts. Another solution is to train a series of super-resolution diffusion models at different resolutions to form a cascade. Existing cascaded methods are effective, but each stage has to sample all the way from noise, which is inefficient, and the results depend heavily on training techniques such as conditioning augmentation.

To address these problems, the research team proposed the cascaded model Relay Diffusion. It keeps the advantages of the original cascaded approach and, with the help of blurring diffusion and block noise, connects different resolutions seamlessly, like runners in a relay race, which greatly reduces the cost of training and sampling.

According to the paper, a spectral analysis using the discrete cosine transform shows that, for the same noise intensity, the signal-to-noise ratio (SNR) in the low-frequency part of the spectrum is higher at a higher resolution. In other words, at high resolution the low-frequency information of natural images is not sufficiently destroyed by the noise.
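As a rough illustration of this observation, consider a simplified block-averaging proxy for the low-frequency content. This is an assumption made here for intuition, not the paper's DCT analysis. The symbols $s$ (block size), $\sigma$ (noise standard deviation) and $\epsilon_{ij}$ (per-pixel noise) are introduced only for this sketch.

```latex
\bar{\epsilon} = \frac{1}{s^{2}} \sum_{i=1}^{s} \sum_{j=1}^{s} \epsilon_{ij},
\qquad \epsilon_{ij} \sim \mathcal{N}(0, \sigma^{2}) \ \text{i.i.d.}
\quad \Longrightarrow \quad
\operatorname{Var}(\bar{\epsilon}) = \frac{\sigma^{2}}{s^{2}}
```

Under this proxy, the noise power that reaches each block average drops by a factor of $s^{2}$ (16 for $s = 4$), while the block average of the image itself is unchanged. The SNR of this coarse component is therefore about $s^{2}$ times higher than it would be for a low-resolution image corrupted with the same $\sigma$.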

To address this, the study proposes block noise, a noise whose pixels are correlated. In the low-frequency part of the spectrum, its SNR at high resolution is equivalent to the SNR of Gaussian noise at low resolution.
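One simple way to obtain pixel-correlated noise of this kind can be sketched as follows. This is a minimal illustration, assuming the noise is drawn at the coarse resolution and copied into each block; the paper's exact definition of block noise may differ. The helper names `block_noise` and `block_mean_std` are hypothetical.

```python
import numpy as np


def block_noise(height, width, s, rng):
    """Illustrative block noise: one Gaussian sample shared by each s x s block.

    Assumption for this sketch only; not necessarily the paper's definition.
    """
    coarse = rng.standard_normal((height // s, width // s))
    # Repeat every coarse sample s times along both axes.
    return np.repeat(np.repeat(coarse, s, axis=0), s, axis=1)


def block_mean_std(noise, s):
    """Standard deviation of the s x s block averages (a low-frequency proxy)."""
    h, w = noise.shape
    means = noise.reshape(h // s, s, w // s, s).mean(axis=(1, 3))
    return means.std()


rng = np.random.default_rng(0)
iid = rng.standard_normal((256, 256))   # independent per-pixel Gaussian noise
blk = block_noise(256, 256, 4, rng)     # correlated within each 4 x 4 block

iid_std = block_mean_std(iid, 4)  # close to 1/4: averaging cancels most noise
blk_std = block_mean_std(blk, 4)  # close to 1: the coarse component is fully noised
print(iid_std, blk_std)
```

In this sketch the block averages of independent noise have roughly a quarter of the per-pixel standard deviation, whereas the block noise keeps its full strength at the coarse scale, matching what unit Gaussian noise would do to a 64×64 image.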

Taking 64×64 and 256×256 as an example, the overall process of Relay Diffusion is as follows: first, a low-resolution image is generated through the standard diffusion process. It is then upsampled to a blurred high-resolution image in which every 4×4 grid shares the same pixel value. Finally, a blurring diffusion process is performed independently on each 4×4 grid.

This aligns the final state of the forward process with the upsampled blurred image, so the second stage of Relay Diffusion can start directly from the blurred image instead of from pure Gaussian noise, as existing cascaded methods do.
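The alignment described above can be sketched with plain array operations. This is a minimal illustration, assuming nearest-neighbor upsampling and assuming that complete blurring within a 4×4 grid amounts to replacing it by its average; the helper names `upsample_nearest` and `block_average` are hypothetical, and noise is left out entirely.

```python
import numpy as np


def upsample_nearest(img, s=4):
    """Upsample so that every s x s grid shares one pixel value."""
    return np.repeat(np.repeat(img, s, axis=0), s, axis=1)


def block_average(img, s=4):
    """Average each s x s grid (assumed end state of blurring inside the grid)."""
    h, w = img.shape
    return img.reshape(h // s, s, w // s, s).mean(axis=(1, 3))


rng = np.random.default_rng(0)
high_res = rng.random((256, 256))      # stand-in for a real 256 x 256 image

low_res = block_average(high_res)      # 64 x 64 counterpart of the image
start = upsample_nearest(low_res)      # blurred 256 x 256 starting point

# Blurring every 4 x 4 grid of the high-resolution image down to its average
# yields the same array as upsampling the 64 x 64 image.
fully_blurred = upsample_nearest(block_average(high_res))
print(start.shape, np.allclose(start, fully_blurred))
```

Under these assumptions, the fully blurred high-resolution image and the upsampled low-resolution image coincide, which is why the second stage can pick up from the upsampled output of the first.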

Experimental results show that, compared with traditional cascaded diffusion models, Relay Diffusion skips regenerating low-frequency information when producing high-resolution images, which greatly reduces computing costs. It is also simpler: it needs neither low-resolution images as conditioning nor conditioning augmentation techniques, and it does not require redesigning or tuning the noise schedule.

In addition, Relay Diffusion reaches better generation performance faster while saving costs. It achieves state-of-the-art FID on the unconditional dataset CelebA-HQ-256, and state-of-the-art sFID and competitive FID on the conditional dataset ImageNet-256, significantly exceeding models such as ADM, LDM, and DiT. Relay Diffusion also shows a clear performance advantage when classifier-free guidance (CFG) is not used.

The research team said that the cascaded model proposed in this study will help create more advanced text-to-image models.

In the future, they plan to apply the techniques behind Relay Diffusion to general-purpose text-to-image models, so as to promote further research in this field.

Paper link:

https://arxiv.org/abs/2309.03350

GitHub address:

https://github.com/THUDM/RelayDiffusion
