With just two changes, Apple gave StyleGANv2 the ability to generate 3D images

How can an existing 2D GAN be made 3D-aware? This is an interesting and practical problem.

To answer it, researchers from Apple and the University of Illinois at Urbana-Champaign tried to modify the classic GAN, StyleGANv2, as little as possible. They found that only two modifications are strictly necessary: 1) a multiplane-image-style generator branch that produces a set of alpha maps conditioned on depth; and 2) a discriminator conditioned on camera pose.

Paper address: https://arxiv.org/abs/2207.10642

The study calls the generated output a "Generative Multiplane Image" (GMPI). The GMPI method not only renders at high quality but also guarantees view consistency. Moreover, the number of alpha maps can be adjusted dynamically and may even differ between training and inference, which alleviates memory pressure and allows GMPI to be trained at 1024^2 resolution in less than half a day.

First, let's look at the performance of the GMPI method on three challenging high-resolution datasets (FFHQ, AFHQv2, and MetFaces):
Method Introduction

The study modifies the classic StyleGANv2 generator by adding an "alpha branch" and pairs it with simple, efficient alpha-compositing rendering.
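To make the rendering step concrete, here is a minimal numpy sketch of back-to-front alpha compositing over an MPI. It assumes the GMPI setup in which all planes share a single RGB image and only the alpha maps vary per plane; the function name and array layout are illustrative, not from the paper's code.

```python
import numpy as np

def composite_mpi(color, alphas):
    """Back-to-front alpha compositing of a multiplane image (MPI).

    color:  (H, W, 3) single RGB image shared by all planes
            (as in GMPI, where only alphas differ per plane).
    alphas: (N, H, W) per-plane alpha maps, ordered near -> far.
    Returns the composited (H, W, 3) image.
    """
    out = np.zeros_like(color)
    # Apply the "over" operator from the farthest plane to the nearest.
    for a in alphas[::-1]:
        a = a[..., None]                  # broadcast to (H, W, 1)
        out = a * color + (1.0 - a) * out
    return out
```

Because the operation is just a sequence of multiplies and adds, the same loop is trivially differentiable when written in an autodiff framework, which is what enables end-to-end training of the alpha branch.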

The GMPI framework is shown in the figure below: the generator and the alpha-compositing renderer together produce an image I_v_tgt showing the generated object in a user-specified pose v_tgt, while ensuring that images generated for different poses are view-consistent.

The "alpha branch" uses the generator's intermediate representation to produce a multiplane image M, which contains, in addition to a single RGB image, alpha maps at different depths.

More specifically, the study developed a new generator branch for StyleGANv2 that produces a set of fronto-parallel alpha maps, similar in nature to multiplane images (MPIs). The study demonstrated for the first time that MPIs can serve as the scene representation for an unconditional 3D-aware generative model. The new alpha branch is trained from scratch while the regular StyleGANv2 generator and discriminator are fine-tuned. By combining the generated alpha maps with StyleGANv2's single standard image output in end-to-end differentiable MPI-style rendering, the study achieves 3D-aware generation across views while ensuring view consistency. Although alpha maps have limited ability to handle occlusion, rendering them is very efficient. In addition, the number of alpha maps can be adjusted dynamically and can even differ between training and inference, reducing the memory burden.

The study found that conditioning the discriminator on the camera pose is strictly necessary for 3D awareness, and that the alpha branch must likewise be conditioned on each alpha map's depth. Both require only a simple modification to the original StyleGANv2 network: the additional alpha branch shown in Figure 3 below.

To obtain alpha maps that exhibit the expected 3D structure, the study found that two adjustments to StyleGANv2 are required: (a) the alpha map prediction for each MPI plane must be conditioned on the plane's depth or a learnable token; (b) the discriminator must be conditioned on the camera pose. While these two adjustments seem intuitive, it is surprising that they suffice to induce a 3D-aware inductive bias.
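Adjustment (a) can be sketched as injecting a depth encoding into the style vector that modulates the alpha branch. The sinusoidal encoding, normalized depth range, and function name below are illustrative assumptions; the paper only specifies that the prediction is conditioned on the plane's depth or a learnable token.

```python
import numpy as np

def depth_conditioned_style(w, depth, W_embed):
    """Sketch: condition an alpha-branch style vector on plane depth.

    w:       (D,) style vector from the StyleGAN mapping network.
    depth:   scalar plane depth, assumed normalized to [0, 1].
    W_embed: (D, 2) learnable projection of a [sin, cos] depth encoding
             (a hypothetical choice; a learnable per-plane token would
             also satisfy the paper's condition (a)).
    """
    enc = np.array([np.sin(np.pi * depth), np.cos(np.pi * depth)])
    return w + W_embed @ enc
```

Running the alpha branch once per plane with a different conditioned style vector yields a distinct alpha map per depth, which is what gives the stack its 3D structure.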

Another inductive bias that improves the alpha maps is incorporating 3D renderings with shading. Although helpful, the study found that this inductive bias is not necessary for 3D awareness. The researchers also found that some classic 2D GAN evaluation metrics can produce misleading results.

Experiments

The study evaluated GMPI at various resolutions on three datasets (FFHQ, AFHQv2, and MetFaces).

Speed comparisons and quantitative evaluations are provided in Tables 1 and 2 below. Despite training faster, GMPI outperforms SOTA models on 256^2 images and can generate results at resolutions up to 1024^2, which most baseline models cannot.

To analyze the effect of the method's key design choices, the study conducted ablation experiments; the results are shown in Table 3 and Figures 4 and 5.

Interested readers can read the original paper for more research details.
