A 3D world can be generated with just a picture. Is the era of spatial intelligence coming?

Turning any photo into a 3D world you can wander through sounds like a science fiction writer's fantasy, but advances in AI have brought it within reach.

On December 3, Stanford University professor Fei-Fei Li announced that her team at World Labs had launched an AI system that can generate a 3D world from a single image. The team calls the system a "Large World Model" (LWM) and describes World Labs as a spatial intelligence AI company. The launch quickly drew attention from internet users around the world, with some declaring that a real-life version of Inception had arrived.

Screenshot of World Labs official website

In the publicly available live demo, users can explore worlds created by World Labs directly in the browser. Given a photo of a museum, for example, the AI imagines the entrance and exit, the adjacent exhibition hall, and the exhibits; given Van Gogh's famous painting "Café Terrace at Night", it lets you step into the canvas and explore the full surrounding neighborhood.

So what sets the “large world model” apart? What are its specific application scenarios? And what impact will it have on the development of AI?

One picture, one 3D world

“It’s still quite a surprise. Sora already had something of a simulated-world ‘flavor’, but the ‘large world model’ takes a different technical route, and the industry as a whole feels it has exceeded expectations,” said Ma Qianli, president of the China AIGC Industry Alliance Research Institute and co-founder of Unbounded AI.

The "large world model" can be thought of simply as an AI tool for building virtual worlds: the user uploads a picture, and the system automatically generates a 3D virtual world of corresponding extent, based on the environmental information in the picture.

Users can browse this 3D world directly on a web page using a mouse or keyboard. The generated world is also interactive: users can move the camera freely to explore it as if playing a game, and can adjust effects such as depth of field and zoom.

"Interactivity here means that commands are sent to the AI through keyboard input or mouse movement, and it renders and generates the corresponding scene in real time based on those commands. Until now, the 3D scenes people saw were all pre-built by humans," explained Zhu Linchao, a doctoral supervisor at Zhejiang University.

What is striking about this “large world model” is that it follows the basic rules of 3D geometry and physics and conveys a real sense of depth and space.

In Ma Qianli's view, following these rules means that the model, after being trained on a large amount of 3D data, has come to understand the image content thoroughly, which shows that AI has taken a further step toward understanding the real world.

Zhu Linchao cautioned, however, that when it comes to following the physical rules of the real world, the "large world model" is still a long way from practical application. "Although it claims to introduce some physical mechanisms, the technical details of how to produce a model that better follows the basic principles of physics have not yet been disclosed. Rendering errors also occur in some scenes, such as different objects merging unnaturally into a blob of color."

World Labs, for its part, has said that these results are just an "early preview" and that it is working to increase the scale and realism of the generated worlds and to explore new ways of interacting with them.

World generation models have a wide range of application scenarios

World Labs is not the first to attempt 3D generation. Companies such as Nvidia and Meta have also been actively developing technologies related to physical AI and 3D worlds, and competition in the market is fierce.

Many Chinese companies have joined in as well. Take Unbounded AI: its product “Magic Mirror” also uses AI for 3D generation. Users upload a photo in the browser, and “Magic Mirror” generates a 3D model of the person in the photo, which can then be produced as a physical figurine.

What most people care about is how the tools will evolve and how they will be put to use in real application scenarios.

World Labs said on its official blog that it plans to build tools useful to professionals such as artists, designers, developers, filmmakers and engineers, so that anyone can imagine and create their own worlds, extending the potential of generative AI from 2D images and videos to the 3D world.

"AI models such as the 'large world model' may in the future be able to fill out the digital spaces of the VR world," Ma Qianli said. He explained that building digital spaces for VR is very costly and the development cycle is long. Tools like this will reduce the modeling cost and make it possible to build virtual scenes quickly and on demand, bringing the metaverse closer to people.

Justin Johnson, a co-founder of World Labs and former doctoral student of Fei-Fei Li, said on social media that as the technology matures, we may no longer need screens of various sizes such as mobile phones and tablets. If virtual content can be seamlessly integrated with the physical world, he said, the need for all these screens will decrease.

The era of spatial intelligence has arrived

ChatGPT was launched two years ago, and generative AI has advanced rapidly since then, moving from processing two-dimensional images and text toward understanding the three-dimensional world. From cultural intelligence to spatial intelligence, generative AI is learning to understand the physical world of humans at remarkable speed.

The emergence of the "large world model" is also a practical demonstration of spatial intelligence.

Fei-Fei Li defines spatial intelligence as the ability of machines to perceive, reason and act in 3D space and time. In her view, spatial intelligence is the next frontier in the field of AI.

In a media interview in September this year, Fei-Fei Li said that spatial intelligence is her next North Star and that the technology will change the course of AI development. She believes that spatial intelligence is as important as language intelligence, and in some respects may even be more ancient and more fundamental. AI will not stop at processing flat images or text but will move toward understanding the three-dimensional world, which she sees as a natural extension of the development of intelligence.

So, what impact will the spatial intelligence AI pioneered by World Labs have on the future development of AI?

Zhu Linchao said that people rely mainly on visual information when processing information. A large model like this can draw more attention to visual models, including how to build better 3D environments and achieve physically consistent motion, and this may attract more people to the field.

"Investment in AI today is enormous, so direction matters a great deal. Once this kind of AI technology is validated, companies will dare to bet on this track, which will drive the industry forward," Ma Qianli said.

Today, generating a 3D world from a single image gives us a first glimpse of spatial intelligence. More large models of this kind may follow. As AI algorithms continue to improve and hardware is upgraded, spatial intelligence will push further past existing technological boundaries and may become an important driver of change in how people live.

To imagine boldly: if the time dimension is added and training succeeds, might AI really come to know the past and present and predict the future?
