A 3D world can be generated with just a picture. Is the era of spatial intelligence coming?

Any photo can be turned into a 3D world that you can wander through. It sounds like a science fiction writer's fantasy, but advances in AI have brought it within reach.

On December 3, Stanford University professor Fei-Fei Li announced that her World Labs team had launched an AI system that can generate a 3D world from a single image. The team named the system the "Large World Model" (LWM) and describes itself as a spatial intelligence AI company. The launch immediately drew attention from internet users around the world, with some saying that a real-life version of Inception had arrived.

Screenshot of World Labs official website

In the live demo now available, users can explore the worlds created by World Labs directly in the browser. For example, given a photo of a museum scene, the AI imagines the entrances and exits, the adjacent exhibition halls, and the exhibits; given the world-famous painting "Café Terrace at Night", you can walk into the painting and experience the whole surrounding neighborhood.

So, what sets the "large world model" apart? What are its specific application scenarios? And what impact will it have on the development of AI?

From one picture to a 3D world

“It’s still quite a surprise. Sora itself previously had a bit of the ‘flavor’ of the simulated world, but the ‘large world model’ is another technical route, and the industry as a whole feels it exceeded expectations,” said Ma Qianli, president of the China AIGC Industry Alliance Research Institute and co-founder of Unbounded AI.

The "large world model" can be thought of simply as a tool that uses artificial intelligence to build virtual worlds: users only need to upload a picture, and the system automatically generates a 3D virtual world of corresponding scope based on the environmental information in the picture.

Users can browse this 3D world directly on the web page with a mouse or keyboard. The generated world is interactive: users can move the camera freely to explore it as if playing a game, and can adjust effects such as depth of field and zoom.

"Interactivity actually means inputting commands to the AI through keyboard input or mouse movement, and it will render and generate the corresponding scene in real time based on the commands. Before this, the 3D scenes that everyone saw were all pre-built by humans," explained Zhu Linchao, doctoral supervisor at Zhejiang University.

What is striking about this "large world model" is that it follows the basic rules of 3D geometry and physics and conveys a real sense of depth and space.

In Ma Qianli's view, complying with these rules means that, after being trained on a large amount of 3D data, the AI model has reached a full understanding of the image content, which shows that AI has taken a further step in understanding the real world.

However, Zhu Linchao also said that the "large world model" is still a long way from practical application in terms of following the physical rules of the real world. "Although it claims to introduce some physical mechanisms, the technical details of how to generate a model that better follows the basic principles of physics have not yet been disclosed. In some scenes, rendering errors also occur, such as different objects merging in an unnatural way and becoming a mass of color."

World Labs, for its part, has said that these are just "early previews" and that it is working to increase the scale and realism of the generated worlds and to explore new ways of interacting with them.

The world generation model has a wide range of application scenarios

World Labs is not the first to attempt 3D generation. Companies such as Nvidia and Meta have also been actively developing technologies related to physical AI and 3D worlds, and competition in the market is fierce.

In China, many companies have joined in as well. Take Unbounded AI as an example. Its product "Magic Mirror" also uses AI for 3D generation: users only need to upload a photo in the browser, and "Magic Mirror" generates a 3D model of the person in the photo, which can then be produced as a physical figurine.

How the tools will evolve and where they will actually be applied are the questions most people care about.

World Labs said in its official blog that they plan to build tools that are useful to professionals such as artists, designers, developers, filmmakers and engineers, allowing anyone to imagine and create their own world, expanding the potential of generative artificial intelligence from 2D images and videos to the 3D world.

"The emergence of AI models such as the 'large world model' may be able to fill the digital space in the VR world in the future." Ma Qianli explained that the construction cost of the digital space in VR is very high and the development cycle is relatively slow. The emergence of such tools will reduce the modeling cost of the digital space and be able to quickly build virtual world scenes according to needs, which means that the metaverse will be closer and closer to people.

Justin Johnson, a former doctoral student of Fei-Fei Li and a co-founder of World Labs, pointed out on social media that as this technology matures, we may no longer need screens of different sizes, such as mobile phones and tablets. If virtual content can be seamlessly integrated with the physical world, he said, the need for all these screens will decrease.

The era of spatial intelligence has arrived

ChatGPT was launched two years ago, and generative AI has advanced rapidly since then, moving from processing two-dimensional images and text to understanding the three-dimensional world. From cultural intelligence to spatial intelligence, generative AI is coming to understand the human physical world at remarkable speed.

The emergence of the "large world model" is also a practical demonstration of spatial intelligence.

Fei-Fei Li defines spatial intelligence as the ability of machines to perceive, reason and act in 3D space and time. In her opinion, spatial intelligence is the next frontier technology direction in the field of AI.

In a media interview in September this year, Fei-Fei Li said that spatial intelligence is her next North Star and that the technology will change the course of AI development. She believes that spatial intelligence is as important as language intelligence, and in some respects may be even more ancient and more fundamental. AI will not remain limited to processing flat images or text but will move toward understanding the three-dimensional world, which is a natural extension of the development of intelligence.

So, what impact will the spatial intelligence AI pioneered by World Labs have on the future development of AI?

Zhu Linchao said that people rely mainly on visual information when processing information. The arrival of a large model like this can draw more attention to visual models, including how to build better 3D environments and achieve physically consistent motion, and may attract more people to the field.

"The current investment in AI is too huge, and the direction is very important. Once such AI technology is verified, companies will dare to bet on the track, thereby promoting the development of the industry." Ma Qianli said.

Today, generating a 3D world from a single image gives us a first glimpse of spatial intelligence. More large models may follow. As AI algorithms continue to improve and hardware is upgraded, spatial intelligence will push further past existing technological boundaries and may become an important driver of change in how people live.

To imagine boldly: if the time dimension is added and training succeeds, perhaps AI really could know the past and the present and predict the future?
