In March this year, the Shanghai Municipal Commission of Economy and Information Technology announced the first batch of projects to be supported under the city's 2018 artificial intelligence innovation and development program. A total of 19 innovative companies were shortlisted, including Liangfengtai, a representative domestic AR company. The program is run jointly by the Commission and the Municipal Finance Bureau, with planned support exceeding 100 million yuan.

This is not the first time AR companies have been classified under artificial intelligence, but the classification is still uncommon. AR and VR are usually mentioned together as twin brothers and are generally regarded as application-layer technologies or "smart wearable devices". Next to the "algorithm" label of artificial intelligence, they can seem less deep, less substantive, and less high-end. So what is the relationship between AR and artificial intelligence? Does AR belong to what we now call artificial intelligence?

Let's first briefly sort out the core technologies of AR.

AR (Augmented Reality) superimposes virtual information on the real world, that is, it "augments" reality. The augmentation can target vision, hearing, or even touch; the main purpose is to fuse the real world and the virtual world at the level of sensory perception. Perception of the real world is mainly visual, which requires a camera to capture information in the form of images and video. Through video analysis, the system perceives and understands the 3D environment: the 3D structure of the scene, what objects are in it, and where they are in space. The purpose of 3D interaction understanding is to tell the system what to "augment".

Figure. Typical AR process

There are several key points. The first is 3D environment understanding.
Understanding what is seen relies mainly on object/scene recognition and positioning technology. Recognition is mainly used to trigger the AR response, while positioning determines where to overlay the AR content. Positioning can be divided by accuracy into coarse and fine positioning. Coarse positioning gives only a general direction, such as a region or a trend; fine positioning may need to be accurate to a point, such as XYZ coordinates in a 3D coordinate system together with the object's angle. Depending on the application environment, both levels of positioning are needed in AR.

In the AR field, common detection and recognition tasks include face detection, pedestrian detection, vehicle detection, gesture recognition, biometrics, emotion recognition, and natural scene recognition.

After the real 3D world has been perceived and fused with virtual content, the fused information must be presented in some way. This calls for the second key technology in AR: display technology. Currently, most AR systems use see-through head-mounted displays, which come in two kinds: video see-through and optical see-through. Other representative approaches include light field displays (famous mainly because of Magic Leap) and holographic projection (a frequent guest in science fiction films).

The third key technology in AR is human-computer interaction, which lets people interact with the superimposed virtual information. Beyond touch and buttons, AR pursues natural interaction methods such as voice, gestures, posture, and faces, with voice and gestures used most frequently.

The technical connection between artificial intelligence and AR

From this picture, we can get a brief glimpse of the relationship among the three.
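The coarse-versus-fine distinction described above can be illustrated with a toy sketch (all function names, thresholds, and the screen-thirds scheme are hypothetical, chosen only for illustration): coarse positioning reports just a rough screen region, while fine positioning returns full XYZ coordinates plus an orientation angle.

```python
def coarse_position(x, y, width, height):
    """Coarse positioning: report only a rough screen region
    (a general direction such as 'upper left')."""
    horiz = 'left' if x < width / 3 else 'right' if x > 2 * width / 3 else 'center'
    vert = 'upper' if y < height / 3 else 'lower' if y > 2 * height / 3 else 'middle'
    return f'{vert} {horiz}'

def fine_position(x, y, z, angle_deg):
    """Fine positioning: exact XYZ coordinates in the 3D coordinate
    system plus the object's rotation angle."""
    return {'x': x, 'y': y, 'z': z, 'angle': angle_deg}
```

An AR marketing app may only need `coarse_position` to pick which corner of the screen to animate, whereas anchoring a 3D model onto a real object needs the full pose from `fine_position`.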
Deep learning is a technique for achieving machine learning, and machine learning is a way to make machines intelligent, that is, to achieve artificial intelligence. In other words, artificial intelligence is the ultimate goal, and machine learning is one technical direction toward that goal. Another important concept here is computer vision (CV), which studies how to make machines "see" as humans do; it is an important branch of artificial intelligence as the term is used today, not least because vision is one of the main ways humans obtain information. Computer vision already plays a role in the commercial market: face recognition; reading traffic signals and watching for pedestrians in autonomous driving; defect detection and process control with industrial robots; 3D reconstruction of environments; and so on.

These concepts are distinct yet overlap to a certain extent. The deep learning craze triggered by Hinton began to spread in 2006 and, to a large degree, drove the latest rise of AI. Over the past decade, major breakthroughs have been made in speech recognition, computer vision, natural language processing, and other fields, and the technology has spread into applications and is developing in full swing.

Among the core technologies of AR, 3D environment understanding and 3D interaction understanding are closely related to computer vision and deep learning. In academia, 3D environment understanding mainly corresponds to computer vision, where deep learning has been widely applied in recent years. On the interaction side, the more natural interaction methods on hardware terminals, such as gestures and voice, owe much to recent deep learning breakthroughs in those fields. One could say that deep learning's application in AR lies mainly in the key vision technologies.
At present, the most common form of AR is 2D image scanning and recognition, as seen in most AR marketing campaigns such as Tencent's QQ-AR torch relay and Alipay's Five Fortunes. Scanning the target image with a mobile phone produces the superimposed content. The main research and development directions, however, are 3D object recognition and 3D scene modeling.

Real objects exist in 3D, at different angles and spatial orientations, so a natural extension is to move from 2D image recognition to 3D object recognition: identifying an object's category and pose, where deep learning can be applied. Take fruit recognition as an example: identifying different categories of fruit and giving their locations combines object recognition and detection.

3D scene modeling expands from recognizing individual 3D objects to larger, more complex 3D regions: identifying what is in a scene, where things are in space, and how they relate. This is a core technology of AR and involves the currently popular SLAM (simultaneous localization and mapping). A scene is scanned, and then 3D virtual content, such as a virtual battlefield, is overlaid on it. With ordinary 2D image recognition, a specific picture is required, and recognition fails when the picture is out of view; with SLAM, even if a particular plane is not visible, spatial positioning remains accurate because the surrounding 3D environment helps anchor it.

Here I want to discuss the integration of deep learning and SLAM. Computer vision can be roughly divided into two schools. One is learning-based, e.g. feature extraction, feature analysis, and classification; deep learning now dominates this route. The other is based on geometric vision, which derives the spatial structure of objects from lines, edges, and 3D shapes.
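The "scan an image, trigger AR content" flow above can be sketched in miniature. This is not how production AR engines work (they use robust, scale- and rotation-invariant feature matching); it is only a minimal sum-of-absolute-differences template search over a grayscale grid, with all names and the threshold invented for illustration. Recognition succeeds only when some window matches the template closely enough, which is exactly why 2D recognition fails once the picture leaves the camera's view.

```python
def sad(image, template, top, left):
    """Sum of absolute differences between the template and the
    image patch whose top-left corner is (top, left)."""
    total = 0
    for r, row in enumerate(template):
        for c, val in enumerate(row):
            total += abs(image[top + r][left + c] - val)
    return total

def recognize_marker(image, template, max_sad=0):
    """Slide the template over every position in the image and return
    the (row, col) of the best match, or None if no window is similar
    enough to trigger the AR overlay."""
    th, tw = len(template), len(template[0])
    best_score, best_pos = None, None
    for top in range(len(image) - th + 1):
        for left in range(len(image[0]) - tw + 1):
            score = sad(image, template, top, left)
            if best_score is None or score < best_score:
                best_score, best_pos = score, (top, left)
    return best_pos if best_score is not None and best_score <= max_sad else None
```

In a real pipeline the returned position would seed the pose estimate used to place the virtual content; here it simply stands in for "recognition triggers the AR response, positioning says where to overlay it."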
The representative techniques there are SfM and SLAM. Deep learning essentially dominates the learning-based direction, but in geometric vision it has made little headway so far. From an academic perspective, deep learning research advances almost daily, while SLAM has progressed relatively little over the past decade. At the SLAM workshop held at the International Conference on Computer Vision (ICCV) in 2015, prompted by deep learning's rapid progress in other areas of vision, some experts raised the possibility of applying deep learning to SLAM, but no mature approach has emerged yet. In general, integrating deep learning with SLAM is a direction worth studying in the short term, and combining semantic and geometric information is a very valuable trend in the long run. SLAM+DL is therefore worth looking forward to.

On the interaction side, the main methods are speech recognition and gesture recognition. Speech recognition has made great progress, with domestic companies such as Baidu, iFlytek, and Unisound among the best. Gesture recognition is where AR companies hope to achieve a breakthrough toward mature commercialization. For example, a deep-learning-based gesture recognition system demonstrated by Liangfengtai defines six gestures: up, down, left, right, clockwise, and counterclockwise. It first detects and localizes the hand, then recognizes the corresponding gesture trajectory to identify the gesture. Other hot areas of artificial intelligence, such as face recognition, are also used in AR, but they are not major R&D directions for AR companies.
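The six-gesture scheme above (detect the hand, then classify its trajectory) can be caricatured with plain geometry. Liangfengtai's actual system is deep-learning based; this sketch is only a hypothetical rule-based stand-in for the trajectory-classification step, assuming the hand has already been detected and tracked into a list of (x, y) points in a y-up coordinate system, with made-up thresholds. Swipes are read from net displacement; rotations from the sign of the trajectory's shoelace area.

```python
def classify_gesture(points, move_thresh=1.0):
    """Classify a 2D hand trajectory into one of the six gestures:
    'up', 'down', 'left', 'right', 'clockwise', 'counterclockwise'.

    points: list of (x, y) hand positions in a y-up coordinate system.
    """
    dx = points[-1][0] - points[0][0]
    dy = points[-1][1] - points[0][1]
    # Large net displacement -> a directional swipe gesture.
    if max(abs(dx), abs(dy)) >= move_thresh:
        if abs(dx) >= abs(dy):
            return 'right' if dx > 0 else 'left'
        return 'up' if dy > 0 else 'down'
    # Otherwise treat the path as (roughly) closed and use its signed
    # shoelace area: negative area means the path winds clockwise
    # in y-up coordinates.
    area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        area += x1 * y2 - x2 * y1
    area /= 2.0
    return 'clockwise' if area < 0 else 'counterclockwise'
```

In a learned system the same split appears as two stages: a detector localizes the hand per frame, and a sequence model replaces these hand-written rules for the trajectory.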
It is not difficult to see from the above that the underlying, basic layer of AR is the integration of computer vision and related fields, and that the currently popular combination of deep learning and AR is where algorithm engineers are focusing their efforts. This is the basis for the saying that AR is an interdisciplinary field of computer vision and human-computer interaction, and that the foundation of AR is artificial intelligence and computer vision.

Figure: Computer vision and AR process association

The "Artificial Intelligence Impact Report" released by Toutiao last year also briefly surveyed the distribution of artificial intelligence scientists, covering companies and large R&D institutions in facial recognition, speech recognition, robotics, AR, chips, and other fields. The distribution of high-end R&D personnel likewise illustrates the subdivisions of the AI field.

So, is AR artificial intelligence? For AR practitioners, the ideal end state is to replace the smartphone with a smarter AR terminal. For users, then, what AR changes first is the content, followed by the terminal. Roughly dividing the AR industry chain yields technology providers, smart-terminal R&D companies, and AR content providers. AR device makers inevitably focus on hardware, such as underlying chips, batteries, and optical lenses, as well as performance optimization of the hardware itself, while content providers lean toward optimizing content and experience on top of existing technology. So we can say that AR technology providers, that is, AR companies that have achieved real results in underlying algorithm R&D, are artificial intelligence companies.
Companies, especially startups, turn underlying technology into mature products or services, whether drones, AR smart terminals, robots, or industry solutions, to achieve commercial ends; after the initial hype died down, this became what the media, enterprises, and the public expect and demand of AI companies. Recently, the Artificial Intelligence Industry Development Alliance (AIIA) announced that its book "Artificial Intelligence Wave: 100 Frontier AI Applications that Change Life" will be released to the public; it collects the cutting-edge commercialization achievements of today's giants and startups and directly reflects the main current directions of AI commercialization.

As a technology-driven field, AR, like most other areas of artificial intelligence, still has a long way to go before the technology fully matures. While the industrial chain gradually prospers and attention turns to commercialization, more companies and institutions are needed to keep pushing the boundaries of the technology and building core competitiveness, so that the industry can unlock greater value and potential. Only then can China hope to overtake others in the AI era.

Note: This article comes from AR company Liangfengtai