[51CTO.com original article] 2016 is widely regarded as the first year of VR. At the time, users could only get the most basic experience: a headset shell with a phone slotted into it. Today, in emerging fields such as VR/AR, autonomous driving, and drones, traditional interaction methods can no longer meet user needs, while breakthroughs in deep learning, computer vision, and related fields have made new interaction methods possible. Experience so far suggests that the core goal of VR/AR technology is immersion, the distinctive experience that sets VR/AR apart from the mobile phone. Achieving immersion requires three technical pillars: interaction, display, and mobility. The following content focuses on interaction technology.
In AR/VR, interaction is no longer done with a mouse and keyboard; most current solutions rely on handheld controllers. Position tracking is available in some high-end VR devices, but it is costly and requires a tethered PC or host. In the future it may become possible to interact directly with bare hands, and there are already many gesture-interaction solution providers. uSens focuses mainly on AR/VR interaction: in the figure, the left side is HCI (the human-computer interaction interface), where uSens provides Gesture (gesture recognition), View Direction (head-rotation recognition) and Position Tracking; the right side is display technology. Although AR and VR differ in display technology, their interaction technologies are the same.

Gesture interaction technology

As shown in the figure below, gesture interaction can be divided into three types: symbolic, indirect, and direct.
The following figure shows how well common hardware supports these three types of interaction.
Usage scenarios of gesture interaction technology

Gesture interaction technology is widely used, and not only in VR, as shown in the following figure. The scenarios shown are Game, 3D UI, Drive, Public Display, Medical and Intelligent Home: the first three adopt gestures for a better experience, while the last three are driven by the usage setting; applications such as surgery and cooking choose gesture control for reasons of cleanliness and hygiene.

AI-based gesture interaction technology

uSens has been using deep learning to solve hand recognition since 2014. Although few teams were researching gesture recognition at the time, it can be tackled with deep learning in much the same way as image recognition and face recognition. uSens developed 26-degree-of-freedom (26-DOF) gesture recognition, which identifies the hand joints and their angles and positions from an image. The following figure shows the general idea of the 26-DOF algorithm. It builds on a 2014 deep-learning approach that mainly borrows the CNN method: the input image is convolved by CNNs at different scales to obtain a large feature vector, and a two-layer fully connected network then produces heat-maps of the joint points. A heat-map is the probability distribution of a joint over the image; in the black box in the figure, the blue dot is the distribution for the tip of the little finger, where the weight is high while other positions are suppressed. Given the heat-map of each joint, inverse dynamics, filtering and related methods can then track the hand with 26 degrees of freedom.

Motion tracking (spatial positioning technology)

The following figure shows the main application scenarios of motion tracking. As shown, motion tracking was first used in the military: missiles and aircraft need to locate themselves in the air. As the technology matured, spatial positioning spread to intelligent robots, driverless vehicles, robot vacuums and similar products, and VR/AR requires it as well. Companies and products currently developing and applying spatial positioning technology include Google Tango, Microsoft HoloLens, Qualcomm VR SDK, Apple ARKit, Snapchat, Facebook and uSens.

Spatial positioning technology / visual methods

Spatial positioning requires combining information from vision and from sensors. The following figure shows the visual feature model. The right side of the figure depicts camera imaging: observing a point in three-dimensional space with the camera produces an image point, and the 3D point and its image satisfy the projection equation shown in the figure. The left side lists three commonly used problems.
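To make the projection relation concrete, here is a minimal sketch of a standard pinhole camera model with intrinsics K and pose (R, t); this is the textbook formulation, not necessarily the exact parameterization used in uSens' system:

```python
import numpy as np

def project_point(X_world, K, R, t):
    """Project a 3D world point into the image with a pinhole camera.

    X_world : (3,) 3D point in world coordinates
    K       : (3, 3) camera intrinsic matrix
    R, t    : rotation (3, 3) and translation (3,) of the camera pose
    Returns the 2D pixel coordinates (u, v).
    """
    X_cam = R @ X_world + t    # world -> camera coordinates
    x = K @ X_cam              # camera -> homogeneous image coordinates
    return x[:2] / x[2]        # perspective division

# Example: a point 2 m in front of a camera with a 500 px focal length.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
print(project_point(np.array([0.1, 0.0, 2.0]), K, R, t))  # -> [345. 240.]
```

Tracking and mapping, discussed next, essentially invert this relation: they recover the camera pose (R, t) or the 3D points from many such 2D observations.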
Visual SLAM methods

A visual SLAM method consists of two modules: Tracking and Mapping. The Tracking module solves for the camera pose of each frame when the 3D point positions are known; the Mapping module updates the positions of the 3D points.

Visual features

The visual features used in spatial positioning can be divided into two types, as shown in the figure.
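As an illustration of what the Tracking module described above does (solving the camera pose of a frame when the 3D points are known), here is a minimal sketch using OpenCV's generic solvePnP solver on synthetic data; the data and setup are illustrative, not uSens' actual tracker:

```python
import numpy as np
import cv2

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

# Known 3D map points (in a real system these come from the Mapping module).
object_points = np.array([[0.0, 0.0, 4.0],
                          [1.0, 0.0, 4.0],
                          [0.0, 1.0, 5.0],
                          [1.0, 1.0, 5.0],
                          [-1.0, 0.5, 6.0],
                          [0.5, -1.0, 6.0]])

# Simulate the 2D detections of those points in the current frame by
# projecting them with a "true" camera pose (a small translation).
t_true = np.array([[0.1], [0.0], [0.0]])
proj = (K @ (object_points.T + t_true)).T
image_points = proj[:, :2] / proj[:, 2:]

# Tracking step: recover the camera pose of the current frame from the
# 3D-2D correspondences.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, np.zeros(5))
print(ok, rvec.ravel(), tvec.ravel())   # tvec ~ [0.1, 0, 0]
```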
Solution

According to the equation shown in the figure above, there are two ways to solve the problem: filtering-based algorithms and optimization-based algorithms. Neither is strictly better than the other; in actual systems they achieve similar results.

Spatial positioning technology / sensors

The visual methods above became practical thanks to advances in sensors; fusing sensor signals into the algorithms improves system performance. The following figure shows three kinds of gyroscope: the mechanical gyroscope, the laser gyroscope and the MEMS gyroscope. The first, on the left, is a mechanical gyroscope, which exploits the conservation of angular momentum: as long as the central rotor spins at high speed its axis keeps pointing in the same direction, so the rotation of the surrounding device can be read off. This old-fashioned mechanical gyroscope was already used on ships over a century ago. The laser gyroscope, shown in the small middle figure, is a high-precision gyroscope used, for example, in missiles in flight, with an error of only about 100 meters over several hours. A laser source in the middle emits light in two directions; if the device is stationary, the two optical paths have the same length and the phase difference at the receiver is zero, whereas if the device rotates, the two path lengths change very slightly and a phase difference appears. From this phase difference, the rotation rate of the whole device can be recovered. The MEMS gyroscope (micromechanical gyroscope) is the kind used in mobile phones and VR headsets. It is very small and uses mechanical structures to sense motion: inside it are two movable vanes that tend to stay put when the device rotates, and from the resulting deflection angle the rotation rate can be inferred. A MEMS gyroscope is far less accurate than a laser gyroscope and cannot achieve the desired effect on its own, so it must be fused with visual information.

IMU (Inertial Measurement Unit)

As shown below, an IMU contains a gyroscope and an accelerometer. The gyroscope outputs the rotation of the camera between adjacent moments; the accelerometer outputs its acceleration, that is, the rate of change of velocity.
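To make these two outputs concrete, here is a minimal sketch of textbook IMU dead reckoning that integrates gyroscope and accelerometer samples into orientation, velocity and position; the first-order update equations are the standard ones, not uSens' implementation, and the accelerometer reading still contains gravity, which leads directly to the problems discussed next:

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])   # gravity in the world frame (m/s^2)

def propagate_imu(R, v, p, gyro, accel, dt):
    """One discrete IMU integration step (first-order).

    R     : (3, 3) rotation from the body frame to the world frame
    v, p  : velocity and position in the world frame
    gyro  : measured angular velocity in the body frame (rad/s)
    accel : measured specific force in the body frame (m/s^2),
            which still contains the reaction to gravity
    dt    : time between two IMU samples (s)
    """
    # Integrate the angular velocity into a small incremental rotation.
    wx, wy, wz = gyro * dt
    skew = np.array([[0.0, -wz,  wy],
                     [ wz, 0.0, -wx],
                     [-wy,  wx, 0.0]])
    R = R @ (np.eye(3) + skew)

    # Rotate the measurement into the world frame and remove gravity;
    # any error in R leaks a large gravity component into a_world.
    a_world = R @ accel + GRAVITY

    p = p + v * dt + 0.5 * a_world * dt * dt
    v = v + a_world * dt
    return R, v, p

# Example: a stationary, level IMU measures only the reaction to gravity,
# so the propagated velocity and position stay (close to) zero.
R, v, p = np.eye(3), np.zeros(3), np.zeros(3)
for _ in range(500):                      # 1 s of data at 500 Hz
    R, v, p = propagate_imu(R, v, p,
                            gyro=np.zeros(3),
                            accel=np.array([0.0, 0.0, 9.81]),
                            dt=1.0 / 500.0)
print(v, p)   # ~ [0, 0, 0]
```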
Problems with spatial positioning technology / sensors

As shown in the figure below, the sensors bring problems of their own. First, sensor sampling is discrete and drifts: in the figure, the continuous line is the actual acceleration, but the IMU samples it discretely, so the peak is missed and errors accumulate in the result. Second, the acceleration measured by the IMU includes gravity; in practice the acceleration due to gravity is much larger than the acceleration due to normal motion, so removing it requires an accurate estimate of orientation. Third, the relative position and orientation of the IMU and the camera matter greatly: the two are mounted apart, there is a relative displacement between them, and manufacturing tolerances introduce a small angular misalignment. Research has found that even a one-degree error in this angle significantly degrades the accuracy of the whole system, so the angle and displacement between the two must be calibrated online. Fourth, the acquisition times of the IMU and the images are inconsistent: the camera runs at roughly 30 or 60 frames per second, while the IMU samples at 500, 800, or 1000 Hz, so both the sampling rates and the sampling instants differ.

The solution to these problems is sensor + visual fusion. The following figure shows pre-integration and camera calibration. When the IMU sampling rate is much higher than the image rate, pre-integration can be used to integrate the IMU measurements between two image frames and treat them as a single quantity, so there is no need to optimize over every IMU sample in every frame. The relationship between the camera, the IMU and the world coordinate system also requires precise calibration: a one-degree error is enough to make the whole system collapse.

Spatial positioning technology / ATW

ATW (Asynchronous Timewarp) is a technique for generating intermediate frames; it effectively reduces perceived latency by predicting the future pose of both eyes and rendering ahead of time. Camera image acquisition, the SLAM algorithm, and rendering all take processing time, collectively referred to as "motion-to-photon latency". In VR applications it needs to stay below 20 ms so that users do not feel dizzy. The following figure shows the rendering pipeline: the total delay includes image acquisition time, algorithm processing time, rendering time, and the time from the rendered result to the final display. Some of these can be measured from IMU data, but some must be predicted.
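The prediction step behind ATW can be sketched as follows: extrapolate the most recently tracked head orientation forward by the expected remaining latency using the current gyroscope reading. The constant-angular-velocity assumption and the numbers below are illustrative, not the exact model of any particular runtime:

```python
import numpy as np

def predict_orientation(q, omega, latency_s):
    """Extrapolate a head orientation quaternion into the future.

    q         : current orientation as a unit quaternion (w, x, y, z)
    omega     : angular velocity from the gyroscope (rad/s, body frame)
    latency_s : predicted remaining motion-to-photon latency in seconds
    """
    angle = np.linalg.norm(omega) * latency_s
    if angle < 1e-9:
        return q
    axis = omega / np.linalg.norm(omega)
    dq = np.concatenate(([np.cos(angle / 2.0)], np.sin(angle / 2.0) * axis))
    # Quaternion product q * dq: apply the predicted extra rotation.
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = dq
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

# Example: head turning at 90 deg/s with 15 ms of remaining latency ->
# render with an orientation about 1.35 degrees ahead of the tracked pose.
q_now = np.array([1.0, 0.0, 0.0, 0.0])          # identity orientation
omega = np.array([0.0, np.radians(90.0), 0.0])  # yaw rate
print(predict_orientation(q_now, omega, 0.015))
```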
uSens hardware

The following figure shows the evolution of uSens' hardware products, which have become smaller, easier to embed, and lower in power consumption. The next figure shows the hardware specifications: after several iterations, the hardware is much smaller in size and power consumption than before. The items marked in red are the factors with the greatest impact on the result, such as global exposure, resolution, the dual-camera system, and simultaneous sampling of the left and right cameras.

Problems facing VR/AR technology in the future

The following figure shows an intelligent visual system. It is currently used in VR/AR scenarios, but it can in fact be applied more widely, as the figure shows, for example in advertising displays, in-vehicle gesture recognition, robots and drones. The development of human-computer interaction technology is driven by two factors: users want more natural ways to interact, and technological advances make interaction more natural and convenient. In the future, human-computer interaction will center on two core capabilities: natural gesture understanding and environmental perception. VR/AR technology is still on its way and will face three major problems in the future.
For more details, watch the videos: Future Development of Human-Computer Interaction Technology (Part 1) and Future Development of Human-Computer Interaction Technology (Part 2). The above content is based on the speech given by Mr. Ma Gengyu at the WOTI Global Innovation Technology Summit forum.
Ma Gengyu graduated from the Department of Computer Science at Tsinghua University in 2004 and later joined the Samsung Advanced Institute of Technology. Over ten years there he led and took part in many computer vision, pattern recognition and human-computer interaction projects, including face recognition, 3D modeling and motion tracking technologies that have shipped in Samsung mobile phones. He has published more than 20 papers and patents and currently serves as Vice President of Technology R&D at Linggan Technology, where he leads the algorithm team developing gesture recognition and spatial positioning algorithms.