Principle

If you want to learn more about OpenGL ES, please see the OpenGL ES related articles directory. The code used in this article is in the ARKit branch of https://github.com/SquarePants1991/OpenGLESLearn.git.

iOS 11 introduces a new framework, ARKit, which makes it easy to create AR apps with ARKit and SceneKit. Apple also provides a basic AR application template, so you can start developing your AR app directly from it. This series of articles, however, uses OpenGL ES to do the rendering for ARKit. First, let's go over the theory behind ARKit.

AR Basic Concepts

At its most basic, AR is the technique of combining virtual, computer-generated graphics with the real environment, and there are many ways to implement it.
World Tracking

Tracking real-world feature points, computing the real camera's position, and applying that position to the virtual camera in the 3D scene is the most important part of an AR implementation; the accuracy of this computation directly affects the rendered result. ARKit uses ARSession to manage the whole AR processing pipeline, including computing the camera position.
Using ARSession is simple: initialize it and set its delegate. To start the session, you pass in a configuration, ARWorldTrackingSessionConfiguration, which tells the AR system to track feature points in the real world and compute the camera position. In the future Apple may also release configurations such as ARMarkerTrackingSessionConfiguration for tracking markers. Once the session is running, the camera starts and the device's position is sensed through its sensors. As a diagram from WWDC illustrates, ARSession combines the video stream captured by the camera with the position information to produce a continuous series of ARFrames.
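A minimal sketch of that setup is shown below. It uses the iOS 11 class names from the article (ARWorldTrackingSessionConfiguration was later renamed ARWorldTrackingConfiguration), and the arSession property name is an assumption; the real code lives in ARGLBaseViewController in the repo.

```objectivec
// Inside ARGLBaseViewController, which conforms to ARSessionDelegate and
// declares an ARSession property named arSession. Requires <ARKit/ARKit.h>.

// Create the session, set the delegate, and start world tracking.
- (void)setupARSession {
    self.arSession = [ARSession new];
    self.arSession.delegate = self;

    // Track real-world feature points and compute the camera pose.
    ARWorldTrackingSessionConfiguration *configuration = [ARWorldTrackingSessionConfiguration new];
    [self.arSession runWithConfiguration:configuration];
}

// ARSessionDelegate: ARKit delivers the continuous stream of ARFrames here.
- (void)session:(ARSession *)session didUpdateFrame:(ARFrame *)frame {
    // frame.capturedImage holds the camera image and frame.camera the camera pose;
    // both are used for rendering (see the Implementation section below).
}
```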
Each ARFrame contains the image captured by the camera, the camera's pose, and other information. In the session:didUpdateFrame: delegate method we need to draw the captured camera image and then render the 3D objects based on the camera pose and the rest of that information.

Plane Detection

ARKit provides another cool feature: detecting real-world planes and supplying an ARPlaneAnchor object that describes each plane's position, size, orientation, and other properties.
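The sketch below shows how plane detection can be enabled and how the resulting anchors are delivered; it uses the ARKit API as of iOS 11, and the exact code in the repo may differ.

```objectivec
// Enable horizontal plane detection when starting the session.
- (void)startPlaneDetection {
    ARWorldTrackingSessionConfiguration *config = [ARWorldTrackingSessionConfiguration new];
    config.planeDetection = ARPlaneDetectionHorizontal;
    [self.arSession runWithConfiguration:config];
}

// ARSessionDelegate: called when ARKit adds new anchors, including detected planes.
- (void)session:(ARSession *)session didAddAnchors:(NSArray<ARAnchor *> *)anchors {
    for (ARAnchor *anchor in anchors) {
        if ([anchor isKindOfClass:[ARPlaneAnchor class]]) {
            ARPlaneAnchor *plane = (ARPlaneAnchor *)anchor;
            // plane.center and plane.extent describe the plane in the anchor's
            // local space; plane.transform places it in the world.
            NSLog(@"Detected plane with extent %.2f x %.2f", plane.extent.x, plane.extent.z);
        }
    }
}
```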
The line config.planeDetection = ARPlaneDetectionHorizontal; above sets the type of plane to detect to horizontal, which is currently the only option available. When ARKit detects a plane, it reports it through the delegate method - (void)session:(ARSession *)session didAddAnchors:(NSArray<ARAnchor *> *)anchors, delivering an ARPlaneAnchor for each detected plane.

Hit Test

Hit testing makes it easy to place objects on detected planes. When you tap the screen, a hit test can tell you which planes lie under the tapped location and provides an ARAnchor you can use to decide where to place the object.
Hit testing is done with ARFrame's hitTest:types: method. The first parameter is a point in normalized image coordinates, ranging from (0, 0) to (1, 1), and the second parameter specifies which kinds of targets can be detected. The available result types are listed below.
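As of iOS 11 there are four ARHitTestResultType options: ARHitTestResultTypeFeaturePoint, ARHitTestResultTypeEstimatedHorizontalPlane, ARHitTestResultTypeExistingPlane, and ARHitTestResultTypeExistingPlaneUsingExtent. The tap handler below is purely illustrative (it is not taken from the repo) and shows how a hit test might be issued:

```objectivec
// Illustrative only: hit-test the center of the captured image against
// detected planes and estimated horizontal planes.
- (void)handleTap:(UITapGestureRecognizer *)recognizer {
    ARFrame *frame = self.arSession.currentFrame;
    if (!frame) {
        return;
    }

    // A real app would map the tap location into the captured image's
    // normalized coordinate space; (0.5, 0.5) is simply the image center.
    CGPoint normalizedPoint = CGPointMake(0.5, 0.5);

    NSArray<ARHitTestResult *> *results =
        [frame hitTest:normalizedPoint
                 types:ARHitTestResultTypeExistingPlaneUsingExtent |
                       ARHitTestResultTypeEstimatedHorizontalPlane];

    ARHitTestResult *result = results.firstObject;
    if (result) {
        // result.worldTransform says where to place the object;
        // result.anchor is non-nil only for the ExistingPlane result types.
        NSLog(@"Hit at distance %.2f", result.distance);
    }
}
```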
If the hit test succeeds, an NSArray<ARHitTestResult *> * is returned. Each ARHitTestResult contains the detection type, the distance to the intersection point, and the ARAnchor of the plane. Note that the ARAnchor is only present for results of type ARHitTestResultTypeExistingPlane and ARHitTestResultTypeExistingPlaneUsingExtent. The four detection types can be combined with the bitwise OR operator, for example ARHitTestResultTypeEstimatedHorizontalPlane | ARHitTestResultTypeExistingPlane.

Light Intensity Adjustment

ARKit can also estimate the light intensity of the environment, mainly so that the lighting of the 3D models stays consistent with the lighting of the real scene. ARFrame has a lightEstimate property; it is non-nil when the estimate succeeds. Its type is ARLightEstimate, which contains a single value, ambientIntensity. In a 3D lighting model it corresponds to ambient light, and its value ranges from 0 to 2000, where a value around 1000 corresponds to a neutrally lit environment. When rendering with OpenGL, you can use this value to scale the ambient term of the lighting model, for example multiplying the ambient color by ambientIntensity / 1000.

That is about all the theory behind ARKit; next we look at how to use OpenGL ES to render the contents of an ARFrame.

Implementation

The code used in this article is in the ARKit branch of https://github.com/SquarePants1991/OpenGLESLearn.git. The OpenGL base code comes from the OpenGL ES series of articles and already provides basics such as rendering geometry and textures, so those details are not repeated here. The key code for integrating ARKit is in ARGLBaseViewController. Let's take a look at it.

Processing ARFrame
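Below is a hedged sketch of that processing, using a CVOpenGLESTextureCache to wrap the two planes of the captured pixel buffer as OpenGL ES textures. The textureCache property and the overall structure are assumptions; the actual code is in ARGLBaseViewController in the repo.

```objectivec
// Sketch: assumes self.textureCache was created beforehand with
// CVOpenGLESTextureCacheCreate() from the EAGL context.
- (void)session:(ARSession *)session didUpdateFrame:(ARFrame *)frame {
    CVPixelBufferRef pixelBuffer = frame.capturedImage;

    // Plane 0 is luminance (Y), plane 1 is interleaved chrominance (UV).
    size_t imageWidth  = CVPixelBufferGetWidthOfPlane(pixelBuffer, 0);
    size_t imageHeight = CVPixelBufferGetHeightOfPlane(pixelBuffer, 0);
    size_t uvWidth     = CVPixelBufferGetWidthOfPlane(pixelBuffer, 1);
    size_t uvHeight    = CVPixelBufferGetHeightOfPlane(pixelBuffer, 1);

    CVOpenGLESTextureRef yTexture = NULL;
    CVOpenGLESTextureRef uvTexture = NULL;

    // With an ES3 context, GL_RED_EXT / GL_RG_EXT can be used instead of
    // GL_LUMINANCE / GL_LUMINANCE_ALPHA.
    CVOpenGLESTextureCacheCreateTextureFromImage(kCFAllocatorDefault, self.textureCache,
        pixelBuffer, NULL, GL_TEXTURE_2D, GL_LUMINANCE,
        (GLsizei)imageWidth, (GLsizei)imageHeight,
        GL_LUMINANCE, GL_UNSIGNED_BYTE, 0, &yTexture);
    CVOpenGLESTextureCacheCreateTextureFromImage(kCFAllocatorDefault, self.textureCache,
        pixelBuffer, NULL, GL_TEXTURE_2D, GL_LUMINANCE_ALPHA,
        (GLsizei)uvWidth, (GLsizei)uvHeight,
        GL_LUMINANCE_ALPHA, GL_UNSIGNED_BYTE, 1, &uvTexture);

    // These two texture names are handed to the Y and UV samplers of the
    // video plane's shader.
    GLuint yTextureName  = CVOpenGLESTextureGetName(yTexture);
    GLuint uvTextureName = CVOpenGLESTextureGetName(uvTexture);

    // Keep the viewport in sync with the captured image (note the swapped
    // width/height, discussed below), and derive self.cameraMatrix from
    // frame.camera.transform.
    [self setupViewport:CGSizeMake(imageHeight, imageWidth)];

    // Keep the CVOpenGLESTextureRefs alive until the frame has been drawn,
    // then release them and periodically flush the texture cache (omitted here).
}
```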
The code above illustrates how the ARFrame captured by ARKit is processed. ARFrame's capturedImage stores the image captured by the camera; its type is CVPixelBufferRef, and by default the image data is in YUV format, stored in two planes, which you can think of as two images. One is the Y (luminance) plane, which stores brightness; the other is the UV (chrominance) plane, which stores the color information. We bind these two planes to two separate textures and then convert YUV to RGB with a formula in the shader. The following fragment shader samples both textures and performs the color conversion.
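This is a minimal sketch of such a fragment shader, assuming LUMINANCE / LUMINANCE_ALPHA textures and the common BT.601 video-range conversion; the uniform and varying names are illustrative, and the shader in the repo may use slightly different coefficients (for example for full-range YCbCr).

```glsl
precision highp float;

varying vec2 fragUV;

uniform sampler2D yMap;   // luminance (Y) plane
uniform sampler2D uvMap;  // chrominance (UV) plane

void main(void) {
    vec3 yuv;
    // Video-range YCbCr: remove the offsets before applying the matrix.
    yuv.x  = texture2D(yMap, fragUV).r - (16.0 / 255.0);
    yuv.yz = texture2D(uvMap, fragUV).ra - vec2(0.5, 0.5);

    // BT.601 video-range YCbCr -> RGB conversion matrix (column-major).
    mat3 convert = mat3(1.164,  1.164, 1.164,
                        0.0,   -0.392, 2.017,
                        1.596, -0.813, 0.0);

    gl_FragColor = vec4(convert * yuv, 1.0);
}
```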
After processing and binding the textures, the viewport is recalculated with [self setupViewport:CGSizeMake(imageHeight, imageWidth)]; so that the texture is not stretched unevenly on different screen sizes. Next, the camera transform calculated by ARKit is assigned to self.cameraMatrix. Note that the image captured by ARKit has to be rotated 90 degrees to be displayed upright, which is why the width and height are deliberately swapped when setting the viewport and the camera is rotated at the end.

VideoPlane

VideoPlane is the piece of geometry written to display the video; it accepts two textures, Y and UV.
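The sketch below shows one way the video plane can take those two textures and bind them to the shader's samplers before drawing; the method and property names here are illustrative (the uniform names match the fragment shader sketch above), and the real implementation is the VideoPlane class in the repo.

```objectivec
// Illustrative: bind the Y and UV textures to the video shader's samplers
// before the plane is drawn. yuv_yTexture / yuv_uvTexture are assumed
// GLuint properties holding the texture names produced from the ARFrame.
- (void)bindYUVTexturesWithProgram:(GLuint)program {
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, self.yuv_yTexture);
    glUniform1i(glGetUniformLocation(program, "yMap"), 0);

    glActiveTexture(GL_TEXTURE1);
    glBindTexture(GL_TEXTURE_2D, self.yuv_uvTexture);
    glUniform1i(glGetUniformLocation(program, "uvMap"), 1);
}
```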
The other functions are straightforward: they draw a quad and work together with the shader above to render the YUV data as video.

Perspective Projection Matrix

From ARFrame you can get the textures and the camera matrix needed for rendering. Besides these, you also need a perspective projection matrix that matches the real camera, so that the perspective of the rendered 3D objects looks natural.
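A minimal sketch of retrieving that matrix from ARKit and converting it to a GLKMatrix4 for the OpenGL ES pipeline is shown below; the method and the projectionMatrix property are illustrative, and the repo may obtain the matrix differently.

```objectivec
- (void)updateProjectionMatrixWithFrame:(ARFrame *)frame {
    // ARCamera exposes a perspective projection matrix matching the real camera.
    matrix_float4x4 projection = frame.camera.projectionMatrix;

    // Both simd and GLKit matrices are column-major, so copy element by element.
    GLKMatrix4 glProjection;
    for (int col = 0; col < 4; col++) {
        for (int row = 0; row < 4; row++) {
            glProjection.m[col * 4 + row] = projection.columns[col][row];
        }
    }
    self.projectionMatrix = glProjection; // assumed GLKMatrix4 property
}
```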
The code above shows how to obtain the perspective projection matrix from ARKit. With the projection matrix and the camera matrix, you can easily render objects with OpenGL.
This article mainly introduces the basic approach to rendering ARKit content with OpenGL ES, without going into too many OpenGL ES technical details. If you are interested, you can clone the code from GitHub to learn more.