ARKit & OpenGL ES - ARKit principle and implementation

Principle

If you want to learn more about OpenGL ES, please go to the OpenGL ES related articles directory

The code used in this article is in the ARKit branch of https://github.com/SquarePants1991/OpenGLESLearn.git.

iOS 11 introduced a new framework, ARKit, which makes it easy to create AR apps using ARKit together with SceneKit. Apple also provides a basic AR application template, so you can start developing your AR app directly from it.

However, this series of articles will use OpenGL ES to provide the rendering for ARKit. First, let's go over the theory behind ARKit.

AR Basic Concepts

At its core, AR is the technology of combining virtual, computer-generated graphics with the real environment. There are many ways to implement it, for example:

  • Using 2D or 3D graphics to decorate faces is common in some camera and video apps, mainly using face recognition and tracking technology.

  • Marker-based 3D model placement, such as AR-based storybooks and Onmyoji's summons. Markers can be simple black-framed markers or feature point training data for a complex image. If you are interested, you can go to ARToolKit, which is an open source AR framework mainly used for marker-based AR. ARToolkit6 Beta was recently released, and I wonder if there are any new features available.

  • Track the feature points of the real environment and calculate the position of the real camera in the real environment. The so-called feature points are the positions where the grayscale changes dramatically in the image. Therefore, if you want a more accurate and stable calculation, you need the real environment to have richer color changes. ARKit uses this principle to locate the camera.

WorldTracking

Tracking real-world feature points, calculating the real camera position, and applying it to the virtual camera in the 3D world is the most important part of AR implementation. The accuracy of the calculation directly affects the rendered result. ARKit uses ARSession to manage the entire AR processing flow, including the calculation of the camera position.

#pragma mark - AR Control
- (void)setupAR {
    if (@available(iOS 11.0, *)) {
        self.arSession = [ARSession new];
        self.arSession.delegate = self;
    }
}

- (void)runAR {
    if (@available(iOS 11.0, *)) {
        ARWorldTrackingSessionConfiguration *config = [ARWorldTrackingSessionConfiguration new];
        config.planeDetection = ARPlaneDetectionHorizontal;
        [self.arSession runWithConfiguration:config];
    }
}

- (void)pauseAR {
    if (@available(iOS 11.0, *)) {
        [self.arSession pause];
    }
}

Using ARSession is very simple: initialize it and set its delegate. To start the ARSession, you pass in a configuration, ARWorldTrackingSessionConfiguration, which tells the AR system to track feature points in the real world and compute the camera position. Apple may also release configurations such as ARMarkerTrackingSessionConfiguration for tracking markers in the future. Once the ARSession is running, the camera is started and the phone's position is sensed through its sensors. A diagram borrowed from WWDC illustrates this flow.

ARSession combines the video stream captured by the camera with the device's position information to produce a series of continuous ARFrames.

- (void)session:(ARSession *)session didUpdateFrame:(ARFrame *)frame {
...
}

Each ARFrame contains the image captured by the camera, the camera's position, and other information. In this method we need to draw the captured image and render 3D objects based on the camera position and the rest of the frame data.

Plane detection

ARKit provides another cool feature: detecting real-world planes and providing ARPlaneAnchor objects that describe a plane's position, size, orientation, and other information.

- (void)runAR {
    if (@available(iOS 11.0, *)) {
        ARWorldTrackingSessionConfiguration *config = [ARWorldTrackingSessionConfiguration new];
        config.planeDetection = ARPlaneDetectionHorizontal;
        [self.arSession runWithConfiguration:config];
    }
}

The line config.planeDetection = ARPlaneDetectionHorizontal; above sets the type of plane to detect to horizontal, which is the only option available at the moment. When ARKit detects a plane, it hands you the data through the delegate method - (void)session:(ARSession *)session didAddAnchors:(NSArray<ARAnchor *> *)anchors. You can tell whether a plane was detected by checking whether an ARAnchor is an ARPlaneAnchor. An ARAnchor represents the position of a 3D object in the real environment; you achieve the AR effect by keeping your 3D object's transform synchronized with the ARAnchor's transform, as in the sketch below.
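
A minimal sketch of what that delegate method could look like (the NSLog and the comment about syncing a renderer are illustrative, not part of the project's code):

- (void)session:(ARSession *)session didAddAnchors:(NSArray<ARAnchor *> *)anchors {
    for (ARAnchor *anchor in anchors) {
        if ([anchor isKindOfClass:[ARPlaneAnchor class]]) {
            ARPlaneAnchor *planeAnchor = (ARPlaneAnchor *)anchor;
            // planeAnchor.transform places the plane in world space;
            // planeAnchor.center and planeAnchor.extent describe its position and size.
            NSLog(@"Detected a plane with extent %f x %f",
                  planeAnchor.extent.x, planeAnchor.extent.z);
            // A renderer could create a quad here and keep its modelMatrix
            // synchronized with planeAnchor.transform.
        }
    }
}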

Hit Test

Hit testing makes it easy to place objects on detected planes. When you tap the screen, a hit test can determine which planes lie under the tapped location and provide the ARAnchor you can use to position the object.

[frame hitTest:CGPointMake(0.5, 0.5) types:ARHitTestResultTypeExistingPlane];

The hit test is performed with ARFrame's hitTest:types: method. The first argument is a point in normalized coordinates ranging from (0, 0) to (1, 1); the second parameter specifies which kinds of results may be returned. The available types are as follows.

  • ARHitTestResultTypeFeaturePoint: a point on a continuous surface, estimated from the nearest feature point.

  • ARHitTestResultTypeEstimatedHorizontalPlane: a plane perpendicular to gravity, estimated with an approximate method.

  • ARHitTestResultTypeExistingPlane: a plane that has already been detected; during the test its extent is ignored and it is treated as an infinite plane.

  • ARHitTestResultTypeExistingPlaneUsingExtent: a plane that has already been detected, taking its extent into account during the test.

If the detection succeeds, an NSArray<ARHitTestResult *> * is returned. Each ARHitTestResult contains the result type, the distance to the intersection point, and the plane's ARAnchor. Note that the ARAnchor is only present for ARHitTestResultTypeExistingPlane and ARHitTestResultTypeExistingPlaneUsingExtent results. The four types can be combined with the bitwise OR operator, for example ARHitTestResultTypeEstimatedHorizontalPlane | ARHitTestResultTypeExistingPlane.
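
As a usage sketch (assuming this lives in the same view controller that owns self.arSession, that a tap gesture calls a hypothetical handleTap: method, and ignoring, for simplicity, the rotation between view coordinates and the captured image's coordinate space), a hit test could be driven like this:

- (void)handleTap:(UITapGestureRecognizer *)recognizer {
    ARFrame *frame = self.arSession.currentFrame;
    if (frame == nil) {
        return;
    }
    CGPoint tapPoint = [recognizer locationInView:self.view];
    // Convert the tap into the normalized (0,0)-(1,1) range expected by hitTest:types:.
    CGPoint normalizedPoint = CGPointMake(tapPoint.x / self.view.bounds.size.width,
                                          tapPoint.y / self.view.bounds.size.height);
    NSArray<ARHitTestResult *> *results = [frame hitTest:normalizedPoint
                                                   types:ARHitTestResultTypeExistingPlaneUsingExtent | ARHitTestResultTypeEstimatedHorizontalPlane];
    ARHitTestResult *result = results.firstObject;
    if (result) {
        // worldTransform is where the tapped point lies in the real environment;
        // a 3D object's model matrix can be derived from it.
        NSLog(@"Hit at distance %f, anchor: %@", result.distance, result.anchor);
    }
}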

Light intensity adjustment

ARKit can also estimate light intensity, mainly so that the lighting of the 3D models stays consistent with the lighting of the real environment. ARFrame has a property named lightEstimate; if light estimation succeeded for that frame, it is non-nil. Its type is ARLightEstimate, which contains a single value, ambientIntensity. In a 3D lighting model it corresponds to the ambient light term, and its value ranges from 0 to 2000. When rendering with OpenGL, you can use this value to adjust the ambient light intensity in your lighting model, as sketched below.
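
A minimal sketch of applying the estimate (updateLighting:context: and the ambientFactor uniform are hypothetical names; setUniform1f:value: is the GLContext helper used later in this article):

- (void)updateLighting:(ARFrame *)frame context:(GLContext *)context {
    if (frame.lightEstimate == nil) {
        return;
    }
    // ambientIntensity ranges from 0 to 2000; map it into a 0.0-1.0 ambient factor.
    GLfloat ambientFactor = (GLfloat)(frame.lightEstimate.ambientIntensity / 2000.0);
    [context setUniform1f:@"ambientFactor" value:ambientFactor];
}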

This is almost the end of the theoretical knowledge of ARKit. The next article will introduce how to use OpenGL ES to render the content in ARFrame.

Implementation

The code used in this article is in the ARKit branch of https://github.com/SquarePants1991/OpenGLESLearn.git.

The OpenGL basic code used in this article comes from the OpenGL ES series, which has basic functions such as rendering geometry and textures. The implementation details will not be repeated.

The key code for integrating ARKit is in ARGLBaseViewController. Let's take a look at its code.

Processing ARFrame

- (void)session:(ARSession *)session didUpdateFrame:(ARFrame *)frame {
    // Synchronize YUV information to yTexture and uvTexture
    CVPixelBufferRef pixelBuffer = frame.capturedImage;
    GLsizei imageWidth = (GLsizei)CVPixelBufferGetWidthOfPlane(pixelBuffer, 0);
    GLsizei imageHeight = (GLsizei)CVPixelBufferGetHeightOfPlane(pixelBuffer, 0);
    void * baseAddress = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0);

    glBindTexture(GL_TEXTURE_2D, self.yTexture);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_LUMINANCE, imageWidth, imageHeight, 0, GL_LUMINANCE, GL_UNSIGNED_BYTE, baseAddress);
    glBindTexture(GL_TEXTURE_2D, 0);

    imageWidth = (GLsizei)CVPixelBufferGetWidthOfPlane(pixelBuffer, 1);
    imageHeight = (GLsizei)CVPixelBufferGetHeightOfPlane(pixelBuffer, 1);
    void *laAddress = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 1);
    glBindTexture(GL_TEXTURE_2D, self.uvTexture);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_LUMINANCE_ALPHA, imageWidth, imageHeight, 0, GL_LUMINANCE_ALPHA, GL_UNSIGNED_BYTE, laAddress);
    glBindTexture(GL_TEXTURE_2D, 0);

    self.videoPlane.yuv_yTexture = self.yTexture;
    self.videoPlane.yuv_uvTexture = self.uvTexture;
    [self setupViewport: CGSizeMake(imageHeight, imageWidth)];

    // Synchronize the camera matrix
    matrix_float4x4 cameraMatrix = matrix_invert([frame.camera transform]);
    GLKMatrix4 newCameraMatrix = GLKMatrix4Identity;
    for (int col = 0; col < 4; ++col) {
        for (int row = 0; row < 4; ++row) {
            newCameraMatrix.m[col * 4 + row] = cameraMatrix.columns[col][row];
        }
    }

    self.cameraMatrix = newCameraMatrix;
    GLKVector3 forward = GLKVector3Make(-self.cameraMatrix.m13, -self.cameraMatrix.m23, -self.cameraMatrix.m33);
    GLKMatrix4 rotationMatrix = GLKMatrix4MakeRotation(M_PI / 2, forward.x, forward.y, forward.z);
    self.cameraMatrix = GLKMatrix4Multiply(rotationMatrix, newCameraMatrix);
}

The code above shows how to process the ARFrame captured by ARKit. ARFrame's capturedImage stores the image captured by the camera as a CVPixelBufferRef. By default the image data is in YUV format, stored in two planes, which you can think of as two images: one is the Y (luminance) plane, which stores the brightness, and the other is the UV (chrominance) plane, which stores the color information. We need to bind these two planes to separate textures and then convert YUV to RGB in the shader. The fragment shader below samples the two textures and performs the color conversion.

precision highp float;

varying vec3 fragNormal;
varying vec2 fragUV;

uniform float elapsedTime;
uniform mat4 normalMatrix;
uniform sampler2D yMap;
uniform sampler2D uvMap;

void main(void) {
    // Sample the luminance (Y) and chrominance (CbCr) planes.
    vec4 Y_planeColor = texture2D(yMap, fragUV);
    vec4 CbCr_planeColor = texture2D(uvMap, fragUV);

    float Cb, Cr, Y;
    float R, G, B;
    Y = Y_planeColor.r * 255.0;
    Cb = CbCr_planeColor.r * 255.0 - 128.0;
    Cr = CbCr_planeColor.a * 255.0 - 128.0;

    // Convert YCbCr to RGB.
    R = 1.402 * Cr + Y;
    G = -0.344 * Cb - 0.714 * Cr + Y;
    B = 1.772 * Cb + Y;

    vec4 videoColor = vec4(R / 255.0, G / 255.0, B / 255.0, 1.0);
    gl_FragColor = videoColor;
}

After the textures are processed and bound, the viewport is recalculated with [self setupViewport: CGSizeMake(imageHeight, imageWidth)]; to ensure the texture is not stretched unevenly on different screen sizes. Next, the camera transform calculated by ARKit is assigned to self.cameraMatrix. Note that the image captured by ARKit must be rotated 90 degrees to display correctly in portrait, which is why the width and height are deliberately swapped when setting the viewport and the camera matrix is rotated at the end.
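
setupViewport: itself belongs to the project's base rendering code and is not listed in this article; a rough sketch of an aspect-fill calculation along those lines (the repository's actual implementation may differ) could be:

- (void)setupViewport:(CGSize)imageSize {
    CGFloat scale = [UIScreen mainScreen].scale;
    CGFloat viewWidth = self.view.bounds.size.width * scale;
    CGFloat viewHeight = self.view.bounds.size.height * scale;

    // Scale the image so it covers the whole view while keeping its aspect ratio,
    // then center the viewport; the parts that do not fit are cropped.
    CGFloat ratio = MAX(viewWidth / imageSize.width, viewHeight / imageSize.height);
    GLsizei viewportWidth = (GLsizei)(imageSize.width * ratio);
    GLsizei viewportHeight = (GLsizei)(imageSize.height * ratio);
    glViewport((GLint)((viewWidth - viewportWidth) / 2.0),
               (GLint)((viewHeight - viewportHeight) / 2.0),
               viewportWidth, viewportHeight);
}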

VideoPlane

VideoPlane is the geometry used to display the video; it accepts two textures, Y and UV.

@interface VideoPlane : GLObject
@property (assign, nonatomic) GLuint yuv_yTexture;
@property (assign, nonatomic) GLuint yuv_uvTexture;
- (instancetype)initWithGLContext:(GLContext *)context;
- (void)update:(NSTimeInterval)timeSinceLastUpdate;
- (void)draw:(GLContext *)glContext;
@end

...

- (void)draw:(GLContext *)glContext {
    [glContext setUniformMatrix4fv:@"modelMatrix" value:self.modelMatrix];
    bool canInvert;
    GLKMatrix4 normalMatrix = GLKMatrix4InvertAndTranspose(self.modelMatrix, &canInvert);
    [glContext setUniformMatrix4fv:@"normalMatrix" value:canInvert ? normalMatrix : GLKMatrix4Identity];
    [glContext bindTextureName:self.yuv_yTexture to:GL_TEXTURE0 uniformName:@"yMap"];
    [glContext bindTextureName:self.yuv_uvTexture to:GL_TEXTURE1 uniformName:@"uvMap"];
    [glContext drawTrianglesWithVAO:vao vertexCount:6];
}

The rest of the class is straightforward: it draws a quad and, together with the shader above, renders the YUV-format data as the video image, as illustrated below.
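
The vertex data itself is defined in the repository; purely as an illustration (the position/normal/UV layout and the UV orientation here are assumptions, not the project's actual data), a full-screen quad built from two triangles, matching the vertexCount of 6 above, might look like this:

// Hypothetical layout: position (x, y, z), normal (x, y, z), texture coordinate (u, v).
static GLfloat videoPlaneVertices[] = {
    -1, -1, 0,   0, 0, 1,   0, 1,   // bottom left
     1, -1, 0,   0, 0, 1,   1, 1,   // bottom right
    -1,  1, 0,   0, 0, 1,   0, 0,   // top left

    -1,  1, 0,   0, 0, 1,   0, 0,   // top left
     1, -1, 0,   0, 0, 1,   1, 1,   // bottom right
     1,  1, 0,   0, 0, 1,   1, 0,   // top right
};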

Perspective projection matrix

From ARFrame you can get the texture and the camera matrix needed for rendering. Beyond these, you also need a perspective projection matrix that matches the real camera, so that the perspective of the rendered 3D objects looks natural.

- (void)session:(ARSession *)session cameraDidChangeTrackingState:(ARCamera *)camera {
    matrix_float4x4 projectionMatrix = [camera projectionMatrixWithViewportSize:self.viewport.size orientation:UIInterfaceOrientationPortrait zNear:0.1 zFar:1000];
    GLKMatrix4 newWorldProjectionMatrix = GLKMatrix4Identity;
    for (int col = 0; col < 4; ++col) {
        for (int row = 0; row < 4; ++row) {
           newWorldProjectionMatrix.m[col * 4 + row] = projectionMatrix.columns[col][row];
        }
    }
    self.worldProjectionMatrix = newWorldProjectionMatrix;
}

The above code demonstrates how to obtain the 3D perspective projection matrix through ARKit. With the perspective projection matrix and the camera matrix, you can easily use OpenGL to render objects.

- (void)glkView:(GLKView *)view drawInRect:(CGRect)rect {
    [super glkView:view drawInRect:rect];

    [self.objects enumerateObjectsUsingBlock:^(GLObject *obj, NSUInteger idx, BOOL *stop) {
        [obj.context active];
        [obj.context setUniform1f:@"elapsedTime" value:(GLfloat)self.elapsedTime];
        [obj.context setUniformMatrix4fv:@"projectionMatrix" value:self.worldProjectionMatrix];
        [obj.context setUniformMatrix4fv:@"cameraMatrix" value:self.cameraMatrix];

        [obj.context setUniform3fv:@"lightDirection" value:self.lightDirection];
        [obj draw:obj.context];
    }];
}

This article mainly introduces the basic idea of rendering ARKit content with OpenGL ES, without going into too many OpenGL ES technical details. If you are interested, you can clone the code from GitHub to learn more.
