A brief analysis of the principles of mobile rendering

Author｜ Shang Huaijun

Rendering on a computer or mobile phone is a very complex process. This article introduces some basic knowledge related to rendering, and introduces the principles of mobile rendering in combination with the technical framework of iOS and Android. Finally, it analyzes in detail some methods of off-screen rendering and rounded corner optimization in iOS.

Rendering Basics

The original data source for screen drawing

Bitmap

The raw data we need to draw images on the screen is called a bitmap. A bitmap is a data structure. A bitmap is composed of n*m pixels, and the color information of each pixel is represented by a combination of RGB or grayscale values. According to the bit depth, bitmaps can be divided into 1, 4, 8, 16, 24 and 32-bit images. The more information bits used for each pixel, the more colors are available, the more realistic and richer the color expression, and the larger the corresponding data volume.

Physical pixels and logical pixels

Bitmaps generally store physical pixels, while the application layer generally uses logical pixels. There is a certain correspondence between physical pixels and logical pixels. For example, the correspondence between physical pixels and logical pixels in iOS is as follows:

iOS1 double screen 1pt corresponds to 1 physical pixel
iOS2 double screen 1pt corresponds to 2 physical pixels
iOS3x screen 1pt corresponds to 3 physical pixels

Draw the bitmap to the display

As mentioned above, the raw data needed to draw an image on the screen is called a bitmap. So the question is, how to draw the image on the screen after having the bitmap data? As shown in the figure below: the electron gun scans line by line from top to bottom, and the display presents a frame of the picture after the scan is completed. Then the electron gun returns to the initial position of the screen for the next scan. In order to synchronize the display process of the display and the scanning process of the video controller, the display will use the hardware clock to generate a series of timing signals. When the electron gun changes lines for scanning, the display will send a horizontal synchronization signal; when a frame of the picture is drawn, the electron gun returns to its original position, and before preparing to draw the next frame, the display will send a vertical synchronization signal. The display is usually refreshed at a fixed frequency, and this refresh rate is the frequency generated by the vertical synchronization signal.

CPU, GPU, and display collaborative workflow

The previous part introduced the process of the video controller displaying bitmap data on the physical screen. So how is the bitmap data obtained? In fact, the bitmap data is obtained through the collaborative work of the CPU and GPU. The following figure is a common process of collaborative work between the CPU, GPU, and monitor. The CPU calculates the display content and submits it to the GPU. After the GPU completes the rendering, it stores the rendering results in the frame buffer. Next, the obtained pixel information needs to be displayed on the physical screen. At this time, the video controller (Video Controller) will read the information in the frame buffer and pass it to the monitor (Monitor) for display. The complete process is shown in the figure below:

The Difference Between CPU and GPU

When talking about the collaborative workflow of CPU, GPU, and display, we have to mention the difference between CPU and GPU.

The CPU is a central processing unit, suitable for single complex logic, while the GPU is a graphics processing unit, suitable for high-concurrency simple logic.

GPU has a lot of computing units and a very long pipeline, but the control logic is very simple, and it also omits the cache, which is suitable for operations that do not require low latency. The CPU not only occupies a lot of space in the cache, but also has a very complex control logic. In comparison, the computing power is only a small part of the CPU. Graphics rendering involves a lot of matrix operations, and matrix-related operations can be split into simple parallel operations, so rendering processing is particularly suitable for GPUs.

In summary: GPUs have a large amount of computational work, but the technical content is not high, and they need to be repeated many times. It is like a job that requires adding, subtracting, multiplying and dividing within 100 times hundreds or thousands of times. CPUs are like old professors who can calculate integrals and differentials and are suitable for processing single complex logical operations.

Universal Rendering Pipeline

We usually call the complete process of image drawing the rendering pipeline, which is completed by the collaboration of the CPU and GPU. Generally, a rendering process can be divided into four conceptual stages: application stage, geometry stage, rasterizer stage, and pixel processing stage. In "Real-Time Rendering 4th", various knowledge points of real-time rendering are explained very thoroughly. If you are interested in the principles of rendering, you can read this book, which can be called the "Bible of Real-Time Rendering". The following will briefly introduce these processes.

Application Stage

In short, it is the image processing stage in the application. To put it simply, it is a program running on the CPU, and the GPU has nothing to do at this time. In this stage, the CPU is mainly responsible for processing user interactions and operations, and then doing some processing related to the application layer layout, and finally outputting primitives (points, lines and triangles) information to the next stage.

You may wonder, can simple points, lines and triangles represent rich three-dimensional graphics? The dolphin with a strong sense of three-dimensionality below can give a positive answer. Simple triangles plus different colors can present three-dimensional graphics.

Geometry Stage

1. Vertex Shader

Vertex shaders can perform some basic processing on vertex attributes. They can convert vertex information into perspective, add lighting information, add textures, and so on. The information that the CPU sends to the GPU is like standing at the God's perspective and giving all the information seen from this perspective to the GPU. The GPU, on the other hand, stands at the human perspective and outputs the images that humans can observe on the display. So here, the coordinate conversion is centered on the human perspective.

2. Shape Assembly This stage takes all the vertices output by the vertex shader as input and assembles all the points into the shape of the specified primitive. Primitives are such as points, lines, and triangles. This stage is also called primitive assembly.

3. The Geometry Shader adds additional vertices to the primitives and converts the original primitives into new primitives to build more complex models.

Rasterizer Stage

The rasterization stage converts the primitives processed by the first three geometry stages into a series of pixels.

As shown in the figure above, we can see that there is a point at the center of each pixel. Rasterization uses this center point for division. If the center point is inside the primitive, then the pixel corresponding to this center point belongs to the primitive. In short, this stage converts continuous geometric figures into discrete pixel points.

Pixel Processing

1. Fragment Shader

After the above rasterization stage, we get the pixels corresponding to each primitive. The last thing to do in this stage is to fill each pixel with the correct color, and then through a series of processing calculations, get the corresponding image information, and finally output it to the display. Interpolation will be done here, just like interpolation animation. For example, if you want to connect a series of scattered points into a smooth curve, there may be many missing points between adjacent known points. At this time, you need to fill the missing data through interpolation. In the end, all points on the smooth curve except the known points are interpolated. Similarly, after the three role values of the triangle are given, the other fragments are calculated based on the interpolation, which presents a gradient effect.

2. Tests and Blending

This stage checks the corresponding depth value (z coordinate) to determine whether the pixel is in front of or behind other layer pixels and decides whether it should be discarded. In addition, this stage also checks the alpha value (the alpha value defines the transparency of a pixel) to blend the layers. (In short, it checks the layer depth and transparency and blends the layers.)

 R = S + D * (1 - Sa)

 meaning:
 R: Result, final pixel color.
 S: Source, source pixels (pixels of the layer above).
 D: Destination, target pixel (the layer pixel below).
 a: alpha, transparency.

 Result = Color of S(top) + Color of D(bottom) * (1 - Transparency of S(top))

After going through the long pipeline above, we can get the original data source required for screen drawing - bitmap data, and then the video controller displays the bitmap data on the physical screen.

iOS rendering principle

Rendering technology stack

After laying out some basic knowledge related to rendering, the following mainly introduces some principles and knowledge related to iOS rendering. The figure below shows the iOS graphics rendering technology stack, which has three related core system frameworks: Core Graphics, Core Animation, and Core Image. These three frameworks are mainly used to draw visual content. They all use OpenGL to call the GPU for actual rendering, and then generate the final bitmap data and store it in the frame buffer. The video controller then displays the frame buffer data on the physical screen.

UIKit

UIKit is the most commonly used framework for iOS developers. You can draw the interface by setting the layout and related properties of UIKit components. However, UIKit does not have the ability to image on the screen. This framework is mainly responsible for responding to user operation events (UIView inherits from UIResponder), and the events are transmitted through the response chain.

Core Animation

Core Animation is mainly responsible for combining different visual contents on the screen. These visual contents can be decomposed into independent layers, which are CALayer that we often come into contact with in daily development. These layers are stored in the layer tree. CALayer is mainly responsible for page rendering, which is the basis of everything the user can see on the screen.

Core Graphics

Core Graphics is mainly used for drawing images at runtime. Developers can use this framework to handle path-based drawing, transformations, color management, off-screen rendering, patterns, gradients, and shadows, and more.

Core Image

Core Image is the opposite of Core Graphics. Core Graphics creates images at runtime, while Core Image creates images before runtime.

OpenGL ES and Metal

OpenGL ES and Metal are both third-party standards, and the specific internal implementations based on these standards are developed by the corresponding GPU manufacturers. Metal is a third-party standard of Apple, implemented by Apple. Many developers have not used Metal directly, but indirectly use Metal through core system frameworks such as Core Animation and Core Image.

The relationship between CoreAnimation and UIKit framework

Core Animation, mentioned in the rendering framework above, is the basic framework for graphics rendering and animation on iOS and OS X. It is mainly used to animate views and other visual elements of applications. The implementation logic of Core Animation is to hand over most of the actual drawing work to the GPU for accelerated rendering, which will not burden the CPU and can achieve smooth animation. The core class of CoreAnimation is CALayer, and the core class of the UIKit framework is UIView. The relationship between these two classes is introduced in detail below.

The relationship between UIView and CALayer

As shown in the figure above, UIView and CALayer have a one-to-one correspondence. Each UIView has a corresponding CALayer, one is responsible for layout and interactive response, and the other is responsible for page rendering.

Their two core relationships are as follows:

CALayer is one of the properties of UIView, responsible for rendering and animation, and providing presentation of visual content.
UIView provides encapsulation of CALayer functions and is responsible for handling interactive events.

To give a more vivid example, UIView is the drawing board, and CALayer is the canvas. When you create a drawing board, it will automatically be bound to a canvas. The drawing board will respond to your operations, such as you can move the drawing board, and the canvas is responsible for presenting specific graphics. The two have clear responsibilities. One is responsible for interaction, and the other is responsible for rendering and drawing.

Why separate CALayer and UIView?

The user interaction methods on iOS and MacOS are fundamentally different, but the rendering logic is universal. In iOS system, we use UIKit and UIView, while in MacOS system, we use AppKit and NSView. Therefore, in this case, the display logic is separated and reused across platforms.

The contents property in CALayer stores the bitmap rendered by the device rendering pipeline (usually called the backing store), which is the original data source required for screen drawing. When the device screen is refreshed, the generated bitmap is read from CALayer and then presented to the screen.

 @interface CALayer: NSObject <NSSecureCoding, CAMediaTiming>
 /** Layer content properties and methods. **/

 /* An object providing the contents of the layer, typically a CGImageRef,
 * but may be something else. (For example, NSImage objects are
 * supported on Mac OS X 10.6 and later.) Default value is nil.
 * Animatable. */

 @property(nullable, strong) id contents;
 @end

Core Animation Pipeline

In fact, as early as WWDC's Advanced Graphics and Animations for iOS Apps (WWDC14 419, a session about UIKit and Core Animation basics), Apple gave the rendering pipeline of the CoreAnimation framework. The specific process is shown in the figure below:

In the entire pipeline, the app itself is not responsible for rendering. Instead, rendering is handled by an independent process, namely the Render Server process. The following will introduce the entire pipeline process in detail.

Application Phase

View Creation
Layout calculation
Pack the layers and send them to the Render Server at the next RunLoop
The app processes the user's click operation. During this process, the app may need to update the view tree. If the view tree is updated, the layer tree will also be updated.
Secondly, the app uses the CPU to calculate the displayed content

Render Server & GPU

This stage mainly executes metal, Core Graphics and other related programs, and calls the GPU to complete the rendering of the image on the physical layer.
GPU stores the rendered bitmap data in the Frame Buffer

Display

The video controller displays the bitmap data in the frame buffer on the physical screen frame by frame.

If you string the above steps together, you will find that they take more than 16.67 ms to execute. Therefore, in order to support the screen's 60 FPS refresh rate, these steps need to be executed in parallel through a pipeline, as shown in the figure below. Each stage continuously delivers products to the next stage. At this time, the requirement of generating one frame of data in 16.67 milliseconds can be met.

Android rendering principle

Android upper display system

An important responsibility of an Android Activity is to manage the life cycle of the interface, which is accompanied by the management of the view window. This involves two main services in Android, AMS (ActivityManagerService) and WMS (WindowManagerService).

In Android, a view has a corresponding canvas. The view tree corresponds to a canvas tree, and Surfaceflinger controls the synthesis of multiple canvases. Finally, the rendering is completed and the bitmap data is output and displayed on the mobile phone screen.

Application layer layout

View and ViewGroup

View is the base class of all controls in Android. The View class has a very important subclass: ViewGroup, which is used as a container for other views. All Android UI components are built on the basis of View and ViewGroup, and the overall concept of "combination" is adopted to design View and ViewGroup: ViewGroup is a subclass of View, so ViewGroup can also be used as View. The graphical user interface of an Android app will correspond to a view tree, and the view tree corresponds to a canvas tree. This is somewhat similar to the concept of UIView and CALayer in iOS, one is responsible for application layer layout, and the other is responsible for underlying rendering.

System bottom layer rendering display

The view of the application layer corresponds to the canvas, and the canvas becomes a layer in the system process. SurfaceFlinger mainly provides layer rendering and synthesis services. SurfaceFlinger is a resident binder service that will be started when the init process starts. The following figure details the conversion of the upper view to the underlying layer, as well as SurfaceFlinger's rendering and synthesis of multiple layers.

iOS off-screen rendering

Off-screen rendering principle and definition

First, let's introduce the principle of off-screen rendering. Our normal rendering process is: the CPU and GPU work together to continuously put the bitmap data obtained after content rendering into the Framebuffer (frame buffer), and the video controller continuously obtains content from the Framebuffer to display real-time content.

The process of off-screen rendering is as follows:

Unlike the normal situation where the GPU directly puts the rendered content into the Framebuffer, off-screen rendering requires first creating an additional off-screen rendering buffer, putting the pre-rendered content into it, and then further overlaying and rendering the content in the Offscreen Buffer when the time is right. After completion, the result is written into the Framebuffer.

Why do we need to store data in an off-screen render buffer first? There are two reasons, one is passive and the other is active.

Some special effects require the use of an additional Offscreen Buffer to save the intermediate state of rendering (passive)
For efficiency purposes, content can be rendered in advance and saved in the Offscreen Buffer to achieve reuse. (Active)

Passive off-screen rendering

Common scenarios that trigger passive off-screen rendering

Transparency, shadows, and rounded corners are often referred to as the three treasures of UI, but these effects often lead to passive off-screen rendering in the daily development process of iOS. The following are several common scenarios that trigger passive off-screen rendering.

What triggers off-screen rendering?

The reason for off-screen rendering cannot be overstated without mentioning the painter algorithm. The overall idea of the painter algorithm is to draw by layer, first drawing the farther scene, and then using the closer scene to cover the farther part. The layer here can be mapped to layer in the iOS rendering technology stack.

Usually, for each layer, the Render Server will follow the "painter's algorithm" and output it to the frame buffer in order. The next layer covers the previous layer to get the final display result. For this layer tree, the layer is output to the frame buffer using a depth-first algorithm.

Although the GPU, as a "painter", can output layer by layer to the canvas, it cannot go back and change a part of it after a layer is rendered. This is because the pixel data of several layers before this layer have been synthesized together during rendering. In fact, this is very similar to the layer merging in Photoshop. Once multiple layers are merged together, it is impossible to modify a single layer. Therefore, it is necessary to draw the sub-layers in the off-screen buffer one by one, and then crop the four corners before blending with the previous layer.

Performance impact of GPU off-screen rendering

When we talk about off-screen rendering, we intuitively feel that it will affect performance. Because in order to meet the refresh rate of 60fps, the GPU operations are highly pipelined. Originally, all the calculations were being output to the frame buffer in an orderly manner. Suddenly, some special effects triggered off-screen rendering, and the context needed to be switched to output the data to another memory. At this time, many intermediate products in the pipeline can only be discarded. This frequent context switching has a very large impact on the GPU's rendering performance.

How to prevent unnecessary off-screen rendering?

For some rounded corners, you can create four background color arc-shaped layers to cover the four corners, creating a visual rounded effect.
For the circular border of the view, if there is no backgroundColor, you can safely use cornerRadius to do it.
For all shadows, use shadowPath to avoid off-screen rendering
For views with special shapes, use a layer mask and turn on shouldRasterize to cache the rendering results.

Optimization strategy for rounded corners

Using CALayer's cornerRadius and setting cliptobounds will trigger offscreen rendering. Clipping operations need to be performed on 60 frames per second when scrolling, even if the content has not changed. The GPU must also switch contexts between each frame, synthesize the entire frame and crop. These performance consumptions directly affect the Render Server, an independent rendering process, causing frame drops. In order to optimize rendering performance, we can choose some other solutions to achieve rounded corners. The following are the conditions that need to be considered for the specific implementation of rounded corners.

Conditions to be considered for the specific implementation of rounded corners

Is there any sliding under the rounded corner (movement underneath the corner).
Whether there is movement through the corner.
Are the four rounded corners on the same layer and do they intersect with other sub-layers?

Specific implementation of rounded corners

How to select the implementation solution of rounded corners according to the corresponding conditions

The above mentioned the conditions to be considered for the optimization of rounded corners and the different solutions for implementing rounded corners. The flowchart below matches the conditions and solutions and gives the best solution for implementing rounded corners.

Summarize

This article mainly introduces the relevant content of mobile rendering principles. The article starts with an introduction to the basic knowledge related to rendering, and talks about the original data source required for rendering - bitmaps, and how the CPU and GPU work together to obtain bitmap data. Later, the relevant principles of mobile rendering are introduced in combination with the technical frameworks of iOS and Android. Finally, an in-depth analysis of off-screen rendering in iOS is given, and some existing solutions for rounded corner optimization are explained.

<<: Apple will truly achieve password-free login. How will it do it?

>>: iOS 16 adds Find My, Health, and Clock to the list of deletable apps