A brief talk about WebGPU, the web-based GPU API

Part 01

WebGPU R&D Background  

In the early days of GPU-accelerated Web development, developers mostly used the WebGL API, released in 2011, for graphics drawing. This API is based on OpenGL ES and was for a time the only choice for low-level GPU graphics on the Web. Its programmable shaders gave it a performance advantage over Canvas2D for certain drawing tasks. The API can only be used after obtaining a WebGL context from a canvas element. Its state-machine-style design, centered on internal global state, has long been criticized by developers: to guarantee correct drawing results, they must carefully construct the sequence of procedural API calls and manage the setting and restoring of state, which also introduces a certain amount of performance overhead.

As technology has advanced, the GPU is no longer exclusive to graphics rendering; it shines in fields such as the metaverse, machine learning, big data, and neural networks. With the growing demand for computing power, the role of the GPU has become more and more important. At the same time, a new generation of graphics APIs (Vulkan, Metal, DirectX 12) has appeared on the desktop. They adopt object-oriented designs that give developers lower-level interface access, more control over the GPU, flexible API calling patterns, and general-purpose parallel computing capabilities, allowing developers to extract the maximum performance from the GPU.

The Web also needs these capabilities. Built on the design concepts of modern graphics APIs, WebGPU came into being. It is not an upgrade of WebGL: WebGPU has its own abstract design and does not directly wrap any single native graphics API. The following is a schematic diagram of the WebGPU architecture.

Part 02

Important concepts in WebGPU  

2.1 Adapters and Devices

When you start reading the WebGPU specification, the first concepts you encounter are the adapter and the device. The following figure shows the abstraction from the physical device (GPU) to the logical device.

The adapter, GPUAdapter, corresponds one-to-one to a physical GPU; a computer may have multiple GPU devices (e.g. integrated and discrete graphics). The adapter acts as a translation layer linking WebGPU to the native graphics API. A GPUAdapter can be obtained as follows.
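A minimal sketch using the standard navigator.gpu.requestAdapter call (the powerPreference hint is optional):

```javascript
// Request an adapter; resolves to null if WebGPU is unavailable.
const adapter = await navigator.gpu.requestAdapter({
  powerPreference: 'high-performance', // hint: prefer the discrete GPU
});
if (!adapter) throw new Error('WebGPU is not supported in this browser.');
```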

The device here, GPUDevice, is a logical device and does not correspond directly to a physical GPU. The GPU is a shared resource: a browser may run multiple Web applications, each of which can use the GPU independently, so an agent-like role is needed to mediate their access to GPU functionality. That is the role of the WebGPU device. The GPUDevice object is the entry point for most of the APIs used later; in a sense it resembles the WebGL context, but it is not tied to a canvas. A GPUDevice is obtained as follows.
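A minimal sketch of requesting the logical device from the adapter obtained above:

```javascript
// Request a logical device (and its default queue) from the adapter.
const device = await adapter.requestDevice();
```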

2.2 Shaders

A shader is a program that runs on the GPU. Modern GPU rendering is implemented as a pipeline (a programmable logic pipeline), and shader code executes at certain programmable stages of that pipeline. If you have worked with WebGL, you may know about vertex shaders and fragment shaders. The application organizes data resources and passes them to the shader in the form of variables (uniform/attribute); the shader runs and hands its results to the next stage for processing.

Shaders are important tools for developers to control the GPU. Complex calculations, scene effects, image processing, and more can all be handled by shader programs. WebGPU not only provides vertex shaders and fragment shaders, but also supports general-purpose parallel computation via compute shaders, which are carried by the WebGPU compute pipeline (the pipeline concept is introduced below) and offer more computing capability than WebGL. WebGL implements shader code in GLSL (the language used by OpenGL), while WebGPU has a redesigned shader language, WGSL. The following is an example of creating shader code and the corresponding modules (GPUShaderModule).
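Below is a sketch of two WGSL shader modules, one vertex and one fragment. The variable names (uniforms, uTexture, uSampler, aPosition, aUv) are the ones referenced in the resource section that follows; the single mvpMatrix uniform field is an illustrative assumption.

```javascript
// Vertex shader module: transforms positions and forwards uv coordinates.
const vertexModule = device.createShaderModule({
  code: /* wgsl */ `
    struct Uniforms {
      mvpMatrix : mat4x4<f32>,
    };
    @group(0) @binding(0) var<uniform> uniforms : Uniforms;

    struct VertexOut {
      @builtin(position) position : vec4<f32>,
      @location(0) vUv : vec2<f32>,
    };

    @vertex
    fn main(@location(0) aPosition : vec3<f32>,
            @location(1) aUv : vec2<f32>) -> VertexOut {
      var out : VertexOut;
      out.position = uniforms.mvpMatrix * vec4<f32>(aPosition, 1.0);
      out.vUv = aUv;
      return out;
    }
  `,
});

// Fragment shader module: samples the texture at the interpolated uv.
const fragmentModule = device.createShaderModule({
  code: /* wgsl */ `
    @group(0) @binding(1) var uSampler : sampler;
    @group(0) @binding(2) var uTexture : texture_2d<f32>;

    @fragment
    fn main(@location(0) vUv : vec2<f32>) -> @location(0) vec4<f32> {
      return textureSample(uTexture, uSampler, vUv);
    }
  `,
});
```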

2.3 Resources (buffers, textures, samplers)

In the shader example above, several variables are defined: uniforms, uTexture, uSampler, aPosition, aUv, and so on. The values of these variables come from the external application's data resources. The data is stored in video memory and ultimately passed to the shader program to produce the corresponding results. Data resources can be roughly divided into four categories: vertex attribute data, shader variable (uniform buffer) data, texture data, and samplers.

Vertex attribute data mainly stores vertex position coordinates, normal vectors, texture coordinates (for sampling textures), and so on, which are necessary for basic drawing. Shader variable data is the general data the shader program needs to run, such as affine transformation matrices, scene lighting parameters, and material parameters. Texture data mostly stores image resources and is often used for mapping effects when drawing. A sampler is a special resource that specifies how a texture is addressed and filtered, e.g. magnification and minification filtering, anisotropic filtering, and mipmap filtering. Vertex attribute data and shader variable data map to GPUBuffer, namely the vertex buffer object (VBO) and uniform buffer object (UBO); texture data corresponds to GPUTexture; and the sampler is a GPUSampler object. All three object types are created through GPUDevice. The following are examples of creating each type of resource.
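The sketches below show one way to create each resource type; vertexData (a Float32Array) and image (a loaded ImageBitmap) are assumed to exist, and the sizes and formats are illustrative.

```javascript
// Vertex buffer (VBO): created mapped so the CPU can write into it directly.
const vertexBuffer = device.createBuffer({
  size: vertexData.byteLength,
  usage: GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST,
  mappedAtCreation: true,
});
new Float32Array(vertexBuffer.getMappedRange()).set(vertexData);
vertexBuffer.unmap(); // end the mapping; the buffer is now usable by the GPU

// Uniform buffer (UBO) backing the `uniforms` variable in the shader.
const uniformBuffer = device.createBuffer({
  size: 64, // one mat4x4<f32> = 16 floats * 4 bytes
  usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
});

// Texture holding the image data.
const texture = device.createTexture({
  size: [image.width, image.height],
  format: 'rgba8unorm',
  usage: GPUTextureUsage.TEXTURE_BINDING |
         GPUTextureUsage.COPY_DST |
         GPUTextureUsage.RENDER_ATTACHMENT,
});
device.queue.copyExternalImageToTexture(
  { source: image },
  { texture },
  [image.width, image.height],
);

// Sampler describing filtering and addressing behavior.
const sampler = device.createSampler({
  magFilter: 'linear',
  minFilter: 'linear',
});
```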

GPUBuffer creation uses the buffer-mapping mechanism: while a region of video memory is mapped, the CPU can access it. In the example above, mappedAtCreation is set to true when creating the GPUBuffer so that it starts out mapped, and the mapping is ended (unmap) after the data has been written.

2.4 Binding Group

In the examples above we created GPUBuffer objects for vertex attributes, a GPUBuffer object for uniform variables, a GPUTexture object for image resources, and a sampler object. How the vertex-attribute GPUBuffer is passed to the GPU is explained in the pipeline and command-encoding sections below. The other three resources (shader variables, textures, and samplers) must be submitted to the GPU in an efficient way. For this purpose WebGPU introduces the concept of binding groups, i.e. GPUBindGroup: a data container that groups data resources and passes them to the shader program, allowing data to be organized and allocated efficiently. This grouped form of data organization reduces the number of CPU-GPU communications, thereby improving performance; it also lets shaders with different behavior share the same group of resources, enabling resource reuse. The following figure shows the different forms of data organization and transmission in WebGL and WebGPU.

As the figure shows, the WebGL API is designed around setting internal global state: resources are bound to binding points one by one through API calls, each of which essentially mutates that global state. WebGPU instead puts resource data into a container and sends it to the GPU through command submission (see the section on encoders and queues). Creating a GPUBindGroup requires a corresponding descriptor, whose structure is as follows.
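Simplified from the WebGPU specification, the descriptor looks roughly like this:

```javascript
// GPUBindGroupDescriptor (simplified):
// {
//   layout:  GPUBindGroupLayout,        // the layout this group conforms to
//   entries: [                          // one GPUBindGroupEntry per resource
//     {
//       binding:  Number,               // matches @binding(n) in the shader
//       resource: GPUBufferBinding | GPUSampler | GPUTextureView,
//     },
//   ],
// }
```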

The binding group has a corresponding layout (GPUBindGroupLayout). The layout describes, for the shader program, the type of each resource, the group it belongs to, its binding point (binding), and which shader stages may see it (visibility). Looking closely at the shader examples above, you will find declarations such as @group(0) @binding(0), meaning the resource is bound to binding point 0 of group 0. The binding layout must be filled in to match the declarations in the shader program. Each GPUBindGroupEntry object designates one binding slot, and a resource created through WebGPU is attached to that slot (specified in the resource field). The following is a simple example of GPUBindGroup creation: we package the previously created GPUBuffer, sampler, and texture objects into one binding group object.
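A sketch of creating the layout and bind group for the resources above; the visibility flags assume the uniform buffer is read in the vertex stage and the texture/sampler in the fragment stage, matching the earlier shader code.

```javascript
// Layout: declares what lives at each binding point and who can see it.
const bindGroupLayout = device.createBindGroupLayout({
  entries: [
    { binding: 0, visibility: GPUShaderStage.VERTEX, buffer: { type: 'uniform' } },
    { binding: 1, visibility: GPUShaderStage.FRAGMENT, sampler: {} },
    { binding: 2, visibility: GPUShaderStage.FRAGMENT, texture: {} },
  ],
});

// Bind group: attaches the actual resources to the binding slots of group 0.
const bindGroup = device.createBindGroup({
  layout: bindGroupLayout,
  entries: [
    { binding: 0, resource: { buffer: uniformBuffer } },
    { binding: 1, resource: sampler },
    { binding: 2, resource: texture.createView() },
  ],
});
```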

2.5 Pipeline

After creating the shader modules and preparing the data resources, the next important task is building the pipeline. When most developers start learning graphics rendering, the first concept they encounter is the rendering pipeline, an important mechanism of modern image rendering. Yet this concept is not reflected in the WebGL API design; its fragmented API organization makes it hard for beginners to connect each step to the GPU pipeline. WebGL requires developers to organize the application's execution flow themselves, which is why you see APIs such as gl.bindVertexArray, gl.bindBuffer, gl.bindTexture, and gl.useProgram: different resources or states are bound as needed to draw different objects or effects. WebGPU pipelines are divided into render pipelines and compute pipelines.

As the name implies, the render pipeline (GPURenderPipeline) is a pipeline used for drawing. Running this pipeline ultimately produces a 2D image, which can be displayed on screen or rendered to a framebuffer. Creating a GPURenderPipeline requires a corresponding descriptor, whose structure is as follows.
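Simplified from the specification, the descriptor is structured roughly as follows:

```javascript
// GPURenderPipelineDescriptor (simplified):
// {
//   layout:       GPUPipelineLayout | 'auto',
//   vertex:       GPUVertexState,       // vertex stage + vertex buffer layouts
//   primitive:    GPUPrimitiveState,    // primitive assembly / topology
//   depthStencil: GPUDepthStencilState, // optional depth & stencil tests
//   multisample:  GPUMultisampleState,  // anti-aliasing sample configuration
//   fragment:     GPUFragmentState,     // fragment stage + color targets
// }
```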


The GPUVertexState and GPUFragmentState fields represent the vertex-shader and fragment-shader programmable stages respectively. GPUPrimitiveState specifies primitive assembly, i.e. which primitive type is used during rasterization. GPUDepthStencilState describes the depth/stencil test configuration. GPUMultisampleState specifies multisampling, used to reduce aliasing. The following is an example of creating a GPURenderPipeline.
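A sketch of render pipeline creation using the two shader modules and the bind group layout from earlier; the vertex buffer layout assumes interleaved vec3 position + vec2 uv data.

```javascript
const renderPipeline = device.createRenderPipeline({
  layout: device.createPipelineLayout({ bindGroupLayouts: [bindGroupLayout] }),
  vertex: {
    module: vertexModule,
    entryPoint: 'main',
    buffers: [{
      arrayStride: 5 * 4, // 3 floats position + 2 floats uv per vertex
      attributes: [
        { shaderLocation: 0, offset: 0,     format: 'float32x3' }, // aPosition
        { shaderLocation: 1, offset: 3 * 4, format: 'float32x2' }, // aUv
      ],
    }],
  },
  fragment: {
    module: fragmentModule,
    entryPoint: 'main',
    targets: [{ format: navigator.gpu.getPreferredCanvasFormat() }],
  },
  primitive: { topology: 'triangle-list' },
});
```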


The example above shows the two shader modules generated earlier being configured into the render pipeline, and it also describes the layout of the vertex attributes (mentioned in the resources section) for the shader. In the vertex shader there are two declarations, @location(0) aPosition and @location(1) aUv, which represent the incoming vertex position and uv coordinate attributes respectively. @location(0) and @location(1) correspond to the shaderLocation values in the pipeline configuration.

In most cases WebGL is used purely as a graphics API and rarely for anything else, such as computation. The compute pipeline is what gives WebGPU its "computing power". It is not part of the traditional rendering pipeline; it is used for general-purpose GPU parallel computing, and the final results are stored in a buffer that can hold data of any type. The compute pipeline has only one stage, the compute stage. Creating a GPUComputePipeline requires a corresponding descriptor, whose structure is as follows.
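Simplified from the specification:

```javascript
// GPUComputePipelineDescriptor (simplified):
// {
//   layout:  GPUPipelineLayout | 'auto',
//   compute: GPUProgrammableStage, // { module: GPUShaderModule, entryPoint: '...' }
// }
```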

GPUProgrammableStage indicates a programmable stage, similar to GPUVertexState and GPUFragmentState. The vertex shader is invoked once per vertex and the fragment shader once per pixel, whereas the compute shader is invoked according to the work items defined by the developer, with each work item corresponding to one thread. Work items are partitioned into workgroups: groups of threads (thread blocks) that can share memory, communicate, and coordinate with one another. In WebGPU the workgroup is modeled as a three-dimensional grid, as shown in the figure below.

Each smallest cube (black edges) can be regarded as a work item, and multiple work items form a workgroup (red dashed edges). In compute shader code you will see declarations such as @workgroup_size(x, y, z), which tell the GPU how large this compute shader's workgroup is. The choice of workgroup size (workgroup_size) usually depends on the semantics of the work-item coordinates. The following is a simple GPUComputePipeline creation example.
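A sketch of a compute pipeline for the grayscale-histogram example discussed below; srcTexture (the source image texture) and the 16×16 workgroup size are illustrative assumptions.

```javascript
// Compute shader: each invocation handles one pixel and atomically
// increments the histogram bin for that pixel's gray level.
const histogramModule = device.createShaderModule({
  code: /* wgsl */ `
    @group(0) @binding(0) var srcTexture : texture_2d<f32>;
    @group(0) @binding(1) var<storage, read_write> bins : array<atomic<u32>, 256>;

    @compute @workgroup_size(16, 16)
    fn main(@builtin(global_invocation_id) id : vec3<u32>) {
      let size = textureDimensions(srcTexture);
      if (id.x >= size.x || id.y >= size.y) { return; }
      let color = textureLoad(srcTexture, vec2<i32>(id.xy), 0);
      // Rec. 709 luma, quantized to a bin index in [0, 255].
      let gray = dot(color.rgb, vec3<f32>(0.2126, 0.7152, 0.0722));
      atomicAdd(&bins[u32(gray * 255.0)], 1u);
    }
  `,
});

const computePipeline = device.createComputePipeline({
  layout: 'auto', // derive the bind group layout from the shader
  compute: { module: histogramModule, entryPoint: 'main' },
});
```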

This is a simple example of computing an image's grayscale histogram. Thanks to the GPU's parallel architecture, we no longer need to traverse the image pixels serially, which greatly speeds up the calculation.

2.6 Command Encoding and Queuing

The work above can be regarded as the preparation stage, mainly covering data preparation and pipeline construction. The final drawing or computation is carried out in the form of commands and queues. The command encoder (GPUCommandEncoder) has two main common functions: creating pass encoders and copying buffer resources (GPUBuffer/GPUTexture). A GPUCommandEncoder is created from the device object, as follows:
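```javascript
// Create a command encoder from the logical device.
const commandEncoder = device.createCommandEncoder();
```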

WebGPU passes are divided into render passes and compute passes, corresponding to the render pipeline and the compute pipeline. The two kinds of pass objects are created and begun through the corresponding methods on the GPUCommandEncoder object (beginRenderPass/beginComputePass), each combined with its own descriptor, yielding the pass encoder objects GPURenderPassEncoder/GPUComputePassEncoder. This kind of encoder is an abstraction in the WebGPU API design and a replacement for WebGL's global state setting. Through the encoder object you can set the required pipeline, binding groups, and vertex attribute buffers, and call the draw/dispatch functions for drawing or computation. The following is an example of using the encoder objects.
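A sketch of recording both kinds of pass; context (the canvas's 'webgpu' context configured for this device), histogramBindGroup, vertexCount, and the image width/height are assumed from earlier setup.

```javascript
// Render pass: draw into the canvas's current texture.
const renderPass = commandEncoder.beginRenderPass({
  colorAttachments: [{
    view: context.getCurrentTexture().createView(),
    clearValue: { r: 0, g: 0, b: 0, a: 1 },
    loadOp: 'clear',
    storeOp: 'store',
  }],
});
renderPass.setPipeline(renderPipeline);
renderPass.setBindGroup(0, bindGroup);
renderPass.setVertexBuffer(0, vertexBuffer);
renderPass.draw(vertexCount);
renderPass.end();

// Compute pass: dispatch enough 16x16 workgroups to cover the image.
const computePass = commandEncoder.beginComputePass();
computePass.setPipeline(computePipeline);
computePass.setBindGroup(0, histogramBindGroup);
computePass.dispatchWorkgroups(Math.ceil(width / 16), Math.ceil(height / 16));
computePass.end();
```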

After the finish function is called, the GPUCommandEncoder object yields a command buffer object (GPUCommandBuffer), which stores the GPU commands. These commands are submitted through the command queue (GPUQueue), as follows:
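```javascript
// finish() yields a GPUCommandBuffer; submitting it hands the recorded
// commands to the device's default queue for execution.
const commandBuffer = commandEncoder.finish();
device.queue.submit([commandBuffer]);
```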

Part 03

Conclusion  

As a brand-new API, WebGPU injects new vitality into Web application development. It advances the Web from graphics rendering to general-purpose parallel computing, making the GPU an important player in Web applications and a key to building high-performance applications in the future.
