When using Github, have you ever seen the following prompt?
This prompt says that there are a total of 4350078 objects in the remote code repository that need to be cloned. This is called "counting objects". Github needs to calculate the total number of objects that need to be cloned in real time. This process is very slow. According to Github, it takes 8 minutes to count a huge library like the Linux kernel! That is to say, after issuing the git clone command, you will wait for eight minutes before the actual data transfer begins. This is of course unbearable. The Github team has been trying to solve this problem. Later, they finally discovered a new algorithm, and now it only takes 3 milliseconds to count once! In order to understand this algorithm, you must first know what Git objects are. Simply put, objects are files, and there are three most important objects.
Every time you submit code, a commit object is generated, which contains the name of the corresponding current "directory object". The "directory object" stores the subdirectories and file information contained in the code root directory. Each subdirectory is another "directory object", and each file is a "file object" with specific file content. Therefore, "counting objects" means counting various commits, directories, files, etc. Both git clone and git fetch operations require counting objects because you need to know which object files to download. The original algorithm for counting objects is as follows.
The above process shows that "counting objects" is a file traversal algorithm. Changed objects will be counted one by one, which means a large number of file read operations. For large code bases, this process is very slow. The new algorithm that the Github team came up with is to create a Bitmap index, that is, to generate a binary value for each commit. Open the .git/objects/pack/ directory of your local Github repository, and you will see an index file and a data file, which are Bitmaps. Simply put, these two files index all objects in the current code base, and then use a binary value to represent these objects. The binary value has as many bits as there are objects. Its nth bit represents the nth object in the data file. Each commit will have a corresponding binary value, which represents all objects contained in the current snapshot. The binary bits corresponding to these objects are all 1, and the other binary bits are all 0. The advantage of this is that you don't need to read the commit object, just read the binary value to know which nodes are included in the current commit. Even better, you only need to do an XOR operation on the two binary values to know which bits (that is, which objects) have changed. Moreover, because new objects are always added after the existing binary bits, you only need to read the extra bits to know which objects are added to the current commit compared to the previous commit. In this way, "counting objects" becomes a binary value comparison operation, so the speed is extremely fast. For further introduction, please refer to the official documents "Bitmap Explanation" and "Bitmap Format". Currently, Github's production environment has deployed this algorithm, and users no longer have to wait for object counting. In addition, the Github team has merged it into Git, which means that all Git implementations can use the Bitmap function from now on, so there will definitely be more interesting uses in the future. (over) |
<<: Game Framework Construction - Setting the Vision
>>: 7 Top HTML5 Canvas Animations
E-commerce ERP is a pioneer of domestic SAAS ERP ...
01 What is orchid fragrance? Orchids, a species o...
When you enter a product, you are actually enterin...
Normal 0 10 pt 0 2 false false false EN-US ZH-CN ...
The latest unmanned live broadcast song guessing ...
[[226318]] The next generation of Android operati...
Every year at the beginning of the school year, t...
【Written at the end】 Some patients with myocardia...
Compiled by: Gong Zixin Potatoes are high in star...
Black Friday has always been considered the bigge...
Introduction The apps that were rejected this tim...
Cook finally came out. After years of suspicion, ...
Paid promotion ROI is the goal that every operati...
Douyin short video sales tutorial, earning 100,00...
Systemic light chain (AL) amyloidosis is a rare d...