When using Github, have you ever seen the following prompt?
This prompt says that there are a total of 4350078 objects in the remote code repository that need to be cloned. This is called "counting objects". Github needs to calculate the total number of objects that need to be cloned in real time. This process is very slow. According to Github, it takes 8 minutes to count a huge library like the Linux kernel! That is to say, after issuing the git clone command, you will wait for eight minutes before the actual data transfer begins. This is of course unbearable. The Github team has been trying to solve this problem. Later, they finally discovered a new algorithm, and now it only takes 3 milliseconds to count once! In order to understand this algorithm, you must first know what Git objects are. Simply put, objects are files, and there are three most important objects.
Every time you submit code, a commit object is generated, which contains the name of the corresponding current "directory object". The "directory object" stores the subdirectories and file information contained in the code root directory. Each subdirectory is another "directory object", and each file is a "file object" with specific file content. Therefore, "counting objects" means counting various commits, directories, files, etc. Both git clone and git fetch operations require counting objects because you need to know which object files to download. The original algorithm for counting objects is as follows.
The above process shows that "counting objects" is a file traversal algorithm. Changed objects will be counted one by one, which means a large number of file read operations. For large code bases, this process is very slow. The new algorithm that the Github team came up with is to create a Bitmap index, that is, to generate a binary value for each commit. Open the .git/objects/pack/ directory of your local Github repository, and you will see an index file and a data file, which are Bitmaps. Simply put, these two files index all objects in the current code base, and then use a binary value to represent these objects. The binary value has as many bits as there are objects. Its nth bit represents the nth object in the data file. Each commit will have a corresponding binary value, which represents all objects contained in the current snapshot. The binary bits corresponding to these objects are all 1, and the other binary bits are all 0. The advantage of this is that you don't need to read the commit object, just read the binary value to know which nodes are included in the current commit. Even better, you only need to do an XOR operation on the two binary values to know which bits (that is, which objects) have changed. Moreover, because new objects are always added after the existing binary bits, you only need to read the extra bits to know which objects are added to the current commit compared to the previous commit. In this way, "counting objects" becomes a binary value comparison operation, so the speed is extremely fast. For further introduction, please refer to the official documents "Bitmap Explanation" and "Bitmap Format". Currently, Github's production environment has deployed this algorithm, and users no longer have to wait for object counting. In addition, the Github team has merged it into Git, which means that all Git implementations can use the Bitmap function from now on, so there will definitely be more interesting uses in the future. (over) |
<<: Game Framework Construction - Setting the Vision
>>: 7 Top HTML5 Canvas Animations
The "Cocos 2015 Developer Conference (Spring...
Some time ago, Google caused an uproar by disclos...
As the 2022 National People’s Congress and the Ch...
It has been exactly four years since I graduated....
The failure of the Peregrine mission means that t...
The invitation registration mechanism is a functi...
Recently, Aion LX has ushered in a new OTA upgrad...
When it comes to whales and dolphins, many people...
Whether you are at work or at home recently In fa...
"Cassava helps you sleep. You can sleep for ...
⚠Calendar Girl Front Row Reminder⚠ This article i...
Hello everyone, I’m very happy to share with you....
Social data, search data, and multi-line product ...
Compiled by Zhou Shuyi and Pingsheng Microsoft cl...