[51CTO.com original article] Huang Dongxu is the author of Codis, the well-known open source distributed cache service, and the co-founder and CTO of PingCAP. A senior infrastructure engineer, he specializes in the design and implementation of distributed storage systems and is highly regarded as a technical expert among open source enthusiasts.

Even as the Internet industry booms, the boundaries of the database field remain blurred and uncertain, and he is still looking for a concrete, practical direction within it. In the world of databases, Huang Dongxu follows his own convictions. He believes that when traditional relational databases can no longer meet the needs of massive data processing and analysis, a new round of demand opens up. Yet the various compromise architectures, in-memory architectures, NoSQL systems, and other approaches fell short of his ideal solution: they were not elegant enough, and they rarely delivered both sound distributed transactions and elastic scaling.

Huang Dongxu can seem like a contradiction between strict rationality and sensibility. It was not until the end of 2012 that two papers published by Google gave shape to what he had been looking for. The papers described F1/Spanner, a massive relational database used internally at Google that combined the relational model with elastic scaling and global distribution, and that was already running at large scale in production. "If this can be realized, it will be a subversive change in the field of data storage," said Huang Dongxu, excited to see a solution that matched his ideal. PingCAP's TiDB was born on this basis.

Of course, every step forward takes tremendous effort. Before launching the TiDB project, Huang Dongxu first completed Codis, an open source distributed Redis cluster solution.
After the completion of this project, the team felt that although horizontal scaling of the cache had been solved, the underlying relational database (mainly MySQL) still had no elegant scaling solution. Apart from sharding databases and tables at the business layer, or falling back on compromises such as middleware, the industry had few other options. Some workloads can be migrated to NoSQL systems such as HBase or Cassandra (C*), but many cannot be migrated smoothly, and almost all of the application logic has to be rewritten. If sharding plus middleware is adopted instead, scaling and high availability bring significant extra operations and maintenance cost, and features such as cross-shard joins, subqueries, and cross-row transactions become unavailable.

As infrastructure software engineers, Huang Dongxu and his team did not want to pass this complexity on to the business layer. They began to re-examine the database as a whole, hoping to solve the MySQL scaling problem fundamentally rather than build yet another piece of middleware. "It would be a great feeling if I could create something completely new and make it productive one day!"

In 2012 and 2013, Huang Dongxu and his colleagues studied the series of papers Google published on its new generation of distributed databases, Spanner and F1, along with related academic work. By 2015 they felt they had largely thought through the technical issues and the architecture, so they decided to start over, full-time, and implement a new database from scratch. That database is today's protagonist: TiDB, the next-generation open source NewSQL database.

Of course, creating something is not yet the real beginning. It takes sustained investment and continual iteration to withstand the competition and scrutiny of the Internet. Only when developers and enterprises truly benefit from it does the real beginning arrive.
The overall architecture of TiDB closely follows the design of Google Spanner and F1 and is divided into two layers: TiDB and TiKV. TiDB corresponds to Google F1. It is a stateless SQL layer that is compatible with most MySQL syntax and exposes the MySQL network protocol. It parses the user's SQL statements, generates distributed query plans, translates them into underlying key-value operations, and sends those to TiKV. TiKV is where the data is actually stored and corresponds to Google Spanner. It is a distributed key-value database that supports elastic horizontal scaling, automatic disaster recovery and failover (high availability), and ACID cross-row transactions. It is worth mentioning that, unlike HBase or BigTable, TiKV does not rely on an underlying distributed file system, which gives it better performance and flexibility. This matters a great deal for online workloads.

▲ TiDB overall architecture

The team is idealistic, but it does not lose its bearings in the face of hard practical constraints. When choosing the development language for TiDB, they gave up Java and adopted Go. The whole project is divided into two layers: TiDB, the SQL layer, is developed in Go, and TiKV, the underlying distributed storage engine, is developed in Rust. The architecture is similar to FoundationDB, which also has a two-layer structure. FoundationDB's SQL layer is written in Java and its lower layer in C++, but FoundationDB was acquired by Apple last year. The choice of programming language did not involve much personal preference.
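To make the two-layer split described above more concrete, here is a minimal sketch of how a stateless SQL layer might turn one logical row insert into the key-value writes that a storage layer receives. The key layout, the JSON value encoding, and the names `row_key`, `index_key`, and `insert_to_kv` are assumptions made for illustration; they are not taken from the article and should not be read as TiDB's actual encoding.

```python
import json
from typing import Dict, List, Tuple


def row_key(table_id: int, row_id: int) -> bytes:
    # Hypothetical layout: table prefix + zero-padded row id, so rows of one
    # table sort next to each other in an ordered key-value store.
    return f"t{table_id}_r{row_id:08d}".encode()


def index_key(table_id: int, index_id: int, value: str, row_id: int) -> bytes:
    # Hypothetical layout for a secondary index entry: the indexed value is
    # part of the key, and the row id makes the key unique.
    return f"t{table_id}_i{index_id}_{value}_{row_id:08d}".encode()


def insert_to_kv(
    table_id: int,
    row_id: int,
    row: Dict[str, str],
    indexed_columns: Dict[str, int],
) -> List[Tuple[bytes, bytes]]:
    """Translate one logical INSERT into a list of (key, value) puts."""
    # One put for the row itself (value is the serialized row).
    ops = [(row_key(table_id, row_id), json.dumps(row, sort_keys=True).encode())]
    # One extra put per secondary index; the value can stay empty because
    # the row id is already encoded in the key.
    for column, index_id in indexed_columns.items():
        ops.append((index_key(table_id, index_id, row[column], row_id), b""))
    return ops


if __name__ == "__main__":
    # Roughly: INSERT INTO users (name, city) VALUES ('alice', 'beijing')
    for key, value in insert_to_kv(1, 7, {"name": "alice", "city": "beijing"}, {"name": 1}):
        print(key, value)
```

The point of the sketch is only that the SQL layer can stay stateless: everything it needs to persist ends up as ordinary key-value pairs, and because a single statement can touch several keys, the storage layer has to offer cross-row transactions.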
Go was chosen over Java for the SQL layer for two reasons. First, the team's background in Go makes development more efficient, and Go has decent performance, especially for highly concurrent programs, where tools such as goroutines and channels make it possible to write correct programs with less code. Second, many packages in the standard library are very friendly to network programming, which is important for a distributed system.

The underlying storage engine is a different matter. It has very demanding performance requirements, and Go is, after all, a language with a GC and a runtime. There were not many options for the TiKV layer. In the past the choice was essentially limited to C or C++. However, as Rust matured over the past two years, and after a long period of deliberation and many experiments, the team finally chose Rust. Rust is a statically typed language that aims to replace C++. Its most distinctive feature is that it uses strict compile-time rules to prevent developers from writing programs with memory-safety bugs and data races. It catches many problems at compile time, so there is no need to pay for a GC at runtime, which preserves high performance.

Writing safe programs is a major pain point in C++. C++11 brought many improvements, but the language carries heavy historical baggage and third-party libraries vary widely in quality. The most important reason, however, was simply that the team does not have a deep C++ background, so in the end they gave up C++11 and chose Rust. Rust is not only safe and fast, it also has a more modern syntax and higher development efficiency, along with a very complete package management tool (Cargo). It lets the team write high-performance, safe programs without losing much development efficiency compared with Go. So far it looks like the right choice.
As one of the largest open source projects in the Rust community, TiKV has also received strong support from the official Rust language team. Huang Dongxu said that the Rust team gives high priority to developing, or promoting in the community, some of the third-party libraries they need. In addition, Rust has already reached 1.0 and its syntax has long been stable. It is a very promising systems programming language.

With Google's papers pointing the way, Huang Dongxu still had a long road ahead. He believed that only by staying focused could he avoid distractions. After continued exploration, he finally settled on a way to implement the transaction model.

TiDB's transaction model is based on Google's Percolator. The paper, published in 2010, describes how Google built an ACID cross-row transaction framework on top of BigTable to keep index updates consistent. The core of the algorithm is two-phase commit. The problem with traditional distributed two-phase commit is that a single transaction manager cannot scale out and becomes the bottleneck of the whole system. Percolator uses a two-level lock mechanism (a primary lock and secondary locks) to implement a decentralized transaction manager, which greatly improves the scalability of the whole system.

▲ Google Percolator internal implementation

TiDB applies this model to the underlying storage engine and has made many engineering optimizations. As examples, Huang Dongxu mentioned that the throughput of the timestamp service has been greatly improved through batching and pipelining, and that using Raft + RocksDB in place of the original BigTable gives better performance. TiDB also uses optimistic transactions in pursuit of higher throughput, although this is a property of the Percolator algorithm itself rather than a separate optimization.

TiDB vs. NoSQL

Compared with NoSQL systems, the biggest feature of TiDB is that its programming interface is SQL.
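Returning briefly to the transaction model described above, the following is a simplified, single-process sketch of Percolator-style optimistic two-phase commit. It is an illustration under stated assumptions, not TiDB's or Google's implementation: the `Store` and `Txn` classes, the in-memory dictionaries, and the counter-based timestamp source are all hypothetical, and crash recovery (where the primary lock serves as the commit point that decides the fate of secondary locks) is left out.

```python
from itertools import count
from typing import Dict, List, Optional, Tuple


class Store:
    """In-memory stand-in for the storage layer (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.data: Dict[Tuple[str, int], str] = {}   # (key, start_ts) -> value
        self.lock: Dict[str, Tuple[str, int]] = {}   # key -> (primary key, start_ts)
        self.write: Dict[Tuple[str, int], int] = {}  # (key, commit_ts) -> start_ts
        self._ts = count(1)                          # stand-in for a timestamp service

    def next_ts(self) -> int:
        return next(self._ts)

    def get(self, key: str, ts: int) -> Optional[str]:
        # A pending lock older than the read timestamp blocks the read.
        if key in self.lock and self.lock[key][1] <= ts:
            raise RuntimeError("key is locked by a pending transaction")
        versions = [(c, s) for (k, c), s in self.write.items() if k == key and c <= ts]
        if not versions:
            return None
        _, start_ts = max(versions)  # newest committed version visible at ts
        return self.data[(key, start_ts)]


class Txn:
    """Client-driven transaction: there is no central transaction manager."""

    def __init__(self, store: Store) -> None:
        self.store = store
        self.start_ts = store.next_ts()
        self.buffer: Dict[str, str] = {}  # writes are buffered until commit

    def put(self, key: str, value: str) -> None:
        self.buffer[key] = value

    def _prewrite(self, key: str, primary: str) -> bool:
        s = self.store
        if key in s.lock:
            return False  # another transaction holds a lock on this key
        if any(k == key and c >= self.start_ts for (k, c) in s.write):
            return False  # someone committed after we started: write conflict
        s.data[(key, self.start_ts)] = self.buffer[key]
        s.lock[key] = (primary, self.start_ts)  # secondary locks point at the primary
        return True

    def commit(self) -> bool:
        if not self.buffer:
            return True
        keys: List[str] = sorted(self.buffer)
        primary = keys[0]
        locked: List[str] = []
        # Phase 1: prewrite every key, rolling back on the first conflict.
        for key in keys:
            if not self._prewrite(key, primary):
                for k in locked:
                    del self.store.lock[k]
                    del self.store.data[(k, self.start_ts)]
                return False
            locked.append(key)
        # Phase 2: commit the primary first, then the secondaries.
        commit_ts = self.store.next_ts()
        for key in keys:
            self.store.write[(key, commit_ts)] = self.start_ts
            del self.store.lock[key]
        return True


if __name__ == "__main__":
    store = Store()
    t1, t2 = Txn(store), Txn(store)
    t1.put("a", "from t1")
    t2.put("a", "from t2")
    print(t1.commit(), t2.commit())  # the second commit detects the conflict
    print(store.get("a", store.next_ts()))
```

Because conflicts are only detected at commit time, transactions that do not collide never wait on each other, which is the throughput argument for the optimistic approach. The cost is that a conflicting transaction has to be retried by the caller.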
SQL is a more flexible way for developers to work with a database, and TiDB is highly compatible with MySQL: an application that runs on MySQL can be switched to TiDB with almost no code changes. While supporting SQL, TiDB keeps the elastic scaling of systems such as HBase. The business layer no longer needs to worry about database capacity, think about sharding, or invest heavily in operations and maintenance as in the past. Scaling out only requires adding machines. Storage node failures are transparent to the application, and the database can repair itself to ensure that no data is lost.

The same comparison applies to MongoDB. More importantly, users do not need to change their existing habits or programs. To define the future form of cloud databases, TiDB is designed to scale to more than 1,000 physical nodes in a single cluster, support petabyte-level capacity, and store more than one trillion rows of structured data. The design and technology choices made under this premise are very different from MongoDB's. With large amounts of data, TiDB performs more stably and scales more smoothly.

TiDB's SQL optimizer is a query optimizer designed for distributed storage, implemented from scratch by Huang Dongxu and his team. It draws on many recent query optimization techniques from academia and on ideas from distributed computing frameworks. According to the team, it performs much better than MySQL on complex queries while remaining MySQL-compatible.

Solving the pain points of traditional databases

Any enterprise that uses a traditional stand-alone relational database may run into single points of failure and single-node capacity limits as its data keeps growing or as availability requirements become strict. This problem has been particularly prominent in the Internet industry in recent years.
At present there are no solutions other than the sharding and middleware mentioned above, and both are painful to live with. TiDB uses the Raft algorithm to scale the storage layer horizontally and adds distributed transactions on top. It builds a complete SQL query layer that supports complex queries such as JOINs and subqueries without giving up ACID transactions. It also exposes the MySQL interface to the outside world, so users can solve the problem of storing large amounts of structured data without intrusive changes to their applications. The gap between traditional industries and the Internet industry is about three years and is steadily shrinking, and as TiDB has become more stable recently, more and more Internet users have adopted it. The team believes it will become a new mainstream choice for scaling databases.

Application Scenarios of TiDB

The target scenario is typical OLTP, which covers a very wide range of enterprises. Anyone who runs into scalability problems with a relational database, needs strongly consistent transactions, or needs strong consistency and high availability across multiple data centers is a typical TiDB user. TiDB has very good MySQL compatibility, so it is a strong choice for users or enterprises that currently run MySQL and want a more elegant horizontal scaling solution. In fact, most users running TiDB in production today are Internet companies that came from MySQL. TiDB does not currently support stored procedures or views, so the prerequisite is that the existing application does not use them.

From the first day of the project, the goal was for TiDB to be as compatible with MySQL as possible. Huang Dongxu acknowledged that MySQL is a stand-alone database and that its query optimizer is designed for stand-alone scenarios. It is very difficult to build a distributed database on that architecture.
So they decided to take a more thorough approach and rewrite the entire SQL parser and query optimization engine. Although this seemed almost impossible, they felt it was actually the easier path, with a cleaner design and better control over complexity. The benefit of aiming for full MySQL compatibility is not only user-friendliness. More importantly, the project can draw on the large body of tests from the MySQL community. For a database product, building it is not the hard part. Proving that it is correct matters more. Huang Dongxu and his team have collected tens of millions of test cases from the MySQL community to verify the correctness of each module and its consistency with MySQL behavior.

The extent to which the TiDB project is open source

The TiDB project is 100% open source and aims to be a top-tier open source project by international standards. From the GitHub repo alone it is hard to tell that the project is led by Chinese engineers. All commit history, collaboration, roadmaps, issue tracking, Chinese and English documentation, and code reviews are public.

The project has reached Beta 4. According to feedback from production users, the main features are essentially complete and stable. Huang Dongxu said that the next priorities are to keep optimizing performance, keep improving stability, and keep testing on larger and harsher cluster environments. Peripheral tools, deployment tutorials, and further design documents are also being added continuously.

The Future of TiDB

In the longer term, everything will run in the cloud, including databases. With massive data and large-scale clusters as the premise, there is still a lot to explore in the design and theory of relational databases.
At this cluster scale, any operations and maintenance work that relies on manual labor will fail, because people do not scale. Databases need to be able to repair and scale themselves. Only then can the computing resources of the cluster be used well. This is why the TiDB team positions the product as a cloud-native database. They are doing a great deal of basic research and preparation for the future, including exploratory work on combining Kubernetes with distributed databases.

Huang Dongxu hopes that TiDB will define the next generation of relational databases, so that developers can truly focus on their own business and no longer need to worry about how large the database is, how high the concurrency may be, when it needs to be scaled out, or which sharding key to choose. These concerns should all be hidden behind a very simple SQL interface. TiDB is off to a very good start, and the team hopes that with the next generation of relational databases everyone will feel the productivity this technology brings.

Open source project address: https://github.com/pingcap/tidb

PS: Huang Dongxu will attend the WOT2016 Big Data Technology Summit on November 26th and will give a talk titled "NewSQL in action: Patterns and Tools" in the NoSQL practical technology session. Please stay tuned. WOT2016 Big Data Technology Summit official website: http://wot..com/

[51CTO original article, please indicate the original author and source as 51CTO.com when reprinting on partner sites]