Big data has penetrated all walks of life. McKinsey said: "Data has penetrated into every industry and business function today and has become an important production factor." As data volumes keep growing and the requirements for data storage and retrieval keep rising, database technology has been pushed to the forefront along with big data. 51CTO interviewed Wang Tao, CTO of Sequoia Database, about data processing methods and technology selection in the era of big data.

Reporter: Could you introduce your previous work experience and the background of Sequoia Database?

Guest: I started out working on the DB2 relational database at IBM. In 2011 and 2012 the big data industry was booming, and we found that IBM's DB2 database did not fit where things were heading, so we developed a NoSQL database engine in North America. Later we brought it to China and commercialized it. Sequoia Database was established in 2012, and the first version was launched in 2013. We soon had our first customer, and our customers later spread across government, finance, telecommunications and other industries. By 2014 we had completed two rounds of financing, the Pre-A round and the A round.

Reporter: You mentioned your support for government, telecommunications, finance and other industries. How has this independently developed database performed in practice?

Guest: First of all, our NoSQL database is a stable one. We are not the first in the world. Hadoop abroad is very similar to us, and many of our interfaces are compatible with Hadoop. MongoDB has a very large market share overseas. Evaluations done abroad have found that Hadoop has great advantages in functionality and performance in certain scenarios. One characteristic of MongoDB is that it has many features, but a lot of them are impractical.
We have the advantage of being a latecomer: we can see the market demand clearly, launch products to meet it, and then keep updating them. The biggest difference between us and MongoDB is that we pay more attention to SQL for the enterprise market.

Reporter: You just mentioned Hadoop as a storage method. Every storage method has its own advantages and disadvantages. What advice do you have for developers on choosing technology for processing big data and combining it with Hadoop and Spark?

Guest: Oracle is no longer part of the discussion; everyone is talking about MySQL. Although many people use MySQL, it is not very friendly to application development or to operation and maintenance. In terms of performance, when users run large joins, data storms are likely to occur, with a lot of data being exchanged. This is very scary, and if it is not handled properly, very serious problems will follow. So some people proposed using NoSQL, a new generation of data structures.

The three most commonly used branches of NoSQL are key-value (KV), wide table, and document. KV has many uses and generally serves as a cache, Redis for example. What I want to talk about are the two major categories of real data storage: the wide table category and the document category.

The advantage of wide tables lies in column storage, though not column storage in the traditional sense; it is closer to column families. For example, if a record has 10,000 fields, they can be grouped into ten families of 1,000 fields each, where the 1,000 fields in a family represent logically similar things. Each family of 1,000 fields can be distributed independently across machines, and a search only needs to fetch the families it uses. In practice, however, people rarely use that many fields.

In my opinion, document-based databases are the closest to relational databases. Although Hadoop is rich in features, everyone assumes that it is a document-based database.
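The wide-table layout described above (10,000 fields grouped into ten families of 1,000) can be illustrated with a small sketch. This is not the API of any particular wide-table store: the function names, the family names `cf0`..`cf9`, the node names, and the round-robin placement are all assumptions made for illustration, and real systems group fields by logical similarity rather than by position.

```python
from typing import Dict, List


def split_into_families(fields: List[str], family_size: int) -> Dict[str, List[str]]:
    """Group field names into consecutive families of `family_size` fields.

    Assumption: fields are grouped by position; a real schema would group
    logically related fields together.
    """
    return {
        f"cf{start // family_size}": fields[start:start + family_size]
        for start in range(0, len(fields), family_size)
    }


def place_families(families: Dict[str, List[str]], nodes: List[str]) -> Dict[str, str]:
    """Assign each family to a node round-robin (hypothetical placement policy)."""
    return {name: nodes[i % len(nodes)] for i, name in enumerate(families)}


def families_needed(families: Dict[str, List[str]], wanted: List[str]) -> List[str]:
    """Return only the families that contain at least one requested field."""
    wanted_set = set(wanted)
    return [name for name, cols in families.items() if wanted_set & set(cols)]


# 10,000 fields split into ten families of 1,000, as in the interview.
fields = [f"f{i}" for i in range(10_000)]
families = split_into_families(fields, 1_000)
placement = place_families(families, ["node-a", "node-b", "node-c"])

# A query touching three fields only has to read the families that hold them.
touched = families_needed(families, ["f12", "f999", "f1000"])
print(touched, [placement[name] for name in touched])
```

The point of the sketch is the last step: a read that needs three fields touches two families out of ten, so the other eight never leave their machines.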
Many document databases now use row storage and generally support indexing on arbitrary fields. For example, we can index field A today and then index field B a few days later. In this way we can search on many different fields, unlike wide tables, which can only index on the key. In a telecommunications scenario, for example, if I need to look up records by calling number or by called number, I can build indexes on those fields in a document-based database and search through them.

Reporter: Enterprise data volumes keep growing, and the requirements for database expansion are very high. What advantages do NoSQL or Sequoia Database have here, and how do you deal with expansion?

Guest: When it comes to capacity expansion, traditional DB2 is the case most people know. When I was at IBM, a customer with 256 nodes needed to add 64 more, and IBM sent someone who spent a month on it. Non-relational databases now use various mechanisms so that when I add a new node, only the minimum amount of data has to move and the rest stays where it is, which makes expansion basically easy.

Reporter: You mentioned relational databases just now. What is the relationship between NoSQL and relational databases, and how does NoSQL stand relative to traditional databases? Will one replace the other?

Guest: I think the relationship between the two is neither coexistence nor replacement, but integration. SQL has its own value and scope of application, and its ability to store data seamlessly is still very good. So SQL will not be eliminated, but it will undergo a major structural change. There is no reason to replace SQL with NoSQL in traditional financial business, because the SQL data structure is very rigorous. On the other hand, that rigor slows application development and costs agility, which has clearly exposed its disadvantages in Internet business.
In fact, this is where NoSQL positions itself. The two will tend to integrate with each other in the future. NoSQL itself will not have an interface. I think so-called unstructured or semi-structured storage amounts to a part of structured storage; in a sense, unstructured storage can also meet many structured storage needs. Once the upper layer is improved, SQL can be introduced. We have also seen many products trying to borrow the concept of NoSQL. The two are becoming more and more alike, and they may merge one day.

Reporter: What drawbacks and problems do traditional databases have in supporting processing frameworks such as Hadoop and Spark? What advantages does NoSQL have in supporting them?

Guest: Hadoop is about elastic, horizontal expansion. The biggest problem with traditional relational databases is that they are not easy to expand; even Oracle cannot really be expanded. In that case, even if you expand the upper layer to 100 machines, much of what lies below sees no essential improvement, so however far you expand the upper layer, there is still a bottleneck underneath. NoSQL is distributed by nature, and so are Hadoop and Spark. The connector we developed lets Hadoop access local NoSQL data locally, so the two combine very closely.
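The expansion answer earlier in the interview says that adding a node to a non-relational database should move only a minimum amount of data, without naming the mechanism. Consistent hashing is one well-known technique with that property, and the sketch below uses it purely as an illustration; it is not presented as how Sequoia Database does it. The `HashRing` class, the node and key names, and the 50 ring points per node are assumptions, while the 256 + 64 node counts are borrowed from the DB2 anecdote.

```python
import bisect
import hashlib


def stable_hash(text: str) -> int:
    """Map a string to a large integer that is stable across runs."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring; each node owns many ring points."""

    def __init__(self, nodes, points_per_node=50):
        self.points_per_node = points_per_node
        self._ring = []  # sorted list of (ring position, node name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        """Insert the node's ring points, keeping the ring sorted."""
        for i in range(self.points_per_node):
            bisect.insort(self._ring, (stable_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        """Return the owner: first ring point at or after the key, wrapping."""
        idx = bisect.bisect_left(self._ring, (stable_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


old_nodes = [f"node-{i}" for i in range(256)]
new_nodes = [f"node-{i}" for i in range(256, 320)]
keys = [f"record-{i}" for i in range(10_000)]

ring = HashRing(old_nodes)
before = {k: ring.node_for(k) for k in keys}
for node in new_nodes:
    ring.add_node(node)
after = {k: ring.node_for(k) for k in keys}

# Keys whose owner changed are the only data that would have to move.
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

With this scheme a key changes owner only when one of the new nodes' ring points lands between the key and its old owner, so every moved key goes to a new node and the existing nodes exchange nothing among themselves. On average the moved share should be close to the new nodes' share of the cluster (64 of 320 here), compared with nearly everything moving under a naive `hash(key) % node_count` placement.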