AI is advancing rapidly: Is China ready?

On May 30, at the results release conference of the 2023 Zhongguancun Forum, the "Beijing Implementation Plan for Accelerating the Construction of an Artificial Intelligence Innovation Source with Global Influence (2023-2025)" was officially released. The plan calls for supporting innovation entities in pursuing breakthroughs in technologies such as distributed, efficient deep learning frameworks and new infrastructure for large models, and for vigorously promoting innovation in large-model technologies.

The industry sees this as further evidence that China will vigorously promote the development of large models. Indeed, from central ministries to provinces and cities, policy support for developing AI technology and seizing the large-model opportunity has kept increasing; both the pace of policy issuance and the strategic weight attached to it are striking.

There is reason to believe that China can achieve rapid advances in AI with large models as the breakthrough point. Having launched its New Generation Artificial Intelligence Development Plan in 2017, China now faces a window of opportunity to build on that foundation and push the AI industry into full bloom.

We all know that seizing AI's development opportunities requires both technological breakthroughs and infrastructure construction. When discussing the AI industry's infrastructure, people usually mention AI chips, deep learning frameworks, and pre-trained large models, but often overlook another key issue: large models will bring enormous data pressure. Data storage is also a pillar of AI development.

ChatGPT triggered this round of AI explosion, and the data problems that the large-scale application of large models will bring are already visible in ChatGPT itself.

Facing this impending pressure, is China ready?

The data challenges of the AI boom, seen through ChatGPT

Since Google released BERT in 2018, the industry has moved down the road of pre-trained large models. Large models are characterized by the enormous scale of their training data and parameters, which poses severe challenges to storage; this is clearly demonstrated by ChatGPT.

The "large" in pre-trained large models is reflected in deep networks with many layers and connections, complex parameters, and training datasets of richer volume and more varied type. When deep learning was born, mainstream models had only a few million parameters; by the time BERT was released, parameter counts exceeded 100 million, pushing deep learning into the large-model stage. In the ChatGPT era, mainstream models already have hundreds of billions of parameters, and the industry has even begun planning trillion-parameter models. In a few years, model parameter counts have grown by a factor of thousands, and storing such enormous data and models has become the first major test storage faces in the AI boom.
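A back-of-the-envelope calculation shows why parameter counts alone already pressure storage. The figures below are illustrative assumptions (a GPT-3-scale model stored in fp16), not numbers from the article:

```python
# Rough estimate: storage needed for the weights of one model checkpoint.
# Assumes 175 billion parameters (GPT-3 scale) stored in fp16 (2 bytes/param).
params = 175e9
bytes_per_param = 2            # fp16 weights
weight_bytes = params * bytes_per_param

print(f"{weight_bytes / 1e9:.0f} GB")  # 350 GB for a single checkpoint's weights
```

And that is one checkpoint: training typically retains many checkpoints plus optimizer state, multiplying the footprint several times over.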

In addition, as is widely noted, AI large models adopt new model architectures that absorb unstructured data more effectively and robustly. This matters greatly for the final quality of AI, but it also brings a derivative problem: we must properly store and access massive volumes of unstructured data. For example, after its upgrade ChatGPT added multimodal capabilities such as image recognition, so its training data must include large numbers of images in addition to text. Likewise, self-driving programs must store large volumes of road-test video every day as training material. Such unstructured data drives a massive increase in AI-related data, along with the difficulty of storing and processing it.

According to statistics, 80% of the world's new data is unstructured, with a compound annual growth rate of 38%. Coping with this surge of diverse data has become a hurdle that must be cleared in the era of large models.

Another problem is that large models must read and call data frequently. ChatGPT reportedly handled 1.76 billion visits in a single month, with an average response time under 10 seconds. An AI model's workflow comprises four stages: collection, preparation, training, and inference, each of which reads and writes different types of data. Large models therefore also impose demands on storage performance.
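The training stage illustrates why access performance matters: storage must feed data to the accelerators at least as fast as they consume it. A minimal sketch of that bandwidth requirement, with purely illustrative numbers (batch size, sample size, and step time are assumptions, not measurements from any real system):

```python
# Sketch: the sustained read bandwidth a training job demands of storage.
# If storage cannot deliver this rate, the accelerators sit idle waiting for data.

def training_read_bandwidth(batch_size: int, sample_bytes: int, step_seconds: float) -> float:
    """Bytes per second that must be read to keep one training step fed."""
    return batch_size * sample_bytes / step_seconds

# Illustrative case: 2048 images of ~150 KB each consumed every 0.5 s step.
bw = training_read_bandwidth(batch_size=2048, sample_bytes=150_000, step_seconds=0.5)
print(f"{bw / 1e9:.2f} GB/s")  # prints "0.61 GB/s"
```

Multiply by many concurrent jobs sharing one storage cluster and the aggregate throughput requirement grows quickly.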

In addition, the series of data-sovereignty and data-protection disputes surrounding ChatGPT reminds us that large AI models bring new data-security risks. Imagine criminals attacking the database so that the large language model generates false information to deceive users: the consequences would be both serious and hard to detect.

Overall, ChatGPT is impressive, but it challenges data storage in scale, performance, and security. If we are committed to developing large models and ChatGPT-like applications, we must overcome the storage barrier.

China is building its storage power: is it ready?

In recent years we have been saying that computing power is productivity. But computing depends on storage, and the limits of storage power also set the ceiling on digital productivity.

So, as China's large models inevitably develop at speed, is China's storage power ready? Unfortunately, in several respects it is not yet fully prepared and needs further upgrading. Let us look at several problems with China's storage power and see how they correspond to the data pressure large models bring.

1. Insufficient storage capacity limits the AI industry's ceiling

Large models bring massive amounts of data, so the first priority is simply to store it. At present, however, China's storage capacity remains insufficient, and a large share of data never even reaches storage. According to 2022 figures, China produced an astonishing 8.1 ZB of data, second in the world, but its storage capacity was only about 1000 EB, a storage rate of roughly 12%; most data cannot be effectively retained. Now that China has defined data as the fifth factor of production, intelligent development must rely on data and use it fully, yet vast amounts of data are hard to keep. This is no small problem. China still needs sustained, large-scale growth in storage capacity to seize the AI opportunities that large models bring.
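The ~12% storage rate follows directly from the two figures cited, as a quick unit conversion shows (1 ZB = 1000 EB):

```python
# Checking the article's storage-rate figure:
# 8.1 ZB of data produced in 2022 vs ~1000 EB of installed storage capacity.
ZB_IN_EB = 1000                      # 1 zettabyte = 1000 exabytes

data_produced_eb = 8.1 * ZB_IN_EB    # 8100 EB produced
storage_capacity_eb = 1000           # ~1000 EB of capacity

storage_rate = storage_capacity_eb / data_produced_eb
print(f"{storage_rate:.1%}")         # prints "12.3%", matching the article's ~12%
```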

2. Low management and access efficiency under the impact of massive data

As discussed above, a main data challenge of AI large models is the low efficiency of managing and accessing massive data. Improving access efficiency requires data to be stored and written efficiently at low energy cost, yet 75% of data in China still sits on mechanical hard drives. Compared with flash, mechanical drives have lower capacity density, slower reads, higher energy consumption, and poorer reliability. All-flash storage, by contrast, offers high density, low energy consumption, high performance, and high reliability, but China still has a long way to go in the transition to all-flash.

3. Multiple data concerns lead to a severe storage security situation

Data security has become an urgent concern for AI companies and the AI industry as a whole. In 2020, the US firm Clearview AI suffered a data-security incident that exposed some 3 billion records belonging to more than 2,000 customers. The case shows how serious the AI industry's data-security situation is, and that security must be addressed from the storage stage onward. As AI large models play an ever larger role in the national economy and people's livelihoods, storage must improve its security capabilities to cope with every possible risk.

Objectively speaking, China's storage capacity has grown at a relatively high rate, but it still falls short in overall scale, all-flash adoption, and technological innovation. A storage upgrade aimed at industrial intelligence and the large-scale deployment of AI is imminent.

Opportunities and directions for the storage industry in the intelligent era

Combining the storage pressure created by AI large models such as ChatGPT with the current state of China's storage capacity, the conclusion is clear: China's storage must support the rise of AI by completing a large-scale upgrade.

The storage industry's development directions are clearly visible, and their urgency and breadth constitute major opportunities for the industry.

First, we need to expand storage capacity and accelerate the construction of all-flash storage.

The "silicon in, magnetic out" replacement of mechanical hard disks by all-flash has been the storage industry's overall trend for years. Facing the industrial opportunity of the AI boom, China's storage industry needs to accelerate all-flash adoption and maximize all-flash advantages such as high performance and high reliability, meeting the data storage and utilization needs of large AI models.

It is also worth noting that opportunities for all-flash distributed storage are growing. With the rise of large AI models and the explosion of unstructured data, data's importance is rising sharply. Meanwhile, AI has penetrated the production core of government and large enterprises; more enterprise users prefer to run AI training locally and keep data in file-based storage rather than on public cloud platforms. Demand for distributed storage is therefore growing and strengthening.

Together, these two forces continue to drive the storage industry's shift to all-flash, which has become the core track for the development of China's storage industry.

Secondly, it is necessary to enhance innovation in storage technology to adapt to the development needs of AI models.

As noted above, the data challenge AI brings is not only scale but also data complexity and the diversity of application workflows, so storage must become more advanced. For example, to handle AI's frequent data access, storage read/write bandwidth and access efficiency need upgrading. To meet the data needs of large AI models, the storage industry must upgrade its technology across the board.

Consider data formats. Traditional formats such as "files" and "objects" were not designed to match the training requirements of AI models, and unstructured data comes in inconsistent formats. When an AI model calls data, much work therefore goes into re-parsing and aligning file formats, which lowers the model's operating efficiency and raises training compute costs.

To this end, a new "data paradigm" must be formed on the storage side. Take autonomous-driving training: the process involves many types of data, and a new data paradigm on the storage side can help unify them, fitting AI model training better and accelerating the training of autonomous vehicles. Imagine AI as a new kind of animal that needs a new kind of feed: fed data in traditional formats, it gets indigestion. The new data paradigm builds data fully suited to AI on the storage side, so that "feeding the AI" goes smoothly.
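One hypothetical way to picture such a paradigm: normalize heterogeneous records into a single schema at storage time, so the training loop never reconciles formats. All names here are invented for illustration; they do not describe any real storage product's API:

```python
# Illustrative sketch of a "unified data paradigm": wrap mixed records
# (text, image, sensor data) in one schema that a training loop can consume
# without per-format parsing and alignment work.
from dataclasses import dataclass, field

@dataclass
class UnifiedSample:
    modality: str            # e.g. "text", "image", "lidar"
    payload: bytes           # raw content, normalized once on the storage side
    meta: dict = field(default_factory=dict)  # labels, timestamps, source

def ingest(raw_records):
    """Normalize mixed-format records into UnifiedSample objects at write time."""
    for kind, blob, meta in raw_records:
        yield UnifiedSample(modality=kind, payload=blob, meta=meta)

records = [("image", b"\x89PNG...", {"src": "cam_front"}),
           ("text", b"stop sign ahead", {"lang": "en"})]
samples = list(ingest(records))
print([s.modality for s in samples])  # prints "['image', 'text']"
```

The design point is where the reconciliation cost is paid: once, at ingest, rather than repeatedly on every training epoch.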

In AI development, data management accounts for a huge share of the workload, and data silos separate different datasets. Data fabric technology can address these problems effectively. With a data fabric, storage gains data-analysis capabilities that integrate physically and logically scattered data, forming a globally visible capability for data scheduling and flow, so that the massive data AI brings can be managed effectively and data utilization improved.
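The core of the data-fabric idea can be sketched as a thin catalog that presents scattered datasets as one logical view, so jobs resolve a logical name instead of knowing each silo. Class and method names below are invented for illustration, not a real product API:

```python
# Toy sketch of a data fabric: a catalog mapping logical dataset names to
# their physical locations, giving a single global view over scattered silos.
class DataFabric:
    def __init__(self):
        self._catalog = {}  # logical name -> {"location": ..., "format": ...}

    def register(self, name: str, location: str, fmt: str) -> None:
        """Register a dataset living in some silo (object store, NAS, DB...)."""
        self._catalog[name] = {"location": location, "format": fmt}

    def locate(self, name: str) -> dict:
        """Resolve a logical dataset name to its physical location and format."""
        return self._catalog[name]

fabric = DataFabric()
fabric.register("road_test_videos", "s3://bucket-a/videos/", "mp4")
fabric.register("sensor_logs", "nfs://nas-02/logs/", "parquet")
print(fabric.locate("sensor_logs")["format"])  # prints "parquet"
```

A real data fabric adds metadata analysis, access control, and movement scheduling on top of this mapping, but the unifying catalog is the foundation.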

These technological innovations on the storage side can create a closer fit between data storage and AI development.

In addition, security capabilities need to be built into storage itself to strengthen proactive security.

As AI grows more valuable, data-security problems inflict greater losses on enterprise users, so enterprises must improve their data-security capabilities. The most important step is improving data resilience: giving storage itself security capabilities that protect data at the source. Going forward, more data-resilience features will be embedded in storage products, such as ransomware detection, data encryption, security snapshots, and AirGap isolation-and-recovery.
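Of those features, the security snapshot is the easiest to illustrate: a point-in-time, read-only copy that an attack on the primary data path cannot alter. The sketch below is a toy model of the concept only, not how any storage product implements it:

```python
# Toy model of a "security snapshot": immutable point-in-time copies that
# survive corruption of the live data and allow recovery after an attack.
import copy
import time

class SnapshotStore:
    def __init__(self):
        self.live = {}        # writable primary data
        self._snapshots = []  # frozen point-in-time copies, never modified

    def write(self, key, value):
        self.live[key] = value

    def snapshot(self):
        """Freeze the current state as an immutable copy."""
        self._snapshots.append((time.time(), copy.deepcopy(self.live)))

    def restore(self, index=-1):
        """Recover the primary data from a clean snapshot."""
        self.live = copy.deepcopy(self._snapshots[index][1])

store = SnapshotStore()
store.write("doc", "original")
store.snapshot()
store.write("doc", "ENCRYPTED_BY_RANSOMWARE")  # simulated ransomware attack
store.restore()
print(store.live["doc"])  # prints "original"
```

Real implementations enforce immutability in the storage layer itself (and, with AirGap, in a physically isolated copy), so that even an attacker with host privileges cannot rewrite the snapshots.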

It is worth noting that the industry has already begun exploring storage upgrades in response to the rise of large AI models. Huawei Storage, for example, has pursued a close fit between storage innovation and AI development through high-quality all-flash products, advanced integrated storage technology, and built-in security capabilities.

Overall, the development of the storage industry and the improvement of China's storage power matter greatly for deploying large AI models and for the intelligent upgrade of countless industries. Without advances in storage, the data flood AI brings will be hard to absorb, and AI technology itself could become water without a source, a tree without roots, for lack of data support.

The intelligent era places opportunity and responsibility before the storage industry at the same time. With leading brands such as Huawei exploring storage power, China's storage industry is welcoming unprecedented opportunities while shouldering the responsibilities the times have given it.

Many industry experts regard the large language model as AI's "iPhone moment." The storage upgrade driven by AI may likewise become a milestone for China's storage industry, and the prelude to a golden age.
