There are many AI models. If we focus on one, will it be easier to succeed?

In early 2023, ChatGPT set off a wave of large AI models. Since February of that year, many Chinese companies have launched their own large models as well.

We are familiar with Baidu's Wenxin Yiyan, Alibaba's Tongyi Qianwen, Huawei's Pangu, Tencent's Hunyuan, and so on. Many other technology companies and universities are also building their own large models: for example, the Beijing Zhiyuan Research Institute's Wudao 2.0, the Institute of Automation of the Chinese Academy of Sciences' Zidong Taichu, Fudan University's MOSS, and Tsinghua University's ChatGLM.

With so many large models, the names alone are dizzying. If all these institutions focused on a single model, wouldn't success come more easily?

Image source: unsplash.com

There is some truth to this idea. But looking back at the history of AI, the joint exploration of multiple technologies and multiple routes is one of the reasons AI has been able to develop rapidly and break out of its "winters" time and again.

To understand the benefits of technological diversity for the development of artificial intelligence, we must first start with symbolism and connectionism.

Making computers smarter

At the Dartmouth Conference in 1956, artificial intelligence was established as an emerging discipline. Generations of scientists have since devoted themselves to making computers "smarter". But "smarter" is too abstract; how, concretely, can it be achieved?

In fact, people had already begun to explore this topic long before the Dartmouth Conference. In this process, several different routes emerged, among which "symbolism" and "connectionism" are two important and representative routes.

The idea of symbolism is that human intelligence is mainly reflected in high-level reasoning: "the basis of intelligence is knowledge, and the core of intelligence is the representation of knowledge and reasoning over it." Knowledge and logical inference can be encoded as mathematical symbols, and the processes of reasoning and calculation can likewise be expressed with symbolic formulas (hence the name "symbolism").

The most representative symbolic technology is the expert system. The idea behind an expert system is to express the knowledge and logic of a particular field as symbols and organize them into a knowledge base and an inference engine. Given a set of input conditions, the expert system uses the knowledge in the knowledge base and the reasoning rules built into the inference engine to reason step by step toward a correct conclusion that is not itself stored in the knowledge base.
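
To make the idea concrete, here is a toy sketch in Python of a knowledge base plus an inference engine; the rules and facts are invented for illustration and are far simpler than anything in a real expert system.

```python
# A toy forward-chaining inference engine: IF-THEN rules form the knowledge base,
# and the engine keeps applying rules until no new facts can be derived.
# The rules and facts below are purely illustrative.

rules = [
    ({"fever", "cough"}, "flu_suspected"),
    ({"flu_suspected", "shortness_of_breath"}, "see_doctor"),
]

def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all of its conditions are already known facts.
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(forward_chain({"fever", "cough", "shortness_of_breath"}, rules))
# {'fever', 'cough', 'shortness_of_breath', 'flu_suspected', 'see_doctor'}
```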

In 1955, a program called the "Logic Theorist" appeared. It proved 38 of the first 52 theorems in Principia Mathematica and even found more concise proofs for some of them.

Some even call this program the "first artificial intelligence program". For a long time after the birth of artificial intelligence, symbolism played a leading role in the field; even the term "artificial intelligence" was coined mainly by scientists of the symbolic school.

Of course, while symbolism was flourishing, other explorations into giving computers "intelligence" never stopped. Connectionism, for example, was developing in parallel.

Connectionism and Artificial Neural Networks

Connectionism is somewhat like bionics: it studies and imitates the structure of the human brain from the bottom up in order to explain intelligent behavior.

Connectionism holds that macroscopic cognitive intelligence ultimately emerges from the activity of microscopic neurons and the connections between them. This idea also fits the bottom-up methodology of other sciences, such as using low-level physics to explain the principles of higher-level chemical reactions. Its development gave birth to one of the most important technologies in today's AI field: the artificial neural network.

Image source: unsplash.com

In 1943, Warren McCulloch and Walter Pitts proposed a mathematical model of the biological neuron, the MP neuron model. The model imitates a neuron cell: it takes input signals, processes them, and produces an output.
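
A minimal sketch of such a unit (the weights and threshold below are illustrative, not values from the original paper) might look like this:

```python
# A minimal sketch of the 1943 McCulloch-Pitts neuron: inputs are 0/1 signals,
# and the unit "fires" (outputs 1) only when the weighted sum reaches a threshold.

def mp_neuron(inputs, weights, threshold):
    """Return 1 if the weighted sum of binary inputs reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# With unit weights and threshold 2, the unit behaves like a logical AND of two inputs.
print(mp_neuron([1, 1], [1, 1], threshold=2))  # 1
print(mp_neuron([1, 0], [1, 1], threshold=2))  # 0
```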

In 1949, the neuropsychologist Donald Hebb found that the strength of the signals transmitted between neurons in the human brain is not fixed but "plastic". This theory, later known as the Hebb rule, played an important role in the development of artificial neural networks.
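
As a rough illustration (the activity values and learning rate are invented), the Hebb rule can be written as a simple weight update: a connection is strengthened whenever the neurons on both of its ends are active together.

```python
# A toy version of Hebbian learning: the connection weight grows in proportion
# to correlated pre-synaptic (x) and post-synaptic (y) activity.

def hebbian_update(w, x, y, learning_rate=0.1):
    return w + learning_rate * x * y

w = 0.5
for x, y in [(1, 1), (1, 1), (0, 1)]:   # the last pair leaves w unchanged
    w = hebbian_update(w, x, y)
print(round(w, 2))  # 0.7
```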

Building on the MP neuron model and the Hebb rule, Rosenblatt proposed the perceptron in 1958, regarded as the earliest artificial neural network with the ability to "learn". The U.S. Navy had high hopes for it and invested heavily in building dedicated hardware, hoping it would become a new generation of neural computer.

However, limited by the computing power and techniques of the time, people soon found the perceptron's abilities too narrow: it could only solve very simple linear classification problems.

MIT's Marvin Minsky and Seymour Papert (also early advocates of programming education for children) wrote a book, Perceptrons (1969), publicly arguing that this kind of single-layer network was of very limited use and could not even solve the simple "XOR" problem.
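
The learning ability and the limitation can both be seen in a minimal perceptron sketch (the learning rate and epoch count are illustrative): it learns the linearly separable AND function, but no choice of weights lets it reproduce XOR.

```python
# A minimal Rosenblatt-style perceptron: nudge the weights whenever a prediction
# is wrong. It learns AND (linearly separable) but can never represent XOR,
# because no single straight line separates XOR's two classes.

def predict(weights, bias, x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0 else 0

def train(samples, lr=0.1, epochs=50):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
xor_data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

w, b = train(and_data)
print([predict(w, b, x) for x, _ in and_data])  # [0, 0, 0, 1] -- AND is learned

w, b = train(xor_data)
print([predict(w, b, x) for x, _ in xor_data])  # never matches [0, 1, 1, 0]
```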

In the late 1960s, research on artificial neural networks hit a low point, and almost at the same time, investors began to realize that the much-anticipated “AI explosion” had not arrived.

For example, in 1958 some scientists believed that within 10 years a computer would become the world chess champion (in fact, this was not achieved until 1997, nearly 30 years later than predicted). In 1970, some scientists believed that "within 3 to 8 years, we will have a robot with the intelligence of an ordinary person." That clearly did not happen; to this day we have not built such a machine.

These "bright futures" did not come true, causing the government and investors to significantly cut research and development funding, and artificial intelligence ushered in its first cold winter.

AI Winter

Fortunately, there were many technical routes in the field of artificial intelligence. During that winter, while connectionist neural networks struggled, symbolic expert systems were quietly on the rise.

In 1972, an expert system called MYCIN appeared, which could infer appropriate treatment plans based on the patient's symptoms.

Image source: unsplash.com

For example, MYCIN stores the symptoms and causes of various diseases, which drugs suit each disease, and which drugs interact with one another. If a patient has diarrhea, then once the corresponding findings are entered (body temperature, routine blood-test results, duration, and so on), MYCIN can infer what disease the patient has and recommend the appropriate medicine.

The "acceptability score" of the treatment plan given by MYCIN is almost the same as that of human experts (MYCIN is 65%, and the five human experts are 42.5%~62.5%).

Besides MYCIN, another expert system called XCON helped DEC save tens of millions of dollars each year (XCON can be thought of as a specialized order-configuration system). Seeing the real economic benefits of expert systems, other companies followed suit in the 1980s and built their own expert systems to cut costs.

However, as expert systems spread, their drawbacks gradually surfaced. For example, the knowledge in an expert system's knowledge base does not update itself, and maintaining such a system is very expensive.

Expert systems soon hit a bottleneck, and at just that moment connectionist artificial neural networks entered their own "Renaissance".

In the 1970s and 1980s, scientists came to recognize the importance of the backpropagation algorithm. In 1982, Paul Werbos applied backpropagation to the multi-layer perceptron, a step that proved crucial to the development of artificial neural networks. Today's neural networks are almost inseparable from backpropagation.
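
To give a feel for the idea, here is a compact sketch of backpropagation (layer sizes, random seed, and learning rate are illustrative): a network with one hidden layer learns XOR, the very problem a single-layer perceptron cannot solve.

```python
# A compact sketch of back-propagation with NumPy: the error at the output is
# propagated backwards through the chain rule to update every weight.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # input  -> hidden
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # hidden -> output

lr = 0.5
for _ in range(10000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass (cross-entropy loss): propagate the error back layer by layer
    d_out = out - y
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

print(np.round(out).ravel())  # typically converges to [0. 1. 1. 0.]
```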

It can be seen that, whether during the AI winter or the renaissance, research on both symbolism and connectionism continued, creating the conditions for AI technology to make leaps and breakthroughs. Without this diversity of research as a foundation, AI might have become stuck on a single path and found it hard to move forward.

Of course, beyond AI technology itself, breakthroughs in other industries also drive the development of artificial intelligence. For example, after the 1990s chip technology advanced rapidly and computing power grew quickly, which was likewise crucial to AI's progress.

For example, before the 1990s, even with backpropagation it was very difficult to train deep neural networks with many layers (more than about five), so for a time neural networks were overshadowed by support vector machines. Around 2000, the emergence of GPUs greatly accelerated the training of neural networks (especially deep ones), and the spread of the Internet provided vast amounts of data for AI to learn from; deep learning began to take off.

BERT and GPT

Today, with AI technology developing rapidly, diversified research can still bring unexpected breakthroughs. The ChatGPT we know today, for example, is a beneficiary of multiple lines of research advancing in parallel.

In 2017, scientists at Google Brain published a paper titled "Attention Is All You Need" and proposed the Transformer model.

Simply put, the Transformer is a model that helps computers better "understand" human language. It introduces the "attention" and "self-attention" mechanisms, which work a bit like the way we read: we concentrate on the harder passages and words, and draw on the surrounding context to make sense of them.
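
At the core of the Transformer is scaled dot-product attention; a minimal sketch (with illustrative shapes and random values) looks like this:

```python
# A minimal sketch of scaled dot-product attention: each position forms a
# weighted average of all positions, with weights given by query-key similarity.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over key positions
    return weights @ V                                # attention-weighted mix of values

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))     # 5 "tokens", each an 8-dimensional vector
out = scaled_dot_product_attention(x, x, x)           # self-attention: Q = K = V = x
print(out.shape)                # (5, 8)
```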

People went on to build a variety of large models on top of the Transformer. Google released the BERT model in 2018, and in the same year OpenAI released the GPT model. The two have much in common, but also clear differences.

Simply put, BERT is better at understanding the meaning of words in text, while GPT is better at generating text.

BERT can infer the meaning of a word from the context on both sides, a bit like a fill-in-the-blank question on an exam. For example, given "My pet is a barking ( ), and it loves to chew bones", BERT is very good at judging from the words before and after the blank that the missing word is most likely "dog".

GPT is unidirectional: it works from left to right, the way we read, predicting the next word. For example, given "My pet is a barking dog, it loves ( )", GPT completes what follows based on what came before.
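
If the Hugging Face transformers library and the public "bert-base-uncased" and "gpt2" checkpoints are available, the contrast can be seen in a few lines; this is a rough sketch, and the exact predictions will vary.

```python
# A rough illustration of the two directions with Hugging Face `transformers`
# pipelines (assumes the library and the public "bert-base-uncased" and "gpt2"
# checkpoints are available; exact outputs will vary).
from transformers import pipeline

# BERT-style fill-in-the-blank: the masked word is predicted from both sides.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("My pet is a barking [MASK], and it loves to chew bones."))

# GPT-style generation: the continuation is predicted left to right, word by word.
generate = pipeline("text-generation", model="gpt2")
print(generate("My pet is a barking dog, it loves", max_new_tokens=10))
```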

After BERT appeared, it achieved great success in natural language processing thanks to its excellent semantic understanding. From 2018 to 2020, the GPT models did not receive nearly as much attention as they do today, but research on them did not stop.

In 2019 and 2020, OpenAI released GPT-2 and GPT-3. GPT-3 reached 175 billion parameters, with training data exceeding 40 TB, and showed much stronger understanding and generation abilities than its predecessors.

GPT-3.5 added training on human-labeled data, improving performance further. After ChatGPT appeared, more and more people came to know GPT technology, which once again pushed artificial intelligence to the center of the stage of technological development.

Every study deserves attention

From this we can see that, throughout the development of AI, diversified research has brought more possibilities to the technology. From the 1960s to the 1990s, for example, expert systems, artificial neural networks, and support vector machines developed side by side; when one technology ran into trouble, another rose to take its place.

This is true for the field of artificial intelligence as a whole, and it is just as true if we zoom in on areas such as large models. In natural language processing, scientists did not abandon GPT simply because BERT performed so well, and that gave ChatGPT the chance to become a household name in 2023.

Beyond GPT and BERT, many other models are under research and development in the large-model field. Some of their techniques and results may yet bring disruptive changes to natural language processing, and even to the entire AI industry.

So, returning to the original question: if every company and institution poured its efforts and resources into training a single model, there might indeed be a chance of creating one super-large model. But along the way, some valuable "technological diversity" would likely be lost. The commercial considerations of different companies may, in practice, also help promote the diversified development of AI.

References

[1] Encyclopedia of China

https://www.zgbk.com/ecph/words?SiteID=1&ID=216644&SubID=81535

[2] Stanford Encyclopedia of Philosophy

https://plato.stanford.edu/archives/fall2018/entries/connectionism/#DesNeuNet

[3]MCCULLOCH WS, PITTS W. A logical calculus of the ideas immanent in nervous activity[J].Bulletin of Mathematical Biophysics, 1943, 5: 115-133.

[4]HEBB D O. The Organization of Behavior: A Neuropsychological Theory[M]. New Jersey: Lawrence Erlbaum Associates, 1949.

[5]ROSENBLATT F. The perceptron: Probabilistic model for information storage and organization in the brain[J].Psychological Review, 1958, 65(6): 386-408.

[6]Simon & Newell 1958, p. 7−8 quoted in Crevier 1993, p. 108.

[7]Yu VL, Fagan LM, Wraith SM, Clancey WJ, Scott AC, Hannigan J, Blum RL, Buchanan BG, Cohen SN. Antimicrobial selection by a computer. A blinded evaluation by infectious diseases experts. JAMA. 1979 Sep 21;242(12):1279-82. PMID: 480542.

[8]Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[J]. Advances in neural information processing systems, 2017, 30.

Planning and production

Author: Qin Zengchang, Professor at Beijing University of Aeronautics and Astronautics

Tian Dawei, popular science author

Reviewer: Yu Yang, Head of Tencent Security Xuanwu Lab

Planning: Xu Lai, Cui Yinghao

Editor: Yinuo
