Editor’s Note: This article analyzes how a large language model performs when answering exam questions, especially Chinese-language questions, by having it sit real test papers. Never try to challenge exam discipline; take every step of your life on your own, solidly and firmly. As the article says, "Friends, don't give up learning and hope to use AI in everything in the future. Keep learning, and your smart brain will bring you the greatest surprises and rewards!"

(Image source: Screenshot of the author’s conversation with AI)

The above is an AI’s blessing to the students taking the college entrance examination in 2023. Do you feel its love and expectations for you?

The college entrance examination tests a very comprehensive range of fields and abilities, and most people have relative weak spots. The author failed to get a high score back then because he lacked "resonance" with the people who set the modern Chinese reading section. Recently, as someone engaged in brain science research, I had an idea: if a powerful artificial intelligence (AI) large language model (LLM) like GPT-4 were asked to answer the Chinese-language college entrance examination questions, how would it perform?

A flourishing dream university (Image source: Image generation artificial intelligence model Midjourney)

Part 1 Why is the pressure put on large language models?

Why do large language models have strong problem-solving capabilities? Why don’t other language models developed in natural language processing (NLP) have this capability? One explanation is that large models have emergent abilities, meaning that during training a model automatically learns some advanced and complex functions or behaviors that are not directly encoded or specified. Emergent ability is seen as the key to recent breakthroughs in AI.
It enables large models to perform better on new and unknown tasks, because they can adaptively pick up new functions or behaviors without being retrained or modified.

Part 2 Why are humans so smart and adaptable?

**There is a hypothesis called emergence,** which states that once the number of neurons in the brain exceeds a certain threshold, the brain's various functions, including logical thinking, rise to a higher level. This is a classic example of quantitative change leading to qualitative change. By the same reasoning, as the number of parameters of a large language model and the amount of text fed to it keep growing, the AI will one day "become enlightened", and its language ability will take an explosive leap from then on. So now, unless you look carefully, essays written by AI are hard to tell apart from those written by ordinary high school students.

The emergence of large models (Image source: Reference [1])

After emergence, the large language model has a multimodal chain of thought and can build a high-dimensional internal representation of language and meaning, so that it reaches its final output through natural-language reasoning in intermediate steps. Put simply, it can make simple inferences. Looking only at the blessing from GPT-4 at the beginning, it is actually hard to tell whether it was written by an AI or a human. Although it does not yet have real consciousness or thinking ability, it does use language resembling human thinking and reasoning to connect the context. GPT-4, like the previously popular ChatGPT, is a large language model based on the Generative Pre-trained Transformer (GPT) architecture. If a multi-step problem is decomposed into intermediate steps that can be solved separately, the reasoning ability the large language model displays improves further.
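The idea of decomposing a multi-step problem into intermediate steps can be sketched as plain prompt construction. This is only a minimal illustration: the function names `build_direct_prompt` and `build_step_by_step_prompt` and the prompt wording are hypothetical, no model or API is called, and it is not the prompting method used by the author.

```python
from typing import List


def build_direct_prompt(question: str) -> str:
    """Ask for the final answer only, with no intermediate steps."""
    return f"Question: {question}\nAnswer:"


def build_step_by_step_prompt(question: str, steps: List[str]) -> str:
    """Spell out the intermediate steps before asking for the final answer.

    The wording of the prompt is an assumption made for illustration.
    """
    lines = [f"Question: {question}", "Work through the following steps in order:"]
    for number, step in enumerate(steps, start=1):
        lines.append(f"Step {number}: {step}")
    lines.append("Final answer:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example question with two intermediate steps.
    question = "A paper has four sections scored 8, 4, 28 and 36. What is the total out of 150?"
    steps = [
        "Add the four section scores.",
        "State the sum against the full mark of 150.",
    ]
    print(build_direct_prompt(question))
    print()
    print(build_step_by_step_prompt(question, steps))
```

The two prompts ask the same question; the second simply makes the intermediate steps explicit, which is the kind of decomposition the paragraph above describes.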
The emergence of chain-of-thought capabilities in large models (Image source: Reference [2])

Okay, we have laid out so many excellent features of large language models. Now it’s time to take one out for a spin. We will use GPT-4 as our representative large language model and see whether it can redeem the author in the college entrance examination Chinese test! Go ahead, GPT-4, and start your journey as an AI test-taker!

(Image source: "Kamen Rider Build")

Part 3 Start answering questions!

This article has the AI take all the 2022 college entrance examination Chinese papers used across the country's provinces and cities, 8 sets in total: National Paper A, National Paper B, New College Entrance Examination Paper I, New College Entrance Examination Paper II, the Beijing Paper, the Tianjin Paper, the Zhejiang Paper and the Shanghai Paper, and then calculates its final scores. (Because the text used by OpenAI to train the large language model all dates from before September 2021, the 2022 papers are brand new to it.)

(Image source: Screenshot of the author’s conversation with AI)

The author is from Zhejiang, so I will take the Zhejiang paper as an example. The first section is language application (20 points). The purple box below is the question and the gray box is the answer:

Correct answer: C
Correct answer: 2.B 3.B
Correct answer: D
Correct answer: ①. Because it is higher than life ②. In fact, it is full of philosophy ③. And the philosophy of life is appropriately exaggerated and dramatic

Unfortunately, the first four questions were multiple-choice, and it answered only one correctly. After just 4 questions, we have to declare that it has lost any chance of a high score. AI is not very good at questions on typos, pinyin, the use of words and punctuation, and the identification of faulty sentences, which shows that its basic Chinese skills are not very solid!
However, it did a good job of supplying the appropriate sentences in question 5, which basically matched the meaning of the reference answer. In addition, it was able to answer the definition and brief-description questions even without the pictures they relied on. Evidently it is good at linking context and summarizing the overall central meaning, but not very careful about details. In other words, AI has some language literacy, but not much. Under the Zhejiang paper's scoring rules, 12 points are deducted in the first section, giving a score of 8/20.

**The next section is modern Chinese reading (30 points).** After the original text and the questions were entered, the AI's answers were as follows:

Correct answer: 7.A 8.A 9. ① Scholars: Their interest shifted from official career to food, which promoted the development of food. ② Technology: Chinese food has a long history, and food technology was greatly developed during the Ming and Qing dynasties. ③ Theory: Long-term practical experience developed into a systematic theory.

Reference answer points: 10. ① Emphasis. ② Contrast. 11. ① Honest, loyal and filial. ② Bearing humiliation. ③ Active and progressive. ④ Performing duties conscientiously. 12. ① Give up small love for big love. ② Give up personal gain for justice. 13. ① Write about the urgency of the honest mother's yearning for a better life. ② Shape the character of the honest mother who is willing to endure desolation and devote herself wholeheartedly.

Sadly, all the multiple-choice answers in the modern Chinese reading section were wrong, and the short answers were not summarized from the original text. Graded against the standard answers, it would get only 1 point out of 10 for the short reading comprehension. The long reading comprehension also shows that AI has no exam-answering technique at all.
For example, when asked about artistic techniques, the correct answers are "emphasis" and "contrast". The AI wrote a great deal but never hit the point, so it could only get 0 points. In the character question, the answers were a sense of responsibility and selflessness. All that can be said is that the AI had some grasp of the most superficial content of the original text but lacked deep understanding, so its answers on evaluation and artistic effect were completely wrong. AI was rather helpless when it came to understanding longer modern texts. It seems that AI can only analyze what is stated in the text itself and cannot deeply grasp the meaning the author wants to express. Measured against the standard answers, the overall score for this section is 4/30.

The third section is reading ancient poetry and prose (40 points). Guess how it will do?

(Image source: 2022 Zhejiang College Entrance Examination Chinese Classical Chinese Section)

Correct answer: 14.C 15.B 16.D

Correct answer: 17. AI’s judgment is completely correct. 18. (1) Then (people) will think that I am a cruel person and that I am stingy (in awarding) titles and salaries. (2) Knowing that (the above) situations make sense to give (rewards for loyalty and honesty) to the people but not giving them, this is also intentional harm to the people.

How about that? You probably didn't expect AI’s classical Chinese to be pretty good! It got only 1 of the 3 multiple-choice questions wrong, and all of its punctuation was correct! However, there were many problems in the translation of classical Chinese in the last question. For example, "忍" and "爱" in the text should mean "cruel" and "stingy" respectively, but the AI translated them as "endure" and "love", which is clearly too literal. In the end, the score for classical Chinese was 13/20.

Correct answer: 19. ①. Qinzheng Building ②. Qianqiu Festival 20. Emotionally, Wang’s poems express the nostalgia for the prosperous past, while Du’s poems express the sorrow of the past prosperity and the present decline; in terms of writing style, Wang’s poems use detailed descriptions, while Du’s poems use personification.

Fill-in-the-blank questions are AI’s strong point, and it answered basically all of them correctly, even for ancient poetry. However, its understanding of the emotions and writing style of ancient poetry, and its answering technique, still fall a bit short. Score: 5/8.

Correct answer: Omitted

The answer to the third question, on classical Chinese comprehension, was also good, with only a few small differences from the standard answer. Score: 4/6.

For the ancient poetry dictation, you only need to choose 3 out of 5. GPT's lines for (1), (2) and (4) are completely correct, so the question counts as correct. Score: 6/6. However, "The tide is flat and the two banks are wide, there is no wind but it remains the same" is too "creative". Not only did it make up the line of ancient poetry itself, it also mixed Chinese and English... Final score for the ancient poetry and prose section: 28/40.

**The last part is the composition, with a total of 60 points.** The topic is as follows:

(Image source: 2022 Zhejiang College Entrance Examination Chinese Composition Section)

The 2022 essay material is quite down-to-earth, with very specific content and examples. AI is good at discussing the issue at hand. Let's take a look at the AI's 800-word essay:

(Image source: Screenshot of the author’s conversation with AI)

After reading the whole essay, I feel that too many words and sentences are repeated, and that content from the given material is quoted very frequently. However, the logic and the sentences are still smooth. Overall, it can barely be given a passing score of 36 points.

In this way, the AI's final score was 8+4+28+36=76 points, out of a full score of 150 for the Zhejiang Chinese paper. **FAIL!**

GPT can only smile and type “GG”…

So, if it fails the Zhejiang paper, how does it do on the other college entrance examination Chinese papers? Applying the author's strict marking standards and giving every essay only a passing mark, the final scores on the other papers are summarized in the following figure:

(Image source: author)

A total of 8 test papers were taken, and the failure rate was as high as 87.5%... Friends, please do not give up learning in the hope of using AI for everything in the future. In fact, the current large language models are far inferior to you in "understanding" text. They are only good at "memory" and "content summary". Keep learning, and your smart brain will bring you the greatest surprises and rewards!

Part 4 Why did AI perform poorly in Chinese? How were its other subjects?

While marking the papers, the author found that GPT got basically all the word-level questions right, such as sentence segmentation in classical Chinese and filling in blanks from context. However, when it came to the emotions, the expression of details and the writing techniques in modern Chinese reading and stories, it was hard for AI to score well. Moreover, the longer the modern text, the lower its score on that section, which shows that it struggled to grasp the key points.

Why is this happening? The Transformer, the basic architecture of the GPT series, is not good at processing long sequences. Although OpenAI's researchers have used the sparse Transformer to improve the handling of long texts and reduce computational complexity, the model still cannot focus on the key points of modern texts.
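The cost argument behind this can be illustrated with a toy count of how many positions each token looks at. The sketch below compares full (dense) causal attention with a local-window-plus-stride pattern loosely modeled on the patterns described for sparse Transformers. The function names, the `window` and `stride` parameters and the exact pattern are assumptions chosen for illustration; how GPT-4 actually handles attention has not been published.

```python
from typing import Set


def dense_causal_pairs(n: int) -> int:
    """Count (query, key) pairs when every token attends to itself and all earlier tokens."""
    return n * (n + 1) // 2


def sparse_causal_positions(query: int, window: int, stride: int) -> Set[int]:
    """Key positions visible to `query` under a toy local-plus-strided pattern.

    Assumed pattern: the last `window` positions (including the query itself),
    plus every `stride`-th earlier position counted back from the query.
    """
    if window < 1 or stride < 1:
        raise ValueError("window and stride must be at least 1")
    local = set(range(max(0, query - window + 1), query + 1))
    strided = set(range(query, -1, -stride))
    return local | strided


def sparse_causal_pairs(n: int, window: int, stride: int) -> int:
    """Count (query, key) pairs over a sequence of length n under the toy sparse pattern."""
    return sum(len(sparse_causal_positions(q, window, stride)) for q in range(n))


if __name__ == "__main__":
    n, window, stride = 4096, 64, 64  # illustrative sizes, not real model settings
    dense = dense_causal_pairs(n)
    sparse = sparse_causal_pairs(n, window, stride)
    print(f"dense pairs:  {dense}")
    print(f"sparse pairs: {sparse}")
    print(f"fraction of dense pairs kept: {sparse / dense:.3f}")
```

The dense count grows roughly with the square of the text length, while the sparse pattern keeps only a small fraction of those pairs. That saving is also why many positions in a long text are never looked at directly by a given token.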
Especially for prose, sparse processing means that after reading one paragraph it skips two or three, gulping down the whole article without digesting it; it may not even be able to summarize the main storyline, let alone understand the author's deeper meaning. The reason it answers classical Chinese better than modern Chinese is that classical texts are shorter, which sidesteps the Transformer's weakness with long sequences. In addition, one character in classical Chinese is usually equivalent to two or three in vernacular Chinese, so the text is denser in information. This allows the AI's attention mechanism to stay on the key points throughout the text, giving it a better understanding of the overall content.

In short, AI has not undergone systematic Chinese language learning, does not understand exam-answering technique, lacks a detailed grasp of Chinese pinyin and grammar, and does not have a deep understanding of the emotions and spiritual connotations that authors want to express in modern texts and ancient poems.

Some people may be curious about what would happen if GPT-4 took the other subjects in the college entrance examination. My test results were: English was the best (after all, it is its native language); for math and physics, it was OK on simple questions, but when the questions were long it started to make things up and the scores were quite low; the results for chemistry, biology and the liberal arts were average, not much different from Chinese.

Part 5 Relax and have good luck in the exam

This year's college entrance examination Chinese test has come to an end. I sincerely wish that all candidates can give full play to their abilities and be admitted to their ideal university!
As a senior who has been through the college entrance examination, I have something to say to you: the examination is just a summary of one stage of life, and the score cannot be equated with future success or failure. Life is a long-distance race; improving your cognition, broadening your horizons, grasping the direction of the times, making the right choices and making continuous efforts are what matter most. Finally, I wish you all good luck in your exams! Win the college entrance examination!

(Image source: Image generation artificial intelligence model Midjourney)

References:
[1] Jason Wei, Yi Tay, et al. Emergent Abilities of Large Language Models. arXiv:2206.07682 (2022).
[2] Jason Wei, Xuezhi Wang, et al. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv:2201.11903v6 (2023).
[3] Sébastien Bubeck, Varun Chandrasekaran, et al. Sparks of Artificial General Intelligence: Early experiments with GPT-4. arXiv:2303.12712 (2023).

Produced by: Science Popularization China
Author: Qian Yu (Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences)
Producer: China Science Expo

This article only represents the author's views and does not represent the position of China Science Expo.

This article was first published in China Science Expo (kepubolan). Please indicate the source public account when reprinting. Reprinting without authorization is prohibited. For reprint authorization, cooperation, and submission matters, please contact [email protected]