It's that time of year again: college entrance examination results are being announced. While tens of millions of candidates and parents fill in their university applications, some joyful and some nervous, a group of special "candidates" has just finished an unprecedented college entrance examination of its own. In 2024, nine leading AI models took part in this "special college entrance examination". They came from well-known companies such as OpenAI, Baidu, Alibaba, Tencent, and ByteDance, as well as emerging companies such as Baichuan Intelligence, Zhipu AI, Moonshot AI (Dark Side of the Moon), and MiniMax.

Artificial Intelligence Challenges the College Entrance Examination

So how would large AI models fare against the college entrance examination, the touchstone of human intelligence? Would they stroll into Peking University or Tsinghua University, or struggle to get into even a junior college? The results show that the models did particularly well in liberal arts, with some scoring far above the first-tier cutoff. Their science results, however, were disappointing: scores in mathematics and comprehensive science were generally low, reflecting how much large models still struggle with complex mathematical problems and with physics and chemistry concepts.

Sitting the College Entrance Examination: How Many Points Can an AI Model Score?

Let's look at some interesting details of this exam. It used the notoriously difficult 2024 New Curriculum Standard Paper I, the same full set of questions sat by candidates in Henan, a province with one of the largest candidate pools. The same paper was also used in many other provinces, including Zhejiang, Jiangsu, Shandong, Guangdong, Hebei, and Fujian. The models were graded exactly as human candidates are.
Single-choice, fill-in-the-blank, multiple-select, and written-answer questions were all graded strictly to college entrance examination standards. For single-choice and fill-in-the-blank questions, only the final answer counted; the correctness of the model's working was not considered. For multiple-select questions, any wrong option earned zero points, while a partially correct answer earned a proportional share of the points. For written-answer questions, the test team awarded points step by step against the standard answer. Because large models' answers vary from run to run, each model sat the paper twice and the average score was taken. Apart from the English listening section, which was given a default score, the entire paper was graded to the standards applied to human candidates.

The essays were graded by a veteran teacher with many years of experience marking Chinese-language papers for the college entrance examination. Despite his long career as a Chinese teacher, this was the first time he had graded an essay written by AI. Fittingly, the essay topic on this paper was itself related to AI.

AI College Entrance Examination Transcript

After fierce competition, OpenAI's ChatGPT (GPT-4o) stood out with 562 points in liberal arts and 469.5 in science, becoming the "top scorer" of this AI college entrance examination. Against Henan's score lines, GPT-4o's liberal arts total clears the first-tier cutoff by a comfortable 41 points. Among Henan's huge pool of candidates, that score would rank 8,811th, placing it in the top 2.45% of human test takers.
Doubao's liberal arts score of 542.5 also cleared the first-tier cutoff comfortably, followed by Wenxin 4.0 with 537.5 and Baixiaoying, which just reached the liberal arts first-tier cutoff with 521. In science, the best performer, Wenxin 4.0, managed a total of only 478.5 points, ranking 202,264th, equivalent to the top 35.27%. Almost every model scored roughly 70 to 80 points lower in science than in liberal arts. Even so, the results suggest that the models' current level is enough to secure a place in a science program at a second-tier university.

Even "Top Students" Have Weak Spots: AI Is Lopsided Too

In this unique AI college entrance examination, the models' performances varied. In liberal arts they showed off broad knowledge and strong memory; GPT-4o, ByteDance's Doubao, Wenxin 4.0, and Baichuan 4.0 did especially well in history and politics. GPT-4o scored 237 points on the comprehensive liberal arts paper, already in the upper-middle range among human candidates.

English was the models' strongest subject. The nine models averaged a remarkable 132 points out of 150. Most scored close to full marks on the objective questions and lost only a few points on the essay. English is also the subject in which their scores were closest together.

In Chinese, the models' objective-question scores were still good; even the "foreign candidate" GPT-4o scored close to full marks on them. The differences lay mainly in the essay. Of the 18 essays (nine models, two attempts each), 11 scored above 48 points, and the average was about 46.8. Wenxin 4.0 scored 48 points, while Doubao scored the highest at 52. The graders' overall verdict was that the models' writing already exceeds that of the average student.
Each model has its own style: Wenxin 4.0 quotes famous sayings with ease, like a student who reads widely, while Doubao treats the topic in depth, showing stronger logical ability. But they share flaws too: the essays lack depth, richness, literary flair, and creativity; in particular, their endings fail to elevate the argument, and the formulaic structure shows.

In mathematics, the models' performance overturned the common belief that "math has always been computers' strong suit." The nine models averaged only 47 points, and even GPT-4o scored just 70 on the math paper. In other words, the best-performing model in this exam still failed mathematics, scoring less than half of the 150 available points. Besides GPT-4o, only Wenxin 4.0 and Doubao averaged more than 60 points, with 62.5 and 61.5 respectively. The other six models fared poorly.

This raises the question of whether large models are genuinely weak at mathematics. Analysis found that the models seem able to handle only questions with relatively short chains of reasoning. Doubao, for example, performed well on derivative and trigonometric function problems, applying the relevant formulas and theorems skillfully. Once a problem became more complex and required deeper derivation or proof, however, the models' performance dropped sharply. More surprisingly, some models even overcomplicated simple problems, especially those whose PC versions include a built-in code interpreter. These models often got stuck in endless loops, which clearly hurt their math scores.
This special AI college entrance examination was not only a test of large models' capabilities but also an exploration of AI's potential in education. The most immediate conclusion is that humans have not yet been soundly beaten. Still, only a few years ago AI could not solve elementary school problems; now large models can score high enough for first-tier universities. This progress is a small-scale reflection of how rapidly science and technology are advancing.