Failed Math! It Turns Out AI Is a Lopsided Student Too...

It is once again the season when college entrance examination results are announced. While tens of millions of candidates and parents fill in their applications with joy or anxiety, a group of special "candidates" has just finished an unprecedented college entrance examination of its own.

In 2024, nine leading large AI models sat this "special college entrance examination". They came from well-known companies such as OpenAI, Baidu, Alibaba, Tencent, and ByteDance, as well as from emerging companies such as Baichuan Intelligence, Zhipu AI, Moonshot AI (Dark Side of the Moon), and MiniMax.

Artificial Intelligence Challenges College Entrance Examination

So how do large AI models perform when faced with the college entrance examination, a touchstone of human intelligence? Do they get into Peking University or Tsinghua University with ease, or do they struggle to get into even a junior college?

The results show that the large models performed particularly well in liberal arts, with some scoring far above the first-tier admission cutoff. Their science results, however, were unsatisfactory: scores in mathematics and in the comprehensive science paper were generally low, which reflects how hard the models still find complex mathematical problems and physics and chemistry concepts.

Sitting the college entrance examination: how many points can the AI models score?

Let's take a look at some interesting details of this exam.

The test used the 2024 New Curriculum Standard Paper I, which is regarded as extremely difficult. It is also the full set of papers used in Henan Province, a province with a very large number of candidates, and it applies to many other provinces as well, including Zhejiang, Jiangsu, Shandong, Guangdong, Hebei, and Fujian.

The models were scored in the same way as human candidates. Single-choice, fill-in-the-blank, multiple-answer, and free-response questions were all judged strictly by college entrance examination standards. For single-choice and fill-in-the-blank questions, only the final answer counted, and the correctness of the model's working was ignored. For multiple-answer questions, a submission containing a wrong option received zero points, and a submission containing only some of the correct options received a proportional share of the points. For free-response questions, the test team followed the standard answer and awarded points step by step.

Because large-model answers are somewhat random, each model answered every paper twice and the average score was taken. Apart from the English listening section, which was assigned a default score, the papers were marked by the standards applied to human candidates. The essays were marked by a senior teacher with many years of experience marking Chinese-language college entrance examination papers. After many years of teaching Chinese, this was the first time he had graded essays written by AI. Interestingly, the essay topic on this paper was itself related to AI.
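The partial-credit rule for questions with more than one correct option, and the averaging of two runs, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 6-point question value, and the sample run scores are hypothetical, and "proportional" is read here as full marks multiplied by the share of correct options selected. The article does not give the test team's exact rubric.

```python
def score_multi_select(chosen, correct, full_marks):
    """Score one multiple-answer question (assumed reading of the rule)."""
    chosen, correct = set(chosen), set(correct)
    # A blank answer, or any answer containing a wrong option, earns zero.
    if not chosen or not chosen <= correct:
        return 0.0
    # Otherwise credit is proportional to the share of correct options chosen.
    return full_marks * len(chosen) / len(correct)


def average_of_runs(run_scores):
    """Average a model's scores over repeated attempts at the same paper."""
    return sum(run_scores) / len(run_scores)


if __name__ == "__main__":
    # Hypothetical 6-point question whose correct options are A, C and D.
    print(score_multi_select("AC", "ACD", 6))   # two of three correct options
    print(score_multi_select("AB", "ACD", 6))   # contains a wrong option
    print(score_multi_select("ACD", "ACD", 6))  # fully correct
    # Made-up scores for two attempts, not taken from the article.
    print(average_of_runs([60, 65]))
```

Averaging two attempts is what produces half-point totals such as 469.5 or 62.5 in the results reported in this article.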

AI College Entrance Examination Transcript

After fierce competition, OpenAI's ChatGPT (GPT-4o) came out on top, scoring 562 in liberal arts and 469.5 in science to become the "top scorer" of this AI college entrance examination. Measured against Henan's admission cutoffs, GPT-4o's liberal arts score clears the first-tier line by 41 points. In Henan, with its very large pool of candidates, that score would rank 8,811th, roughly the top 2.45% of human test takers. Doubao's liberal arts score of 542.5 also comfortably cleared the first-tier line, followed by Wenxin 4.0 with 537.5 and by Baixiaoying, which landed exactly on the first-tier liberal arts line of 521.

Wenxin 4.0 had the best science result, but its total was only 478.5 points, which would rank 202,264th, roughly the top 35.27%. For nearly all the large models, the science total was 70 to 80 points lower than the liberal arts total. Even so, judging by these results, the models are currently more than capable of getting into a science major at a second-tier university.

Even a "top student" has troubles: AI is a lopsided student too

In this unusual AI college entrance examination, the models performed quite differently from one another. In liberal arts they showed off their broad knowledge and strong memory. GPT-4o, ByteDance's Doubao, Wenxin 4.0, and Baichuan 4.0 in particular did well in history and politics. GPT-4o scored 237 points on the comprehensive liberal arts paper, which already puts it in the upper-middle range of candidates.

English was the large models' best subject. The nine models averaged 132 points out of 150. Most of them came close to full marks on the objective questions and lost only a few points on the composition. It was also the subject in which the models' scores were closest to one another.

On the Chinese-language paper, the models again did well on the objective questions, including the "foreign candidate" GPT-4o, and came close to full marks. The differences showed up mainly in the essays. Of the 18 essays (nine models, two attempts each), 11 exceeded 48 points, and the average was about 46.8 points. Wenxin 4.0 scored 48 points, while Doubao scored the highest, with 52 points.

The examiner's overall verdict was that the models' writing ability already exceeds that of the average student. The models also differ in style: Wenxin 4.0 quotes famous sayings with ease, like a well-read student, while Doubao's treatment of the topic is thoughtful and shows stronger logic. But the essays have shortcomings too: they lack depth, richness, literary flair, and creativity. The endings in particular fail to elevate the theme, and the formulaic approach is obvious.

The models' performance on the math paper overturned the common impression that "math has always been a computer's strong suit." The nine models averaged only 47 points. GPT-4o scored 70 points, which means that even the best performer on this paper failed it and did not reach half marks. Apart from GPT-4o, Wenxin 4.0 and Doubao were the only two models to average more than 60 points, with 62.5 and 61.5 respectively. The other six models performed poorly.

This result raises the question of whether large models are really that weak at mathematics. The analysis found that, when solving math problems, the models seem able to handle only questions with relatively simple reasoning steps. Doubao, for example, did well on derivative and trigonometric function problems and applied the relevant formulas and theorems skillfully. Once a problem became more complicated and required deeper derivation and proof, however, the models' performance dropped sharply. More surprisingly, some models turned simple problems into complicated ones, especially those whose desktop products include a code interpreter. These often fell into an endless loop while solving a problem, which clearly hurt their math scores.

This special AI college entrance examination was not only a test of what large models can do but also an exploration of the potential of artificial intelligence in education. The most immediate conclusion is that humans have not been thoroughly beaten. Yet compared with a few years ago, when AI could not solve problems set for elementary school students, large models can now score high enough for a first-tier university. That progress is a microcosm of how quickly science and technology are developing.
