"Surpassing" GPT-4 in all aspects, approaching human comprehension ability! Is the world's most powerful model really here?

Recently, Anthropic, a large model company known as "OpenAI's strongest competitor", launched its third-generation artificial intelligence (AI) model - the Claude 3 series of models, including Claude 3 Opus, Claude 3 Sonnet and Claude 3 Haiku .

Among them, Claude 3 Opu is the strongest version of the Claude 3 series model. It has close to human understanding capabilities and can deftly handle open prompts and complex tasks. According to official information, its performance exceeds GPT-4 in all aspects.

It is worth mentioning that the Claude 3 Series models have the same sophisticated visual capabilities as other leading models and can handle a variety of visual formats, including photos, charts, graphs and technical diagrams.

Anthropic said in its official X that the Claude 3 series of models "set new industry benchmarks in reasoning, mathematics, coding, multilingual understanding, and vision."

Claude 3 Opus and Claude 3 Sonnet are now directly accessible via the API. The API is now fully open, and developers can start using these models immediately.

In addition, Claude 3 Sonnet is available for free trial on the website (http://claude.ai) for users in some regions, while the use of Claude 3 Opus is only open to Claude Pro users.

Additionally, the Anthropic team says the Claude 3 Series models address the “unnecessary rejection” that was a common problem with previous models.

A new standard of intelligence

Evaluation results show that Claude 3 Opus outperforms similar products on most common evaluation benchmarks for AI systems, including undergraduate-level expert knowledge (MMLU), graduate-level expert reasoning (GPQA), basic mathematics (GSM8K), etc. It demonstrates near-human-level understanding and fluency on complex tasks, "leading the forefront of general intelligence."

All Claude 3 models have improved capabilities in analysis and prediction, nuanced content creation, code generation, and conversation in non-English languages such as Spanish, Japanese, and French.

Near-instant results

The Claude 3-series models support live customer chat, auto-completion, and data extraction tasks where responses are immediate and real-time.

Among them, Claude 3 Haiku is the fastest and most cost-effective model in the same category on the market. It can read an information and data-intensive research paper (about 10k tokens) on arXiv in three seconds, with accompanying charts and graphs.

Claude 3 Sonnet is 2x smarter than Claude 2 and Claude 2.1 in the vast majority of workloads. It excels in tasks that require fast responses, such as knowledge retrieval or sales automation. Claude 3 Opus is slower, similar to Claude 2 and Claude 2.1, but smarter.

Improved accuracy

Compared to Claude 2.1, Claude 3 Opus achieves two times higher accuracy (or correct answers) on challenging open-ended questions, while also reducing incorrect answers.

In addition to making responses more believable, the Claude 3 Series models will enable citations so that answers can be verified by pointing to a precise sentence in a reference.

200K context windows and near-perfect memory

Currently, the Claude 3 series models provide 200K context windows. However, all three models can accept inputs of more than 1 million tokens, and this service may be provided to specific customers who need enhanced processing power in the future. In addition, Claude 3 Opus achieves near-perfect recall and over 99% precision.

The Anthropic team said that to improve the security and transparency of the model, they will continue to develop methods such as Constitutional AI and fine-tune the model to mitigate privacy issues that may arise from the new model.

Although the Claude 3 series models have made progress in key indicators such as biological knowledge, network-related knowledge, and autonomy compared to previous models, they are still at AI Safety Level 2 (ASL-2) according to the Responsible Scaling Policy. The red team assessment results show that the Claude 3 series models currently have a very low probability of causing catastrophic risks.

Easier to use

Claude 3-series models are better at following complex, multi-step instructions. They are particularly good at following brand voice and response guidelines and developing customer-facing experiences that users can trust. Additionally, Claude 3-series models are better at generating popular structured outputs in formats like JSON, making them easier to guide use cases like natural language classification and sentiment analysis.

At the end of the official blog, the Anthropic team wrote:

“As we push the boundaries of AI capabilities, we are equally committed to ensuring our security safeguards keep pace with these leaps in performance. Our premise is that being at the forefront of AI development is the most effective way to guide it towards positive societal outcomes.”

Reference Links:

https://www.anthropic.com/news/claude-3-family

<<: This fatal sore throat was cured thanks to his “fighting poison with poison” a hundred years ago!

>>: World Glaucoma Day | Do you often turn off the lights and look at your phone before going to bed? Beware of this vision "thief"!

In-depth analysis! Understand the massive amount of Qianchuan delivery techniques in one article!

Probability can also deceive people. How can we make the football team's performance look better? Using Simpson's paradox

In our daily work and life, we often like to anal...

"Surpassing" GPT-4 in all aspects, approaching human comprehension ability! Is the world's most powerful model really here?

In-depth analysis! Understand the massive amount of Qianchuan delivery techniques in one article!

Understand the functional features and application scenarios of WeChat Enterprise Account

Chery's capital increase and share expansion officially started, and nearly 30% of its shares will be transferred

Wuwei Academy main line capture dragon first issue

For nucleic acid testing, you must read these 10 points!

Night Parade of One Hundred Demons丨What kind of ghost will a person become after he dies?

Tencent advertising paid promotion account opening and recharge!

World Immunization Day | Why is the best medicine in the human body?

Case summary: How can an offline store attract 2,000+ people in two hours?

I really don’t recommend you to use the internet-famous “Copenhagen Weight Loss Method”!

Recommend

A brief analysis of the necessity and construction methods of the user growth system for B-end products

Mini Program Enterprise Entity Migration, How to Change the Entity of WeChat Mini Program?

How does "Meipian", which has 80 million users, conduct content operations?

How do community apps stimulate users to produce high-quality content?

Is it possible to start a business with 0 cost? 5 zero-cost side hustle projects! Each one is regular and long-lasting

Musk claims Tesla batteries are 100% recyclable, but experts refute it; electric vehicles still pose environmental risks

13 Marketing Keywords for the First Half of 2021

Where do the rocks in the universe come from? Are most planets made of rocks? Here's the truth

Nan'an SEO Training: What details should be paid attention to when building a mobile website?

How important is WeChat to Tencent? 10 major functions from Apple to Toutiao

Marketing node reminder in December 2017

Probability can also deceive people. How can we make the football team's performance look better? Using Simpson's paradox

For community operation, you need to build a good “personality”!

What did Zhou Chu say when he eliminated the three evils?

Patrick tells us that it’s okay to evolve to have heads all over the body…