"Surpassing" GPT-4 in all aspects, approaching human comprehension ability! Is the world's most powerful model really here?

"Surpassing" GPT-4 in all aspects, approaching human comprehension ability! Is the world's most powerful model really here?

Recently, Anthropic, a large model company known as "OpenAI's strongest competitor", launched its third-generation artificial intelligence (AI) model - the Claude 3 series of models, including Claude 3 Opus, Claude 3 Sonnet and Claude 3 Haiku .

Among them, Claude 3 Opu is the strongest version of the Claude 3 series model. It has close to human understanding capabilities and can deftly handle open prompts and complex tasks. According to official information, its performance exceeds GPT-4 in all aspects.

It is worth mentioning that the Claude 3 Series models have the same sophisticated visual capabilities as other leading models and can handle a variety of visual formats, including photos, charts, graphs and technical diagrams.

Anthropic said in its official X that the Claude 3 series of models "set new industry benchmarks in reasoning, mathematics, coding, multilingual understanding, and vision."

Claude 3 Opus and Claude 3 Sonnet are now directly accessible via the API. The API is now fully open, and developers can start using these models immediately.

In addition, Claude 3 Sonnet is available for free trial on the website (http://claude.ai) for users in some regions, while the use of Claude 3 Opus is only open to Claude Pro users.

Additionally, the Anthropic team says the Claude 3 Series models address the “unnecessary rejection” that was a common problem with previous models.

A new standard of intelligence

Evaluation results show that Claude 3 Opus outperforms similar products on most common evaluation benchmarks for AI systems, including undergraduate-level expert knowledge (MMLU), graduate-level expert reasoning (GPQA), basic mathematics (GSM8K), etc. It demonstrates near-human-level understanding and fluency on complex tasks, "leading the forefront of general intelligence."

All Claude 3 models have improved capabilities in analysis and prediction, nuanced content creation, code generation, and conversation in non-English languages ​​such as Spanish, Japanese, and French.

Near-instant results

The Claude 3-series models support live customer chat, auto-completion, and data extraction tasks where responses are immediate and real-time.

Among them, Claude 3 Haiku is the fastest and most cost-effective model in the same category on the market. It can read an information and data-intensive research paper (about 10k tokens) on arXiv in three seconds, with accompanying charts and graphs.

Claude 3 Sonnet is 2x smarter than Claude 2 and Claude 2.1 in the vast majority of workloads. It excels in tasks that require fast responses, such as knowledge retrieval or sales automation. Claude 3 Opus is slower, similar to Claude 2 and Claude 2.1, but smarter.

Improved accuracy

Compared to Claude 2.1, Claude 3 Opus achieves two times higher accuracy (or correct answers) on challenging open-ended questions, while also reducing incorrect answers.

In addition to making responses more believable, the Claude 3 Series models will enable citations so that answers can be verified by pointing to a precise sentence in a reference.

200K context windows and near-perfect memory

Currently, the Claude 3 series models provide 200K context windows. However, all three models can accept inputs of more than 1 million tokens, and this service may be provided to specific customers who need enhanced processing power in the future. In addition, Claude 3 Opus achieves near-perfect recall and over 99% precision.

The Anthropic team said that to improve the security and transparency of the model, they will continue to develop methods such as Constitutional AI and fine-tune the model to mitigate privacy issues that may arise from the new model.

Although the Claude 3 series models have made progress in key indicators such as biological knowledge, network-related knowledge, and autonomy compared to previous models, they are still at AI Safety Level 2 (ASL-2) according to the Responsible Scaling Policy. The red team assessment results show that the Claude 3 series models currently have a very low probability of causing catastrophic risks.

Easier to use

Claude 3-series models are better at following complex, multi-step instructions. They are particularly good at following brand voice and response guidelines and developing customer-facing experiences that users can trust. Additionally, Claude 3-series models are better at generating popular structured outputs in formats like JSON, making them easier to guide use cases like natural language classification and sentiment analysis.

At the end of the official blog, the Anthropic team wrote:

“As we push the boundaries of AI capabilities, we are equally committed to ensuring our security safeguards keep pace with these leaps in performance. Our premise is that being at the forefront of AI development is the most effective way to guide it towards positive societal outcomes.”

Reference Links:

https://www.anthropic.com/news/claude-3-family

<<:  This fatal sore throat was cured thanks to his “fighting poison with poison” a hundred years ago!

>>:  World Glaucoma Day | Do you often turn off the lights and look at your phone before going to bed? Beware of this vision "thief"!

Recommend

Why did the chick I bought as a child die so quickly? I finally know the answer!

We often see some merchants or small farmers carr...

Case Analysis | How did Zhangmen 1-on-1 grow from 0 to 300,000 users?

Faced with high public domain customer acquisitio...

How to build a marketing platform from 0 to 1?

Marketing platform , different people have differ...

Who invented the concept of virtual age?

At the end of every year, the firecrackers in the...

up to date! Data rankings of 60 information flow advertising platforms!

Today I bring you the latest traffic rankings of ...

How to increase product user growth? Share the 8-step plan!

The author uses a real case to explain how to bui...

4-step method for product data operation

Product managers do a lot of things from 0 to 1, ...

50-day Zhihu Pilot Academy fan increase and monetization plan_Dangxing Academy

Tips for increasing followers/selling products/mo...

WeChat Mini Program users reinstalled the deleted app one week after its release

On January 9, WeChat Mini Program was officially ...

Basic knowledge of data that operators must understand (Part 2)

Which indicators do website operations pay more a...