Late-night blockbuster! Google releases the most powerful AI model Gemini, which "beats" GPT-4 in 30 benchmark tests

After months of anticipation, Google's long-awaited large model, Gemini, is finally here.

Google CEO Sundar Pichai and Google DeepMind CEO Demis Hassabis described it as "a huge leap forward for AI models" and said it "will eventually impact almost all of Google's products." Pichai said in a statement, "These are the first models of the Gemini era and the first realization of the vision we had when we formed Google DeepMind earlier this year. This new era of models represents one of the biggest science and engineering efforts we've undertaken as a company." Google released three models this time: Gemini Nano, Gemini Pro, and Gemini Ultra. Among them:

Gemini Nano is a lightweight version that runs natively and offline on Android devices such as the Pixel 8 Pro;

Gemini Pro is a more powerful version that will soon power a host of Google AI services and is being plugged into Bard starting today (a rough API sketch follows this list);

Gemini Ultra is the most powerful large model Google has created so far, designed primarily for data centers and enterprise applications and scheduled to launch next year.
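
As a rough illustration of how developers might call the Gemini Pro tier once API access opens, here is a minimal sketch. It assumes Google's generative AI Python SDK (google-generativeai) and the model name "gemini-pro"; neither detail comes from the article itself.

```python
# Minimal sketch (assumption): calling Gemini Pro via the google-generativeai
# Python SDK. The model name "gemini-pro" and the availability of this API are
# assumptions for illustration, not details confirmed in this article.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key obtained from Google AI Studio

model = genai.GenerativeModel("gemini-pro")
response = model.generate_content(
    "Summarize what a Tensor Processing Unit is in two sentences."
)
print(response.text)
```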

In terms of performance, Gemini is ahead of GPT-4 in 30 out of 32 benchmarks, including broad tests such as the massive multitask language understanding (MMLU) benchmark, as well as tests of its ability to generate Python code.

Figure | Gemini outperforms the state of the art on a range of text and coding benchmarks.

Figure | Gemini outperforms the state-of-the-art across a range of multi-modal benchmarks.

In addition, Gemini Ultra scored 90.0% on MMLU (massive multitask language understanding), making it the first model to surpass human experts on the benchmark, which spans 57 subjects including mathematics, physics, history, law, medicine, and ethics to test both world knowledge and problem-solving ability.

Gemini’s clearest advantage in these benchmarks comes from its ability to understand and interact with video and audio as well as text. This is largely by design: multimodality has been part of the Gemini project from the beginning. Rather than training separate models for images and speech, as OpenAI did with DALL-E and Whisper, Google built a single “multisensory” model from the start. Demis Hassabis says Google has always been interested in very general systems, and in particular in how to mix all of these modes: collecting as much data as possible from any number of inputs and senses, and then giving equally diverse responses.
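
To make that multimodal design concrete, here is a minimal sketch of a single request that mixes an image with a text instruction. It assumes the same google-generativeai Python SDK and a vision-capable model name such as "gemini-pro-vision"; both are assumptions for illustration rather than details from the article.

```python
# Minimal sketch (assumption): a mixed image + text request, assuming the
# google-generativeai SDK and a vision-capable model named "gemini-pro-vision".
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel("gemini-pro-vision")
image = Image.open("chart.png")  # any local image file

# One request can carry multiple modalities: a text prompt plus an image.
response = model.generate_content(
    ["Describe the trend shown in this chart in one sentence.", image]
)
print(response.text)
```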

Right now, Gemini's most basic mode is text in, text out, but more powerful variants such as Gemini Ultra can also process images, video, and audio. Demis Hassabis said Gemini will eventually gain capabilities such as motion and touch, closer to robotics, and will pick up more senses over time, becoming more aware, more accurate, and better grounded in the process: "these models will better understand the world around them." The Gemini models will, of course, still produce hallucinations.

Benchmarks aren't everything, though. The true test of Gemini's capabilities will come from everyday users who want it to brainstorm, find information, write code, and more. Google appears to see coding in particular as Gemini's killer app: it powers a new code-generation system called AlphaCode 2, which Google says outperforms 85% of coding-competition participants, compared with roughly 50% for the original AlphaCode.

Just as important to Google, Gemini is a markedly more efficient model. It was trained on Google's own Tensor Processing Units and runs faster and cheaper than previous Google models such as PaLM. Alongside the new model, Google also announced a new version of its TPU system, TPU v5p, a computing system designed for training and running large-scale models in data centers.

It is worth noting that Gemini is currently available only in English, with other languages to follow. Sundar Pichai said the model will eventually be integrated into Google's search engine, advertising products, the Chrome browser, and more.

The AI era ushered in by ChatGPT has now lasted a year. Does the release of Gemini mean Google has caught up? Or can Google once again stand at the top of the artificial intelligence industry?

Attached: Statement from Sundar Pichai, CEO of Google and Alphabet:

Every technological revolution is an important opportunity for scientific discovery, human progress, and improved lives. I firmly believe that the artificial intelligence (AI) transformation we are experiencing will be the most profound change of our generation, far bigger than the earlier shifts to mobile or to the web. AI will not only create opportunities for people around the world, from the everyday to the extraordinary, but will also drive a new wave of knowledge, learning, creativity, and productivity on a scale we have never seen before.

That’s what excites me: making AI useful to everyone in the world.

We’re nearly eight years into our journey as an AI-first company. And the pace of progress has not slowed down, it’s accelerated: Today, millions of people are using generative AI in our products to do things that weren’t possible just last year, like answering more complex questions and using new tools to collaborate and innovate. At the same time, developers around the world are using our models and infrastructure to build new generative AI applications, and startups and enterprises of all sizes are using our AI tools to grow.

It’s an incredible dynamic, but we’ve only begun to explore the possibilities.

We are doing this in a bold and responsible way. This means pursuing ambitious goals in our research, developing technologies that can bring great benefits to people and society, while also building in safeguards and working with governments and experts to address the risks that emerge as AI capabilities grow. We continue to invest in the best tools, foundational models, and infrastructure, and we use our AI principles as a guide to continually improve our products and services.

Today, we are taking the next step in our journey with the launch of Gemini, our most advanced and general model yet, which performs extremely well on multiple leading benchmarks. Our first release, Gemini 1.0, is optimized for different scales, including Ultra, Pro, and Nano. These are our first models entering the Gemini era, and the first realization of the vision we set when we founded Google DeepMind earlier this year. Models in this new era represent one of the largest scientific and engineering efforts we have undertaken as a company. I am incredibly excited about the upcoming developments and the opportunities that Gemini will bring to people around the world.

–Sundar

Reference Links:

https://blog.google/technology/ai/google-gemini-ai/#capabilities
https://storage.googleapis.com/deepmind-media/gemini/gemini_1_report.pdf
