Wikimedia Foundation: AI crawlers cause Wikimedia Commons bandwidth demand to surge 50%

The Wikimedia Foundation, the umbrella organization behind Wikipedia and more than a dozen other crowdsourced knowledge projects, says that bandwidth consumed by multimedia downloads from Wikimedia Commons has surged 50% since January 2024.

The reason stems not from growing demand from knowledge-hungry humans but from automated, data-hungry crawlers gathering material to train artificial intelligence models, the foundation wrote in a blog post on Tuesday.

“Our infrastructure is built to withstand sudden surges in traffic from humans during high-profile events, but the volume of traffic generated by bots is unprecedented and comes with increasing risks and costs,” the post reads.

Wikimedia Commons is a freely accessible repository of images, video and audio files that are available under open licenses or are in the public domain.

Digging into the numbers, the foundation says that nearly two-thirds (65%) of its most "expensive" traffic (i.e., the most resource-intensive in terms of the type of content consumed) comes from bots, even though bots account for only 35% of overall page views. According to Wikimedia, the reason for this disparity is that frequently accessed content is cached in data centers closer to the user, while less frequently requested content sits farther away in its "core data centers," from which it costs more to serve. That long-tail content is exactly what bots tend to request.

"While human readers tend to focus on specific (often similar) topics, crawler bots tend to 'batch read' large numbers of pages and visit less popular pages," Wikipedia wrote. "This means that these types of requests are more likely to be forwarded to core data centers, making them more expensive for our resources."

All in all, the Wikimedia Foundation’s Site Reliability Team has to spend a lot of time and resources blocking bots to avoid disruption to regular users. And that’s before considering the cloud costs the Foundation faces.

Wikimedia's problem is part of a rapidly growing trend that threatens the very existence of the open internet. Last month, software engineer and open source advocate Drew DeVault complained that AI crawlers ignore the “robots.txt” files meant to ward off automated traffic. And Gergely Orosz, who writes The Pragmatic Engineer newsletter, complained last week that AI crawlers from companies such as Meta had driven up the bandwidth demands of his own projects.
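
For context, robots.txt is purely advisory: a polite crawler checks it before fetching anything, and nothing technically prevents a crawler from skipping that check. The sketch below shows what voluntary compliance looks like using Python's standard-library robotparser; the user-agent string is a made-up placeholder and the URLs are just examples.

```python
from urllib import robotparser

# "ExampleAIBot" is a made-up user agent; real crawlers advertise their own strings.
USER_AGENT = "ExampleAIBot"

rp = robotparser.RobotFileParser()
rp.set_url("https://commons.wikimedia.org/robots.txt")
rp.read()  # fetch and parse the site's robots.txt over the network

url = "https://commons.wikimedia.org/wiki/Special:Random"
if rp.can_fetch(USER_AGENT, url):
    # A polite crawler would only issue the request on this branch,
    # ideally also honouring any Crawl-delay the site declares.
    print(f"robots.txt permits {USER_AGENT} to fetch {url}")
    print("declared crawl delay:", rp.crawl_delay(USER_AGENT))
else:
    print(f"robots.txt disallows {USER_AGENT} from fetching {url}")
```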

Open source infrastructure is particularly exposed, but developers are fighting back with “ingenuity and a vengeance.” Some tech companies are also doing their part to address the problem: Cloudflare, for example, recently launched AI Labyrinth, which uses AI-generated content to slow down crawlers.
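
Cloudflare has not published AI Labyrinth's internals here, so the following is only a generic sketch of the underlying idea: when a request looks like an unwanted crawler, serve machine-generated filler pages whose links lead only to more filler, so the bot wastes its crawl budget. The user-agent substrings, routes and content below are invented for illustration and do not reflect Cloudflare's implementation.

```python
import random
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Illustrative list of crawler user-agent substrings to divert into the maze.
SUSPECT_AGENTS = ("GPTBot", "CCBot", "Bytespider")

def decoy_page(depth: int) -> str:
    """Generate a throwaway page whose links only lead to more throwaway pages."""
    links = "".join(
        f'<a href="/maze/{depth + 1}/{random.randrange(10**6)}">more</a> '
        for _ in range(20)
    )
    return f"<html><body><p>Filler text, level {depth}.</p>{links}</body></html>"

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        if any(bot in agent for bot in SUSPECT_AGENTS) or self.path.startswith("/maze/"):
            time.sleep(1)                          # small delay to further waste crawl time
            depth = self.path.count("/")           # crude "how deep in the maze" signal
            body = decoy_page(depth).encode()
        else:
            body = b"<html><body>Real content for human visitors.</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), Handler).serve_forever()
```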

However, this remains a cat-and-mouse game, one that could ultimately force many publishers behind logins and paywalls, to the detriment of everyone who uses the web today.
