Wikimedia Foundation: AI crawlers cause Wikimedia Commons bandwidth demand to surge 50%

The Wikimedia Foundation, the umbrella organization behind Wikipedia and more than a dozen other crowdsourced knowledge projects, said on Wednesday that bandwidth consumption for multimedia downloads from Wikimedia Commons has surged 50% since January 2024.

The surge stems not from growing demand from knowledge-hungry humans but from automated, data-hungry crawlers collecting material to train artificial intelligence models, the organization wrote in a blog post on Tuesday.

“Our infrastructure is built to withstand sudden surges in traffic from humans during high-profile events, but the volume of traffic generated by bots is unprecedented and comes with increasing risks and costs,” the post reads.

Wikimedia Commons is a freely accessible repository of images, video and audio files that are available under open licenses or are in the public domain.

Digging deeper, Wikimedia says that nearly two-thirds (65%) of its most "expensive" traffic (that is, the most resource-intensive in terms of the type of content consumed) comes from bots, even though bots account for only 35% of overall page views. According to Wikimedia, the disparity arises because frequently accessed content is kept in a cache close to the user, while less frequently accessed content is stored farther away in "core data centers," which are more expensive to serve content from. That less popular content is exactly what bots typically seek out.

"While human readers tend to focus on specific (often similar) topics, crawler bots tend to 'batch read' large numbers of pages and visit less popular pages," Wikimedia wrote. "This means that these types of requests are more likely to be forwarded to core data centers, making them more expensive for our resources."
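The cost gap described above can be illustrated with a toy simulation. The sketch below is an assumption-laden illustration, not a description of Wikimedia's actual caching setup: it models the cache as a simple least-recently-used (LRU) store, and the function name, page names, request counts, and cache size are all hypothetical. It counts how many requests miss the cache and would have to be served from a core data center.

```python
from collections import OrderedDict


def count_origin_fetches(requests, cache_size):
    """Count requests that miss a simple LRU cache (assumed eviction policy)."""
    cache = OrderedDict()
    misses = 0
    for page in requests:
        if page in cache:
            # Cache hit: mark the page as most recently used.
            cache.move_to_end(page)
        else:
            # Cache miss: the request goes to the (costlier) core data center.
            misses += 1
            cache[page] = True
            if len(cache) > cache_size:
                # Evict the least recently used page.
                cache.popitem(last=False)
    return misses


# Hypothetical traffic: humans keep returning to a few popular pages,
# while a crawler reads a long list of pages once each.
human_requests = [f"popular-{i % 10}" for i in range(1000)]
crawler_requests = [f"page-{i}" for i in range(1000)]

human_misses = count_origin_fetches(human_requests, cache_size=100)
crawler_misses = count_origin_fetches(crawler_requests, cache_size=100)

print("human requests served from origin:", human_misses)
print("crawler requests served from origin:", crawler_misses)
```

In this toy setup both clients send the same number of requests, yet nearly all of the human traffic is absorbed by the cache while every crawler request falls through to the origin, which is the pattern the quoted passage describes.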

All in all, the Wikimedia Foundation’s Site Reliability Team has to spend a lot of time and resources blocking bots to avoid disruption to regular users. And that’s before considering the cloud costs the Foundation faces.

The Foundation's experience is part of a rapidly growing trend that threatens the open internet. Last month, software engineer and open source advocate Drew DeVault complained that AI crawlers were ignoring “robots.txt” files designed to keep automated traffic away. And “pragmatic engineer” Gergely Orosz complained last week that AI crawlers from companies like Meta were driving up bandwidth demands on his own projects.

While open source infrastructure is particularly exposed, developers are fighting back with “ingenuity and a vengeance.” Some tech companies are also doing their part to address the problem. Cloudflare, for example, recently launched AI Labyrinth, which uses AI-generated content to slow down crawlers.

However, this is a cat-and-mouse game that could ultimately force many publishers to retreat behind logins and paywalls, an outcome that would be detrimental to everyone using the web today.
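For readers unfamiliar with robots.txt, the sketch below shows how a well-behaved crawler would consult it before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleAIBot` user agent, the `may_fetch` helper, and the example.org URLs are all hypothetical. The file is purely advisory: nothing prevents a crawler from skipping this check, which is the behavior DeVault describes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named bot is barred from the whole site,
# everyone else is only barred from /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(user_agent, url):
    """Return True if the parsed robots.txt rules allow this agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


print(may_fetch("ExampleAIBot", "https://example.org/wiki/Page"))
print(may_fetch("SomeBrowser", "https://example.org/wiki/Page"))
print(may_fetch("SomeBrowser", "https://example.org/private/notes"))
```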

From Chinese Industry Information Station
