Cloudflare to Block AI Bots from Scraping Sites

Cloudflare, a major internet services provider, is taking a strong stand against the rise of unauthorized data scraping by artificial intelligence systems. The company has announced that it will now block AI bots from collecting content from websites by default. This decision aims to give power back to content creators and ensure fair use of online data.

Starting this week, the owner of every new website that signs up for Cloudflare’s services will be asked whether to allow AI crawlers. By default, these bots are blocked unless the site owner gives permission. Website owners can also charge AI companies for access to their content through a new “pay per crawl” option.

Cloudflare operates a content delivery network (CDN), which speeds up websites by caching their content on servers close to users around the world. According to a company report from 2023, about 16% of global internet traffic passes through Cloudflare’s systems.

The company says AI bots have been collecting website content unchecked. These crawlers often take articles, images, and other web content to train language models such as those made by OpenAI and Google. Cloudflare argues that this practice harms web publishers and creators.

“AI crawlers have been scraping content without limits. Our goal is to put the power back in the hands of creators,” said Matthew Prince, CEO and co-founder of Cloudflare. He added that the move will help protect the future of a free and open internet.

AI crawlers are automated tools that scan the internet to collect huge amounts of text, images, and data. The information they gather is used to train large AI systems. But many of these bots do not ask for permission before collecting the data.

Before AI systems became widespread, search engines sent users directly to the original source of the content. This helped creators earn income from traffic and ads. Today, many AI platforms use collected content to answer questions directly, so users often get the answers they need without ever visiting the original site.

This shift means publishers are losing valuable web traffic. That also means lower ad income and less support for the content creators behind the scenes.

Cloudflare launched a tool in 2023 that let publishers block AI bots with a single click. Now, blocking becomes the default setting for new websites on its platform.

Some AI companies have pushed back. OpenAI, which is backed by Microsoft, said it had declined to join Cloudflare’s plan, arguing that Cloudflare is inserting itself as a “middleman” in the system. OpenAI also pointed out that it follows robots.txt, a standard file in which site owners tell crawlers which parts of a site they may and may not access. It says it always respects the rules publishers set there.
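To make the robots.txt mechanism concrete, here is a minimal sketch of how a well-behaved crawler might check a site's rules before fetching a page. It uses Python's standard urllib.robotparser module. The robots.txt contents, the crawler name "ExampleAIBot", and the example.com URL are all made up for illustration and do not describe any real site or company. Note that robots.txt is advisory: it only works if the crawler chooses to consult it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site.
# "ExampleAIBot" is an invented crawler name, not a real product.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"
    print("ExampleAIBot allowed:", can_crawl("ExampleAIBot", page))
    print("SomeBrowser allowed:", can_crawl("SomeBrowser", page))
```

In this sketch the invented AI crawler is told to stay out of the whole site, while every other user agent is allowed in. A crawler that never runs a check like this is not stopped by robots.txt at all, which is why blocking at the network level is a different kind of control.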

Legal experts say the move by Cloudflare could make it harder for AI companies to gather data. Matthew Holman, a lawyer from the UK, noted that AI bots are often very selective in the content they collect. He said they can overload websites and cause problems for regular users.

He added that if Cloudflare’s new rules are effective, they could slow down the development of AI models. Fewer data sources could make it harder for AI to learn and improve. This may lead to a short-term drop in AI performance and could even change how future models are built.

Cloudflare’s move represents a shift in how tech companies handle data rights. By giving website owners more control, the company hopes to balance the needs of AI developers with the rights of content creators. As more people rely on AI tools, this type of access control may become more common.

Website owners who use Cloudflare will now have a clear choice: block AI bots, charge them, or allow free access. The decision is in their hands.
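The three options can be pictured as a simple per-site setting. The sketch below is purely illustrative: the names CrawlerPolicy and decide, and the returned strings, are hypothetical and do not reflect Cloudflare's actual configuration or API. It assumes only what is described above, namely that blocking is the default and that a site owner can switch to charging or to free access.

```python
from enum import Enum


class CrawlerPolicy(Enum):
    """Hypothetical per-site setting for how AI crawlers are treated."""
    BLOCK = "block"
    CHARGE = "charge"
    ALLOW = "allow"


# Assumption taken from the article: blocking applies unless the owner opts in.
DEFAULT_POLICY = CrawlerPolicy.BLOCK


def decide(is_ai_crawler: bool,
           policy: CrawlerPolicy = DEFAULT_POLICY,
           has_paid: bool = False) -> str:
    """Return what happens to a request under the site's chosen policy."""
    if not is_ai_crawler:
        return "serve"  # ordinary visitors are unaffected
    if policy is CrawlerPolicy.ALLOW:
        return "serve"
    if policy is CrawlerPolicy.CHARGE:
        return "serve" if has_paid else "payment required"
    return "blocked"


if __name__ == "__main__":
    print(decide(is_ai_crawler=True))
    print(decide(is_ai_crawler=True, policy=CrawlerPolicy.CHARGE))
    print(decide(is_ai_crawler=True, policy=CrawlerPolicy.ALLOW))
```

The point of the sketch is the default: a site owner who changes nothing ends up with AI crawlers blocked, and the other two outcomes require an explicit choice.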