Connectivity cloud company Cloudflare, Inc. has unveiled the Content Signals Policy, a tool to help website owners and publishers gain more control over how their content is used.
The new policy simplifies the process of updating robots.txt, the text file that tells web crawlers how they may access a site. It is designed to make it easy for website operators to express their data usage preferences, including opting out of AI overviews and inference.
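For context, a conventional robots.txt file expresses crawler permissions with User-agent and Disallow rules. The sketch below is purely illustrative; the crawler name is hypothetical and not taken from Cloudflare's announcement:

    # Illustrative robots.txt: block a hypothetical crawler from one directory
    User-agent: ExampleBot
    Disallow: /private/

    # Allow all other crawlers full access
    User-agent: *
    Disallow: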

Matthew Prince, co-founder and CEO of Cloudflare, said: "To ensure the web remains open and thriving, we're giving website owners a better way to express how companies are allowed to use their content. Robots.txt is an underutilised resource that we can help strengthen, and make it clear to AI companies that they can no longer ignore a content creator's preferences."
Content Signals Policy
While robots.txt files may not completely prevent unwanted scraping, they do inform bot operators of the website owner's preferences for how content may be used. The policy explains how to interpret the content signals in simple terms: "yes" means the use is allowed, "no" means it is not permitted, and no signal means no preference has been expressed. It also defines, in clear terms, the different ways a crawler typically uses content, including search, AI input, and AI training. Importantly, the policy reminds companies that website operators' preferences expressed in robots.txt files can have legal significance.
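For illustration only, a robots.txt entry combining such content signals with a conventional crawl rule might look like the sketch below. The directive name, signal spellings, and syntax here are assumptions for the purpose of the example; operators should follow the wording in Cloudflare's published policy text rather than this sketch:

    # Illustrative sketch only; directive and signal names are assumed, not authoritative.
    # search   = use in search results
    # ai-input = use as input to AI systems (e.g. inference or retrieval)
    # ai-train = use for training AI models
    Content-Signal: search=yes, ai-input=no, ai-train=no

    User-agent: *
    Allow: /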
Cloudflare will automatically add the new policy language to the robots.txt files of all customers who ask the company to manage robots.txt on their behalf.

"For the web to remain a place for authentic human interaction, platforms that empower communities must be sustainable. We support initiatives that advocate for clear signals protecting against the abuse and misuse of content," said Chris Slowe, CTO of Reddit.