Perplexity disputes Cloudflare - is AI traffic being misread as bots?

Perplexity AI has challenged Cloudflare's assertion that its AI-powered assistants ignore robots.txt directives, arguing that the traffic flagged as "bot" activity is actually on-demand retrieval initiated by users.

Perplexity vs Cloudflare: Key Details

In a post titled "Agents or Bots? Making Sense of AI on the Open Web," the San Francisco startup outlined why it believes Cloudflare’s bot-management system is misclassifying its requests. Perplexity said each assistant query triggers a single page fetch, contrasting that with the continuous crawling typical of search engines or data scrapers.

The company says fetched pages are neither indexed nor added to training datasets.
Perplexity identifies its traffic with the "PerplexityBot" user-agent string and supplies a valid referrer header in about 90% of requests.
Perplexity says it can honor site-level API keys when publishers choose to provide them.
Cloudflare previously accused Perplexity of bypassing robots.txt and rotating IP addresses to avoid detection.
Perplexity contends that Cloudflare’s rules group its assistants with large-scale scrapers, blocking legitimate answers for end users.
The startup likened its one-off fetches to Google's text-to-speech requests, which also fall outside robots.txt conventions.

Background Context

Cloudflare protects more than 20 million web properties and in May 2024 released new controls allowing site owners to block AI model scrapers. At launch, the company called out Perplexity for allegedly ignoring robots.txt, warning that such activity could degrade site performance.

Perplexity, launched in December 2022, offers a conversational answer engine that combines retrieval-augmented generation with licensed language models. The service surpassed 100 million monthly visits in early 2024 and operates on both free and paid tiers.

While the startup has faced copyright questions, it maintains that it retrieves only the brief excerpts needed to answer a query and does not engage in the broad data harvesting used to train foundation models.

Robots.txt, created in 1994, is a voluntary protocol. Compliance is widespread among major search engines but remains unenforceable, leaving interpretation to individual services and security vendors.
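
Because the protocol is voluntary, the check happens on the client side: a well-behaved fetcher reads the rules and decides for itself whether to request a page. The sketch below shows that check using Python's standard urllib.robotparser module. The rules, the example.com URLs, and the "ExampleBrowser" agent name are illustrative assumptions, not taken from any real site or from Perplexity's implementation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the parsed rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("PerplexityBot", "ExampleBrowser"):
        for url in ("https://example.com/article",
                    "https://example.com/private/data"):
            print(agent, url, is_allowed(agent, url))
```

Nothing in this check stops a client from skipping it and fetching the page anyway, which is why enforcement falls to security vendors rather than to the protocol itself.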

Cloudflare’s detection engine scores requests based on behavior patterns, IP reputation, and header consistency. Customers can override those scores with custom rules. Perplexity said it is willing to work with security providers to reduce false positives but has not provided a timeline for discussions with Cloudflare.
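
As a rough illustration of how such signals might combine, the sketch below scores a request from three inputs and lets a customer rule override the result. It is a toy model, not Cloudflare's actual algorithm: the Request fields, the weights, the threshold, and the allow_agents override are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Request:
    user_agent: str
    requests_per_minute: int   # behavior signal (hypothetical)
    ip_reputation: float       # 0.0 = bad, 1.0 = good (hypothetical scale)
    headers_consistent: bool   # headers match the claimed client


def bot_score(req: Request) -> float:
    """Return a score in [0, 1]; higher means more bot-like. Weights are invented."""
    behavior = min(req.requests_per_minute / 60.0, 1.0)
    reputation = 1.0 - req.ip_reputation
    headers = 0.0 if req.headers_consistent else 1.0
    return 0.5 * behavior + 0.3 * reputation + 0.2 * headers


def decide(req: Request,
           allow_agents: FrozenSet[str] = frozenset(),
           threshold: float = 0.5) -> str:
    """Apply the customer override first, then fall back to the score."""
    if req.user_agent in allow_agents:
        return "allow"
    return "block" if bot_score(req) >= threshold else "allow"


if __name__ == "__main__":
    single_fetch = Request("PerplexityBot", 1, 0.9, True)
    crawler = Request("ExampleCrawler", 120, 0.2, False)
    print(decide(single_fetch))
    print(decide(crawler))
    print(decide(crawler, allow_agents=frozenset({"ExampleCrawler"})))
```

In this toy model a single well-formed fetch scores low and a high-volume client with inconsistent headers scores high, while the override shows how a site owner's rule can take precedence over the score.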

Source Citation

Perplexity AI blog post "Agents or Bots? Making Sense of AI on the Open Web" (June 2024)
