Google Search representatives Gary Illyes and Martin Splitt recently detailed Googlebot's structure on Google's official Search Off The Record podcast. They explained that Google uses many separate crawlers across products, most of which are not publicly documented.
Key Details: Googlebot Crawlers and Infrastructure
Illyes outlined how Googlebot fits into a larger internal crawling service that powers multiple Google products:
- The Googlebot name dates back to when Google relied on a single crawler for a single product.
- Today, Google operates many different crawlers across products, but the Googlebot label is still widely used as a catch-all term.
- Googlebot is one client of a larger internal crawling service, not the crawling infrastructure itself.
- The internal crawling system has its own internal name, which Illyes declined to disclose publicly. During the discussion, he referred to it as "Jack" and described it as a software-as-a-service platform.
- Internal product teams access this service through API endpoints, passing parameters such as user agent, robots token, and timeouts.
- Default values exist for many parameters, which reduces configuration work for internal users.
- The core purpose of the system is to fetch content efficiently without overloading websites that allow crawling.
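The parameter-and-defaults model described above can be sketched as a simple client call. Everything here is an illustrative assumption — the request fields, defaults, and `fetch` function are hypothetical stand-ins, not Google's actual internal API:

```python
from dataclasses import dataclass

# Hypothetical sketch of how an internal client (such as Googlebot) might
# call a shared crawling service. Field names and default values are
# assumptions for illustration only.

@dataclass
class FetchRequest:
    url: str
    user_agent: str = "Googlebot"    # default user agent
    robots_token: str = "googlebot"  # token matched against robots.txt rules
    timeout_seconds: float = 30.0    # default timeout

def fetch(request: FetchRequest) -> str:
    """Placeholder for the service's fetch endpoint."""
    return f"fetched {request.url} as {request.user_agent}"

# A product team overrides only what it needs; defaults cover the rest.
result = fetch(FetchRequest(url="https://example.com/"))
```

This mirrors the point about defaults: an internal team supplies a URL and relies on the service's preset user agent, robots token, and timeout unless it has a reason to override them.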
Background Context on Googlebot's Evolution
Illyes described the early 2000s as a period when Google likely ran a single crawler tied to one product. As new products launched, including AdWords, additional crawlers were added to support each service's specific needs.
Despite this growth, the Googlebot name continued to be used broadly for Google's crawling activity. Illyes stressed that talking about Googlebot as a single crawler is technically inaccurate today. In Google's internal model, Googlebot is just one client that sends requests to a shared infrastructure layer, which can serve many distinct crawlers configured for different Google products or internal tools.
Undocumented Crawlers and Documentation Limits
According to Illyes, teams across Google use the shared crawling infrastructure for a wide range of fetch and crawl tasks. He said there may be dozens or even hundreds of internal crawlers beyond those documented for developers.
Only major crawlers and special fetchers are listed in the public developer documentation today. Illyes explained that documenting every small or low-volume crawler would require tracking many additional user agents and would crowd the existing documentation page. Very small crawlers that fetch relatively few URLs typically remain undocumented under the current approach.
Crawlers, Fetchers, and Monitoring Processes
Illyes drew a distinction between crawlers and fetchers within Google's systems:
- Crawlers process batches of URLs continuously and usually run as ongoing services for a team, receiving a constant stream of URLs.
- Fetchers handle single-URL requests, typically with a user waiting on the response, and do not run continuously.
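The distinction above can be sketched in code: a crawler as a long-running worker consuming a stream of URLs, a fetcher as a blocking call for one URL. This is a minimal illustration of the concept, not Google's implementation; all names are assumptions:

```python
import queue
import threading

def fetch_url(url: str) -> str:
    # Stand-in for an actual HTTP fetch.
    return f"content of {url}"

# A "crawler" runs as an ongoing service, consuming a continuous stream of URLs.
def crawler(url_queue: "queue.Queue", results: list) -> None:
    while True:
        url = url_queue.get()
        if url is None:  # sentinel to stop the worker
            break
        results.append(fetch_url(url))

# A "fetcher" handles a single URL on demand, with the caller waiting on it.
def fetcher(url: str) -> str:
    return fetch_url(url)

# Usage: the crawler runs in the background; the fetcher is a blocking call.
q = queue.Queue()
results = []
worker = threading.Thread(target=crawler, args=(q, results))
worker.start()
for u in ["https://a.example/", "https://b.example/"]:
    q.put(u)
q.put(None)
worker.join()
single = fetcher("https://c.example/")
```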
Illyes said he uses an internal tool that alerts him when a crawler or fetcher crosses a certain volume threshold. When that happens, he follows up with the responsible team to confirm the crawler's purpose and behavior. If a crawler begins fetching enough URLs to be widely noticed, he evaluates whether it should be added to the public documentation.
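The threshold check Illyes described might look something like the following sketch: count fetches per crawler and flag any that cross a review threshold. The threshold value, log format, and function name are assumptions for illustration, not details from the podcast:

```python
from collections import Counter

REVIEW_THRESHOLD = 3  # artificially low for this example

def crawlers_to_review(fetch_log: list, threshold: int = REVIEW_THRESHOLD) -> list:
    """fetch_log is a list of crawler names, one entry per fetched URL.

    Returns the crawlers whose fetch volume meets the threshold and so
    warrants a follow-up with the responsible team.
    """
    counts = Counter(fetch_log)
    return sorted(name for name, n in counts.items() if n >= threshold)

log = ["crawler-a"] * 5 + ["crawler-b"] * 2 + ["crawler-c"] * 3
flagged = crawlers_to_review(log)  # crawler-a and crawler-c cross the threshold
```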
Source Citations
This report is based on statements by Gary Illyes and Martin Splitt on Google's Search Off The Record podcast and related official documentation.