Hostinger's server-log study of AI crawlers indicates that website owners are increasingly separating AI training bots from AI search and assistant bots: OpenAI's search crawler now reaches over half of observed sites, while GPTBot's training coverage has collapsed.
Executive snapshot - OpenAI search crawler coverage
- Hostinger analyzed 66.7 billion bot requests across more than 5 million hosted websites over three separate 6-day windows, mapping bots using the AI.txt classification scheme.[S1][S2]
- OpenAI's GPTBot (training crawler) dropped from 84% website coverage to 12% over the study period, indicating widespread blocking of AI training bots.[S1][S2]
- OpenAI's OAI-SearchBot (ChatGPT search crawler) reached 55.67% average coverage, while TikTok's bot reached 25.67% coverage with 1.4 billion requests, and Apple's bot reached 24.33% coverage.[S1][S2]
- Googlebot held at 72% average coverage with 14.7 billion requests, and Bingbot at 57.67%, showing relatively stable access compared with the volatility in AI-focused crawlers.[S1][S2]
- Ahrefs had the broadest SEO-tool crawler coverage at 60%, but the overall SEO tool crawler category declined as more site owners restrict resource-heavy bots.[S1][S2][S4]
Implication for marketers: differentiated bot policies - blocking model-training crawlers while allowing search and assistant bots - are becoming standard and may shape both AI search visibility and infrastructure costs.[S1][S2][S3]
Method and source notes for AI and search crawler analysis
Hostinger's AI bot analysis is based on anonymized server logs for 66.7 billion bot requests on more than 5 million websites hosted on its infrastructure.[S1][S2] The study aggregates three separate 6-day time windows, classifying bots using the open AI.txt project taxonomy into categories such as AI training crawlers, AI assistant and search crawlers, classic search engine bots, and SEO and marketing tools.[S1][S2] Coverage is defined as the share of sampled sites that received at least one request from a given bot during those windows, not the share of total traffic.[S1][S2]
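Because coverage here is a per-site hit rate rather than a traffic share, a minimal sketch may help make the metric concrete. The record layout, names, and sample data below are illustrative, not Hostinger's actual pipeline:

```python
# Minimal sketch of the coverage metric as defined above: the share of
# sampled sites that logged at least one request from a given bot.
# Field and variable names are illustrative, not Hostinger's pipeline.

def coverage(log_records, bot_token, total_sites):
    """log_records: iterable of (site_id, user_agent) tuples."""
    sites_hit = {site for site, ua in log_records if bot_token in ua}
    return len(sites_hit) / total_sites

records = [
    ("site-a", "Mozilla/5.0 ... GPTBot/1.1"),
    ("site-a", "Mozilla/5.0 ... Googlebot/2.1"),
    ("site-b", "Mozilla/5.0 ... Googlebot/2.1"),
]
print(coverage(records, "GPTBot", total_sites=2))     # 0.5
print(coverage(records, "Googlebot", total_sites=2))  # 1.0
```

Under this definition, a single bot request to a site counts the same as millions, which is why coverage and request volume can diverge sharply in the findings below.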
The Search Engine Journal (SEJ) report by Matt G. Southern provides the primary public summary of this Hostinger analysis.[S2] That report cross-references:
- BuzzStream and SEJ data showing 79% of top news publishers now block at least one AI training crawler.[S3]
- Cloudflare "Year in Review" findings that GPTBot, ClaudeBot, and CCBot lead in full disallow directives across top domains.[S3]
- Vercel infrastructure data indicating GPTBot generated 569 million requests in a single month for some hosted properties, raising server-load concerns.[S4]
- OpenAI documentation that distinguishes OAI-SearchBot (governs ChatGPT search indexation and respects robots.txt) from ChatGPT-User (user-initiated browsing that may not follow robots.txt in the same way).[S5]
Key limitations: the dataset only covers Hostinger-hosted sites; full bot user-agent lists and raw logs are not public; time windows are brief; and coverage does not directly measure traffic quality, clicks, or revenue impact.[S1][S2]
Findings on AI training bots, assistant crawlers, and search engine bots
Hostinger's data points to a two-track pattern: AI training crawlers are losing access, while assistant and search-style AI bots are expanding. Classic search crawlers remain relatively stable, and SEO tools show a gradual pullback.[S1][S2]
AI training crawlers
- OpenAI's GPTBot coverage declined sharply from 84% of observed sites to 12% across the measured windows.[S1][S2]
- Meta's ExternalAgent was the largest AI training crawler by request volume in Hostinger's dataset, yet the training-bot category overall showed the strongest coverage decline, which Hostinger attributes partly to active blocking.[S1][S2]
- External studies align with this trend: 79% of major news publishers block at least one training bot, based on BuzzStream's analysis of leading news domains.[S3]
- Cloudflare's recent review similarly reported that GPTBot, ClaudeBot, and CCBot top the list of bots facing full robots.txt disallows across prominent domains.[S3]
AI assistant and search-style crawlers
- OpenAI's OAI-SearchBot reached 55.67% average coverage across the sample, making it one of the most widespread AI assistant bots.[S1][S2]
- TikTok's bot achieved 25.67% coverage and generated 1.4 billion requests in Hostinger's sample, indicating aggressive content fetching linked to its search or recommendation features.[S1][S2]
- Apple's bot reached 24.33% coverage, suggesting quiet but meaningful crawling activity for its own search or assistant products.[S1][S2]
- These assistant bots are reported as user-triggered and targeted, fetching content to answer specific user queries rather than for large-scale model training.[S1][S2]
Classic search engine crawlers
- Googlebot maintained about 72% coverage across the three windows, with 14.7 billion requests logged, and showed no major shifts in access levels.[S1][S2]
- Bingbot coverage remained near 57.67%, again pointing to relative stability versus AI training bots.[S1][S2]
- Hostinger notes that blocking Googlebot or Bingbot would directly affect traditional search visibility, which likely keeps their coverage more stable despite growing concern about AI crawlers.[S1][S2]
SEO and marketing tool crawlers
- Ahrefs had roughly 60% site coverage, the broadest among SEO tools, but the category as a whole showed declining access across the study period.[S1][S2]
- Hostinger attributes this reduction to two forces: tools focusing their crawling on domains actively engaged in SEO work, and site owners blocking resource-intensive bots for performance and cost reasons.[S1][S2][S4]
- Vercel's reported 569 million GPTBot hits in a month for some publishers illustrates the scale of resource usage that pushes owners to tighten bot rules.[S4]
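A back-of-envelope calculation puts that request volume in infrastructure terms. The 100 KB average response size below is an assumed figure for illustration, not from any of the cited sources:

```python
# Back-of-envelope for the load behind 569M monthly requests.
# The 100 KB average response size is an assumption for illustration,
# not a figure from the study or from Vercel.
requests_per_month = 569_000_000
avg_response_kb = 100
transfer_tb = requests_per_month * avg_response_kb / 1e9  # 1 TB = 1e9 KB
print(f"~{transfer_tb:.0f} TB/month")  # ~57 TB/month at these assumptions
```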
Interpretation and implications for AI crawler management
Likely - site owners are segmenting bots by perceived value. Taken together, Hostinger, BuzzStream, and Cloudflare data indicate that site operators are making a clear distinction between AI bots that take content for model training and those that can drive visibility or user discovery.[S1][S2][S3] Training bots lose coverage sharply, while assistant and search bots such as OAI-SearchBot grow. For marketing and product teams, this suggests that robots.txt is increasingly managed like a channel-mix decision rather than an all-or-nothing stance on AI.
Likely - allowing OAI-SearchBot is becoming a requirement for ChatGPT search visibility. OpenAI's documentation states that OAI-SearchBot controls whether a site can appear in ChatGPT's search results and that it respects robots.txt directives.[S5] Combined with Hostinger's 55.67% coverage number, this suggests that inclusion in ChatGPT-driven search experiences will depend on not blocking this user agent.[S1][S2][S5] For brands that care about AI search exposure, whitelisting OAI-SearchBot while blocking GPTBot is emerging as a common configuration.
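A minimal robots.txt sketch of that emerging configuration, using the user-agent tokens OpenAI documents for its crawlers; verify current tokens and semantics against OpenAI's bot documentation before deploying:[S5]

```
# Block OpenAI's model-training crawler
User-agent: GPTBot
Disallow: /

# Allow the ChatGPT search crawler so pages remain eligible for ChatGPT search
User-agent: OAI-SearchBot
Allow: /

# Default policy for all other crawlers
User-agent: *
Allow: /
```

Note that per OpenAI's documentation, ChatGPT-User may still fetch pages in response to direct user requests regardless of rules like these.[S5]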
Likely - resource costs will keep driving restrictions on high-volume crawlers. The Vercel example of 569 million GPTBot requests in a month underlines the infrastructure burden that unrestricted AI crawling can create.[S4] Hostinger's observation of declining SEO-tool coverage and training-bot coverage aligns with a cost-management response: many owners are starting from a "deny by default" stance for heavy bots, especially where no direct traffic or revenue path is visible.[S1][S2][S4] CDN-level blocking is increasingly used to ease origin load.[S1][S2]
Tentative - AI assistant crawlers may become a parallel discovery channel next to classic search. As OAI-SearchBot, TikTok's bot, and Apple's crawler expand coverage while Googlebot and Bingbot remain steady, AI assistants appear to be forming a parallel layer of content discovery.[S1][S2] For brands, this likely means that metadata, crawlability, and content freshness need review not only for search engines but also for assistant and search bots that generate AI answers. Evidence on click-through and conversions from these channels is still limited, so financial impact remains uncertain.
Speculative - selective access could influence how models represent your brand. Blocking training bots while allowing assistant and search bots may limit how future models learn from your content but still allow short-form answers referencing your site when users ask AI assistants.[S1][S2][S5] The long-term effect on brand representation in model outputs is not yet well measured; businesses may decide based on risk tolerance around content reuse versus potential reach through AI interfaces.
Contradictions and gaps in current AI crawler data
Several gaps and potential conflicts remain in the public data:
- Limited visibility beyond one hosting provider. Hostinger's dataset is large but only reflects sites hosted on its infrastructure; behavior on enterprise, government, or niche hosting environments may differ.[S1][S2]
- Short observation windows. Three 6-day windows provide a snapshot, not a full seasonal or yearly view. Spikes from specific launches or model updates could skew coverage and request counts.[S1][S2]
- Coverage vs. impact not yet linked. The study tracks whether bots visit a site, not what they do with that content or how it affects impressions, clicks, or revenue in AI or traditional search surfaces.[S1][S2]
- Robots.txt behavior for some bots remains unclear. OpenAI notes that ChatGPT-User may not be governed by robots.txt in the same way as OAI-SearchBot.[S5] That raises questions about how fully site owners can control all AI-related access through conventional crawling rules.
- No uniform standard for AI bot labeling. While Hostinger uses AI.txt classifications, not all AI crawlers are transparent with user agents, and some may rotate fingerprints or use generic identifiers, leaving blind spots in log-based analyses.[S1][S2]
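For bots that do publish verification guidance, log-based findings can be partially validated. Below is a sketch of the reverse-then-forward DNS check that Google documents for verifying Googlebot; other vendors typically publish IP ranges instead, and the example address is illustrative:

```python
import socket

def verify_googlebot(ip: str) -> bool:
    """Reverse-then-forward DNS check, as Google documents for Googlebot."""
    try:
        host = socket.gethostbyaddr(ip)[0]  # reverse lookup: IP -> hostname
    except socket.herror:
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        # Forward-confirm: the hostname must resolve back to the same IP.
        forward_ips = {info[4][0] for info in socket.getaddrinfo(host, None)}
    except socket.gaierror:
        return False
    return ip in forward_ips

# Illustrative address; substitute an IP from your own logs.
print(verify_googlebot("66.249.66.1"))
```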
As a result, marketers and technical teams have directional evidence that blocking training bots while allowing assistant and search bots is spreading, but they lack precise ROI metrics on how these choices change visibility and conversions.
Data appendix for AI crawler coverage and traffic
Approximate coverage and request metrics from Hostinger's analysis, as reported by SEJ:[S1][S2]
| Bot / Category | Role | Site coverage (avg. or range) | Request volume notes |
|---|---|---|---|
| GPTBot (OpenAI) | LLM training crawler | 84% to 12% | High volume; sharp coverage drop |
| ExternalAgent (Meta) | LLM training crawler | Not specified | Largest training bot by volume |
| OAI-SearchBot (OpenAI) | ChatGPT search crawler | 55.67% | Expanding assistant and search reach |
| TikTok bot | AI/search assistant crawler | 25.67% | 1.4 billion requests |
| Apple bot | Assistant/search crawler | 24.33% | Growing but quieter footprint |
| Googlebot | Classic search crawler | 72% | 14.7 billion requests |
| Bingbot | Classic search crawler | 57.67% | Stable coverage |
| Ahrefs | SEO tool crawler | 60% | Category in overall decline |
For policy decisions, Hostinger advises checking actual server or CDN logs to see which bots hit your properties, then adjusting robots.txt and network-level rules to match goals around visibility, legal risk, and infrastructure cost.[S1][S2][S4]
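A rough sketch of that log check, counting hits per known bot token in an access log; the file path is a placeholder, and the token list is a small sample to extend from a maintained classification such as the AI.txt project:[S1]

```python
# Rough sketch: count hits per known bot token in an access log.
# The token list is a small sample; extend it from a maintained
# classification such as the AI.txt project. "access.log" is a placeholder.
from collections import Counter

BOT_TOKENS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "CCBot",
              "Googlebot", "bingbot", "AhrefsBot"]

def bot_hits(log_path):
    hits = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            for token in BOT_TOKENS:
                if token in line:
                    hits[token] += 1
    return hits

for bot, count in bot_hits("access.log").most_common():
    print(f"{bot}: {count}")
```

Substring matching on user-agent strings is a first pass only; as noted above, bots that rotate fingerprints or use generic identifiers will evade it.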
Sources referenced
[S1] Hostinger, AI bot analysis based on internal server logs (66.7B requests across 5M+ sites, three 6-day windows), as summarized.
[S2] Matt G. Southern, "OpenAI Search Crawler Passes 55% Coverage In Hostinger Study," Search Engine Journal.
[S3] BuzzStream and SEJ analysis of AI training bot blocking across major news publishers; Cloudflare "Year in Review" on AI crawler disallows.
[S4] Vercel infrastructure data on GPTBot request volumes and resource usage, as reported by SEJ.
[S5] OpenAI, "Bots" documentation describing OAI-SearchBot and ChatGPT-User behavior and robots.txt handling.
[S6] SEJ report on major AI crawler user agents and identification guidance for webmasters.