New Data Reveals How Rarely Googlebot's 2 MB Limit Actually Matters For SEO

Reviewed by: Andrii Daniv
11 min read · Feb 10, 2026

This report compares Googlebot’s 2 MB crawl limit with real-world HTML page sizes, based on HTTP Archive data and tooling tests.

New data indicates Googlebot’s 2 MB crawl limit is more than sufficient for the vast majority of pages.

Executive snapshot: Googlebot 2 MB crawl limit and HTML size

  • Median HTML size across real-world pages is about 33 KB, far below Googlebot’s 2 MB HTML crawl cap.[S1][S2]
  • At the 90th percentile, HTML weight is about 155 KB, so roughly 90% of pages ship HTML totaling less than 8% of the 2 MB limit.[S1][S2]
  • Only at the extreme 100th percentile do HTML sizes expand into hundreds of megabytes (for example, ~401.6 MB desktop and ~389.2 MB mobile homepages), making 2 MB-plus HTML pages statistical outliers.[S1][S2]
  • HTML sizes for desktop and mobile, and for homepages vs inner pages, remain very similar until the heaviest tail; differences only become large at the 100th percentile.[S1][S2]
  • A simulation by Tame The Bots shows that truncating HTML at 2 MB almost never affects what Googlebot would “see” for normal sites, underscoring how rare these outliers are.[S1][S4]

For marketers, the implication is that HTML file size almost never limits Google crawling or indexing. Performance, accessibility, and content quality have far more impact than staying comfortably under 2 MB.

Method and source notes on Googlebot crawl limit and HTML weight data

The figures above come primarily from HTTP Archive’s Web Almanac “Page Weight” chapter (2025 edition) and are summarized in the Search Engine Journal (SEJ) article referenced in the sources.[S1][S2]

What was measured

Metric: “HTML bytes” - the textual weight of the HTML document, including markup and any inline script or style content, but excluding external JS/CSS files and media. As HTTP Archive explains, this metric reflects the actual HTML payload a browser or crawler receives.[S2]
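As a rough way to check this metric for a single page, the sketch below (a minimal example assuming the Python requests library is installed; the URL is a placeholder) downloads a document and reports its decompressed HTML size, which is the figure comparable to the percentiles discussed in this report.

```python
import requests

# Placeholder URL; substitute the page you want to inspect.
URL = "https://example.com/"

resp = requests.get(URL, timeout=30)

# resp.content is the decompressed response body: the HTML document itself,
# including inline <script> and <style> content, but excluding external
# JS/CSS files and media - the same scope as the "HTML bytes" metric.
html_bytes = len(resp.content)

# On-the-wire size, when the server reports it (may be absent for chunked responses).
transfer_size = resp.headers.get("Content-Length", "not reported")

print(f"HTML bytes (decompressed): {html_bytes:,} ({html_bytes / 1024:.1f} KB)")
print(f"Content-Length header: {transfer_size}")
print(f"Share of a 2 MB cap: {html_bytes / (2 * 1024 * 1024):.2%}")
```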

Scope: Live web pages drawn from the Chrome UX Report corpus. Previous Almanac editions used millions of URLs across desktop and mobile; the 2025 chapter follows the same pattern.[S2][S3]

Breakdowns:

  • Percentiles of HTML size (10th-100th).[S2]
  • Desktop vs mobile HTML size.[S2]
  • Homepages vs inner pages.[S2]

Googlebot limit and tooling

Googlebot HTML limit: SEJ reports a Googlebot crawl cap of 2 MB for HTML and other text-based responses, with assets (JS, CSS, images) fetched separately.[S1]

Tame The Bots: The tool’s Fetch & Render feature was updated to truncate text-based responses at 2 MB, simulating how much of the HTML Googlebot would read.[S1][S4] Dave Smart also posted about this update, noting how rarely typical sites lose meaningful content under the limit.[S4]
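As an illustration of what a 2 MB cutoff means in practice, the sketch below (not Tame The Bots’ actual implementation; a rough simulation assuming the Python requests library and a placeholder URL) truncates fetched HTML at 2 MB and compares how many links appear in the truncated document versus the full one.

```python
import re
import requests

URL = "https://example.com/very-large-page"  # placeholder URL
CAP_BYTES = 2 * 1024 * 1024  # 2 MB, the reported cap for HTML/text responses

resp = requests.get(URL, timeout=30)
full_html = resp.content                # decompressed HTML bytes
truncated_html = full_html[:CAP_BYTES]  # what a 2 MB-capped fetch would keep


def count_links(html: bytes) -> int:
    """Crude link count via regex; enough to compare full vs truncated HTML."""
    return len(re.findall(rb"<a\s[^>]*href=", html, flags=re.IGNORECASE))


full_links = count_links(full_html)
kept_links = count_links(truncated_html)

print(f"Full HTML size: {len(full_html):,} bytes")
print(f"Links in full HTML: {full_links}")
print(f"Links within the first 2 MB: {kept_links}")
if len(full_html) <= CAP_BYTES:
    print("Page is under the cap; truncation changes nothing.")
else:
    print(f"Links that would fall past the cap: {full_links - kept_links}")
```

For almost any ordinary marketing page the two counts will be identical, which is the point the simulation makes.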

Page-weight tools: Toolsaday’s Web Page Size Checker and Small SEO Tools’ Website Page Size Checker are useful for gauging overall payload, but they report combined resource size (HTML plus scripts, styles, and media), not HTML bytes alone.[S1][S5][S6]

Key limitations and caveats

  • HTTP Archive samples the web at scale, but it is still a sample. It may over- or under-represent certain site types, such as very small regional sites or authenticated apps.[S2][S3]
  • The reported HTML sizes include inline JS/CSS, which can inflate weight on some frameworks. That is still what Googlebot sees, so it is the relevant metric for the 2 MB discussion.[S2]
  • The extreme 100th-percentile values (hundreds of megabytes of HTML) likely include abnormal or misconfigured pages and should be treated as edge cases, not standard practice.[S2]
  • SEJ’s reporting of a 2 MB cap is based on Google documentation and community testing, but Google has not published distribution data showing how many of its indexed pages actually exceed that limit.[S1]

Findings on typical HTML page weight and the 2 MB Googlebot ceiling

The central data point is that raw HTML is very small compared with the 2 MB crawl cap. Raw HTML is effectively a text file; reaching 2 MB would usually require more than two million characters.[S1]

According to the latest HTTP Archive data, the median HTML weight is 33 KB.[S1][S2] That means half of all measured pages ship 33 KB or less of HTML - roughly 1.6% of the 2 MB limit. At the 90th percentile, HTML size is about 155 KB.[S1][S2] Even there, pages use under 8% of the 2 MB threshold. In practical terms, about 90% of measured pages send less than ~155 KB of HTML, including inline scripts and styles.

Only at the 100th percentile do HTML sizes become extremely large. The Web Almanac notes that at this top percentile, desktop HTML reaches ~401.6 MB and mobile HTML ~389.2 MB for some homepages.[S1][S2] For inner pages, the divergence is even bigger: inner-page HTML reaches about 624.4 MB versus 166.5 MB for homepages, nearly 3.75 times as much.[S1][S2] These values exceed the 2 MB crawl cap by two orders of magnitude and reflect rare, heavily bloated or misconfigured documents.

Overall, this distribution shows a steep curve: the vast majority of HTML documents are tens or low hundreds of kilobytes, while a tiny minority balloon into hundreds of megabytes. That shape means the 2 MB crawl cap sits far out in the tail, well above typical and even heavy real-world HTML sizes.[S2][S3]

Findings on mobile vs desktop HTML size and template behavior

The HTTP Archive report highlights how similar HTML sizes are between mobile and desktop experiences. At the 10th and 25th percentiles, HTML size is effectively the same on both device types.[S2] From the 50th percentile upward, desktop pages become slightly larger, but the gap stays minor until the extreme tail.

The authors interpret this as evidence that most sites serve nearly identical HTML to both mobile and desktop users:[S2]

“The size difference between mobile and desktop is extremely minor, this implies that most websites are serving the same page to both mobile and desktop users.”[S2]

This pattern holds for both homepages and inner pages. The report notes “little disparity” in HTML size between these template types up to around the 75th percentile, with differences becoming notable only in the heaviest segment.[S2] At the 100th percentile, inner pages show much more bloat (624.4 MB) than homepages (166.5 MB), which likely represents unusual implementations such as extremely long filter result pages, giant reports, or technical anomalies.[S1][S2]

From a crawl-limit standpoint, the important point is that ordinary mobile and desktop versions of both homepages and inner pages almost always fall far below 2 MB of HTML. Shared templates across devices tend to keep HTML distribution similar, and even when mobile and desktop differ, those differences are usually measured in tens of kilobytes, not megabytes.[S2][S3]

Interpretation and implications for SEO strategy and crawl management

Interpretation - analyst view, based on the data above. Labels indicate confidence.

Likely: HTML size is a non-issue for crawl limits on almost all marketing sites

Given a 33 KB median and 155 KB at the 90th percentile, a 2 MB HTML cap is many times larger than what 90-95% of public pages actually ship.[S1][S2] For small and mid-size business sites, news sites, standard SaaS marketing pages, and similar properties, it is very unlikely that any indexable page’s HTML approaches 2 MB.

  • For SEO teams, chasing HTML byte reductions purely to stay under 2 MB is low-value work.
  • Technical effort is better spent on areas that affect both search and user experience: overall page speed, JavaScript execution cost, Core Web Vitals, and information architecture.[S2][S3]

Likely: Only unusual implementations are at risk of hitting the 2 MB cap

Pages that might approach or exceed 2 MB of HTML tend to share patterns such as the following (a rough detection script is sketched after this list):[S2][S3]

  • Very large inline <script> blocks or JSON blobs (for example, full page data serialized into the HTML rather than fetched via API).
  • Massive tables or reports rendered server-side in a single document.
  • Inlined resources that would normally be external (for example, base64-encoded images or fonts embedded directly in HTML).
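One way to audit a template for these patterns is to measure how much of the HTML is taken up by inline script content and base64 data URIs. The sketch below is a rough heuristic rather than a full parser; it assumes the Python requests library and uses a placeholder URL.

```python
import re
import requests

URL = "https://example.com/report-page"  # placeholder URL

html = requests.get(URL, timeout=30).content
total = len(html)

# Bytes inside inline <script>...</script> blocks (serialized state, JSON blobs, etc.).
inline_script_bytes = sum(
    len(body)
    for body in re.findall(
        rb"<script\b[^>]*>(.*?)</script>", html, flags=re.IGNORECASE | re.DOTALL
    )
)

# Bytes inside base64 data URIs (inlined images, fonts, and similar resources).
data_uri_bytes = sum(
    len(uri) for uri in re.findall(rb"data:[^;]+;base64,[A-Za-z0-9+/=]+", html)
)

print(f"Total HTML: {total / 1024:.1f} KB")
print(f"Inline <script> content: {inline_script_bytes / 1024:.1f} KB "
      f"({inline_script_bytes / total:.1%} of HTML)")
print(f"Base64 data URIs: {data_uri_bytes / 1024:.1f} KB "
      f"({data_uri_bytes / total:.1%} of HTML)")
```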

For such edge cases, tools like Tame The Bots’ 2 MB-capped Fetch & Render view can confirm how much of the page Googlebot would actually receive.[S1][S4]

Likely: Key content should appear early in the HTML regardless of size

Even if a page never reaches 2 MB, placing critical navigation, internal links, and primary text content reasonably early in the markup remains a sound technical SEO pattern (a quick byte-offset check is sketched after this list). This helps:[S2][S3]

  • Guard against any unforeseen truncation for atypical pages.
  • Reduce time to first meaningful render for users as browsers parse HTML from the top down.
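One simple sanity check is to see how many bytes into the document a critical element first appears. The sketch below assumes the Python requests library; both the URL and the marker string (here, the opening tag of the main navigation) are placeholders to adapt to a real template.

```python
import requests

URL = "https://example.com/"   # placeholder URL
MARKER = b"<nav"               # placeholder marker: first main-navigation tag
CAP_BYTES = 2 * 1024 * 1024    # reported 2 MB cap for HTML/text responses

html = requests.get(URL, timeout=30).content
offset = html.find(MARKER)

if offset == -1:
    print("Marker not found in the HTML.")
else:
    print(f"Marker first appears {offset:,} bytes into the document "
          f"({offset / len(html):.1%} of the HTML, "
          f"{offset / CAP_BYTES:.2%} of the 2 MB cap).")
```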

Tentative: HTML bloat still matters for performance even below 2 MB

While a 300-500 KB HTML file is well below the crawl cap, it can still slow initial load and interact badly with heavy JavaScript bundles.[S2][S3] Trimming unnecessary markup, moving large inline scripts to external files, and simplifying templates can yield performance and maintainability gains, even if Googlebot’s 2 MB limit is not directly threatened.

Speculative: Severe outliers may see content or links ignored after 2 MB

For the tiny fraction of pages with hundreds of megabytes of HTML, Googlebot will not read the full document if the practical cap is 2 MB.[S1][S2] In those cases, content, internal links, or structured data placed very deep in the HTML could be unseen by Google. There is limited public evidence on how often this happens or how Google treats such pages in ranking; real-world examples are rare, and many such pages likely have other quality or usability issues that limit their search visibility.

Contradictions and gaps around the 2 MB limit and HTML sizing

Several uncertainties and gaps in the available data are relevant for cautious planning:

  • No direct count of 2 MB-plus pages in Google’s index. HTTP Archive shows that 2 MB-plus HTML pages exist but are confined to the extreme tail (100th percentile).[S2] Google has not published how many such pages it actually crawls or indexes, or how it handles them beyond the documented cap.
  • Differences between lab measurements and Google’s infrastructure. HTTP Archive uses a standardized testing stack (Lighthouse/WebPageTest) and the Chrome UX Report sample.[S2][S3] Google’s crawler may face different conditions (compression, streaming behavior, or error handling). While the broad distribution is likely directionally accurate, exact byte counts per page may differ.
  • Ambiguity around which resource types are affected. SEJ reports a 2 MB cap on HTML and text-based content, with assets fetched separately.[S1] Google’s public documentation has focused on text-based resources, but it is not fully clear how this limit applies to non-HTML formats or edge cases such as HTML served with unusual headers.
  • Outlier pages are poorly understood. The presence of 400-600 MB HTML pages at the 100th percentile indicates misconfigurations or very unusual applications.[S2] There is little research on how search engines treat such extreme documents, partly because they are so rare and often not core marketing pages.
  • Tooling mismatch with the specific HTML metric. Toolsaday and Small SEO Tools report overall page size, not HTML bytes alone.[S1][S5][S6] A page might have a small HTML document but a large total payload due to images and scripts. Conversely, a page with heavy inline scripts could have high HTML bytes even if total payload is moderate. For precise HTML size checks, developers often need browser dev tools or server-side inspection, which are not directly addressed in the SEJ article.

These gaps suggest that HTML size should be monitored as a sanity check rather than a primary KPI, with special attention only where templating or application architecture indicates a risk of very large inline content.

Data appendix on HTML size percentiles vs the 2 MB crawl cap

The table below summarizes key data points from the SEJ article and HTTP Archive report, and how they relate to Googlebot’s 2 MB cap.[S1][S2]

Metric (HTML bytes) | Approx. size | Share of 2 MB cap | Notes
Median HTML size | 33 KB | ~1.6% | 50% of pages are at or below this size.[S1][S2]
90th percentile HTML size | 155 KB | ~7.8% | 90% of pages are at or below this size.[S1][S2]
2 MB Googlebot crawl cap (HTML) | 2,000 KB | 100% | Reported limit for HTML/text responses.[S1]
100th percentile homepage HTML (desktop) | 401.6 MB | ~20,000% | Extreme outlier; likely abnormal pages.[S1][S2]
100th percentile homepage HTML (mobile) | 389.2 MB | ~19,000% | Similar extreme tail for mobile.[S1][S2]
100th percentile inner-page HTML | 624.4 MB | ~31,000% | Very heavy outliers on non-home pages.[S1][S2]
100th percentile homepage HTML (vs inner pages) | 166.5 MB | ~8,300% | Shows disparity vs inner pages at the top tail.[S1][S2]
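The “Share of 2 MB cap” column is simple arithmetic; the short sketch below reproduces it using the same decimal convention as the table (2 MB treated as 2,000 KB, 1 MB as 1,000 KB), so exact figures differ slightly from binary-unit calculations.

```python
# Reproduce the "Share of 2 MB cap" column from the table above.
CAP_KB = 2_000  # 2 MB in the table's decimal convention

rows_kb = {
    "Median HTML size": 33,
    "90th percentile HTML size": 155,
    "100th pct homepage HTML (desktop)": 401.6 * 1_000,
    "100th pct homepage HTML (mobile)": 389.2 * 1_000,
    "100th pct inner-page HTML": 624.4 * 1_000,
    "100th pct homepage HTML": 166.5 * 1_000,
}

for label, size_kb in rows_kb.items():
    print(f"{label}: {size_kb / CAP_KB:.1%} of the 2 MB cap")
```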

From this distribution, the operational takeaway is that 2 MB is far beyond normal HTML usage, and only a minute fraction of pages risk truncation by Googlebot, usually because of extreme inline content or errors rather than everyday site design.[S1][S2]

Sources

  • [S1] Montti, R., “New Data Shows Googlebot’s 2 MB Crawl Limit Is Enough,” Search Engine Journal, 2026.
  • [S2] HTTP Archive, Web Almanac 2025, “Page Weight” chapter - HTML bytes and percentile data.
  • [S3] HTTP Archive, Web Almanac 2019-2022, “Page Weight” chapters - historical HTML size distributions.
  • [S4] Smart, D., “Tame The Bots Fetch & Render” tool update adding a 2 MB cap simulation, 2026.
  • [S5] Toolsaday, “Web Page Size Checker” - online tool documentation.
  • [S6] Small SEO Tools, “Website Page Size Checker” - online tool documentation.
Author
Etavrian AI
Etavrian AI is developed by Andrii Daniv to produce and optimize content for the etavrian.com website.
Reviewed by
Andrii Daniv
Andrii Daniv is the founder and owner of Etavrian, a performance-driven agency specializing in PPC and SEO services for B2B and e-commerce businesses.