During a discussion on the Bluesky social network, Google Search Advocate John Mueller explained how to check whether content deep on a page is indexed in Google Search. The exchange, highlighted by Search Engine Journal, focused on very long pages, HTML size limits, and how Google's crawlers process them.
Google Shows How To Check Passage Indexing: Key Details
In a Bluesky thread, a participant asked Mueller how many megabytes of HTML Googlebot can crawl and index per page. The question referenced figures of 2 MB and 15 MB and raised the possibility that HTML or resources might be truncated on very large pages.
Mueller answered that Google uses many different crawlers for different purposes and directed users to Google's public documentation, which includes a list of all the crawlers and their roles.
Addressing the size concern directly, he wrote that "2MB of HTML (for those focusing on Googlebot) is quite a bit" for most sites, and described practical issues caused by this HTML size limit as "extremely rare."
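For site owners who want to see where a page stands relative to that figure, a minimal sketch like the following (Python, standard library only; the URL is a placeholder) fetches a page and reports its raw HTML size:

```python
# Minimal sketch: measure a page's raw HTML size and compare it to
# the 2 MB figure discussed in the thread. The URL is a placeholder;
# substitute the page you want to check.
from urllib.request import Request, urlopen

URL = "https://example.com/very-long-page"  # hypothetical page

req = Request(URL, headers={"User-Agent": "size-check-script"})
with urlopen(req) as resp:
    html_bytes = resp.read()

size_mb = len(html_bytes) / (1024 * 1024)
print(f"Raw HTML size: {size_mb:.2f} MB")
if size_mb < 2:
    print("Well under the 2 MB figure Mueller called 'quite a bit'.")
else:
    print("Unusually large HTML; consider trimming inline markup.")
```

Note that this measures only the HTML document itself, not images, scripts, or other resources the page loads.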
Instead of focusing on page weight measurements, Mueller suggested a straightforward way to confirm that passages are being indexed: search in Google for a distinctive sentence or quote that appears further down on the page. If that specific text surfaces in search results, it indicates that the passage has been indexed.
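As a concrete illustration of that check, a query along these lines combines an exact-match phrase (the quotation marks) with the documented site: operator to confine results to one domain; the sentence and domain here are placeholders:

```
"a distinctive sentence that appears far down the page" site:example.com
```

If Google returns the page for that query, the passage containing the sentence has been indexed.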
- Platform: Bluesky social network, in a public thread with John Mueller.
- Topic: How much HTML Googlebot processes per page (2 MB vs. 15 MB) and potential truncation on very long pages.
- Mueller's view: Multiple Google crawlers share responsibilities, and 2 MB of HTML is "quite a bit" for typical sites, with related issues being "extremely rare."
- Recommended check: Run a Google search for a distinctive sentence located later on a page to see if it appears in search results.
- Reference: Google's crawler overview page documents the main bots and their functions.
Background Context on Google Crawlers and Passage Handling
Google operates several automated systems that fetch and process web content, commonly grouped under the label Googlebot. Its publicly available crawler overview lists the search, ads, image, video, and other specialized bots, each with its own user agent and responsibilities.
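For readers who want to see which of those crawlers actually visit a site, one rough approach is to scan server access logs for the user-agent tokens named in that overview. The sketch below is an assumption-laden illustration: the log path and format are placeholders, and user-agent strings can be spoofed, so Google's documentation recommends reverse-DNS verification for anything load-bearing.

```python
# Rough sketch: tally requests from Google crawler user-agent tokens
# in a server access log. The path and log format are assumptions;
# user agents can be spoofed, so the counts are only indicative.
from collections import Counter

LOG_PATH = "access.log"  # placeholder; point at your own log file

# More specific tokens first, so "Googlebot" doesn't swallow them all.
TOKENS = ["Googlebot-Image", "Googlebot-Video", "AdsBot-Google", "Googlebot"]

counts = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        for token in TOKENS:
            if token in line:
                counts[token] += 1
                break  # attribute each request to one token only

for token, hits in counts.most_common():
    print(f"{token}: {hits} requests")
```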
Community discussions have focused on how Google's documented HTML size limits interact with very long pages and embedded resources. In the Bluesky thread, one participant described a hypothetical case where some resources might load fully while others could be cut off if limits were reached.
Mueller characterized such situations as "extremely rare" in real-world use. Rather than trying to reverse-engineer technical thresholds, he reiterated that site owners can verify indexing by checking whether specific, meaningful on-page text appears in Google Search.
Google has also described its ability to rank individual passages from longer documents, not only entire pages. This allows search systems to surface relevant sections of a multi-topic page when a query matches a particular passage.