During the Search Central Deep Dive in Asia on 11 July 2025, Google analyst Gary Illyes detailed how Google isolates and indexes a webpage’s “centerpiece” content while warning about the crawl budget wasted on soft 404 pages.
How Google Weighs Main Content
Illyes explained that Google fully renders a page, identifies the primary visible section, and assigns that area the greatest ranking weight. When text is moved from a lower-priority zone (such as a sidebar) into the centerpiece, it tends to rank better.
- Navigation, header, and footer copy receive lower weighting.
- Google’s positional analysis maps tokens to their on-screen location.
- Semantic HTML helps the crawler delineate sections accurately.
After sectioning, Google tokenizes the centerpiece text and stores only numeric representations, not the raw HTML. This abstraction supports semantic search capabilities and reduces index size.
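To make the sectioning and tokenization steps concrete, here is a minimal sketch in Python. It is not Google's pipeline, which is not public. The tag-to-region mapping, the `RegionCollector` class, and the `tokenize_to_ids` helper are all assumptions made for illustration, and the sketch works on raw markup rather than a rendered page.

```python
from html.parser import HTMLParser
import re

# Hypothetical mapping from semantic tags to page regions.
# Google's actual sectioning rules are not public.
REGION_TAGS = {
    "main": "centerpiece", "article": "centerpiece",
    "nav": "boilerplate", "header": "boilerplate",
    "footer": "boilerplate", "aside": "boilerplate",
}


class RegionCollector(HTMLParser):
    """Collects text per region, keyed by the innermost open semantic tag."""

    def __init__(self):
        super().__init__()
        self.stack = []  # open semantic tags, innermost last
        self.text = {"centerpiece": [], "boilerplate": [], "other": []}

    def handle_starttag(self, tag, attrs):
        if tag in REGION_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        region = REGION_TAGS[self.stack[-1]] if self.stack else "other"
        if data.strip():
            self.text[region].append(data.strip())


def tokenize_to_ids(text, vocab):
    """Maps lowercase word tokens to integer IDs; unseen words extend vocab."""
    tokens = re.findall(r"\w+", text.lower())
    return [vocab.setdefault(tok, len(vocab)) for tok in tokens]


page = """
<header>Acme Shop</header>
<nav>Home Products Contact</nav>
<main><article><h1>Blue Widget</h1><p>The blue widget ships free.</p></article></main>
<footer>Copyright Acme</footer>
"""
collector = RegionCollector()
collector.feed(page)

# Only the centerpiece text is tokenized; the raw HTML is discarded.
centerpiece = " ".join(collector.text["centerpiece"])
vocab = {}
ids = tokenize_to_ids(centerpiece, vocab)
print(ids)  # [0, 1, 2, 0, 1, 3, 4]
```

Because the sketch relies on semantic tags alone, a page built entirely from generic `<div>` elements would land in the "other" bucket, which illustrates why clear markup makes sectioning easier.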
Soft 404 Pages Drain Crawl Budget
Illyes cautioned that pages that return HTTP 200 but show an error message, an empty body, or a “not found” notice are treated as soft 404s. Google marks them as critical issues, and repeatedly fetching those URLs can consume crawl budget that could be spent on valid content.
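A site owner can approximate this check with a simple heuristic. The sketch below is an assumption about how such an audit might look, not a description of Google's classifier. The phrase list, the `MIN_BODY_CHARS` threshold, and both function names are hypothetical and should be tuned per site.

```python
import re

# Hypothetical phrases and length threshold; adjust them for your own site.
ERROR_PHRASES = ("not found", "page doesn't exist", "no longer available")
MIN_BODY_CHARS = 50


def visible_text(html):
    """Rough text extraction: drop script/style blocks, then all tags."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text).strip()


def looks_like_soft_404(status, html):
    """True when a 200 response has a near-empty body or an error message."""
    if status != 200:
        return False  # real error codes are not *soft* 404s
    text = visible_text(html).lower()
    if len(text) < MIN_BODY_CHARS:
        return True
    return any(phrase in text for phrase in ERROR_PHRASES)


print(looks_like_soft_404(200, "<h1>Page not found</h1>"))  # True
print(looks_like_soft_404(404, "<h1>Page not found</h1>"))  # False
```

A phrase match like this will also flag a legitimate article that happens to discuss "not found" errors, so flagged URLs are best treated as candidates for manual review.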
Background
Google first defined “main content” in its 2015 Search Quality Rater Guidelines and has consistently encouraged publishers to use semantic markup and return proper 404 responses for removed pages.
Why It Matters for Marketers
- Place high-value keywords in the most visible, central section of each page.
- Use semantic HTML elements (e.g., <main>, <article>, <nav>) to clarify structure.
- Audit for soft 404s and ensure genuinely missing content returns a true 404 status.
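For the last point, the sketch below shows one way a server can return a true 404 instead of a 200 with an error message. It is a minimal WSGI application written for illustration. The `PAGES` dictionary is a hypothetical stand-in for a CMS or database, and the paths in it are invented.

```python
# Hypothetical in-memory content store standing in for a CMS or database.
PAGES = {
    "/": "<main><h1>Home</h1></main>",
    "/blue-widget": "<main><h1>Blue Widget</h1></main>",
}


def app(environ, start_response):
    """Minimal WSGI app: known paths get 200, everything else a real 404."""
    body = PAGES.get(environ.get("PATH_INFO", "/"))
    if body is None:
        # The status line, not just the page copy, signals the missing page.
        status = "404 Not Found"
        body = "<main><h1>Page not found</h1></main>"
    else:
        status = "200 OK"
    payload = body.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves the app on port 8000. Requesting an unknown path should then show a 404 status in the browser's network panel.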
Primary Sources
Search Quality Rater Guidelines
Google Search Console Help - Soft 404 errors
Google documentation on handling 404 pages
Event notes as summarized by Kenichi Suzuki