Google Search analyst Gary Illyes recently explained how the search engine isolates a page’s primary content and why soft 404 errors can drain crawl budget. He made the remarks during the Google Search Central Deep Dive event in Asia.
How Google identifies a page's main content
Illyes said Google renders the entire page, performs positional analysis and labels the portion that delivers the page’s main purpose as “centerpiece content.” Words that appear inside this block carry more ranking weight than text in headers, footers or sidebars, so keeping key terms in the main body can improve relevance signals.
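To make the weighting idea concrete, here is a minimal sketch that scores a term across page zones that have already been labeled. The zone names and the `ZONE_WEIGHTS` values are hypothetical: Google has not published how much extra weight centerpiece tokens receive, so the numbers only illustrate why moving a term into the main body raises its score.

```python
import re
from collections import Counter

# Hypothetical zone weights for illustration only; Google has not published
# how much more centerpiece tokens count than header/sidebar/footer tokens.
ZONE_WEIGHTS = {
    "centerpiece": 1.0,
    "header": 0.2,
    "sidebar": 0.2,
    "footer": 0.1,
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weighted_term_score(blocks: list[tuple[str, str]], term: str) -> float:
    """Sum occurrences of `term`, scaling each block's count by its zone weight."""
    term = term.lower()
    score = 0.0
    for zone, text in blocks:
        counts = Counter(tokenize(text))
        score += counts[term] * ZONE_WEIGHTS.get(zone, 0.0)
    return score


# The same keyword placed in the sidebar versus in the main body.
sidebar_version = [
    ("header", "Acme Outdoor Store"),
    ("centerpiece", "Our guide compares lightweight tents for alpine trips."),
    ("sidebar", "Popular: hiking boots, hiking poles, hiking maps"),
]
main_body_version = [
    ("header", "Acme Outdoor Store"),
    ("centerpiece", "Our hiking guide compares lightweight tents for alpine hiking trips."),
    ("sidebar", "Popular: boots, poles, maps"),
]

print(round(weighted_term_score(sidebar_version, "hiking"), 2))  # 0.6
print(weighted_term_score(main_body_version, "hiking"))  # 2.0
```

Even with fewer raw occurrences, the main-body version scores higher, which mirrors the advice to keep key terms in the centerpiece rather than in navigation or sidebars.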
Event details
- Event: Google Search Central Deep Dive, Asia
- Speaker: Gary Illyes, Google Search analyst
- Process: Render page → detect centerpiece content → elevate tokens in that zone
- Benefit: Terms moved from sidebars to the main body can gain ranking influence
- Helpful practice: Use semantic HTML to separate primary and ancillary elements
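As a rough illustration of why semantic HTML helps, the sketch below uses Python's standard `html.parser` to sort visible text into primary and ancillary buckets based on the nearest enclosing semantic tag. The fixed tag lists are a simplifying assumption: Illyes described rendering plus positional analysis, not a tag lookup, so this only shows how clear markup makes the primary content easy to separate.

```python
from html.parser import HTMLParser

# Assumption: a fixed tag list stands in for Google's rendering-based
# positional analysis, which is not public.
ANCILLARY_TAGS = {"header", "nav", "aside", "footer"}
PRIMARY_TAGS = {"main", "article"}


class ZoneSplitter(HTMLParser):
    """Collect text into 'primary', 'ancillary' or 'other' by enclosing tag."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []
        self.zones: dict[str, list[str]] = {"primary": [], "ancillary": [], "other": []}

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed or void children.
        if tag in self.stack:
            while self.stack:
                if self.stack.pop() == tag:
                    break

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # The innermost semantic container decides which zone the text joins.
        for tag in reversed(self.stack):
            if tag in ANCILLARY_TAGS:
                self.zones["ancillary"].append(text)
                return
            if tag in PRIMARY_TAGS:
                self.zones["primary"].append(text)
                return
        self.zones["other"].append(text)


page = """
<html><body>
  <header><nav>Home | Shop | Blog</nav></header>
  <main>
    <article>
      <h1>Choosing an alpine tent</h1>
      <p>Weight, wind rating and floor area matter most.</p>
    </article>
  </main>
  <aside>Related: hiking boots</aside>
  <footer>Copyright Acme</footer>
</body></html>
"""

splitter = ZoneSplitter()
splitter.feed(page)
splitter.close()
print(splitter.zones["primary"])
# ['Choosing an alpine tent', 'Weight, wind rating and floor area matter most.']
print(splitter.zones["ancillary"])
# ['Home | Shop | Blog', 'Related: hiking boots', 'Copyright Acme']
```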
Soft 404 errors classified as critical
Illyes labeled soft 404 responses a “critical error” because they waste crawl resources and degrade user experience.
What is a soft 404?
- A URL returns HTTP 200 OK but displays an error message or very little content
- Google treats the page like a true 404 and may drop it from the index
- Even Google's own soft 404 documentation page was once excluded from the index for this reason
- Fix: Serve a proper 404 for deleted content or redirect only when a clear replacement exists
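The fix above boils down to a simple routing rule. This minimal sketch expresses it as a pure function; the `REDIRECTS` and `LIVE` tables are hypothetical site data, and a real server or CMS would apply the same rule in its own request handling.

```python
from typing import Optional

# Hypothetical site data: pages that exist, and old URLs whose content
# moved to a clear replacement.
LIVE = {"/", "/guides/tent"}
REDIRECTS = {"/guides/tent-2019": "/guides/tent"}


def choose_response(path: str) -> tuple[int, Optional[str]]:
    """Return (status_code, redirect_location) for a request path.

    Redirect only when a clear replacement exists; otherwise send a real
    404 instead of a 200 page that merely says "not found" (a soft 404).
    """
    if path in REDIRECTS:
        return 301, REDIRECTS[path]
    if path in LIVE:
        return 200, None
    # Deleted and unknown URLs both get a genuine error status code.
    return 404, None


for path in ["/guides/tent", "/guides/tent-2019", "/products/old-tent"]:
    print(path, choose_response(path))
# /guides/tent (200, None)
# /guides/tent-2019 (301, '/guides/tent')
# /products/old-tent (404, None)
```

The key design choice is that the fallback is 404, not a friendly 200 page: the error page can still be friendly, but the status code must tell crawlers the URL is gone.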
Tokenization powers Google’s index
Illyes confirmed that Google converts page text into tokens before storage. The index therefore contains tokenized data, which supports semantic matching and reduces dependence on exact-match keywords.
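Google has not published its tokenizer or index format, so the following is only a textbook-style sketch: a regex word tokenizer feeding an inverted index that maps each token to the documents containing it. It shows why a token index can match a query regardless of word order or capitalization; real systems add further normalization (such as stemming), which this sketch deliberately omits.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return documents containing every query token, in any order or case."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = index.get(tokens[0], set()).copy()
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


docs = {
    "a": "Lightweight tents for alpine trips",
    "b": "Alpine trips: choosing a lightweight tent",
    "c": "Hiking boots buying guide",
}
index = build_index(docs)
print(sorted(search(index, "alpine lightweight")))  # ['a', 'b']
print(sorted(search(index, "tent")))  # ['b'] (no stemming, so "tents" does not match)
```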
Crawl budget implications
Every site receives a finite crawl budget. Large numbers of soft 404 pages can consume resources that would be better spent on valuable URLs, so accurate status codes are essential for efficient crawling.
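One way to gauge the crawl budget impact is to audit a crawl or server log for responses that look like soft 404s. The sketch below flags 200 responses that contain an error phrase or very little text, then reports what share of fetches they consumed. The `ERROR_PHRASES` list and the `MIN_WORDS` threshold are hypothetical heuristics; Google's own soft 404 classifier is not public.

```python
from dataclasses import dataclass

# Hypothetical heuristics for illustration; tune them for your own site.
ERROR_PHRASES = ("page not found", "no longer available", "no results")
MIN_WORDS = 50


@dataclass
class Fetch:
    url: str
    status: int
    body: str


def looks_like_soft_404(fetch: Fetch) -> bool:
    """Flag 200 responses that show an error message or almost no content."""
    if fetch.status != 200:
        return False  # a real 404 or redirect is already signaled correctly
    text = fetch.body.lower()
    if any(phrase in text for phrase in ERROR_PHRASES):
        return True
    return len(text.split()) < MIN_WORDS


def wasted_share(log: list[Fetch]) -> float:
    """Fraction of crawler fetches spent on suspected soft 404 URLs."""
    if not log:
        return 0.0
    wasted = sum(1 for f in log if looks_like_soft_404(f))
    return wasted / len(log)


log = [
    Fetch("/guides/tent", 200, "tent " * 300),
    Fetch("/search?q=zzz", 200, "No results found"),
    Fetch("/products/old-tent", 200, "Sorry, page not found."),
    Fetch("/products/gone", 404, "Not found"),
]
print(f"{wasted_share(log):.0%} of fetches hit suspected soft 404s")
# 50% of fetches hit suspected soft 404s
```

URLs flagged this way are candidates for the fix described earlier: return a proper 404, or redirect when a clear replacement exists.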
Sources
The comments were first summarized by Kenichi Suzuki and later reported by Search Engine Journal’s Roger Montti. Additional guidance is available in Google Search Central documentation.