Google confirms MUVERA-style retrieval in Search, with no known use of its Graph Foundation Model

At Search Central Live: Deep Dive Asia in Singapore on 11 June 2024, Google Search analyst Gary Illyes confirmed that the company's live ranking pipeline already uses a retrieval approach comparable to the MUVERA architecture revealed last month. He also said he is not aware of any production use of the Graph Foundation Model (GFM) in web search.

MUVERA retrieval system confirmed

During a Q&A, independent consultant José Manuel Morgal asked about MUVERA (Multi-Vector Retrieval via Fixed-Dimensional Encodings). After requesting a brief definition, Illyes responded that Search already employs "something similar", though the technology is not branded MUVERA internally.

His remark is consistent with a Google Research blog post and paper published in May 2024, which describe compressing each set of multi-vector token embeddings into a single fixed-length vector. The research reported lower latency than the PLAID baseline while improving recall, indicating the method is suitable for large production indexes. Illyes did not share rollout dates, affected features or specific metrics, but his acknowledgment implies that a comparable technique is active in live search.

Key technical takeaways

Fixed-Dimensional Encodings approximate multi-vector similarity within one vector.
Maximum inner product search accelerates retrieval across large indexes.
Chamfer similarity reranking refines results with minimal added latency.
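
To make the three takeaways concrete, here is a minimal sketch in plain Python. It is not Google's implementation. It assumes a much simplified encoding: a single SimHash partition, no dimensionality reduction, and empty buckets left as zeros. The function names (make_hyperplanes, encode, search) are hypothetical, and the brute-force ranking stands in for a real maximum inner product search index.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def chamfer(query_vecs, doc_vecs):
    """Sum, over query vectors, of the best inner product with any document vector."""
    return sum(max(dot(q, p) for p in doc_vecs) for q in query_vecs)

def make_hyperplanes(dim, n_planes, seed=0):
    """Random Gaussian hyperplanes shared by queries and documents (assumed setup)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]

def bucket_of(vec, planes):
    """SimHash bucket id: one sign bit per hyperplane."""
    bucket = 0
    for plane in planes:
        bucket = (bucket << 1) | (1 if dot(vec, plane) > 0 else 0)
    return bucket

def encode(vecs, planes, is_query):
    """Fixed-length encoding: per-bucket sum for queries, per-bucket mean for documents."""
    dim = len(vecs[0])
    n_buckets = 2 ** len(planes)
    sums = [[0.0] * dim for _ in range(n_buckets)]
    counts = [0] * n_buckets
    for v in vecs:
        b = bucket_of(v, planes)
        counts[b] += 1
        for i, x in enumerate(v):
            sums[b][i] += x
    fde = []
    for b in range(n_buckets):
        # Simplification: empty buckets stay all-zero.
        scale = 1.0 if is_query or counts[b] == 0 else 1.0 / counts[b]
        fde.extend(x * scale for x in sums[b])
    return fde

def search(query_vecs, corpus, planes, k=2):
    """Shortlist by single-vector inner product, then rerank by exact Chamfer similarity."""
    q_fde = encode(query_vecs, planes, is_query=True)
    # Brute-force stand-in for a maximum inner product search index.
    by_fde = sorted(
        range(len(corpus)),
        key=lambda i: dot(q_fde, encode(corpus[i], planes, is_query=False)),
        reverse=True,
    )
    shortlist = by_fde[:k]
    return sorted(shortlist, key=lambda i: chamfer(query_vecs, corpus[i]), reverse=True)

planes = make_hyperplanes(dim=2, n_planes=3)
query = [[1.0, 0.0], [0.0, 1.0]]
corpus = [
    [[1.0, 0.0], [0.0, 1.0]],    # same token vectors as the query
    [[-1.0, 0.0], [0.0, -1.0]],  # opposite directions
]
ranking = search(query, corpus, planes, k=2)
```

The point of the design is that the expensive multi-vector comparison runs only on the shortlist, while the first pass needs just one inner product per document.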

Graph Foundation Model status

In a follow-up question, Morgal asked whether Google has integrated its new Graph Foundation Model into search. Illyes joked that he was unfamiliar with the term and said he believes the model is not in production for ranking. The GFM research, disclosed in May 2024, details an architecture that learns relationships from relational tables and adapts to unseen schemas without retraining.

According to the blog post, GFM delivered large precision gains on internal classification tasks, particularly spam detection in advertising data. However, Illyes noted that he does not oversee every research release and therefore cannot confirm future deployment plans.
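
As an illustration of what learning from relational tables can involve, the sketch below shows one common way to view a set of tables as a graph: each row becomes a node typed by its table, and each foreign-key reference becomes a typed edge. This is an assumption made for illustration, not the GFM team's preprocessing code. The table names, columns and the tables_to_graph helper are all hypothetical.

```python
def tables_to_graph(tables, foreign_keys):
    """Turn relational tables into typed nodes and edges.

    tables: {table_name: {row_id: row_dict}}
    foreign_keys: list of (source_table, column, target_table)
    """
    # Every row becomes a node identified by (table name, row id).
    nodes = [(name, row_id) for name, rows in tables.items() for row_id in rows]
    edges = []
    for src, column, dst in foreign_keys:
        for row_id, row in tables[src].items():
            target_id = row.get(column)
            # Only link rows whose reference resolves to an existing target row.
            if target_id in tables[dst]:
                edges.append(((src, row_id), f"{src}.{column}", (dst, target_id)))
    return nodes, edges

# Hypothetical toy schema loosely inspired by advertising data.
tables = {
    "advertisers": {1: {"country": "SG"}, 2: {"country": "TH"}},
    "ads": {10: {"advertiser_id": 1}, 11: {"advertiser_id": 2}, 12: {"advertiser_id": 1}},
}
foreign_keys = [("ads", "advertiser_id", "advertisers")]
nodes, edges = tables_to_graph(tables, foreign_keys)
```

Because the conversion depends only on table names and key columns, the same routine applies to a schema it has never seen, which is the property the research emphasises.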

Performance notes from the GFM team

Average precision improvements of 3x to 40x on internal tasks.
Benchmarks run on graphs containing billions of nodes and edges.
Training and inference executed with JAX on TPU infrastructure.
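
For readers unfamiliar with the metric, the sketch below computes average precision for a ranked list of binary labels and compares two rankings as a ratio. It uses the standard textbook definition. The source does not say exactly how the GFM team computed its figures, and the labels here are made up purely for illustration.

```python
def average_precision(relevance):
    """Mean of precision@k taken at each rank k where a relevant item appears."""
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

# Made-up rankings: 1 marks a true positive (for example, a spam item), 0 anything else.
baseline = [0, 0, 1, 0, 1]
improved = [1, 1, 0, 0, 0]

# An "N x" improvement is the ratio of the two average precision scores.
ratio = average_precision(improved) / average_precision(baseline)
```

A multiplier of this kind can look dramatic when the baseline score is very low, so it is best read alongside the absolute values.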

Background and sources

Both MUVERA and GFM were introduced on the Google Research blog in May 2024. MUVERA's paper is available on arXiv, while a preprint for GFM has not yet been released. Search Central Live: Deep Dive Asia is a recurring Google outreach event for practitioners in the Asia-Pacific region. Illyes's comments provide operational insight but do not constitute an official product roadmap. A related LinkedIn post by Morgal summarises the session.