Google introduced BlockRank, an in-context ranking method for search, in a new Google Research paper. The approach uses a language model to rank candidate documents against a query. The authors report competitive accuracy and improved efficiency in experiments.
Key details
- BlockRank prompts a model with task instructions, candidate documents, and a query to produce rankings.
- The method leverages two observed attention patterns: inter-document block sparsity and query-document block relevance.
- Experiments used a 7B-parameter Mistral model on the BEIR, MS MARCO, and Natural Questions benchmarks.
- Comparisons included FIRST, RankZephyr, RankVicuna, and a fine-tuned Mistral baseline.
- The authors report BlockRank matched or exceeded baseline accuracy while improving training and inference efficiency.
- Evaluation was limited to Mistral-7B and did not cover additional base models.
- The paper does not announce production deployment in Google products or services.
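The prompting setup described in the first bullet can be sketched as follows. The template, section labels, and ranking instruction below are illustrative assumptions, not the paper's actual prompt format:

```python
def build_ranking_prompt(instruction, documents, query):
    """Assemble an in-context ranking prompt: a task instruction,
    candidate documents as numbered blocks, then the query.
    (Hypothetical template -- the paper's exact format may differ.)"""
    parts = [instruction]
    for i, doc in enumerate(documents, start=1):
        parts.append(f"[Document {i}]\n{doc}")
    parts.append(f"Query: {query}")
    parts.append("Rank the documents by relevance to the query.")
    return "\n\n".join(parts)

prompt = build_ranking_prompt(
    "You are a ranking assistant.",
    ["Paris is the capital of France.", "The Nile is a river in Africa."],
    "capital of France",
)
print(prompt)
```

The model's output would then be parsed into an ordering over the numbered documents.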
Background
In-context ranking uses a model’s context window to assess relevance across provided documents and a query. It can reduce reliance on separate retrievers by enabling direct ranking within the model. Interest in this approach has grown as context windows and model capabilities expand.
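The two attention patterns noted under Key details can be pictured with a simple attention mask: tokens in each document block attend only within their own block (inter-document block sparsity), while query tokens attend to all documents (query-document block relevance). This is an illustrative sketch under those assumptions, not the paper's implementation; block boundaries and mask semantics here are hypothetical.

```python
def block_sparse_mask(doc_lengths, query_length):
    """Build a boolean attention mask (True = attention allowed).

    Document tokens attend only within their own document block;
    query tokens attend to all document tokens and to the query.
    Illustrative only -- the paper's actual masking may differ.
    """
    n = sum(doc_lengths) + query_length
    mask = [[False] * n for _ in range(n)]
    # Each document block: tokens attend only within their block.
    start = 0
    for length in doc_lengths:
        for i in range(start, start + length):
            for j in range(start, start + length):
                mask[i][j] = True
        start += length
    # Query tokens attend to everything (all documents + the query).
    for i in range(start, n):
        for j in range(n):
            mask[i][j] = True
    return mask

m = block_sparse_mask([2, 2], query_length=1)
# Token 0 (doc 1) cannot attend to token 2 (doc 2), but the query token can.
```

Such a mask reduces the quadratic attention cost across documents, which is one way block structure can translate into the efficiency gains the authors report.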
In 2024, Google researchers examined how far long-context models can replace or absorb external retrieval for certain workloads. The new BlockRank paper focuses instead on efficiency and accuracy in ranking scenarios. See the prior study on arXiv (PDF) and the BlockRank paper, Scalable In-context Ranking with Generative Models, on Google Research.