DeepPolisher halves genome errors - see how Google's Transformer rewrites DNA accuracy

Google Research and UC Santa Cruz unveil DeepPolisher, an open-source Transformer that cuts assembly errors 50 percent in hours - find out how.

Google Research and the UC Santa Cruz Genomics Institute released DeepPolisher on 6 August 2025. The open-source deep-learning tool halves residual errors in long-read genome assemblies and already supports the Human Pangenome Reference Consortium (HPRC) second data release.

DeepPolisher at a glance

The software uses an encoder-only Transformer to identify and correct miscalls in draft assemblies. In tests across 180 human genomes, it delivered substantial accuracy gains.

  • Publication: paper in Genome Research, 6 Aug 2025.
  • Code: open-source repository under the Apache 2.0 licence.
  • Error reduction: total errors drop by roughly 50 percent; indels fall 70 percent.
  • Quality: average Q-score rises from Q66.7 to Q70.1 in benchmark regions.
  • Training data: HG002 cell line from the Personal Genome Project, certified by NIST and the NHGRI.
  • Inputs: base calls, Phred quality, mapping confidence, phase labels, mismatch flags.
  • Deployment: applied to 232 assemblies in the HPRC second release.
  • Performance: runs on standard GPUs and polishes a human genome in under four hours.

The tool integrates with existing long-read assembly workflows and outputs polished FASTA files ready for public release.
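
The Q-scores quoted in the list above are Phred-scaled: a score Q corresponds to a per-base error probability of 10^(-Q/10). The following is a minimal sketch of that conversion, assuming the two published averages can be plugged in directly (an average of Q-scores is not identical to an average of error rates, so treat the result as a rough cross-check rather than a reproduction of the paper's numbers). The function names are illustrative and not part of DeepPolisher.

```python
import math


def phred_to_error_rate(q: float) -> float:
    """Convert a Phred-scaled quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def error_rate_to_phred(p: float) -> float:
    """Convert a per-base error probability back to a Phred-scaled score."""
    return -10 * math.log10(p)


# Average Q-scores reported for benchmark regions, before and after polishing.
q_before, q_after = 66.7, 70.1

p_before = phred_to_error_rate(q_before)
p_after = phred_to_error_rate(q_after)

# Fraction of per-base errors removed, implied by the two Q-scores alone.
reduction = 1 - p_after / p_before

print(f"before: 1 error per {1 / p_before:,.0f} bases")
print(f"after:  1 error per {1 / p_after:,.0f} bases")
print(f"implied error reduction: {reduction:.1%}")
```

Under that assumption, the move from Q66.7 to Q70.1 corresponds to about 54 percent fewer per-base errors, or roughly one error per 4.7 million bases improving to one per 10 million, which is in line with the "roughly 50 percent" headline figure.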

Why polishing matters

Short-read sequencers – largely developed by Illumina – provide high per-base accuracy but read only a few hundred bases at a time. Long-read platforms from Pacific Biosciences stretch reads to tens of thousands of bases, yet early versions carried error rates near 10 percent.

Pacific Biosciences lowered raw errors to about 1 percent with circular consensus sequencing. Google’s DeepConsensus later pushed residual errors below 0.1 percent, but draft assemblies still contained millions of mismatches and indels that disrupt the reading frame.
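
To see why indels are singled out, compare a substitution with a one-base deletion in a coding sequence. The toy example below uses a made-up 15-base sequence and hypothetical helper names; it is only meant to illustrate the frameshift effect, not how any polisher detects it.

```python
def codons(seq: str) -> list[str]:
    """Split a sequence into complete 3-base codons, dropping a trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def substitute_base(seq: str, pos: int, base: str) -> str:
    """Return seq with the base at 0-based position pos replaced."""
    return seq[:pos] + base + seq[pos + 1:]


def delete_base(seq: str, pos: int) -> str:
    """Return seq with the base at 0-based position pos removed."""
    return seq[:pos] + seq[pos + 1:]


def changed_codons(a: list[str], b: list[str]) -> int:
    """Count positions where two codon lists differ (compared over the shorter list)."""
    return sum(x != y for x, y in zip(a, b))


reference = "ATGGCCAAGTTTGGA"  # made-up sequence for illustration only
ref_codons = codons(reference)

mismatch = codons(substitute_base(reference, 4, "T"))
deletion = codons(delete_base(reference, 4))

print("reference:", ref_codons)
print("mismatch: ", mismatch, "->", changed_codons(ref_codons, mismatch), "codon changed")
print("deletion: ", deletion, "->", changed_codons(ref_codons, deletion), "codons changed")
```

The substitution alters a single codon, while the deletion shifts every codon from that point onward, which is why a small number of residual indels can do disproportionate damage to gene annotation.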

DeepPolisher learns from multiple overlapping reads at each genomic position, correcting those final discrepancies. Assemblies polished with the model now approach reference-grade accuracy, supporting variant discovery, population studies and clinical genetics.

How it works

The Transformer ingests base calls, quality scores, mapping confidence and phase information, then outputs a corrected consensus sequence. By considering every k-mer context, the model recognises systematic errors that evade traditional tools.
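
One way to picture those inputs is as a stack of per-read feature channels over a short window of the draft assembly. The sketch below is an assumption-laden illustration: the `AlignedRead` class, the integer base codes and the channel layout are all hypothetical and are not taken from the DeepPolisher code. It only shows how base calls, Phred quality, mapping confidence, phase labels and mismatch flags could be laid out side by side before being handed to a model.

```python
from dataclasses import dataclass

# Hypothetical integer codes for bases; '-' marks a deletion in the read.
BASE_CODES = {"A": 1, "C": 2, "G": 3, "T": 4, "-": 5}


@dataclass
class AlignedRead:
    bases: str        # read bases aligned to the draft window, '-' for gaps
    quals: list[int]  # Phred quality per aligned position (0 at gaps)
    mapq: int         # mapping quality of the alignment as a whole
    haplotype: int    # phase label: 0 = unphased, 1 or 2 = assigned haplotype


def encode_window(draft: str, reads: list[AlignedRead]) -> dict[str, list[list[int]]]:
    """Build one reads-by-positions matrix per input channel (illustrative layout)."""
    names = ("base", "quality", "mapq", "phase", "mismatch")
    channels: dict[str, list[list[int]]] = {name: [] for name in names}
    width = len(draft)
    for read in reads:
        if len(read.bases) != width or len(read.quals) != width:
            raise ValueError("read must be aligned to the full draft window")
        channels["base"].append([BASE_CODES[b] for b in read.bases])
        channels["quality"].append(list(read.quals))
        # Per-read values are repeated across the window so every channel has the same shape.
        channels["mapq"].append([read.mapq] * width)
        channels["phase"].append([read.haplotype] * width)
        # Flag positions where the read disagrees with the draft assembly.
        channels["mismatch"].append([int(b != d) for b, d in zip(read.bases, draft)])
    return channels


draft_window = "ACGT"
reads = [
    AlignedRead("ACGT", [30, 30, 30, 30], mapq=60, haplotype=1),
    AlignedRead("AC-T", [25, 28, 0, 31], mapq=60, haplotype=2),
    AlignedRead("ATGT", [12, 9, 20, 22], mapq=20, haplotype=0),
]
features = encode_window(draft_window, reads)
for name, matrix in features.items():
    print(name, matrix)
```

Keeping each signal in its own equally shaped matrix is a common design for pileup-style models, since it lets the network weigh a disagreement against the quality and mapping confidence of the read that produced it.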

Unlike earlier neural methods that operate at the read level, DeepPolisher focuses on the assembly itself, making it agnostic to specific aligners or variant callers. The authors report that phased assemblies maintain accurate haplotype structure after polishing.

Road ahead

DeepPolisher is already part of the HPRC pipeline and is expected to feature in plant reference genomes, biodiversity initiatives and other large projects. Google Research says the approach builds on its first demonstration of the method on a human genome and will continue to evolve alongside sequencing technology.

Researchers can download the tool from the code repository and review the methods in the paper. Additional performance data are available on the Google Research blog.
