Google Research announced an AI system for writing and optimizing scientific software on September 9, 2025, in a post by Lizzie Dorfman and Michael Brenner. The release includes a paper and an interactive website with tree visualizations showcasing results across six evaluation tasks.
What the system does
The system generates empirical software, meaning code optimized to maximize a predefined quality score. It accepts a problem description, a scoring metric, and training, validation, and test data, then proposes and evaluates thousands of program variants in a sandbox. Under the hood, it combines Gemini models, tree search with an upper-confidence-bound strategy inspired by AlphaZero, and automated code rewriting. The output is executable code, and the full candidate solution trees are viewable online. While large language models can already perform traditional coding tasks, this system targets scorable tasks across genomics, public health, geospatial analysis, neuroscience, mathematics, and time-series forecasting.
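The propose-and-evaluate loop described above can be illustrated with a toy tree search. This is a minimal sketch under stated assumptions, not the system's actual implementation: it uses the textbook UCB1 bonus rather than whatever variant the paper defines, selects among all nodes in the tree, and stands in for the Gemini rewriting step and the sandboxed scorer with two hypothetical callables, `rewrite` and `evaluate`. The names `Node`, `ucb1`, and `search` are likewise illustrative.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    program: str            # candidate program (a plain string in this sketch)
    score: float            # quality score returned by the evaluator
    visits: int = 1         # how often this node has been chosen for expansion
    children: List["Node"] = field(default_factory=list)


def ucb1(node: Node, total_visits: int, c: float) -> float:
    """Score plus an exploration bonus that shrinks as the node is revisited."""
    return node.score + c * math.sqrt(math.log(total_visits) / node.visits)


def search(
    root_program: str,
    rewrite: Callable[[str], str],      # hypothetical stand-in for LLM code rewriting
    evaluate: Callable[[str], float],   # hypothetical stand-in for sandboxed scoring
    iterations: int = 200,
    c: float = 0.5,
) -> Node:
    """Grow a tree of program variants and return the best-scoring node."""
    root = Node(root_program, evaluate(root_program))
    nodes = [root]
    for _ in range(iterations):
        total = sum(n.visits for n in nodes)
        # Pick the node with the highest upper confidence bound to expand.
        parent = max(nodes, key=lambda n: ucb1(n, total, c))
        parent.visits += 1
        child_program = rewrite(parent.program)
        child = Node(child_program, evaluate(child_program))
        parent.children.append(child)
        nodes.append(child)
    return max(nodes, key=lambda n: n.score)


if __name__ == "__main__":
    # Toy problem: the "program" is a number encoded as text; the score peaks at 3.0.
    rng = random.Random(0)
    best = search(
        "0.0",
        rewrite=lambda p: str(float(p) + rng.uniform(-1.0, 1.0)),
        evaluate=lambda p: -(float(p) - 3.0) ** 2,
    )
    print(best.program, best.score)
```

The exploration constant `c` trades off revisiting high-scoring candidates against expanding rarely tried ones; its value here is arbitrary.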
Results at a glance
- Genomics: On the V2.0.0 batch integration benchmark, which tests how well methods correct batch effects in single-cell RNA sequencing data, the system discovered 40 new methods. The top solution improved the overall score by 14% over ComBat.
- COVID-19 forecasting: Produced 14 models that outperformed the CovidHub Ensemble model on the COVID-19 Forecast Hub, as measured by weighted interval score.
- Remote sensing segmentation: On the DLRSD dense labeling remote sensing dataset, the top three solutions exceeded 0.80 mean intersection over union, using UNet++, U-Net, and SegFormer.
- Neuroscience: On ZAPBench, results surpassed baselines, including a recent video-based model. Google also reported hybrid models that incorporate the Jaxley neuron simulator.
- Mathematics: Correctly evaluated 17 of 19 difficult integrals where the standard numerical method failed.
- Time series: Built a general forecasting library by hill-climbing on mean absolute scaled error across the General Time Series Forecasting Model Evaluation (GIFT-Eval) benchmark.
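Three of the scores named above have compact standard definitions, sketched here in plain Python for reference. These are textbook forms, not the benchmarks' own evaluation code: the function names are illustrative, the mean intersection over union skips classes absent from both prediction and target (an assumption, since conventions differ), the mean absolute scaled error scales by the in-sample seasonal-naive error, and the weighted interval score follows the usual median-plus-central-intervals formulation.

```python
from typing import Dict, Sequence, Tuple


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean intersection over union across classes that appear in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: classes absent from both are skipped
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class appears in pred or target")
    return sum(ious) / len(ious)


def mase(
    train: Sequence[float],
    actual: Sequence[float],
    forecast: Sequence[float],
    season: int = 1,
) -> float:
    """Forecast MAE divided by the in-sample MAE of a seasonal-naive forecast."""
    naive_errors = [abs(train[i] - train[i - season]) for i in range(season, len(train))]
    scale = sum(naive_errors) / len(naive_errors)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


def weighted_interval_score(
    y: float,
    median: float,
    intervals: Dict[float, Tuple[float, float]],
) -> float:
    """WIS for one observation; intervals maps alpha to the central (1 - alpha) interval."""
    total = 0.5 * abs(y - median)
    for alpha, (lower, upper) in intervals.items():
        interval_score = upper - lower                        # width (sharpness)
        interval_score += (2 / alpha) * max(lower - y, 0.0)   # penalty if y is below
        interval_score += (2 / alpha) * max(y - upper, 0.0)   # penalty if y is above
        total += (alpha / 2) * interval_score
    return total / (len(intervals) + 0.5)
```

For all three, the direction matters when used as an optimization target: higher is better for mean intersection over union, while lower is better for mean absolute scaled error and weighted interval score.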