Google Cloud unveiled MLE-STAR in Mountain View on 1 August 2025: a machine-learning engineering agent that automates model building for tasks ranging from tabular classification to image denoising. The tool already posts top results on the MLE-Bench-Lite benchmark and promises shorter development cycles for data-science teams.
MLE-STAR Launch
The agent searches the internet via Google’s web search grounding API, assembles starter Python code, then refines individual pipeline blocks through automated ablation studies. It generates multiple candidate models before producing a single ensemble plan that it keeps improving.
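To make the ablation step concrete, here is a minimal sketch of how an ablation study can rank pipeline blocks by their effect on a validation score. It is not MLE-STAR's code: the block names, the fixed score gains, and the `evaluate` and `ablation_study` helpers are all hypothetical stand-ins for a real train-and-validate run.

```python
from typing import Callable, Dict, List

# Hypothetical toy pipeline: each block adds a fixed gain to the validation score.
BLOCK_GAINS: Dict[str, float] = {
    "imputation": 0.02,
    "feature_engineering": 0.08,
    "model_choice": 0.05,
}
BASE_SCORE = 0.70


def evaluate(enabled: Dict[str, bool]) -> float:
    """Stand-in for training and validating a pipeline with some blocks disabled."""
    return BASE_SCORE + sum(gain for name, gain in BLOCK_GAINS.items() if enabled[name])


def ablation_study(
    blocks: List[str], score: Callable[[Dict[str, bool]], float]
) -> Dict[str, float]:
    """Disable one block at a time and record how much the score drops."""
    full_score = score({name: True for name in blocks})
    impact = {}
    for ablated in blocks:
        config = {name: name != ablated for name in blocks}
        impact[ablated] = full_score - score(config)
    return impact


impact = ablation_study(list(BLOCK_GAINS), evaluate)
# The block whose removal hurts most is the first candidate for refinement.
most_impactful = max(impact, key=impact.get)
print(most_impactful, impact)
```

In a real setting the scoring function would retrain the model for each configuration, so the cost of the study grows with the number of blocks.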
Key details
- Developers: Jinsung Yoon and Jaehyun Nam, Google Cloud
- Evaluation set: 11 Kaggle contests in the MLE-Bench-Lite suite
- Medal rate: 63.6 percent, up from the 25.8 percent baseline
- Gold medals: 36 percent of contests
- Added modules: automated debugger, data-leakage checker, data-usage checker
- Code release: open-source codebase built on the broader Agent Development Kit
How the Agent Works
Traditional ML-engineering agents often rely on preset libraries such as scikit-learn and swap out entire pipelines at once, limiting nuanced exploration. MLE-STAR ranks each code block by importance, isolates the most impactful component, and tries focused variations before moving on. Its built-in checkers guard against test-set leakage and confirm that every permitted data source is used.
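The focused-variation idea can be sketched as a loop that changes one block while holding the rest of the pipeline fixed, keeping a variant only if it improves the score. This is an illustrative sketch under stated assumptions, not the agent's actual implementation: the pipeline layout, the candidate encoders, the `TOY_SCORES` lookup table, and the `refine_block` helper are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Pipeline = Dict[str, str]

# Hypothetical validation scores for each (encoder, model) combination.
TOY_SCORES: Dict[Tuple[str, str], float] = {
    ("one_hot", "logistic_regression"): 0.78,
    ("target", "logistic_regression"): 0.81,
    ("ordinal", "logistic_regression"): 0.76,
}


def toy_score(pipeline: Pipeline) -> float:
    """Stand-in for training the pipeline and measuring validation accuracy."""
    return TOY_SCORES[(pipeline["encoder"], pipeline["model"])]


def refine_block(
    pipeline: Pipeline,
    block: str,
    candidates: List[str],
    score: Callable[[Pipeline], float],
) -> Tuple[Pipeline, float]:
    """Try each candidate for one block; all other blocks stay unchanged."""
    best, best_score = dict(pipeline), score(pipeline)
    for candidate in candidates:
        trial = dict(pipeline)
        trial[block] = candidate
        trial_score = score(trial)
        if trial_score > best_score:  # keep only strict improvements
            best, best_score = trial, trial_score
    return best, best_score


start = {"encoder": "one_hot", "model": "logistic_regression"}
refined, refined_score = refine_block(start, "encoder", ["target", "ordinal"], toy_score)
print(refined, refined_score)
```

Because only one block changes per trial, any gain or loss can be attributed to that block, which is the contrast with swapping whole pipelines at once.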
The design builds on recent investigations into large language models as autonomous ML engineers while addressing shortcomings observed in earlier agents of this kind.
Performance Analysis
On the 11 Kaggle challenges, MLE-STAR earned medals in 63.6 percent of cases and took gold in 36 percent. Logs show the agent favors up-to-date architectures such as EfficientNet, ViT, and RealMLP, whereas competitors like AIDE often default to earlier models such as ResNet.
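As a quick sanity check on these percentages, the snippet below finds which whole numbers of contests out of 11 are consistent with the reported rates. The resulting counts are inferred from the arithmetic only and are not stated in the announcement; the `matching_count` helper is a hypothetical name used for illustration.

```python
TOTAL_CONTESTS = 11


def matching_count(reported_percent: float, decimals: int) -> list:
    """Return every contest count whose rounded percentage equals the reported value."""
    return [
        k
        for k in range(TOTAL_CONTESTS + 1)
        if round(100 * k / TOTAL_CONTESTS, decimals) == reported_percent
    ]


medal_counts = matching_count(63.6, 1)  # reported to one decimal place
gold_counts = matching_count(36, 0)     # reported as a whole percentage
print(medal_counts, gold_counts)
```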