Google Ads found a 500-sample shortcut that boosts AI safety 65%

Google Ads engineers have introduced an active learning workflow that cuts the expert-labeled data needed to fine-tune safety models from roughly 100,000 examples to fewer than 500. In internal tests, the approach improved model alignment with expert decisions by up to 65 percent, allowing policy updates to reach production in days instead of weeks.

Active Learning Training Data Reduction

The pipeline begins with a few-shot model that labels billions of ads as benign or potentially unsafe. Clustering then isolates samples near the decision boundary, the cases the model finds most confusing. These “hard” pairs are sent to policy specialists for dual annotation.
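The selection step can be illustrated with a minimal sketch. It is not the Google Ads implementation: it skips the clustering stage and simply brute-forces the closest pairs of points that the model labeled differently, which is one plausible reading of "near the decision boundary". The function name `hard_pairs`, the 2-D points, and the 0/1 labels are all hypothetical.

```python
import math

def hard_pairs(points, labels, k=2):
    """Return up to k (unsafe_idx, benign_idx) pairs that lie closest together.

    Assumption: points are embedding vectors and labels are the few-shot
    model's outputs (1 = potentially unsafe, 0 = benign).
    """
    unsafe = [i for i, y in enumerate(labels) if y == 1]
    benign = [i for i, y in enumerate(labels) if y == 0]
    # Score every cross-label pair by Euclidean distance.
    candidates = [
        (math.dist(points[i], points[j]), i, j)
        for i in unsafe
        for j in benign
    ]
    candidates.sort()  # the smallest distances are the most confusable pairs
    return [(i, j) for _, i, j in candidates[:k]]

# Toy data: point 3 sits right next to the benign points 0 and 1.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.2, 0.1), (5.0, 5.0)]
labels = [0, 0, 1, 1, 1]
pairs = hard_pairs(points, labels, k=2)
```

At production scale a brute-force comparison would be impractical, which is presumably why the described pipeline clusters first and only compares examples within overlapping clusters.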

Each pair is labeled twice, and agreement is measured with Cohen’s Kappa. Training and evaluation sets are refreshed every round until model-human agreement plateaus.
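For readers unfamiliar with the metric, Cohen's Kappa compares observed agreement between two raters with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is a plain-Python version of that standard formula; the rater labels are invented for illustration and are not data from the study.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a.keys() | count_b.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical annotations (1 = unsafe, 0 = benign) from two reviewers.
reviewer_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
reviewer_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
kappa = cohens_kappa(reviewer_1, reviewer_2)
```

Here the reviewers agree on 8 of 10 items, but half of that agreement is expected by chance, so Kappa comes out well below the raw 80 percent agreement rate.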

Pilot tasks: clickbait detection and complex policy checks.
Models: Gemini Nano-1 (1.8B parameters) and Gemini Nano-2 (3.25B parameters).
Baseline: 100,000 crowdsourced labels with 5 percent positives.
Curated: 250-450 expert labels with roughly 40 percent positives.
Nano-2 Kappa rose from .36 to .56 on the simpler task and from .23 to .38 on the harder task.
Overall alignment gains: 55-65 percent using three orders of magnitude less data.
Production models achieved comparable quality with 10,000× less data.
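The 55-65 percent range appears consistent with reading the gains as relative improvements in Nano-2's Kappa on the two tasks. That interpretation is an assumption, not something the summary states outright; the short check below just works through the arithmetic on the reported scores.

```python
def relative_gain(before, after):
    """Relative improvement of `after` over `before`, as a fraction."""
    return (after - before) / before

# Nano-2 Kappa scores reported above: baseline vs. curated training data.
simple_task_gain = relative_gain(0.36, 0.56)
hard_task_gain = relative_gain(0.23, 0.38)

print(f"simpler task: {simple_task_gain:.1%}")
print(f"harder task:  {hard_task_gain:.1%}")
```

Under this reading, the simpler task accounts for the lower end of the range and the harder task for the upper end.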

The lower-complexity task plateaued after six rounds (450 fine-tuning and 250 evaluation samples). The harder task stabilized after five rounds (250 and 150 samples).

Baseline crowdsourced labels aligned at .59 Kappa on the simpler set and .41 on the harder set. Expert labels scored .81 and .78, setting the ceiling for model performance. Nano-1 held steady at .25 Kappa, while Nano-2 benefited most from curated data, suggesting that larger models gain more from high-signal examples.

When reviewer hours were scarce, the system prioritized the clusters that covered the largest area of the search space, ensuring broad coverage despite a small labeling budget.
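The article does not say how this prioritization was computed. One common way to spread a small budget across a space is greedy farthest-point selection, sketched below as a stand-in technique. The function `select_for_coverage` and the 2-D cluster centers are hypothetical.

```python
import math

def select_for_coverage(centers, budget):
    """Greedily pick up to `budget` indices so the picks are spread far apart.

    Assumption: each cluster is summarized by a center point. Starts from
    index 0, then repeatedly adds the center farthest from everything
    chosen so far.
    """
    if budget <= 0 or not centers:
        return []
    chosen = [0]
    # Distance from each center to its nearest already-chosen center.
    nearest = [math.dist(c, centers[0]) for c in centers]
    while len(chosen) < min(budget, len(centers)):
        nxt = max(range(len(centers)), key=lambda i: nearest[i])
        chosen.append(nxt)
        for i, c in enumerate(centers):
            nearest[i] = min(nearest[i], math.dist(c, centers[nxt]))
    return chosen

# Three centers crowd the origin; two sit far away in different directions.
centers = [(0.0, 0.0), (0.1, 0.1), (10.0, 0.0), (0.0, 10.0), (0.2, 0.0)]
picked = select_for_coverage(centers, budget=3)
```

With a budget of three, the sketch picks one center near the origin and both outlying centers, rather than spending the budget on three near-duplicates.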

Google Ads Safety Model Background

Google Ads screens billions of creatives each day for scams, adult themes, and misinformation. Ongoing concept drift forces frequent model updates. Large language models offer richer context understanding but traditionally demand costly expert annotations. The new workflow focuses annotation on ambiguous content, sharply reducing that burden.

Cohen’s Kappa, which measures agreement between two raters beyond what chance alone would produce, was chosen because many safety decisions lack a single ground truth; scores above .80 indicate expert-level agreement.

Engineering was led by Markus Krause, Nancy Chang, and Steve Walker, with project management from Kelsie McElroy of Ads Privacy and Safety.

Source Citations

Findings are detailed in a Google Ads Research blog post dated 7 August 2025 by Markus Krause and Nancy Chang. Model specifications appear in the Gemini 1 Technical Report.
