Claude Opus 4.1 tops coding benchmarks - the unseen tweaks powering Anthropic's upgrade

Anthropic released Claude Opus 4.1 on 30 May 2024 for Claude Pro and Claude Code subscribers, as well as developers using the API, Amazon Bedrock, or Google Cloud Vertex AI. Company benchmarks indicate higher coding accuracy, stronger agent performance, and improved safety metrics, all delivered without changes to pricing or parameters.

Performance gains in coding and agent tasks

The new model scores 74.5 percent on the SWE-bench Verified benchmark—a clear jump over Opus 4 on real-world debugging. Anthropic notes better multi-file refactoring and faster identification of fixes in large codebases. Early testers at Rakuten and Windsurf confirm noticeable speed and precision gains.

Generates up to 32 000 tokens per response for extended content.
Offers adjustable “thinking budgets” via the API to balance cost and reasoning depth.
Ranks near the top of TAU-bench for long-horizon agent workflows.
Harmlessness refusals rose to 98.76 percent, while benign over-refusal held at 0.08 percent.
No regressions detected in bias, discrimination, or child-safety checks.

Continuity with prior safety framework

Opus 4.1 is a drop-in replacement for Opus 4 and retains Anthropic’s AI Safety Level 3 controls. Earlier releases, such as Claude Sonnet 3.7 and Sonnet 4, laid the foundation for today’s reasoning and coding capabilities. Anthropic positions 4.1 as a stability-focused release ahead of larger upgrades planned for later in the year.

Recent timeline

March 2024: Claude 3 family introduced larger context windows and multimedia input.
October 2023: Anthropic adopted AI Safety Levels, mapping each model to specific risk thresholds.
SWE-bench and TAU-bench established as internal standards for coding and agent evaluation.
Claude Pro pricing and API parameters remain unchanged following the 4.1 rollout.

Claude Opus 4.1 tops coding benchmarks - the unseen tweaks powering Anthropic's upgrade

Performance gains in coding and agent tasks

Continuity with prior safety framework

Recent timeline

Further reading

Related articles