Anthropic released Claude Opus 4.1 on 30 May 2024 for Claude Pro and Claude Code subscribers, as well as developers using the API, Amazon Bedrock, or Google Cloud Vertex AI. Company benchmarks indicate higher coding accuracy, stronger agent performance, and improved safety metrics, all delivered without changes to pricing or parameters.
Performance gains in coding and agent tasks
The new model scores 74.5 percent on the SWE-bench Verified benchmark—a clear jump over Opus 4 on real-world debugging. Anthropic notes better multi-file refactoring and faster identification of fixes in large codebases. Early testers at Rakuten and Windsurf confirm noticeable speed and precision gains.
- Generates up to 32 000 tokens per response for extended content.
- Offers adjustable “thinking budgets” via the API to balance cost and reasoning depth.
- Ranks near the top of TAU-bench for long-horizon agent workflows.
- Harmlessness refusals rose to 98.76 percent, while benign over-refusal held at 0.08 percent.
- No regressions detected in bias, discrimination, or child-safety checks.
Continuity with prior safety framework
Opus 4.1 is a drop-in replacement for Opus 4 and retains Anthropic’s AI Safety Level 3 controls. Earlier releases, such as Claude Sonnet 3.7 and Sonnet 4, laid the foundation for today’s reasoning and coding capabilities. Anthropic positions 4.1 as a stability-focused release ahead of larger upgrades planned for later in the year.
Recent timeline
- March 2024: Claude 3 family introduced larger context windows and multimedia input.
- October 2023: Anthropic adopted AI Safety Levels, mapping each model to specific risk thresholds.
- SWE-bench and TAU-bench established as internal standards for coding and agent evaluation.
- Claude Pro pricing and API parameters remain unchanged following the 4.1 rollout.
Further reading
For complete benchmark data and API details, see Anthropic’s Claude Opus 4.1 documentation. Additional coverage is available in Search Engine Journal’s report by Matt G. Southern (31 May 2024).