Operator note

Claude Opus 4.1 tops coding benchmarks - the unseen tweaks powering Anthropic's upgrade

Anthropic's new drop-in model scores 74.5% on SWE-bench and boosts 32k-token outputs. Discover how its 'thinking budget' dial changes dev costs.

Developer avatar turning a brain-shaped dial with code panels and boost indicator in minimalist tech illustration

Anthropic released Claude Opus 4.1 on 30 May 2024 for Claude Pro and Claude Code subscribers, as well as developers using the API, Amazon Bedrock, or Google Cloud Vertex AI. Company benchmarks indicate higher coding accuracy, stronger agent performance, and improved safety metrics, all delivered without changes to pricing or parameters.

Performance gains in coding and agent tasks

The new model scores 74.5 percent on the SWE-bench Verified benchmark—a clear jump over Opus 4 on real-world debugging. Anthropic notes better multi-file refactoring and faster identification of fixes in large codebases. Early testers at Rakuten and Windsurf confirm noticeable speed and precision gains.

  • Generates up to 32 000 tokens per response for extended content.
  • Offers adjustable “thinking budgets” via the API to balance cost and reasoning depth.
  • Ranks near the top of TAU-bench for long-horizon agent workflows.
  • Harmlessness refusals rose to 98.76 percent, while benign over-refusal held at 0.08 percent.
  • No regressions detected in bias, discrimination, or child-safety checks.

Continuity with prior safety framework

Opus 4.1 is a drop-in replacement for Opus 4 and retains Anthropic’s AI Safety Level 3 controls. Earlier releases, such as Claude Sonnet 3.7 and Sonnet 4, laid the foundation for today’s reasoning and coding capabilities. Anthropic positions 4.1 as a stability-focused release ahead of larger upgrades planned for later in the year.

Recent timeline

  • March 2024: Claude 3 family introduced larger context windows and multimedia input.
  • October 2023: Anthropic adopted AI Safety Levels, mapping each model to specific risk thresholds.
  • SWE-bench and TAU-bench established as internal standards for coding and agent evaluation.
  • Claude Pro pricing and API parameters remain unchanged following the 4.1 rollout.

Further reading

For complete benchmark data and API details, see Anthropic’s Claude Opus 4.1 documentation. Additional coverage is available in Search Engine Journal’s report by Matt G. Southern (31 May 2024).

Keep reading

Related articles

AI powered shopping cart protocol illustration with funnel price tag alert loyalty user tapping toggleInside Google's Universal Commerce Protocol that lets AI agents tap carts, catalogs and loyalty pricing2 min readMinimalist illustration of AI checkout hub with Cart Catalog Identity cards and user tapping settingsGoogle quietly upgrades AI shopping protocol: what Cart, Catalog and Identity Linking change next2 min readMinimalist tablet health UI privacy risk toggle character adjusting shield and prescription funnelGoogle and DocMorris Launch AI Health Companion for Europe - What Changes Next2 min readMinimalist site health dashboard illustration with 404 410 toggle funnel filtering errors into green checksWorried About Endless 404 Reports In Search Console? John Mueller Reveals What They Really Mean3 min read