Research · SWE-bench · code generation · LLM · benchmarks
SWE-bench Updates Bash-Only Coding Leaderboard With New Model Rankings
Relevance Score: 8.1
On 19 February 2026, SWE-bench published a fresh full run of its February 2025 'Bash Only' coding benchmark, evaluating models on 2,294 real-world problems drawn from 12 open-source repositories. Claude Opus 4.5 ranked first, followed by Gemini 3 Flash and MiniMax M2.5. OpenAI's GPT-5.2 placed sixth, while GPT-5.3-Codex was absent from the run. All models were evaluated with a uniform system prompt to keep the comparison fair.
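For readers who want to see what the 2,294 tasks actually look like, the sketch below shows one way to inspect the underlying task set. It assumes the publicly available Hugging Face dataset "princeton-nlp/SWE-bench" and its "test" split; the leaderboard run itself is not reproduced here, only the dataset inspection.

```python
# Minimal sketch: inspect the SWE-bench test split and count tasks per repository.
# Assumes the Hugging Face dataset "princeton-nlp/SWE-bench" (not part of the article).
from collections import Counter

from datasets import load_dataset

# Load the full SWE-bench test split of real-world GitHub issue tasks.
ds = load_dataset("princeton-nlp/SWE-bench", split="test")
print(f"total task instances: {len(ds)}")

# Tally instances per source repository to see how the tasks are distributed.
repo_counts = Counter(ds["repo"])
for repo, count in repo_counts.most_common():
    print(f"{repo}: {count}")
```

Each instance carries the issue text, the repository state at a base commit, and the tests a candidate patch must pass, which is what makes per-repository breakdowns like the one above straightforward to compute.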



