A gold-standard scientific study found AI coding tools made experienced developers 19% slower -- while they believed they were 20% faster. The perception gap changes everything.
By LDS Team
February 20, 2026
Every week, another CEO gets on an earnings call and says some version of the same thing: AI is making our developers dramatically more productive. Spotify's Co-CEO said his best engineers haven't written a single line of code since December. Google's CEO said more than 25% of new code at Google is AI-generated. Shopify's CEO told employees to prove AI cannot do a task before they are allowed to hire a human.
The message from the top is clear. AI is the future of coding. If you are not using it, you are falling behind.
There is just one problem. The most rigorous study ever conducted on AI coding productivity found the exact opposite. AI made experienced developers slower. And the developers themselves had no idea it was happening -- they were convinced it was making them faster.
The Study That Surprised Everyone
In July 2025, a nonprofit called METR published a study with a title that undersold its findings: "Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity."
This was not a survey. Not a self-reported productivity estimate. Not a blog post. It was a randomized controlled trial -- the same methodology used in clinical drug trials and the gold standard of scientific evidence.
Here is what they did:
| Study Detail | Description |
|---|---|
| Design | Randomized controlled trial (RCT) |
| Participants | 16 experienced open-source developers |
| Repositories | Large-scale projects (averaging 22,000+ stars, 1M+ lines of code) |
| Developer experience | Multiple years contributing to their specific repos |
| Total tasks | 246 real issues (bug fixes, features, refactors) |
| AI tools | Cursor Pro with Claude 3.5/3.7 Sonnet (frontier models at the time) |
| Average task duration | ~2 hours each |
| Compensation | 150 USD per hour |
| Method | Each issue randomly assigned to "AI allowed" or "AI disallowed" |
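The assignment step in the table above is the heart of the RCT design: each issue, not each developer, is independently randomized into a condition, so the same person works both with and without AI across their own tasks. A minimal sketch of that design (the issue names are invented for illustration; this is not METR's actual code):

```python
import random

random.seed(42)  # reproducible illustration

# Hypothetical issue IDs standing in for the 246 real tasks.
issues = [f"issue-{n}" for n in range(1, 11)]

# Each issue is independently assigned to a condition, as in an RCT.
# Randomizing per task lets each developer serve as their own control.
assignments = {issue: random.choice(["AI allowed", "AI disallowed"])
               for issue in issues}

for issue, condition in assignments.items():
    print(issue, "->", condition)
```

Because assignment is random at the task level, systematic differences between conditions can be attributed to the tool rather than to developer skill or task selection.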
The authors -- Joel Becker, Nate Rush, Elizabeth Barnes, and David Rein -- expected to find that AI made developers faster. They said so explicitly in the paper: "We initially were broadly expecting to see positive speedup."
They found the opposite.
When developers were allowed to use AI tools, they took 19% longer to complete their tasks.
The Perception Gap Changes Everything
The slowdown is striking. But it is not the most important finding.
This is:
| Metric | Value |
|---|---|
| Actual impact of AI | Developers were 19% slower |
| What developers expected beforehand | AI would make them 24% faster |
| What developers believed afterward | AI had made them 20% faster |
| Real gap between perception and reality | ~40 percentage points |
Read that again. Even after completing their tasks -- even after experiencing the slowdown firsthand -- developers still estimated that AI had sped them up. The gap between what they felt and what actually happened was enormous.
METR called this a "substantial and persistent gap between perceived and actual performance."
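The size of that gap can be checked with one line of arithmetic: a post-hoc belief of roughly +20% against a measured effect of -19% is a swing of about 39 percentage points. A back-of-the-envelope check, treating the reported figures as simple percentages:

```python
# Perceived vs. actual change in speed, as reported in the METR study.
perceived_speedup = +20   # developers' post-hoc estimate, in percent
actual_speedup = -19      # measured effect: 19% slower, in percent

# The perception gap is the distance between belief and reality,
# in percentage points.
gap = perceived_speedup - actual_speedup
print(f"Perception gap: {gap} percentage points")  # Perception gap: 39 percentage points
```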
The study also ruled out the obvious objections:
- "Were the developers AI beginners?" No. Nearly all had dozens to hundreds of hours of prior experience with LLMs.
- "Did AI produce worse code?" No. The quality of submitted code was the same with and without AI.
- "Did developers game the system?" No. They did not differentially drop harder tasks or shift behavior.
- "Is it a statistical fluke?" No. The slowdown persisted across different statistical methodologies, task types, and data subsets.
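One way to see why 246 tasks across 16 developers can still yield a statistically solid estimate: resampling task-level completion times gives a confidence interval on the slowdown ratio. The sketch below bootstraps that ratio on synthetic data (the times are invented for illustration; METR's actual analysis, which is more sophisticated, is in their paper):

```python
import random
import statistics

random.seed(0)

# Synthetic completion times in minutes (invented for illustration):
# AI-assisted tasks drawn slightly slower on average.
ai_times = [random.gauss(143, 30) for _ in range(120)]
no_ai_times = [random.gauss(120, 30) for _ in range(120)]

def bootstrap_slowdown(ai, baseline, reps=2000):
    """Bootstrap the ratio of mean AI time to mean baseline time."""
    ratios = []
    for _ in range(reps):
        ai_sample = random.choices(ai, k=len(ai))
        base_sample = random.choices(baseline, k=len(baseline))
        ratios.append(statistics.mean(ai_sample) / statistics.mean(base_sample))
    ratios.sort()
    k = int(0.025 * reps)  # cut 2.5% from each tail for a ~95% interval
    return ratios[k], ratios[-k]

lo, hi = bootstrap_slowdown(ai_times, no_ai_times)
print(f"~95% CI for slowdown ratio: {lo:.2f}-{hi:.2f}")
```

If the entire interval sits above 1.0, the slowdown is unlikely to be a resampling fluke -- the same logic underlies METR's robustness checks across methodologies.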
Worth noting: METR is a nonprofit funded by donations. They have no financial stake in whether AI works or not. They designed the study to measure AI acceleration because they research AI R&D risks -- and they expected to find positive speedup.
Why AI Made Experts Slower
METR investigated 20 potential factors that might explain the slowdown. They found evidence that five likely contributed. The full factor analysis is in their paper, but the key themes are clear:
| Factor | Why It Slows Experts Down |
|---|---|
| Deep codebase familiarity | These developers had years of experience in repos with 1M+ lines of code. They carry enormous implicit knowledge -- conventions, patterns, undocumented requirements -- that AI simply does not have. |
| High quality standards | These were real PRs that needed to pass code review, including style, testing, documentation, and linting. AI-generated code required significant cleanup to meet these standards. |
| The prompting tax | Time spent writing prompts, reading outputs, evaluating suggestions, and course-correcting when AI went wrong often exceeded the time it would have taken to just write the code. |
| Context window limits | AI tools could not absorb the full context of massive codebases. Developers spent time breaking problems down for the AI that they would not have needed to break down for themselves. |
| False confidence | AI outputs look polished and plausible. Developers may have accepted suboptimal suggestions that required rework, rather than writing correct code from scratch. |
The pattern is consistent: the more an expert knows their codebase, the less AI has to offer them -- and the more time they waste trying to make it useful.
Worth noting: METR acknowledged that Cursor "does not sample many tokens from LLMs" and "may not use optimal prompting/scaffolding." Better tooling or repository-specific fine-tuning could potentially yield positive results. This is a snapshot, not a final verdict.
What the CEOs Are Saying
While METR was publishing controlled experiments, the corporate world was telling a very different story.
The contrast is stark. Controlled experiments finding slowdowns and skill loss on one side. CEOs on earnings calls claiming revolutionary gains on the other. They cannot both be entirely right.
Anthropic's Own Study Made It Worse
In January 2026, a study titled "How AI Impacts Skill Formation" appeared on arXiv. The authors were Judy Hanwen Shen and Alex Tamkin -- and Tamkin works at Anthropic, the company that builds Claude, one of the most popular AI coding assistants in the world.
Their findings:
| Area Measured | What They Found |
|---|---|
| Efficiency gains | No significant gains on average from AI assistance |
| Conceptual understanding | Impaired by AI use |
| Code reading ability | Impaired by AI use |
| Debugging ability | Impaired by AI use |
| Full delegation to AI | Some productivity boost -- but at the cost of actually learning |
| Interaction patterns | 6 distinct patterns found; only 3 preserved learning |
The setup: developers learned a new asynchronous programming library with and without AI help. Those who used AI got through the tasks. But when tested afterward, their understanding of the material was measurably worse. They completed the work without learning how it worked.
This study hit 3,946 upvotes on r/programming with 686 comments. The irony was not lost on anyone: the company selling AI coding tools had published research showing those tools impair the skills developers need most.
Worth noting: The two studies measured different things. METR measured speed on familiar codebases (experts got slower). Anthropic measured learning on unfamiliar ones (novices stopped learning). Together, they suggest AI hurts experts by adding overhead and hurts novices by preventing skill development. Neither group clearly benefits.
The Counter-Arguments Are Real
The pessimistic reading is not the whole story. The critics raise genuinely fair points:
The tools have changed dramatically. METR used early-2025 models (Claude 3.5/3.7 Sonnet via Cursor). Since then, Claude Code with Opus 4.5 launched in December 2025 -- the specific tool Spotify's Soderstrom credited as a breakthrough. Background agents like Spotify's Honk represent a fundamentally different workflow than Cursor's inline suggestions. METR themselves said "near-future AI" could perform differently.
The setting was unusually hard for AI. These were massive, mature open-source projects with strict quality standards. The developers already knew the code intimately. For greenfield projects, unfamiliar codebases, or rapid prototyping, the equation could look entirely different.
Sixteen developers is small. Enough for statistical significance in an RCT, but still a narrow sample. The results may not generalize to every developer or every codebase.
Spotify has production data. This is not just an earnings call quote. Spotify published a three-part engineering blog series documenting Honk, their internal AI agent. The system has merged over 1,500 pull requests across hundreds of repos, with time savings of 60-90% on specific migration tasks. These are measured outcomes, not vibes.
GitHub's CEO disagrees. Thomas Dohmke argues that "the smartest companies will hire more software engineers, not less, as AI develops." His framing: AI expands what is possible, creating more work for humans, not less. That post hit 7,514 upvotes on r/programming.
The Real Paradox
The deepest finding is not that AI makes developers slower. It is that developers cannot tell.
This matters because virtually every corporate AI productivity claim relies on self-reporting. When a CEO says "our developers are dramatically more productive," that assessment comes from what developers themselves report, or from proxy metrics like pull requests merged or lines of code generated -- metrics that do not measure actual productivity.
The METR study showed that self-reports can be off by 40 percentage points. If this perception gap holds broadly, the entire corporate narrative around AI coding could be built on a measurement error that nobody notices -- because it genuinely feels real.
This is not a new phenomenon. Humans are notoriously bad at estimating their own performance, especially when using tools that make a task feel effortless:
- GPS makes people worse at navigation, but driving feels easier.
- Spellcheck makes people worse at spelling, but writing feels smoother.
- Calculators make people worse at mental math, but arithmetic feels faster.
AI coding tools may follow the same pattern: they make the process feel productive while actually making the work take longer. Or they might genuinely help in different contexts. The honest answer is that the industry does not know yet -- and it is making trillion-dollar decisions based on vibes rather than evidence.
The Bottom Line
The METR study is not the final word on AI coding productivity. It is the first word -- the first time anyone applied the gold standard of scientific evidence to the question. The answer came back negative.
Nineteen percent slower. With frontier models. On tasks the developers knew inside out. And a perception gap so large that nobody in the study realized it was happening.
Anthropic's own research added another layer: AI assistance may impair the very skills developers need to supervise AI-generated code. The tools designed to make developers more productive might be eroding the expertise that makes supervision possible.
None of this means AI coding tools are useless. Spotify's Honk system is real. The tools have improved substantially since METR ran its study. And there are plenty of settings -- prototyping, boilerplate, unfamiliar codebases -- where AI likely does help.
But the next time a CEO gets on an earnings call and declares that AI has revolutionized their engineering team, remember the most important finding from the most rigorous study ever conducted on this question:
The developers thought so too. They were wrong.
Sources
- METR: Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity (Jul 10, 2025)
- METR Study Full Paper (arXiv) (Jul 2025)
- Reuters: AI slows down some experienced software developers, study finds (Jul 10, 2025)
- Shen and Tamkin: How AI Impacts Skill Formation (arXiv) (Jan 2026)
- Spotify Q4 2025 Earnings Call Transcript (Feb 10, 2026)
- Spotify Engineering Blog: Background Coding Agent Part 1 (Nov 2025)
- Reddit r/programming: METR Study Discussion (2,496 upvotes, 613 comments) (Jul 2025)
- Reddit r/programming: Anthropic AI Coding Study Discussion (3,946 upvotes, 686 comments) (Jan 2026)
- Reddit r/programming: GitHub CEO on Hiring More Engineers (7,514 upvotes) (2026)
- Wikipedia: Tobias Lutke -- Shopify AI Memo (Apr 2025)