A gold-standard scientific study found AI coding tools made experienced developers 19% slower -- while they believed they were 20% faster. The perception gap changes everything.
By LDS Team
February 20, 2026
Every week, another CEO gets on an earnings call and says some version of the same thing: AI is making our developers dramatically more productive. Spotify's Co-CEO said his best engineers haven't written a single line of code since December. Google's CEO said more than 25% of new code at Google is AI-generated. Shopify's CEO told employees to prove AI cannot do a task before they are allowed to hire a human.
The message from the top is clear. AI is the future of coding. If you are not using it, you are falling behind.
There is just one problem. The most rigorous study ever conducted on AI coding productivity found the exact opposite. AI made experienced developers slower. And the developers themselves had no idea it was happening -- they were convinced it was making them faster.
The Study That Surprised Everyone
In July 2025, a nonprofit called METR published a study with a title that undersold its findings: "Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity."
This was not a survey. Not a self-reported productivity estimate. Not a blog post. It was a randomized controlled trial -- the same methodology used in clinical drug trials and the gold standard of scientific evidence.
Here is what they did:
| Study Detail | Description |
|---|---|
| Design | Randomized controlled trial (RCT) |
| Participants | 16 experienced open-source developers |
| Repositories | Large-scale projects (averaging 22,000+ stars, 1M+ lines of code) |
| Developer experience | Multiple years contributing to their specific repos |
| Total tasks | 246 real issues (bug fixes, features, refactors) |
| AI tools | Cursor Pro with Claude 3.5/3.7 Sonnet (frontier models at the time) |
| Average task duration | ~2 hours each |
| Compensation | 150 USD per hour |
| Method | Each issue randomly assigned to "AI allowed" or "AI disallowed" |
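The assignment step in the table above is the heart of the RCT design: each issue, not each developer, is independently randomized into a condition, so the same person works both with and without AI across their own tasks. A minimal sketch of that design (the issue names are invented for illustration; this is not METR's actual code):

```python
import random

random.seed(42)  # reproducible illustration

# Hypothetical issue IDs standing in for the 246 real tasks.
issues = [f"issue-{n}" for n in range(1, 11)]

# Each issue is independently assigned to a condition, as in an RCT.
# Randomizing per task lets each developer serve as their own control.
assignments = {issue: random.choice(["AI allowed", "AI disallowed"])
               for issue in issues}

for issue, condition in assignments.items():
    print(issue, "->", condition)
```

Because assignment is random at the task level, systematic differences between conditions can be attributed to the tool rather than to developer skill or task selection.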
The authors -- Joel Becker, Nate Rush, Elizabeth Barnes, and David Rein -- expected to find that AI made developers faster. They said so explicitly in the paper: "We initially were broadly expecting to see positive speedup."
They found the opposite.
When developers were allowed to use AI tools, they took 19% longer to complete their tasks.
The Perception Gap Changes Everything
The slowdown is striking. But it is not the most important finding.
This is:
| Metric | Value |
|---|---|
| Actual impact of AI | Developers were 19% slower |
| What developers expected beforehand | AI would make them 24% faster |
| What developers believed afterward | AI had made them 20% faster |
| Real gap between perception and reality | ~40 percentage points |
Read that again. Even after completing their tasks -- even after experiencing the slowdown firsthand -- developers still estimated that AI had sped them up. The gap between what they felt and what actually happened was enormous.
METR called this a "substantial and persistent gap between perceived and actual performance."
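The size of that gap can be checked with one line of arithmetic: a post-hoc belief of roughly +20% against a measured effect of -19% is a swing of about 39 percentage points. A back-of-the-envelope check, treating the reported figures as simple percentages:

```python
# Perceived vs. actual change in speed, as reported in the METR study.
perceived_speedup = +20   # developers' post-hoc estimate, in percent
actual_speedup = -19      # measured effect: 19% slower, in percent

# The perception gap is the distance between belief and reality,
# in percentage points.
gap = perceived_speedup - actual_speedup
print(f"Perception gap: {gap} percentage points")  # Perception gap: 39 percentage points
```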
The study also ruled out the obvious objections:
- "Were the developers AI beginners?" No. Nearly all had dozens to hundreds of hours of prior experience with LLMs.
- "Did AI produce worse code?" No. The quality of submitted code was the same with and without AI.
- "Did developers game the system?" No. They did not differentially drop harder tasks or shift behavior.
- "Is it a statistical fluke?" No. The slowdown persisted across different statistical methodologies, task types, and data subsets.
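One way to see why 246 tasks across 16 developers can still yield a statistically solid estimate: resampling task-level completion times gives a confidence interval on the slowdown ratio. The sketch below bootstraps that ratio on synthetic data (the times are invented for illustration; METR's actual analysis, which is more sophisticated, is in their paper):

```python
import random
import statistics

random.seed(0)

# Synthetic completion times in minutes (invented for illustration):
# AI-assisted tasks drawn slightly slower on average.
ai_times = [random.gauss(143, 30) for _ in range(120)]
no_ai_times = [random.gauss(120, 30) for _ in range(120)]

def bootstrap_slowdown(ai, baseline, reps=2000):
    """Bootstrap the ratio of mean AI time to mean baseline time."""
    ratios = []
    for _ in range(reps):
        ai_sample = random.choices(ai, k=len(ai))
        base_sample = random.choices(baseline, k=len(baseline))
        ratios.append(statistics.mean(ai_sample) / statistics.mean(base_sample))
    ratios.sort()
    k = int(0.025 * reps)  # cut 2.5% from each tail for a ~95% interval
    return ratios[k], ratios[-k]

lo, hi = bootstrap_slowdown(ai_times, no_ai_times)
print(f"~95% CI for slowdown ratio: {lo:.2f}-{hi:.2f}")
```

If the entire interval sits above 1.0, the slowdown is unlikely to be a resampling fluke -- the same logic underlies METR's robustness checks across methodologies.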
Worth noting: METR is a nonprofit funded by donations. They have no financial stake in whether AI works or not. They designed the study to measure AI acceleration because they research AI R&D risks -- and they expected to find positive speedup.
Why AI Made Experts Slower
METR investigated 20 potential factors that might explain the slowdown. They found evidence that five likely contributed. The full factor analysis is in their paper, but the key themes are clear:
| Factor | Why It Slows Experts Down |
|---|---|
| Deep codebase familiarity | These developers had years of experience in repos with 1M+ lines of code. They carry enormous implicit knowledge -- conventions, patterns, undocumented requirements -- that AI simply does not have. |
| High quality standards | These were real PRs that needed to pass code review, including style, testing, documentation, and linting. AI-generated code required significant cleanup to meet these standards. |
| The prompting tax | Time spent writing prompts, reading outputs, evaluating suggestions, and course-correcting when AI went wrong often exceeded the time it would have taken to just write the code. |
| Context window limits | AI tools could not absorb the full context of massive codebases. Developers spent time breaking problems down for the AI that they would not have needed to break down for themselves. |
| False confidence | AI outputs look polished and plausible. Developers may have accepted suboptimal suggestions that required rework, rather than writing correct code from scratch. |
The pattern is consistent: the more an expert knows their codebase, the less AI has to offer them -- and the more time they waste trying to make it useful.
Worth noting: METR acknowledged that Cursor "does not sample many tokens from LLMs" and "may not use optimal prompting/scaffolding." Better tooling or repository-specific fine-tuning could potentially yield positive results. This is a snapshot, not a final verdict.
What the CEOs Are Saying
While METR was publishing controlled experiments, the corporate world was telling a very different story.
The contrast is stark. Controlled experiments finding slowdowns and skill loss on one side. CEOs on earnings calls claiming revolutionary gains on the other. They cannot both be entirely right.
Anthropic's Own Study Made It Worse
In January 2026, a study titled "How AI Impacts Skill Formation" appeared on arXiv. The authors were Judy Hanwen Shen and Alex Tamkin -- and Tamkin works at Anthropic, the company that builds Claude, one of the most popular AI coding assistants in the world.
Their findings:
| Area Measured | What They Found |
|---|---|
| Efficiency gains | No significant gains on average from AI assistance |
| Conceptual understanding | Impaired by AI use |
| Code reading ability | Impaired by AI use |
| Debugging ability | Impaired by AI use |
| Full delegation to AI | Some productivity boost -- but at the cost of actually learning |
| Interaction patterns | 6 distinct patterns found; only 3 preserved learning |
The setup: developers learned a new asynchronous programming library with and without AI help. Those who used AI got through the tasks. But when tested afterward, their understanding of the material was measurably worse. They completed the work without learning how it worked.
This study hit 3,946 upvotes on r/programming with 686 comments. The irony was not lost on anyone: the company selling AI coding tools had published research showing those tools impair the skills developers need most.
Worth noting: The two studies measured different things. METR measured speed on familiar codebases (experts got slower). Anthropic measured learning on unfamiliar ones (novices stopped learning). Together, they suggest AI hurts experts by adding overhead and hurts novices by preventing skill development. Neither group clearly benefits.
The Counter-Arguments Are Real
The pessimistic reading is not the whole story. The critics raise genuinely fair points:
The tools have changed dramatically. METR used early-2025 models (Claude 3.5/3.7 Sonnet via Cursor). Since then, Claude Code with Opus 4.5 launched in December 2025 -- the specific tool Spotify's Soderstrom credited as a breakthrough. Background agents like Spotify's Honk represent a fundamentally different workflow than Cursor's inline suggestions. METR themselves said "near-future AI" could perform differently.
The setting was unusually hard for AI. These were massive, mature open-source projects with strict quality standards. The developers already knew the code intimately. For greenfield projects, unfamiliar codebases, or rapid prototyping, the equation could look entirely different.
Sixteen developers is small. Enough for statistical significance in an RCT, but still a narrow sample. The results may not generalize to every developer or every codebase.
Spotify has production data. This is not just an earnings call quote. Spotify published a three-part engineering blog series documenting Honk, their internal AI agent. The system has merged over 1,500 pull requests across hundreds of repos, with time savings of 60-90% on specific migration tasks. These are measured outcomes, not vibes.
GitHub's CEO disagrees. Thomas Dohmke argues that "the smartest companies will hire more software engineers, not less, as AI develops." His framing: AI expands what is possible, creating more work for humans, not less. That post hit 7,514 upvotes on r/programming.
The Real Paradox
The deepest finding is not that AI makes developers slower. It is that developers cannot tell.
This matters because virtually every corporate AI productivity claim relies on self-reporting. When a CEO says "our developers are dramatically more productive," that assessment comes from what developers themselves report, or from proxy metrics like pull requests merged or lines of code generated -- metrics that do not measure actual productivity.
The METR study showed that self-reports can be off by 40 percentage points. If this perception gap holds broadly, the entire corporate narrative around AI coding could be built on a measurement error that nobody notices -- because it genuinely feels real.
This is not a new phenomenon. Humans are notoriously bad at estimating their own performance, especially when using tools that make a task feel effortless:
- GPS makes people worse at navigation, but driving feels easier.
- Spellcheck makes people worse at spelling, but writing feels smoother.
- Calculators make people worse at mental math, but arithmetic feels faster.
AI coding tools may follow the same pattern: they make the process feel productive while actually making the work take longer. Or they might genuinely help in different contexts. The honest answer is that the industry does not know yet -- and it is making trillion-dollar decisions based on vibes rather than evidence.
The Bottom Line
The METR study is not the final word on AI coding productivity. It is the first word -- the first time anyone applied the gold standard of scientific evidence to the question. The answer came back negative.
Nineteen percent slower. With frontier models. On tasks the developers knew inside out. And a perception gap so large that nobody in the study realized it was happening.
Anthropic's own research added another layer: AI assistance may impair the very skills developers need to supervise AI-generated code. The tools designed to make developers more productive might be eroding the expertise that makes supervision possible.
None of this means AI coding tools are useless. Spotify's Honk system is real. The tools have improved substantially since METR ran its study. And there are plenty of settings -- prototyping, boilerplate, unfamiliar codebases -- where AI likely does help.
But the next time a CEO gets on an earnings call and declares that AI has revolutionized their engineering team, remember the most important finding from the most rigorous study ever conducted on this question:
The developers thought so too. They were wrong.
Sources
- METR: Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity (Jul 10, 2025)
- METR Study Full Paper (arXiv) (Jul 2025)
- Reuters: AI slows down some experienced software developers, study finds (Jul 10, 2025)
- Shen and Tamkin: How AI Impacts Skill Formation (arXiv) (Jan 2026)
- Spotify Q4 2025 Earnings Call Transcript (Feb 10, 2026)
- Spotify Engineering Blog: Background Coding Agent Part 1 (Nov 2025)
- Reddit r/programming: METR Study Discussion (2,496 upvotes, 613 comments) (Jul 2025)
- Reddit r/programming: Anthropic AI Coding Study Discussion (3,946 upvotes, 686 comments) (Jan 2026)
- Reddit r/programming: GitHub CEO on Hiring More Engineers (7,514 upvotes) (2026)
- Wikipedia: Tobias Lutke -- Shopify AI Memo (Apr 2025)