Mercor Finds AI Agents Fail Consulting Tasks
Mercor published the APEX-Agents benchmark, which shows that leading AI agents completed under 25% of real-world consulting, banking, and legal tasks on the first attempt and only about 40% after eight attempts. OpenAI's GPT-5.2 initially completed roughly 23% of tasks, while Anthropic's Opus 4.6 reached nearly 33%. The study found that agents perform well at research and single-tool data analysis but fail at long-horizon, multi-step planning and at coordinating work across multiple files. Mercor CEO Brendan Foody nonetheless says that rapid model improvement could soon displace some consulting roles.
Scoring Rationale
Moderate novelty and practical relevance, limited by reliance on a single company's benchmark and the lack of peer-reviewed validation.
Sources
- AI agents failed at real-world consulting tasks — but Mercor's CEO says they're still on track to replace consultants (businessinsider.com)
