
LLMs Demonstrate Mixed Performance in Breast Cancer Diagnostics

medinform.jmir.org | February 2, 2026

Relevance Score: 7.1


A 2026 JMIR Medical Informatics study evaluated nine large language models, including ChatGPT‑4o and Claude 3 Opus, on 50 breast cancer guideline questions, comparing the models' yes/no answers and accompanying analyses with those of radiologists (residents, fellows, and attendings). Measured against the 2024 NCCN guidelines and the 2013 ACR BI‑RADS standards, ChatGPT‑4o and the Claude models scored highest and outperformed fellows on some metrics (P<.05), but they could not fully replace clinical expertise.

Scoring Rationale

A rigorous, peer‑reviewed evaluation with practical comparisons against clinicians, but the small question set and narrow scope restrict generalizability.


Sources

Evaluation of Large Language Models for Radiologists’ Support in Multidisciplinary Breast Cancer Teams: Comparative Study. medinform.jmir.org