Research | multimodal LLM | cystoscopy | OpenAI o3 | in-context learning
Multimodal LLMs Evaluate Cystoscopy Image Interpretation
Relevance Score: 7.2
A 2026 study evaluates four multimodal LLMs (MM-LLMs: OpenAI-o3, ChatGPT-4o, Gemini 2.5 Pro, MedGemma-27B) on two clinician-defined cystoscopy stress-test datasets: a 401-image free-text task and a 113-image, 7-class classification task. OpenAI-o3 showed the best overall balance, with 88.3% lesion-detection accuracy, 92% sensitivity, 73.1% specificity, and 73.5% biopsy-classification accuracy. The authors conclude that MM-LLMs offer assistive, interpretable outputs but require further optimization before clinical deployment.
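For readers less familiar with the reported metrics, the sketch below shows how accuracy, sensitivity, and specificity are conventionally derived from a binary confusion matrix (lesion present vs. absent). It is not the study's evaluation code: the `BinaryCounts` type, the `detection_metrics` function, and the counts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryCounts:
    """Confusion-matrix counts for a binary lesion-detection task."""
    tp: int  # lesion present, model says present
    fp: int  # lesion absent, model says present
    tn: int  # lesion absent, model says absent
    fn: int  # lesion present, model says absent


def detection_metrics(c: BinaryCounts) -> dict[str, float]:
    """Return accuracy, sensitivity, and specificity as fractions in [0, 1]."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),  # true-positive rate
        "specificity": c.tn / (c.tn + c.fp),  # true-negative rate
    }


# Hypothetical counts, NOT taken from the study.
example = BinaryCounts(tp=90, fp=5, tn=15, fn=10)
metrics = detection_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Accuracy is a prevalence-weighted average of sensitivity and specificity, so on a lesion-heavy image set it sits closer to sensitivity. That is consistent with a reported accuracy that falls between the two other figures.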


