CPC-Bench is a benchmark for evaluating the clinical reasoning capabilities of AI models using expert-curated cases licensed from The New England Journal of Medicine. The benchmark consists of 10 distinct tasks that test various aspects of clinical reasoning, from differential diagnosis to medical image interpretation.
Generate a ranked list of potential diagnoses given the clinical presentation.
Recommend appropriate diagnostic tests and procedures.
Search for literature to support a medical claim.
Provide a diagnosis at key moments of the clinical course.
Test understanding of clinical concepts and factual knowledge.
Identify the confirmatory and disconfirmatory evidence for each diagnosis.
Provide a differential diagnosis using only background information and initial presentation.
Multiple-choice questions from NEJM Image Challenge.
Multiple-choice medical imaging questions constructed from the figures and captions in CPCs.
Provide a differential diagnosis given only the figures and tables from the initial presentation.
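
Ranked differentials like those in the first task are commonly scored with top-k accuracy, which is what the leaderboard column names DDx@1 and DDx@10 suggest. The sketch below is illustrative only: the function names, the toy cases, and the matching rule (exact match after normalization) are assumptions, and this page does not state how CPC-Bench actually judges whether a predicted diagnosis matches the reference.

```python
from typing import Sequence


def normalize(diagnosis: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(diagnosis.lower().split())


def ddx_at_k(ranked: Sequence[Sequence[str]], truth: Sequence[str], k: int) -> float:
    """Fraction of cases whose reference diagnosis appears in the top k ranked entries.

    Assumption: a prediction counts as correct only on an exact normalized match.
    The benchmark's real matching rule is not described in this text.
    """
    if len(ranked) != len(truth):
        raise ValueError("ranked and truth must have the same number of cases")
    if not truth:
        return 0.0
    hits = sum(
        normalize(reference) in {normalize(d) for d in differential[:k]}
        for differential, reference in zip(ranked, truth)
    )
    return hits / len(truth)


# Hypothetical toy cases, not taken from the benchmark.
predictions = [
    ["Sarcoidosis", "Tuberculosis", "Lymphoma"],
    ["Lyme disease", "Whipple's disease"],
    ["Lupus", "Vasculitis"],
]
references = ["sarcoidosis", "Whipple's disease", "Endocarditis"]

ddx_at_1 = ddx_at_k(predictions, references, k=1)
ddx_at_10 = ddx_at_k(predictions, references, k=10)
print(f"DDx@1 = {ddx_at_1:.2f}, DDx@10 = {ddx_at_10:.2f}")
```

In this toy data the first case is matched at rank 1, the second only at rank 2, and the third not at all, so the two scores differ. A real evaluation would likely need a more forgiving matcher, since the same diagnosis can be phrased in many ways.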
Rank | Model | DDx@1 | DDx@10 | DDx@1 (Omit Normal) | Testing Plan | Literature Search | QA | VQA | Visual Differential Diagnosis | Image Challenge | Image Challenge Diagnosis | Average |
---|---|---|---|---|---|---|---|---|---|---|---|---|