A benchmark for evaluating AI clinical reasoning using expert-curated cases from The New England Journal of Medicine.
An AI expert discussant that provides comprehensive differential diagnoses in written and video form.
CPC-Bench is a benchmark for evaluating the clinical reasoning capabilities of AI models using expert-curated cases licensed from The New England Journal of Medicine. The benchmark consists of 10 distinct tasks that test various aspects of clinical reasoning, from differential diagnosis to medical image interpretation.
Dr. CaBot is an AI that provides comprehensive differential diagnoses in the style of an expert discussant. Dr. CaBot can produce both written and slide-based video presentations. The model searches the clinical literature and similar cases to produce an evidence-based response.
Generate a ranked list of potential diagnoses given the clinical presentation.
Recommend appropriate diagnostic tests and procedures.
Search for literature to support a medical claim.
Provide a diagnosis at key moments in the clinical course.
Clinical concept understanding and factual knowledge.
Identify the confirmatory and disconfirmatory evidence for each diagnosis.
Provide a differential diagnosis using only background information and initial presentation.
Multiple-choice questions from the NEJM Image Challenge.
Multiple-choice medical imaging questions constructed from the figures and captions in CPCs.
Provide a differential diagnosis given only the figures and tables from the initial presentation.
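The leaderboard reports DDx@1 and DDx@10 for the differential diagnosis task. Read as top-k accuracy, these would be the fraction of cases whose final diagnosis appears among the model's first 1 or first 10 ranked diagnoses. The sketch below illustrates that reading only. It assumes a match can be decided by comparing normalized strings, which is a simplification; how CPC-Bench actually judges a match is not described here. The function names and the toy cases are hypothetical.

```python
from typing import Sequence, Tuple


def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(label.lower().split())


def hit_at_k(ranked: Sequence[str], truth: str, k: int) -> bool:
    """True if the reference diagnosis appears in the first k ranked entries."""
    target = normalize(truth)
    return any(normalize(dx) == target for dx in ranked[:k])


def ddx_at_k(cases: Sequence[Tuple[Sequence[str], str]], k: int) -> float:
    """Fraction of cases with a hit in the top k (assumed meaning of DDx@k)."""
    if not cases:
        return 0.0
    hits = sum(hit_at_k(ranked, truth, k) for ranked, truth in cases)
    return hits / len(cases)


# Toy, made-up cases: (model's ranked differential, reference diagnosis).
cases = [
    (["Sarcoidosis", "Tuberculosis", "Lymphoma"], "sarcoidosis"),
    (["Lupus", "Whipple's disease"], "Whipple's  Disease"),
    (["Pneumonia", "Pulmonary embolism"], "Aortic dissection"),
]

top1 = ddx_at_k(cases, 1)
top10 = ddx_at_k(cases, 10)
print(f"DDx@1 = {top1:.2f}, DDx@10 = {top10:.2f}")
```

Exact string matching would undercount clinically equivalent answers (synonyms, eponyms, more specific subtypes), so a real evaluation would need a more tolerant matching step than the one assumed here.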
| Rank | Model | DDx@1 | DDx@10 | Testing Plan | Literature Search | QA | VQA | Visual Differential Diagnosis | Image Challenge | Image Challenge Diagnosis | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|