Overview


CPC-Bench is a benchmark for evaluating the clinical reasoning of AI models on expert-curated cases from the clinicopathological conferences (CPCs) published in The New England Journal of Medicine, used under license. Its 10 tasks cover different aspects of clinical reasoning, from generating a differential diagnosis to interpreting medical images.

7,102 clinical cases
100+ years of CPCs
10 benchmark tasks
4,503 unique diagnoses

Benchmark Tasks


Leaderboard

Model Performance
Rank | Model | DDx@1 | DDx@10 | DDx@1 (Omit Normal) | Testing Plan | Literature Search | QA | VQA | Visual Differential Diagnosis | Image Challenge | Image Challenge Diagnosis | Average
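The leaderboard reports DDx@1 and DDx@10, but this page does not define how they are scored. The sketch below is a minimal illustration under stated assumptions: DDx@k is taken to be the share of cases whose final diagnosis appears within the model's first k ranked candidates, a match is a case-insensitive exact string comparison, and Average is an unweighted mean of the per-task scores. The function names ddx_at_k and leaderboard_average are hypothetical, and the benchmark's actual scoring may differ (for example, it may judge matches semantically rather than by string equality).

```python
from statistics import mean


def ddx_at_k(ranked_ddx: list[list[str]], final_dx: list[str], k: int) -> float:
    """Fraction of cases whose final diagnosis is among the model's top-k candidates.

    Assumption: a hit is a case-insensitive exact string match. This is an
    illustrative stand-in, not the benchmark's documented matching procedure.
    """
    if len(ranked_ddx) != len(final_dx):
        raise ValueError("need exactly one ranked differential per case")
    if not final_dx:
        raise ValueError("need at least one case")
    hits = 0
    for candidates, truth in zip(ranked_ddx, final_dx):
        # Keep only the first k candidates and normalize them for comparison.
        top_k = [c.strip().lower() for c in candidates[:k]]
        if truth.strip().lower() in top_k:
            hits += 1
    return hits / len(final_dx)


def leaderboard_average(task_scores: dict[str, float]) -> float:
    """Hypothetical Average column: unweighted mean of the per-task scores."""
    return mean(task_scores.values())
```

For example, with differentials ["Sarcoidosis", "Tuberculosis"] and ["Lymphoma", "Sarcoidosis"] against final diagnoses "tuberculosis" and "Amyloidosis", ddx_at_k returns 0.0 at k=1 and 0.5 at k=10, which shows why a top-10 score can be much higher than a top-1 score for the same model.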

Performance Analysis

[Charts: model performance on text-based tasks and on multimodal tasks]