CPC-Bench

A benchmark for evaluating AI clinical reasoning using expert-curated cases from The New England Journal of Medicine.

Dr. CaBot

An AI expert discussant that provides comprehensive differential diagnoses in written and video form.

CPC-Bench Overview

CPC-Bench is a benchmark for evaluating the clinical reasoning of AI models using expert-curated clinicopathological conference (CPC) cases licensed from The New England Journal of Medicine. It comprises 10 distinct tasks that test different aspects of clinical reasoning, from differential diagnosis to medical image interpretation.

7,102 Clinical Cases
100+ Years of CPCs
10 Benchmark Tasks
4,009 Unique Diagnoses

Dr. CaBot Overview

CaBot AI Expert

Dr. CaBot is an AI system that produces comprehensive differential diagnoses in the style of an expert discussant, delivered either as a written response or as a slide-based video presentation. It searches the clinical literature and similar cases to ground its response in evidence.

Clinical Literature Retrieval: Dr. CaBot searches more than 1.6 million abstracts from leading clinical journals to inform its response.
Style Adaptation: The model is given the two case presentations from the chosen era that are most similar to the current case. These examples help it emulate the style of an expert discussant.
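Both retrieval steps above amount to finding the nearest neighbours of a query. The page does not say how Dr. CaBot represents or ranks documents, so the following is only a minimal sketch, assuming each abstract or past case already has an embedding vector and that similarity is measured by cosine similarity. `Document`, `era`, and `top_k_similar` are hypothetical names; calling it with `k=2` and an `era` filter mirrors the style-adaptation step.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    """Hypothetical retrievable item: a clinical abstract or a past case."""
    doc_id: str
    era: str                # hypothetical era label, e.g. "1950s"
    embedding: list[float]  # assumed precomputed embedding vector


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query: list[float], corpus: list[Document], k: int,
                  era: Optional[str] = None) -> list[Document]:
    """Return the k documents most similar to `query`, optionally limited to one era."""
    candidates = [d for d in corpus if era is None or d.era == era]
    # Brute-force scan for clarity; at the scale of 1.6M abstracts an
    # approximate nearest-neighbour index would be the usual choice (assumption).
    candidates.sort(key=lambda d: cosine(query, d.embedding), reverse=True)
    return candidates[:k]


corpus = [
    Document("case-a", "1950s", [1.0, 0.0]),
    Document("case-b", "1950s", [0.9, 0.1]),
    Document("case-c", "2010s", [1.0, 0.0]),
    Document("case-d", "1950s", [0.0, 1.0]),
]
examples = top_k_similar([1.0, 0.0], corpus, k=2, era="1950s")
print([d.doc_id for d in examples])  # prints ['case-a', 'case-b']
```

The era filter is applied before ranking, so a highly similar case from another era (`case-c` here) is never returned as a style example.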

Benchmark Tasks

Text-based Challenges
Differential Diagnosis (DDx)

Generate a ranked list of potential diagnoses given a clinical presentation.

Testing Plan

Recommend appropriate diagnostic tests and procedures.

Literature Search

Search for literature to support a medical claim.

Diagnostic Touchpoints (Physician Annotations)

Diagnose at key moments during the clinical course.

Question Answering (QA)

Clinical concept understanding and factual knowledge.

Clinical Reasoning (Physician Annotations)

Identify the confirmatory and disconfirmatory evidence for each diagnosis.

Information Omission (Physician Annotations)

Provide a differential diagnosis using only the background information and the initial presentation.

Multimodal Challenges
NEJM Image Challenge

Multiple-choice questions from the NEJM Image Challenge.

Visual Question Answering (VQA)

Multiple-choice medical imaging questions constructed from the figures and captions in CPCs.

Visual Differential Diagnosis

Provide a differential diagnosis given only the figures and tables from the initial presentation.
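The NEJM Image Challenge and VQA tasks are multiple-choice, so the natural score is accuracy against an answer key. The page does not describe its scoring code; this is a minimal sketch, assuming each item has exactly one correct option letter and that an unanswered item counts as wrong. `MCItem` and `mc_accuracy` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MCItem:
    """Hypothetical multiple-choice item with a single correct option letter."""
    item_id: str
    answer: str  # e.g. "C"


def mc_accuracy(items: list[MCItem], predictions: dict[str, str]) -> float:
    """Fraction of items whose predicted option matches the answer key.

    Missing predictions count as incorrect; option letters are compared
    case-insensitively after stripping whitespace.
    """
    if not items:
        raise ValueError("no items to score")
    correct = sum(
        1 for item in items
        if predictions.get(item.item_id, "").strip().upper() == item.answer.strip().upper()
    )
    return correct / len(items)


items = [MCItem("q1", "B"), MCItem("q2", "D"), MCItem("q3", "A")]
predictions = {"q1": "b", "q2": "C"}  # q3 left unanswered
print(mc_accuracy(items, predictions))  # prints 0.3333333333333333
```

Counting missing answers as wrong keeps the denominator fixed at the full item set, so a model cannot raise its score by abstaining.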

Leaderboard

Model Performance
Columns: Rank, Model, DDx@1, DDx@10, Testing Plan, Literature Search, QA, VQA, Visual Differential Diagnosis, Image Challenge, Image Challenge Diagnosis, Average
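The page does not define the DDx@1 and DDx@10 columns or how Average is computed. Assuming DDx@k follows the usual top-k convention (a case counts as correct if the final diagnosis appears among the model's first k ranked diagnoses) and that Average is an unweighted mean of per-task scores, a minimal sketch looks like this. `ddx_at_k`, `leaderboard_average`, and `exact_match` are hypothetical; exact string matching is only a placeholder, since deciding whether two diagnosis names refer to the same condition generally needs expert or model-based judgment.

```python
from statistics import mean
from typing import Callable


def exact_match(candidate: str, truth: str) -> bool:
    """Placeholder matcher (assumption): case-insensitive string equality."""
    return candidate.strip().lower() == truth.strip().lower()


def ddx_at_k(ranked_ddx: list[list[str]], final_dx: list[str], k: int,
             match: Callable[[str, str], bool] = exact_match) -> float:
    """Fraction of cases whose final diagnosis appears in the top k of the ranked list."""
    if not final_dx or len(ranked_ddx) != len(final_dx):
        raise ValueError("need exactly one ranked list per case")
    hits = sum(
        1 for ranked, truth in zip(ranked_ddx, final_dx)
        if any(match(candidate, truth) for candidate in ranked[:k])
    )
    return hits / len(final_dx)


def leaderboard_average(task_scores: dict[str, float]) -> float:
    """Unweighted mean of per-task scores (assumed aggregation)."""
    return mean(list(task_scores.values()))


ranked = [
    ["sarcoidosis", "lymphoma", "tuberculosis"],
    ["systemic lupus erythematosus", "Whipple's disease"],
    ["amyloidosis"],
]
truth = ["Tuberculosis", "systemic lupus erythematosus", "IgG4-related disease"]
print(ddx_at_k(ranked, truth, k=1))   # 1 of 3 cases is a hit
print(ddx_at_k(ranked, truth, k=10))  # 2 of 3 cases are hits
```

Because a hit at rank 1 is also a hit at rank 10, DDx@10 is never lower than DDx@1 for the same model.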

Performance Analysis

[Charts not captured; panels: Text-based Tasks, Multimodal Tasks]