CPC-Bench: Benchmark & Dr. CaBot, the AI Expert Discussant

CPC-Bench

A benchmark for evaluating AI clinical reasoning using expert-curated cases from The New England Journal of Medicine.

Dr. CaBot

An AI expert discussant that provides comprehensive differential diagnoses in written and video form.


CPC-Bench Overview

CPC-Bench is a benchmark for evaluating the clinical reasoning capabilities of AI models using expert-curated cases licensed from The New England Journal of Medicine. The benchmark consists of 10 distinct tasks that test various aspects of clinical reasoning, from differential diagnosis to medical image interpretation.

7,102 Clinical Cases

100+ Years of CPCs

10 Benchmark Tasks

4,009 Unique Diagnoses

Dr. CaBot Overview

Dr. CaBot is an AI that provides comprehensive differential diagnoses in the style of an expert discussant. Dr. CaBot can produce both written and slide-based video presentations. The model searches the clinical literature and similar cases to produce an evidence-based response.

Clinical Literature Retrieval: Dr. CaBot searches through 1.6M+ clinical abstracts from leading clinical journals to generate its response.

Style Adaptation: The model is given the two most similar case presentations from the chosen era, which help it simulate the style of an expert discussant.
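The "two most similar case presentations" step can be pictured as a small top-k similarity search. The sketch below is illustrative only: the page does not say how Dr. CaBot measures similarity, so bag-of-words cosine similarity is an assumption here, and `most_similar_cases`, `past_cases`, and the toy case texts are hypothetical names and data.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar_cases(query: str, cases: dict[str, str], k: int = 2) -> list[str]:
    """Return the ids of the k cases most similar to the query, best first."""
    q = tokenize(query)
    ranked = sorted(cases, key=lambda cid: cosine(q, tokenize(cases[cid])), reverse=True)
    return ranked[:k]


# Hypothetical toy corpus standing in for past case presentations from one era.
past_cases = {
    "case_a": "fever cough and weight loss in a young man",
    "case_b": "chest pain and shortness of breath after surgery",
    "case_c": "fever night sweats and weight loss in an older woman",
}

# Assumption: k=2 mirrors the "two most similar" presentations mentioned above.
top_two = most_similar_cases("woman with fever and weight loss", past_cases, k=2)
print(top_two)
```

A production system would more plausibly use learned text embeddings over the full case archive; the ranking-then-truncating shape of the lookup would stay the same.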


Leaderboard

Model Performance


Rank	Model	DDx@1	DDx@10	DDx@1 (Omit Normal)	Testing Plan	Literature Search	QA	VQA	Visual Differential Diagnosis	Image Challenge	Image Challenge Diagnosis	Average
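The DDx@1 and DDx@10 columns read like top-k accuracy scores. The page does not define them, so the sketch below assumes the usual meaning: a case counts as a hit when the reference diagnosis appears among the first k entries of the model's ranked differential. Matching by normalized exact string is a simplifying stand-in for whatever grading the benchmark actually uses, and `ddx_at_k` plus the sample data are hypothetical.

```python
def normalize(diagnosis: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(diagnosis.lower().split())


def ddx_at_k(ranked_differentials, reference_diagnoses, k):
    """Fraction of cases whose reference diagnosis appears in the top k predictions."""
    if len(ranked_differentials) != len(reference_diagnoses):
        raise ValueError("one reference diagnosis is needed per case")
    if not reference_diagnoses:
        return 0.0
    hits = 0
    for ranked, reference in zip(ranked_differentials, reference_diagnoses):
        top_k = {normalize(d) for d in ranked[:k]}
        if normalize(reference) in top_k:
            hits += 1
    return hits / len(reference_diagnoses)


# Hypothetical model outputs: one ranked differential per case, best guess first.
predictions = [
    ["Sarcoidosis", "Tuberculosis", "Lymphoma"],
    ["Lymphoma", "Tuberculosis"],
    ["Pulmonary embolism"],
]
# Hypothetical reference diagnoses for the same three cases.
references = ["sarcoidosis", "tuberculosis", "aortic dissection"]

ddx_at_1 = ddx_at_k(predictions, references, k=1)
ddx_at_10 = ddx_at_k(predictions, references, k=10)
print(f"DDx@1 = {ddx_at_1:.2f}, DDx@10 = {ddx_at_10:.2f}")
```

Under this reading DDx@10 can never be lower than DDx@1 for the same model, since every top-1 hit is also a top-10 hit.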

Benchmark Tasks

Text-based Challenges

Differential Diagnosis (DDx)

Testing Plan

Literature Search

Diagnostic Touchpoints (Physician Annotations)

Question Answering (QA)

Clinical Reasoning (Physician Annotations)

Information Omission (Physician Annotations)

Multimodal Challenges

NEJM Image Challenge

Visual Question Answering (VQA)

Visual Differential Diagnosis
