Intro
In February 2026, Nature Medicine published a structured evaluation of ChatGPT Health’s triage recommendations.
The study reported 52% under-triage among gold-standard emergency cases in structured vignette testing.
Evidence Quality
Strengths
- Peer-reviewed publication
- Structured factorial testing
- Focused on escalation safety
Limitations
- Vignette-based, not real-world
- Gold-standard labels depend on the study's own definitions
Human Comparison
Human vs AI Emergency Under-Triage: Contextual Comparison
A quick benchmark view: definitions and denominators vary across studies, so treat these figures as context, not a head-to-head comparison.
Published Human ED Triage Benchmarks
- Under-triage: often reported in the low single digits to low teens (percent), depending on definitions and proxy outcomes
- Over-triage: commonly higher, reflecting an intentionally conservative bias
Human triage systems are designed to tolerate inefficiency (over-triage) to reduce catastrophic misses (under-triage).
Telephone / Remote Triage
- Misclassification can be meaningfully higher when vital signs and physical examination are unavailable
- Safety depends heavily on escalation rules and “red flag” capture
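The "escalation rules" idea above can be made concrete with a minimal sketch of red-flag capture layered over a model's triage output. The symptom list, triage labels, and function name here are illustrative assumptions, not taken from the study or any clinical guideline:

```python
# Minimal sketch of rule-based escalation ("red flag" capture).
# RED_FLAGS and the triage labels are illustrative assumptions only.

RED_FLAGS = {
    "chest pain", "difficulty breathing", "sudden weakness",
    "severe bleeding", "loss of consciousness",
}

def escalate(symptoms: list[str], model_triage: str) -> str:
    """Force emergency escalation when any red flag is reported,
    regardless of the model's own triage level."""
    reported = {s.lower().strip() for s in symptoms}
    if reported & RED_FLAGS:       # any overlap with the red-flag set
        return "emergency"
    return model_triage            # otherwise defer to the model

print(escalate(["Chest pain", "nausea"], model_triage="self-care"))  # emergency
print(escalate(["mild headache"], model_triage="primary-care"))      # primary-care
```

The design point is asymmetry: the rule layer can only raise acuity, never lower it, which mirrors the intentionally conservative bias of human triage systems.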
AI Triage (Nature Medicine 2026 Stress Test)
- 52% under-triage among gold-standard emergency cases in structured vignette testing
Takeaway: methodologies differ, but the magnitude of the gap is substantial and suggests that escalation-threshold calibration remains a central safety issue for consumer AI triage.
FAQ
Q1: Does 52% reflect real-world harm?
A: No. The 52% figure comes from structured vignette stress testing, not from observed patient outcomes.
Q2: Was this a clinical trial?
A: No; it was a structured vignette evaluation, not a clinical trial.
Q3: Why is under-triage critical?
A: Under-triage assigns a true emergency a lower acuity level, which can delay life-saving treatment.
Q4: Can thresholds be improved?
A: Likely yes, for example by layering rule-based escalation on top of model output.
Q5: Is AI triage unsafe?
A: The study shows calibration risks; real-world harm has not been demonstrated.
Related Guides
- /guides/ai-in-health
- /guides/automation-bias-in-clinical-practice