AI Triage Safety: What the 2026 ChatGPT Health Study Found

A structured review of the 2026 Nature Medicine evaluation of ChatGPT Health triage recommendations.

Introduction

In February 2026, Nature Medicine published a structured evaluation of ChatGPT Health’s triage recommendations.

The study reported 52% under-triage among gold-standard emergency cases in structured vignette testing.

Evidence quality

Rating: Moderate (stress test) • Low (real-world outcomes)

Strengths

  • Peer-reviewed publication
  • Structured factorial testing
  • Focused on escalation safety

Limitations

  • Vignette-based, not real-world
  • Gold-standard labels depend on the study's own case definitions

Human Comparison

Human vs AI Emergency Under-Triage: Contextual Comparison

A quick benchmark view. Definitions vary across studies; treat these as context, not a head-to-head.

Published Human ED Triage Benchmarks

  • Under-triage: often reported in the low single digits to the low teens, depending on definitions and proxy outcomes
  • Over-triage: commonly higher, reflecting an intentionally conservative bias

Human triage systems are designed to tolerate inefficiency (over-triage) to reduce catastrophic misses (under-triage).
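To make these definitions concrete, here is a minimal sketch of how under- and over-triage rates are typically computed. The acuity scale and case data are invented for illustration and are not from the study.

```python
# Acuity on a hypothetical 1-5 scale (1 = most urgent), paired as
# (gold_standard, assigned). Data are illustrative only.
cases = [(1, 1), (1, 3), (2, 2), (3, 2), (4, 4), (5, 3)]

# Under-triage: assigned acuity is less urgent (numerically higher)
# than the gold standard. Over-triage is the reverse.
under = sum(1 for gold, assigned in cases if assigned > gold)
over = sum(1 for gold, assigned in cases if assigned < gold)

under_rate = under / len(cases)  # 1/6 here
over_rate = over / len(cases)    # 2/6 here
print(f"under-triage: {under_rate:.1%}, over-triage: {over_rate:.1%}")
```

Note that on this toy data over-triage exceeds under-triage, mirroring the conservative bias described above.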

Telephone / Remote Triage

  • Misclassification can be meaningfully higher when vitals and examination are unavailable
  • Safety depends heavily on escalation rules and “red flag” capture
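The red-flag pattern above can be sketched in a few lines. Everything here is hypothetical for illustration: the keyword set, the acuity scale, and the override rule are assumptions, not the study's or any deployed system's logic.

```python
# Hypothetical red-flag escalation: any match forces the highest
# acuity regardless of the initial assessment.
RED_FLAGS = {"chest pain", "slurred speech", "severe bleeding"}

def triage(symptoms: set[str], initial_acuity: int) -> int:
    """Return final acuity (1 = emergency) on a hypothetical 1-5 scale."""
    if symptoms & RED_FLAGS:
        return 1  # red flags override the initial assessment
    return initial_acuity

print(triage({"cough", "chest pain"}, 4))  # escalated to 1
print(triage({"cough"}, 4))                # stays at 4
```

The design choice is deliberate asymmetry: the override can only make a case more urgent, trading over-triage for fewer catastrophic misses.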

AI Triage (Nature Medicine 2026 Stress Test)

  • 52% under-triage among gold-standard emergency cases in structured vignette testing

Takeaway: methodologies differ, but the gap is large enough to suggest that escalation-threshold calibration remains a central safety issue for consumer AI triage.

FAQ

Q1: Does 52% reflect real-world harm?
A: No. It reflects performance in structured stress testing, not observed patient outcomes.

Q2: Was this a clinical trial?
A: No. It was a vignette-based stress test, not a trial measuring patient outcomes.

Q3: Why is under-triage critical?
A: It may delay life-saving treatment.

Q4: Can thresholds be improved?
A: Likely yes. Rule-based escalation and red-flag capture are plausible mitigations, though they were not tested in this study.

Q5: Is AI triage unsafe?
A: The study demonstrates calibration risk under stress testing; real-world harm has not been shown.


Related guides

  • /guides/ai-in-health
  • /guides/automation-bias-in-clinical-practice