Comprehensive Summary
In their study, Lee et al. (2025) evaluated how accurately large language models (LLMs) assess emergency department patients using the Korean Triage Acuity Scale (KTAS). The LLMs analyzed 1,057 recorded real-world triage conversations and sorted each patient into KTAS levels 3-5, with 3 indicating urgent and 5 indicating non-urgent presentations. Each LLM was evaluated on accuracy, specificity, sensitivity, and response time relative to human nurses. Gemini 2.5 Flash achieved the highest overall accuracy (73.8%) and specificity (88.9%). In contrast, Gemini 2.5 Pro demonstrated low specificity, tending to over-triage patients into higher acuity categories. Response times varied widely, from under 1 second to over 30 seconds, raising concerns about feasibility in high-volume emergency department workflows.
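To make the reported metrics concrete, the sketch below shows one common way to compute accuracy and per-level sensitivity and specificity for a three-level triage task. It assumes a one-vs-rest definition for each KTAS level, which may differ from the method Lee et al. used. The function name `triage_metrics` and the toy labels are hypothetical and are not data from the study.

```python
def triage_metrics(reference, predicted, levels=(3, 4, 5)):
    """Return overall accuracy and one-vs-rest sensitivity/specificity per level.

    reference: KTAS levels assigned by nurses (treated here as ground truth).
    predicted: KTAS levels assigned by the model for the same patients.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    if not reference:
        raise ValueError("at least one case is required")

    pairs = list(zip(reference, predicted))
    n = len(pairs)
    accuracy = sum(r == p for r, p in pairs) / n

    per_level = {}
    for level in levels:
        tp = sum(r == level and p == level for r, p in pairs)
        fn = sum(r == level and p != level for r, p in pairs)
        fp = sum(r != level and p == level for r, p in pairs)
        tn = n - tp - fn - fp
        per_level[level] = {
            # Share of true cases at this level that the model caught.
            "sensitivity": tp / (tp + fn) if (tp + fn) else None,
            # Share of cases NOT at this level that the model kept out of it.
            "specificity": tn / (tn + fp) if (tn + fp) else None,
        }
    return accuracy, per_level


# Hypothetical toy labels (illustration only, not study data).
nurse_levels = [3, 3, 4, 4, 5, 5]
model_levels = [3, 3, 3, 4, 5, 4]

accuracy, per_level = triage_metrics(nurse_levels, model_levels)
print(f"accuracy: {accuracy:.3f}")
for level, scores in per_level.items():
    print(level, scores)
```

In this toy example one level-4 patient is assigned level 3, which lowers the specificity for level 3. Under the one-vs-rest assumption, that is how the over-triage pattern described for Gemini 2.5 Pro would show up in the metrics.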
Outcomes and Implications
Rapid and accurate triage directly impacts patient outcomes and reduces strain on emergency department resources. While systems such as KTAS standardize acuity assessment, they rely on human processing, which can be slower than LLM processing and can divert resources from other patients. Integrating LLMs into triage could free nurses for direct patient care while ensuring that high-acuity patients are prioritized more rapidly. Although further validation and optimization are required, these early findings suggest that LLMs could ultimately improve patient outcomes and alleviate emergency department workload.