Comprehensive Summary
This study by Thotapalli et al. examines how accurately GPT-4 large language models (LLMs) can triage young adults experiencing psychiatric emergencies. A research team created twenty-two psychiatric emergency vignettes, which were given to two nurse practitioners and to three GPT-4 models. The nurse practitioners and GPT-4 models evaluated each vignette for need for admission, urgency and risk. An expert committee then reviewed the responses and scored them on accuracy, clarity and completeness. The GPT-4 models performed well on each of these criteria, with GPT-4o receiving the highest overall score. The models also showed substantial agreement both with the clinicians and with one another. On admission decisions, the GPT-4 models produced several false positives and no false negatives relative to the clinicians, suggesting a tendency to over-admit patients. Overall, GPT-4 models appear reliable for the majority of cases but may struggle in edge cases where contextual factors are important.
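To make the agreement and false-positive findings concrete, the following is a minimal sketch of how such figures could be computed for binary admission decisions. It assumes Cohen's kappa as the agreement statistic, which the summary above does not confirm, and the labels are invented toy values, not data from the study. The function and variable names are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: derived from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


def admission_errors(reference, model):
    """Count model false positives and false negatives against a reference."""
    false_pos = sum(r == 0 and m == 1 for r, m in zip(reference, model))
    false_neg = sum(r == 1 and m == 0 for r, m in zip(reference, model))
    return false_pos, false_neg


# Invented toy labels (1 = admit, 0 = do not admit); NOT study data.
clinician = [1, 1, 0, 0, 1, 0, 0, 1]
model = [1, 1, 1, 0, 1, 0, 1, 1]

print(cohen_kappa(clinician, model))       # agreement beyond chance
print(admission_errors(clinician, model))  # (false positives, false negatives)
```

In this toy example the model admits every patient the clinician admits plus two more, which is the over-admission pattern described above: some false positives and no false negatives.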
Outcomes and Implications
Young adults are currently experiencing a mental health crisis, and services such as mental health emergency hotlines, which act as prehospital triage centers, are often understaffed and unable to provide quality care to all patients. AI tools could streamline triage, allowing more patients to be treated in a timely manner. This study shows promise for AI tools as assistants but has significant limitations. Real patient data and scenarios were not incorporated into the study, and vignettes are not representative of actual interactions between clinicians and patients. The capacity of GPT models to provide real-time feedback was not tested, and no other LLMs were examined.