Comprehensive Summary
The study evaluates the potential role of GPT-4 (March 2024 version) in supporting clinical decision-making for patients with mild traumatic brain injury (TBI). Seventeen mild TBI case reports were drawn from PubMed Central, and GPT-4 was prompted to generate responses, which were then evaluated by four board-certified emergency physicians in Türkiye (5-10 years post-board). Evaluations covered clarity, scientific adequacy, and overall satisfaction on a 7-point Likert scale, and flagged critical errors that could impact patient safety. Critical errors were found in 29.4% of cases and were associated with significantly lower ratings of adequacy and satisfaction. While the clarity of the responses was similar regardless of errors, GPT-4’s answers were consistently more difficult to read than the case descriptions.
Outcomes and Implications
The findings highlight both the promise and the limitations of large language models like GPT-4 in acute care. GPT-4 produced scientifically sound and comprehensible responses in many cases, suggesting it could support clinicians managing mild TBI. However, the presence of critical errors, such as the missed recognition of stroke in a pediatric case, underscores the risk of relying on these models without physician oversight. Readability challenges further reduce their practicality in fast-paced emergency settings. GPT-4 may serve as a supportive tool for education, but its role in direct bedside decision-making remains limited pending further refinement and validation in real-world clinical cases.