Comprehensive Summary
Researchers in this study examined whether natural language processing (NLP) can be applied to electronic patient records (EPRs) containing free-text clinical notes to objectively identify high and low suicide risk in a veteran patient population, reducing reliance on subjective clinician judgment. They extracted 20,342 EPRs from a UK veteran mental health organization and split them 70/30 into training and test sets, training machine learning classifiers to categorize risk as a binary outcome (0 indicating low risk, 1 indicating high risk). Of the six machine learning classifiers evaluated, Logistic Regression, trained on term frequency-inverse document frequency (TF-IDF) features derived from manually annotated risk-related terms, emerged as the best performer after cross-validation. The final model demonstrated the feasibility of classifying risk, with a mean accuracy of 73%, an F1 score of 0.741, sensitivity of 75%, and a negative predictive value of 73%. The analysis revealed that high-risk records were significantly longer and contained more stop words than low-risk records, suggesting that clinicians document in greater detail when risk is high. The terms "suicidal thoughts", "self-harm", "hopeless", "impulsive", "previous attempts", and "risk to self" were the most predictive of high risk, whereas terms such as "supportive family", "engaged in treatment", and "stable" were associated with low-risk classification. Researchers emphasized that the tool should support, not replace, clinical risk assessment, and highlighted that the choice of a simple Logistic Regression model prioritized interpretability and feasibility of deployment in the resource-constrained environments typical of third-sector organizations or the NHS, where the advanced infrastructure and resources required for complex models may not be available.
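The pipeline described above (a 70/30 split, TF-IDF features, a Logistic Regression classifier, and cross-validation) can be sketched with scikit-learn. This is a minimal illustration, not the authors' actual code: the toy notes below are invented stand-ins for real EPR free text, and all parameters (vectorizer settings, fold count, random seed) are assumptions.

```python
# Illustrative sketch of the described pipeline: TF-IDF features feeding
# a Logistic Regression classifier, with cross-validation on the training
# portion and evaluation on a held-out 30% test set.
# The corpus below is synthetic and for demonstration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline

# Toy notes: 1 = high risk, 0 = low risk (invented examples, repeated
# so the split and cross-validation have enough samples).
notes = [
    "patient reports suicidal thoughts and feels hopeless",
    "history of self-harm and previous attempts, impulsive behaviour",
    "ongoing risk to self, expressed hopelessness at review",
    "supportive family present, patient engaged in treatment",
    "mood stable, attending sessions regularly, no concerns",
    "stable presentation, good engagement, supportive home environment",
] * 10
labels = [1, 1, 1, 0, 0, 0] * 10

# 70% training / 30% testing, as in the study.
X_train, X_test, y_train, y_test = train_test_split(
    notes, labels, test_size=0.3, random_state=42, stratify=labels
)

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# Cross-validate on the training set, then score the held-out test set.
cv_acc = cross_val_score(model, X_train, y_train, cv=5).mean()
model.fit(X_train, y_train)
preds = model.predict(X_test)
print(f"CV accuracy:   {cv_acc:.2f}")
print(f"Test accuracy: {accuracy_score(y_test, preds):.2f}")
print(f"Test F1:       {f1_score(y_test, preds):.2f}")
```

On real clinical notes the separation is far noisier than on this toy corpus; the study's reported figures (73% accuracy, F1 of 0.741) reflect that harder setting.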
Outcomes and Implications
Clinical risk assessment remains inherently subjective and difficult, particularly in complex veteran populations where current methods often yield a high rate of false positives. The developed tool demonstrates how underused EPRs can improve the effectiveness of risk evaluation: NLP machine learning algorithms can classify and find patterns in free-text notes that individual clinicians may overlook, and can detect suicide risk patterns that are not always caught during interviews, highlighting a scalable opportunity to support clinicians. The most predictive keywords, such as "suicidal thoughts" and "hopeless", align closely with established clinical reasoning for classifying a patient as high risk. In practice, such tools could flag patients whose documentation contains high-risk language, support triage procedures in resource-constrained settings, and help identify patient populations (such as veterans) who may underreport risk. Since a low-complexity model was chosen specifically for real-world deployment within the computational limits of the NHS and veteran services, the path to clinical implementation is considered practical. Although the model is insufficient for clinical decision making on its own, it can augment clinician judgment, and it identifies a pathway toward future deployment in veteran services and potentially wider clinical implementation, strengthening suicide-prevention efforts.
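The flagging and triage use case described above could, in principle, rank unseen records by the model's predicted high-risk probability so the highest-scoring notes are reviewed first. The sketch below assumes a fitted TF-IDF + Logistic Regression pipeline; the training corpus, the 0.5 threshold, and all names are illustrative assumptions, and any deployed cut-off would need clinical calibration.

```python
# Hypothetical triage sketch: score unseen notes by predicted high-risk
# probability and flag those above an assumed threshold for clinician
# review. Training data here is a tiny invented corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_notes = [
    "suicidal thoughts reported, feels hopeless, previous attempts",
    "self-harm noted, impulsive, ongoing risk to self",
    "supportive family, engaged in treatment",
    "stable mood, regular attendance, no concerns",
] * 5
train_labels = [1, 1, 0, 0] * 5

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_notes, train_labels)

new_notes = [
    "patient stable and engaged in treatment",
    "expresses suicidal thoughts and feels hopeless",
]
# Probability of the high-risk class (label 1) for each new note.
probs = model.predict_proba(new_notes)[:, 1]

THRESHOLD = 0.5  # assumed cut-off; a real tool would calibrate this
flagged = [n for n, p in zip(new_notes, probs) if p >= THRESHOLD]

# Present notes in descending risk order for triage.
for note, p in sorted(zip(new_notes, probs), key=lambda t: -t[1]):
    print(f"{p:.2f}  {note}")
```

Ranking by probability rather than emitting a bare binary label fits the paper's framing of the tool as decision support: clinicians see a prioritized queue, and the final judgment stays with them.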