Comprehensive Summary
This systematic review by Yulianti et al. assesses how well Natural Language Processing (NLP) models detect mental health disorders in clinical and community settings. Seven databases were searched for studies of NLP-based diagnostic, triage or prediction tools that were compared with traditional diagnostic methods. After screening and evaluation, 17 studies met the inclusion criteria. Two reviewers independently extracted data using a standardised template, and a three-domain rubric was developed to assess deployability. The rubric, based on World Health Organization (WHO) guidance on AI, covered accuracy, efficiency and user satisfaction across classification, regression and qualitative studies. Bias in AI-based diagnostics was also assessed. Most of the included studies were conducted in middle- to high-income regions and focused on depression. The majority of studies reported good performance, four reported mixed performance and one reported lower performance. The mixed outcomes point to inconsistency and a need for further refinement of AI tools. Large language models (LLMs) performed better than other AI models, but non-English models underperformed, indicating a need to train AI tools on diverse datasets that account for cultural differences.
Outcomes and Implications
NLP-based tools show considerable promise and could improve the efficiency and accuracy of mental health diagnosis. Their accessibility and ability to work with large amounts of unstructured data make them especially suitable for clinical application. However, this systematic review has several limitations. Relevant studies indexed in databases that were not searched may have been missed; the included studies relied on cross-sectional and non-randomised designs that may compromise internal validity; some studies used small or culturally narrow samples that could introduce bias; and the studies reported differing performance metrics, which hinders direct comparison of AI performance. AI models may also face problems with explainability and implementation, and more refined, culturally adaptive models need to be developed.