Comprehensive Summary
Chen et al. focused on three objectives in this study: developing a model that accurately classifies delirium from video of a patient, comparing how different training strategies affect classification across models, and demonstrating the clinical value and feasibility of these methods relative to other methods. The study included 129 videos of children aged 14 and under from Fujian Children’s Hospital and Shanghai Children’s Medical Center. Two experts independently classified each video using the Diagnostic and Statistical Manual of Mental Disorders (DSM-5), considered the gold standard for diagnosing delirium. Any video on which the two experts disagreed was removed from the study sample. ResNet- and transformer-based models output the probability that delirium is present or absent, with a decision threshold of 0.5 (a probability greater than 0.5 is classified as positive for delirium). The “r2+” model was chosen as the backbone of the study because of its lower overall processing demands, accuracy (87.18% correct classification), ROC-AUC (second best overall at 0.913, where 1 indicates perfect separation of delirium and non-delirium cases), and high F1 score (0.8715, where 1 indicates perfect precision and recall). The researchers also trained the “r2+” model in three ways: with frozen pretrained weights (the pretrained weights are not updated during training), without pretrained initialization (not pretrained on ImageNet), and with pretrained initialization (pretrained on ImageNet). The model with pretrained initialization had the highest average accuracy and ROC-AUC. Chen et al. also tested the models' processing times at different video durations. Videos fed into the model with an “n” value of 100 (the longest video duration) were classified with an accuracy of 0.8717, compared with 0.7949 for videos fed in with an “n” value of 1 (the shortest video duration).
However, in real-world application the model had an F1 score of 0.5385, indicating the need for further improvement. Clinicians were surveyed, and most cited the application’s acceptable accuracy, usability, and diagnostic efficiency. Limitations included the small data set and privacy concerns about using videos of patients' faces. Chen et al. noted that better runtime performance and interpretability, larger real-world data sets, privacy protection for patients, and possible prediction of delirium severity would aid further research and application of this technology.
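To make the reported metrics concrete, the sketch below shows how a 0.5 threshold turns model probabilities into labels, and how accuracy, F1, and ROC-AUC can be computed from them. It is an illustration only, not the authors' evaluation code: the function names and the six example probabilities are hypothetical, and it assumes a plain binary F1 for the delirium-positive class, which may differ from the averaging used in the paper.

```python
from typing import List, Tuple


def classify(probabilities: List[float], threshold: float = 0.5) -> List[int]:
    """Label a video positive (1) only when its probability exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probabilities]


def accuracy_and_f1(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Return (accuracy, F1 for the positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # F1 is the harmonic mean of precision and recall; it simplifies to
    # 2*TP / (2*TP + FP + FN), and is taken as 0 when there are no positives at all.
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """ROC-AUC as the chance a random positive outscores a random negative (ties count half)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical expert labels (1 = delirium) and model probabilities for six videos.
expert_labels = [1, 1, 1, 0, 0, 0]
model_probabilities = [0.9, 0.6, 0.4, 0.7, 0.2, 0.1]

predicted_labels = classify(model_probabilities)
accuracy, f1 = accuracy_and_f1(expert_labels, predicted_labels)
auc = roc_auc(expert_labels, model_probabilities)
print(f"accuracy={accuracy:.4f}  F1={f1:.4f}  ROC-AUC={auc:.4f}")
```

Accuracy and F1 depend on the chosen threshold, while ROC-AUC is computed from the raw probabilities and summarizes ranking quality across all thresholds. That difference is one reason a model can score well on ROC-AUC yet show a low F1 at a fixed cutoff in a new setting.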
Outcomes and Implications
Delirium is characterized by fluctuating consciousness and brain function, and it affects a large percentage of critically ill children. It is associated with longer hospital stays, increased mortality, and symptoms that can last for weeks to months. Clinical settings need standardized diagnostic tools that are resource-friendly and time-efficient. Using video to identify delirium is less invasive for patients and can provide evidence to start early intervention and therapeutic treatment, limiting long-term effects, especially in children. While further research, testing, and approval are needed before models like this can be implemented, the use of video-based machine learning to diagnose delirium efficiently is very promising.