Comprehensive Summary
Yan and colleagues performed a systematic review and meta-analysis to evaluate how well machine learning predicts in-hospital mortality among patients admitted with heart failure. Across 28 studies comprising 106 machine learning models and more than 1,900,000 patient samples, the authors compared performance metrics such as C-index, sensitivity, and specificity. In training datasets, the models achieved a pooled C-index of 0.781 (95% CI 0.766–0.796), with a sensitivity of 0.56 and a specificity of 0.94. Logistic regression was the most common method and reached a C-index of 0.795 in training and 0.751 in validation. Tree-based methods such as XGBoost and random forests consistently showed the strongest discrimination, with XGBoost reaching a C-index of 0.831 in training and 0.809 in validation. However, sensitivity for identifying death remained low across all methods. Across models, the most common predictors were age, blood urea nitrogen, heart rate, sodium, and creatinine.
Outcomes and Implications
These findings show that machine learning mortality prediction tools for heart failure can perform well overall but remain limited in identifying patients at high risk of death. The consistently low sensitivity means many high-risk patients may go unrecognized, which limits these models' readiness to inform important care decisions. Before adoption, future models will need transparent reporting and external validation across diverse health systems. With these improvements, such prediction models could support more accurate and reliable risk stratification among hospitalized heart failure patients.
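To make the practical meaning of low sensitivity concrete, the following minimal Python sketch converts the pooled training-set sensitivity (0.56) and specificity (0.94) reported above into expected counts for a cohort. The cohort size of 1,000 and the 5% in-hospital mortality rate are illustrative assumptions, not figures from the review, and the function name is hypothetical.

```python
def expected_confusion_counts(n_patients, mortality_rate, sensitivity, specificity):
    """Expected confusion-matrix counts for a cohort, given test characteristics.

    n_patients and mortality_rate are hypothetical inputs chosen for
    illustration; they are not taken from the review.
    """
    deaths = n_patients * mortality_rate
    survivors = n_patients - deaths
    true_pos = sensitivity * deaths        # deaths correctly flagged as high risk
    false_neg = deaths - true_pos          # deaths the model misses
    true_neg = specificity * survivors     # survivors correctly left unflagged
    false_pos = survivors - true_neg       # survivors flagged as high risk
    flagged = true_pos + false_pos
    ppv = true_pos / flagged if flagged > 0 else float("nan")
    return {
        "deaths": deaths,
        "missed_deaths": false_neg,
        "detected_deaths": true_pos,
        "false_alarms": false_pos,
        "ppv": ppv,
    }


# Pooled training-set values from the summary; cohort size and
# mortality rate are ASSUMED for illustration only.
counts = expected_confusion_counts(
    n_patients=1000, mortality_rate=0.05, sensitivity=0.56, specificity=0.94
)
for name, value in counts.items():
    print(f"{name}: {value:.2f}")
```

Under these assumed inputs, about 22 of the 50 expected deaths would not be flagged, which is the sense in which many high-risk patients may go unrecognized. A different assumed mortality rate changes the absolute counts but not the proportion of deaths missed, which is fixed by the sensitivity.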