Comprehensive Summary
This paper explores whether machine-learning models can predict health-related quality of life (HRQoL) among adult patients with diabetes. The authors trained five classifiers (logistic regression, naïve Bayes, random forest, support vector machine, and XGBoost) on 2016–2020 Korea National Health and Nutrition Examination Survey data (n=2,501). Preprocessing comprised single imputation, class-imbalance handling via SMOTENC, and scaling, and hyperparameters were tuned with stratified 10-fold cross-validation. On the held-out test set, XGBoost performed best (accuracy 0.940, recall 0.943, precision 0.940, F1 0.942, AUC 0.984). SHAP analysis identified the most influential predictors of poor HRQoL as self-assessed health, work status, triglycerides, education, and AST/ALT ratio; other predictors that contributed frequently across models were age, aerobic exercise, frequency of alcohol intake, and hypertension. A low Brier score indicated well-calibrated probability estimates, and decision-curve analysis showed that the model's net clinical benefit was stable at realistic threshold probabilities.
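For readers less familiar with the two evaluation tools mentioned last, the following is a minimal sketch of how a Brier score and the net benefit used in decision-curve analysis are commonly computed. It is not the authors' code: the function names `brier_score` and `net_benefit` and the toy labels and probabilities are hypothetical and chosen only for illustration.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and observed outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on everyone whose predicted risk is >= threshold.

    True positives count as benefit; false positives are penalized by the
    odds of the threshold probability, threshold / (1 - threshold).
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * (threshold / (1 - threshold))


# Toy data (illustrative only, not from the paper): 1 = poor HRQoL.
y_true = [1, 0, 1, 0, 0, 1, 0, 1]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.3, 0.8]

print("Brier score:", round(brier_score(y_true, y_prob), 3))
for t in (0.2, 0.35, 0.5):
    # "Treat all" is the reference strategy: every patient is flagged.
    treat_all = net_benefit(y_true, [1.0] * len(y_true), t)
    print(f"threshold={t}: model={net_benefit(y_true, y_prob, t):.3f}, treat-all={treat_all:.3f}")
```

A lower Brier score means predicted probabilities sit closer to the observed outcomes. In a decision curve, the model is useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero.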
Outcomes and Implications
This research demonstrates how machine learning can identify patients with diabetes who are at risk of poor HRQoL, enabling targeted interventions such as counseling, blood pressure and lipid control, or tailored education. Because the model relies on routinely collected variables, it has high potential for integration into clinical workflows. However, external validation is essential before deployment across health systems. If validated, such tools could optimize resource allocation, personalize care plans, and improve long-term outcomes.