Comprehensive Summary
This study examined risk factors associated with subjective life expectancy among middle-aged and older adults, comparing those who were physically active with those who were inactive. Using 2018 data from the China Health and Retirement Longitudinal Study (CHARLS) on 10,945 participants, Yang et al. built five machine-learning models (Random Forest, Logistic Regression, Support Vector Machine, XGBoost, and LightGBM) separately for the active and inactive subgroups. The data were balanced with SMOTE and split 70/30 into training and test sets, and the models were tuned by grid search with 10-fold cross-validation. In the inactive group (1,353 participants), the SVM model performed best (AUC = 0.797), while LightGBM performed best in the active group (AUC = 0.775). SHAP values showed that the top predictors shared by both groups were age, perceived health, and physical pain. Subjective life expectancy risk was noticeably higher among inactive adults (38.8%) than among active adults (26.8%), a gap of 12.0 percentage points, indicating that physical activity is associated with lower subjective life expectancy risk. Overall, the study demonstrates how machine learning can detect subgroup-specific risk patterns that linear models often miss. Because these models can capture nonlinear relationships in multidimensional data, they enable more tailored prediction of subjective life expectancy, especially when analyzing specific populations.
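Since the models above are compared by AUC, a small worked example may help show what that number measures. The following is a minimal sketch, not the authors' code: the function name `auc_score` and the toy labels and scores are hypothetical and are not taken from the CHARLS data. It computes AUC as the share of (at-risk, not-at-risk) pairs in which the at-risk case receives the higher predicted score, with ties counted as half.

```python
def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = at risk, 0 = not at risk.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]

toy_auc = auc_score(toy_labels, toy_scores)
print(f"Toy AUC: {toy_auc:.3f}")
```

Read this way, an AUC of 0.797 means that a randomly chosen at-risk participant is given a higher score than a randomly chosen participant who is not at risk about 80% of the time; a value of 0.5 would correspond to random ranking.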
Outcomes and Implications
Physical inactivity is a major risk factor for low life expectancy and a public health concern worldwide. Subjective life expectancy strongly influences health behaviors, healthcare costs, and economic patterns; identifying its determinants can therefore support early prevention and policy planning. Understanding risk factors in physically inactive older adults is especially critical, given their higher vulnerability to disease progression. Machine learning models have traditionally been used to enhance prediction accuracy rather than to explain why certain outcomes occur. Unlike classical statistical methods, machine learning systems are designed to detect complex, nonlinear patterns that improve prediction performance. Tools such as SHAP (SHapley Additive exPlanations) help close this explanatory gap by allowing researchers to examine which features most heavily influence a model's decisions. In Yang et al.'s study, SHAP outputs showed that age, perceived health status, physical pain, depression, and life satisfaction were the strongest contributors to subjective life expectancy risk in the inactive group. In the active group, the strongest contributors were perceived health, age, education level, pension insurance, and physical pain. The overlap of age, perceived health, and physical pain across both groups indicates that these factors are central to subjective life expectancy regardless of activity level. Meanwhile, education level and pension insurance appear only among the active group's leading contributors, while depression and life satisfaction appear only among the inactive group's, suggesting that socioeconomic factors become more relevant when older adults are physically active and psychological factors when they are not. This demonstrates how machine learning models can reveal subgroup-specific patterns that traditional methods may not show. Yang et al.'s findings support the potential use of ML-based risk identification in primary care and public-health screening to flag older adults at elevated subjective life expectancy risk. Integrating such tools could allow for more efficient, targeted interventions in aging populations.
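To make the idea behind SHAP more concrete, the sketch below computes exact Shapley values, the game-theoretic quantity that SHAP is built on, for a deliberately tiny toy model. Everything in it is an assumption made for illustration: the scoring function `toy_risk`, its coefficients, and the instance and baseline values are hypothetical and do not come from Yang et al.; only the three feature names echo the shared predictors discussed above. For real models with many features, SHAP tooling estimates these values rather than enumerating every coalition as done here.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values: each feature's weighted average marginal
    contribution over all coalitions of the remaining features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            # Standard Shapley weight for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


# Hypothetical instance to explain and a hypothetical reference baseline.
instance = {"age": 1.0, "perceived_health": 1.0, "physical_pain": 1.0}
baseline = {"age": 0.0, "perceived_health": 0.0, "physical_pain": 0.0}


def toy_risk(x):
    # Hypothetical nonlinear score with an age-by-pain interaction term.
    return (0.30 * x["age"] + 0.20 * x["perceived_health"]
            + 0.10 * x["physical_pain"] + 0.10 * x["age"] * x["physical_pain"])


def value_fn(coalition):
    # Features in the coalition take the instance value; the rest stay at baseline.
    x = {name: instance[name] if name in coalition else baseline[name]
         for name in instance}
    return toy_risk(x)


phi = shapley_values(value_fn, list(instance))
for name, contribution in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {contribution:.2f}")
```

In this toy case the interaction term is split equally between age and physical pain, and the contributions sum to the difference between the score for the instance and the score for the baseline. That additivity is what allows per-feature contributions to be ranked and compared across subgroups.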