Comprehensive Summary
Sleep disorders are common during pregnancy and are linked to worse maternal and neonatal health outcomes. Many factors influence sleep quality during pregnancy, but knowing these factors does not by itself yield an individual probability of poor sleep quality, which makes early identification difficult. The researchers in this study therefore constructed and compared eight machine learning (ML) models to predict sleep disorders in pregnant women and used SHAP analysis to clarify which clinical and psychosocial risk factors most strongly influenced the risk predictions. A total of 1,681 pregnant women were recruited at their initial prenatal visit at the General Hospital of Ningxia Medical University between February 2022 and April 2023. Standardized questionnaires collected data on sociodemographic factors, behavioral habits, pregnancy conditions, and clinical information; sleep quality was defined using the Pittsburgh Sleep Quality Index (PSQI), and anxiety and depressive symptoms were also assessed. Candidate predictors were screened by regression, and eight ML models were then trained and evaluated on the retained predictors to predict the presence of sleep disorders.
Of the 1,681 participants, 618 (36.8%) were categorized as having sleep disorders. The predictors retained for modeling were age, standardized gestational weight gain, gestational weeks for the current pregnancy, number of abortions, severity of morning sickness, pregnancy intention, pre-pregnancy health, underlying diseases, anxiety, depression, and the combined effect of anxiety and depression. The multivariable analysis thus identified key predictors across multiple domains, including demographic characteristics, obstetric parameters, health status, and psychological factors, reflecting the multifaceted nature of pregnancy-related sleep disturbances.
Of the eight ML models tested, LightGBM showed the best discriminative ability, with the highest AUC (0.718) on the test set. SHAP analysis identified depression as the most influential predictor, followed by standardized gestational weight gain and gestational weeks; individual-level SHAP examples highlighted severe morning sickness, higher gestational weight gain, and later gestational weeks as important factors driving high-risk predictions. The researchers note that 5-fold cross-validation during model development supports the robustness of the results, and that combining logistic regression with machine learning algorithms balances interpretability against the ability to identify complex patterns. However, the study’s single-center design and modest sample size could limit the generalizability of the findings to more diverse populations, and the reliance on self-reported measures suggests a need for broader, multi-center validation. Ultimately, the model provides a promising framework for early detection and intervention, provided future research accounts for the potential overestimation of risk magnitude in higher-prevalence cohorts.
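To make the reported AUC easier to interpret, the following minimal Python sketch computes AUC as the share of (case, non-case) pairs in which the case receives the higher predicted risk. The function name auc and the labels and scores are hypothetical illustrations; they are not data or code from the study.

```python
from itertools import product


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks (1 = sleep disorder, 0 = none); NOT study data.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
example_auc = auc(labels, scores)  # 9 of 12 pairs ranked correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. Read this way, the reported 0.718 means that the model assigns the higher risk to the woman with a sleep disorder in roughly 72% of such pairs.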
Outcomes and Implications
The high prevalence of sleep disorders in this study (36.8%) underscores the urgent need for more structured screening, as untreated sleep disturbances are linked to worse maternal and neonatal health outcomes. The multidimensional LightGBM risk model provides a promising framework for predicting sleep disorder risk from a patient’s sociodemographic, obstetric, somatic, and psychological profile. Because the model relies on readily obtainable data such as morning sickness severity and gestational weight gain, it could allow healthcare providers to flag high-risk patients during routine prenatal checkups. The accessibility of the model’s input factors makes clinical testing viable, but multi-center validation and further testing in larger and more diverse samples are still necessary before clinical deployment.