Comprehensive Summary
This study investigated whether machine learning models can predict early pediatric sepsis from routine electronic health record (EHR) data collected during the first 4 hours of emergency department (ED) care. A multicenter registry was assembled from five U.S. health systems (Epic and Cerner EHR platforms). It covered 2,323,720 pediatric ED visits by patients aged 2 months to under 18 years. Model derivation used 1,604,422 visits (January 2016 to February 2020), and temporal validation used 719,298 visits (January 2021 to December 2022).
Outcomes were defined by the Phoenix Sepsis Criteria (PSC) and had to occur within 48 hours after the feature window. Sepsis was defined as suspected infection with a PSC score ≥2, and septic shock as PSC sepsis with ≥1 cardiovascular point. PSC scores were calculated over two separate 24-hour periods, and the higher score was used for analysis. Prevalence in the test set was 0.37% for sepsis and 0.15% for septic shock.
Two model families were trained with nested cross-validation: ridge logistic regression and gradient tree boosting (XGBoost). The training objective was to maximize specificity at the threshold that yields 90% sensitivity. Features emphasized variables recorded before clinician decisions could influence them. These included Emergency Severity Index (ESI) triage level, age-adjusted vital signs (temperature, heart rate, respiratory rate, blood pressure), and oxygen saturation (including informative missingness). They also included age, sex, arrival mode, ED utilization in the prior year, and complex chronic conditions.
In temporal validation, sepsis prediction was strong. XGBoost reached an AUROC of 0.936 (vs 0.923 for ridge). At 90% sensitivity it achieved a specificity of 0.807, a positive likelihood ratio (LR+) of 4.67, and a positive predictive value (PPV) of 1.7%, for a number needed to evaluate of 59. Septic shock AUROCs were 0.926 (XGBoost) and 0.923 (ridge). A SHAP feature-contribution analysis ranked the following as the top predictors: ESI category, minimum oxygen saturation (and its absence), age-adjusted shock index, recent ED use or admission, and chronic disease burden.
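Choosing an operating point "at 90% sensitivity" means picking the highest risk-score threshold whose sensitivity is still at least 0.90. Raising the threshold can only trade sensitivity for specificity, so the highest qualifying threshold is the one that maximizes specificity. The sketch below shows that selection on toy data. It is illustrative only: the function name and data are hypothetical, and it scores a single set of predictions rather than reproducing the authors' nested cross-validation.

```python
from typing import Sequence, Tuple


def threshold_at_sensitivity(
    scores: Sequence[float],
    labels: Sequence[int],
    target_sensitivity: float = 0.90,
) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) for the highest threshold
    whose sensitivity is at least target_sensitivity.

    A visit is flagged positive when score >= threshold. Sensitivity never
    increases and specificity never decreases as the threshold rises, so the
    highest threshold meeting the target gives the best specificity.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative labels")
    # Candidate thresholds: each distinct score, scanned from highest to lowest.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        sens = tp / n_pos
        if sens >= target_sensitivity:
            tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
            return t, sens, tn / n_neg
    # Only reachable if target_sensitivity > 1.0: the lowest score flags
    # every visit, which already gives a sensitivity of 1.0.
    raise ValueError("target sensitivity cannot be reached")


# Hypothetical toy data: 4 sepsis visits (label 1) among 10 ED visits.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]
print(threshold_at_sensitivity(scores, labels, 0.90))
```

With only four positives, reaching 90% sensitivity requires catching all four, which forces a low threshold and modest specificity. On a registry of this size the same rule operates on thousands of positive visits, so the sensitivity target can be met much more precisely.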
Fairness analyses showed comparable AUROCs across age, sex, race/ethnicity, language, and site. AUROC was slightly higher in Medicaid-insured children than in commercially insured children. The authors also reported calibration curves and site-level AUROC and AUPRC, and the site-level metrics varied minimally across health systems.
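A subgroup fairness comparison of this kind amounts to computing AUROC separately within each stratum (for example, insurance type or site). The stdlib-only sketch below assumes that per-visit risk scores, binary outcome labels, and a subgroup label are available. The function names and toy data are hypothetical and are not the study's code. It uses the Mann-Whitney interpretation of AUROC: the probability that a randomly chosen positive visit outscores a randomly chosen negative one. Because that calculation is quadratic in the number of visits, it is suitable only for illustration. At registry scale, a rank-based implementation such as scikit-learn's `roc_auc_score` would be the practical choice.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the Mann-Whitney U statistic divided by n_pos * n_neg:
    the fraction of (positive, negative) pairs in which the positive visit
    scores higher, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auroc_by_group(
    scores: Sequence[float], labels: Sequence[int], groups: Sequence[str]
) -> Dict[str, float]:
    """Compute AUROC separately within each subgroup label."""
    buckets: Dict[str, Tuple[List[float], List[int]]] = defaultdict(lambda: ([], []))
    for s, y, g in zip(scores, labels, groups):
        buckets[g][0].append(s)
        buckets[g][1].append(y)
    return {g: auroc(g_scores, g_labels) for g, (g_scores, g_labels) in buckets.items()}


# Hypothetical toy data with two subgroups, "A" and "B".
scores = [0.9, 0.2, 0.8, 0.4, 0.7, 0.6, 0.3, 0.1]
labels = [1, 0, 1, 0, 1, 0, 0, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(auroc_by_group(scores, labels, groups))
```

Per-group AUROC measures discrimination only. Calibration, the other quantity the authors reported, would be checked separately by comparing predicted and observed event rates within each group.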
Outcomes and Implications
Pediatric sepsis is a leading cause of child mortality, and identifying at-risk children earlier could enable timelier evaluation and treatment, potentially before organ dysfunction appears. These findings suggest that machine learning-based early warning systems built on routine ED data are feasible and clinically informative, providing a clinically meaningful LR+ at high sensitivity. However, because sepsis is rare, PPV remains low (1.7%), which raises concerns about alarm fatigue. The authors recommend a two-step strategy that combines model alerts with clinician judgment, along with prospective evaluation in real-world clinical workflows before deployment. Future directions include testing variable alert thresholds, comparing parsimonious models with more comprehensive feature sets, and integrating the models into EHR systems.
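Bayes' rule in odds form shows why PPV stays low even though the LR+ is close to 5. Post-test odds equal pre-test odds multiplied by LR+. The sketch below plugs in the reported operating point (sensitivity 0.90, specificity 0.807) and the 0.37% test-set sepsis prevalence. The function name is hypothetical. The arithmetic is the standard diagnostic-test relationship, not code from the study.

```python
from typing import Tuple


def post_test_metrics(
    sensitivity: float, specificity: float, prevalence: float
) -> Tuple[float, float, float]:
    """Return (LR+, PPV, number needed to evaluate) for a binary alert."""
    if not 0.0 <= specificity < 1.0 or not 0.0 < prevalence < 1.0:
        raise ValueError("need specificity in [0, 1) and prevalence in (0, 1)")
    lr_pos = sensitivity / (1.0 - specificity)
    # Bayes' rule in odds form: post-test odds = pre-test odds * LR+.
    pre_odds = prevalence / (1.0 - prevalence)
    post_odds = pre_odds * lr_pos
    ppv = post_odds / (1.0 + post_odds)
    nne = 1.0 / ppv  # alerts to evaluate per true sepsis case identified
    return lr_pos, ppv, nne


# Operating point and sepsis prevalence as reported in the summary.
lr, ppv, nne = post_test_metrics(sensitivity=0.90, specificity=0.807, prevalence=0.0037)
print(f"LR+ {lr:.2f}, PPV {ppv:.1%}, NNE {nne:.0f}")
# prints: LR+ 4.66, PPV 1.7%, NNE 59
```

The small gap between 4.66 and the reported LR+ of 4.67 is consistent with rounding in the published specificity. The PPV and number needed to evaluate match the summary. The example also shows that, at a fixed LR+, a very low pre-test prevalence keeps the post-test probability low.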