Comprehensive Summary
This retrospective, multicenter study asked whether a two-stage machine learning pipeline, which first predicts duration of surgery (DOS) and then optimizes operating room schedules, could reduce overutilization and underutilization in total hip and knee arthroplasty scheduling compared with traditional mean-based scheduling. Using a multilayer perceptron (MLP) trained on ACS NSQIP data, the researchers predicted DOS for 302,490 total knee arthroplasty (TKA) and 196,942 total hip arthroplasty (THA) cases, then fed those predictions into three alternative integer linear programming scheduling formulations to construct weekly elective surgery schedules under realistic clinical constraints. Models were trained on 2014-2017 data, tuned on 2018 data, and evaluated on 2019 procedures. Preprocessing included extracting 33 preoperative patient features and rounding predicted completion times to 10- or 15-minute blocks.
Schedules generated with ML-predicted DOS, historical mean DOS, and true DOS (a hindsight upper bound) were compared across more than 100 simulated weekly cycles. The best-performing formulation (MSSP, which assigns one surgeon per room per day) reduced overtime by approximately 300–500 minutes per simulated week (P<.001) relative to mean-based schedules, at the cost of modest increases in underutilization. Across schedule parameters, the two-stage method outperformed mean scheduling in more than 80% of simulated weeks, rising above 90% at 15-minute granularity. The "Any" formulation performed worst across parameter sets, while the MSSP and Split formulations produced significantly less overtime and avoided the cascading errors that arise when DOS is underestimated. Secondary analyses showed that schedule granularity and surgeon waitlist size significantly influenced underutilization, with less unused time observed at 15-minute blocks and waitlists of ≥500 cases.
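The scheduling stage of the pipeline can be illustrated with a minimal sketch. This is not the study's integer programming formulation; it assumes a 15-minute block size, an illustrative 8-hour OR day, and a simple longest-first packing heuristic standing in for the ILP, to show how rounded DOS predictions fill a single surgeon's day.

```python
import math

BLOCK_MIN = 15          # schedule granularity (15-minute blocks, per the study)
DAY_CAPACITY_MIN = 480  # assumed 8-hour OR day; illustrative, not from the study

def round_to_block(predicted_dos_min, block=BLOCK_MIN):
    """Round a predicted duration up to the next whole block."""
    return block * math.ceil(predicted_dos_min / block)

def fill_day(predicted_durations, capacity=DAY_CAPACITY_MIN):
    """Greedily pack one surgeon's waitlisted cases into one OR day,
    longest first (a stand-in for the paper's ILP formulations)."""
    scheduled, used = [], 0
    for dos in sorted(predicted_durations, reverse=True):
        blocks = round_to_block(dos)
        if used + blocks <= capacity:
            scheduled.append(dos)
            used += blocks
    idle = capacity - used  # underutilized (unused block) minutes
    return scheduled, used, idle
```

For example, predicted DOS values of 130, 90, 62, and 45 minutes round to 135, 90, 75, and 45, occupying 345 of 480 minutes and leaving 135 idle. Coarser blocks inflate the rounding, which is consistent with the study's finding that 15-minute granularity left less unused time than 10-minute scheduling with coarser rounding behavior elsewhere in the parameter grid.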
Although hindsight schedules, using perfect DOS information, generated near-ideal performance, neither the ML-based nor the mean-based methods matched this upper bound. Limitations included reliance solely on preoperative features, potential selection bias, lack of surgeon-specific operative patterns, absence of bed and staffing constraints, and incomplete fairness or subgroup analyses. External validation was not performed, and findings primarily reflect simulation-based performance rather than real-world operational outcomes.
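The overtime and underutilization quantities compared throughout can be computed per OR day roughly as follows. This is a hedged sketch: the 480-minute allocation is an assumed example value, and the study aggregates these quantities over simulated weeks rather than single days.

```python
def utilization_metrics(actual_durations_min, allocated_min=480):
    """Over- and underutilization for one OR day:
    overtime = actual operative time beyond the allocation;
    idle = allocated time left unused."""
    total = sum(actual_durations_min)
    overtime = max(0, total - allocated_min)
    idle = max(0, allocated_min - total)
    return overtime, idle
```

Under this accounting, a schedule built from underestimated DOS predictions accrues overtime (cases run past the allocation), while overestimates accrue idle time; the hindsight schedules with true DOS drive both toward zero, which is why they form the upper bound neither method matched.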
Outcomes and Implications
This study suggests that patient-specific MLP DOS predictions, when integrated with realistic scheduling optimization, can meaningfully improve operating room efficiency and reduce overtime in high-volume orthopedic centers. The two-stage pipeline has the potential to support more predictable operative days, better alignment of surgeon block time, and downstream benefits in staffing, cost containment, and throughput. In practice, such a system could be embedded into hospital scheduling software to generate weekly block assignments from surgeon-specific waitlists, allowing teams to anticipate staffing needs, minimize late-day overruns, and improve workflow reliability. However, translation into routine clinical operations will require further gains in DOS prediction accuracy, potentially by incorporating institution-specific factors, surgeon-level data, and real-time adjustments, along with prospective validation in live scheduling environments. Even so, the findings highlight how ML-enhanced planning tools could help health systems facing rising surgical volume and constrained resources by enabling more efficient, data-driven allocation of operating room time.