Comprehensive Summary
The aim of the study was to develop a machine learning model to predict HbA1c changes over the course of 90 days in youth with Type 1 diabetes (T1D) using electronic health record (EHR) data. The experiment used EHR data from a network of pediatric diabetes clinics across the Midwestern USA, selecting 1743 youth who had received care between January 2012 and August 2017, and excluding children younger than 9 years as well as initial HbA1c measurements taken at the time of diagnosis. The model was a random forest regression algorithm evaluated with 3-fold cross-validation, chosen because of the large sample size. Two of the three non-overlapping data subsets were used to train the model, and the third was used to test its accuracy. The model also used multiple cut-off thresholds (≥0.3%, ≥0.4%, ≥0.5%, ≥0.6%) based on the percent change in HbA1c to identify individuals requiring intervention. Features such as postal code, age, race, sex, clinical charts, and laboratory measurements were used in the predictions. The HbA1c change predicted by the model correlated strongly with the true HbA1c change recorded in the patient EHR data, with a Pearson correlation coefficient of 0.79. The efficacy of the model suggests that EHR data have great promise for identifying youth who may experience a rise in HbA1c. Sensitivity was greater at lower thresholds, while specificity was greater at higher thresholds. Although a high threshold (≥0.6%) may decrease the rate of false positives, a low threshold (≥0.3%) reduces the likelihood of missing individuals who may experience a rise, and it is therefore the threshold proposed by the authors. The factors that most influenced the predictions were postal code, HbA1c metrics, and treatment engagement.
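The evaluation steps described above (non-overlapping 3-fold splits, Pearson correlation between predicted and true change, and sensitivity and specificity at each cut-off) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, the random forest itself is omitted, and the six toy values are synthetic numbers chosen only to exercise the functions, not data or results from the study.

```python
import math

# Cut-off thresholds for a rise in HbA1c, as listed in the summary.
THRESHOLDS = (0.3, 0.4, 0.5, 0.6)


def three_fold_splits(n_samples, k=3):
    """Yield (train, test) index lists for k non-overlapping folds.

    Hypothetical helper: each fold serves once as the test set while
    the remaining k - 1 folds are used for training.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def sensitivity_specificity(true_change, pred_change, threshold):
    """Score the rule 'flag a patient when the change is >= threshold'."""
    tp = fp = tn = fn = 0
    for actual_value, predicted_value in zip(true_change, pred_change):
        actual = actual_value >= threshold
        flagged = predicted_value >= threshold
        if actual and flagged:
            tp += 1
        elif actual:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Synthetic toy values (assumption, NOT study data): 90-day HbA1c change.
toy_true = [0.7, 0.1, 0.35, -0.2, 0.65, 0.0]
toy_pred = [0.65, 0.2, 0.32, -0.1, 0.45, 0.4]

splits = list(three_fold_splits(len(toy_true)))
results = {t: sensitivity_specificity(toy_true, toy_pred, t) for t in THRESHOLDS}
```

In practice the predicted changes would come from a model fitted on the training folds and applied to the held-out fold. The dictionary `results` maps each threshold to a `(sensitivity, specificity)` pair, which makes the trade-off between missed cases and false positives easy to compare across cut-offs.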
Outcomes and Implications
Being able to predict changes in HbA1c would benefit healthcare settings and patient care, as increases of ≥0.3% over the 90-day period indicate potential long-term complications or outcomes for youth with T1D. Such predictions can help individualize patient care, but consultation with clinicians is required to determine the appropriate HbA1c change threshold. However, given the numerous false positives and the current lack of generalizability, since the data came only from the Midwestern USA, the model would require further development before potential implementation. Furthermore, the limitations of EHR data include entry errors and missing data that inadvertently occur in patient care. EHR data are also subject to fragmentation and bias in clinical data collection, so the model must be optimized to account for these issues before it can be implemented.