Comprehensive Summary
This study developed and tested a machine learning model (Light Gradient Boosting) to predict the risk of COVID-19 infection in adults with diabetes in Alberta, Canada. Using health data from over 369,000 patients, the model was moderately accurate at distinguishing patients who tested positive from those who did not. It had high specificity (few false positives) but low sensitivity, meaning it missed many actual COVID-19 cases.
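To make the gap between specificity and sensitivity concrete, the sketch below computes both from confusion-matrix counts. The function name and the counts are hypothetical, chosen only for illustration; they are not results from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of actual cases the model flags
    specificity = tn / (tn + fp)  # share of non-cases the model clears
    return sensitivity, specificity


# Hypothetical counts for illustration only, NOT figures from the study.
sens, spec = sensitivity_specificity(tp=20, fn=80, tn=950, fp=50)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With these made-up counts the model clears 95% of people who are not infected but flags only 20% of those who are, which is the pattern the summary describes: few false positives, many missed cases.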
Outcomes and Implications
The findings suggest that health system administrative data alone is not enough to predict COVID-19 infection risk among people with diabetes. To make such models useful in real-world practice, more detailed individual-level data (such as behaviors, exposures, and social factors) may be needed. The study highlights both the potential and the current limitations of machine learning in public health prediction.