Comprehensive Summary
This paper presents a framework for predicting hand, foot, and mouth disease (HFMD) risk in southern China by integrating multisource data. The authors used data from Bao’an District in Shenzhen between 2014 and 2023, including HFMD cases, meteorological and air pollution variables, Baidu Index search data, and public health measures. A correlation analysis explored the relationships between HFMD incidence and these factors. To capture delayed and nonlinear environmental effects, the study used a Distributed Lag Nonlinear Model (DLNM). For forecasting, the researchers used a Seasonal Autoregressive Integrated Moving Average (SARIMA) model for 1-week-ahead predictions and advanced machine learning methods for 2- to 4-week-ahead predictions. They found that all environmental and search behavior factors except sulfur dioxide showed significant nonlinear associations with HFMD incidence. The SARIMA model achieved the highest accuracy for 1-week-ahead forecasts (R² = 0.95, r = 0.98). For longer-term predictions, models combining web and environmental data performed better and gave more stable results. Risk levels predicted by the integrated approach accurately matched observed risk levels, which shows the value of combining surveillance, environmental, and digital data for epidemic prediction.
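To make the two reported accuracy figures concrete, the sketch below computes R² and Pearson's r for a forecast against observations. It assumes R² is the usual coefficient of determination (1 minus residual over total sum of squares); the summary does not state the paper's exact definition. The function names and the weekly case counts are made up for illustration and are not taken from the study.

```python
from math import sqrt


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    sd_o = sqrt(sum((o - mean_o) ** 2 for o in observed))
    sd_p = sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return cov / (sd_o * sd_p)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical weekly HFMD case counts and 1-week-ahead forecasts (illustrative only).
observed = [12, 18, 30, 45, 40, 22]
predicted = [14, 17, 28, 47, 38, 24]

print(f"R^2 = {r_squared(observed, predicted):.3f}")
print(f"r   = {pearson_r(observed, predicted):.3f}")
```

The two metrics answer different questions: r measures how well the forecast tracks the shape of the epidemic curve, while R² also penalizes systematic over- or under-prediction, so a forecast can have a high r and a noticeably lower R².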
Outcomes and Implications
This study highlights how integrating digital data sources, such as web-based search behavior, with traditional surveillance can improve infectious disease forecasting. The strong correlation between online search activity and HFMD incidence suggests that digital signals can serve as early indicators of outbreaks. Accurate short- and medium-term forecasts allow public health authorities to take proactive measures, such as preparing healthcare facilities, promoting hygiene education, and targeting high-risk populations. Understanding the nonlinear impact of environmental factors, including temperature and humidity, can also help in developing season-specific prevention strategies. The use of machine learning provides a way to handle complex interactions that traditional models might not account for. These findings support the development of digital epidemiology systems that combine real-world and online data for early warning and response. The same framework could be applied to other communicable diseases to strengthen predictive public health infrastructure.
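One simple way to check whether a search signal leads case counts is a lagged correlation, sketched below. This is not the paper's method (the study used a DLNM for lagged effects); it is a minimal illustration, and the function names, the synthetic series, and the built-in two-week lead are all assumptions made for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def lagged_correlations(search, cases, max_lag):
    """Correlate search[t] with cases[t + lag] for lag = 0..max_lag.

    A peak at lag k > 0 suggests the search signal leads cases by k weeks.
    """
    result = {}
    for lag in range(max_lag + 1):
        paired_search = search[: len(search) - lag]
        paired_cases = cases[lag:]
        result[lag] = pearson_r(paired_search, paired_cases)
    return result


# Synthetic weekly series (illustrative only): cases echo searches two weeks later.
search_index = [1, 3, 7, 4, 2, 1, 2, 6, 9, 5, 2, 1]
case_counts = [0, 0] + search_index[:-2]

correlations = lagged_correlations(search_index, case_counts, max_lag=4)
best_lag = max(correlations, key=correlations.get)
print(f"strongest correlation at a lead of {best_lag} week(s)")
```

On real surveillance data the peak would be less sharp, and shared seasonality can inflate correlations at every lag, so a result like this is a screening step rather than evidence of predictive value.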