Comprehensive Summary
Despite major advances in healthcare and medicine, tuberculosis (TB) remains one of the world's leading infectious diseases and disproportionately affects countries in the Indian subcontinent. India currently runs a TB-Free India public health campaign, but although ample data have been collected on a variety of factors related to TB incidence, the reasons why specific data patterns correlate with TB remain unknown. The goal of this study was first to integrate previously collected data with a machine-learning model, creating a biological framework that can not only predict TB incidence with greater accuracy but also explain the "why" behind specific incidence patterns. The authors then hybridized this framework with four different neural networks and analyzed their performance. To do so, they obtained patient data from Ni-kshay, the Indian government's official TB surveillance system, for the period from January 1, 2021 to December 31, 2024, which covers both the pandemic and its aftermath.
Modelling TB is extremely difficult because there are a variety of modes of transmission. As a result, the machine-learning model was built with separate parameters and equations for the susceptible, exposed, infectious, and recovered states of disease. The susceptible-state equation involved parameters for recruitment, infection spread, and death; the equations for the other three states had defined parameters for endogenous activation, superinfection loss/gain, recovery, and reinfection. The incidence of TB, denoted T, was then defined under several conditions using differential equations and matrices that accounted for the influence of new infections and transition processes. A reproduction number, R, was also derived to represent the number of cases that a single infected person could be expected to generate in a population. In a population assumed to remain at a TB-free equilibrium, T equalled zero.
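The four-state structure described above can be illustrated with a minimal Python sketch. This is not the authors' model: the summary does not give their equations, so the functional forms, the parameter names (`recruitment`, `beta`, `mu`, `k`, `p`, `gamma`, `sigma`, `mu_tb`), and every numeric value below are assumptions chosen only to show how recruitment, infection, death, endogenous activation, superinfection, recovery, and reinfection terms might fit together, and how T and R behave at a TB-free state.

```python
from dataclasses import dataclass


@dataclass
class TBParams:
    """Hypothetical parameter set; names and values are illustrative only."""
    recruitment: float  # new susceptibles entering per unit time
    beta: float         # transmission rate
    mu: float           # natural death rate
    k: float            # endogenous activation rate (exposed -> infectious)
    p: float            # superinfection factor (exposed -> infectious on re-exposure)
    gamma: float        # recovery rate (infectious -> recovered)
    sigma: float        # reinfection factor (recovered -> exposed)
    mu_tb: float        # additional death rate from active TB


def force_of_infection(state, prm):
    """Per-capita infection pressure, assumed proportional to prevalence."""
    s, e, i, r = state
    return prm.beta * i / (s + e + i + r)


def incidence(state, prm):
    """Flow of new active cases per unit time (T in this sketch)."""
    e = state[1]
    lam = force_of_infection(state, prm)
    return prm.k * e + prm.p * lam * e


def derivatives(state, prm):
    """Right-hand side of the assumed S, E, I, R equations."""
    s, e, i, r = state
    lam = force_of_infection(state, prm)
    new_active = incidence(state, prm)
    ds = prm.recruitment - lam * s - prm.mu * s
    de = lam * s + prm.sigma * lam * r - new_active - prm.mu * e
    di = new_active - (prm.gamma + prm.mu + prm.mu_tb) * i
    dr = prm.gamma * i - prm.sigma * lam * r - prm.mu * r
    return ds, de, di, dr


def reproduction_number(prm):
    """R for this simplified model, linearised around the TB-free state."""
    leave_infectious = prm.gamma + prm.mu + prm.mu_tb
    return prm.beta * prm.k / ((prm.k + prm.mu) * leave_infectious)


def simulate(state, prm, dt, steps):
    """Forward-Euler integration; returns the final state."""
    for _ in range(steps):
        rates = derivatives(state, prm)
        state = tuple(x + dt * dx for x, dx in zip(state, rates))
    return state


# Made-up values chosen only to exercise the code, not fitted to any data.
params = TBParams(recruitment=20.0, beta=8.0, mu=0.02, k=0.05,
                  p=0.3, gamma=0.5, sigma=0.5, mu_tb=0.1)

# TB-free equilibrium: everyone susceptible, recruitment balances death.
tb_free = (params.recruitment / params.mu, 0.0, 0.0, 0.0)
```

Calling `incidence(tb_free, params)` returns zero because there are no exposed or infectious individuals, which mirrors the TB-free condition in the summary. Seeding a single infectious case and calling `simulate` shows whether the infection takes hold under the assumed parameters.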
At endemic equilibrium, R was assumed to be greater than 1. The resulting machine-learning framework was able to analyze epidemiological data to determine patterns and trends in infection while continually correcting itself: it identified discrepancies between predicted values and real-life data and took those discrepancies into account in subsequent analysis. To assess the accuracy of the framework and the hybrid models, Symmetric Mean Absolute Percentage Error (SMAPE) and Mean Absolute Relative Residual Error (MARRE) were used to measure relative error, and Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) were used to measure the absolute magnitude of error. The EG-FNN neural network combined with the developed framework improved short-term prediction accuracy, indicating a stronger ability to capture the overall trend of TB incidence. The EG-LSTM neural network combined with the developed framework had better sequential memory, making it better at mapping variations in TB incidence, maintaining long-term predictive accuracy, and adapting to non-linear patterns of transmission.
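To make the difference between relative and absolute error measures concrete, here is a minimal Python sketch of three of the four metrics using their common textbook definitions. SMAPE has several variants in the literature, and the summary does not say which one the authors used, so the version below is an assumption. MARRE is left out because the summary does not give its formula. The case counts are invented for illustration.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average size of the error, in the data's units."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error: like MAE, but penalises large errors more."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def smape(actual, predicted):
    """Symmetric MAPE in percent (one common variant, assumed here)."""
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        if denom > 0:  # a pair of zeros contributes no error
            total += abs(a - p) / denom
    return 100 * total / len(actual)


# Invented monthly case counts, for illustration only.
observed = [100, 200, 300]
forecast = [110, 190, 300]

print(f"MAE   = {mae(observed, forecast):.3f}")
print(f"RMSE  = {rmse(observed, forecast):.3f}")
print(f"SMAPE = {smape(observed, forecast):.3f}%")
```

MAE and RMSE are expressed in cases, so they scale with the size of the series, whereas SMAPE is a percentage and can be compared across series of different magnitudes.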
Outcomes and Implications
A better understanding of the mechanisms of TB infection can lead to breakthroughs in treatments and in public health initiatives that prevent the spread of the infection, which in turn can reduce the mortality rate of the disease. This is especially significant given the sheer volume of fatalities from TB throughout history; TB is one of the leading infectious-disease killers. However, the study has limitations, mainly due to the lack of multidimensional, multifaceted data: the model worked only on data collected at the national level and is therefore limited in its understanding of how demographic and urban/rural factors affect rates of TB transmission. Moreover, underreporting and a lack of high-quality, properly maintained data over the past few decades could mean that there are gaps in the knowledge of the generated framework.