Comprehensive Summary
This study examined predictors of successful smoking cessation using data from Waves 5 and 6 of the Population Assessment of Tobacco and Health (PATH) Study. The researchers used OpenAI's GPT-4.1 to perform text-based feature selection. The model read descriptions of the survey variables and selected the 45 variables it judged most likely to predict 12-month smoking abstinence. These variables were then used to train an eXtreme Gradient Boosting (XGBoost) model, which performed nearly identically to a model trained on all available variables. The top predictors included smoking frequency, time to first cigarette after waking, social influences on tobacco use, emotional dependence, use of electronic nicotine products, and concerns about health harms. The findings suggest that large language models can efficiently identify key predictive variables from variable descriptions alone, without first analyzing the underlying data.
Outcomes and Implications
This study highlights the potential of artificial intelligence (AI) tools to make public health research more efficient by reducing the need for exhaustive manual variable selection. Identifying key predictors of smoking cessation can help guide more targeted and resource-efficient cessation interventions. Although the findings do not establish causality, they offer useful insight into the behavioral and social factors associated with quitting successfully. The use of large language models in this context also points to broader applications of AI in epidemiology and population health research.