Comprehensive Summary
This study examined how HIV-related conversations on social media can be categorized, using machine learning to classify tweets by theme and sentiment. Researchers collected 229,340 tweets from X (formerly Twitter) between June 2023 and August 2024, yielding 191,972 unique tweets after preprocessing. Two researchers manually annotated 1,000 tweets across five categories, and these labels were used to train classifiers with text augmentation and term-frequency features. Logistic regression and linear support vector machine (SVM) classifiers were evaluated against the manual labels, with the SVM performing best (accuracy 0.91; macro F1-score 0.90). The tweets were sorted into five themes: information and education (63.0%, 120,985/191,972), opinions and commentary (12.4%, 23,863/191,972), personal experiences and stories (10.3%, 19,672/191,972), stigma and social impact (7.4%, 14,252/191,972), and support and resources (6.9%, 13,200/191,972). To explore latent structure, the study also applied unsupervised topic modeling with Latent Dirichlet Allocation (LDA), identifying 15 subtopics. Sentiment analysis combined VADER (for polarity) with the NRC Emotion Lexicon (for discrete emotions) and found fear, anger, and trust to be the dominant emotions. Key limitations include selection bias from hashtag-based sampling, the absence of demographic subgroup analysis, and reliance on English-language tweets. The results reflect online discourse rather than patient outcomes and do not imply clinical efficacy. Although not clinically oriented, the study demonstrates how AI models, specifically SVM classifiers and LDA topic modeling, can provide structured insights into online HIV discourse to inform public health messaging.
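For readers unfamiliar with the two reported metrics, the sketch below shows how accuracy and macro F1 are typically computed when classifier predictions are compared against manual labels. It is a minimal illustration, not the study's evaluation code: the function names, theme labels, and toy data are hypothetical and were chosen only to make the arithmetic visible.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label matches the manual label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores.

    Every class contributes equally, so a rare theme counts as much
    as a common one.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels (illustrative only, not data from the study).
manual = ["information", "information", "information",
          "stigma", "support", "opinion"]
predicted = ["information", "information", "opinion",
             "stigma", "information", "opinion"]

print(f"accuracy = {accuracy(manual, predicted):.2f}")
print(f"macro F1 = {macro_f1(manual, predicted):.2f}")
```

In this toy case the single missed "support" tweet pulls macro F1 well below accuracy, because that class scores an F1 of zero. This is why macro F1 is a useful companion metric when one theme dominates the data, as information and education does here.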
Outcomes and Implications
This study shows that AI can organize HIV-related tweets into clear themes such as education, stigma, and support. The results can inform public health messaging by showing what people discuss online and which emotions they express. However, the findings reflect online conversations only and do not imply clinical efficacy.