Comprehensive Summary
This retrospective, single-platform study asked whether deep-learning and machine-learning models can accurately detect COVID-19 misinformation in Persian-language Instagram posts, using transformer-based NLP, LSTM sequence models, CNN text classifiers, and traditional ML algorithms to perform binary misinformation classification. Researchers analyzed Instagram posts and comments (collected 5 September – 11 December 2022) comprising Persian COVID-19 content sourced through hashtag-based searches and health-organization accounts.

Text preprocessing included Persian-specific normalization and tokenization using the Hazm library, followed by manual annotation by a multidisciplinary team of clinicians, with labels verified against 30 peer-reviewed COVID-19 studies.

The models tested (XGBoost, KNN, CNN, LSTM, and ParsBERT) were compared against each other as methodological baselines. The best-performing model, LSTM, achieved 97% accuracy, precision of 0.91, recall of 0.85, and an F1-score of 0.85. Because misinformation constituted a minority class, performance patterns were imbalanced: LSTM exhibited high precision but lower recall.

Secondary analyses included training-curve comparisons, model-specific scalability assessments, and runtime-efficiency evaluation, in which LSTM outperformed CNN and BERT. LSTM and BERT also achieved the strongest recall among all models tested, whereas XGBoost and KNN underperformed on semantic nuance and scalability, particularly given the complexity of Persian linguistic structure.

Limitations include potential selection bias from Instagram-only sampling, imbalanced class distributions, and subjectivity in manual annotations despite expert review. External validation was not performed, and subgroup fairness analyses were absent.
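The Persian-specific normalization step can be illustrated with a minimal stand-in for what a library such as Hazm's Normalizer performs. The character mappings below (Arabic Yeh and Kaf mapped to their Persian forms, tatweel removal, whitespace unification) are standard Persian normalization operations, but the functions themselves are a simplified sketch, not the study's actual pipeline.

```python
import re

# Common Persian normalization: map Arabic codepoints to their Persian
# equivalents and collapse whitespace. A simplified sketch of what
# Hazm's Normalizer does; not the study's exact pipeline.
CHAR_MAP = {
    "\u064a": "\u06cc",  # Arabic Yeh -> Farsi Yeh
    "\u0643": "\u06a9",  # Arabic Kaf -> Keheh
    "\u0640": "",        # tatweel (kashida) removed
}

def normalize_fa(text: str) -> str:
    for src, dst in CHAR_MAP.items():
        text = text.replace(src, dst)
    return re.sub(r"\s+", " ", text).strip()

def tokenize_fa(text: str) -> list[str]:
    # Whitespace tokenization as a placeholder for Hazm's word_tokenize,
    # which additionally handles punctuation and zero-width non-joiners.
    return normalize_fa(text).split()
```

Normalizing variant codepoints before tokenization matters because the same Persian word can be typed with Arabic-keyboard characters, and an un-normalized corpus would treat those spellings as distinct tokens.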
Findings reflect patterns in Persian online discourse and model performance on annotated datasets rather than real-world behavioral or health outcomes and therefore do not imply clinical or population-level efficacy.
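The reported precision/recall split on the minority class can be made concrete with a small worked example. The confusion-matrix counts below are hypothetical (they are not from the study) and are chosen only to show how overall accuracy can stay high while recall on a rare misinformation class lags behind precision.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Precision, recall, F1, and accuracy for the positive (misinformation) class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Hypothetical counts for an imbalanced test set (100 misinformation posts
# out of 1000): most errors are missed positives (fn), so recall trails
# precision even though overall accuracy remains high.
m = binary_metrics(tp=85, fp=8, fn=15, tn=892)
```

Note that the F1 implied by any such counts depends on the exact confusion matrix, which is why F1 should be read alongside precision and recall rather than alongside accuracy alone on imbalanced data.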
Outcomes and Implications
This study suggests that LSTM-based NLP architectures, combined with Persian-specific embeddings and expert-verified labels, can support more effective detection of health misinformation in low-resource language settings. Clinically and socially, improved misinformation surveillance can help public-health organizations monitor emerging false claims, prioritize high-risk narratives, and deploy timely educational interventions during future outbreaks. In practice, this could include integration into real-time content-moderation pipelines, automated flagging systems for health ministries, or early-warning dashboards that identify misinformation clusters before they spread widely. However, translation into operational public-health tools requires external validation, continuous retraining on new linguistic variation, and alignment with ethical standards for content governance.
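A flagging pipeline of the kind described above typically routes posts by model confidence. The sketch below is purely illustrative: the thresholds, function names, and actions are hypothetical placeholders, not components of the study or of any deployed system.

```python
from dataclasses import dataclass

@dataclass
class Flag:
    post_id: str
    score: float   # model-estimated probability the post is misinformation
    action: str

def triage(post_id: str, score: float,
           review_at: float = 0.5, block_at: float = 0.9) -> Flag:
    # Route by confidence: high scores are flagged automatically,
    # mid-range scores go to human review, the rest pass through.
    # Thresholds are illustrative placeholders, not values from the study.
    if score >= block_at:
        action = "flag"
    elif score >= review_at:
        action = "human_review"
    else:
        action = "pass"
    return Flag(post_id, score, action)
```

Keeping a human-review band between the two thresholds is one way to manage the precision/recall trade-off the study reports: automatic action is reserved for high-confidence predictions, while borderline cases get expert adjudication.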