Comprehensive Summary
This retrospective, multicenter study evaluated whether AI models can accurately identify the reasons clinical trials fail, using transformer-based natural language processing (NLP) models to classify failure reasons across a large collection of clinical trial data. The researchers analyzed 78,016 completed and terminated trials registered on ClinicalTrials.gov before February 16, 2024. Trial records were extracted from XML, cleaned, and converted into structured tabular formats compatible with AI models. Key preprocessing steps included text normalization of trial summaries and encoding of relevant categorical variables such as disease type and trial phase. Baseline models, including transformer-based NLP classifiers and tabular learning algorithms, were implemented and compared against traditional feature-based approaches and rule-based classifiers. The best-performing model achieved an area under the receiver operating characteristic curve (AUROC) of 0.861 for identifying trial failure reasons.

Of the 78,016 trials, 35.6% (n=27,752) were terminated early, while 64.4% (n=50,264) were completed as planned. Among the terminated trials, the most common failure reasons were insufficient enrollment (40.2%), lack of efficacy (28.5%), and adverse events (16.9%). Secondary analyses included text-based topic modeling of termination reasons and calibration analyses to evaluate prediction reliability across disease categories. The models achieved sensitivities and specificities above 0.80 across multiple subgroups, indicating robust generalization across diverse therapeutic areas. Limitations include potential label noise arising from variable reporting quality on ClinicalTrials.gov, the lack of prospective validation, and the underrepresentation of certain trial types and regions.
External validation was not performed, and subgroup fairness analyses were absent. The findings reflect diagnostic performance within retrospective trial metadata rather than real-time clinical outcomes and thus do not directly imply clinical efficacy.
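The workflow summarized above — normalizing trial text, training a classifier on failure reasons, and scoring performance by AUROC — can be illustrated with a minimal feature-based baseline. This is a sketch only: the study used transformer-based models, and the trial summaries and labels below are hypothetical stand-ins for ClinicalTrials.gov records.

```python
# Hedged sketch of a failure-reason classification pipeline scored by AUROC.
# A TF-IDF + logistic regression baseline stands in for the study's
# transformer models; all data here is hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical termination summaries; label 1 = insufficient enrollment,
# 0 = other failure reason (illustrative binary framing of the task).
summaries = [
    "study closed early due to slow accrual of participants",
    "terminated after interim analysis showed lack of efficacy",
    "enrollment goals not met within the planned window",
    "halted because of serious adverse events in the treatment arm",
    "low patient recruitment led to early termination",
    "no significant treatment benefit observed at interim review",
]
labels = [1, 0, 1, 0, 1, 0]

# Text normalization step: lowercase and strip whitespace, a simple
# stand-in for the study's normalization of trial summaries.
summaries = [s.lower().strip() for s in summaries]

X_train, X_test, y_train, y_test = train_test_split(
    summaries, labels, test_size=0.5, stratify=labels, random_state=0
)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(X_train, y_train)

# AUROC on held-out trials, analogous to the study's evaluation metric.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"AUROC: {auc:.3f}")
```

A real pipeline would also encode the tabular covariates (disease type, trial phase) alongside the text features and report calibration, as the study's secondary analyses did.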
Outcomes and Implications
This study suggests that transformer-based NLP models can surface systemic causes of clinical trial failure and support earlier risk identification and more informed trial design. By integrating predictive analytics into the early stages of clinical development, pharmaceutical sponsors and trial-oversight organizations could proactively identify trials at higher risk of termination and adjust protocols accordingly, improving participant safety, resource allocation, and trial efficiency. These models could be incorporated into clinical trial management platforms or decision-support tools to predict risk, guide recruitment strategies, and optimize design parameters before trial launch. However, these findings do not translate directly to patient care; the models require prospective validation, integration with real-world evidence, and careful ethical oversight to ensure transparency, fairness, and reproducibility.
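As a purely hypothetical illustration of the decision-support integration described above, a trial-management platform might gate protocol review on a predicted termination risk. The risk scores, threshold, and `PlannedTrial` structure below are illustrative assumptions, not components described in the study.

```python
# Hypothetical sketch: flagging high-risk planned trials for protocol review.
# Risk scores would come from a validated model; values here are made up.
from dataclasses import dataclass

@dataclass
class PlannedTrial:
    nct_id: str
    predicted_termination_risk: float  # e.g., output of a trained classifier

REVIEW_THRESHOLD = 0.7  # hypothetical sponsor-chosen cutoff

def needs_protocol_review(trial: PlannedTrial) -> bool:
    """Flag trials whose predicted termination risk meets the threshold."""
    return trial.predicted_termination_risk >= REVIEW_THRESHOLD

trials = [
    PlannedTrial("NCT00000001", 0.82),
    PlannedTrial("NCT00000002", 0.35),
]
flagged = [t.nct_id for t in trials if needs_protocol_review(t)]
print(flagged)  # -> ['NCT00000001']
```

Any deployed version of such a rule would need the prospective validation and fairness auditing noted above before influencing real protocol decisions.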