Comprehensive Summary
This study investigates how well large language models (LLMs) can automatically convert clinical trial eligibility criteria into structured, database-ready queries. The authors developed a three-step preprocessing pipeline and compared several LLMs across hundreds of SQL generation tasks, evaluated against clinical validation benchmarks. Notably, GPT-4 outperformed traditional software tools like USAGI in concept mapping (48.5% vs 32.0%). However, it showed lower SQL generation accuracy than smaller models and high hallucination rates, most commonly inserted placeholders and incorrect domain assignments. GPT-4's accuracy also varied across clinical conditions: it was high for Type 1 diabetes but low for Type 2 diabetes and pregnancy. The article concludes that hallucination rates limit the reliability of these models and calls instead for a hybrid AI-based system with careful model selection.
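To illustrate the kind of output such a pipeline targets, the sketch below renders a single mapped eligibility criterion as a query against an OMOP-CDM-style `condition_occurrence` table. This is a minimal illustration, not the study's method: the table and column names follow OMOP conventions, and the concept ID used for Type 1 diabetes is an assumed example value.

```python
# Minimal sketch: turning one mapped eligibility criterion into a
# database-ready SQL string. Table/column names follow OMOP CDM
# conventions; the concept ID below is an illustrative assumption.

def criterion_to_sql(concept_id: int, include: bool = True) -> str:
    """Build a patient-selection query for one mapped condition concept.

    include=True yields an inclusion criterion; include=False yields
    an exclusion criterion (patients without the condition concept).
    """
    op = "IN" if include else "NOT IN"
    return (
        "SELECT DISTINCT person_id FROM condition_occurrence "
        f"WHERE condition_concept_id {op} ({concept_id})"
    )

# Inclusion criterion for an assumed Type 1 diabetes concept ID:
print(criterion_to_sql(201254))
```

In the study's pipeline, the concept mapping step (where GPT-4 outperformed USAGI) would supply the concept ID, and the SQL generation step would produce queries of roughly this shape; errors such as inserted placeholders or a wrong table ("domain") choice are the hallucination modes the summary describes.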
Outcomes and Implications
This research advances automation in clinical trial design, which is crucial to reducing the cost and time of transforming eligibility criteria into structured formats compatible with healthcare databases. Such automation could significantly speed up feasibility assessments and patient recruitment, thereby improving trial efficiency and reproducibility. Further refinement is necessary, however, before clinical implementation in healthcare data systems.