
Using Large Language Models to Assess the Consistency of Randomized Controlled Trials on AI Interventions With CONSORT-AI: Cross-Sectional Survey

JMIR Publications
Research Authors: Xufei Luo, Zeming Li, Zhenhua Yang, Bingyi Wang, Yanfang Ma, Fengxian Chen, Qi Wang, Long Ge, James Zou, Lu Zhang, Yaolong Chen, Zhaoxiang Bian
AIIM Authors: Gayathri Ganesan, Shiv Patel
Approved by President Reda Riffi
Publication Date: 9/26/2025

Comprehensive Summary

This study examines whether large language models (LLMs) such as GPT and Claude can check if randomized controlled trials of AI interventions follow the CONSORT-AI reporting guidelines. Clear and complete reporting is essential for studies to be trusted and reproduced, but checking it by hand is slow and labor-intensive. By testing several LLMs on 41 published trials, the authors created a benchmark for how well these models handle the task. They found that newer versions of GPT, especially gpt-4-0125-preview, were the most accurate and showed strong agreement with expert human reviewers. The study suggests that LLMs could take on some of the heavy work in trial evaluation, although there are still areas where they fall short.
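To make the idea of "agreement with expert human reviews" concrete, here is a minimal sketch of how a model's item-level judgments could be compared with an expert's. It assumes each CONSORT-AI item is scored as reported (1) or not reported (0), and it uses raw agreement and Cohen's kappa as illustrative metrics. The summary above does not say which statistics the authors used, and the ratings below are invented for illustration, not data from the study.

```python
from collections import Counter


def percent_agreement(model_ratings, expert_ratings):
    """Share of checklist items on which the model and the expert agree."""
    if len(model_ratings) != len(expert_ratings) or not model_ratings:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(m == e for m, e in zip(model_ratings, expert_ratings))
    return matches / len(model_ratings)


def cohen_kappa(model_ratings, expert_ratings):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(model_ratings)
    observed = percent_agreement(model_ratings, expert_ratings)
    model_counts = Counter(model_ratings)
    expert_counts = Counter(expert_ratings)
    labels = set(model_ratings) | set(expert_ratings)
    # Chance agreement: probability both raters pick the same label at random,
    # given each rater's own label frequencies.
    expected = sum(model_counts[l] * expert_counts[l] for l in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for ten checklist items (1 = reported, 0 = not reported).
expert = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
model = [1, 1, 0, 1, 1, 1, 0, 0, 1, 1]

agreement = percent_agreement(model, expert)
kappa = cohen_kappa(model, expert)
print(f"agreement={agreement:.2f}, kappa={kappa:.2f}")
```

Kappa is lower than raw agreement here because two raters who both mark most items as reported will agree often by chance alone. That is one reason a benchmark of this kind might report a chance-corrected measure alongside accuracy.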

Outcomes and Implications

This work matters for medicine because reliable trial reporting affects how doctors, researchers, and policymakers decide whether AI tools are safe and effective for patients. If LLMs can automate part of the reporting check, peer review could become faster, reviewers' workload could shrink, and the quality of evidence that guides clinical care could improve. At the same time, the study shows that these models are not yet reliable enough to work alone: they sometimes miss important details and still require human oversight. For now, they are best used as supportive tools rather than replacements. With more testing, stronger prompting strategies, and integration into journal systems, LLMs could become a standard part of research review, but this will likely take time and gradual refinement before they are fully trusted in clinical or regulatory settings.

