Pediatrics

Comprehensive Summary

This study evaluated and compared the quality of responses from large language models (LLMs) to frequently asked questions about Type 1 Diabetes (T1D). Questions were sourced from replies to an Instagram story that asked viewers living in Turkey what questions they had about T1D; the 20 most frequently asked questions were used. Five LLMs (ChatGPT 3.5, 4.0, and 4o, Gemini, and Gemini Advanced) were compared. Before generating responses, each LLM was given the same prompt describing the purpose of the study and instructions for the AI. Five blinded pediatric endocrinologists then scored each model’s responses on the 5-point General Quality Scale (GQS), on which 1 denotes “poor quality” and 5 denotes “excellent quality”. Their intraclass correlation coefficient was 0.409 (0 indicating no agreement and 1 perfect agreement), which reflects fair to moderate agreement among the physicians’ judgments. ChatGPT 4o had the highest average score (3.78 ± 1.09 points) and Gemini the lowest (3.40 ± 1.24 points), but the differences were not statistically significant. ChatGPT 3.5 had the least score variability, but its mean score was not high compared with newer models (such as ChatGPT 4o, 4.0, and Gemini Advanced). Overall, the results indicated that the models have the potential to provide 60-80% accurate information in their responses. The noted limitations were the range of models tested and the small number of questions included.
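For readers unfamiliar with the intraclass correlation coefficient, the following is a minimal sketch of how agreement between raters can be computed from a table of scores. The summary does not say which ICC form the authors used, so this assumes ICC(2,1) (two-way random effects, absolute agreement, single rater). The function names `icc2_1` and `mean_sd` and the example scores are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one row per rated response,
    each row holding one score per rater. (Assumed ICC form; the
    study summary does not specify which variant was used.)
    """
    n = len(ratings)      # number of rated responses
    k = len(ratings[0])   # number of raters
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    # Mean squares
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def mean_sd(scores):
    """Return (mean, sample standard deviation) for a list of scores."""
    return mean(scores), stdev(scores)


# Hypothetical 1-5 quality scores: 4 responses (rows) x 3 raters (columns).
example = [
    [4, 3, 4],
    [2, 3, 2],
    [5, 4, 4],
    [3, 3, 2],
]
print(f"ICC(2,1) = {icc2_1(example):.3f}")
m, s = mean_sd([x for row in example for x in row])
print(f"mean ± SD = {m:.2f} ± {s:.2f}")
```

A value near 1 means the raters rank and score the responses almost identically, while a value near 0 means most of the score variation comes from disagreement between raters instead of real differences between responses.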

Outcomes and Implications

The number of children diagnosed with Type 1 Diabetes increases every year. In the initial period after diagnosis, children and their families often have many questions, not all of which can be answered by a medical professional. LLMs, especially newer models, therefore have the potential to be a helpful resource for these unanswered questions. Earlier studies had examined the use of LLMs in adult populations and in people with Type 2 Diabetes, but none had done so in a pediatric T1D population. The authors note that a specialized, easily accessible, and accurate LLM could benefit patients and their families, but that not all questions can be appropriately answered by these models, and that integration between models and healthcare professionals is needed for long-term conditions like Type 1 Diabetes.

Our mission is to

Connect medicine with AI innovation.

No spam. Only the latest AI breakthroughs, simplified and relevant to your field.

AIIM Research

Articles

© 2025 AIIM. Created by AIIM IT Team
