Comprehensive Summary
This study assessed the readability, reliability, and accuracy of patient-information leaflets on Descemet Membrane Endothelial Keratoplasty (DMEK) generated by seven large language models (LLMs), to determine which LLM produced the most patient-friendly leaflet with accurate information relative to a leaflet written by clinicians. Each LLM was given the prompt, "Make a patient information leaflet on DMEK surgery." The generated leaflets were evaluated using several readability and reliability metrics, along with misinformation detection, and each was scored on a 0-100% scale. The clinician-written leaflet scored 92%, Claude 3.7 Sonnet scored 77.8%, and ChatGPT-4o scored 70.9% without references. Moderate scores were observed for DeepSeek-V3, Perplexity AI, and Google Gemini 2.0 Flash. The lowest scores were seen from ChatGPT-4 and Microsoft CoPilot, largely due to misinformation. Overall, LLMs show promise for generating patient education material but still fall short of clinician-written leaflets in reliability and accuracy. LLM-generated leaflets should therefore be reviewed by clinicians before use in clinical settings.
Outcomes and Implications
LLMs offer considerable promise for patient education. However, this study showed that LLMs still lack the reliability and clinical expertise of clinicians, as reflected in the score difference. LLM-generated leaflets could help educate patients and save clinicians time, but they must be reviewed before being used for patient education in clinical settings. Furthermore, the findings indicate that many LLMs still have room for improvement in their ophthalmologic knowledge and should therefore continue to be used with caution in patient care.