Ophthalmology

Comprehensive Summary

Gholami et al. evaluated the quality of ophthalmology multiple-choice questions (MCQs) generated by GPT-4, the large language model (LLM) developed by OpenAI. To generate them, the authors sampled questions from the self-assessment sections of all 13 volumes of the Basic and Clinical Science Course (BCSC) textbook series. The final question set consisted of 121 questions and covered all 11 ophthalmology specialties, with efforts made to distribute the topics evenly. Ten ophthalmologists participated in an online survey, rating each question on a 10-point Likert scale for appropriateness, clarity and specificity, relevance, discriminative power, and suitability for trainees. The source of each question (LLM-generated or human-written from BCSC question banks) was hidden from the graders. Agreement between graders on each criterion was assessed using Krippendorff α coefficients, and the similarity between LLM-generated and human-written questions was assessed using the FuzzyWuzzy Python library. Median Likert scores for LLM-generated and human-written questions were very similar, suggesting that GPT-4 can serve as a resource for generating high-quality ophthalmology MCQs in a short time. Additionally, 95% of the LLM-generated questions had a similarity score below 60, suggesting that the generated questions were largely original. However, Krippendorff α values were low across all criteria, indicating a lack of consensus among graders. Although GPT-4 was found to be a viable resource for generating realistic, high-quality MCQs, its output still requires extensive validation by clinical professionals before it can be used in examinations.
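The similarity check described above can be illustrated with a minimal sketch. It is not the authors' code. FuzzyWuzzy's `fuzz.ratio` returns an integer score from 0 to 100. So that the example runs without third-party packages, it substitutes Python's standard-library `difflib.SequenceMatcher`, so scores may differ from those reported in the study. The function names, the text normalization, the choice to compare each generated question against every reference question, and the sample strings are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity_score(a: str, b: str) -> int:
    """Return a 0-100 similarity score between two strings.

    Stand-in for fuzzywuzzy's fuzz.ratio; lowercasing and whitespace
    collapsing are assumptions, not steps taken from the study.
    """
    a_norm = " ".join(a.lower().split())
    b_norm = " ".join(b.lower().split())
    return round(100 * SequenceMatcher(None, a_norm, b_norm).ratio())


def max_similarity(generated: str, references: list[str]) -> int:
    """Highest score of one generated question against all reference questions."""
    return max(similarity_score(generated, ref) for ref in references)


def share_below(generated: list[str], references: list[str], threshold: int = 60) -> float:
    """Fraction of generated questions whose best match scores below the threshold."""
    scores = [max_similarity(q, references) for q in generated]
    return sum(s < threshold for s in scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical placeholder questions, not taken from the study or the BCSC.
    human_written = ["Which layer of the cornea is the thickest?"]
    llm_generated = [
        "Which layer of the cornea is the thickest?",
        "What is the most common cause of sudden painless vision loss?",
    ]
    print(share_below(llm_generated, human_written, threshold=60))
```

Under this scheme, a generated question counts as original when its best match against the reference set scores below the threshold, mirroring the cutoff of 60 mentioned in the summary.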

Outcomes and Implications

GPT-4 could substantially streamline question development for the Ophthalmic Knowledge Assessment Program (OKAP) by generating questions for underrepresented topics and reducing the time needed to write questions by hand. However, limitations remain, including the biases of GPT-4 and other LLMs and concerns about data privacy. Institutions that plan to use GPT-4 to generate MCQs should implement rigorous review protocols to ensure that quality and consistency match those of human-written questions.

Our mission is to

Connect medicine with AI innovation.

No spam. Only the latest AI breakthroughs, simplified and relevant to your field.

AIIM Research

Articles

© 2025 AIIM. Created by AIIM IT Team
