Comprehensive Summary
This research paper examines whether artificial intelligence (AI) models can accurately grade medical assessments that evaluate students’ clinical reasoning in patient scenarios. The researchers developed a Script Concordance Test (SCT), which assesses clinical decision-making skills under uncertainty, built around ophthalmology-related questions. The test was administered to medical students, whose responses were then evaluated by both human ophthalmology experts and two AI models (ChatGPT and o1-preview). The findings indicate that the human experts’ responses were slightly more accurate (as reflected in mean scores) than those of the AI models. Additionally, while AI grading resulted in minor score adjustments, these changes did not compromise the fairness of the assessment. Overall, the study highlights the potential utility of AI models as tools for grading medical reasoning in exams that involve decision-making. Because the human experts remained more accurate, the study also underscores the importance of critically assessing whether AI can provide reliable assessments of clinical knowledge.
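For readers unfamiliar with how an SCT is marked, it is conventionally scored with an aggregate key derived from a reference panel of experts: the answer chosen by the most panelists earns full credit, and less popular answers earn partial credit in proportion to how many panelists chose them. The short Python sketch below illustrates that conventional aggregate scoring method; it is included for illustration only and assumes this study follows the standard convention, as the paper’s exact scoring key is not reproduced here.

```python
from collections import Counter

def sct_aggregate_scores(panel_answers):
    """Compute per-option credit for one SCT item from a reference panel's answers.

    panel_answers: list of the options chosen by the panel (e.g. Likert values -2..+2).
    The modal option earns 1.0; every other chosen option earns credit proportional
    to the number of panelists who selected it (standard SCT aggregate scoring).
    """
    counts = Counter(panel_answers)
    modal_count = max(counts.values())
    return {option: n / modal_count for option, n in counts.items()}

# Hypothetical example: a 10-member panel answers one item on a -2..+2 scale.
panel = [+1, +1, +1, +1, +1, 0, 0, 0, -1, +2]
scores = sct_aggregate_scores(panel)
print(scores)
# A student (or AI model) answering +1 receives full credit (1.0),
# 0 receives 0.6, while -1 and +2 each receive 0.2.
```

Under this scheme, whoever applies the key, whether a human rater or an AI model, must map each free-text or Likert response to the correct option; small differences in that mapping are what produce the minor score adjustments described above.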
Outcomes and Implications
This research is significant because it explores whether artificial intelligence can accurately assess students’ clinical reasoning responses. The findings demonstrate that AI is capable of evaluating responses in clinical decision-making scenarios, and they are clinically relevant in suggesting that AI could serve as a resource for both medical students and healthcare professionals by providing decision-making support close in accuracy to that of experts.