Comprehensive Summary
This study aimed to evaluate ChatGPT’s ability to deliver accurate and detailed information about corneal ulcers, in order to assess its usefulness in medical education and clinical practice. To do this, twelve questions related to corneal ulcers were posed, and for each question five independent ChatGPT sessions were run. A panel of five ophthalmology experts, including teaching staff, rated the responses on a 1–5 Likert scale, where 1 was very poor and 5 was very good. Median scores across the experts were then calculated for each question, along with inter-rater reliability to assess consistency between evaluators. The quality of ChatGPT’s responses to corneal ulcer-related questions varied greatly. For risk factors, etiology, symptoms, treatment, complications, and prognosis, median scores were high (4.0 and above) with narrow interquartile ranges (IQRs). However, classification and investigations scored lower, with a median of 3.0, and signs of corneal ulcers had a median of 2.0. These results show considerable variability in ChatGPT’s ability to answer ophthalmologic questions and highlight the need to improve AI responses through continuous feedback and targeted adjustments.
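For readers interested in how such Likert-based results are typically summarized, the sketch below shows one way the per-question medians, IQRs, and an inter-rater agreement statistic could be computed in Python. The ratings matrix is invented for illustration only, and Fleiss' kappa is used merely as a common choice of agreement measure; the study summarized here does not specify which reliability statistic or which raw ratings it used.

```python
import numpy as np

# Hypothetical ratings matrix: questions (rows) x 5 experts (columns),
# each entry a 1-5 Likert score. Values are illustrative, not the study's data.
ratings = np.array([
    [4, 5, 4, 4, 5],   # e.g., risk factors
    [4, 4, 5, 4, 4],   # e.g., etiology
    [2, 2, 3, 2, 2],   # e.g., signs
    [3, 3, 3, 2, 4],   # e.g., classification
    # ... remaining questions
])

# Per-question median and interquartile range across the experts.
medians = np.median(ratings, axis=1)
q1, q3 = np.percentile(ratings, [25, 75], axis=1)
iqr = q3 - q1

def fleiss_kappa(scores, categories=(1, 2, 3, 4, 5)):
    """Fleiss' kappa for agreement among raters on categorical scores.

    `scores` is an (n_subjects x n_raters) array; each subject is one question.
    """
    scores = np.asarray(scores)
    n_subjects, n_raters = scores.shape
    # Count how many raters assigned each category to each subject.
    counts = np.stack([(scores == c).sum(axis=1) for c in categories], axis=1)
    # Per-subject observed agreement.
    p_i = (np.square(counts).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()
    # Chance agreement from the overall category proportions.
    p_j = counts.sum(axis=0) / (n_subjects * n_raters)
    p_e = np.square(p_j).sum()
    return (p_bar - p_e) / (1 - p_e)

print("medians:", medians)
print("IQRs:", iqr)
print("Fleiss' kappa:", round(fleiss_kappa(ratings), 3))
```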
Outcomes and Implications
ChatGPT has great potential in ophthalmologic education and clinical practice. This study suggests it performs fairly well as an educational resource, reflected in median scores of 4.0 or higher for questions on topics such as risk factors and symptoms. One area for improvement is its use as a diagnostic aid, where evaluators gave lower median scores (3.0 for classification and investigations, 2.0 for signs). The need to improve ChatGPT’s ophthalmologic knowledge is also highlighted by the inter-rater reliability results, with some evaluators giving more ‘good’ scores than others. Overall, these findings show that ChatGPT’s ophthalmologic knowledge still has considerable room for improvement, and that as this knowledge expands, it is likely to become a useful tool within ophthalmology.