From visual question answering to intelligent AI agents in ophthalmology

Ophthalmology

British Journal of Ophthalmology

Research Authors: Danli Shi, Mingguan He, Xianwen Shang, Bingjie Yan, Weiyi Zhang, Xiaojie Wan, Pusheng Xu, Ruyou Chen, Xiaolan Chen

AIIM Authors: Kaiden Cheung, Zakariyya Siddiqui

Approved by President Reda Riffi

Publication Date: Aug 27, 2025

Comprehensive Summary

This paper by Chen et al. evaluates the effectiveness of Visual Question Answering (VQA) in ophthalmology. Ophthalmic practice draws on many different data sources, and traditional AI models struggle to integrate them, which limits their use in the clinic. VQA is a multimodal AI approach that analyzes images and uses machine learning and natural language processing (NLP) to generate answers to questions about them. VQA systems used in ophthalmology fall into two categories: closed-ended, which choose from a fixed set of answers, and open-ended, which generate free-text responses. With the development of multimodal LLMs such as LLaVA and GPT-4V, open-ended VQA appears to be on a positive trajectory because of these models' generative and reasoning abilities. Multimodal LLMs can be more accurate because they can draw on many data sources while retaining information from previous datasets. Despite these advances, the paper reports that current multimodal LLMs perform poorly in ophthalmology applications, with only 30.6% of generated answers being accurate. Furthermore, ophthalmology currently lacks an abundance of high-quality datasets for LLMs to draw on. While LLMs can simulate events and synthesize data, real datasets can lay the groundwork for more accurate responses. Despite the developments in VQA, particularly in multimodal AI systems and image recognition, diagnosis and treatment planning by a physician remain the most common approach.

Outcomes and Implications

Chen et al. describe the potential benefits of implementing VQA systems in a clinical setting, noting that such systems have seen success in other medical fields such as pathology, radiology, and dermatology. The paper suggests that, with further development, these systems could provide efficient and personalized training for professionals and students in ophthalmology. Additionally, the ability to analyze images would allow a model to give diagnostic responses to a clinician's queries, consolidate data (labs, imaging, patient history), and provide a preliminary report to the physician. Despite these positive implications for clinical ophthalmology, VQA is far from ready for clinical use: available datasets are limited, and the AI models themselves have limitations. Providing more high-quality datasets and developing frameworks that account for real-world qualities (empathy, risk management, ease of understanding) will help advance VQA in ophthalmology.

Our mission is to

Connect medicine with AI innovation.

No spam. Only the latest AI breakthroughs, simplified and relevant to your field.
