Comprehensive Summary
This peer-reviewed study assessed whether the large language model ChatGPT could accurately recommend audiologist referrals from self-test data collected with mobile hearing apps. In this retrospective, single-center study, ChatGPT-4o was tested on 1,000 simulated samples of hearing threshold data, each representing longitudinal measurements over a period of up to 12 months. ChatGPT's recommendations were compared with those of 5 experts and further analyzed to determine whether its decision-making followed a rigid set of criteria or a more nuanced approach. Agreement was quantified using percent agreement and Cohen's kappa, and ROC curves with AUCs were used to explore ChatGPT's decision thresholds. All data analyses were performed in MATLAB 2023b. ChatGPT-4o achieved 80–84% agreement with the experts, with accuracy reaching 87% using a multi-response approach. In cases with unanimous expert agreement, ChatGPT reached 99% agreement. The findings also suggest that ChatGPT considers and adapts to differences in the duration of data collection rather than relying on rigid criteria. Limitations include the use of simulated data, which leaves generalizability to real patient data unclear; no external or temporal validation was performed, and the demographics of a real population sample were not explored. If validated with a clinical sample, large language models such as ChatGPT could enhance existing mobile hearing self-tests by offering nuanced decisions with high accuracy, which would improve accessibility and support early detection of hearing loss progression.
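To make the two agreement metrics concrete, the following is a minimal Python sketch of percent agreement and Cohen's kappa for two raters. It is an illustration only: the study performed its analyses in MATLAB 2023b, and the function names and the toy referral labels below (1 = refer, 0 = do not refer) are assumptions invented for this example, not data or code from the study.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of cases on which the two raters give the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of cases")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that both raters pick the same label
    # if each labelled independently at their own marginal rates.
    labels = set(counts_a) | set(counts_b)
    p_chance = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical toy labels (NOT study data): 1 = refer, 0 = do not refer.
model_labels = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
expert_labels = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]

agreement = percent_agreement(model_labels, expert_labels)
kappa = cohens_kappa(model_labels, expert_labels)
print(f"percent agreement: {agreement:.2f}")
print(f"Cohen's kappa:     {kappa:.3f}")
```

In this toy case the raters agree on 8 of 10 labels, yet kappa is noticeably lower than the raw agreement because some of that agreement would be expected by chance. This is why the two metrics are typically reported together.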
Outcomes and Implications
Hearing loss is a global health concern, and implementing large language model technology of this kind could make hearing testing and specialist recommendations more accessible. It would allow patients to self-test regularly and help them decide when to seek clinical expertise. AI tools also have the potential to enhance communication between patients and physicians, and could allow physicians to focus on more complicated hearing conditions that need greater professional attention. If confirmed in clinical samples, these findings could substantially improve existing home-testing strategies.