Comprehensive Summary
This study evaluates the performance of OpenAI’s GPT-4 in answering frequently asked questions (FAQs) about Kienböck’s disease, a rare wrist disorder. Researchers generated 19 patient-relevant questions and submitted them to ChatGPT-4 using zero-shot prompting. Response quality was rated by 33 hand surgeons on the Global Quality Scale (GQS), and readability was measured with the Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease Score (FRES). The model demonstrated strong content quality, with a mean GQS of 4.28 and 84.6% of responses rated as high quality. Readability, however, was poor: the mean FKGL of 15.5 and FRES of 23.4 correspond to college-level reading difficulty. No significant differences were found across question categories. The study concludes that while GPT-4 provides accurate information, the complexity of its language limits accessibility, highlighting the need for improved readability tools and clinician oversight.
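The study does not reproduce its scoring code, but the two readability metrics it reports are defined by standard published formulas. The sketch below is a minimal, illustrative Python implementation of those formulas; the count_syllables heuristic is an assumption of this sketch (dedicated readability tools use dictionary-based syllable counts), so its scores will only approximate those of the software the authors used.

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count runs of vowels, dropping one for a trailing silent 'e'.
    word = word.lower()
    vowel_groups = re.findall(r"[aeiouy]+", word)
    count = len(vowel_groups)
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def readability(text: str) -> tuple[float, float]:
    """Return (FKGL, FRES) using the standard Flesch formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    fres = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    return fkgl, fres

sample = ("Kienbock's disease is a condition in which the lunate, a small bone "
          "in the wrist, gradually loses its blood supply and begins to collapse.")
fkgl, fres = readability(sample)
print(f"FKGL: {fkgl:.1f}  FRES: {fres:.1f}")
```

A FRES of 23.4, as reported for the GPT-4 answers, falls in the "very difficult" band of the Flesch scale, consistent with the FKGL estimate of roughly a college reading level.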
Outcomes and Implications
The research underscores the potential of AI to generate accurate educational content for rare orthopedic conditions such as Kienböck’s disease. With over 84% of responses rated as high quality by specialists, GPT-4 can be a valuable resource for medical professionals. However, the college-level reading difficulty poses a barrier to patient comprehension and limits standalone use. This reflects a broader challenge in healthcare communication: ensuring digital health literacy. Future AI tools should incorporate readability optimization and clinician verification to make patient education more accessible and equitable.
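One concrete way to add the readability optimization suggested above is to have the model rewrite its own answer at a target reading level and then re-score the result before it reaches a patient. The sketch below illustrates the idea with the OpenAI Python SDK; the model name, prompt wording, and grade target are assumptions of this sketch rather than anything specified in the study, and any output would still require clinician review.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def simplify(answer: str, target_grade: int = 6) -> str:
    """Ask the model to rewrite an answer at a lower reading level (illustrative only)."""
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder; the study's exact model/version is not restated here
        messages=[
            {"role": "system",
             "content": (f"Rewrite the following patient-education text at roughly a "
                         f"grade-{target_grade} reading level. Keep the medical meaning "
                         "unchanged and do not add new claims.")},
            {"role": "user", "content": answer},
        ],
    )
    return response.choices[0].message.content

draft = "Kienbock's disease involves avascular necrosis of the lunate bone of the wrist."
print(simplify(draft))
```

The rewritten text could then be passed back through a readability check such as the FKGL/FRES sketch above to confirm it actually reached the target grade level, with a clinician signing off before the material is published.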