Comprehensive Summary
This article evaluated the accuracy of artificial intelligence tools in answering questions about Post-Orgasmic Illness Syndrome (POIS), a rare condition characterized by flu-like and cognitive symptoms after ejaculation. Sixteen questions were gathered from clinical visits and online patient forums, covering epidemiology, treatment options, treatment risks, and counseling. Each question was posed twice, 48 hours apart, to assess consistency, and three urologists graded the responses on a four-point accuracy scale. ChatGPT-4 performed best on epidemiology and counseling questions, achieving 100% accuracy and reproducibility. However, accuracy dropped to 50% for treatment- and risk-related questions, with low reproducibility, and readability scores worsened from Day 1 to Day 2. While ChatGPT-4o was relatively accurate in counseling and general knowledge areas, its inconsistent scores limit its reliability for clinical guidance.
Outcomes and Implications
This study highlights both the promise and the limitations of AI tools like ChatGPT in providing health information for rare conditions. While the model can deliver consistent and supportive responses on general topics, it struggles with accuracy and stability in treatment-related content, where reliable, up-to-date medical knowledge is essential. These findings underscore the need for expert verification and improved readability before AI can be safely used for patient education, especially for rare or poorly understood conditions. Future research should compare multiple language models and incorporate patient feedback to better evaluate trustworthiness and real-world usefulness.