Comprehensive Summary
Kooper-Johnson et al. examine whether AI-generated images of skin conditions are realistic and accurate enough to be used for medical education. The researchers generated images of ten skin conditions using DALLE-2 and DALLE-3, intermixed them in random order with real clinical photographs, and asked dermatology residents and attending physicians to identify which images were AI-generated and to provide diagnoses. Performance was compared across image type, AI model, physician training level, and disease category. AI-generated images were correctly identified as computer-generated in 70.8% of cases, with residents outperforming attendings, especially for DALLE-2 images. Diagnostic accuracy was significantly lower for AI-generated images than for clinical photographs (40.83% vs. 72%), and DALLE-3 images were diagnosed more accurately than DALLE-2 images. Attendings diagnosed DALLE-2 images more accurately than residents, while both groups performed similarly on DALLE-3 images. Images of infectious conditions were often flagged as AI-generated but had the lowest diagnostic accuracy, and skin of color was largely underrepresented.
Outcomes and Implications
Because dermatology relies heavily on visual learning, AI-generated images could help address gaps in educational resources. However, the authors conclude that AI-generated images should not replace real clinical photographs, citing poor diagnostic performance and limited representation of skin of color. Although newer models show improvement, more testing and better training data are needed before these images can be used in medical education.