Comprehensive Summary
As artificial intelligence takes on a growing role in medical imaging, the accuracy and consistency of AI models in clinical practice must be assured. This study focuses on externally validating an AI model that classifies ankle fractures according to the widely adopted AO/OTA classification system. The research entailed training a deep-learning neural network on 7,500 ankle radiographs from a Swedish trauma center and then validating its performance on a distinct dataset of 399 radiographs from Australia. Performance was assessed using the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUPR). The model performed excellently on the internal validation dataset, with a weighted AUC of 0.95 and an AUPR of 0.96 for malleolar fractures. However, its performance on the external dataset was lower, with an AUC of 0.86 and an AUPR of 0.93, indicating that while the model remained effective, its performance varied between datasets. Despite these discrepancies, the model still outperformed a random classifier on both datasets, although it struggled to classify certain fracture types accurately, particularly type A fractures in the external dataset.
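To make the evaluation metrics concrete, the sketch below shows one common way to compute class-weighted AUC and AUPR for a multi-class classifier using scikit-learn. This is an illustrative example only, not the authors' code: the arrays y_true and y_score, the three-class setup, and the use of one-vs-rest averaging are assumptions introduced here for demonstration.

# Illustrative sketch (not the study's actual pipeline): computing weighted AUC
# and AUPR for a multi-class fracture classifier with scikit-learn.
# `y_true` (ground-truth class labels) and `y_score` (predicted per-class
# probabilities) are hypothetical placeholder data.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score
from sklearn.preprocessing import label_binarize

classes = [0, 1, 2]                      # e.g. three fracture classes (assumed)
y_true = np.array([0, 1, 2, 1, 2, 0])    # hypothetical ground-truth labels
y_score = np.array([                     # hypothetical predicted probabilities
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.3, 0.5, 0.2],
    [0.1, 0.3, 0.6],
    [0.6, 0.3, 0.1],
])

# Weighted one-vs-rest AUC: per-class AUCs averaged by class prevalence.
weighted_auc = roc_auc_score(y_true, y_score, multi_class="ovr", average="weighted")

# Weighted AUPR: per-class average precision, again weighted by prevalence.
y_true_bin = label_binarize(y_true, classes=classes)
weighted_aupr = average_precision_score(y_true_bin, y_score, average="weighted")

print(f"weighted AUC:  {weighted_auc:.2f}")
print(f"weighted AUPR: {weighted_aupr:.2f}")

Prevalence-weighted averaging of this kind gives more influence to common fracture types, which is one reason a model can post a high weighted score overall while still misclassifying a rarer class such as the type A fractures noted above.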
Outcomes and Implications
This research is important because it addresses a critical gap in the development of AI models for medical imaging: the lack of external validation. By demonstrating how an AI model trained in one clinical setting can be validated and adjusted for use in a different setting, it contributes to improving the generalizability and reliability of AI applications in healthcare. While the timeline for clinical implementation is not explicitly stated, the study's emphasis on model adjustment and validation suggests that, with further refinement, AI-driven fracture classification tools could be integrated into clinical practice within a few years.