Comprehensive Summary
This study examined the performance of a code-free deep learning (CFDL) application for identifying supraspinatus tendon pathologies on shoulder magnetic resonance imaging (MRI). The retrospective cross-sectional design included patients with MRI-confirmed partial-thickness tears, full-thickness tears, or tendinosis; patients with normal supraspinatus findings served as controls. All MRI scans were analyzed using the LobeAI platform, which used transfer learning based on the ResNet-50 V2 architecture to develop classification models. The resulting models showed high sensitivity for detecting supraspinatus abnormalities: 93.75% for partial-thickness tears, 100% for full-thickness tears, and 100% for tendinosis. Specificity was lower, at 43.75%, 62.5%, and 18.75%, respectively. The model differentiating between partial- and full-thickness tears had an overall accuracy of 34.38%, while the model comparing all pathological scans against normal images had an accuracy of 37.50% and a weighted F1 score of 0.32. These findings suggest that CFDL tools have potential for early identification of shoulder tendon pathologies, but the models need further improvement before they can be used confidently in clinical practice.
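For readers less familiar with the metrics quoted above, the following is a minimal sketch of how sensitivity and specificity are computed from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; the summary does not report the study's confusion matrices, and the function names are assumptions rather than part of any tool used in the study.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of truly pathological scans that the model flags (true positive rate)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of truly normal scans that the model clears (true negative rate)."""
    return tn / (tn + fp)


def false_positive_rate(tn: int, fp: int) -> float:
    """Share of truly normal scans that the model wrongly flags."""
    return fp / (tn + fp)


# Hypothetical counts, NOT taken from the study:
# 16 pathological scans (15 flagged, 1 missed) and
# 16 normal scans (7 cleared, 9 wrongly flagged).
tp, fn = 15, 1
tn, fp = 7, 9

print(f"sensitivity         = {sensitivity(tp, fn):.2%}")
print(f"specificity         = {specificity(tn, fp):.2%}")
print(f"false positive rate = {false_positive_rate(tn, fp):.2%}")
```

With counts like these, a model can miss very few pathological scans while still flagging more than half of the normal ones, which is the pattern of high sensitivity and low specificity described in the summary.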
Outcomes and Implications
The findings from this study suggest that code-free deep learning (CFDL) tools may play an important role in improving access to AI-assisted diagnostics in musculoskeletal imaging. By allowing clinicians and radiologists to develop image classification models without coding experience, these systems could support early detection of supraspinatus pathologies, enabling more timely intervention and helping to prevent progression from partial- to full-thickness tears. Such early diagnosis has the potential to enhance rehabilitation outcomes, reduce the need for surgical procedures, and improve overall patient function and quality of life. However, despite high sensitivity, the relatively low specificity and limited accuracy observed indicate that current CFDL models are not yet reliable enough for clinical use as standalone diagnostic tools. Low specificity means that many normal scans are flagged as pathological, and these false positives could lead to unnecessary follow-ups or overtreatment. These applications should therefore be used, for now, as a complement to expert interpretation rather than a replacement for it. Future research should focus on refining the models, training on larger datasets, and developing standardized methods to ensure higher diagnostic precision and consistent performance in real-world clinical settings.