Comprehensive Summary
This study evaluated whether lightweight deep learning (DL) models can classify lumbar foraminal stenosis (LFS) severity from sagittal CT scans, providing an alternative when MRI is unavailable. In a retrospective, single-center study at Wenzhou Medical University (China), researchers analyzed 868 sagittal CT images from 177 patients collected between 2016 and 2025, all of whom also underwent MRI within 7 days. LFS grades were assigned using the Lee system by two surgeons, with adjudication by a third. Two DL models were developed (EfficientNet-B0 and MobileNetV3-Large-100), both using Faster R-CNN/ResNet-50 region of interest (ROI) detection, and their performance was compared with that of a senior spine surgeon (18 years of experience) and a junior spine surgeon (5 years of experience). On the independent test set (102 images, 21 patients), accuracies were 82.35% (DL1) and 80.39% (DL2), close to the senior surgeon (83.33%) and well above the junior surgeon (62.75%). Agreement with the senior surgeon was strong (κ = 0.815 for DL1; κ = 0.799 for DL2). For binary detection of nerve compression, accuracies were 87–89% for the models vs. 91% for the senior surgeon. Limitations include single-center sampling, modest dataset size, and reliance on subjective grading. No external validation or subgroup analysis was performed.
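As a rough illustration of the metrics reported above, the sketch below shows how accuracy and Cohen's kappa can be computed from paired grade labels. It makes several assumptions that are not in the study: the function names are hypothetical, the kappa shown is the unweighted form (the summary does not say whether a weighted variant was used), and the short label lists are made-up toy data, not study data. The counts of correct predictions (84, 82, 85, and 64 out of 102) are inferred from the reported percentages rather than stated in the source.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of cases where the predicted grade equals the reference grade."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: agreement between two raters corrected for chance."""
    n = len(rater_a)
    # Observed agreement: share of cases where both raters give the same grade.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal grade frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    grades = set(rater_a) | set(rater_b)
    p_expected = sum(count_a[g] * count_b[g] for g in grades) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Correct-prediction counts inferred from the reported accuracies (assumption).
TEST_SET_SIZE = 102
implied_correct = {"DL1": 84, "DL2": 82, "senior": 85, "junior": 64}
implied_accuracy = {
    name: round(100 * correct / TEST_SET_SIZE, 2)
    for name, correct in implied_correct.items()
}

# Hypothetical toy grades on a 0-3 scale, for illustration only.
toy_reference = [0, 0, 1, 1, 2, 3]
toy_model = [0, 0, 1, 2, 2, 3]
toy_accuracy = accuracy(toy_reference, toy_model)
toy_kappa = cohen_kappa(toy_reference, toy_model)

if __name__ == "__main__":
    print(implied_accuracy)
    print(toy_accuracy, toy_kappa)
```

Kappa is lower than raw accuracy on the same toy labels because it discounts the agreement two raters would reach by chance given how often each assigns every grade.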
Outcomes and Implications
This study suggests that lightweight AI models using CT scans could help expand access to reliable diagnosis of lumbar foraminal stenosis in places where MRI is not widely available. Because the models performed nearly as well as the senior spine surgeon and far better than the junior surgeon, they could serve as useful decision-support tools, particularly for less experienced clinicians. Because the models are lightweight, they may also be practical for smaller hospitals or resource-limited clinics. At the same time, the findings must be interpreted cautiously. The study was based on a single center with a limited dataset and relied on subjective grading, with no external validation or subgroup analysis. The models show promise as low-cost, accessible diagnostic aids, but they are not yet ready for routine use and should be considered exploratory until tested across larger, more diverse patient populations.