3D-CNN Enhanced Multiscale Progressive Vision Transformer for AD Diagnosis

Neurology

IEEE Journal of Biomedical and Health Informatics

Research Authors: Fei Huang, Nanguang Chen, Anqi Qiu

AIIM Authors: Ronit Ganguli, Sahil Langote, Reda Riffi

Approved by President Reda Riffi

Publication Date: Sep 10, 2025

Comprehensive Summary

Huang et al. explored a deep-learning method that combines convolutional neural networks (CNNs) and vision transformers (ViTs) to improve the diagnosis of Alzheimer's disease (AD) and to predict the progression of mild cognitive impairment (MCI) from MRI scans. They proposed the 3D-CNN Enhanced Multiscale Progressive Vision Transformer (3D-CNN-MPVT), which uses a 3D DenseNet121 to learn local brain features and a multiscale progressive ViT to fuse local and global information. To reduce computational cost, it uses a stitch operation. The model was trained on more than 8,400 structural MRI scans from the ADNI and OASIS-3 datasets. The 3D-CNN-MPVT distinguished AD from healthy controls with an accuracy of approximately 90% and identified the MCI patients who would go on to develop AD with an accuracy of approximately 80%. It was also computationally cheaper, using about 67% fewer parameters and 68% fewer FLOPs without compromising accuracy. On an external test set, the model did not generalize as well, since there were fewer difficult cases of AD. Attention maps highlighted regions already known to be affected by AD, namely the hippocampus and cortical areas, which supports the biological plausibility of the results. The researchers noted that integrating CNNs with a progressive ViT and a stitching method lets the model capture both global and local features without compromising performance. They showed that the method overcomes significant limitations of conventional ViTs, namely overfitting and high computational requirements, but it needs further testing before clinical application.
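
The summary above notes that conventional ViTs carry high computational requirements. One general reason is that self-attention compares every token with every other token, so the cost grows with the square of the token count. The following is a minimal sketch of that arithmetic for a 3D volume split into cubic patches at several scales. The feature-map size, the patch sizes, and the helper names `num_tokens` and `relative_reduction` are assumptions made for illustration and are not taken from the paper; this is not the authors' implementation.

```python
from math import prod


def num_tokens(volume_shape, patch_size):
    """Number of non-overlapping cubic patches (tokens) in a 3D volume."""
    # This simple sketch requires each axis to divide evenly by the patch size.
    for dim in volume_shape:
        if dim % patch_size != 0:
            raise ValueError(f"{dim} is not divisible by patch size {patch_size}")
    return prod(dim // patch_size for dim in volume_shape)


def relative_reduction(baseline, reduced):
    """Fractional saving of `reduced` relative to `baseline` (0.67 means 67% fewer)."""
    return (baseline - reduced) / baseline


# Hypothetical feature-map size and patch sizes, chosen only for illustration.
feature_map = (16, 16, 16)
for patch in (2, 4, 8):
    n = num_tokens(feature_map, patch)
    # Self-attention scores every pair of tokens, so the pair count is n * n.
    print(f"patch={patch}: tokens={n}, attention pairs={n * n}")
```

In this toy setting, moving from a patch size of 2 to a patch size of 4 cuts the token count from 512 to 64 and the number of attention pairs from 262,144 to 4,096, which illustrates why coarser scales are much cheaper to attend over than fine ones.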

Outcomes and Implications

This study is important because it offers a more accurate and computationally efficient approach to detecting AD and predicting MCI conversion, which can help improve early intervention. It also shows how CNN–ViT hybrids can avoid the challenges that have limited the use of pure ViTs in medical imaging. Clinically, knowing which MCI patients are at higher risk of AD would allow earlier follow-up and referrals as well as more precise trial recruitment. The model is interpretable and efficient and may therefore be a valuable diagnostic aid, although the researchers caution that it remains at the investigative stage and must be validated in different populations before it can become a clinical tool. No timeline for clinical use was proposed.

Our mission is to

Connect medicine with AI innovation.

No spam. Only the latest AI breakthroughs, simplified and relevant to your field.
