Comparative Analysis of Vision Transformers and Conventional Convolutional Neural Networks in Detecting Referable Diabetic Retinopathy

Authors: Jocelyn Hui Lin Goh, BEng, Elroy Ang, BEng, Sahana Srinivasan, BEng, Xiaofeng Lei, MSc, Johnathan Loh, MEng, Ten Cheer Quek, BEng, Cancan Xue, PhD, Xinxing Xu, PhD, Yong Liu, PhD, Ching-Yu Cheng, PhD, Jagath C. Rajapakse, PhD, Yih-Chung Tham, PhD
Language: English
Publication year: 2024
Subject:
Source: Ophthalmology Science, Vol 4, Iss 6, Pp 100552- (2024)
Document type: article
ISSN: 2666-9145
DOI: 10.1016/j.xops.2024.100552
Description: Objective: Vision transformers (ViTs) have shown promising performance in various classification tasks previously dominated by convolutional neural networks (CNNs). However, the performance of ViTs in detecting referable diabetic retinopathy (DR) remains relatively underexplored. In this study, we used retinal photographs to compare the performance of ViTs and CNNs in detecting referable DR. Design: Retrospective study. Participants: A total of 48 269 retinal images from the open-source Kaggle DR detection dataset, the Messidor-1 dataset, and the Singapore Epidemiology of Eye Diseases (SEED) study were included. Methods: Using 41 614 retinal photographs from the Kaggle dataset, we developed 5 CNN models (Visual Geometry Group 19, ResNet50, InceptionV3, DenseNet201, and EfficientNetV2S) and 4 ViT models (VAN_small, CrossViT_small, ViT_small, and Hierarchical Vision Transformer using Shifted Windows [SWIN]_tiny) for the detection of referable DR. We defined referable DR as eyes with moderate or worse DR. The comparative performance of all 9 models was evaluated on the Kaggle internal test dataset (1045 study eyes) and on 2 external test sets, the SEED study (5455 study eyes) and Messidor-1 (1200 study eyes). Main Outcome Measures: Area under the receiver operating characteristic curve (AUC), specificity, and sensitivity. Results: Among all models, the SWIN transformer displayed the highest AUC of 95.7% on the internal test set, significantly outperforming the CNN models (all P
Database: Directory of Open Access Journals
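
The abstract defines referable DR as moderate or worse DR and names AUC, sensitivity, and specificity as the main outcome measures. The sketch below illustrates how such a binary label and these three metrics could be computed from per-eye grades and model scores. It is not the authors' pipeline: the 0 to 4 grading scale, the 0.5 decision threshold, the function names, and the toy data are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumption: a 5-level severity scale
# (0 = no DR, 1 = mild, 2 = moderate, 3 = severe, 4 = proliferative).
# "Moderate or worse" then corresponds to grade >= 2.
REFERABLE_MIN_GRADE = 2


def to_referable(grades: Sequence[int]) -> List[int]:
    """Map severity grades to binary labels (1 = referable DR)."""
    return [1 if g >= REFERABLE_MIN_GRADE else 0 for g in grades]


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity and specificity at a fixed (hypothetical) score threshold.

    Requires at least one positive and one negative label.
    """
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. This pairwise form is quadratic in the number of
    eyes, which is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    grades = [0, 1, 2, 3, 4, 0]
    scores = [0.1, 0.6, 0.7, 0.4, 0.9, 0.2]
    labels = to_referable(grades)
    print("labels:", labels)
    print("sensitivity, specificity:", sensitivity_specificity(labels, scores))
    print("AUC:", auc(labels, scores))
```

The AUC does not depend on a threshold, whereas sensitivity and specificity do, so a comparison across models on the latter two is only meaningful once the operating point for each model is stated.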