International Scientific Platform
Serving Researchers Since 2012

DIABETIC RETINOPATHY SCREENING SYSTEM USING MOBILENETV3 WITH TRANSFER LEARNING AND GRAD-CAM EXPLAINABILITY

DOI : 10.17577/IJERTCONV14IS030013
Download Full-Text PDF Cite this Publication

Text Only Version

DIABETIC RETINOPATHY SCREENING SYSTEM USING MOBILENETV3 WITH TRANSFER LEARNING AND GRAD-CAM EXPLAINABILITY

David Jones

Under Graduate Student,

Department of Computer Science and Engineering,

Dr. G. U. Pope College of Engineering, Sawyerpuram,

Thoothukudi, Tamil Nadu, India. jesustheimmaculatta@gmail.com

Mariselvam P

Under Graduate Student,

Department of Computer Science and Engineering,

Dr. G. U. Pope College of Engineering, Sawyerpuram,

Thoothukudi, Tamil Nadu, India. mariselvam02052005@gmail.com

T. Jasperline

Professor,

Department of Computer Science and Engineering,

Dr. G. U. Pope College of Engineering, Sawyerpuram,

Thoothukudi, Tamil Nadu, India. gnchriston6448@gmail.com

Abstract Diabetic Retinopathy (DR) is a leading cause of preventable blindness globally, necessitating timely and accurate screening. Manual grading by ophthalmologists is costly and unavailable at scale in resource-limited settings. This paper presents a fully automated DR Screening System built on MobileNetV3-Small with ImageNet-based transfer learning for five-class severity grading of retinal fundus images according to the International Clinical Diabetic Retinopathy (ICDR) scale. The APTOS 2019 benchmark dataset (3,662 images) is used for training and evaluation with an 80/20 stratified split. A class-weighted cross-entropy loss function is employed to address severe class imbalance, assigning higher penalties to minority severity grades. Explainability is integrated using Gradient- weighted Class Activation Mapping (Grad-CAM), which generates heatmaps highlighting pathological retinal regions that influence model predictions, thereby enhancing clinical interpretability. A confidence score and structured clinical recommendation are produced alongside each prediction. After 20 training epochs with the Adam optimizer, the best checkpoint achieves a validation accuracy of 81%, with a weighted F1-score of 0.81 and a macro F1-score of 0.66 on the 733-sample validation set. The proposed system is lightweight (~2.5M parameters), computationally efficient (~0.06 GFLOPs), and suitable for deployment in low- resource healthcare environments. The system demonstrates strong potential as a scalable, interpretable, and clinically actionable diagnostic support tool.

Keywords Diabetic Retinopathy, Deep Learning, MobileNetV3, Transfer Learning, Fundus Image Classification, Weighted Cross- Entropy, Class Imbalance, Explainable AI, Grad-CAM, APTOS 2019, Computer-Aided Diagnosis, Healthcare AI, Medical Image Analysis.

  1. INTRODUCTION

    Diabetic Retinopathy (DR) is one of the most prevalent microvascular complications of diabetes mellitus and remains a leading cause of blindness and visual impairment worldwide. The International Diabetes Federation estimates that over 463 million adults are living with diabetes, and projections indicate this figure will exceed 700 million by 2045. Critically, up to one- third of diabetic patients are estimated to develop some form of retinopathy during their lifetime. Early-stage DR is typically asymptomatic, yet timely identification and intervention are essential to prevent progressive and irreversible vision loss. Without appropriate treatment,

    DR can advance through multiple severity stages from mild non-proliferative changes to proliferative retinopathy involving pathological neovascularization, which carries a high risk of complete blindness.

    Conventional DR screening involves manual evaluation of retinal fundus photographs by trained ophthalmologists or retinal specialists. This approach is time-intensive, expensive, and severely constrained by the global shortage of ophthalmic specialists, particularly in low- and middle-income countries. Teleophthalmology programs and diabetic eye screening initiatives have expanded access to some extent, but the volume of diabetic patients requiring regular retinal examination significantly outpaces the capacity of available human expertise. Consequently, there is a compelling need for automated, scalable, and cost-effective DR screening systems capable of accurate multi-class severity grading.

    The rapid advancement of deep learning, particularly Convolutional Neural Networks (CNNs), has revolutionized medical image analysis. CNNs have demonstrated near-expert- level performance in a wide range of diagnostic tasks, including skin lesion classification, chest X-ray analysis, and ophthalmic disease screening. For DR detection, deep learning models trained on large annotated fundus image datasets have shown remarkable accuracy in both binary (DR/no DR) and multi-class severity grading tasks. Despite these advances, a significant challenge remains: deep learning models typically operate as black boxes, providing predictions without explanations. This lack of transparency raises concerns among clinicians regarding the safety and reliability of AI-driven diagnoses, particularly in high- stakes medical contexts.

    Explainable Artificial Intelligence (XAI) has emerged as a critical research direction aimed at making AI decisions transparent and interpretable to human experts. In the context of medical imaging, XAI methods such as Gradient-weighted Class Activation Mapping (Grad- CAM) can generate visual saliency maps that highlight the image regions most influential to a model's prediction. Such visualizations enable clinicians to verify

    whether the model is attending to clinically relevant pathological structures such as microaneurysms, hard exudates, haemorrhages, and neovascularization building trust and enabling effective human-AI collaboration.

    This paper proposes a comprehensive automated DR Screening System that addresses the dual challenges of classification accuracy and interpretability. The system employs MobileNetV3-Small, a lightweight yet high- performing CNN architecture, fine-tuned on the APTOS 2019 Blindness Detection dataset for five-class DR severity grading. Weighted cross- entropy loss mitigates the pronounced class imbalance inherent in the dataset. Grad-CAM-based visualizations provide clinically meaningful explanations for each prediction, while a structured recommendation engine translates predicted severity grades into actionable clinical guidance.

    1. Objectives

      The primary objectives of this work are: (1) to develop an automated multi-class DR severity classification system achieving clinically acceptable accuracy on a standard benchmark dataset; (2) to address class imbalance through principled loss weighting strategies;

      (3) to integrate Grad-CAM-based explainability to produce visual diagnostic evidence; (4) to design a lightweight architecture suitable for deployment in resource- constrained healthcare environments; and (5) to provide a structured clinical recommendation output for use by non- specialist healthcare workers.

    2. Paper Organization

    The remainder of this paper is organized as follows: Section II reviews related literature on DR detection and explainable AI. Section III describes the proposed methodology including dataset preparation, model architecture, and training strategy. Section IV details the system architecture. Section V presents experimental results and comparative analysis. Section VI concludes the paper with future directions.

  2. RELATED WORK

    Automated retinal image analysis for DR detection has been extensively studied over the past two decades. The earliest computer-aided methods relied on handcrafted feature extraction pipelines targeting specific DR lesion morphologies, including microaneurysm detection using mathematical morphology, hemorrhage segmentation via region growing, and hard exudate detection through color and texture analyis. While these methods produced interpretable features, their sensitivity to imaging variability and reliance on lesion-specific preprocessing limited their generalizability.

    The landmark study by Gulshan et al. [1] demonstrated that a deep CNN trained on a large ophthalmologist-annotated dataset of 128,175 retinal images could achieve high specificity and sensitivity for DR detection, surpassing the performance of ophthalmologists on two validation sets. This work

    catalyzed widespread adoption of deep learning for retinal image analysis. Subsequent research explored diverse CNN architectures including VGGNet, ResNet, DenseNet, and Inception-based models for multi-class DR grading, consistently demonstrating that transfer learning from ImageNet pre-trained weights significantly improved performance on limited medical imaging datasets [2].

    Lightweight CNN architectures have gained attention for their deployability in resource-constrained and mobile health settings. Howard et al. [7] introduced MobileNets, based on depthwise separable convolutions, which dramatically reduced computational cost while maintaining competitive accuracy. The subsequent MobileNetV3 [8] incorporated neural architecture search and hard-swish activation functions, further improving efficiency. Qummar et al. [3] evaluated multiple CNN architectures for DR grading on the APTOS dataset, reporting that MobileNet-family models achieved favorable accuracy- efficiency trade-offs compared to deeper architectures.

    Explainability in deep learning-based medical AI has received increasing attention from both research and regulatory perspectives. Selvaraju et al. [5] proposed Grad-CAM, which computes gradient-weighted linear combinations of feature maps from the final convolutional layer to generate class-discriminative localization maps. In the context of DR screening, several studies have employed Grad-CAM and its variants (Grad-CAM++, Score-CAM) to visualize model attention on retinal lesions, demonstrating alignment between AI saliency regions and clinically relevant structures identified by ophthalmologists [6]. The APTOS 2019 Kaggle dataset [9], comprising 3,662 high- quality fundus images graded by certified graders, has been widely adopted as a standard benchmark, enabling reproducible comparison across studies.

    Despite these advances, the majority of existing works focus on either high-accuracy deep models without interpretability, or binary classification tasks. Few studies have simultaneously addressed multi-class DR grading, class imbalance, lightweight deployment, and clinical explainability within a unified pipeline. This work bridges these gaps by integrating MobileNetV3, weighted loss training, and Grad-CAM into a cohesive, clinically oriented screening system.

  3. PROPOSED METHODOLOGY

    The proposed DR screening system is developed as a modular deep learning pipeline encompassing dataset preparation, model design, loss function engineering, training optimization, and post-hoc explainability. Each component is designed with clinical applicability and computational efficiency as guiding principles.

    1. Dataset Preparation

      The APTOS 2019 Blindness Detection dataset [9] is used as the primary benchmark. The dataset comprises 3,662 high- resolution retinal fundus images acquired

      across multiple clinical sites in India, annotated by trained graders with five DR severity grades following the ICDR classification: Grade 0 (No DR, 1805 images), Grade 1

      (Mild NPDR, 370), Grade 2 (Moderate NPDR, 999),

      Grade 3 (Severe NPDR, 193), and Grade 4 (Proliferative DR, 295). All images are resized to 224×224 pixels to match the MobileNetV3 input specification. The dataset exhibits pronounced class imbalance with a 9.4:1 ratio between the majority (Grade 0) and minority (Grade 3) classes.

      An 80/20 stratified train-validation split is applied using Scikit-learn's train_test_split with random seed 42, yielding 2,929 training and 733 validation samples while preserving the original class proportions. Training augmentations include random horizontal flipping and random rotation within ±10°, implemented via PyTorch's transforms.Compose pipeline. No augmentation is applied to the validation set to ensure unbiased evaluation.

    2. Model Architecture

      MobileNetV3-Small is selected as the classification backbone, pre-trained on the ImageNet-1K dataset. The architecture employs depthwise separable convolutions organized into inverted residual blocks with linear bottlenecks, combined with Squeeze-and-Excitation (SE) modules that perform adaptive channel-wise feature recalibration. The hard-swish activation function replaces ReLU in the deeper layers, providing improved nonlinearity with negligible additional computational cost. The backbone produces a 576-dimensional global average-pooled feature representation. The original classification head is replaced with a custom fully connected layer: Linear (5761024) Hardswish Dropout (0.2) Linear (10245), mapping to the five DR severity grades. All backbone layers are unfrozen and jointly fine-tuned throughout training.

    3. Class Imbalance Handling

      The severe class imbalance in the APTOS 2019 dataset presents a significant challenge for standard cross-entropy training, as the model tends to be dominated by the majority No DR class. To counteract this, class-wise inverse frequency weights are computed as w_c = N / (K × n_c), where N is the total number of training samples (2929), K is the number of classes (5), and n_c is the training sample count for class c. The resulting weights are provided in Table IV. These weights are embedded directly into the PyTorch CrossEntropyLoss criterion, applying higher penalties for misclassification of minority severity grades, particularly Grade 3 (Severe NPDR, weight: 3.80×) and Grade 4 (Proliferative DR, weight: 2.48×).

    4. Training Configuration

      The model is trained in two phases over a total of 20 epochs. Phase 1 (5 epochs) constitutes a warm-up phase for initial convergence from the ImageNet-initialized weights. Phase 2 (15 epochs) is the fine-tuning phase with model checkpointing based on validation accuracy. The Adam optimizer is employed with a learning rate of

      1×10 and default betas (=0.9, =0.999). Mini- batch gradient descent is performed with a batch size of

      32. Training is executed on an NVIDIA GPU with CUDA acceleration using PyTorch. The model checkpoint achieving the highest validation accuracy (80.90% at Epoch 6 of Phase 2) is retained for final evaluation. Table II presents a summary of key training epochs.

    5. Grad-CAM Explainability

    Gradient-weighted Class Activation Mapping (Grad- CAM) is implemented for post-hoc visual explainability. Forward and backward hook functions are registered on the final convolutional layer (model.features[-1][0], a Conv2d(96576) layer) of MobileNetV3. For a given input image x and predicted class c, the class score y^c is backpropagated through the network to compute the gradients y^c/A^k with respect to the feature map activations A^k of the target layer. These gradients are global- average-pooled to obtain per-channel importance weights ^c_k = (1/Z) (y^c/A^k_ij). The Grad- CAM heatmap is computed as the ReLU-activated weighted sum of activations: L^c_GradCAM = ReLU(_k ^c_k A^k). The resulting heatmap is bilinearly upsampled to the input image resolution (224×224), normalized to [0,1], and overlaid on the original fundus image using a JET colormap to produce the final visualization shown in Fig. 1.

    TABLE I. DETAILED CLASSIFICATION REPORT VALIDATION SET

    Epoch

    Phase

    Train Loss

    Train Acc.

    Val. Loss

    Val. Acc.

    Remarks

    1

    Warm-up

    1.2719

    57.49%

    1.6288

    54.57%

    Initial convergenc

    3

    Warm-up

    0.8120

    75.62%

    1.1423

    67.12%

    Rapid improveme

    5

    Warm-up

    0.6601

    80.40%

    0.9324

    73.94%

    Phase 1 best

    6

    Fine- tuning

    0.4059

    87.67%

    1.1515

    80.90%

    Best model saved

    8

    Fine- tuning

    0.3370

    89.28%

    1.1018

    80.90%

    Plateau reached

    12

    Fine- tuning

    0.2437

    92.56%

    1.5600

    79.95%

    Overfitting onset

    15

    Fine- tuning

    0.2021

    93.96%

    1.5525

    80.08%

    Final epoch

    Best (Ep.6/11)

    Fine- tuning

    81.00%

    Checkpoin t used

  4. SYSTEM ARCHITECTURE

    The complete DR screening system integrates four primary functional modules in a sequential inference pipeline: (1) Image Acquisition and Preprocessing Module, (2) Deep Learning Classification Module, (3) Grad-CAM Explainability Module, and (4) Clinical Report Generation Module.

    1. Image Acquisition and Preprocessing

      The input module accepts retinal fundus images in standard formats (JPEG, PNG) and applies a standardized

      preprocessing pipeline: RGB channel normalization, bilinear resizing to

      224×224 pixels and conversion to a PyTorch float tensor. No additional preprocessing such as CLAHE or green channel extraction is applied, relying instead on the model's ability to learn relevant features from raw normalized images through transfer learning.

    2. Classification Module

      The preprocessed image tensor is passed through the fine- tuned MobileNetV3-Small backbone. The model produces a 5- dimensional logit vector, which is converted to class probabilities via softmax. The predicted severity grade corresponds to the class with the highest probability. The maximum softmax probability is reported as the model confidence score.

    3. Explainability Module

      regions such as areas with microaneurysms, haemorrhages, hard exudates, or neovascularization that most strongly contributed to the model's decision. Fig. 1 illustrates a representative Grad-CAM output for a test image.

      Fig. 1. Grad-CAM Visualization: Original Retinal Fundus Image (left) and Heatmap Overlay highlighting pathological regions (right)

    4. Clinical Report Generation

    The clinical report module maps the predicted severity grade to a structured diagnostic output comprising: a diagnostic label (e.g., 'Moderate Non-Proliferative Diabetic Retinopathy'), the model confidence percentage, and a standardized clinical recommendation. Recommendations follow established ophthalmological screening guidelines: Grade 0 routine annual screening; Grade 1 ophthalmologist follow-up within 612 months; Grade 2 referral recommended; Grade 3

    urgent consultation advised; Grade 4 immediate specialist intervention required.

  5. EXPERIMENTAL RESULTS AND DISCUSSION

    1. Training Performance

      The model underwent two-phase training over 20 epochs. During the Phase 1 warm-up (Epochs 15), training accuracy improved substantially from 57.49% to 80.40%, while validation accuracy rose from 54.57% to 73.94%, reflecting rapid adaptation of ImageNet features to the retinal imaging domain. The Phase 2 fine-tuning (Epochs 115) achieved the best validation accuracy of 80.90% at Epoch 6, corresponding to an overall epoch 11 of training. Beyond this point, training accuracy continued to increase (reaching 93.96% at Epoch 20), while validation

      accuracy plateaued and slightly declined indicative of overfitting onset. The best checkpoint was saved at Epoch 6 of Phase 2 and used for all subsequent evaluation. Table II provides a detailed epoch-by-epoch performance summary for key training milestones.

      TABLE II. TRAINING PERFORMANCE SUMMARY

      DR Grade / Class

      Precision

      Recall

      F1-Score

      Support

      Clinical Remarks

      0 No DR

      (Normal)

      0.98

      0.94

      0.96

      361

      High accuracy; majority class

      1 Mild NPDR

      0.51

      0.59

      0.55

      74

      Moderate; limited samples

      2 Moderate NPDR

      0.77

      0.81

      0.79

      200

      Good; adequate training data

      3 Severe NPDR

      0.46

      0.41

      0.43

      39

      Challenging; very few samples

      4 Proliferative DR

      0.58

      0.53

      0.55

      59

      Critical class; needs improvement

      Macro Average

      0.66

      0.66

      0.66

      733

      Weighted Average

      0.81

      0.81

      0.81

      733

      Best overall balance

    2. Detailed Classification Performance

      Table I presents the per-class precision, recall, F1-score, and support for the best model on the 733-sample validation set. The model achieves excellent performance on Grade 0 (No DR) with an F1-score of 0.96, benefiting from the large number of training samples. Grade 2 (Moderate NPDR) achieves a competitive F1- score of 0.79, supported by adequate training representation (799 samples). Grades 1, 3, and 4 show lower F1-scores of 0.55, 0.43, and 0.55 respectively, primarily attributable to limited training data and the visual similarity of adjacent severity grades. The overall weighted average F1-score of

      0.81 reflects the system's strong general performance.

      TABLE III. COMPARATIVE ANALYSIS OF CNN ARCHITECTURES ON APTOS 2019

      DR Grade

      Training Samples

      Class Weight

      Effect on Training

      0 No DR

      1444

      0.4057

      Downweighted; prevents majority bias

      1 Mild NPDR

      296

      1.9791

      Upweighted; improves mild detection

      2 Moderate NPDR

      799

      0.7332

      Near-balanced; moderate emphasis

      3 Severe NPDR

      154

      3.8039

      Highest weight; critical minority class

      4 Prolif. DR

      236

      2.4822

      High weight; clinically urgent class

      TABLE IV. CLASS-WISE INVERSE FREQUENCY WEIGHTS APPLIED DURING TRAINING

      td>

      ~138M

      Model

      Val. Acc.

      Params

      FLOPs

      XAI

      Support

      Deployment

      VGG-16 [2]

      76%

      ~15.5G

      No

      High-resource only

      ResNet-50 [2]

      79%

      ~25M

      ~4.1G

      No

      Moderate resource

      Inception-V3 [12]

      80%

      ~27M

      ~5.7G

      No

      Moderate resource

      EfficientNet-B0 [4]

      83%

      ~5.3M

      ~0.39G

      No

      Low-moderate

      MobileNetV3 (Proposed)

      81%

      ~2.5M

      ~0.06G

      Grad- CAM

      Low-resource / Edge

    3. Confusion Matrix Analysis

      The confusion matrix (Fig. 2) reveals that Grade 0 (No DR) is classified correctly in 340 of 361 cases (94.2%), with minor confusion with Grade 1 (18 cases). Grade 2 (Moderate NPDR) achieves 162/200 correct classifications (81%), with notable confusion with Grade 1 (18 cases) and Grade 4 (8 cases), reflecting the spectral nature of DR progression. Grade 3 (Severe NPDR) presents the greatest challenge, with only 16/39 correct classifications (41%), frequently confused with Grade 2 (14 cases) and Grade 4 (8 cases). This confusion pattern is clinically expected and consistent with literature, as the distinction between severe NPDR and early PDR requires subtle lesion-level assessment beyond pixel-level classification. Grade 1 achieves 44/74 correct (59%), while Grade 4 achieves 31/59 correct (53%).

      Fig. 2. Confusion Matrix for 5-Class DR Severity Classification on APTOS 2019 Validation Set (n=733)

    4. Comparative Analysis

      Table III presents a systematic comparison of the proposed MobileNetV3-based system against widely used CNN architectures evaluated on the APTOS 2019 dataset. The proposed system achieves competitive accuracy (81%) while offering significant advantages in parameter efficiency (~2.5M vs ~25M for ResNet-50 and

      ~138M for VGG-16) and computational cost (~0.06 GFLOPs vs ~4.1G for ResNet-50). EfficientNet-B0 achieves marginally higher accuracy (83%) but does not incorporate any XAI component. Crucially, the proposed system is the only approach integrating Grad-CAM explainability, providing clinically meaningful visual evidence alongside predictions a distinction of significant practical importance for clinical adoption.

    5. Discussion

    The experimental results demonstrate that MobileNetV3- Small, when fine-tuned with class- weighted loss on the APTOS 2019 dataset, achieves clinically meaningful DR severity grading with a compact model footprint suitable for edge deployment. The class weighting strategy effectively improves sensitivity for minority grades relative to unweighted training baselines. The Grad-CAM visualizations exhibit qualitatively appropriate attention patterns, focusing on the central retinal region and areas exhibiting pathological changes in

    higher-severity fundus images, consistent with ophthalmological diagnostic criteria.

    The primary performance bottleneck is the limited training data for minority severity grades (Grades 1, 3, 4), which constrains the model's discrimination capacity for these clinically critical categories. The Grade 3 (Severe NPDR) class, in particular, represents the transition boundary before sight- threatening proliferative disease making accurate detection at this stage critical for timely intervention. Future strategies to address this limitation include advanced data augmentation (e.g., Mixup, CutMix), synthetic data generation using GANs, and self- supervised pre-training on larger unlabeled retinal datasets.

  6. CONCLUSION

This paper has presented a comprehensive automated Diabetic Retinopathy Screening System leveraging MobileNetV3-Small with transfer learning for five-class severity grading of retinal fundus images. Trained on the APTOS 2019 dataset with class-weighted cross-entropy loss to address inherent class imbalance, the system achieves a validation accuracy of 81% and a weighted F1- score of 0.81. The integration of Grad- CAM-based explainability provides clinically interpretable visualizations of pathological retinal regions driving model predictions, addressing a key barrier to the clinical adoption of AI-driven diagnostic tools. The structured recommendation engine further bridges the gap between model output and actionable clinical guidance.

The proposed system's compact architecture (~2.5M parameters, ~0.06 GFLOPs) makes it well-suited for deployment in resource-constrained settings, including mobile screening programs and primary healthcare facilities in underserved regions. The combination of accuracy, efficiency, and explainability positions the system as a viable decision-support tool for DR screening at scale.

Future research directions include: (1) multi-dataset training and cross-dataset validation to improve generalization; (2) incorporation of attention mechanisms and Vision Transformer (ViT) architectures for enhanced feature capture; (3) uncertainty quantification using Monte Carlo Dropout or deep ensembles to flag low- confidence predictions for expert review; (4) advanced augmentation and synthetic data generation for minority class enrichment; and (5) prospective clinical validation studies to assess real-world diagnostic utility and clinician acceptance of the explainable outputs.

REFERENCES

  1. V. Gulshan et al., "Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs," JAMA, vol. 316, no. 22, pp. 2402 2410, Dec. 2016.

  2. A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in Proc. Advances in Neural Information Processing Systems (NeurIPS), Lake Tahoe, NV, USA, 2012, pp. 10971105.

  3. S. Qummar et al., "A Deep Learning Ensemble Approach for Diabetic Retinopathy Detection," IEEE Access, vol. 7, pp. 150530150539, 2019.

  4. M. Tan and Q. V. Le, "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks," in Proc. 36th Int. Conf. Machine Learning (ICML), Long Beach, CA, USA, 2019, pp. 61056114.

  5. R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, "Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization," in Proc. IEEE Int. Conf. Computer Vision (ICCV), Venice, Italy, 2017, pp. 618 626.

  6. P. Porwal et al., "IDRiD: Diabetic Retinopathy Segmentation and Grading Challenge," Medical Image Analysis, vol. 59, art. 101561, Jan. 2020.

  7. A. G. Howard et al., "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications," arXiv preprint arXiv:1704.04861, Apr. 2017.

  8. A. Howard et al., "Searching for MobileNetV3," in Proc. IEEE/CVF Int. Conf. Computer Vision (ICCV), Seoul, Korea, 2019, pp. 13141324.

  9. Kaggle, "APTOS 2019 Blindness Detection Dataset," 2019. [Online]. Available: https://www.kaggle.com/c/aptos2019- blindness-detection. [Accessed: Apr. 12, 2026].

  10. D. S. Ting et al., "Development and Validation of a Deep Learning System for Diabetic Retinopathy and Related Eye Diseases Using Retinal Images from Multiethnic Populations with Diabetes," JAMA, vol. 318, no. 22, pp. 22112223, Dec. 2017.

  11. G. Litjens et al., "A Survey on Deep Learning in Medical Image Analysis," Medical Image Analysis, vol. 42, pp. 6088, Dec. 2017.

  12. C. Szegedy et al., "Going Deeper with Convolutions," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 2015, pp. 19.

  13. O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," in Proc. Int. Conf. Medical Image Cmputing and Computer-Assisted Intervention (MICCAI), Munich, Germany, 2015, pp. 234241.

  14. K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 770778.