DOI : 10.17577/IJERTV15IS060262
- Open Access

- Authors : Kiran Kadakuntla
- Paper ID : IJERTV15IS060262
- Volume & Issue : Volume 15, Issue 06 , June – 2026
- Published (First Online): 11-06-2026
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License:
This work is licensed under a Creative Commons Attribution 4.0 International License
State-of-the-Art Deep Learning Methodologies for the Detection, Segmentation, and Grading of Diabetic Macular Edema (DME): A Comprehensive Multi-Decadal Survey
Kiran Kadakuntla
Lecturer, Government Polytechnic, Karnataka, India
Research Scholar, Department of E&C Engineering, SDMCET, Dharwad, Karnataka, India
-
INTRODUCTION
Diabetes mellitus is a rapidly increasing chronic disease worldwide and is associated with several complications, including diabetic retinopathy (DR), a major cause of vision impairment and blindness. Among its complications, Diabetic Macular Edema (DME) is one of the most critical causes of vision loss in diabetic patients. It occurs due to leakage of fluid from damaged retinal blood vessels into the macula, leading to swelling and progressive loss of central vision. Since early stages of DME may be asymptomatic, timely detection is essential to prevent irreversible visual damage.
Traditionally, DME diagnosis is performed using retinal fundus images and Optical Coherence Tomography (OCT) through manual examination by ophthalmologists. Although clinically reliable, this process is time-consuming, subjective, and challenging to scale due to increasing patient load, especially in resource-limited settings.
To overcome these limitations, automated image analysis methods have been widely explored. Early approaches relied on handcrafted features and conventional machine learning techniques to detect lesions such as exudates and retinal thickening. However, their performance was limited by dependency on feature design and image variability.
The emergence of deep learning has significantly improved retinal disease analysis. Convolutional Neural Networks (CNNs) can automatically learn hierarchical feature representations from images, eliminating the need for manual feature extraction. Recent methods using CNNs, attention mechanisms, U-Net variants, transformer-based models, and hybrid architectures have achieved improved performance in DME detection and grading.
In recent years, research in deep learning-based DME analysis has expanded rapidly, covering tasks such as detection, segmentation, and severity classification using diverse datasets and evaluation protocols. However, the diversity of methodologies makes it difficult to obtain a unified view of progress in this domain.
This review provides a comprehensive overview of deep learning approaches for DME detection, segmentation, and grading. It summarizes the evolution from traditional machine learning to advanced deep learning frameworks, discusses commonly used datasets and evaluation metrics, and highlights emerging trends such as explainable AI, self-supervised learning, federated learning, and foundation models. The study aims to provide a consolidated understanding of current progress, limitations, and future research directions in automated DME diagnosis.
-
BACKGROUND AND CLINICAL OVERVIEW OF DIABETIC MACULAR EDEMA
-
Anatomy of the Retina
The retina is a thin, light-sensitive neural tissue located at the posterior segment of the eye. It plays a vital role in vision by converting incoming light into electrical signals that are transmitted to the brain through the optic nerve. The retina consists of multiple layers and specialized structures that collectively support visual perception.
Among these structures, the macula is responsible for central vision and enables activities requiring fine visual detail, such as reading, driving, and facial recognition. At the center of the macula lies the fovea, which contains the highest concentration of photoreceptor cells and provides maximum visual acuity. The retina also contains an extensive vascular network that supplies oxygen and nutrients to retinal tissues while maintaining the integrity of the blood-retinal barrier.
Figure 1. Anatomical structure of the retina highlighting the optic disc, macula, fovea, retinal blood vessels, and peripheral retina.
Any disruption to the retinal vascular system due to prolonged hyperglycemia can lead to pathological changes, including diabetic retinopathy and diabetic macular edema.
-
Diabetic Retinopathy and Development of DME
Diabetic retinopathy (DR) is a microvascular complication of diabetes mellitus caused by prolonged elevation of blood glucose levels. Chronic hyperglycemia damages retinal capillaries, resulting in increased vascular permeability, microaneurysm formation, hemorrhages, and ischemic changes within retinal tissues.
One of the most vision-threatening manifestations of diabetic retinopathy is Diabetic Macular Edema (DME). DME occurs when the blood-retinal barrier becomes compromised, allowing plasma fluid and lipoproteins to leak into the macular region. This leakage causes retinal thickening and swelling of the macula, ultimately affecting central vision.
Unlike several other retinal abnormalities that primarily occur in advanced diabetic retinopathy, DME may develop at various stages of the disease and can significantly impair vision if left untreated. Therefore, early diagnosis and continuous monitoring are essential for preventing irreversible vision loss.
Figure 2. Pathogenesis of diabetic macular edema illustrating the progression from diabetes mellitus to retinal vascular damage, fluid leakage, retinal thickening, and DME formation.
-
Clinical Significance of DME
DME is one of the leading causes of vision impairment among working-age adults worldwide. In its early stages, patients may experience few or no noticeable symptoms. As the disease progresses, visual disturbances such as blurred vision, reduced contrast sensitivity, image distortion, and central vision loss become increasingly apparent.
Early detection of DME is critical because timely therapeutic interventions can significantly reduce the risk of severe vision loss. Current treatment strategies include anti-vascular endothelial growth factor (anti-VEGF) therapy, corticosteroid injections, and laser photocoagulation. The success of these treatments largely depends on the stage at which the disease is diagnosed.
Consequently, effective screening and monitoring programs are essential for improving patient outcomes and reducing the burden of diabetes-related blindness.
-
Clinical Classification and Grading of DME
The severity of DME is commonly classified according to the location and extent of retinal thickening and hard exudates relative to the macular center. Clinical grading systems help ophthalmologists evaluate disease severity and determine appropriate treatment strategies.
Generally, DME is categorized into three levels:
- Mild DME: Retinal abnormalities located away from the macular center.
- Moderate DME: Lesions approaching the macular center.
- Severe DME: Retinal thickening or exudates involving the macular center and posing a substantial risk to vision.
Automated grading systems developed using artificial intelligence aim to mimic this clinical assessment process and provide consistent severity classification.
Figure 3. Clinical grading categories of diabetic macular edema showing mild, moderate, and severe disease progression.
-
-
Retinal Imaging Modalities for DME Assessment
Advances in ophthalmic imaging technologies have greatly enhanced the diagnosis and management of DME. Retinal fundus photography remains one of the most widely adopted imaging techniques due to its affordability, accessibility, and suitability for large-scale screening programs. Fundus images provide valuable information regarding retinal lesions such as hard exudates, microaneurysms, and hemorrhages.
Optical Coherence Tomography (OCT) has emerged as the gold standard for DME assessment because it provides high-resolution cross-sectional images of retinal layers. OCT enables direct visualization of fluid accumulation and retinal thickening, making it highly effective for disease diagnosis and progression monitoring.
Retinal thickness is an important biomarker used to assess DME severity and is commonly calculated as:
Equation (1): Retinal Thickness Measurement
$$RT = ILM - RPE$$
where:
- RT denotes retinal thickness,
- ILM represents the Internal Limiting Membrane,
- RPE represents the Retinal Pigment Epithelium.
This measurement quantifies the distance between the inner and outer retinal boundaries and is frequently used in OCT-based analysis.
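As a minimal sketch of how this measurement might be computed, the function below assumes that a prior layer-segmentation step has already produced the row index of the ILM and RPE boundary in every A-scan (column) of a B-scan. The function names, the example boundary values, and the axial resolution are illustrative assumptions, not values taken from any specific OCT device. The absolute difference is used so that the sign convention of the depth axis does not matter.

```python
def retinal_thickness_um(ilm_rows, rpe_rows, axial_res_um):
    """Per-column retinal thickness in micrometres.

    ilm_rows / rpe_rows: boundary row index for each A-scan (assumed to come
    from an earlier layer-segmentation step).
    axial_res_um: micrometres per pixel along depth (device-specific assumption).
    """
    if len(ilm_rows) != len(rpe_rows):
        raise ValueError("boundary lists must have one entry per A-scan")
    return [abs(rpe - ilm) * axial_res_um for ilm, rpe in zip(ilm_rows, rpe_rows)]


def mean_thickness_um(ilm_rows, rpe_rows, axial_res_um):
    """Average thickness across all columns of one B-scan."""
    values = retinal_thickness_um(ilm_rows, rpe_rows, axial_res_um)
    return sum(values) / len(values)


# Hypothetical boundaries for a 4-column B-scan at an assumed 4 um per pixel.
ilm = [100, 102, 101, 99]
rpe = [180, 190, 185, 179]
print(retinal_thickness_um(ilm, rpe, 4))
print(mean_thickness_um(ilm, rpe, 4))
```

In practice the per-column values would typically be aggregated over a region around the fovea rather than over a whole B-scan; that choice is left open here.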
Figure 4. Example OCT image illustrating retinal layers and retinal thickness measurement between the ILM and RPE boundaries.
Recent studies have also explored multimodal imaging approaches that combine fundus photography, OCT, and OCT angiography (OCTA) to provide complementary structural and vascular information for comprehensive DME assessment.
-
-
Need for Automated DME Analysis
The rapid growth of the diabetic population has substantially increased the demand for retinal screening services. Manual interpretation of retinal images requires specialized expertise and can be both time-consuming and resource-intensive. Additionally, variations in clinical experience may introduce inconsistencies in diagnosis and grading.
Automated image analysis systems powered by artificial intelligence offer a promising solution to these challenges. By leveraging advanced deep learning techniques, these systems can automatically detect pathological features, segment retinal lesions, grade disease severity, and assist clinicians in making informed treatment decisions.
A typical automated DME analysis framework consists of image acquisition, preprocessing, feature extraction, deep learning-based analysis, and clinical decision support.
Figure 5. General workflow of an automated DME analysis system showing image acquisition, preprocessing, deep learning analysis, and clinical decision support.
The growing success of deep learning models in retinal image analysis has led to significant improvements in the accuracy and reliability of automated DME diagnosis. Consequently, deep learning has become a major research focus in the development of next-generation ophthalmic screening systems.
-
Quantitative Measures Used in DME Assessment
The assessment of Diabetic Macular Edema (DME) relies on both clinical observations and quantitative measurements obtained from retinal imaging modalities. Various mathematical metrics are used to evaluate retinal thickness, disease severity, lesion segmentation performance, and classification accuracy. These quantitative measures provide objective criteria for disease diagnosis, progression monitoring, and evaluation of automated computer-aided diagnostic systems.
-
Retinal Thickness Measurement
Retinal thickness is one of the most important biomarkers used in Optical Coherence Tomography (OCT)-based DME assessment. Increased retinal thickness is generally associated with fluid accumulation and macular swelling.
Equation (1): Retinal Thickness
$$RT = ILM - RPE$$
where:
- RT represents retinal thickness.
- ILM denotes the Internal Limiting Membrane.
- RPE denotes the Retinal Pigment Epithelium.
The retinal thickness is measured as the distance between the inner and outer retinal boundaries identified in OCT scans.
Figure 6. OCT-based retinal thickness measurement between the Internal Limiting Membrane (ILM) and Retinal Pigment Epithelium (RPE).
-
-
Accuracy
Accuracy is widely used to evaluate the overall performance of DME classification models.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where:
- TP = True Positives
- TN = True Negatives
- FP = False Positives
- FN = False Negatives
Accuracy indicates the proportion of correctly classified retinal images among all examined samples.
-
-
Sensitivity
Sensitivity, also known as Recall or True Positive Rate, measures the ability of a model to correctly identify DME cases.
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
A higher sensitivity value indicates that fewer diseased cases are missed, which is particularly important in medical screening applications.
-
Specificity
Specificity measures the ability of a model to correctly identify healthy or non-DME cases.
$$\text{Specificity} = \frac{TN}{TN + FP}$$
High specificity reduces the number of false alarms and unnecessary clinical referrals.
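The three classification measures above can be illustrated with a short sketch that derives accuracy, sensitivity, and specificity from the four confusion-matrix counts. The function name and the example counts are hypothetical, and returning 0.0 for an empty denominator is a convention chosen here for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        # (TP + TN) / (TP + TN + FP + FN)
        "accuracy": (tp + tn) / total if total else 0.0,
        # TP / (TP + FN): share of DME cases that were detected
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # TN / (TN + FP): share of non-DME cases that were correctly cleared
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical screening result: 100 DME eyes and 100 healthy eyes.
metrics = classification_metrics(tp=80, tn=90, fp=10, fn=20)
print(metrics)
```

With these example counts the model misses 20 of 100 diseased eyes, which is why sensitivity is usually reported alongside accuracy in screening studies.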
-
Dice Similarity Coefficient
The Dice Similarity Coefficient (DSC) is one of the most frequently used metrics for evaluating lesion segmentation performance in retinal images.
$$DSC = \frac{2\,|A \cap B|}{|A| + |B|}$$
where:
- A represents the ground truth segmentation.
- B represents the predicted segmentation.
The Dice coefficient ranges from 0 to 1, with values closer to 1 indicating better overlap between the predicted and actual lesion regions.
Figure 7. Illustration of Dice Similarity Coefficient showing overlap between ground truth and predicted lesion regions.
-
-
Intersection over Union (IoU)
Intersection over Union (IoU), also known as the Jaccard Index, is another widely used segmentation evaluation metric.
$$IoU = \frac{|A \cap B|}{|A \cup B|}$$
where:
- A denotes the ground truth lesion area.
- B denotes the predicted lesion area.
Higher IoU values indicate more accurate lesion localization and segmentation performance.
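To make the two overlap measures concrete, the sketch below computes the Dice coefficient and IoU for a pair of small binary masks represented as nested lists of 0 and 1. The function name and the toy masks are hypothetical, and scoring two empty masks as a perfect match is an assumed convention rather than something the metrics themselves define.

```python
def overlap_scores(truth, pred):
    """Return (dice, iou) for two equally sized binary masks (2D lists of 0/1)."""
    inter = truth_area = pred_area = 0
    for row_t, row_p in zip(truth, pred):
        for t, p in zip(row_t, row_p):
            inter += t & p        # pixel counted only if set in both masks
            truth_area += t
            pred_area += p
    union = truth_area + pred_area - inter
    # Assumed convention: two empty masks count as perfect agreement.
    dice = 2 * inter / (truth_area + pred_area) if (truth_area + pred_area) else 1.0
    iou = inter / union if union else 1.0
    return dice, iou


# Toy 2x3 masks: 3 lesion pixels each, 2 of them shared.
truth = [[1, 1, 0],
         [0, 1, 0]]
pred = [[1, 0, 0],
        [0, 1, 1]]
print(overlap_scores(truth, pred))
```

For the same pair of masks the Dice value is never lower than the IoU value, so the two numbers should not be compared directly across studies that report different metrics.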
-
-
-
-
EVOLUTION OF AUTOMATED DME ANALYSIS: FROM TRADITIONAL MACHINE LEARNING TO DEEP LEARNING
The increasing prevalence of diabetes has led to growing demand for automated retinal screening systems for Diabetic Macular Edema (DME). Over the past two decades, DME analysis has evolved from traditional image processing techniques to advanced deep learning-based systems capable of near-expert performance. This progression highlights the improvements in accuracy, robustness, and clinical applicability over time.
-
Traditional Image Processing Approaches
Early DME analysis methods relied on classical image processing techniques to enhance retinal images and detect lesions such as hard exudates and hemorrhages. Common operations included preprocessing steps like contrast enhancement, noise reduction, histogram equalization, and color space conversion, followed by rule-based lesion detection using thresholding, edge detection, morphological operations, and clustering.
These methods were simple and computationally efficient but heavily dependent on handcrafted rules and image quality, resulting in poor generalization across diverse clinical datasets.
-
Machine Learning-Based DME Detection
With the availability of retinal datasets, machine learning approaches were introduced. These systems followed a pipeline of preprocessing, handcrafted feature extraction, feature selection, and classification. Features such as texture descriptors, shape features, wavelet transforms, LBP, GLCM, and HOG were widely used.
Classifiers including SVM, KNN, Random Forest, Decision Trees, and ANN improved performance compared to rule-based methods. However, their effectiveness remained limited by the quality of manually engineered features.
-
Emergence of Deep Learning
Deep learning significantly transformed retinal image analysis by enabling automatic feature learning from raw data. This shift was supported by large datasets, GPU computing, and improved neural network designs. Among deep learning models, Convolutional Neural Networks (CNNs) became the most widely used due to their ability to learn hierarchical representations from retinal images.
A typical CNN learns progressively complex features ranging from edges and textures to disease-specific patterns through convolutional, pooling, activation, and fully connected layers.
-
CNN-Based DME Detection Systems
Between 2015 and 2020, CNN-based models became the dominant approach for DME detection. Architectures such as AlexNet, VGGNet, GoogLeNet, ResNet, DenseNet, and Inception were widely adopted using both custom designs and transfer learning.
Transfer learning was especially effective for medical imaging due to limited datasets, allowing pre-trained networks to be fine-tuned for retinal disease classification with improved accuracy.
-
Deep Learning for DME Segmentation
Segmentation models play a key role in identifying lesion regions such as fluid accumulation and exudates at the pixel level. Encoder-decoder architectures, particularly U-Net and its variants (U-Net++, Attention U-Net, and Residual U-Net), together with related models such as SegNet and DeepLab, have shown strong performance in retinal lesion localization and quantitative assessment.
-
Attention Mechanisms and Hybrid Networks
To improve focus on clinically relevant regions, attention mechanisms were introduced into CNN-based models. These mechanisms enhance feature weighting for key lesion areas such as exudates, macular edema, and abnormal retinal structures. Hybrid CNN-attention models have improved both accuracy and interpretability.
-
Transformer-Based DME Analysis
Recently, transformer architectures have been applied to retinal imaging. Vision Transformer (ViT), Swin Transformer, TransUNet, and SegFormer use self-attention to capture global contextual relationships across images, which several studies report improves classification and segmentation performance over CNN-only models.
-
Emerging Trends
Current research is moving toward advanced paradigms such as explainable AI, self-supervised learning, federated learning, vision-language models, foundation models, and multimodal learning using fundus and OCT images. These approaches aim to improve interpretability, reduce annotation requirements, and enable scalable clinical deployment.
-
-
PUBLICLY AVAILABLE DATASETS AND BENCHMARK RESOURCES FOR DME ANALYSIS
The performance of deep learning models for Diabetic Macular Edema (DME) analysis strongly depends on the availability of high-quality annotated retinal imaging datasets. These datasets form the basis for training, validation, and benchmarking of automated diagnostic systems. Over time, several public datasets have been developed for diabetic retinopathy (DR), DME detection, lesion segmentation, and disease grading. However, variations in imaging protocols, annotations, and patient populations often affect model generalization, making dataset selection an important factor in research evaluation.
-
Fundus Image Datasets
Color fundus photography is widely used for large-scale DME screening due to its cost-effectiveness. Several benchmark datasets support detection and grading tasks.
IDRiD-Dataset
The Indian Diabetic Retinopathy Image Dataset (IDRiD) contains high-resolution fundus images with pixel-level annotations for lesions such as microaneurysms, hemorrhages, hard exudates, and soft exudates. It is widely used for segmentation and grading tasks. However, it has a relatively small sample size and class imbalance.
MESSIDOR-Dataset
MESSIDOR is one of the earliest and most widely used datasets for diabetic eye disease research. It contains fundus images with multiple severity levels and is commonly used for classification and DR/DME screening. However, it lacks detailed pixel-level annotations, limiting its use for segmentation.
DIARETDB1-Dataset
DIARETDB1 is designed for lesion detection and includes expert annotations for diabetic retinopathy abnormalities. It is frequently used for evaluating early detection methods but has limited dataset size and fewer advanced-stage cases.
e-Ophtha-Dataset
The e-Ophtha dataset provides lesion-specific annotations, particularly for microaneurysms and exudates, making it useful for lesion detection and segmentation tasks in DME research.
DDR-Dataset
The DeepDR (DDR) dataset is a large-scale dataset containing fundus images across multiple severity levels and diverse populations. It is widely used for training deep learning models and evaluating generalization performance.
-
OCT-Based Datasets
Optical Coherence Tomography (OCT) provides cross-sectional retinal imaging and is considered the gold standard for DME diagnosis due to its ability to visualize retinal thickness and fluid accumulation.
OCT datasets offer high-resolution structural information and enable accurate disease severity assessment. However, their usage is limited by data scarcity, annotation complexity, and high storage requirements.
-
Multimodal Retinal Imaging Datasets
Recent research has focused on multimodal datasets combining fundus images, OCT scans, OCT angiography, and clinical metadata. These datasets improve diagnostic accuracy and disease characterization by integrating complementary information.
Despite their advantages, multimodal datasets are difficult to construct due to high cost, complex annotation requirements, and limited availability.
-
Dataset Challenges in DME Research
Despite the availability of several benchmark datasets, key challenges remain:
- Class imbalance: Fewer DME-positive samples compared to normal cases lead to biased learning.
- Limited annotations: Pixel-level labeling requires expert ophthalmologists and is time-consuming.
- Dataset heterogeneity: Differences in imaging devices, illumination, and populations reduce model generalization.
- Small dataset size: Retinal datasets are relatively small, increasing overfitting risk in deep models.
- Lack of standard protocols: Inconsistent evaluation methods make cross-study comparison difficult.
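One common way to counter the class imbalance listed above is to weight each class inversely to its frequency when computing the training loss. The sketch below shows this heuristic; it is offered as an illustration only and is not a method prescribed by the surveyed studies. The function name and the label counts are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight for each class: n_samples / (n_classes * class_count).

    Rare classes receive weights above 1, frequent classes below 1,
    so that each class contributes comparably to a weighted loss.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label distribution: 8 normal images, 2 DME-positive images.
labels = ["normal"] * 8 + ["dme"] * 2
print(inverse_frequency_weights(labels))
```

Resampling and data augmentation are alternative remedies; which one works best depends on the dataset and is not settled by this sketch.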
-
-
Comparative Analysis of Publicly Available Datasets
To facilitate dataset selection for future research, Table 1 summarizes the most widely used datasets in DME analysis.
| Dataset | Imaging Modality | Number of Images | Annotation Type | Application |
|---|---|---|---|---|
| IDRiD | Fundus | 58 | Lesion Masks | Detection, Segmentation |
| MESSIDOR | Fundus | 862 | Disease Grades | Classification |
| DIARETDB1 | Fundus | 421 | Lesion Labels | Detection |
| e-Ophtha | Fundus | 150 | Exudate Annotations | Segmentation |
| DDR | Fundus | 52 | Severity Grades | Classification |
| OCT Datasets | OCT | 156 | Layer Annotations | DME Assessment |
Table 1. Comparative summary of publicly available retinal imaging datasets used for DME detection, segmentation, and grading.
The availability of publicly accessible retinal imaging datasets has played a crucial role in advancing automated DME analysis. These datasets have enabled researchers to develop increasingly sophisticated machine learning and deep learning models while providing standardized benchmarks for performance evaluation. Nevertheless, challenges related to data quality, annotation availability, and dataset diversity continue to motivate the development of larger, more representative datasets for future research.
-
-
DEEP LEARNING METHODOLOGIES FOR DME DETECTION
The introduction of deep learning has significantly improved the automated detection of Diabetic Macular Edema (DME). Unlike traditional machine learning approaches that rely on handcrafted features, deep learning models can automatically learn complex and discriminative representations directly from retinal images. This capability has enabled more accurate identification of pathological features associated with DME, leading to substantial improvements in diagnostic performance.
-
Convolutional Neural Networks
Convolutional Neural Networks (CNNs) were among the first deep learning architectures successfully applied to retinal image analysis. CNNs learn hierarchical image features through convolutional operations, allowing them to detect retinal abnormalities such as hard exudates and macular changes associated with DME.
The convolution operation is defined as:
$$S(i, j) = (I * K)(i, j)$$
where $I$ represents the input image, $K$ denotes the convolution kernel, and $S(i, j)$ is the resulting feature map.
Popular CNN architectures used in DME studies include AlexNet, VGGNet, ResNet, DenseNet, and Inception networks.
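The convolution operation defined above can be illustrated with a small, dependency-free sketch that slides a kernel over an image and produces a feature map. The function name and the toy inputs are hypothetical. The sketch flips the kernel, which is what the mathematical definition of convolution requires; most deep learning libraries actually compute cross-correlation (no flip), which makes no practical difference when the kernel values are learned.

```python
def conv2d_valid(image, kernel):
    """True 2D convolution over the 'valid' region (no padding, stride 1).

    image, kernel: 2D lists of numbers. The kernel is flipped in both axes
    so that the result matches the mathematical definition of convolution.
    """
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    flipped = [row[::-1] for row in kernel[::-1]]
    feature_map = []
    for i in range(ih - kh + 1):
        out_row = []
        for j in range(iw - kw + 1):
            acc = 0
            for m in range(kh):
                for n in range(kw):
                    acc += image[i + m][j + n] * flipped[m][n]
            out_row.append(acc)
        feature_map.append(out_row)
    return feature_map


# Toy 3x3 "image" and a 2x2 diagonal-difference kernel.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
print(conv2d_valid(image, kernel))  # [[4, 4], [4, 4]]
```

A 3x3 input with a 2x2 kernel yields a 2x2 feature map, which shows why padding is often added in CNN layers to preserve spatial size.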
-
Transfer Learning Approaches
Due to the limited availability of annotated retinal datasets, transfer learning has become a widely adopted strategy in DME detection. In this approach, models pre-trained on large image datasets are fine-tuned using retinal images, reducing training time and improving performance.
Commonly used pre-trained models include ResNet, DenseNet, EfficientNet, and InceptionV3.
-
Attention-Based Models
Attention mechanisms enable deep learning models to focus on clinically relevant retinal regions while reducing the influence of background information. By highlighting important lesion areas, attention-based models often achieve improved detection accuracy and interpretability.
-
Transformer-Based Architectures
Recent advances in computer vision have led to the adoption of transformer-based models for retinal image analysis. Unlike CNNs, transformers utilize self-attention mechanisms to capture long-range dependencies and global contextual information across the entire image.
Popular architectures include Vision Transformer (ViT), Swin Transformer, and hybrid CNN-transformer models.
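The self-attention idea can be sketched in a few lines. The example below is a deliberately simplified illustration: it treats each image patch as a small feature vector and uses the vectors themselves as queries, keys, and values, whereas real transformer layers apply learned projection matrices and multiple heads. The function names and the toy patch vectors are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    tokens: list of equal-length feature vectors (one per image patch).
    Every output vector is a weighted mix of ALL input vectors, which is
    how global context enters each patch representation.
    """
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights = softmax(scores)
        outputs.append([sum(w * value[c] for w, value in zip(weights, tokens))
                        for c in range(dim)])
    return outputs


# Three hypothetical 2-dimensional patch embeddings.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(self_attention(patches))
```

Because every patch attends to every other patch, the cost grows quadratically with the number of patches, which is one source of the computational burden associated with transformer models.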
-
Explainable Artificial Intelligence
Although deep learning models achieve high diagnostic accuracy, their decision-making process is often difficult to interpret. Explainable Artificial Intelligence (XAI) techniques such as Grad-CAM, SHAP, and LIME help visualize the retinal regions influencing model predictions, thereby increasing clinician trust and supporting clinical adoption.
-
Summary
The evolution of deep learning has transformed automated DME detection from conventional CNN-based systems to sophisticated attention and transformer-based architectures. These advancements have improved detection accuracy, robustness, and clinical applicability. Nevertheless, challenges related to data availability, model interpretability, and cross-dataset generalization continue to drive ongoing research in this area.
| Method | Key Advantage | Limitation |
|---|---|---|
| CNN | Automatic feature extraction | Limited global context |
| Transfer Learning | Effective with small datasets | Domain dependency |
| Attention Networks | Better lesion localization | Increased complexity |
| Transformers | Global feature learning | High computational cost |
| XAI Methods | Improved interpretability | Additional processing |
Table 2. Summary of major deep learning methodologies used for DME detection.
-
-
DEEP LEARNING METHODOLOGIES FOR DME SEGMENTATION
DME segmentation aims to accurately identify and delineate retinal lesions such as hard exudates and fluid accumulation regions. Accurate segmentation provides valuable information regarding lesion location, extent, and disease progression. Deep learning architectures, particularly U-Net and its variants, have become the dominant approaches for automated retinal lesion segmentation due to their ability to perform pixel-level classification.
-
U-Net and Advanced Segmentation Models
U-Net, U-Net++, Attention U-Net, and transformer-based models such as TransUNet and Swin-UNet are widely used for DME segmentation. These architectures combine feature extraction and spatial localization to achieve accurate lesion boundary detection.
-
Segmentation Evaluation Metrics
The performance of segmentation models is commonly evaluated using overlap-based metrics.
Dice Similarity Coefficient
$$DSC = \frac{2\,|A \cap B|}{|A| + |B|}$$
where:
- A = Ground truth segmentation
- B = Predicted segmentation
A higher Dice score indicates better overlap between the predicted and actual lesion regions.
Intersection over Union (IoU)
$$IoU = \frac{|A \cap B|}{|A \cup B|}$$
where:
- A = Ground truth lesion area
- B = Predicted lesion area
IoU measures the ratio between the intersection and union of segmented regions.
Segmentation Loss Function
A commonly used loss function in DME segmentation is Dice Loss:
$$L_{Dice} = 1 - \frac{2\,|A \cap B|}{|A| + |B|}$$
Minimizing Dice Loss improves segmentation accuracy by maximizing overlap with the ground truth.
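A minimal sketch of Dice Loss is given below in its "soft" form, where the prediction is a per-pixel probability rather than a hard mask so that the loss varies smoothly with the model output. The function name is hypothetical, and the small smoothing constant `eps` is an assumption added here to avoid division by zero on empty masks; it is not part of the definition above.

```python
def dice_loss(truth, probs, eps=1e-6):
    """Soft Dice loss for flattened masks.

    truth: ground-truth pixels, each 0 or 1.
    probs: predicted lesion probabilities, each in [0, 1].
    eps:   smoothing constant (an assumption, guards against empty masks).
    """
    intersection = sum(t * p for t, p in zip(truth, probs))
    denominator = sum(truth) + sum(probs)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


# Hypothetical 4-pixel example: a confident, mostly correct prediction.
truth = [1, 0, 1, 0]
probs = [0.9, 0.1, 0.8, 0.2]
print(dice_loss(truth, probs))
```

The loss approaches 0 when the prediction matches the ground truth and approaches 1 when the two do not overlap at all.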
-
-
Summary
Recent advances in deep learning have significantly improved DME lesion segmentation. U-Net-based architectures remain the most widely adopted models, while attention and transformer-based approaches continue to enhance segmentation accuracy and robustness.
| Model | Application | Common Metrics |
|---|---|---|
| U-Net | Lesion Segmentation | Dice, IoU |
| U-Net++ | Exudate Segmentation | Dice, IoU |
| Attention U-Net | Lesion Localization | Dice, IoU |
| TransUNet | DME Segmentation | Dice, IoU |
| Swin-UNet | Retinal Segmentation | Dice, IoU |
Table 3. Summary of deep learning segmentation methods used in DME analysis.
-
-
DEEP LEARNING METHODOLOGIES FOR DME GRADING
DME grading focuses on classifying the severity of the disease into different categories, such as mild, moderate, and severe. Accurate grading is essential for clinical decision-making, treatment planning, and patient prioritization. Unlike detection, which determines the presence of DME, grading assesses the extent of disease progression.
Deep learning models, particularly CNNs, attention-based networks, and transformer architectures, have demonstrated promising performance in automatically learning disease-specific patterns from retinal images and assigning appropriate severity grades.
-
Classification Metrics
The performance of DME grading models is commonly evaluated using classification metrics.
Accuracy
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Accuracy measures the proportion of correctly classified samples.
Precision
$$\text{Precision} = \frac{TP}{TP + FP}$$
Precision indicates the reliability of positive predictions.
Recall (Sensitivity)
$$\text{Recall} = \frac{TP}{TP + FN}$$
Recall measures the ability of the model to correctly identify DME cases.
F1-Score
$$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
F1-score provides a balanced assessment of precision and recall.
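Because grading is a multi-class task, these metrics are usually computed per severity grade by treating each grade in turn as the positive class. The sketch below illustrates this one-vs-rest calculation; the function name, the grade labels, and the example predictions are hypothetical, and returning 0.0 when a denominator is empty is an assumed convention.

```python
def per_class_prf(y_true, y_pred, labels):
    """Precision, recall and F1 for each class, computed one-vs-rest."""
    report = {}
    for cls in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        report[cls] = (precision, recall, f1)
    return report


# Hypothetical grades for six eyes.
y_true = ["mild", "mild", "moderate", "severe", "severe", "severe"]
y_pred = ["mild", "moderate", "moderate", "severe", "severe", "mild"]
for grade, scores in per_class_prf(y_true, y_pred, ["mild", "moderate", "severe"]).items():
    print(grade, scores)
```

Averaging the per-class F1 values gives a macro-averaged score, which weights rare grades equally with common ones.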
-
Loss Function for DME Grading
For multi-class DME grading, Cross-Entropy Loss is commonly employed:
$$L = -\sum_{i=1}^{C} y_i \log(\hat{y}_i)$$
where:
- $C$ is the number of classes,
- $y_i$ is the true label,
- $\hat{y}_i$ is the predicted probability.
The objective is to minimize the classification error during model training.
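The following sketch shows how Cross-Entropy Loss might be evaluated for a single image with three severity grades. It assumes a one-hot true label and converts raw model scores into probabilities with a softmax; the function names, the example scores, and the clipping constant `eps` are illustrative assumptions.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_true, y_prob, eps=1e-12):
    """-sum(y_i * log(p_i)); eps clips probabilities to avoid log(0)."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, y_prob))


# Hypothetical scores for (mild, moderate, severe); the true grade is severe.
probs = softmax([0.5, 1.0, 3.0])
print(cross_entropy([0, 0, 1], probs))
```

With a one-hot label only the term for the true class contributes, so the loss is simply the negative log of the probability assigned to the correct grade.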
-
-
Summary
Deep learning-based grading systems have improved the consistency and accuracy of DME severity assessment. Modern architectures can effectively distinguish between different disease stages and support clinicians in making timely treatment decisions.
| Method | Application | Common Metrics |
|---|---|---|
| CNN | Severity Classification | Accuracy, F1 |
| Transfer Learning | Multi-class Grading | Accuracy, Recall |
| Attention Networks | Severity Assessment | Precision, F1 |
| Transformers | Advanced Grading | Accuracy, AUC |
The integration of deep learning into DME grading has enhanced automated disease assessment and laid the foundation for intelligent clinical decision-support systems.
-
-
COMPARATIVE ANALYSIS AND PERFORMANCE EVALUATION OF DEEP LEARNING MODELS
The rapid advancement of deep learning has led to the development of numerous models for DME detection, segmentation, and grading. While most approaches report high performance, direct comparison remains challenging due to differences in datasets, evaluation protocols, and model architectures. Therefore, a comprehensive comparative analysis is essential to identify the strengths and limitations of existing methodologies.
-
Performance Metrics
Different tasks require different evaluation measures. Detection and grading models are generally evaluated using classification metrics, whereas segmentation models rely on overlap-based measures.
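Accuracy
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
F1-Score
$$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
Dice Coefficient
$$DSC = \frac{2\,|A \cap B|}{|A| + |B|}$$
Intersection over Union (IoU)
$$IoU = \frac{|A \cap B|}{|A \cup B|}$$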
-
Comparative Discussion
CNN-based models provide strong feature extraction capabilities and have achieved high performance in DME detection tasks. Attention-based architectures further improve lesion localization by focusing on clinically relevant retinal regions. More recently, transformer-based models have demonstrated superior capability in capturing global image context, resulting in improved detection and grading performance.
For segmentation tasks, U-Net and its variants remain the most widely adopted architectures due to their simplicity and effectiveness. Transformer-based segmentation models have shown promising results but often require larger datasets and greater computational resources.
| Method | Strengths | Limitations |
|---|---|---|
| CNN | Efficient feature extraction | Limited global context |
| Transfer Learning | Works well with small datasets | Domain dependency |
| Attention Models | Better lesion localization | Increased complexity |
| U-Net Variants | Accurate segmentation | Limited contextual awareness |
| Transformers | Global feature representation | High computational cost |
Table 5. Comparative analysis of deep learning methodologies for DME analysis.
-
Summary
Overall, deep learning has significantly improved the accuracy and reliability of automated DME analysis. While CNN-based approaches continue to serve as strong baselines, attention and transformer-based architectures are increasingly becoming the preferred choice for advanced retinal image analysis. The selection of an appropriate model ultimately depends on the available dataset, computational resources, and clinical application requirements.
-
-
CHALLENGES AND FUTURE RESEARCH DIRECTIONS
Despite the remarkable progress achieved by deep learning in DME detection, segmentation, and grading, several challenges continue to limit its widespread clinical adoption. Addressing these issues is essential for developing reliable and scalable diagnostic systems.
-
Current Challenges
Limited Annotated Data
Deep learning models require large amounts of labeled data, but obtaining expert annotations for retinal images is expensive and time-consuming.
Class Imbalance
Many datasets contain significantly fewer DME-positive samples than normal images, which may lead to biased model performance.
Lack of Explainability
Most deep learning models operate as black boxes, making it difficult for clinicians to understand the reasoning behind predictions.
Generalization Issues
Models trained on one dataset often experience performance degradation when applied to images acquired from different devices or populations.
Computational Complexity
Advanced architectures such as transformers require substantial computational resources, limiting their deployment in resource-constrained healthcare settings.
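One common way to mitigate the class imbalance noted above is to weight the training loss inversely to class frequency. The following minimal sketch assumes binary labels (1 = DME, 0 = normal) and uses the widely used "balanced" heuristic w_c = N / (K · n_c), where N is the number of samples, K the number of classes, and n_c the count of class c. The function names are hypothetical, and this is one possible remedy rather than the approach of any specific surveyed work.

```python
from collections import Counter
import math


def balanced_class_weights(labels):
    """Inverse-frequency class weights: w_c = N / (K * n_c)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n_c) for c, n_c in counts.items()}


def weighted_cross_entropy(probs, labels, weights, eps=1e-12):
    """Weighted binary cross-entropy, normalized by the sum of sample weights.

    probs  : predicted probability of the positive class (label 1) per sample
    labels : ground-truth labels (0 or 1)
    weights: mapping from class label to its weight
    """
    loss_sum = 0.0
    weight_sum = 0.0
    for p, y in zip(probs, labels):
        p_true = p if y == 1 else 1.0 - p  # probability assigned to the true class
        w = weights[y]
        loss_sum += -w * math.log(max(p_true, eps))  # eps guards against log(0)
        weight_sum += w
    return loss_sum / weight_sum
```

With eight normal and two DME-positive images, for example, the heuristic assigns a weight of 0.625 to the normal class and 2.5 to the DME class, so each minority-class error contributes four times as much to the loss.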
-
Future Research Directions
Explainable Artificial Intelligence (XAI)
Techniques such as Grad-CAM, SHAP, and LIME can improve model transparency and increase clinician trust in AI-assisted diagnosis.
Federated Learning
Federated learning enables collaborative model training across multiple healthcare institutions while preserving patient privacy.
Self-Supervised Learning
Self-supervised approaches can reduce the dependence on large annotated datasets by learning meaningful representations from unlabeled retinal images.
Multimodal Learning
Combining fundus photographs, OCT images, and clinical information can provide a more comprehensive understanding of disease progression.
Foundation Models
Large-scale vision foundation models have the potential to improve generalization and support multiple retinal analysis tasks within a single framework.
Lightweight Models
Developing efficient models for mobile and edge devices can facilitate real-time DME screening in remote and underserved regions.
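To make the federated learning direction listed above more concrete, the sketch below shows the aggregation step of federated averaging (FedAvg) as described by McMahan et al.: each institution trains locally, and a server combines the resulting parameters weighted by local dataset size, so no retinal images leave the institution. The function name `federated_average` and the flat-list representation of model parameters are simplifying assumptions made for illustration. Local training, communication, and privacy mechanisms are omitted.

```python
def federated_average(client_params, client_sizes):
    """Aggregate client model parameters, weighting each client by its sample count.

    client_params: list of parameter vectors (one flat list of floats per client)
    client_sizes : number of local training samples held by each client
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    n_params = len(client_params[0])
    if any(len(p) != n_params for p in client_params):
        raise ValueError("all clients must share the same parameter shape")

    averaged = [0.0] * n_params
    for params, size in zip(client_params, client_sizes):
        share = size / total  # this client's contribution to the global model
        for i, value in enumerate(params):
            averaged[i] += share * value
    return averaged
```

In this formulation a hospital holding three times as many images as another contributes three times as much to the global model, which is the weighting used in the original FedAvg formulation. Other weighting schemes are possible.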
-
Summary
Future research should focus on developing interpretable, data-efficient, and clinically deployable AI systems. The integration of explainable AI, federated learning, multimodal imaging, and foundation models is expected to drive the next generation of intelligent DME diagnostic solutions and improve accessibility to retinal healthcare worldwide.
-
-
CONCLUSION
Diabetic Macular Edema remains one of the leading causes of vision impairment among individuals with diabetes, highlighting the need for timely and accurate diagnosis. This review surveyed the evolution of automated DME analysis from traditional image processing techniques to modern deep learning methodologies for detection, segmentation, and grading.
Deep learning architectures, particularly CNNs, attention-based networks, U-Net variants, and transformer models, have significantly improved diagnostic performance and automated retinal image analysis. Publicly available datasets and advances in computational resources have further accelerated research in this field.
Despite these achievements, challenges related to data availability, model interpretability, generalization, and computational requirements remain. Emerging technologies such as explainable AI, federated learning, self-supervised learning, multimodal systems, and foundation models offer promising directions for overcoming these limitations.
Overall, deep learning continues to transform DME diagnosis and management, providing new opportunities for early screening, precise disease assessment, and improved clinical decision support. Continued research in these areas is expected to contribute toward more accurate, reliable, and accessible ophthalmic healthcare solutions.
REFERENCES
-
Early Treatment Diabetic Retinopathy Study Research Group, Photocoagulation for diabetic macular edema: Early Treatment Diabetic Retinopathy Study report no. 1, Archives of Ophthalmology, vol. 103, no. 12, pp. 1796–1806, 1985.
-
Early Treatment Diabetic Retinopathy Study Research Group, Early photocoagulation for diabetic retinopathy: ETDRS report no. 9, Ophthalmology, vol. 98, no. 5, pp. 766–785, 1991.
-
R. Klein, B. E. K. Klein, S. E. Moss, M. D. Davis, and D. L. DeMets, The Wisconsin epidemiologic study of diabetic retinopathy, Archives of Ophthalmology, vol. 102, no. 4, pp. 520–526, 1984.
-
J. W. Yau et al., Global prevalence and major risk factors of diabetic retinopathy, Diabetes Care, vol. 35, no. 3, pp. 556–564, 2012.
-
T. Y. Wong and C. Sabanayagam, Strategies to tackle the global burden of diabetic retinopathy: From epidemiology to artificial intelligence, Ophthalmologica, vol. 243, no. 1, pp. 9–20, 2020.
-
A. Sopharak, B. Uyyanonvara, and S. Barman, Automatic exudate detection from non-dilated diabetic retinopathy retinal images using mathematical morphology methods, Computerized Medical Imaging and Graphics, vol. 32, no. 8, pp. 720–727, 2008.
-
M. Niemeijer, B. van Ginneken, J. Staal, M. S. A. Suttorp-Schulten, and M. D. Abramoff, Automatic detection of red lesions in digital color fundus photographs, IEEE Transactions on Medical Imaging, vol. 24, no. 5, pp. 584–592, 2005.
-
A. D. Fleming et al., Automated microaneurysm detection using local contrast normalization and local vessel detection, IEEE Transactions on Medical Imaging, vol. 25, no. 9, pp. 1223–1232, 2006.
-
D. Sánchez, A. M. López, J. Poza, and R. Hornero, Retinal image analysis based on texture descriptors for diabetic retinopathy screening, Medical Engineering & Physics, vol. 31, no. 6, pp. 745–752, 2009.
-
M. D. Abramoff, M. K. Garvin, and M. Sonka, Retinal imaging and image analysis, IEEE Reviews in Biomedical Engineering, vol. 3, pp. 169–208, 2010.
-
A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, in Proc. Advances in Neural Information Processing Systems (NIPS), 2012, pp. 1097–1105.
-
K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, in Proc. International Conference on Learning Representations (ICLR), 2015.
-
C. Szegedy et al., Rethinking the inception architecture for computer vision, in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2818–2826.
-
K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
-
G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, Densely connected convolutional networks, in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 4700–4708.
-
V. Gulshan et al., Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs, JAMA, vol. 316, no. 22, pp. 2402–2410, 2016.
-
R. Gargeya and T. Leng, Automated identification of diabetic retinopathy using deep learning, Ophthalmology, vol. 124, no. 7, pp. 962–969, 2017.
-
H. Pratt, F. Coenen, D. M. Broadbent, S. P. Harding, and Y. Zheng, Convolutional neural networks for diabetic retinopathy, Procedia Computer Science, vol. 90, pp. 200–205, 2016.
-
O. Ronneberger, P. Fischer, and T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in Proc. International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015, pp. 234–241.
-
Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. Liang, UNet++: A nested U-Net architecture for medical image segmentation, in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, 2018, pp. 3–11.
-
O. Oktay et al., Attention U-Net: Learning where to look for the pancreas, arXiv preprint arXiv:1804.03999, 2018.
-
V. Badrinarayanan, A. Kendall, and R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481–2495, 2017.
-
L. C. Chen et al., Encoder-decoder with atrous separable convolution for semantic image segmentation, in Proc. European Conference on Computer Vision (ECCV), 2018, pp. 801–818.
-
J. Hu, L. Shen, and G. Sun, Squeeze-and-excitation networks, in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7132–7141.
-
S. Woo, J. Park, J. Y. Lee, and I. S. Kweon, CBAM: Convolutional block attention module, in Proc. European Conference on Computer Vision (ECCV), 2018, pp. 3–19.
-
X. Wang, R. Girshick, A. Gupta, and K. He, Non-local neural networks, in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7794–7803.
-
A. Dosovitskiy et al., An image is worth 16×16 words: Transformers for image recognition at scale, in Proc. International Conference on Learning Representations (ICLR), 2021.
-
Z. Liu et al., Swin transformer: Hierarchical vision transformer using shifted windows, in Proc. IEEE International Conference on Computer Vision (ICCV), 2021, pp. 10012–10022.
-
J. Chen et al., TransUNet: Transformers make strong encoders for medical image segmentation, arXiv preprint arXiv:2102.04306, 2021.
-
E. Xie et al., SegFormer: Simple and efficient design for semantic segmentation with transformers, Advances in Neural Information Processing Systems, vol. 34, pp. 12077–12090, 2021.
-
R. R. Selvaraju et al., Grad-CAM: Visual explanations from deep networks via gradient-based localization, in Proc. IEEE International Conference on Computer Vision (ICCV), 2017, pp. 618–626.
-
M. T. Ribeiro, S. Singh, and C. Guestrin, Why should I trust you? Explaining the predictions of any classifier, in Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1135–1144.
-
S. M. Lundberg and S. I. Lee, A unified approach to interpreting model predictions, in Proc. Advances in Neural Information Processing Systems (NIPS), 2017, pp. 4765–4774.
-
B. McMahan et al., Communication-efficient learning of deep networks from decentralized data, in Proc. Artificial Intelligence and Statistics (AISTATS), 2017, pp. 1273–1282.
-
T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, A simple framework for contrastive learning of visual representations, in Proc. International Conference on Machine Learning (ICML), 2020, pp. 1597–1607.
-
K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, Momentum contrast for unsupervised visual representation learning, in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 9729–9738.
-
J. Grill et al., Bootstrap your own latent: A new approach to self-supervised learning, Advances in Neural Information Processing Systems, vol. 33, pp. 21271–21284, 2020.
-
A. Radford et al., Learning transferable visual models from natural language supervision, in Proc. International Conference on Machine Learning (ICML), 2021.
-
A. Kirillov et al., Segment Anything, in Proc. IEEE International Conference on Computer Vision (ICCV), 2023, pp. 4015–4026.
-
M. D. Abramoff et al., Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices, npj Digital Medicine, vol. 1, no. 39, pp. 1–8, 2018.
