Diabetic Retinopathy Detection using Machine Learning

Abstract: Diabetic retinopathy is a disease caused by uncontrolled chronic diabetes, and it can cause complete blindness if not treated in time. Early diagnosis and treatment of diabetic retinopathy are therefore essential to prevent its severe effects. Manual detection of diabetic retinopathy by an ophthalmologist takes a great deal of time, and patients suffer during this period. An automated system can help detect diabetic retinopathy quickly, so that follow-up treatment can begin promptly and further damage to the eye can be avoided. This study proposes a machine learning method that extracts three features (exudates, hemorrhages and microaneurysms) and classifies images using a hybrid classifier that combines support vector machine, k-nearest neighbour, random forest, logistic regression and multilayer perceptron network. In the experiments, the highest accuracy obtained was 82%. The hybrid approach produced a precision score of 0.8119, a recall score of 0.8116 and an F-measure score of 0.8028.


I. INTRODUCTION
Diabetic retinopathy is a complication of diabetes in which high blood glucose damages the eye. It can cause vision loss and, in severe cases, complete blindness. Early symptoms of diabetic retinopathy include blurred vision, darker areas of vision, eye floaters and difficulty in perceiving colours. Detecting diabetic retinopathy at an early stage is extremely important to prevent complete blindness. Of an estimated 285 million people with diabetes mellitus worldwide, approximately one third have signs of diabetic retinopathy. Globally, the number of people affected by diabetic retinopathy is projected to increase from 126.6 million in 2010 to 191.0 million by 2030.
Non-proliferative diabetic retinopathy (NPDR) is an early stage of the disease in which tiny red spots occur in the retina. These tiny spots may be hemorrhages, while abnormal pouching of the blood vessels forms microaneurysms. The lining of these blood vessels can become damaged enough to allow leakage of fluid and fatty material called exudates.
Available physical tests for diabetic retinopathy include pupil dilation, the visual acuity test and optical coherence tomography, but they are time consuming and patients suffer considerably. This paper focuses on automated computer-aided detection of diabetic retinopathy using a hybrid machine learning model that extracts the features hemorrhages, microaneurysms and exudates. The classifier used in the proposed model is a hybrid combination of SVM and KNN.

II. LITERATURE REVIEW
[Farrikh Alzami, 2019] described a system for diabetic retinopathy grade classification based on fractal analysis and random forest using the MESSIDOR dataset. Their system segmented the images and then computed the fractal dimensions as features. It failed to distinguish mild from severe diabetic retinopathy.
[Qomariah, 2019] proposed an automated system for classifying diabetic retinopathy and normal retinal images using a convolutional neural network (CNN) and a support vector machine (SVM). The features comprised exudates, hemorrhages and microaneurysms. The author partitioned the proposed system into two parts: the first performed feature extraction based on neural networks and the second performed classification using SVM.
[Kumar, 2018] proposed a system for improved diabetic retinopathy detection that extracts the area and number of microaneurysms from colour fundus images of the DIARETDB1 dataset. The fundus images were pre-processed using green channel extraction, histogram equalization and morphological processing. Principal component analysis (PCA), contrast limited adaptive histogram equalization (CLAHE), morphological processing and averaging filtering were applied for microaneurysm detection, and classification was done by a linear SVM.
[Mohamed Chetoui, 2018]
[Sangwan, 2015] described a system that identifies different stages of diabetic retinopathy based on blood vessels, hemorrhages and exudates. The features are extracted using image pre-processing and fed into a neural network. SVM-based training is then applied to the data, and the images are classified into three categories: mild and moderate non-proliferative diabetic retinopathy and proliferative diabetic retinopathy. However, the system could not give the expected results when the exudate area in a fundus image exceeded the size of the optic disc.
[Morium Akter, 2014] described a system for morphology-based exudate detection from colour fundus images. The model uses grayscale conversion, histogram equalisation, thresholding, erosion, dilation, a logical AND operation and watershed transformation. The system outputs the extent of the areas affected by exudates in diabetic retinopathy.
[Handayani, 2013] proposed a system for the classification of non-proliferative diabetic retinopathy using soft margin SVM. Hard exudates in the retinal fundus images are used to classify the severity level of non-proliferative diabetic retinopathy, and mathematical morphology is applied to segment them. However, the system does not include microaneurysms and hemorrhages as features.
[Saravanan, 2013] proposed an automated system for red lesion diabetic retinopathy detection based on microaneurysms using a GMM classifier. The feature is extracted using mathematical morphology, a filter-based method and a supervised learning method. The severity level of candidate microaneurysms is graded into four stages.
[Venkatalakshmi, 2011] described an automated system for hard exudate detection using sharp edges and colour highlights as two features. The detection process involved colour-based classification, sharp edge detection and extraction of the optic disc. Training and testing were done using the DRIVE and DIARETDB0 datasets. The system used MATLAB 7.8 for the graphical user interface (GUI).

A. Dataset
This study used the publicly available Kaggle dataset for diabetic retinopathy detection. The database was created from images taken from publicly available retinopathy detection datasets. The Kaggle dataset contains 1000 images with diabetic retinopathy and 1000 images without diabetic retinopathy. From these we chose 122 images with diabetic retinopathy and 122 normal images. The chosen abnormal images contain exudates, hemorrhages and microaneurysms.
The presence of diabetic retinopathy is judged from the appearance, number, spread, size and area of exudates, microaneurysms and hemorrhages, as shown in Fig. 1. Exudates are bright areas with a yellowish appearance whose colour differs only slightly from that of the optic disc. Exudates occur when ruptured blood vessels leak lipid. Hemorrhages form when microaneurysms in the blood vessels rupture. Widespread exudates and hemorrhages appear in images of severe diabetic retinopathy, which is the last stage of the disease.

a. Pre-processing
In image pre-processing, to find exudates, the image from the dataset is first converted to an HSV image. Colour space conversion translates an image represented in one colour space into another colour space, with the goal of keeping the translated image as similar as possible to the original. Here the red, green and blue (RGB) channels of the given image are converted to hue, saturation and value (HSV). Converting from RGB to HSV makes it easier to extract the yellow-coloured exudates. Edge zero padding, median filtering and adaptive histogram equalization are then applied.
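The first steps of this pre-processing can be sketched in a few lines. The sketch below is a dependency-free illustration, not the implementation used in the study: it assumes images stored as nested lists of (R, G, B) tuples, the function names are hypothetical, and the adaptive histogram equalization step is left out. In practice an image-processing library would do this work.

```python
import colorsys
from statistics import median

def rgb_to_hsv_image(image):
    """Convert rows of (R, G, B) tuples in 0..255 to (H, S, V) floats in 0..1."""
    return [[colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for (r, g, b) in row]
            for row in image]

def zero_pad(channel, pad=1):
    """Surround a 2-D channel with `pad` rows and columns of zeros."""
    width = len(channel[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]
    padded += [[0] * pad + list(row) + [0] * pad for row in channel]
    padded += [[0] * width for _ in range(pad)]
    return padded

def median_filter(channel, size=3):
    """Replace each pixel by the median of its size x size zero-padded window."""
    pad = size // 2
    padded = zero_pad(channel, pad)
    height, width = len(channel), len(channel[0])
    return [[median(padded[y + dy][x + dx]
                    for dy in range(size) for dx in range(size))
             for x in range(width)]
            for y in range(height)]

# Pure yellow maps to a hue of 1/6 of the colour circle, i.e. 60 degrees.
hue, sat, val = rgb_to_hsv_image([[(255, 255, 0)]])[0][0]

# A single bright outlier in a flat region is removed by the median filter.
noisy = [[10, 10, 10],
         [10, 255, 10],
         [10, 10, 10]]
smoothed = median_filter(noisy)
```

Zero padding lets the filter window be evaluated at the image border, at the cost of pulling border medians towards zero.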

b. Image Segmentation
After image pre-processing, exudates are segmented by smoothing, masking and a bitwise AND. Smoothing removes high spatial frequency noise from the image; the blurring is achieved by convolving the image with a low-pass filter kernel. Masking is an image processing method in which a small 'image piece' is defined and used to modify a larger image. Here the yellow-coloured ([60,255,255]) exudates and optic disc in the smoothed image are masked with blue ([0,0,255]). Bitwise AND operations are used in image manipulation to extract the essential parts of an image and are the basis of image masking. Here the input image is combined with the masked image, thereby eliminating everything other than the optic disc and exudates from the original image.
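A minimal sketch of the masking and bitwise AND steps follows. It assumes images as nested lists of (R, G, B) tuples; the function names, hue tolerance, and saturation and value thresholds are illustrative assumptions rather than values from the study, and the smoothing step is omitted.

```python
import colorsys

def yellow_mask(image, hue_deg=60.0, tol_deg=15.0, min_sat=0.4, min_val=0.4):
    """Return a binary mask (255 = yellowish pixel, 0 = everything else).

    The thresholds are assumed values chosen only for this illustration.
    """
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            is_yellow = (abs(h * 360 - hue_deg) <= tol_deg
                         and s >= min_sat and v >= min_val)
            mask_row.append(255 if is_yellow else 0)
        mask.append(mask_row)
    return mask

def bitwise_and(image, mask):
    """AND every channel with the mask: 255 keeps the pixel, 0 clears it."""
    return [[tuple(c & m for c in pixel) for pixel, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]

# Two yellowish pixels (exudate-like) and two dark red background pixels.
image = [[(250, 230, 40), (120, 30, 20)],
         [(90, 20, 15), (245, 240, 60)]]
mask = yellow_mask(image)
segmented = bitwise_and(image, mask)
```

Only the pixels selected by the mask survive the AND, so the result contains the yellow regions on a black background.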

c. Feature Extraction
For binary classification we use two features: the number of exudates as the first parameter, and the number of hemorrhages and microaneurysms as the second parameter. Each feature is computed by counting the white pixels in the corresponding segmented image and dividing by the total number of pixels in the image.
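The white-pixel ratio described here can be written down directly. The sketch below assumes each segmented image is a 2-D list in which any non-zero value counts as white; the function names and the sample masks are hypothetical.

```python
def white_pixel_ratio(binary_image):
    """Fraction of pixels that are white (non-zero) in a 2-D binary image."""
    total = sum(len(row) for row in binary_image)
    white = sum(1 for row in binary_image for value in row if value > 0)
    return white / total if total else 0.0

def feature_vector(exudate_mask, lesion_mask):
    """Two features: exudate ratio, then hemorrhage/microaneurysm ratio."""
    return [white_pixel_ratio(exudate_mask), white_pixel_ratio(lesion_mask)]

# Illustrative 2 x 4 segmentation results (made-up values).
exudate_mask = [[0, 255, 0, 0],
                [0, 255, 255, 0]]
lesion_mask = [[0, 0, 0, 0],
               [255, 0, 0, 0]]
features = feature_vector(exudate_mask, lesion_mask)
```

Dividing by the total pixel count makes the features independent of image resolution.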

d. Classification
In the proposed method we implement a hybrid classifier, that is, a combination of five classifiers, including support vector machine, k-nearest neighbours and random forest. Each classifier classifies the 244 images as either normal or abnormal. The SVM classifier uses a radial basis function (RBF) kernel and degree 3. After obtaining the individual classifiers, voting is applied as the hybrid method. The five classifiers are trained on the dataset and then tested. The training and testing sets are prepared in the ratio 80:20.
SVM: The support vector machine is a supervised machine learning algorithm that is used extensively for both classification and regression problems, though mostly for classification. In the SVM algorithm there are n features. Each data item is plotted as a point in n-dimensional space, where the value of each feature is the value of a particular coordinate. The plotted data points are then separated into classes by means of a hyperplane.
KNN: The k-nearest neighbours (KNN) algorithm is a simple, easy-to-implement supervised machine learning algorithm that can be used to solve both classification and regression problems. A supervised machine learning algorithm is one that relies on labelled input data from the user's dataset to learn a function.

V. CONCLUSION
In the proposed method, hemorrhages, exudates and microaneurysms are detected. For exudate detection, green channel extraction, masking, smoothing and bitwise AND are applied, which results in better calculation and extraction of exudates. For detection of hemorrhages and microaneurysms, morphological operations such as opening are performed, using the dilation and erosion operators. For diabetic retinopathy detection, the numbers of microaneurysms (MA), hemorrhages and exudates occurring in the image are counted so that the condition of the image can be decided. The features are then calculated and fed to the SVM, KNN and random forest classifiers. The vote of the three classifiers is taken as the final prediction. From the extracted features, the system thus directly concludes the disease grade as normal or abnormal. Earlier detection and diagnosis of diabetic retinopathy helps protect patients from blindness and reduces the severe effects of the disease.

Fig 1. Normal image and an image with exudates, hemorrhages and microaneurysms
B. Steps
Fig 2. a) Abnormal image before pre-processing

Fig 4. Flowchart of exudate segmentation
To segment hemorrhages and microaneurysms, median blurring, thresholding, image erosion and image dilation are performed. Thresholding partitions an image into foreground and background; it is a type of image segmentation that isolates objects by converting grayscale images into binary images. Erosion and dilation are morphological operations: erosion sets a pixel to the minimum over all pixels in its neighbourhood, and dilation sets a pixel to the maximum over all pixels in its neighbourhood. Morphological opening is defined as an erosion followed by a dilation. Opening can remove small bright spots and connect small dark cracks, which tends to open up gaps between features. Fig 5 shows abnormal and normal images after segmentation of hemorrhages and microaneurysms. The segmented images are binary images in which the white spots represent the feature vectors or parameters. These parameters are counted for the subsequent classification.
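The thresholding and morphological operations described above can be sketched as follows. The sketch assumes a grayscale image stored as a 2-D list, treats bright pixels as foreground, and clips the neighbourhood at the image border. These choices, the function names and the threshold level are assumptions made for illustration, not details taken from the study.

```python
def threshold(gray, level):
    """Binarise: pixels above `level` become 255 (foreground), others 0."""
    return [[255 if value > level else 0 for value in row] for row in gray]

def _neighbourhood_op(image, op, size=3):
    """Apply `op` (min or max) over each pixel's size x size neighbourhood."""
    pad = size // 2
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [image[yy][xx]
                      for yy in range(max(0, y - pad), min(height, y + pad + 1))
                      for xx in range(max(0, x - pad), min(width, x + pad + 1))]
            row.append(op(window))
        out.append(row)
    return out

def erode(image, size=3):
    """Erosion: each pixel becomes the minimum of its neighbourhood."""
    return _neighbourhood_op(image, min, size)

def dilate(image, size=3):
    """Dilation: each pixel becomes the maximum of its neighbourhood."""
    return _neighbourhood_op(image, max, size)

def opening(image, size=3):
    """Opening: an erosion followed by a dilation."""
    return dilate(erode(image, size), size)

# 6 x 6 test image: one isolated bright pixel and one 3 x 3 bright blob.
gray = [[10] * 6 for _ in range(6)]
gray[0][0] = 200
for y in range(2, 5):
    for x in range(2, 5):
        gray[y][x] = 220

binary = threshold(gray, 128)
opened = opening(binary)
```

In this example the opening removes the isolated pixel while the 3 x 3 blob is restored to its original extent by the dilation.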

Fig: Flowchart of hemorrhages and microaneurysms segmentation

International Journal of Engineering Research & Technology (IJERT), ISSN: 2278-0181, http://www.ijert.org, IJERTV9IS060170 (This work is licensed under a Creative Commons Attribution 4.0 International License.)

In supervised learning, the learned function produces an appropriate output when new unlabelled data is fed to the algorithm. KNN captures the idea of similarity, which is often called distance, proximity or closeness. Here we calculate the distance between points on a graph and use it to classify the given data: a smaller distance to a data point suggests higher similarity.
Voting: Voting is the simplest method of combining the outputs of multiple machine learning algorithms. We first create two or more standalone machine learning models from the training dataset. A voting classifier then combines the standalone models and averages the predictions of the sub-models when new data is given to the model. The predictions of the sub-models can be weighted, with the weight of each model set manually or heuristically.
A false negative (FN) is the case in which the result is negative although the individual actually has the disease. SVM achieves 68% accuracy, the KNN classifier 76% and random forest 90%. After voting of the three classifiers, the testing set yields 82% accuracy. The hybrid approach produced a precision score of 0.8619, a recall score of 0.8116 and an F-measure score of 0.8028. That is, out of 49 test samples, 36 were predicted correctly.
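To make the voting step and the reported scores concrete, the sketch below combines three sets of binary predictions by majority (hard) voting and computes accuracy, precision, recall and F-measure from the resulting counts. Hard voting is an assumption here, and the labels, predictions and function names are made up for illustration; they are not the study's data.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Hard voting: for each sample, return the label most models predicted."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*predictions_per_model)]

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-measure for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
            "precision": precision,
            "recall": recall,
            "f_measure": f_measure}

# Made-up labels (1 = abnormal, 0 = normal) and per-model predictions.
y_true = [1, 1, 1, 0, 0, 0]
svm_pred = [1, 0, 1, 0, 1, 0]
knn_pred = [1, 1, 0, 0, 1, 0]
rf_pred = [1, 1, 1, 0, 0, 1]

voted = majority_vote([svm_pred, knn_pred, rf_pred])
scores = binary_metrics(y_true, voted)
```

With an odd number of models and two classes, a majority always exists, so no tie-breaking rule is needed.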