A Computerized Method for Classifying Red Blood Cells Using a Boosting Technique

Abstract: Red blood cells are round and flexible, and they carry oxygen throughout the human body. Abnormal cells such as sickle cells, elongated cells and echinocytes may also be present in the body. Microscopic images contain touching and overlapping cells, and these must be segmented without error. To achieve this, an automated methodology is proposed. It uses image processing techniques and a machine learning approach to identify the shape of blood cells and to classify them according to that shape. Extreme Gradient Boosting (XG-Boost) is the classifier used for the multiclass classification of the cells. First, the cells are classified as normal or abnormal. The abnormal cells are then classified into multiple classes. The analysis is performed at both the single level and the multilevel. Extreme Gradient Boosting achieves a testing accuracy of 98%.


I. INTRODUCTION
Red blood cells (RBCs) are highly specialized and well adapted to their primary function of transporting oxygen from the lungs to all of the body tissues. They are flexible and round in shape. In some people, abnormal cells are also present, e.g. sickle cells, elongated cells and echinocytes [10]. Sickle cell anaemia is a serious and frequently fatal disease characterized by a genetic defect of haemoglobin. The disease primarily affects the RBCs, which become sickle-shaped or crescent-shaped. Because they lack flexibility, sickle cells stick to the vessel walls and block blood flow, causing painful episodes called crises. Elliptocytes, also called ovalocytes, are associated with a hereditary disorder. These abnormal cells also appear in place of normal cells in conditions such as infectious anaemias, thalassemia and iron deficiency anaemia, and in newborn babies. Echinocytes are burr-like erythrocytes with narrow, uniform, spike-like surface projections. A new reagent for the sickle test has been adjusted and its optimal concentration found. However, when it is applied to fresh red blood cells, the RBCs immediately transform into echinocytes. Conditions associated with echinocytes include pyruvate kinase deficiency, uremia and microangiopathic hemolytic anaemia. This type of anaemia occurs when an abnormal process in the body destroys red blood cells, or makes them lose their elasticity, before their lifespan is over. As a result, the body does not have enough RBCs to function. Normal red cells live 100-120 days, whereas sickle cells live only 10-20 days. Anaemia is a serious global public health problem that affects approximately 25% of the world's population, particularly in western sub-Saharan Africa, Asia and some regions of India. There is no cure that prevents anaemia, so early diagnosis is very important. The conventional manual blood test is tedious and error-prone, and it should be replaced by an effective, reliable and accurate tool for diagnosing sickle cell disease.
Such a tool can be built using image processing techniques and machine learning algorithms. This tool runs on OpenCV-Python [2], which supports all major operating systems, including Windows, Linux, macOS, FreeBSD and NetBSD. It is faster and more effective than MATLAB. OpenCV is an open-source computer vision library.

II. LITERATURE SURVEY
Researchers have recently used image processing techniques to identify blood abnormalities automatically. P. Rakshit and K. Bhownik [3] presented a method based on a Wiener filter and Sobel edge detection to find the edges of blood cells. It requires little execution time and is computationally efficient, but it does not address the segmentation of overlapping cells. Aruna N.S. and Hariharan S. [4] compared several edge detectors (Canny, Sobel, Roberts, Prewitt and LoG) to find the best one for automatic diagnosis. They found the Canny edge detector superior for diagnosis because it preserves more detail of the original image. Hala Algailani and Musab Elkheir Salih Hamad [5] used watershed segmentation to handle overlapping cells. However, over-segmentation degraded the quality of detection, so they used a nonlocal means filter. Chy and Rahaman [6] used threshold segmentation on RBC images and efficiently trained an SVM classifier for testing. Their thresholding, however, is not suitable for overlapping cells.
This study addresses some of the problems that these techniques have in segmenting microscopic images. The proposed system uses segmentation and an XG-Boost classifier to classify RBC images into various categories.

III. PROPOSED SYSTEM
This section describes the workflow of the research and the machine learning algorithms used in this study. It also explains how the dataset was generated, pre-processed, trained and tested. Fig. 1 shows the entire process used to build the machine learning models in this research. The workflow is divided into two phases, a training phase and a testing phase. The data are split into a training set, which accounts for 60% of the total data, and a test set containing the remaining 40%. The training phase follows a step-by-step procedure so that the overall system predicts accurate output. Likewise, in the testing phase, features are extracted from the test data and passed to the classifier for prediction. Feature construction transformed the raw … [12], which is available at //erythrocytesidb.uib.es/. The dataset images, in RGB format, are passed to the pre-processing phase for correct detection.
B. Pre-processing
Pre-processing is applied to the dataset to convert the input images to the standard input format of the proposed system. Its aim is to improve the microscopic image data by suppressing unwanted distortions and enhancing image features that are important for further processing. The pre-processing operations resize the image, remove noise, and remove unwanted spots or holes that would otherwise be misinterpreted. This supports accurate detection and classification of red blood cells. The acquired RGB image is converted to greyscale. A median filter removes noise. All connected components are then found, and the components touching the image border are cleared.
C. Segmentation
Image segmentation has a major impact on the final result. The anaemic blood smear is segmented with appropriate segmentation methods. Image segmentation methods fall into five main categories: pixel intensity, thresholding, boundary detection, region-based processing and morphological methods.
Morphological segmentation operations are applied to the binary images. Opening and closing are performed with the erosion and dilation operators.
D. Feature Extraction
In this step, shape and textural features are extracted from the images. These features capture the relevant information present in each image.

Shape feature
The shape features computed are the area, perimeter, circularity factor, deviation factor, form factor, aspect ratio, equivalent diameter, diameter, major axis and minor axis.
• Area is the actual number of pixels in the region. Variation in area is used to find abnormalities associated with changes in red cell dimensions.
• Perimeter is the distance around the boundary of the region.
• Circularity factor is computed as the ratio of the major axis to the minor axis.
• Deviation factor is the ratio of the circularity factor to the area.
• Form factor is computed as $4\pi A / P^2$, where $A$ is the area and $P$ the perimeter.
• Aspect ratio is the ratio of the object's length to its width.
• Equivalent diameter is computed as $\sqrt{4A/\pi}$.
• Diameter is computed as $4A/P$.

Textural feature
Haralick features and Hu moments are used to extract texture information from the image. Four main Haralick features are used: homogeneity, contrast, entropy and energy.

E. Extreme Gradient Boosting (XG-Boost) Classification
XG-Boost is used here to classify the blood cells. First, the RBCs are classified as normal or abnormal. The abnormal cells are then categorised into three classes: sickle cells, elongated cells and echinocytes. XG-Boost [7] is a decision-tree-based ensemble machine learning algorithm that uses a gradient boosting framework. In prediction problems involving unstructured data (text, images, etc.), artificial neural networks tend to outperform all other frameworks. For structured or tabular data, however, decision-tree-based algorithms are currently considered best-in-class. The shape and texture features extracted above form exactly such tabular data. XG-Boost distinguishes itself from other algorithms through its portability, cloud integration, language support and parallelization. In boosting, trees are grown sequentially, and each new tree corrects the errors of the previous ones. Each tree learns from its predecessors by fitting the remaining residual error [8] [11].
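The residual-fitting idea can be illustrated with a toy regressor. It starts from a constant model and repeatedly fits a one-split regression tree (a stump) to the current residuals. This is plain gradient boosting with squared error on a single feature, written only to make the mechanism concrete. XG-Boost itself adds a regularized second-order objective, deeper trees and many engineering optimizations that are not shown here. The function names and the learning rate `lr` are assumptions.

```python
import numpy as np


def fit_stump(x, r):
    """Least-squares regression stump on a 1-D feature x for residuals r."""
    best = None
    for t in np.unique(x)[:-1]:                    # thresholds keeping both sides non-empty
        left = x <= t
        lv, rv = r[left].mean(), r[~left].mean()
        sse = np.sum((r[left] - lv) ** 2) + np.sum((r[~left] - rv) ** 2)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda z: np.where(z <= t, lv, rv)


def boost(x, y, n_rounds=20, lr=0.3):
    """Toy gradient boosting: F_m = F_{m-1} + lr * h_m, each h_m fit to residuals."""
    f0 = y.mean()                                  # initial model F0: a constant
    pred = np.full_like(y, f0, dtype=float)
    trees, mses = [], [np.mean((y - pred) ** 2)]
    for _ in range(n_rounds):
        residual = y - pred                        # y - F_{m-1}
        h = fit_stump(x, residual)                 # new model fit to the residuals
        pred = pred + lr * h(x)                    # boosted model
        trees.append(h)
        mses.append(np.mean((y - pred) ** 2))

    def predict(z):
        out = np.full(np.shape(z), f0, dtype=float)
        for h in trees:
            out += lr * h(z)
        return out

    return predict, mses
```

With `lr=1.0`, the first update is exactly $F_1 = F_0 + h_1$. Smaller values shrink each step, as XG-Boost's learning rate does. Because each stump is a least-squares fit to the residuals, the recorded mean squared error never increases from one round to the next.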
• An initial model $F_0$ is defined to predict the target variable $y$. This model is associated with a residual $y - F_0$.
• A new model $h_1$ is fit to the residuals from the previous step.
• $F_0$ and $h_1$ are then combined to give $F_1$, the boosted version of $F_0$. The mean squared error of $F_1$ will be lower than that of $F_0$:

$$F_1(x) = F_0(x) + h_1(x) \qquad (1)$$

IV. EXPERIMENTAL RESULTS
The proposed method can be applied to microscopic images of any size. Microscopic images of red blood cells contain a number of normal and abnormal cells. The region of interest of each cell is extracted from the image and its features are computed. Each cell is then processed individually and classified into one of the categories.

A. Single Level Analysis
The accuracy of the proposed system is verified on the microscopic images used in related studies. Linux is the operating environment, and OpenCV-Python under Python 3.6.9 is used for the experiments. Fig. 2 shows the analysis of normal cells, with the ROI extracted from the microscopic image taken as input in Fig. 2(a). The input image is resized, converted to greyscale and enhanced, as shown in Fig. 2(d). Morphological operations and threshold-based segmentation are then applied, as shown in Fig. 2(e). Texture features (Hu moments and Haralick features) are extracted from the resulting greyscale image. Fig. 2(g) shows that the cells are detected as belonging to the normal category. The same holds for the sickle cells in Fig. 3(g), the elongated cells in Fig. 4(g) and the echinocytes in Fig. 5(g).

V. CONCLUSION
The present study demonstrates the effectiveness of different features and the Extreme Gradient Boosting algorithm for red blood cell identification and classification. The regions of interest are segmented and characterized on the basis of shape and texture features. Form factor, aspect ratio, circularity factor and deviation factor are used to examine the shape of the cells, and they play a vital role in the classification process. The processing time of the XG-Boost classifier is 3.06 s for training and 5.40 s for testing. The XG-Boost classifier achieves a classification accuracy of 98%, a recall of 97% and an F1-score of 98%. Future work could use a larger multiclass dataset of abnormal cells to achieve higher accuracy.