Colon Cancer Classification on Histopathological Images using Deep Learning Techniques

DOI : 10.17577/IJERTCONV10IS08017


Deiva Nayagam R

Assistant Professor

Dept. of ECE Ramco Institute of Technology Rajapalayam, India

Aarthi K

Dept. of ECE,

Ramco Institute of Technology, Rajapalayam, India

Mirra S

Dept. of ECE,

Ramco Institute of Technology, Rajapalayam, India

Abstract: Colorectal cancer affects both sexes around the globe. According to a report published by the WHO in 2018, it ranked third among cancers, with more than 1 million people affected. It is the second most common cancer among females and the third among males. Recognizing colon cancer at an early stage, followed by substantial treatment, can greatly reduce the ensuing death rates. In this work, we use histopathological images and propose a classification methodology based on CNN methods, using deep learning to distinguish between healthy and diseased large-intestine tissue. We train a neural network using 80% of the image dataset for training and the remaining 20% for validation. The classified output has an accuracy of about 99.7%.

Keywords: Colorectal cancer, Histopathological images, Deep learning, Image classification

Colorectal cancer is a cancer of the colon or rectum, located at the lower end of the digestive tract. Early cases can begin as non-cancerous polyps. Colon cancer can occur in any part of the colon. An examination of the entire colon using a long, flexible tube equipped with a camera (colonoscopy) is one way to detect colon cancer and polyps. The tissue collected during colonoscopy is examined under a microscope; the resulting image is a histopathological image, and such histopathological images are used in our work.

In our work we use a CNN to train our model. The CNN is a deep learning technique that helps in achieving higher accuracy. Deep learning, which has emerged as an effective tool for analyzing big data, uses complex algorithms and artificial neural networks to train computers so that they can learn from experience and classify and recognize data and images much as a human brain does. Within deep learning, a Convolutional Neural Network (CNN) is a type of artificial neural network that is widely used for image/object recognition and classification. Deep learning thus recognizes objects in an image by using a CNN. CNNs play a major role in diverse tasks such as image processing, computer vision tasks like localization and segmentation, video analysis, obstacle recognition in self-driving cars, and speech recognition in natural language processing. Because CNNs play a significant role in these fast-growing areas, they are very popular in deep learning. CNNs have fundamentally changed the approach to image recognition, as they can detect patterns and make sense of them. They are considered among the most effective architectures for image classification, retrieval and detection tasks because of the high accuracy of their results. CNN-based deep neural systems are widely used in medical classification tasks. A CNN is an excellent feature extractor, so using it to classify medical images avoids complicated and expensive feature engineering. Given these advantages, we use a CNN in our work to achieve high performance and better accuracy.


Colorectal carcinoma (CRC) is one of the most common cancers and one of the main causes of cancer-related death globally. According to recent epidemiological statistics, this kind of cancer imposes a considerable burden in most regions and is still associated with a very high fatality rate. As a result, early tumour detection and differentiation are critical for the life and well-being of a huge number of patients. The examination of hematoxylin and eosin (H&E)-stained tissue sections by microscopy remains the first step in the diagnostic workup of solid tumours. This is a time-consuming process that requires meticulous attention to detail. Furthermore, diagnoses are affected by the expertise and experience of the pathologist, and they are not always reproducible between pathologists. We provide a deep learning-based technique for detecting and segmenting colorectal cancer using digitised H&E-stained histology slides. In this study, we show that, when compared to pathologist-based diagnosis using H&E-stained slides digitised from clinical samples, this neural network technique provides a median accuracy of 99.9% for normal slides and 94.8% for cancer slides. Given the approach's excellent accuracy on normal slides, neural network methods could be used as a screening tool. With the emergence of targeted medications, many treatments are based on molecular investigations, which include extracting tumour tissue from paraffin blocks for sequencing. By acting as a screening device, an automated system might reduce pathologists' effort while also reducing diagnostic subjectivity [1].

The majority of the work in tissue-based diagnostics is still done manually by a pathologist using a microscope to examine hematoxylin and eosin (H&E)-stained slides. Quantifying the size and number of tumour areas is required to define the pN stage in metastatic disease or the tumour content for downstream genetic research. The ability to accurately distinguish cancerous/malignant cells from normal/benign cells is at the heart of such tasks. The estimation of tumour content, however, is unreliable, with significant inter-pathologist variation. Results may differ greatly depending on the disease entity and on the pathologist's experience and current state. Furthermore, accreditation organisations require the identification of tumour content and the circling of tumour locations on H&E slides to designate areas to sample for downstream genomic analysis, in order to enrich for tumour content and ensure correct determination of genetic variations. Because tumour regions can be tiny, pathologists must frequently use high magnification to find tumour cells, which greatly increases their workload.

Deep learning has been applied effectively to many applications, such as image processing [2], sound/voice processing, and language translation. It has recently been shown to be useful in medical image processing, including magnetic resonance imaging, computed tomography, biopsies, and endoscopy [3]. Digital pathology datasets have only recently become publicly available, allowing researchers to test the viability of using deep learning techniques to improve the speed and accuracy of histologic diagnosis. CAMELYON17 [4] is a project that aims to detect and classify breast cancer metastases in whole-slide images of histological lymph node sections. A previous study on the CAMELYON dataset found 92.4% sensitivity using a convolutional neural network (CNN) architecture, much greater than the 73.2% sensitivity of a human pathologist conducting an exhaustive search. Deep learning can also be used to predict clinical outcomes directly from histology images. A CNN was trained with a nine-class accuracy of >94% using more than 100,000 H&E image patches. Compared with the Union for International Cancer Control staging system, this information proved effective in improving survival prediction [5]. We employed a deep learning-based approach to detect tumour locations in a model of colorectal cancer and compared the findings to those obtained by a pathologist to see whether CNNs may be an appropriate aid in a clinical situation. When compared to pathologists, our method obtains an accuracy of 99.9% for normal slides and 94.8% for cancer slides on H&E-stained histology slides. Automation of this task would save highly trained pathologist resources in a high-volume molecular laboratory.

As described in [6], Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. The authors present a general end-to-end approach to sequence learning that makes minimal assumptions about the sequence structure. The proposed method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Finally, they found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM's performance markedly, because doing so introduced many short-term dependencies between the source and the target sentence, which made the optimization problem easier. Using a Convolutional Neural Network (CNN) and the well-known LeNet-5 architecture, functional MRI data of Alzheimer's subjects were successfully distinguished from those of normal controls, with the accuracy on testing data reaching 96.85%. This experiment suggests that the shift- and scale-invariant features extracted by a CNN, followed by deep learning classification, represent the most powerful method of distinguishing clinical data from healthy data in fMRI. This approach also allows for expansion of the methodology to predict more complicated systems [7]. In [8], the authors designed a modularized neural network for LDCT and compared it with commercial iterative reconstruction methods from three leading CT vendors. While popular networks are trained for an end-to-end mapping, their network performs an end-to-process mapping, so that intermediate denoised images are obtained with associated noise reduction directions towards a final denoised image.
The learned workflow allows radiologists in the loop to optimize the denoising depth in a task-specific fashion. Their network was trained with the Mayo LDCT Dataset and tested on separate chest and abdominal CT exams from Massachusetts General Hospital. The study confirms that this deep learning approach performed either favorably or comparably in terms of noise suppression and structural fidelity, and is much faster than the commercial iterative reconstruction algorithms. In [9], the authors presented a framework to automatically detect and localize tumors as small as 100 x 100 pixels in gigapixel microscopy images sized 100,000 x 100,000 pixels. Their method leverages a convolutional neural network (CNN) architecture and obtains state-of-the-art results on the Camelyon16 dataset in the challenging lesion-level tumor detection task. At 8 false positives per image, they detect 92.4% of the tumors, relative to 82.7% by the previous best automated approach. This approach could considerably reduce false negative rates in metastasis detection. In [10], combining the deep learning system's predictions with the human pathologist's diagnoses increased the pathologist's AUC to 0.995, representing an approximately 85 percent reduction in human error rate. These results demonstrate the power of using deep learning to produce significant improvements in the accuracy of pathological diagnoses.


Image classification is the process of categorizing and labeling groups of pixels or vectors within an image based on specific rules. A class is essentially a label, for instance 'car', 'animal', 'building' and so on. The categorization rule can be devised using one or more spectral or textural characteristics. The three main types of image classification techniques in remote sensing are:

        • Unsupervised image classification.

        • Supervised image classification.

        • Object-based image analysis.

Convolutional Neural Networks (CNNs) are the backbone of image classification, a deep learning task that takes an image and assigns it a class label. Image classification using CNNs forms a significant part of machine learning experiments. We use supervised classification in our work. Supervised learning uses classification and regression algorithms to build predictive models from labelled examples.


Deep learning (also known as deep structured learning) is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised. Deep-learning architectures such as deep neural networks, deep belief networks, deep reinforcement learning, recurrent neural networks and convolutional neural networks have been applied to fields including computer vision, speech recognition, natural language processing, machine translation, bioinformatics, drug design, medical image analysis, climate science, material inspection and board game programs, where they have produced results comparable to and in some cases surpassing human expert performance. Artificial neural networks (ANNs) were inspired by information processing and distributed communication nodes in biological systems. ANNs have various differences from biological brains. Specifically, artificial neural networks tend to be static and symbolic, while the biological brain of most living organisms is dynamic (plastic) and analogue. The adjective "deep" in deep learning refers to the use of multiple layers in the network. Early work showed that a linear perceptron cannot be a universal classifier, but that a network with a non-polynomial activation function and one hidden layer of unbounded width can. Deep learning is a modern variation concerned with an unbounded number of layers of bounded size, which permits practical application and optimized implementation while retaining theoretical universality under mild conditions. In deep learning the layers are also permitted to be heterogeneous and to deviate widely from biologically informed connectionist models, for the sake of efficiency, trainability and understandability, hence the "structured" part. Deep learning is a class of machine learning algorithms that uses multiple layers to progressively extract higher-level features from the raw input. For example, in image processing, lower layers may identify edges, while higher layers may identify concepts relevant to a human, such as digits, letters or faces.
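To illustrate how a lower layer can respond to edges, the following minimal sketch slides a single 3 x 3 filter over a toy image in plain Python. The Sobel kernel and the 5 x 5 image are assumptions chosen for illustration; in a trained CNN the filter weights are learned from data rather than fixed by hand, and nothing here is taken from the model used in this work.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists), stride 1, no padding.

    As in most deep learning frameworks, the kernel is not flipped,
    so strictly speaking this is a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output


# Toy 5x5 image: dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Sobel filter that responds to horizontal intensity changes (vertical edges).
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

edges = conv2d_valid(image, SOBEL_X)
for row in edges:
    print(row)
```

The response is large where the window straddles the dark-to-bright boundary and zero over the uniform bright region, which is the sense in which such a filter "identifies edges".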

        A. Convolutional neural network

Deep learning is a machine learning technique that teaches computers to do what comes naturally to humans: learn by example. Deep learning is a key technology behind driverless cars, enabling them to recognize a stop sign or to distinguish a pedestrian from a lamppost. Within deep learning, a Convolutional Neural Network (CNN) is a type of artificial neural network that is widely used for image/object recognition and classification. Deep learning thus recognizes objects in an image by using a CNN. The word "deep" is sometimes regarded as a marketing term intended to make a method sound more sophisticated. A CNN is one type of deep neural network among many others. CNNs are popular because they have very useful applications in image recognition. The CNN is a supervised type of deep learning, most commonly used in image recognition, image segmentation and computer vision [12].
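As a rough illustration of what makes up a CNN, the sketch below counts the learnable parameters of a small stack of convolutional and fully connected layers. The layer sizes are hypothetical and are not the architecture used in this work, which is not listed here; only the counting rules (weights plus one bias per output) are standard.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights of one filter (kernel_h * kernel_w * in_channels) plus a bias, per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_features, units):
    """One weight per input feature plus a bias, per output unit."""
    return (in_features + 1) * units


# Hypothetical layer stack for a two-class image classifier (assumed sizes).
layers = [
    ("conv 3x3, 3 -> 32 channels", conv2d_params(3, 3, 3, 32)),
    ("conv 3x3, 32 -> 64 channels", conv2d_params(3, 3, 32, 64)),
    ("dense 1024 -> 2 classes", dense_params(1024, 2)),
]

total_params = sum(count for _, count in layers)
for name, count in layers:
    print(f"{name}: {count}")
print("total:", total_params)
```

Pooling and activation layers add no parameters, which is why they do not appear in the count.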


Instead of using conventional machine learning methods, we train our model using CNN techniques.

Fig. 1: Flow diagram of classification using CNN

Fig. 1 shows the flow of the classification steps followed in our work. Pathologists may occasionally mistake cancerous cells for normal cells, which can lead to serious health problems or even death. By analysing histopathological images with classification techniques, malignant cells can be detected more accurately. The classification of cancerous cell types has been found to be more accurate using the CNN classification method. The input images are pre-processed to reduce illumination differences and low contrast between the colon tissue and the background. Existing methods use different classification techniques that give lower accuracy, whereas applying a CNN gives higher accuracy. Using Python in the proposed system (colon cancer classification) reduces complexity while giving better accuracy. Using classification, we can predict whether the tissues are malignant or benign. In metric evaluation, we quantify the performance of our trained model. This involves training a model on an initial dataset and then using the model to make predictions on a different target dataset that was not used during training. The next step is to compare the predictions with the expected values in the holdout dataset. The dataset used in our work is LC25000 [11], which contains 25,000 histopathological images of lung and colon tissue. It has five classes, each containing 5,000 images. We use the colon subset, i.e., 10,000 images, which are classified into colon adenocarcinoma and benign colonic tissue. Of these 10,000 images, 20% are used for testing and 80% for training.
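The 80/20 division of the 10,000 colon images can be sketched as follows. This is a minimal illustration in plain Python, assuming a stratified split (each class divided in the same ratio) and a fixed random seed; whether the split in this work was stratified is not stated, and the label strings are hypothetical placeholders.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), splitting each class separately."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


# 5,000 images per colon class, as in the dataset description above.
# The label names are placeholders, not confirmed file or folder names.
labels = ["colon_aca"] * 5000 + ["colon_n"] * 5000

train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting per class keeps the two classes balanced in both subsets, so that accuracy on the held-out images is not skewed by an uneven class mix.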

Step 1: The histopathological images are given as input to the system. 20% of the image dataset is set aside for validation/testing and the remaining 80% is used for training the system.

Step 2: Pre-processing techniques are applied in order to enhance the images, eliminate noise (i.e., background) and separate the background of the image from the required part (i.e., colon tissue).

Step 3: The system is trained using a Convolutional Neural Network.

Step 4: The system trained using the CNN in Step 3 is taken as the model, whose purpose is to classify the images.

Step 5: The remaining (held-out) images of the dataset are given as input to the model.

Step 6: The model produces the classified output; the accuracy obtained in our work is 99.7%.
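The enhancement in Step 2 is not specified in detail. As one simple possibility, the sketch below applies min-max contrast stretching to a grayscale patch so that its intensities span the full 0 to 255 range. The choice of this technique and the toy pixel values are assumptions for illustration, not the pre-processing actually used in this work.

```python
def stretch_contrast(pixels, new_min=0, new_max=255):
    """Linearly rescale a 2D list of intensities to the range [new_min, new_max]."""
    lo = min(min(row) for row in pixels)
    hi = max(max(row) for row in pixels)
    if hi == lo:
        # A flat patch has no contrast to stretch; map everything to new_min.
        return [[new_min for _ in row] for row in pixels]
    scale = (new_max - new_min) / (hi - lo)
    return [[round(new_min + (value - lo) * scale) for value in row]
            for row in pixels]


# Low-contrast toy patch: all intensities lie between 100 and 150.
patch = [[100, 110],
         [120, 150]]

stretched = stretch_contrast(patch)
print(stretched)
```

For colour histopathology images the same rescaling could be applied per channel, although stain-specific normalisation methods are often preferred in practice.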


First, we import the libraries required for training and testing. We then split the image dataset into two parts: 20% for testing and the remaining 80% for training. With 10,000 images in total, this ratio gives 2,000 images for testing and 8,000 images for training. We then train the model using a Convolutional Neural Network (CNN). The total number of parameters of the CNN model, including trainable and non-trainable parameters, is displayed. The model is trained over a number of epochs, and the accuracy and loss during training and testing are obtained as output. Accuracy is defined as the percentage of correct predictions for the test data. It can be calculated by dividing the number of correct predictions by the total number of predictions. The training accuracy and loss and the validation accuracy and loss are plotted in Fig. 2.
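The accuracy definition above can be written directly as code. In this minimal sketch the labels and the six misclassified positions are made up so that the result matches the reported percentage on 2,000 test images; they are not the actual predictions of the model.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one prediction is required")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Illustrative data: 1,000 malignant (1) and 1,000 benign (0) test images.
y_true = [1] * 1000 + [0] * 1000
y_pred = list(y_true)

# Flip six predictions (an assumed error count) to simulate misclassifications.
for i in (0, 1, 2, 1000, 1001, 1002):
    y_pred[i] = 1 - y_pred[i]

acc = accuracy(y_true, y_pred)
print(f"accuracy = {acc:.3f}")  # 1994 correct out of 2000
```

On a test set of 2,000 images, an accuracy of 99.7% corresponds to 1,994 correct predictions.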

Fig. 2: Graphical representation of training and validation accuracy and loss
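The loss function behind the curves is not named in this work. For a two-class problem, binary cross-entropy is a common choice, and the sketch below assumes it purely for illustration; the probabilities are invented examples.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


# Two images: one malignant (1), one benign (0).
confident_loss = binary_cross_entropy([1, 0], [0.9, 0.1])
unsure_loss = binary_cross_entropy([1, 0], [0.6, 0.4])

print(f"confident predictions: {confident_loss:.4f}")
print(f"unsure predictions:    {unsure_loss:.4f}")
```

Both sets of predictions would count as correct at a 0.5 threshold, yet the less confident one has the higher loss. This is why loss and accuracy curves can move differently during training.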

The images are classified with an accuracy of 99.7%. Compared with the other methods and techniques considered, classification of colon cancer images using a CNN has the highest accuracy and a shorter computation time. The model trained using a CNN is highly efficient and requires few resources. Supervised learning outperforms the unsupervised techniques. Using deep learning techniques, we have achieved high performance, and in the proposed model the error rate and loss are comparatively low. This addresses many needs in the medical field and reduces the workload of pathologists. Even experienced pathologists can occasionally mistake cancer cells for normal cells, and the proposed model is useful in avoiding such errors. It makes image classification easier, and the output is expected to be more accurate than manual classification.


We thank Ramco Institute of Technology, Rajapalayam, for providing the support to complete this work successfully. We also thank the management, department, and staff members for providing the laboratory and allowing unrestricted use of its resources.


[1] Brunnström H, Johansson A, Westbom-Fremer S, Backman M, Djureinovic D, Patthey A, et al. PD-L1 immunohistochemistry in clinical diagnostics of lung cancer: Inter-pathologist variability is higher than assay variability. Mod Pathol 2017;30:1411-21.

[2] Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L. ImageNet: A Large-Scale Hierarchical Image Database, in IEEE Conference on Computer Vision and Pattern Recognition, Miami, 2009.

[3] Min JK, Kwak MS, Cha JM. Overview of deep learning in gastrointestinal endoscopy. Gut Liver 2019;13:388-93.

[4] Bandi P, Geessink O, Manson Q, Van Dijk M, Balkenhol M, Hermsen M, et al. From detection of individual metastases to classification of lymph node status at the patient level: The CAMELYON17 challenge. IEEE Trans Med Imaging 2019;38:550-60.

[5] Kather JN, Krisam J, Charoentong P, Luedde T, Herpel E, Weis CA, et al. Predicting survival from colorectal cancer histology slides using deep learning: A retrospective multicenter study. PLoS Med 2019;16:1-22.

[6] Sutskever I, Vinyals O, Le QV. Sequence to Sequence Learning with Neural Networks, in Advances in Neural Information Processing Systems 27. New York, United States: Curran Associates, Inc.; 2014. p. 3104-12.

[7] Sarraf S, Tofighi G. Deep Learning-Based Pipeline to Recognize Alzheimer Disease Using fMRI Data, in Future Technologies Conference, San Francisco; 2016.

[8] Shan H, Padole A, Homayounieh F, Kruger U, Khera RT, Nitiwarangkul C, et al. Competitive performance of a modularized deep neural network compared to commercial algorithms for low-dose CT image reconstruction. Nat Mach Intell 2019;1:269-76.

[9] Liu Y, Gadepalli K, Norouzi M, Dahl GE, Kohlberger T, Boyko A, et al. Detecting Cancer Metastases on Gigapixel Pathology Images, arXiv:1703.02442v2; 2017.

[10] Wang D, Khosla A, Gargeya R, Irshad H, Beck AH. Deep learning for identifying metastatic breast cancer. arXiv:1606.05718v1; 2016.

[11] Manju Dabass, Sharda Vashisth, Rekha Vig, A convolution neural network with multi-level convolutional and attention learning for classification of cancer grades and tissue structures in colon histopathological images, Computers in Biology and Medicine, Vol. 47, 2022.

[12] R. Deiva Nayagam, A. Anitha Rahini, A. Farzana Fathima, K. Lakshmi Gomathi, An Enhanced Retinal Vessel Segmentation using Deep Convolution Neural Network, Journal of Physics: Conference Series, 1917, 2021.