Medical Image Classification and Cancer Detection using Deep Convolutional Neural Networks

DOI : 10.17577/IJERTCONV9IS13003


Akshay Kumar S
Department of Computer Science, College of Engineering Perumon, Perinad P O, Kollam, Kerala

Karthika A J
Department of Computer Science, College of Engineering Perumon, Perinad P O, Kollam, Kerala

Allen Saju N
Department of Computer Science, College of Engineering Perumon, Perinad P O, Kollam, Kerala

Fida Mohammed Khaleel
Department of Computer Science, College of Engineering Perumon, Perinad P O, Kollam, Kerala

Gayathri J L
PG Scholar, Department of Computer Science, College of Engineering Perumon, Perinad P O, Kollam, Kerala

Sujarani M S
Assistant Professor (CSE), College of Engineering Perumon, Perinad P O, Kollam, Kerala

Abstract – Deep learning is one of the most prominent machine learning approaches. It is used in image categorization, image detection, clinical archives, and object identification, among other things. Medical image archives are growing rapidly because hospitals increasingly rely on digital images as a source of information. Digital images play an important role in predicting the severity of a patient's disease, and medical images have numerous applications in diagnosis and research. Despite recent advances in imaging technology, automatically classifying medical images remains an open research problem for computer vision researchers. Classifying medical images well requires choosing an appropriate classifier. After organ prediction and classification, the project was extended to cancer detection. A pre-trained convolutional network and a transfer learning process similar to the one used for organ detection are used for cancer detection. The method was validated by splitting the data into training and testing sets. We conclude that this method is well suited to classifying medical images of different human body organs.

Keywords – Medical Image Classification; Cancer Detection; Deep learning; Convolutional Neural Network (CNN); DNN; CBIR

Fig. 1. Sample images of medical image dataset


The spread of digital devices and camera technologies has driven exponential growth in medical image output. Modern hospitals use digital images to anticipate the severity of a patient's illness. As the volume of digital images grows rapidly, image classification has become increasingly useful. Magnetic Resonance Imaging (MRI) is a type of scan that provides accurate information and detailed images of various organs of the body, including the brain.

Because of the high volume of medical images, it is practically impossible for a doctor or physician to classify them manually. As a result, the recognition rate depends entirely on the image-based interpretation of features. These features are essential for categorization, but they must be handcrafted to encode prior knowledge. Creating hand-crafted features is a difficult task that requires labelled data for training and tuning. A data abstraction technique for image recognition and categorization is more robust to the variability of imaging techniques and diseases, and is less sensitive to domain-specific details [19]. The proposed system helped reduce the cost of the prostate and some other frameworks, and the technique is more precise for prostate classification. Even so, the technique was evaluated on limited data, which restricts its applicability. Another approach used an optimization algorithm, supported by texture analysis, to separate the exterior and interior boundaries of the bladder wall [2]; its authors separated the strong and weak boundaries. A further segmentation technique uses bidirectional convolutional recurrent layers to retrieve internal and multi-data information from the prostate [3]. The primary goal of medical image recognition studies is to assign medical images of various parts of the body to the appropriate classes, and training and testing techniques can address this problem. The interpretation of images is essential for categorization, and feature extraction is an important step in the medical field. Our technique learns from a set of predefined images and features, and the results are evaluated on test data. Low-level image elements such as color, texture, spatial layout and shape are frequently used as extracted features to describe the data [4], [5].
Conventional machine learning methods for image classification rely on low-level image features. In this context, machine learning and image classification research is shifting to deep learning methods, which are considered more accurate than the conventional machine learning approach and improve the efficiency of image classification and feature extraction [6]-[8]. The operating concept of deep neural networks is based on the human brain. A deep neural network model is made up of many layers of neurons [?]. The data is routed through large structures that loosely resemble the human brain, with neurons serving as processing elements. The proposed model has two phases: feature extraction and classification. A convolutional neural network consists of three types of layers: convolutional layers, pooling layers and fully connected layers. Transfer learning of a network is associated with a fine-tuning approach [10]. Classification is performed by the last layers of the CNN, which produce the organ detection result. The last layer, a fully connected SoftMax layer, was trained on our own dataset and is used for the subsequent classification process [11]. Inception v3, a network developed at Google, is pre-trained on the ImageNet dataset. Because its output layer was trained for its own classes, we adapt Inception v3 to our dataset by reusing all of its layers except the last one. In the organ classification method we used a publicly available dataset of MRI images of different human organs, including the chest, breast, etc. [12], [13], as shown in Fig. 1. A total of 12 classes are used in our method. There are 300 images in every class, for a total of 3600 images in 12 classes in the dataset. We used a training-testing scheme in which 2520 images are used for training and 1080 images are used for testing.

All images are converted from DICOM (Digital Imaging and Communication in Medicine) to JPG format during the first step. To limit the effect of the variety among images, we performed intensity normalization before feeding the data to the convolutional neural network. For the cancer detection system, we used a publicly available dataset containing images of the brain and breast. These two classes were obtained from cancer archives. Due to time constraints, we covered only two parts of the human body, the brain and the breast, for cancer detection. Several concepts related to medical image classification and cancer detection remain open for future research [14].


      1. Automatic Classification of Medical X-Ray Images Using a Bag of Visual Words

        In this paper [15], Mohammad Reza Zare, Ahmed Mueen and Woo Chaw Seng propose an iterative classification framework that produces four classification models, constructed from different classes, to improve classification accuracy. Because of rapid advances in medical devices, an increasing number of medical images are being created. Handling these images manually is expensive and error-prone. Classification is therefore improved by using data retrieved directly from the image. One common method is to divide images into sub-regions and extract visual features from each region separately before combining them into a single representation. The bag of visual words (BoW) model has also been used in medical image classification and related tasks. The classification process is divided into two stages: training and testing. During the training phase, the chosen features are extracted from the training images and the classifier is trained on the extracted features to build a classification model. BoW is employed as the feature extraction method. To accomplish this, each feature vector in an image is assigned to its nearest visual word under a Euclidean metric.


        • The results report only an average classification accuracy over the whole dataset; a minimum level of precision cannot be guaranteed for every individual category.

        • The main drawback, on this and similar databases, is that the best performance on the main classes hides poor accuracy on less predominant classes.

        • Misclassifications occurred because of inter-class similarity between classes that belong to the same sub-body region.
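
The nearest-word assignment at the heart of the bag-of-visual-words model can be illustrated with a minimal sketch. It assumes a vocabulary of cluster centres has already been learned; the descriptors, the vocabulary and the function names below are hypothetical and are not taken from the cited paper.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_word(descriptor, vocabulary):
    """Index of the visual word (cluster centre) closest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda i: euclidean(descriptor, vocabulary[i]))


def bow_histogram(descriptors, vocabulary):
    """Normalised histogram of visual-word occurrences for one image."""
    if not descriptors:
        return [0.0] * len(vocabulary)
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    total = len(descriptors)
    return [counts.get(i, 0) / total for i in range(len(vocabulary))]


# Hypothetical 2-D descriptors and a 3-word vocabulary, for illustration only.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.1), (0.9, 1.2), (4.8, 5.1), (5.2, 4.9)]
histogram = bow_histogram(descriptors, vocabulary)
print(histogram)  # [0.25, 0.25, 0.5]
```

The resulting histogram is the fixed-length feature vector that a classifier would be trained on, regardless of how many local descriptors each image produces.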

      2. Deep Learning of Feature Representation with Multiple Instance Learning for Medical Image Analysis

        In this paper [16], Yan Xu, Tao Mo, Qiwei Feng, Peilin Zhong, Maode Lai and Eric I-Chao Chang develop an algorithm that needs a minimum of manual annotation and learns good feature representations to complete high-level tasks such as classification and segmentation. The article investigates how well high-level tasks can be completed with minimal annotation and good feature representations for medical images. Objects such as cells have significant clinical features in medical image analysis. Earlier hand-designed features such as SIFT and HARR are incapable of representing such objects comprehensively, so feature representation is crucial. The paper studies how deep learning (DNN) can be used to extract feature representations automatically. The authors propose a deep learning-based structure with a series of linear filters in the encoder and decoder. Deep learning networks derive high-level features from low-level ones. The nodes of the lower layers represent low-level features, whereas the nodes of the upper layers represent higher-level features. Compared with lower-layer features, the nodes of the last hidden layer can represent intrinsic characteristics.


        • Because the features in unsupervised feature learning are generated from a limited amount of unlabeled data by a single-layer network, unsupervised feature performance is slightly worse than supervised feature performance.

      3. Deep Learning for Content-Based Image Retrieval: A Comprehensive Study

        In this paper [17], Ji Wan, Dayong Wang and colleagues note that feature representations and similarity measures are critical to the retrieval performance of a content-based image retrieval (CBIR) system. Despite extensive research efforts over decades, this remains one of the most difficult open problems and significantly impedes the success of real-world CBIR systems. The biggest issue is the well-known semantic gap between the low-level image pixels recorded by machines and the high-level semantic concepts perceived by humans. Deep learning models enable a system to learn complicated functions that map raw sensory input data directly to the output, without relying on human-crafted features based on domain knowledge, by exploring deep architectures that automatically learn features at different levels of abstraction from data. Inspired by the achievements of deep learning, the article examines deep learning techniques applied to CBIR tasks. Although there is much research interest in applying deep learning to computer vision tasks, only limited attention has been paid to CBIR applications. The paper seeks to analyze whether deep learning is a long-sought hope for closing the semantic gap in CBIR, and how much empirical improvement in CBIR tasks can be attained by investigating state-of-the-art deep learning methods for learning visual features and similarity measures. In particular, the authors explore a deep learning framework applied to CBIR tasks, with a substantial set of empirical studies that analyze state-of-the-art deep learning techniques (convolutional neural networks) for CBIR tasks on larger samples.
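
To make the role of the similarity measure concrete, the following is a minimal sketch of retrieval over feature vectors. It assumes that each image has already been mapped to a feature vector (for example by a CNN) and uses cosine similarity as one possible measure; the image identifiers, the vectors and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, database, top_k=3):
    """Return (image_id, similarity) pairs, most similar first."""
    scored = [(image_id, cosine_similarity(query, vector))
              for image_id, vector in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical 3-D feature vectors standing in for learned image features.
database = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [0.0, 1.0, 0.0],
}
results = retrieve([1.0, 0.0, 0.0], database, top_k=2)
print([image_id for image_id, _ in results])  # ['img_a', 'img_b']
```

In this setting the quality of retrieval depends on how well the feature vectors capture semantic content, which is the semantic gap the cited study examines.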

      4. ImageNet Classification with Deep Convolutional Neural Networks

        In this paper [18], Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton observe that current approaches to object recognition make extensive use of machine learning methods. Their performance can be improved by collecting larger datasets, learning more powerful models, and using better techniques for preventing overfitting. Until recently, datasets of labelled images were on the order of tens of thousands of images. Simple recognition tasks can be solved fairly well with datasets of this size, especially when they are augmented with label-preserving transformations. ImageNet is a collection of around 15 million labelled high-resolution images organized into approximately 22,000 categories. The images were gathered from the internet and labelled by humans via Amazon's Mechanical Turk crowd-sourcing tool. Starting in 2010, as part of the Pascal Visual Object Challenge, an annual competition called the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) has been held. To simplify their experiments, the authors did not use unsupervised pre-training, although they expect it to help, especially given enough computational power to increase the size of the network substantially without a corresponding increase in the amount of labelled data. Their results improved as they made the network larger and trained it longer, but many orders of magnitude remain before it matches the inferotemporal pathway of the human visual system. Ultimately, they would like to use very large deep convolutional nets on video sequences, where the temporal structure provides very helpful information that is missing or far less apparent in static images.


      • The network's size is limited mainly by the amount of memory available on current GPUs and by the amount of training time that the authors are willing to tolerate.

      • The network takes between five and six days to train on two GTX 580 3GB GPUs.

      • The experiments suggest that the results can be improved simply by waiting for faster GPUs and bigger datasets to become available.


      1. Dataset

        In the organ classification method, we used a publicly available dataset that contains images of various human body parts such as the chest, breast, and so on. There are 12 classes in total: 11 of them come from a public collection of cancer image archives, while the twelfth comes from Messidor, which is sourced from an open-access website. There are 300 images in each class, for a total of 3600 images in 12 classes in the dataset. We employed a training-testing framework and split the data at random with a train-test ratio of 70% to 30%. As a result, 2520 images are used for training and 1080 for testing. No image is used for both training and testing. All images are converted from DICOM (Digital Imaging and Communication in Medicine) to JPG format in the first step. Because of the increased variety among images, the feature matrix may become more complex, which makes accurate categorization harder for the neural network. Intensity normalization was applied before feeding the data to the convolutional neural network to address this issue. For the cancer detection method, we used a publicly available dataset that contains images of the brain and breast. These two classes were acquired from cancer archives. There are 353 images in the brain class, which was the only dataset available, and over 3000 images in the breast class. Here, too, no image is used for both training and testing.
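
The 70/30 split described above can be sketched as follows. This is a minimal illustration that assumes the split is drawn separately within each class (the text only states that it is random); the file names and the function name are hypothetical.

```python
import random


def split_per_class(images_by_class, train_fraction=0.7, seed=0):
    """Randomly split each class into disjoint training and testing lists."""
    rng = random.Random(seed)
    train, test = [], []
    for label, paths in images_by_class.items():
        shuffled = list(paths)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train += [(path, label) for path in shuffled[:cut]]
        test += [(path, label) for path in shuffled[cut:]]
    return train, test


# Hypothetical file names: 12 classes with 300 images each, as in the dataset.
images_by_class = {
    f"class_{c:02d}": [f"class_{c:02d}/img_{i:03d}.jpg" for i in range(300)]
    for c in range(12)
}
train_set, test_set = split_per_class(images_by_class)
print(len(train_set), len(test_set))  # 2520 1080
```

Splitting within each class keeps 210 training and 90 testing images per class, which reproduces the stated totals and guarantees that no image appears in both sets.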

      2. Data Pre-processing

        Deep learning, which is based on artificial neural networks loosely modelled on the human brain, is now widely employed in a variety of applications. Feedforward neural networks with many hidden layers are good examples of deep architecture models. All of the images are rescaled to the same distance scale, that is, the physical space that each pixel in an image represents. After normalization, each image was resized to 224×224 pixels, which is the input size supplied to the initial convolution layers and used as input for the InceptionV3 method. The first convolutional layer is applied with a 4×4 kernel, same padding, a stride of 1 and 8 filters, using the ReLU activation function.

        Fig. 2. Confusion matrix of organ classification

        Fig. 3. Confusion matrix of Brain

        Fig. 4. Confusion matrix of breast
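
The pre-processing order described above (normalize, then resize to 224×224) can be sketched in plain Python. The text does not state which normalization is used, so min-max scaling to [0, 1] is an assumption here, as is nearest-neighbour resizing; the synthetic 512×512 image and the function names are hypothetical.

```python
def normalize_intensity(image):
    """Min-max scale pixel intensities to the range [0, 1] (assumed scheme)."""
    flat = [pixel for row in image for pixel in row]
    lowest, highest = min(flat), max(flat)
    if highest == lowest:
        return [[0.0 for _ in row] for row in image]
    return [[(pixel - lowest) / (highest - lowest) for pixel in row] for row in image]


def resize_nearest(image, out_height=224, out_width=224):
    """Resize a 2-D image by nearest-neighbour sampling."""
    in_height, in_width = len(image), len(image[0])
    return [
        [image[r * in_height // out_height][c * in_width // out_width]
         for c in range(out_width)]
        for r in range(out_height)
    ]


# Synthetic 512x512 grayscale image with 12-bit intensities, for illustration only.
raw = [[(r * 512 + c) % 4096 for c in range(512)] for r in range(512)]
prepared = resize_nearest(normalize_intensity(raw))
print(len(prepared), len(prepared[0]))  # 224 224
```

In practice an image library would handle decoding and interpolation; the sketch only shows the order of operations and the resulting input shape.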

      3. Fine Tuning of Pre-trained Network

      Typically, DCNN image classification and detection algorithms rely on two phases: feature extraction, followed by a classification or detection module. In an end-to-end learning framework, the features are extracted and learned from the training images, after which a SoftMax classifier layer is trained on the image data. The training model consisted of numerous layers, five of which were convolutional and three of which were fully connected. In contrast to handcrafted features, the deep learning algorithm learns low-level, mid-level, and abstract characteristics directly from images. To extract the deep features, a training set of images is used as input to the pre-trained Inception-V3 model. We use 48 layers in our case, including convolution layers and fully connected layers, and a set of images of various modalities. We transferred the features of the Inception-V3 model by training and validating on our medical image dataset. The goal of the SoftMax function is to perform re-learning based on the dataset's 12 classes. Inception V3 is the third in Google's series of deep learning convolutional architectures. It was trained on the original ImageNet dataset, which consists of 1 million training images in 1,000 classes. Transfer learning is a deep network technique that allows us to retrain a network for new classes by fine-tuning its parameters. Features are extracted from the given set of images throughout the fine-tuning phase. A max-pooling layer with a 4×4 kernel, same padding, a stride of 1 and 8 filters is applied to reduce the size of the input. This pooling layer produces a 112×112 output. Images are used as input to the pre-trained Inception-V3 model, and the last three layers are frozen, with the goal of re-joining these layers with the remainder of the network. To the pre-trained Inception-V3 model, we added a Global Average Pooling layer, a Dropout layer, a Batch Normalization layer and a Dense (SoftMax) output layer. The final fully connected layer has the same size as the number of classes in our dataset, which is 12. To improve Inception-V3's learning process, we raised the learning rate factors of the fully connected layer when connecting the transferred layers to the remaining network. Once the DCNN model has been fine-tuned and trained to categorize medical images and detect cancer, it is ready to use. The block diagram of the proposed method is shown in Fig. 5.

      Fig. 5. Block diagram of proposed method
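
The classification head described above can be illustrated with a small pure-Python sketch of global average pooling followed by a 12-way SoftMax layer. It is not the authors' implementation: Dropout (inactive at inference) and Batch Normalization are omitted for brevity, and the feature maps, weights and function names are hypothetical. The helper `same_padding_output_size` uses the usual convention that "same" padding gives an output of ceil(size / stride); under that convention a stride of 1 keeps 224, so the 112×112 pooling output would correspond to a stride of 2, which is assumed in the example.

```python
import math


def same_padding_output_size(size, stride):
    """Spatial output size of a conv or pooling layer with 'same' padding."""
    return math.ceil(size / stride)


def global_average_pool(feature_maps):
    """Average each channel of a [channels][height][width] tensor to one value."""
    pooled = []
    for fmap in feature_maps:
        total = sum(sum(row) for row in fmap)
        pooled.append(total / (len(fmap) * len(fmap[0])))
    return pooled


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense_softmax(features, weights, biases):
    """Fully connected layer followed by softmax; one weight row per class."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


NUM_CLASSES = 12  # number of organ classes in the dataset

# Hypothetical 4-channel 2x2 feature maps and fixed illustrative weights.
feature_maps = [[[float(ch + 1)] * 2 for _ in range(2)] for ch in range(4)]
pooled = global_average_pool(feature_maps)
weights = [[((k + j) % 5 - 2) * 0.1 for j in range(len(pooled))]
           for k in range(NUM_CLASSES)]
biases = [0.0] * NUM_CLASSES
probabilities = dense_softmax(pooled, weights, biases)
predicted_class = max(range(NUM_CLASSES), key=lambda k: probabilities[k])

print(same_padding_output_size(224, 1), same_padding_output_size(224, 2))  # 224 112
```

Global average pooling reduces each feature map to a single number, so the Dense layer needs only one weight per channel and class, which keeps the retrained head small compared with the transferred network.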


      The proposed deep convolutional neural network for classifying medical images and detecting cancer was developed and trained using a popular and widely used deep learning methodology. Image pre-processing was performed first, followed by fine-tuning to improve classification accuracy while reducing training time; the same model was then used to detect cancer. In the cancer detection model, the images are likewise pre-processed and the network is then fine-tuned to improve cancer detection. Pre-processing removes irregularities in the images. Then 70% of the images are used for training and the remaining 30% are set aside for validation and testing. We trained the network for 10 epochs. After training and validation, the overall accuracy of organ classification reached almost 97.95%, measured against the validation and test sets, as shown in Fig. 2.

      Fig. 6. Validation and training accuracy of organ classification.
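
The evaluation in terms of accuracy and a confusion matrix can be sketched as follows. The labels and predictions are a made-up three-class example and do not reproduce the reported results; the function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count predictions: rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Made-up labels for a 3-class problem, for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
print(cm)  # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(round(accuracy(cm), 4))  # 0.6667
```

Off-diagonal entries show which classes are confused with each other, which a single accuracy figure cannot reveal.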


A deep learning strategy for the classification of medical images was developed by training on the images. Reliable, machine-assisted image analysis is an important factor that can improve outcomes for doctors and patients. Developing image analysis techniques that can assist doctors across the medical sciences is a necessity of present-day society. Such techniques could save lives, since illnesses can be detected and predicted before they seriously affect the human body. We hope to study massive image data sources in the coming years for medical imaging and complication detection. Among the methods in the existing papers, the DCNN is the better methodology and gives a better outcome.


Validation accuracy is the accuracy calculated on the validation dataset, and test accuracy is the accuracy calculated on the test dataset. The results show that the proposed method based on a deep convolutional neural network is more reliable and efficient than state-of-the-art methods in terms of classification accuracy. In the cancer detection model, we could only set up models for two classes, the breast and the brain, due to time constraints. The breast dataset contained numerous high-resolution medical images, and we achieved an accuracy of 100% after training for just 20 epochs, as shown in Fig. 3. The model correctly identified every breast image as cancerous or non-cancerous. For the brain class, we achieved an accuracy of 83.34% after training for 20 epochs, as shown in Fig. 4. The model can thus detect whether an image of the brain or breast is cancerous or non-cancerous. The proposed method is implemented in Python with the following hardware and software specification: Windows 10, Intel Core i5, 1.60 GHz-2.30 GHz, with 4 GB RAM. The proposed method has been evaluated in terms of a confusion matrix, which is based on the validation accuracy and the test accuracy, as shown.

Fig. 7. Accuracy in cancer detection in brain.

Fig. 8. Accuracy in cancer detection in breast.

  1. M. R. Zare, W. C. Seng, and A. Mueen, "Automatic classification of medical X-ray images using a bag of visual words," IET Compute

  2. Y. Xu, T. Mo, Q. Feng, P. Zhong, M. Lai, and E. I.-C. Chang, "Deep learning of feature representation with multiple instance learning for medical image analysis," ICASSP

  3. J. Wan, D. Wang, S. C. H. Hoi, P. Wu, J. Zhu, Y. Zhang, and J. Li, "Deep learning for content-based image retrieval: A comprehensive study," Proc. ACM Int. Conf. Multimedia (MM), 2014

  4. A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Jan. 2012

  5. R. Ashraf, M. A. Habib, and M. Akram, "Deep convolution neural network for big data medical image classification," June 17, 2020

  6. Q. Zhu, B. Du, and P. Yan, "Boundary-weighted domain adaptive neural network for prostate MR image segmentation," IEEE Trans. Med. Imag., Mar. 2020

  7. P. Moeskops, M. A. Viergever, A. M. Mendrik, L. S. de Vries, M. J. N. L. Benders, and I. Isgum, "Automatic segmentation of MR brain images with a convolutional neural network," IEEE Trans. Med. Imag., May 2016

  8. R. Ashraf, K. B. Bajwa, and T. Mahmood, "Content-based image retrieval by exploring bandletized regions through support vector machines," J. Inf. Sci. Eng., 2016

  9. J. Kawahara, C. J. Brown, S. P. Miller, B. G. Booth, V. Chau, R. E. Grunau, J. G. Zwicker, and G. Hamarneh, "BrainNetCNN: Convolutional neural networks for brain networks; towards predicting neurodevelopment," NeuroImage, Feb. 2017

  10. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 1–9

  11. K. Kamnitsas, C. Ledig, V. F. J. Newcombe, J. P. Simpson, A. Kane, D. K. Menon, D. Rueckert, and B. Glocker, "Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation," Med. Image Anal., Feb. 2017

  12. T. Brosch and R. Tam, "Manifold learning of brain MRIs by deep learning," Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. Berlin, Germany: Springer, 2013

  13. A. Payan and G. Montana, "Predicting Alzheimer's disease: A neuroimaging study with 3D convolutional neural networks," 2015

  14. F. Wang, M. Jiang, C. Qian, S. Yang, C. Li, H. Zhang, X. Wang, and X. Tang, "Residual attention network for image classification," Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 3156–3164

  15. Mohammad Reza Zare, Ahmed Mueen, and Woo Chaw Seng, "Automatic classification of medical X-ray images using a bag of visual words"

  16. Yan Xu, Tao Mo, Qiwei Feng, Peilin Zhong, Maode Lai, and Eric I-Chao Chang, "Deep learning of feature representation with multiple instance learning for medical image analysis"

  17. Ji Wan and Dayong Wang, "Deep learning for content-based image retrieval: A comprehensive study"

  18. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, "ImageNet classification with deep convolutional neural networks"

  19. Q. Zhu, B. Du, B. Turkbey, P. Choyke, and P. Yan, "Exploiting interslice correlation for MRI prostate image segmentation, from recursive neural networks aspect," Complexity, Feb. 2018
