Lung Pathology Detection Using Deep Learning

DOI : 10.17577/IJERTV12IS040098


Ankith Kumaran R, Rohith Kumar B, Dr. J. Kalaivani

SRM Institute of Science and Technology, Kattankulathur

Abstract: The coronavirus disease 2019 (COVID-19) that began at the start of this decade has had deleterious effects on the world and is still raging on, causing substantial loss of life. Because COVID-19 weakens the lungs and leaves respiratory immunity under threat, there has been a corresponding increase in lung-related cases, which must be addressed immediately or more lives will be lost to this deadly disease. The advent of Artificial Intelligence and Deep Learning has provided hope for fast and accurate medical diagnosis, which is vital for healthcare facilities. Image classification, the technique of extracting patterns and information from an image, plays a major role in medical image analysis. With the applications of image classification and deep learning, a disease such as pneumonia can be identified simply by feeding radiographs to the processing model, which then classifies them into distinct classes: affected or normal.

Keywords: Pathology, Diagnosis, Deep Learning, Classification

1 Introduction

Ailments affecting the lungs and respiratory system are most prevalent in three regions: the lung airways, the tissues of the lung, and the blood vessels surrounding the lung. These maladies can be fatal if not diagnosed at the earliest possible stage and remedial measures administered promptly. Prompt diagnosis of these diseases is therefore vital, and the need of the hour considering the ongoing global pandemic and the resulting weakening of respiratory immunity. Assistance to the medical community can present itself in the form of diagnosis systems engineered using advancements in the fields of image classification and deep learning. Chest radiographs, made available in collections such as the one used by this project and published by the National Institutes of Health (NIH), can aid the creation of these medical systems and help elevate the accuracy of diagnosis, which is essential in such systems.

The objective of this paper is to develop and compare CNN models [1], considering previous attempts [2][3][4][5] and addressing their drawbacks in order to maximise efficiency, and to perform a comparative analysis. To achieve the highest accuracy, the models trained in this project are MobileNet [7], DenseNet [8] and ResNet [9], each separately concatenated with a GAP layer [20], five convolution layers, and a fully connected layer as the final output layer; MobileNet in particular is a class of convolutional neural networks designed for embedded vision applications [15]. The number of layers was decided through extensive training to find the optimal accuracy, taking into account the associated computational costs and complexity. Three CNN models [13] are created and trained to classify, and hence accurately predict, radiograph images as normal or affected. Accuracy was determined using evaluation metrics such as ROC curves [24] and similarity matrices, which are favoured in medical imaging systems due to their ability to depict how accurately a model classifies the radiographs.

2 Related Work

There have been previous attempts to tackle this problem of classifying radiographs with near-perfect accuracy. Listed below are some citations which informed our study:

Xiaosong Wang et al. [2] published a paper which aimed at collecting chest radiographs from different sources using natural language processing extraction methods, naming the resulting dataset ChestX-ray8, comprising 108,948 images segregated into eight disease classes. The paper also demonstrated a vanilla CNN model able to detect and spatially locate thoracic diseases of these eight classes. The authors implemented a supervised multi-label image classification and localization framework, which was validated on the same dataset. The proposed model achieved an average accuracy of 0.7891 across the eight diseases.

Gerardo Luis et al. [3] proposed a lung disease detection model implementing feature extraction and an extreme learning machine (ELM), primarily to detect three prominent classes: Pulmonary Fibrosis (PF), Chronic Obstructive Pulmonary Disease (COPD), and Healthy Lung (HL) samples. The dataset used for the model was an image cluster set produced with the assistance of the Walter Cantidio Hospital, Fortaleza. The proposed methodology extracted features from segmented CT images and applied the ELM to classify lung disease into the three classes. Additionally, the paper put forward an algorithm for analysing CT images on personal workstations, which reduced software dependencies and the costs incurred with dedicated CT workstations. The model's performance was evaluated using metrics and confusion matrices, yielding a score of 0.85 owing to sensitivity and specificity issues.

Jooae Choe et al. [4] approached the problem by applying content-based image retrieval using deep learning to detect interstitial lung disease in CT radiographs. The paper applied content-based image retrieval to a database consisting of four disease classes: usual interstitial pneumonia (UIP), nonspecific interstitial pneumonia (NIP), cryptogenic organizing pneumonia, and chronic hypersensitivity pneumonia. Natural language queries were executed to extract images from the resulting database. A deep learning model was created, and its ability to compare and contrast the extent of different regional diseases was quantified using evaluation metrics. The results were compared before and after content-based image retrieval, yielding positive results, with overall accuracy increasing from 0.524 to 0.728.

Naman Gupta et al. [5] published their findings on applying evolutionary algorithms to perform the same task. Four evolutionary algorithms were applied to accurately classify images into two classes: Chronic Obstructive Pulmonary Disease (COPD) and Fibrosis. These evolutionary algorithms ensure that an optimal, lower-dimensional subset is created for model training to reduce computational costs. The optimal subsets were then fed into different classification models such as random forest classifiers, support vector machines, and so on. The highest accuracy, 0.994, was achieved by K-nearest neighbours.

Siddhanth Tripathi et al. [6] used a publicly published dataset named ChestX-ray8 to train different versions of deep learning models and compared their accuracy with respect to certain factors. Both machine learning and deep learning techniques were deployed, primarily CNN and CapsNet with spatial transformation techniques. Training and evaluation resulted in an accuracy of 0.67 with a combination of a CNN built upon the Visual Geometry Group (VGG) architecture and a spatial transformer network.

3 Proposed Work

Pre-trained convolutional neural networks (CNNs), namely MobileNet [7], DenseNet [8] and ResNet [9], function as the base layer of our model, and transfer learning [25] methods are applied to fine-tune the network to this particular dataset. In addition, four supplementary layers consisting of GlobalAveragePooling [20], dense and dropout layers are added to maximise accuracy and efficiency. Past attempts have only used the pre-trained models; in this paper, with the application of transfer learning and additional layers, a higher accuracy is reached at insignificant computational cost. The final output layer has the same number of neurons as the number of classes (14 disease classes + 1 normal class), and a sigmoid activation is employed. After the model is trained, the network outputs percentages corresponding to the pathology the radiograph most closely exhibits, as most of these diseases have similar symptoms and presentations. Thus, our model can be used as a preliminary screening tool for medical practitioners to streamline their process of disease evaluation and elimination. These models are then evaluated based on metrics such as categorical accuracy and ROC/AUC curves.
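As a concrete illustration, here is a minimal Keras sketch of this arrangement, assuming DenseNet121 as the pre-trained base (the same pattern applies to MobileNet and ResNet); the dense-layer width and dropout rate are illustrative placeholders, not the exact values used in our experiments:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 15  # 14 disease classes + 1 normal class

# Pre-trained base with ImageNet weights; the original classifier head is removed.
base = tf.keras.applications.DenseNet121(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
base.trainable = False  # transfer learning: freeze the base before fine-tuning

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),   # GAP layer: reduces H x W x C maps to C values
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.5),               # regularization against overfitting
    layers.Dense(NUM_CLASSES, activation="sigmoid"),  # independent per-class probabilities
])

# Binary cross-entropy treats the 15 outputs as independent binary problems.
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(multi_label=True)])
```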

The dataset we will be training our model on, titled ChestX-ray8 [2], was released by the National Institutes of Health, the medical research agency of the United States of America, and consists of more than 110,000 chest radiographs with labelled pathologies. These images were extracted using text annotation and query extraction methods and gathered from over 30,000 patients, who remain anonymous for security reasons. The images are segregated into 15 classes: 14 disease classes and one normal class. For clarity, peruse Figures 1 and 2 to see how each disease can be distinguished from the rest and the correlations between them.

Figure 1: Pathology Description

Figure 2: Pathology Correlation

3.1 Convolutional Neural Network Architecture

Convolutional Neural Networks [13] are feed-forward networks consisting of convolutional layers, pooling layers and fully connected layers, with different activation functions suitable for individual layers as required. In this project, various pre-trained CNN models were used with different activation functions employed in order to maximise feature extraction efficiency and accuracy.

Figure 3: Graph plot of Activation Functions

Convolutional layers [14] are the building blocks of CNNs. The convolution operation is defined in statistics as the merging of two functions. The images are first converted into matrices of equal size, and a convolutional filter is applied, performing element-wise multiplication and storing the sum in a feature map (a 3×3 output in the example below). Numerous such filters are applied to the input matrix to extract features, generating a stack of feature maps which forms an individual convolutional layer.
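For example, a small NumPy sketch of this multiply-and-sum operation on a made-up 5×5 image (CNN frameworks implement it as cross-correlation, i.e. without flipping the kernel):

```python
import numpy as np

image = np.arange(25).reshape(5, 5)          # toy 5x5 grayscale image
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]])              # 3x3 filter (the Sobel Gx kernel)

# "Valid" convolution: slide the 3x3 filter over the image, multiply
# element-wise and sum, producing one entry of the 3x3 feature map.
out = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        out[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)
print(out)
```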

Activation functions: These are functions essential for determining whether a particular neuron in the neural network activates, based on various mathematical functions. Different activation functions can be employed in order to maximize the accuracy and quality of neural networks. The activation functions used here differ from each other in order to maximize feature extraction and mitigate the vanishing gradient problem.

For the last dense layer, the binary cross-entropy [17] loss function is used, as opposed to the softmax function. This is because the probabilities of the 15 different classes should be independent of each other, which cannot be achieved with softmax. The network is thus internally able to create 15 different models (one for each class) and compute their probabilities independently. Accordingly, the sigmoid function is employed to convert the multi-label classification into an n-binary classification problem.
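A toy numerical comparison (the logits are made-up values) shows why sigmoid suits the multi-label case: softmax forces the class probabilities to compete for a single unit of probability mass, while sigmoid scores each class independently:

```python
import numpy as np

logits = np.array([2.0, 1.5, -1.0])  # toy scores for three of the classes

softmax = np.exp(logits) / np.exp(logits).sum()
sigmoid = 1.0 / (1.0 + np.exp(-logits))

print(softmax, softmax.sum())  # sums to 1: classes become mutually exclusive
print(sigmoid)                 # each value in (0, 1) independently: a radiograph
                               # may exhibit several pathologies at once
```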

These activation functions in combination are commonly used in developing CNNs, as they deal with the vanishing gradient problem and provide the nonlinearity between layers.

Optimizers are algorithms used to adjust the parameters of the neural network, such as the neuron weights and the learning rate, to minimise loss. The optimizer used in this paper is Adam [16], short for Adaptive Moment Estimation, which combines RMSProp-style scaling with momentum-based gradient descent. Since our dataset involves a large number of parameters and data points, Adam is most useful, as it adapts to the loss function and tunes the parameters with minimal cost incurred.
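For reference, the update rules of Adam from [16], combining a momentum-style first moment $m_t$ with an RMSProp-style second moment $v_t$ of the gradient $g_t$:

$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\,g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\,g_t^2$$

$$\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t}, \qquad \theta_{t+1} = \theta_t - \frac{\eta\,\hat{m}_t}{\sqrt{\hat{v}_t}+\epsilon}$$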

Pooling layers are usually employed in CNNs to downsample feature maps, summarising regions by their maximum or average pixel intensity. Global Average Pooling (GAP) [22] layers are used in this paper to reduce the dimensionality and complexity of the image representation. This helps the network retain only the most essential parts of the images.

Flattening and fully connected layers are used to flatten the matrices, further reducing the dimensionality and producing a uniform feature vector which can be used for classification. This vector is then fed to the FC block, which consists of multiple densely connected layers that combine the extracted features. This pass is known as forward propagation, after which the loss is calculated; the loss function used in this model is cross-entropy [10], since this is a classification problem. The loss is then backpropagated, and the weights are modified using the Adam optimizer.

Additional features employed in this model are algorithms such as ModelCheckpoint, ReduceLROnPlateau and EarlyStopping. ModelCheckpoint maintains a record of the best-performing weights, so we can revert to them if the model does not perform as expected. ReduceLROnPlateau reduces the learning rate in case gradient descent ends up on a plateau and misses the global minimum. EarlyStopping is used to avoid overfitting to the training data and is a form of regularization.
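A sketch of how these three algorithms are typically wired up as Keras callbacks; the monitored quantity, patience values and checkpoint path are illustrative choices, not the exact settings of our runs:

```python
from tensorflow.keras.callbacks import (
    ModelCheckpoint, ReduceLROnPlateau, EarlyStopping)

callbacks = [
    # Keep only the weights that achieved the best validation loss so far.
    ModelCheckpoint("best_weights.h5", monitor="val_loss",
                    save_best_only=True, save_weights_only=True),
    # Shrink the learning rate when validation loss plateaus, so gradient
    # descent can escape flat regions and approach the global minimum.
    ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=3),
    # Stop training once validation loss stops improving (regularization).
    EarlyStopping(monitor="val_loss", patience=8,
                  restore_best_weights=True),
]
```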

3.2 Model

3.2.1 MobileNet

MobileNet [7] is the first of the three pre-trained models considered for this paper. MobileNet was introduced in 2017 and is regarded as TensorFlow's first mobile computer vision model. The base structure of MobileNet employs streamlined depthwise separable convolutions to construct neural networks that are extensive but comparatively lightweight. It exhibits these lightweight properties primarily because depthwise separable convolutions reduce the number of parameters significantly. The architecture of MobileNet is set out below.

Depthwise separable convolutions [26] can be motivated by the principle behind the Sobel filter [27], whereby an image filter matrix is separated into its vertical and horizontal components, which are computed separately.

Figure 4: MobileNet Layers

This reduces the number of multiplications required to process the images and ultimately hastens the process. Here, two 3×3 matrices are used as filters, which are convolved with the image to generate its derivatives: one product each for the horizontal and vertical components.

For example, consider the two filters

$$G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} \quad \text{and} \quad G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$$

Considering the original image to be the matrix A, when passed through a depthwise separable filter the computations are as follows:

$$G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} * A, \qquad G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} * A$$

These filters can be further decomposed into the product of an averaging and a differentiation kernel. As a result, the above equations can be simplified as:

$$G_x = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \begin{bmatrix} -1 & 0 & 1 \end{bmatrix} * A, \qquad G_y = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 \end{bmatrix} * A$$

Figure 5: MobileNet Architecture

MobileNet employs 3×3 depthwise separable convolutions, resulting in about 8-9 times fewer computations than standard convolutions. From these computations, we can see exactly why MobileNet is able to produce such lightweight networks with fast computation.
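This reduction can be made precise with the cost comparison from [7]: for a $D_K \times D_K$ kernel, $M$ input channels, $N$ output channels and a $D_F \times D_F$ feature map, the ratio of depthwise separable to standard convolution cost is

$$\frac{D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F}{D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F} = \frac{1}{N} + \frac{1}{D_K^2}$$

which for $D_K = 3$ and large $N$ is close to $1/9$, i.e. roughly 8 to 9 times fewer multiply-accumulate operations.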

All layers of MobileNet are followed by batch normalization and a ReLU nonlinearity, with the exception of the final layer, which utilizes a softmax layer for classification. Additional layers can be added based on image dimensions and the degree of classification. The architecture of MobileNet is displayed in Fig. 5.

3.2.2 ResNet

Residual Networks, or ResNets, were introduced in 2015 by He et al. [9] when the deep learning community faced the conundrum of whether building better networks was as simple as stacking more layers. Research showed that as layers were added, training error also increased as a result of vanishing/exploding gradients. As the number of layers increased, the accuracy flat-lined and subsequently started declining rapidly. This problem was addressed with the introduction of ResNets, where the addition of residual blocks helps eliminate the vanishing gradient problem.

Through the implementation of residual blocks, researchers have been able to overcome the gradient problem which had limited learning for a prolonged period of time. Residual blocks are based on a technique called skip connections, which enables the network to skip some layers by connecting the activations of earlier layers to later layers, bypassing the layers in between. This is especially beneficial because any layers found to hurt the performance of the model can now be skipped, and effectively ignored through regularization.

Figure 6: Structure of Residual Blocks

The degradation problem faced by networks with an extensive number of layers is addressed by designing a deep residual learning framework, where the layers are explicitly fitted to a residual mapping. Formulating the desired mapping of these layers as H(x), the network instead fits the nonlinear mapping F(x) = H(x) - x. As a result, the original mapping is recast into F(x) + x, which is simpler to optimize than the unreferenced, extensive mapping present in conventional networks. This formulation can be realized by feedforward neural networks with shortcut connections, which enable the network to skip one or two layers. These connections perform the primary function of identity mapping, and their outputs are added to the output of the stacked layers, as shown in Fig. 6.

There are a wide number of variants of the ResNet architecture, all titled differently. For this paper, the ResNetX model is implemented, which consists of the architecture shown in Fig. 7.
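A minimal functional-style Keras sketch of the residual formulation H(x) = F(x) + x described above; the filter count is illustrative, and a projection shortcut would be required wherever the dimensions of x and F(x) differ:

```python
from tensorflow.keras import layers

def residual_block(x, filters=64):
    """Two stacked conv layers whose output F(x) is added to the
    identity shortcut x, giving H(x) = F(x) + x."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])   # skip connection: identity mapping
    return layers.ReLU()(y)
```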

3.2.3 DenseNet

DenseNets [8] are another class of networks established to alleviate the vanishing/exploding gradient problem that arises with an extensive number of layers. Other models such as ResNet and Highway Networks [28] overcame this problem by implementing skip and shortcut connections.

Figure 7: ResNet Architecture

Figure 8: DenseNet Architecture

DenseNet resolves this problem by simplifying the connection pattern between the convolutional layers and modifying the system architecture. In traditional CNNs, each layer receives the output of the previous layer in a feed-forward fashion, implying that for N layers there are N direct connections between consecutive layers. DenseNet instead connects each layer directly with every other layer, hence the name Densely Connected Networks; there are therefore N(N+1)/2 direct connections among the layers of a DenseNet. This configuration ensures maximum information and gradient flow by maximizing feature reuse.

With this configuration, the output feature maps in the DenseNet are not summed together but instead concatenated and used as inputs by the subsequent layers. As a result, DenseNet can function with fewer parameters than an orthodox CNN, paving the way for feature reuse, where the features that contribute most to image classification are retained and redundant feature maps are discarded. Consider the output $x_l$ of layer $l$, which is a concatenation of the preceding outputs $x_0, x_1$ and so on, where each $x_i$ is the output of layer $i$ after the composite function has been applied. Thus, $x_l$ is formulated as follows:

$$x_l = H_l([x_0, x_1, \ldots, x_{l-1}])$$

where $[x_0, \ldots, x_{l-1}]$ is the concatenation of the feature maps computed in the previous layers and $H_l$ refers to the composite function of operations such as BatchNorm, ReLU and pooling. The multiple inputs to $H_l$ can be densely packed into a single tensor for ease of implementation, hence the title DenseNet.
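A hedged Keras sketch of one dense block implementing this concatenation pattern, assuming the BN-ReLU-Conv composite function of [8]; the growth rate and block depth are illustrative placeholders:

```python
from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=32):
    """Each layer receives the concatenation of all preceding feature
    maps [x0, x1, ..., x_{l-1}] and contributes growth_rate new maps."""
    features = [x]
    for _ in range(num_layers):
        h = layers.Concatenate()(features) if len(features) > 1 else features[0]
        h = layers.BatchNormalization()(h)   # composite function H_l:
        h = layers.ReLU()(h)                 # BatchNorm -> ReLU -> Conv
        h = layers.Conv2D(growth_rate, 3, padding="same")(h)
        features.append(h)
    return layers.Concatenate()(features)
```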

3.3 Pipeline

The project pipeline follows the conventional deep learning approach, involving components such as data pre-processing, model training and model evaluation. The number of epochs for this model was fixed at 50 after training and testing, taking into account the associated computational resources and the size of the dataset. Additional modules such as EarlyStopping, ReduceLROnPlateau and ModelCheckpoint were also utilised to ensure that training is halted if there is no significant progress or if accuracy starts to decline after a set number of epochs. The best-performing parameters are saved as best weights for reference and future usage. The system architecture is illustrated as follows:

    Figure 9: System Architecture and Design
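Putting these pieces together, a sketch of the resulting training call, reusing the hypothetical `model` and `callbacks` names from the earlier sketches and the data generators described in the Training subsection below:

```python
# train_gen / val_gen are the augmented generators described in the
# Training subsection; 50 epochs matches the budget fixed above.
history = model.fit(
    train_gen,
    validation_data=val_gen,
    epochs=50,
    callbacks=callbacks,  # ModelCheckpoint, ReduceLROnPlateau, EarlyStopping
)
```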

3.3.1 Dataset

A chest radiograph dataset titled ChestX-ray14, 41.86 GB in size, was downloaded from the Kaggle forum, as released by the National Institutes of Health, USA. This dataset consists of a total of 112,120 images spanning 15 different classes. Each of these categories has been annotated with labels in a linked CSV file.

3.3.2 Training

In the beginning, the dataset was examined for low-quality images and possible outliers. Statistical analysis was performed to determine whether the data was skewed in one direction. Results showed that the data was skewed, hence normalization and regularization needed to be done. Data augmentation was performed using the ImageDataGenerator module, and the dataset size was increased by altering the angle of viewing and so on. The radiographs were then fed to the different models in batches of 32 with image dimensions set to 224×224. Training was performed on an NVIDIA GeForce RTX 2070 with Max-Q design. With CUDA acceleration, the model processed 100 training images per batch, along with 10 test images. The learning rate was initially set to 0.3 and was modified as the epochs progressed.
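A sketch of this augmentation and batching step, assuming Keras's ImageDataGenerator; the CSV path, column names and transform ranges are hypothetical placeholders rather than the exact settings used:

```python
import pandas as pd
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Hypothetical CSV linking image filenames to their pathology labels
# (the NIH release ships such a CSV; names here are placeholders).
labels_df = pd.read_csv("labels.csv")

datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # normalization of pixel intensities
    rotation_range=10,        # "altering the angle of viewing"
    width_shift_range=0.05,
    height_shift_range=0.05,
    validation_split=0.2,     # hold out a validation subset
)

train_gen = datagen.flow_from_dataframe(
    labels_df, directory="images/",
    x_col="filename", y_col="labels",   # y_col may hold lists of labels (multi-hot)
    target_size=(224, 224),             # image dimensions set to 224
    batch_size=32,                      # batches of 32 as described above
    class_mode="categorical", subset="training")

val_gen = datagen.flow_from_dataframe(
    labels_df, directory="images/",
    x_col="filename", y_col="labels",
    target_size=(224, 224), batch_size=32,
    class_mode="categorical", subset="validation")
```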

4 Experimental Results

To evaluate the performance of the different models, the metrics considered were accuracy, validation and testing loss, supplemented with sensitivity and specificity values, which were then used to construct the AUC-ROC curve. Accuracy is measured at each stage of an epoch, where a batch of images is used to test how accurately the model classifies the images into their respective classes. Loss is calculated via the loss function during backpropagation and indicates how far our predictions are from the actual target class.

Sensitivity is the metric that displays the proportion of positive classes that were correctly classified; it is also called the true positive rate. The higher the sensitivity score, the better the model is at accurately classifying images. Specificity, on the other hand, is the metric that signifies the proportion of negative classes that were correctly classified; it is also known as the true negative rate. Both these metrics are plotted to construct the Receiver Operating Characteristic (ROC) curve, from which the Area Under the Curve (AUC) is calculated at different thresholds. These metrics are calculated using the following formulae:

Sensitivity = TP / (TP + FN)    (3)

Specificity = TN / (TN + FP)    (4)

FNR = FN / (TP + FN)    (5)

FPR = FP / (TN + FP) = 1 - Specificity    (6)

Figure 10: ROC Curve for DenseNet Configuration

False Negative Rate (FNR) and False Positive Rate (FPR) are additional metrics which tell us the proportion of images that have been incorrectly classified (for the positive and negative classes respectively).

The ROC curve is an evaluation metric designed for classification problems; it is essentially a probability curve that plots TPR against FPR at different classification thresholds, with FPR on the X-axis and TPR on the Y-axis. This curve can be summarized by the AUC score, which indicates the ability of the classifier to distinguish between classes. The ROC curve for this model is depicted in Fig. 10.
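A sketch of how the per-class curves and AUC scores can be computed with scikit-learn, with random placeholder arrays standing in for the multi-hot ground truth and the sigmoid outputs:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# y_true: (num_samples, 15) multi-hot labels; y_pred: sigmoid outputs.
y_true = np.random.randint(0, 2, size=(100, 15))   # placeholder labels
y_pred = np.random.rand(100, 15)                   # placeholder scores

for c in range(y_true.shape[1]):
    fpr, tpr, thresholds = roc_curve(y_true[:, c], y_pred[:, c])
    auc = roc_auc_score(y_true[:, c], y_pred[:, c])  # area under the ROC curve
    print(f"class {c}: AUC = {auc:.3f}")
```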

The different AUC values for each disease class and model can also be visualised in the form of a table, seen in Table 1.

| Disease Class | DenseNet AUC | ResNet AUC | MobileNet AUC |
|---|---|---|---|
| Cardiomegaly | 0.913 | 0.53 | 0.825 |
| Emphysema | 0.936 | 0.54 | 0.856 |
| Effusion | 0.905 | 0.65 | 0.84 |
| Infiltration | 0.665 | 0.59 | 0.625 |
| Hernia | 0.867 | 0.59 | 0.862 |
| Mass | 0.886 | 0.52 | 0.704 |
| Nodule | 0.79 | 0.56 | 0.605 |
| Atelectasis | 0.834 | 0.60 | 0.72 |
| Pneumothorax | 0.906 | 0.52 | 0.839 |
| Pleural Thickening | 0.815 | 0.58 | 0.742 |
| Fibrosis | 0.786 | 0.61 | 0.688 |
| Edema | 0.914 | 0.74 | 0.842 |
| Consolidation | 0.818 | 0.64 | 0.749 |

Table 1: Disease Classes and their AUC scores

It can be inferred from Table 1 that the various models performed well, with the AUC scores for each disease class being greater than 0.5 and close to 1. As a result, our models are able to distinguish between the different classes accurately. The DenseNet configuration achieved higher AUC scores than the other models, and it is also worth noting that it achieved an accuracy of 88.7 percent, higher than previously existing systems. Validation loss was also minimized as the epochs progressed, which is an indicator of a well-trained network.

| Model Design | Accuracy | Val Loss |
|---|---|---|
| CNN+VGG+STN [6] | 0.677 | 0.699 |
| VDSNet [23] | 0.733 | 0.589 |
| DenseNet121 | 0.887 | 0.376 |

Table 2: Model Performance Comparison

5 Conclusion

This paper proposes different CNN models for accurately identifying normal and affected cases in chest radiograph images. For optimal accuracy, the CNN models were supplemented with algorithms such as EarlyStopping, ReduceLROnPlateau and ModelCheckpoint to ensure that computational costs do not become exorbitant in the trade-off for high accuracy. As a result, the different CNN models were able to achieve near-optimal scores in terms of accuracy and AUC. Thus, it can be concluded that these CNN models can be used by medical practitioners and doctors as a diagnostic tool for early detection of pathologies and other aberrations. The CNN models were successfully implemented by utilizing various optimization strategies such as adjusting learning rates, tuning the number of epochs and batch sizes, and adding fully connected layers to maximise the effect of backpropagation. As a result, this model can be deployed by medical practitioners to quickly process chest radiographs and detect lung diseases as early as possible in children as well as adults, ultimately increasing the efficacy of the healthcare system in diagnosis and treatment.

To verify the performance of our model, we cross-referenced the evaluation metrics of our neural network with those of other attempts at the same task; our model performed better than these algorithms, as seen in Table 2.

Future Enhancement

In the future, with further optimisation of neural networks and advancements in computational resources, diagnostic models can be implemented which outperform every previous model and achieve near-perfect accuracy. It can also be expected that neural network models will be coupled with ensembling, bagging, boosting and other concepts such as genetic programming in order to train and deploy high-quality diagnostic models. To deal with vast volumes of data and images, the training process can be transferred to a cloud computing system, where multiple GPUs can be utilised to hasten training. A multidisciplinary effort in which medical practitioners configure these models and aid in feature extraction will also see a monumental increase in model accuracy and usage. Medical professionals would also need training to understand the functioning of these diagnostic tools at their disposal.

References

[1] Jeffrey, Ahmad. Disease Detection with Deep Convolutional Architecture.

[2] Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., Summers, R. M. (2017). ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. ArXiv.

[3] Ramalho, G. L. B., Rebouças Filho, P. P., Medeiros, F. N. S. de, Cortez, P. C. (2014). Lung disease detection using feature extraction and extreme learning machine. Revista Brasileira de Engenharia Biomédica, 30(3).

[4] Choe, J., Hwang, H. J., Seo, J. B., Lee, S. M., Yun, J., Kim, M.-J. Content-based Image Retrieval by Using Deep Learning for Interstitial Lung Disease Diagnosis with Chest CT.

[5] Gupta, N., Gupta, D., Khanna, A., Rebouças Filho, P. P., de Albuquerque, V. H. C. Evolutionary algorithms for automatic lung disease detection.

[6] Tripathi, S., Shetty, S., Jain, S., Sharma, V. Lung Disease Detection Using Deep Learning.

[7] Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H. (2017). MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. ArXiv.

[8] Huang, G., Liu, Z., Weinberger, K. Q. (2016). Densely Connected Convolutional Networks.

[9] He, K., Zhang, X., Ren, S., Sun, J. (2015). Deep Residual Learning for Image Recognition.

[10] Ilyas, Maksim, Tamerlan. Deep neural network ensemble for pneumonia localization from a large-scale chest x-ray database.

[11] Irvin, J., Rajpurkar, P. Deep learning for chest radiograph diagnosis.

[12] Simonyan, K., Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition. International Conference on Learning Representations.

[13] Alzubaidi, L., Zhang, J., Humaidi, A. J. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions.

[14] Albawi, S., Mohammed, T. A., Al-Zawi, S. Understanding of a convolutional neural network.

[15] He, K., Zhang, X., Ren, S., Sun, J. Deep Residual Learning for Image Recognition.

[16] Kingma, D. P., Ba, J. Adam: A Method for Stochastic Optimization.

[17] Zhang, Z., Sabuncu, M. R. Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels.

[18] Azad, M. M., Ganapathy, A., Vadlamudi, S., Paruchuri, H. Medical Diagnosis using Deep Learning Techniques: A Research Survey.

[19] Mabrouk, A., Díaz Redondo, R. P., Dahou, A. Disease Detection on Chest X-ray Images Using Ensemble of Deep Convolutional Neural Networks.

[20] Szegedy, C., Liu, W., Jia, Y., et al. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9, Boston, USA.

[21] Abiyev, R. H., Ma'aitah, M. K. (2018). Deep convolutional neural networks for chest diseases detection. Journal of Healthcare Engineering, vol. 2018, 12 pages.

[22] Ghosh, A., Bhattacharya, B., Basu Roy Chowdhury, S. AdGAP: Advanced Global Average Pooling.

[23] Bharati, S., Podder, P., Mondal, M. R. H. (2020). Hybrid deep learning for detecting lung diseases from X-ray images. Informatics in Medicine Unlocked, 20, 100391. https://doi.org/10.1016/j.imu.2020.100391

[24] Bradley, A. P. (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30(7), 1145-1159.

[25] Zhuang, F., Qi, Z., Duan, K., Xi, D., Zhu, Y., Zhu, H., Xiong, H., He, Q. (2019). A Comprehensive Survey on Transfer Learning.

[26] Chollet, F. (2016). Xception: Deep Learning with Depthwise Separable Convolutions.

[27] Kanopoulos, N., Vasanthavada, N., Baker, R. L. Design of an image edge detection filter using the Sobel operator. IEEE Journal of Solid-State Circuits.

[28] Srivastava, R. K., Greff, K., Schmidhuber, J. (2015). Highway Networks. ArXiv.