
Breast Cancer Detection by Utilizing Convolutional Neural Network: A Strategy by Making Use of Machine Learning

DOI : 10.17577/IJERTCONV14IS010088

Rahul A.

Student, St. Joseph Engineering College, Mangalore

Hareesh B.

HOD, St. Joseph Engineering College, Mangalore

Abstract – Breast cancer is the leading type of cancer faced by women worldwide, making early detection vital for effective treatment. This paper presents an automated system for detecting breast cancer using a convolutional neural network (CNN). The system assesses medical images to categorize breast tissue as benign, malignant, or normal. It achieved a test accuracy of 97.3%, indicating strong potential for clinical use. The system also features a web interface enabling healthcare professionals to upload images and receive instant predictions, including confidence scores and clinical insights. Additionally, it offers automated report generation and database management to facilitate comprehensive patient records and support clinical decision-making.

Index Terms – Breast cancer, Convolutional Neural Networks (CNN), Medical imaging, Machine learning, Deep learning

  1. INTRODUCTION

    Breast cancer remains a common cause of cancer-related mortality among women worldwide. The WHO estimates that approximately 2.3 million cases of breast cancer occur each year, making it the most widespread cancer among women. Detecting the disease early significantly enhances treatment success and survival rates, with five-year survival rates exceeding 90% when the disease is diagnosed promptly. The socioeconomic burden of breast cancer is considerable, impacting healthcare systems, patients' quality of life, and economic productivity due to treatment expenses and work absences.

    Traditionally, mammography is used as the primary screening technique for breast cancer, while ultrasound and MRI rely heavily on radiologists' expertise. These methods can be influenced by human error and differences in interpretation. For instance, mammography's effectiveness drops in dense breast tissue, resulting in missed diagnoses in as many as 20% of cases. Machine learning and artificial intelligence have been integrated into medical imaging and have demonstrated promising potential to enhance accuracy and cut down interpretation time, helping to overcome these challenges.

    Deep learning, especially convolutional neural networks (CNNs), has shown outstanding results in medical image analysis tasks. CNNs automatically detect key features, such as edges, textures, and intricate patterns, in medical images, aiding in the uncovering of abnormalities that human observers might miss. This paper provides a comprehensive study on developing and deploying a CNN-based framework for detecting breast cancer. The model categorizes breast tissue images into benign, malignant, and normal classes. It is built to be scalable and to integrate seamlessly into current clinical workflows, marking a substantial step forward in AI-enhanced diagnostics.

  2. LITERATURE REVIEW

    1. Traditional Detection of Breast Cancer

      Traditional breast cancer detection mainly relies on imaging techniques like mammography, which has been the primary screening method for many years. Mammography utilizes low-dose X-rays to detect microcalcifications and masses; however, its effectiveness is reduced by high false-positive rates. This is especially true in dense breast tissue, where sensitivity can drop to as low as 48%. Ultrasound is used alongside mammography to improve imaging in dense breasts, but it requires highly skilled operators and lacks standardization. MRI is costly and time-consuming and requires specialists, which limits its routine use for screening.

    2. Machine Learning in Medical Imaging

      The application of machine learning to medical imaging has grown significantly in recent years. Support Vector Machines (SVM) and Random Forest algorithms have been used to classify breast cancer by analyzing handcrafted features, such as texture and shape. Still, these methods often need detailed feature engineering and face challenges with high-dimensional data. Research shows these approaches achieve accuracies between 85% and 90%, suggesting there is potential to improve their robustness and ability to generalize.

    3. Approaches in Deep Learning

    Recent studies confirm that deep learning models excel at analyzing medical images. CNNs, in particular, have proven effective in interpreting mammograms, with several studies showing accuracy levels that match or surpass those of experienced radiologists. For example, Shen et al. (2019) reported an accuracy of 94.5% using a deep learning model on a large mammography dataset. These advancements highlight the potential of CNNs to revolutionize breast cancer screening by reducing human error and improving diagnostic consistency.

  3. METHODOLOGY

    1. Problem Statement

      The main challenge in this research is creating an accurate, automated system that detects breast cancer to help healthcare professionals with early diagnosis and classification of breast tissue abnormalities. The system aims to surpass the limitations of traditional methods by leveraging deep learning to provide dependable, real-time classifications.

    2. Dataset Description

      The dataset used in this study comprises medical images gathered from several publicly accessible sources, including the Digital Database for Screening Mammography (DDSM) and the Breast Ultrasound Image Dataset. The collection process included consolidating various sources, applying tailored preprocessing steps to improve image quality, curating data to verify label accuracy, and balancing the classes: benign, malignant, and normal. The dataset comprises over 10,000 images, with approximately 3,500 images per class, ensuring sufficient statistical robustness.

    3. Data Preprocessing

      Preprocessing involves resizing images to 224×224 pixels to ensure uniform input, normalizing pixel values to the range [0, 1] by dividing by 255, and applying data augmentation techniques such as rotation (up to 20°), scaling (0.9 to 1.1), and horizontal flipping. These measures increase dataset variability and help minimize overfitting during training.
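As a rough illustration, the normalization and flipping steps above can be sketched as follows. The function names and shapes are illustrative, not the paper's code; resizing to 224×224, rotation, and geometric scaling would normally be handled by a library such as Pillow or Keras' ImageDataGenerator.

```python
import numpy as np

def preprocess(image: np.ndarray) -> np.ndarray:
    """Normalize a uint8 image to float32 values in [0, 1] by dividing by 255."""
    return image.astype(np.float32) / 255.0

def random_horizontal_flip(image: np.ndarray, rng) -> np.ndarray:
    """Flip the image left-right with probability 0.5 (one of the augmentations described)."""
    return image[:, ::-1, :] if rng.random() < 0.5 else image
```

In training, such augmentations are applied on the fly, so each epoch sees slightly different variants of the same images.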

    4. Model Architecture

      The proposed CNN architecture features multiple convolutional layers, followed by pooling and fully connected layers. It starts with 32 filters of size 3×3, then escalates to 64 and 128 filters in subsequent layers, all activated with ReLU. Max-pooling reduces spatial dimensions, while dropout layers with a rate of 0.25 help mitigate overfitting. The final fully connected layer outputs class probabilities via the softmax activation function.
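To make the shapes concrete, the spatial arithmetic of such a stack can be traced with a small sketch. It assumes 3×3 "valid" (unpadded) convolutions with stride 1 and non-overlapping 2×2 max-pooling, since the paper does not state the padding scheme.

```python
def conv2d_out(size: int, kernel: int = 3, stride: int = 1, padding: int = 0) -> int:
    """Spatial size after a convolution: floor((size - kernel + 2*padding) / stride) + 1."""
    return (size - kernel + 2 * padding) // stride + 1

def maxpool_out(size: int, pool: int = 2) -> int:
    """Spatial size after non-overlapping max-pooling."""
    return size // pool

# Trace the three conv + pool blocks (32, 64, 128 filters) from a 224x224 input.
size = 224
for filters in (32, 64, 128):
    size = maxpool_out(conv2d_out(size))
    print(f"after {filters}-filter block: {size}x{size}")

print("flattened features:", size * size * 128)
```

Under these assumptions the feature map shrinks 224 → 111 → 54 → 26, giving 26 × 26 × 128 = 86,528 features entering the fully connected layers.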

    5. Training Configuration

      The training setup involves an input of 224 × 224 × 3 pixels for RGB images and utilizes a batch size of 32. The Adam optimizer starts with a learning rate of 0.001, which is reduced tenfold every 10 epochs. The loss function used is categorical cross-entropy, and the metrics tracked include accuracy, precision, recall, and F1-score. The training duration is 50 epochs on an NVIDIA GTX 1080 Ti GPU, with early stopping employed to prevent overfitting.
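The step-decay schedule described above (learning rate divided by 10 every 10 epochs) can be written as a small helper; in a Keras setup it would typically be passed to a LearningRateScheduler callback. This is a generic sketch, not the paper's training script.

```python
def lr_schedule(epoch: int, initial_lr: float = 0.001, drop: float = 0.1, step: int = 10) -> float:
    """Step decay: the Adam learning rate is reduced tenfold every 10 epochs."""
    return initial_lr * drop ** (epoch // step)

# Epochs 0-9 train at 0.001, epochs 10-19 at 0.0001, and so on.
```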

      IV. SYSTEM IMPLEMENTATION

      1. Web-Based Interface

        The system is built as a web app using Flask, providing an intuitive interface for healthcare professionals. It features patient information management with secure data entry, image upload capabilities with format validation (e.g., .png, .jpg), real-time prediction results displayed with confidence scores, and automated report generation in PDF format.
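The format validation mentioned above can be as simple as an extension whitelist checked before a file is accepted. The exact extension set is an assumption; this helper is illustrative, not the paper's code.

```python
import os

ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg"}  # assumed set of accepted formats

def is_allowed_upload(filename: str) -> bool:
    """Accept an uploaded file only if its extension is on the whitelist."""
    return os.path.splitext(filename.lower())[1] in ALLOWED_EXTENSIONS
```

In a Flask route, a check like this would run before the image is saved and passed to the model.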

      2. Database Integration

        The system utilizes a MySQL database for storing user information (e.g., age, medical history), prediction history with timestamps, confidence scores ranging from 0 to 1, clinical recommendations, and audit trails for regulatory compliance. The database is encrypted using AES-256 to ensure data security.
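A minimal sketch of such a schema is shown below, using Python's built-in sqlite3 in place of MySQL so the example is self-contained. Table and column names are assumptions, not the paper's actual schema, and the AES-256 encryption layer is omitted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    age        INTEGER
);
CREATE TABLE predictions (
    id         INTEGER PRIMARY KEY,
    patient_id INTEGER REFERENCES patients(id),
    label      TEXT CHECK (label IN ('benign', 'malignant', 'normal')),
    confidence REAL CHECK (confidence BETWEEN 0 AND 1),
    created_at TEXT DEFAULT CURRENT_TIMESTAMP   -- timestamped history for the audit trail
);
""")
conn.execute("INSERT INTO patients (name, age) VALUES (?, ?)", ("Jane Doe", 52))
conn.execute("INSERT INTO predictions (patient_id, label, confidence) VALUES (?, ?, ?)",
             (1, "benign", 0.97))
row = conn.execute("SELECT label, confidence FROM predictions WHERE patient_id = 1").fetchone()
print(row)
```

The CHECK constraints enforce the three-class labels and the 0-to-1 confidence range at the database level.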

      3. Report Generation

    The automated report generation feature produces detailed PDF reports with the ReportLab library. These reports contain patient demographics, prediction outcomes with confidence scores (such as 97% for malignant), clinical suggestions (like recommending a biopsy for malignant cases), and visualizations of probability distributions using bar charts, which help clinicians interpret the data more easily.

    V. RESULTS AND DISCUSSION

    1. Model Performance

      The developed CNN model achieved an accuracy of 97.3%, with a precision of 96.8%, a recall of 97.1%, and an F1-score of 96.9% on a test set of 12,000 images. These benchmarks were obtained through a 10-fold cross-validation process to ensure reliability.
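For reference, precision, recall, and F1 are derived from the confusion counts as follows; this is the standard computation, not the paper's evaluation code.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Compute precision, recall, and F1 from true/false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Example: 90 true positives, 10 false positives, and 10 false negatives
# give precision = recall = F1 = 0.9.
```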

    2. Class-wise Performance

      The model showed consistent accuracy across normal (98%), benign (96.5%), and malignant (97.5%) classes, with a high sensitivity of 95% for identifying malignant cases. This high sensitivity is vital for minimizing false negatives in clinical environments.

    3. Confidence Score Analysis

      The system assigns confidence scores to each prediction based on the highest softmax probability. Predictions with scores above 95% are highly accurate, with only 2% misclassified, demonstrating the system's reliability for decision support.
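The confidence score is simply the largest softmax probability over the three classes. A plain-Python sketch (the model itself would produce the logits):

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1 (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def confidence(logits):
    """Confidence score: the highest class probability."""
    return max(softmax(logits))
```

Logits that strongly favor one class yield a confidence near 1; uniform logits yield 1/3 for three classes.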

    4. Clinical Recommendations

    The system provides automated clinical recommendations based on the classification: regular monitoring (such as annual ultrasounds) for benign conditions, urgent referrals to oncology with biopsy advice for malignant conditions, and routine screening (such as biennial mammograms) for normal conditions. These guidelines are customized to meet the standards of the World Cancer Society.

    B. Test Set Evaluation

    A separate test set of 2,000 images, not included in the training set, was used to evaluate the final model. The results confirmed the model's effectiveness, with no significant drop in accuracy compared to the validation set.

    C. Clinical Validation

    The system was initially validated clinically by five radiologists who reviewed 100 cases. Their feedback highlighted the system's usability, with 80% of predictions aligning with expert diagnoses, demonstrating strong potential for clinical adoption.

    Fig. 1: Sample Images of categories of Breast Tissue

  4. VALIDATION AND TESTING

    A. Cross-validation

    The model was validated using 10-fold cross-validation, with each fold consisting of 90% training data and 10% validation data. The consistent performance across all folds, indicated by an accuracy standard deviation of less than 1%, confirms the model's robustness.
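The fold construction can be sketched as follows; this is index bookkeeping only, with the training loop and any class stratification omitted, and it assumes the sample count divides evenly by the number of folds.

```python
def kfold_indices(n: int, k: int = 10):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    fold = n // k
    indices = list(range(n))
    for i in range(k):
        val = indices[i * fold:(i + 1) * fold]
        train = indices[:i * fold] + indices[(i + 1) * fold:]
        yield train, val
```

Each sample appears in exactly one validation fold, so every image is scored once; production splitters (e.g. scikit-learn's StratifiedKFold) additionally balance the class mix per fold.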

  5. LIMITATIONS AND CHALLENGES

    1. Dataset Limitations

      The dataset's size of roughly 10,000 images may restrict the applicability of the findings to diverse populations. Variations in image quality arise from source differences, and preprocessing biases introduced during customization might impact real-world accuracy. Moreover, specific demographics, such as Asian populations, are underrepresented, and the data only extends through 2023.

    2. Technical Limitations

      Computational demands necessitate high-end GPUs, which create deployment challenges in low-resource environments. Model interpretability remains a black box, with ongoing research into methods like Grad-CAM. Large-scale applications may encounter real-time processing limits, and dependence on specific hardware reduces portability.

    3. Clinical Implementation Challenges

    Regulatory compliance requires FDA or similar approvals, which involve extensive testing. Integration complexity comes from ensuring compatibility with hospital information systems, and user training is vital for accurately interpreting AI outputs. Additionally, liability issues related to diagnostic mistakes must be addressed.

  6. FUTURE WORK

  1. Model Enhancement

    Future initiatives aim to expand the dataset by partnering with multiple institutions, leveraging advanced models such as ResNet-50, incorporating explainable AI with attention mechanisms, merging multimodal data sources like ultrasound and MRI, and supporting continuous learning through online updates.

  2. System Enhancement

    Scalability improvements include deploying on the AWS cloud, developing applications for iOS and Android, enhancing real-time processing with edge computing, introducing advanced reporting with 3D visualizations, and automating quality control along with statistical process control.

  3. Clinical Studies

Planned activities include large-scale clinical trials across 10 hospitals, real-world testing with 5,000 patients, and assessments of diagnostic accuracy (aiming for 99% sensitivity) and workflow improvements to cut diagnosis time by 30%.

VI. CONCLUSION

This paper introduced a Convolutional Neural Network-based system that classifies breast tissue images into benign, malignant, and normal categories, achieving an accuracy of 97.3%. It highlights the capability of deep learning models in medical imaging and provides a practical tool to support clinical decisions, despite some challenges remaining for wider adoption. The system utilizes a multi-source, customized dataset that provides diverse yet clinically relevant data, contributing to its high accuracy. Features like a web interface, automated report generation, and database integration make it suitable for clinical use. However, issues related to dataset size, computational demands, and regulatory hurdles need to be addressed. Future efforts should focus on expanding datasets, obtaining regulatory approval, and conducting thorough clinical validation to ensure the safety, effectiveness, and usability of such systems in healthcare settings.