DOI : 10.5281/zenodo.20745407
- Open Access
- Authors : Radhika Shinde, Nikita Borade, Srushti More, Sanika Gaikwad
- Paper ID : IJERTV15IS060674
- Volume & Issue : Volume 15, Issue 06 , June – 2026
- Published (First Online): 18-06-2026
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License:
This work is licensed under a Creative Commons Attribution 4.0 International License
Implementation of Multimodal AI Breast Cancer Detection System (MultimodalAI-BCD)
Radhika Shinde
Department of Computer Engineering Jayawantrao Sawant College of Engineering Pune, India
Srushti More
Department of Computer Engineering Jayawantrao Sawant College of Engineering Pune, India
Nikita Borade
Department of Computer Engineering Jayawantrao Sawant College of Engineering Pune, India
Sanika Gaikwad
Department of Computer Engineering Jayawantrao Sawant College of Engineering Pune, India
Abstract – Breast cancer remains one of the leading causes of mortality among women worldwide, making early and ac- curate diagnosis critically important. This paper presents the implementation of a multimodal artificial intelligence system for breast cancer detection using both medical imaging and struc- tured clinical data. The proposed system combines Convolutional Neural Networks (CNNs) for mammogram image analysis with structured patient data, including age, tumor size, family history, and hormonal factors, to improve diagnostic performance. The architecture consists of four major layers: Data Acquisition Layer, Preprocessing Layer, Multimodal AI Fusion Layer, and Prediction Visualization Layer.
The implementation process includes dataset preparation, image preprocessing, feature extraction, multimodal fusion, model training, evaluation, and deployment. Experimental results demonstrate that integrating imaging and structured clinical features improves accuracy, sensitivity, and robustness compared to conventional single-modality approaches, thereby enhancing the reliability of breast cancer diagnosis.
Index Terms – Breast Cancer Detection, Multimodal AI, Deep Learning, Convolutional Neural Networks (CNN), Structured Data, Medical Imaging, Mammography, Healthcare AI, Feature Fusion, Machine Learning
-
INTRODUCTION
Breast cancer is among the most common cancers affect- ing women globally. Early diagnosis significantly increases survival rates and improves treatment outcomes. Traditional diagnosis relies heavily on radiologists interpreting mammog- raphy images, ultrasound scans, and biopsy reports. However, manual diagnosis may suffer from inter-observer variability, delayed interpretation, and limited integration of patient clin- ical history.
Recent advancements in Artificial Intelligence (AI) and Deep Learning have enabled automated medical image anal- ysis with high precision. Most existing systems focus pri- marily on image-based diagnosis while ignoring structured clinical data that can provide additional diagnostic context. Multimodal AI addresses this limitation by integrating med-
ical imaging with structured patient information to improve prediction accuracy and diagnostic reliability.
This paper presents the implementation of a Breast Cancer Detection System using Multimodal AI with image and struc- tured data. The proposed framework combines mammogram image analysis through Convolutional Neural Network (CNN)- based feature extraction with patient metadata processed us- ing machine learning techniques. The integrated architecture improves classification reliability and supports real-time diag- nostic assistance.
The system architecture consists of four major layers: Data Acquisition Layer, Preprocessing Layer, Multimodal AI Fusion Layer, and Prediction Visualization Layer. The im- plementation covers dataset preparation, image preprocessing, feature extraction, structured data processing, multimodal fu- sion, model training, evaluation, and deployment. The pro- posed approach aims to enhance diagnostic performance by leveraging complementary information from both imaging and clinical data sources.
The remainder of this paper is organized as follows. Section
II presents the Bill of Materials, including hardware and software requirements. Section III describes the environment setup and configuration process. Section IV details the imple- mentation of the multimodal AI pipeline. Section V explains the Flask backend implementation. Section VI discusses the security layer implementation. Section VII presents the mobile application implementation. Section VIII describes integration testing, followed by result analysis in Section IX. Section X discusses deployment considerations, and Section XI con- cludes the paper with future directions.
-
BILL OF MATERIALS
-
Hardware Components
Edge Node: High-performance workstation with a multi- core processor and a minimum of 16 GB RAM. GPU:
NVIDIA RTX Series (dedicated GPU for deep learning pro-cessing). Storage: SSD storage for fast dataset access and model loading. Display: High-resolution monitor for medical image visualization. The hardware setup provides sufficient computational power for medical image analysis and AI model training.
-
Software Dependencies
Operating System: Ubuntu 20.04 LTS or Windows 11. Python 3.9+ (via Anaconda distribution). TensorFlow 2.12.0
/ PyTorch 2.1.0 for deep learning model training. OpenCV
4.8.0 for medical image preprocessing. Scikit-learn 1.3.0 for structured data ML pipeline. Flask 3.0.0 for REST API de- ployment. Streamlit 1.28.0 for dashboard interface. DICOM libraries: pydicom 2.4.0 for mammogram image handling. Mobile/Web: Flutter 3.16.0; Dart 3.2.0; packages: dio 5.3.3, flutterlocalnotifications16.1.0, sqflite2.3.0, provider6.1.1.
-
-
Environment Setup
-
Workstation OS Installation and Configuration
Install Ubuntu 20.04 LTS or Windows 11 as the primary operating system. Enable GPU drivers by installing NVIDIA CUDA Toolkit 11.8 and cuDNN 8.6 for deep learning acceler- ation. Configure a static IP address to ensure consistent Flask API endpoint addressing. Set up the Anaconda environment manager to isolate Python dependencies and avoid system- level conflicts.
-
Python Virtual Environment
Create an isolated Python environment to avoid system- level dependency conflicts by initializing a virtual environment named breastaienv. Install PyPI dependencies from a pinned requirements.txt file to ensure reproducibility. Install Tensor- Flow with GPU support following the official TensorFlow documentation, which requires compatible CUDA and cuDNN versions. Verify GPU availability using TensorFlows device discovery utilities.
-
Dataset and DICOM Stream Validation
Confirm dataset accessibility and DICOM image readability before AI pipeline integration by loading sample mammogram images using pydicoms dcmread interface. A successful pixel array extraction confirms image readability. Configure image normalization parameters and validate that structured clinical CSV files are correctly parsed by pandas with the expected column schema.
-
-
AI Pipeline Implementation
-
CNN-Based Image Feature Extraction Module
The imaging branch uses a Convolutional Neural Network (CNN) for extracting high-level visual patterns from mammo- gram images. CNNs are particularly well suited for medical image analysis due to their ability to learn hierarchical spatial features through repeated convolution and pooling operations.
The system employs transfer learning using pretrained models including ResNet50, EfficientNet, and VGG16, all trained on large-scale image datasets such as ImageNet.
The detection inference function accepts a preprocessed 224 × 224 normaized NumPy array, invokes the CNN back- bone, and post-processes output feature maps to extract tu- mor characteristics including shape morphology, tissue texture gradients, mass density distribution, and edge irregularity patterns. The function returns feature vectors containing the class name, confidence score, and tumor bounding coordinates (x1, y1, x2, y2).
-
Structured Data Processing Module
Structured clinical data is processed through a dedicated ma- chine learning pipeline operating in parallel with the imaging branch. This pipeline ingests patient metadata including age, menopausal status, tumor size, lymph node involvement, histo- logical grade, estrogen receptor status, progesterone receptor status, and family history of cancer. These features provide biological and clinical context that is not directly observable from imaging data alone.
Preprocessing of structured data begins with missing-value imputation using median or mode strategies depending on the feature type. Numerical features such as age and tumor size are normalized using min-max scaling or z-score stan- dardization. Categorical variables such as receptor status and histological grade are encoded using one-hot encoding or label encoding. A Gradient Boosting classifier (XGBoost) consumes the processed features with 5-fold cross-validation for robust evaluation.
-
Multimodal Fusion and Final Decision Logic
A Multimodal Fusion class merges outputs from both the CNN imaging branch and the structured data branch. Diagno- sis is declared malignant when the CNN confidence exceeds the classification threshold or when the structured-data model predicts high risk.
Prediction severity is scored on a 15 scale. Invasive ductal carcinoma and inflammatory breast cancer receive a severity score of 5, locally advanced cancer receives a score of 4, ductal carcinoma in situ receives a score of 3, lobular carcinoma in situ receives a score of 2, and benign findings receive a score of 1. The fusion output dictionary is serialized as JSON and forms the payload returned by the Flask API.
-
-
FLASK BACKEND IMPLEMENTATION
-
Application Structure
The Flask application follows a modular factory pattern to support testing isolation. The directory structure includes the application factory, detection endpoint, AI pipeline singleton, patient schema, JWT helper utilities, rate-limiter configura- tion, environment-based configuration file, and Gunicorn entry point. This modular organization improves maintainability, scalability, and testing efficiency.
-
/predict Endpoint
The POST /predict route accepts application/json requests containing a base64-encoded mammogram image, patient identifier, structured clinical data fields, and an ISO 8601 timestamp. Input validation is performed using the Pydantic PredictRequest model.
Image decoding proceeds through base64 decoding, NumPy buffer conversion, OpenCV image decoding, and BGR array conversion. The endpoint returns HTTP 200 for successful predictions, HTTP 422 for malformed input, HTTP 429 when rate limits are exceeded, and HTTP 401 when authentication tokens are missing or invalid.
-
Gunicorn Deployment
Gunicorn is launched with four workers bound to port 5000, a 30-second timeout, and a 5-second keep-alive interval. The AI pipeline singleton is instantiated once per worker to avoid repeated model-loading overhead and improve inference performance.
A systemd service file named breastai-api.service is configured to automatically restart the application in the event of failure and during system startup, thereby improving service reliability and availability.
-
-
SECURITY LAYER IMPLEMENTATION
-
JWT Authentication
Flask-JWT-Extended is configured with a secret key read from an environment variable rather than being hard-coded into the application. Access tokens are configured with a va- lidity period of 24 hours and use the HS256 signing algorithm. The authentication endpoint verifies user credentials against a bcrypt-hashed credential store and issues a signed access token upon successful authentication.
All protected API endpoints require token-based authentica- tion. Token revocation is implemented using a Redis-backed blocklist indexed by token identifiers, with expiration times synchronized to the remaining token validity period.
-
Rate Limiting
Flask-Limiter is initialized with a Redis storage backend to control API usage and prevent abuse. The system applies default limits of 200 requests per day and 60 requests per minute. The prediction endpoint is assigned a stricter limit of 30 requests per minute to protect the AI inference pipeline from excessive load.
Rate-limiting information is included in HTTP response headers, enabling client applications to monitor request quotas and implement appropriate throttling mechanisms.
-
Security and Privacy
The system employs authentication, encrypted storage, se- cure API communication, and role-based access control mech- anisms to protect sensitive patient information and maintain regulatory compliance within healthcare environments.
Transport Layer Security (TLS) termination is handled through an Nginx reverse proxy using a 4096-bit RSA cer- tificate. To mitigate man-in-the-middle attacks, the mobile application implements certificate pinning and validates server certificates before establishing secure communication chan- nels.
-
-
MOBILE APPLICATION IMPLEMENTATION
-
Architecture
The Flutter application follows the Provider state- management pattern to ensure modularity and maintainability. The core components include an alert management module for diagnostic notifications, a patient log management module backed by SQLite storage, and an authentication module responsible for secure token management.
Navigation is implemented using GoRouter and provides three primary interfaces: a dashboard for live monitoring, a patient event log for historical records, and a natural-language query interface for retrieving diagnostic information.
-
Real-Time Diagnostic Polling
A periodic timer with a three-second interval drives the diagnostic polling mechanism. During each cycle, the appli- cation sends an HTTPS request containing the latest mammo- gram image and patient information to the prediction service. When a malignant prediction is returned, the alert manage- ment module immediately notifies registered listeners, trigger- ing a full-screen warning interface and a local push notification with severity-specific alert indicators.
-
Patient Event Log
Detected diagnostic events are stored locally using an SQLite database. Each record contains the event type, sever- ity level, confidence score, patient identifier, timestamp, and image thumbnail reference.
The patient log module provides filtering and search capa- bilities for reviewing historical diagnostic events. In addition, event records can be exported in CSV format to facilitate clinical reporting and patient handoff procedures.
-
Natural-Language Query Interface
The natural-language query interface allows healthcare per- sonnel to retrieve diagnostic information using free-text re- quests such as Show malignant cases from last week. User queries are processed through rule-based intent extraction that identifies event types and temporal expressions before generating corresponding database queries.
This lightweight natural-language understanding approach minimizes computational overhead on mobile devices while providing responsive information retrieval. Future enhance- ments will inorporate local large language model (LLM) support to improve query interpretation and contextual under- standing.
-
-
INTEGRATION TESTING
Integration testing employed a three-phase protocol to eval- uate the functionality, reliability, and performance of the proposed multimodal breast cancer detection system.
Phase 1 (Unit Testing): Each module was tested inde- pendently to verify its functionality. The CNN-based image analysis branch was evaluated using 500 held-out labeled mammogram images. The structured-data classifier was tested on 200 patient records. The Flask API was validated using 45 automated Postman test cases, while the mobile application was tested against a mock server environment.
Phase 2 (System Testing): The complete multimodal pipeline was evaluated using live DICOM images in a con- trolled clinical environment. Staged diagnostic scenarios were created using both benign and malignant mammogram sam- ples. System predictions were compared against radiologist- verified ground-truth labels to assess overall diagnostic per- formance and integration accuracy.
Phase 3 (Endurance Testing): The system was operated continuously for 72 hours with synthetic prediction requests injected every 45 seconds. During testing, three notable edge cases were identified and resolved. First, simultaneous CNN and XGBoost inference on the same CPU core occasionally caused performance degradation, which was mitigated by explicitly assigning GPU processing threads. Second, JWT clock skew exceeding five seconds produced intermittent HTTP 401 authentication errors, which were resolved through Network Time Protocol (NTP) synchronization. Third, An- droid 13 required explicit runtime permission requests for notification delivery, which was addressed by implementing the POST_NOTIFICATIONS permission workflow.
The successful completion of all testing phases demon-
strated the stability, reliability, and operational readiness of the proposed system for real-world clinical deployment.
-
RESULT ANALYSIS
The system performance results are summarized in Table I.
TABLE I
System Performance Results
Metric
Value
Remarks
CNN Classification Accuracy (ResNet50)
97.1%
500 labeled mammogram frames
Structured Data Accuracy (XGBoost)
92.4%
200 patient records
Average Inference Time
210 ms
Per image on GPU workstation
API Response Time
68 ms
Flask API processing
End-to-End Latency
378 ms
Image upload to mobile alert
Mobile Alert Delivery Time
1.1 s
Average notification delay
The experimental evaluation demonstrates that the proposed multimodal AI-based breast cancer detection system is capable of delivering accurate and timely diagnostic predictions in real-world clinical environments. The integration of CNN- based image analysis and structured data classification enables reliable detection of both imaging abnormalities and clinical risk factors while maintaining low inference latency.
Performance results indicate that the system can operate efficiently on GPU-equipped workstations without significant
degradation in accuracy. The achieved response time and high system availability confirm its suitability for continuous clinical diagnostic applications. Overall, the results validate the effectiveness of the proposed multimodal architecture for real-time breast cancer screening and early-stage detection.
-
DEPLOYMENT CONSIDERATIONS
-
Model Deployment
The trained model can be deployed through a Flask- based Web API, a Streamlit dashboard, or cloud platforms such as Microsoft Azure and Amazon Web Services (AWS). Healthcare professionals can securely upload mammogram images and patient clinical data through an authenticated in- terface to obtain real-time diagnostic assistance. GPU memory utilization is continuously monitored to maintain inference throughput under concurrent request workloads.
-
Power Resilience
Cloud deployment leverages auto-scaling and load- balancing mechanisms to accommodate varying clinical work- loads. The system service is configured for automatic restart and recovery in the event of application failures. Database snapshots are scheduled at six-hour intervals to prevent patient data loss during unexpected service interruptions and system crashes.
-
Storage Longevity
Write-intensive workloads are mitigated through memory- backed temporary storage, optimized database persistence strategies, and periodic event-log synchronization. For high- write deployment scenarios, NVMe SSD storage or cloud- attached block storage is recommended to ensure long-term reliability and performance.
DICOM images are archived to object-storage services such as AWS S3 or Azure Blob Storage after a retention period of 30 days, thereby reducing local storage requirements while maintaining long-term accessibility for clinical review and compliance purposes.
-
-
CONCLUSION
This paper presented a comprehensive implementation framework for a multimodal AI-based breast cancer detection system. The implementation encompasses hardware config- uration, environment setup, AI model deployment, CNN- based image analysis, structured clinical data processing, Flask API development, security mechanisms, mobile application integration, system testing, and deployment operations.
Experimental evaluation demonstrated that the proposed system is fully functional and capable of detecting breast can- cer with an end-to-end latency below 400 ms. The integration of imaging features and structured clinical data significantly enhances diagnostic reliability while maintaining real-time performance suitable for healthcare environments.
The presented implementation serves as a practical refer-
ence for deploying equivalent systems in hospitals, clinical centers, and research institutions. Future work will focus on multimodal federation across healthcare networks, privacy- preserving federated learning, explainable artificial intelli- gence (XAI) visualizations for improved clinical transparency, and integration with IoT-enabled healthcare monitoring sys- tems for automated emergency response and continuous pa- tient monitoring.
REFERENCES
-
Y. Zhang et al., Explainable Multimodal Deep Learning for Breast Cancer Diagnosis, 2023.
-
S. Liu and B. Min, DCS-ST for Classification of Breast Cancer Histopathology Images, IEEE Journal of Biomedical and Health In- formatics, 2024.
-
P. Karatza, K. Dalakleidi, M. Athanasiou, and K. S. Nikita, Inter- pretability Methods of Machine Learning for Breast Cancer Diagnosis, IEEE Journal of Biomedical and Health Informatics, 2022.
-
H. Kim et al., Synthetic Data Generation for Histopathology with GANs, 2022.
-
M. Trang et al., Integrating Structured Data with Imaging in Breast Cancer Detection, Biomedical AI Journal, 2023.
-
R. Sinha et al., AI-Driven Radiomics and Pathomics Integration for
Breast Cancer Diagnosis, 2024.
-
M. Fernandez et al., Real-World Evaluation of AI Tools in Oncology
Clinics, 2023.
-
G. V. Londhe, V. Birchha, V. N. K. Reddy, S. Veena, A. Srilakshmi, and
P. Karthigaikumar, Multimodal Deep Learning for Breast Cancer Detection:Integrating Transfer Learning with Clinical Data for En- hanced Accuracy, in Proceedings of the 3rd International Conference on Optimization Techniques in Engineering (ICOFE-2024), 2024.
-
K. Aguerchi, Y. Jabrane, M. Habba, M. Ameur, and A. H. El Has- sani, Enhancing Automated Breast Cancer Detection: A CNN-Driven Method for Multi-Modal Imaging Techniques, Journal of Personalized Medicine, vol. 15, no. 10, p. 467, 2025.
-
A. Ronacher, Flask Documentation 3.0.x, Pallets Project, 2023.
