DOI : 10.17577/IJERTCONV14IS010091- Open Access

- Authors : S Pavan Kumar Nayak, Rakshitha P
- Paper ID : IJERTCONV14IS010091
- Volume & Issue : Volume 14, Issue 01, Techprints 9.0
- Published (First Online) : 01-03-2026
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License:
This work is licensed under a Creative Commons Attribution 4.0 International License
Medication Prediction and Recommendation System using Machine learning
S Pavan Kumar Nayak1, Rakshitha P2
1Student, St. Joseph Engineering College, Mangalore
2Assistant Professor, St. Joseph Engineering College, Mangalore
Abstract: The main objective of this study is to build a system for the health-care domain that applies machine learning techniques to predict diseases from a patient's symptoms and provide customized medical recommendations. Using Random Forest on the symptom-disease dataset, the system achieves an accuracy of 95.12%, which exceeds the results of related studies. In addition to predicting diseases accurately, the system supports the patient's health with tailored recommendations, including specific medications, diet regimens, and appropriate exercises. Its effectiveness on both simple and complex diseases demonstrates the practical health-care assistance the system can provide. Machine learning makes the system scalable and efficient, not only for predicting diseases but also for recommending treatments centered on the patient's profile. Future work aims to cover a wider range of less common diseases and to strengthen performance by expanding and diversifying the training data.
Keywords: Symptom analysis · Disease prediction · Machine learning · Medication recommendation · Preventive care
1 Introduction
The integration of machine learning (ML) with health-care has brought major improvements in medical diagnosis and treatment recommendations. One of the most valuable developments is the creation of medication prediction and recommendation systems with machine learning (PMRS), which use ML algorithms to analyze patient data and provide customized health-care advice for disease management and prevention.
Traditional health-care methods often follow a one-size-fits-all approach, which can lead to misdiagnoses, less effective treatments, and inefficient use of medical resources. This highlights the need for personalized systems that consider each patient's unique symptoms, medical history, and other personal details to deliver better, more accurate care.
In this research, we propose a machine learning-based health-care recommendation system designed to predict diseases from patient-reported symptoms and offer personalized treatment suggestions.
Using a large symptom-disease dataset and a Random Forest (RF) model, our system achieved an accuracy of 95.12%, outperforming similar existing studies. Beyond disease prediction, the system provides tailored recommendations, including suitable medications, diet plans, and exercise routines, helping patients manage their health more effectively.
By focusing on both common and complex diseases, this system aims to improve patient outcomes, enhance care, and promote proactive health management.
Future work will expand the dataset, cover underrepresented diseases, and further refine the model to boost its accuracy and usability in real-world health-care settings.
This study aims to achieve several key objectives:
- Analysis of Symptoms and Disease Patterns: We will analyze a rich dataset of symptoms and corresponding diseases to uncover insights into disease patterns and symptom-disease relationships.
- Machine Learning Model Development: We will train and evaluate various ML models, such as the support vector classifier (SVC) and RF, to accurately predict diseases from user-input symptoms.
- Personalized Recommendations: The system will recommend tailored medications, precautions, workouts, and dietary adjustments based on predicted diseases.
- Practical Utility Assessment: Finally, we will evaluate the practical effectiveness of the system in improving patient outcomes and enhancing health-care delivery.
This research also contributes to personalized health-care by creating a framework for developing a medication prediction and recommendation system with machine learning techniques. It addresses key challenges in current health-care systems, which often struggle to use complex patient data to provide personalized and actionable recommendations.
Through exploratory data analysis, our study offers insights into how symptoms are distributed across different diseases, improving the understanding of symptom-disease relationships.
In addition, this research highlights not just the predictive power of ML models but also their practical use in helping patients actively manage their health. By connecting advanced ML methods with real-world health-care applications, our work aims to improve patient care, support early disease management, and lower health-care costs.
Problem statement
Existing approaches in health-care rely on standardized methods that tend to ignore specific patient requirements, resulting in inaccurate diagnoses, inefficient treatment pathways, and wasted resources. Existing practices do not offer the insight and personal care that take into account the patient's symptoms, medical history, and lifestyle choices. This highlights the growing importance of a proactive, patient-centered recommendation framework that supports precise diagnosis and bespoke guidance on medical challenges, improving health-care results and the allocation of health-care resources.
Motivations
Need for Personalization in Health-care: Conventional health-care practices do not address the unique requirements of patients, resulting in prolonged recovery times and ineffective treatment plans and increasing the patient's overall health expenditure. Personalizing health-care recommendations offers an opportunity to improve both diagnostic accuracy and treatment effectiveness.
Applying Machine Learning in Health-care: The growing supply of health-care data creates an opportunity to enhance medical diagnosis and recommendation systems through machine learning algorithms. ML can assess intricate patterns across symptoms and diseases and provide tailored suggestions.
Preventive Care and Proactive Disease Management: A key challenge in modern health-care is the late diagnosis of diseases, which often results in more complex and expensive treatments. A system that not only predicts diseases but also recommends preventive measures can greatly improve patient outcomes and reduce health-care costs.
Practical Utility for Patients: A system that not only predicts diseases but also recommends medications, lifestyle changes (such as diet and exercise), and precautions has real-world utility and helps patients manage their health more effectively.
Contributions
- Development of a medication prediction system using ML: This research presents a comprehensive system that uses ML algorithms to predict diseases based on symptoms and offers personalized recommendations for medications, workouts, precautions, and dietary adjustments.
- Integration of ML Models: The study trains and evaluates several ML models, including random forest, achieving a high accuracy (95.12%) in disease prediction. The comparison of models demonstrates the system's robustness and reliability.
- Exploratory Data Analysis for Symptom-Disease Relationships: The research provides insights into the distribution of symptoms across various diseases through exploratory data analysis and visualization. This analysis helps in understanding disease patterns, aiding symptom-disease correlation and prediction accuracy.
- Real-World Application in Health-care: By incorporating features for recommending medications, precautions, and lifestyle changes, the system enhances its practical utility. It empowers users to take proactive steps toward managing their health, promoting preventive care and self-management.
- This research contributes to the development of future health-care systems that can offer personalized, data-driven recommendations. It sets a foundation for integrating ML models with medical datasets for improved diagnostics, treatment, and preventive care strategies.
- This research addresses a critical gap in modern health-care by leveraging data and ML techniques to deliver personalized, proactive, and practical health-care solutions.
The remainder of this paper is organized as follows: Sect. 2 presents the literature review, Sect. 3 discusses the proposed methodology, Sect. 4 illustrates the visualization, Sect. 5 details the results and discussion, Sect. 6 explains how to deploy the proposed methodology, and Sect. 7 concludes the work.
2 Related Work
One study described a machine learning-based meal-planning system for diabetic patients that uses personal information such as medical history, preferences, and blood glucose levels. The system generated tailored meal plans using decision trees, random forests, and neural networks, aligning well with medical recommendations while also improving diabetes care and paving the way for more efficient, customized, and scalable dietary planning. In [9], the authors offered a patient-centered health-care recommendation method based on machine learning applications. The program predicted and offered appropriate applications based on individual requirements, using machine learning and patient data. This strategy gave individuals the ability to actively manage their health and enhanced health-care quality.
In [10], the authors conducted a comprehensive survey of recommendation systems, focusing on both the technical aspects of recommendation models and the practical applications across various service domains. They collected and reviewed over 135 top-ranking articles and papers published between 2010 and 2021, analyzing trends in system models, techniques, and application fields. The study categorized system models, such as content-based filtering, collaborative filtering, and hybrid systems, and examined their adoption and evolution over the past decade. Additionally, they investigated the diverse application service fields where recommendation systems are used, highlighting the interplay between research advancements and the business growth of these service domains. By synthesizing insights from both academic research and industry data, the study shed light on the relationship between recommendation system development and its real-world impact on various service sectors. The authors used Google Scholar as the primary search engine and applied stringent criteria to select high-quality research papers from reputable journal databases and top-tier conferences. Through this approach, they aimed to provide a reliable analysis of system trends and their implications for practical service applications.
In [11], the authors delve into the intersection of data and health-care systems, recognizing the health-care industry as a data-rich environment. They emphasize the need for comprehensive big data approaches to extract valuable insights from the vast volumes of data generated in health-care. With the advent of cloud computing and distributed programming frameworks like Hadoop and Spark, the authors highlight the feasibility of leveraging data analytics for health-care recommendations. They underscore the importance of knowledge-driven recommendations to benefit all stakeholders in the health-care ecosystem. The paper discusses various types of recommender systems tailored specifically for the health-care domain, which use big data analytics to generate personalized recommendations. By leveraging machine learning algorithms and artificial intelligence techniques, these systems aim to provide actionable insights from health-care data. The authors emphasize the role of distributed environments for processing health-care data and the need for big data analytics in comprehensive analysis. The paper not only provides valuable insights into existing health-care recommender systems but also identifies research gaps for further investigation. By shedding light on the current state of the art and potential areas for improvement, the authors contribute to advancing the field of big data analytics for health-care recommendations.
The authors in [12] presented an ensemble-based approach using ML and DL models to predict the likelihood of developing cardiovascular disease. The model used six classification algorithms and random forests to extract key features from a publicly available dataset. The ML ensemble model achieved the best disease prediction accuracy of 88.70%.
3 The proposed methodology
The algorithm provides a clear framework for implementing a machine learning-powered medication prediction and recommendation system. Comments on key steps:
Data Acquisition and Quality Assessment:
- Ensures that only clean, high-quality data is used to train models, which directly impacts the performance of ML algorithms.
Data Preprocessing:
- Handles common issues in datasets like missing values, outliers, and class imbalance. Each operation (e.g., imputation, outlier detection, resampling) is critical for improving the quality of the training data.
Model Selection and Hyperparameter Tuning:
- One model is used in this pipeline (random forest), offering the potential for better comparison and robustness. Hyperparameter tuning improves model performance by finding optimal configurations.
Evaluation and Interpretability:
- The model was evaluated using performance metrics such as accuracy, precision, recall, F1-score, and a confusion matrix to assess its prediction reliability. These evaluation metrics provided valuable insights into the model's effectiveness and helped ensure trustworthy, consistent predictions.
These steps ensure that the model is prepared for real-world deployment, including scaling for larger datasets and addressing ethical and privacy concerns.
Dataset description
The dataset used in the study is designed to support classification of diseases based on patient symptoms. It consists of 4920 instances, each characterized by 132 attributes. Below is a detailed description of the dataset:
- Source and Collection: The dataset was sourced from the Kaggle Medical Dataset, a public medical database containing information on various symptoms and diseases collected from a diverse cohort of patients. This dataset includes data from electronic health records and clinical trials.
- Data Quality: The dataset has been validated for accuracy by cross-referencing with medical literature and expert reviews.
- Attributes Description: Each instance in the dataset includes 132 attributes, covering a range of symptoms and diagnostic features. These attributes include:
  - Symptom Attributes: e.g., fever, cough, headache, rash
  - Diagnostic Features: e.g., age, gender, previous medical conditions
- Handling Missing Data: Missing values were addressed using mean imputation for numerical attributes and mode imputation for categorical attributes. Instances with excessive missing values were removed to maintain dataset integrity.
- Outlier Detection: Outliers were identified using statistical techniques such as Z-score and interquartile range (IQR). Outliers were either adjusted or removed based on their impact on model performance.
- Class Distribution: The dataset includes a balanced representation of various diseases.
- Data Preprocessing: Before modeling, data preprocessing steps included normalization of numerical attributes, encoding of categorical variables using one-hot encoding, and feature scaling to ensure uniformity across the dataset.
- Potential Limitations: The dataset may be limited to specific geographic regions or periods, which may not fully represent all possible disease scenarios. Additionally, the dataset might be subject to biases related to demographic representation, as it may not cover all age groups, ethnicities, or socioeconomic backgrounds equally. The absence of data on certain rare diseases or atypical symptom presentations could also impact the comprehensiveness of the predictions. Furthermore, the accuracy of the recommendations depends heavily on the quality and completeness of the input data, which may vary across different sources or over time. These limitations highlight the need for continuous updates and validation of the dataset to ensure the robustness and generalizability of the recommendation system.
Table 1 describes the symptom-disease prediction dataset, where each row represents a patient and the columns (features) capture the presence or absence of various symptoms, as well as the predicted disease.
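The presence/absence layout described for Table 1 can be sketched as follows. This is a minimal illustration, not the paper's code: the five-symptom vocabulary and the function name `encode_symptoms` are assumptions made for the example, whereas the actual dataset has 132 attribute columns.

```python
# Hypothetical symptom vocabulary; the real dataset has 132 attribute columns.
SYMPTOMS = ["fever", "cough", "headache", "rash", "fatigue"]


def encode_symptoms(reported, vocabulary=SYMPTOMS):
    """Return a 0/1 list marking which vocabulary symptoms were reported."""
    normalized = {s.strip().lower() for s in reported}
    unknown = normalized - set(vocabulary)
    if unknown:
        # Reject symptoms the model was never trained on instead of ignoring them.
        raise ValueError(f"Unknown symptoms: {sorted(unknown)}")
    return [1 if s in normalized else 0 for s in vocabulary]


row = encode_symptoms(["Fever", "rash"])
print(row)  # [1, 0, 0, 1, 0]
```

A row produced this way has the same shape as one feature row of the table, so it could be passed to a trained classifier once the vocabulary matches the training columns.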
Preprocessing and data analysis
Before model development, several data preprocessing steps were undertaken to ensure the accuracy and effectiveness of the predictive model. These steps include:
- Handling Missing Data:
  - Identification and Imputation: Missing values were identified using basic statistical methods such as calculating the mean and standard deviation.
  - Replacement: Numerical attributes with missing values were imputed using mean imputation, while categorical attributes were imputed using mode imputation.
  - Deletion: Instances with excessive missing values were removed to maintain data integrity.
- Handling Outliers:
  - Identification: Outliers were detected using statistical techniques such as Z-score and interquartile range (IQR).
  - Treatment: Outliers were either adjusted or removed based on their impact on model performance. In some cases, outlier values were replaced with acceptable values based on the data mean or nearby values.
- Handling Class Imbalance:
  - Evaluation: The distribution of classes in the dataset was analyzed to determine if any classes were underrepresented.
  - Balancing: Class balancing methods were applied to ensure a representative distribution of all classes in the final model.
These steps ensure that the data is clean and accurately represented, which contributes to improved model performance and predictions.
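The imputation and outlier steps listed above can be sketched with the Python standard library. This is an illustrative sketch under stated assumptions: the function names, the 1.5 IQR multiplier, the Z-score threshold, and the sample values are all hypothetical and are not taken from the paper.

```python
import statistics


def impute_mean(values):
    """Replace None entries in a numeric column with the mean of observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def impute_mode(values):
    """Replace None entries in a categorical column with the most frequent value."""
    observed = [v for v in values if v is not None]
    mode = statistics.mode(observed)
    return [mode if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is an assumed default)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute Z-score exceeds the (assumed) threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []
    return [v for v in values if abs(v - mean) / std > threshold]


ages = [34, None, 29, 41, 38]            # hypothetical numeric column
print(impute_mean(ages))                 # None becomes 35.5, the mean of the rest
print(impute_mode(["M", "F", None, "F"]))
print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))
```

On small samples a single extreme value inflates the standard deviation, so the Z-score rule can miss an outlier that the IQR rule catches; this is one reason to apply both checks, as the text describes.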
Scalability and real-world application
The system's scalability and integration into real-world medical workflows were considered to ensure practical applicability:
- Scalability:
  - Data Expansion: The system can be scaled to accommodate larger datasets by leveraging distributed computing and cloud-based solutions. This allows for handling increasing volumes of patient data and expanding the model's coverage of diseases and symptoms.
  - Model Adaptation: The architecture is designed to incorporate new symptoms and diseases, allowing for continuous improvement as more data becomes available.
- Integration:
  - Workflow Integration: The system can be integrated into existing electronic health record (EHR) systems and clinical decision support systems. This integration facilitates seamless access to recommendations during patient consultations.
  - Handling New and Unseen Data: The model is designed to handle new and previously unseen data through techniques like incremental learning and model retraining. This ensures that the system remains accurate and up to date with evolving medical knowledge.
Model building
The model-building phase involves the selection and implementation of ML algorithms to develop an accurate predictive model for disease diagnosis. In this project, the algorithms are the support vector classifier (SVC) and random forest. These algorithms offer different approaches to classification, each with its strengths and weaknesses. The selected algorithms are instantiated with appropriate parameters and trained on the preprocessed dataset. The goal is to create models capable of learning effectively from input symptom data and accurately predicting the corresponding disease. This stage emphasizes the importance of exploring various algorithms to identify the most suitable approach for the given task. This study is based on SVC and RF for the following reasons:
- Simplicity and Interpretability: SVC and RF are relatively simpler and more interpretable than deep learning models, making them easier to understand and evaluate for specific tasks like symptom-based disease prediction.
- Small Dataset Suitability: Unlike deep learning methods, which require extensive data and computational resources, SVC and RF can achieve strong performance even with limited data, making them suitable if the dataset is not very large.
- Performance on Structured Data: These models are effective for structured, tabular data and may perform competitively without the overhead of deep learning architectures, which are more beneficial for unstructured data like images or text.
Support Vector Classifier (SVC):
- Utilized grid search CV for hyperparameter tuning to find the best parameters for the SVC model, including the regularization parameter (C), kernel type, and gamma value.
- Implemented grid search CV with a fivefold cross-validation strategy to ensure robustness of parameter selection.
- Trained the SVC model with the best parameters obtained from grid search CV.
- Achieved an accuracy rate of 95.12% with the optimized SVC model on the test dataset.
Random Forest (RF):
- Employed the RF classifier with default hyperparameters for simplicity.
- Trained the RF model directly without hyperparameter tuning.
- Achieved a high accuracy rate with the RF model on the test dataset.
- While not specified in the code, random forest models are known for their robustness and ability to handle complex datasets with high dimensionality, making them suitable for disease prediction tasks.
Both SVC and RF models achieved high accuracy rates, with SVC reaching 95.12% accuracy. The use of grid search CV for hyperparameter tuning and careful selection of features likely contributed to the performance of both models. Additionally, SVC's flexibility in handling nonlinear data and RF's ability to handle complex relationships between features may have played significant roles in achieving accurate disease predictions in the recommendation system.
Training and prediction
Once the models are instantiated and trained using the training dataset (70% of the total dataset), they are evaluated on the testing dataset (30% of the total dataset) to assess their performance and predictive accuracy. The trained models are used to predict disease diagnoses based on input symptom data. During the prediction phase, the trained models use the patterns and relationships learned from the training data to make predictions on unseen data. Performance metrics such as accuracy scores and confusion matrices are computed to evaluate the models' effectiveness in accurately predicting disease diagnoses. This stage serves to validate the models' performance and assess their suitability for real-world application in providing personalized medical recommendations based on input symptoms. This is also presented in Fig. 1.
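The model-building and training steps described above can be sketched with scikit-learn. This is a hedged illustration rather than the authors' code: the data is synthetic (`make_classification` stands in for the symptom matrix), and the parameter grid values, the stratified split, and the random seeds are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the symptom matrix; the paper's data is not reproduced here.
X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=4, random_state=0)

# 70% training / 30% testing; stratification is an assumption of this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

# Hypothetical grid over C, kernel, and gamma, searched with fivefold cross-validation.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"], "gamma": ["scale", "auto"]}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)
svc = search.best_estimator_  # refit on the whole training split by default

# Random forest with default hyperparameters, trained without tuning.
rf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

for name, model in [("SVC", svc), ("RF", rf)]:
    y_pred = model.predict(X_test)
    print(name, "accuracy:", round(accuracy_score(y_test, y_pred), 3))
    print(confusion_matrix(y_test, y_pred))
```

Because the grid search only ever sees the training split, the held-out 30% stays untouched until the final evaluation, which guards against the data leakage concern raised later in the paper.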
3.5 Model evaluation and performance
While achieving high accuracy in model predictions is a critical goal, it is important to consider the potential for overestimation of accuracy and its implications for real-world applications. The reported accuracy of 95.12% might suggest an overly optimistic view, which is not always realistic in practical scenarios. This could be attributed to factors such as data leakage, overfitting, or an imbalanced dataset.
To address these concerns and provide a more comprehensive evaluation, the following additional metrics and techniques were employed:
- Cross-Validation: To ensure the robustness of the model, cross-validation was implemented. This technique involves dividing the dataset into multiple folds and iteratively training and validating the model on different subsets of the data. Cross-validation helps to assess the model's performance on unseen data and reduces the risk of overfitting.
- Precision: Precision measures the accuracy of the positive predictions made by the model. It is defined as the ratio of true positive predictions to the sum of true positive and false positive predictions. Precision is crucial in scenarios where the cost of false positives is high.
- Recall: Recall, or sensitivity, evaluates the model's ability to identify all relevant positive cases. It is defined as the ratio of true positive predictions to the sum of true positive and false negative predictions. High recall is important in contexts where missing a positive case could be detrimental.
- F1-Score: The F1-score is the harmonic mean of precision and recall, providing a single metric that balances both concerns. It is particularly useful when dealing with imbalanced datasets.
By incorporating these metrics and cross-validation techniques, the evaluation of the model's performance is more reliable and reflects its practical applicability. These additional measures help to ensure that the model not only performs well on the training data but also generalizes effectively to new, unseen data.
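The definitions above translate directly into code. The following is a minimal sketch in plain Python; the counts in the example and the helper name `k_fold_indices` are hypothetical, and the fold splitter uses contiguous, unshuffled folds purely for clarity.

```python
def precision(tp, fp):
    """True positives divided by all positive predictions."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """True positives divided by all actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, validation
        start = stop


tp, fp, fn = 90, 10, 30  # hypothetical counts, not results from the paper
p, r = precision(tp, fp), recall(tp, fn)
print(p, r, round(f1_score(p, r), 4))  # 0.9 0.75 0.8182

for train, validation in k_fold_indices(10, 5):
    print(validation)  # each sample appears in exactly one validation fold
```

The example shows why the F1-score is reported alongside the other two: with precision 0.9 and recall 0.75 the harmonic mean (about 0.82) sits closer to the weaker of the two values than the arithmetic mean would.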
4 Visualizations
Figure 2 shows a bar chart of the top 10 most frequent symptoms. The symptoms are listed along the y-axis, and their respective frequencies are on the x-axis. The chart shows that fatigue and vomiting are the most frequent symptoms, followed by high fever and other symptoms such as loss of appetite and nausea.
Figure 3 presents a clustered heatmap that visualizes the presence of various symptoms across different diseases. Each cell in the heatmap represents the frequency of a symptom associated with a particular disease, with colors ranging from cool (low frequency) to warm (high frequency). By using color gradients to represent symptom frequency, the figure offers insights into the relationship between diseases and their associated symptoms. This type of heatmap is particularly useful for identifying patterns and overlaps, helping clinicians or researchers see which symptoms are prevalent for multiple diseases. The contrast between cooler and warmer colors gives an immediate sense of the intensity of these relationships. However, grouping related symptoms or diseases could further enhance the clarity of this information.
Figure 4 is a pie chart representing the distribution of the 41 diseases within the dataset. Each slice of the pie corresponds to a different disease, with the size of the slice indicating the frequency of that disease in the dataset. This chart provides a clear overview of the most and least common diseases.
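The quantities behind these three figures are simple counts. The sketch below shows how they could be computed before plotting; the records are toy rows with placeholder disease labels, not values from the dataset, and the variable names are assumptions for illustration.

```python
from collections import Counter

# Toy records (placeholder disease label, symptoms present); not drawn from the dataset.
records = [
    ("disease_a", ["fatigue", "vomiting", "high_fever"]),
    ("disease_b", ["fatigue", "high_fever", "nausea"]),
    ("disease_c", ["fatigue", "loss_of_appetite"]),
]

# Bar chart input: how often each symptom occurs across all records.
symptom_counts = Counter(s for _, symptoms in records for s in symptoms)
top_symptoms = symptom_counts.most_common(2)

# Heatmap input: frequency of each (disease, symptom) pair.
pair_counts = Counter((d, s) for d, symptoms in records for s in symptoms)

# Pie chart input: share of each disease in the dataset.
disease_counts = Counter(d for d, _ in records)
disease_share = {d: n / len(records) for d, n in disease_counts.items()}

print(top_symptoms)  # [('fatigue', 3), ('high_fever', 2)]
print(pair_counts[("disease_a", "vomiting")])
print(disease_share)
```

Any plotting library can consume these counters directly; the plotting step is omitted here because the paper does not state which tool produced the figures.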
Results and discussion
The analysis of the ML experiments reveals promising insights into disease prediction accuracy and model performance. Among the models trained for disease prediction, the support vector classifier (SVC) was chosen, achieving an accuracy of 95.12%. This high accuracy underscores the efficacy of ML algorithms in health-care recommendation systems. Moreover, incorporating features for recommending medications, precautions, workouts, and dietary adjustments tailored to predicted diseases enhances the practical utility of the recommendation system. By offering personalized recommendations based on predicted diseases, the system can empower users to manage their health and mitigate disease risks proactively. Overall, these results demonstrate the potential of personalized medical recommendation systems to improve health-care delivery and patient outcomes.
Figure 5 presents the model evaluation metrics, giving a concise overview of the model's overall performance across precision, recall, F1-score, and accuracy, with all metrics at or near 0.950. The nearly identical scores reflect a well-balanced model: it performs equally well at identifying true positives, avoiding false positives, and maintaining overall predictive accuracy.
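The recommendation step mentioned in this section can be pictured as a lookup keyed by the predicted disease. The sketch below is an assumption about one possible structure, not the authors' implementation: the class `Recommendation`, the function `recommend`, and every table entry are hypothetical placeholders, and no real medical advice is encoded.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Recommendation:
    """The four recommendation types the paper lists for a predicted disease."""
    medications: List[str] = field(default_factory=list)
    precautions: List[str] = field(default_factory=list)
    diet: List[str] = field(default_factory=list)
    workouts: List[str] = field(default_factory=list)


# Placeholder table; a real system would load clinically reviewed content.
RECOMMENDATIONS: Dict[str, Recommendation] = {
    "disease_a": Recommendation(
        medications=["<medication placeholder>"],
        precautions=["<precaution placeholder>"],
        diet=["<diet placeholder>"],
        workouts=["<workout placeholder>"],
    ),
}


def recommend(predicted_disease: str,
              table: Dict[str, Recommendation] = RECOMMENDATIONS) -> Optional[Recommendation]:
    """Return the entry for the predicted disease, or None if there is none."""
    return table.get(predicted_disease.strip().lower())


print(recommend("Disease_A"))
print(recommend("unlisted disease"))  # None: the caller should defer to a clinician
```

Returning `None` for an unknown label, rather than a default entry, keeps the system from presenting advice for a disease it has no reviewed content for.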
Confusion matrix
The confusion matrix offers a detailed breakdown of the model's performance by presenting the counts of true positive, true negative, false positive, and false negative predictions across different disease categories. This visual representation allows a comprehensive evaluation of the model's accuracy, precision, recall, and F1-score for each disease class. By analyzing the confusion matrix, insights into the model's ability to correctly classify diseases and detect potential misclassifications can be gained. Furthermore, the confusion matrix facilitates the identification of specific disease classes that may pose challenges for the model, guiding future optimization efforts to improve overall prediction accuracy. The confusion matrix is displayed in Fig. 6.
Fig. 1 Workflow of the model
Fig. 2 Top 10 most frequent symptoms
Fig. 3 Symptoms distribution across diseases
- Accuracy: a performance metric used to evaluate the overall performance of a classification model.
- Precision: a performance metric used to evaluate the quality of the classifier's positive predictions.
- Recall: a performance metric used to assess the classifier's accuracy in identifying positive examples. It is also known as sensitivity or true positive rate.
- F1-score: a performance metric that considers both precision and recall, providing a single value that summarizes the performance of a classifier.
Metrics derived from the matrix:
The model performs well: most predictions are correct, with high true positive and true negative counts. The number of false positives (22) and false negatives (23) is small, indicating that the model is well-calibrated. While the overall metrics (accuracy, precision, recall, and F1-score) are high, slight improvements could further reduce false positives and false negatives, particularly if the application requires high sensitivity or specificity.
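For a multi-class problem, the per-class counts are read off the confusion matrix one class at a time. The sketch below assumes the common convention that rows are true classes and columns are predicted classes; the 3x3 matrix is a hypothetical example, not the matrix in Fig. 6.

```python
def per_class_metrics(cm):
    """Derive TP, FP, FN, TN, precision, recall, and F1 for each class.

    cm[i][j] is the number of samples with true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # rest of row k
        fp = sum(cm[i][k] for i in range(n)) - tp   # rest of column k
        tn = total - tp - fn - fp
        prec = tp / (tp + fp) if (tp + fp) else 0.0
        rec = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        results.append({"tp": tp, "fp": fp, "fn": fn, "tn": tn,
                        "precision": prec, "recall": rec, "f1": f1})
    return results


def accuracy(cm):
    """Correct predictions (the diagonal) divided by all predictions."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total


cm = [[8, 2, 0],   # hypothetical 3-class matrix
      [1, 7, 2],
      [0, 1, 9]]
for k, m in enumerate(per_class_metrics(cm)):
    print(k, m["tp"], m["fp"], m["fn"], m["tn"], round(m["f1"], 3))
print(accuracy(cm))  # 0.8
```

Reading the counts per class is what makes the weak classes visible: a single overall accuracy can stay high while an individual class has a low F1-score.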
Table 2 provides a detailed breakdown of the model's classification performance across various diseases, highlighting the precision, recall, F1-score, and support metrics. The model demonstrates perfect classification performance for several diseases, such as Acne, Arthritis, Dimorphic hemorrhoids (piles), Fungal infection, Hypoglycemia, Impetigo, Peptic ulcer disease, Psoriasis, and Urinary tract infection. These results indicate that the model can confidently and accurately distinguish these diseases from others in the dataset.
For diseases like AIDS, Chickenpox, Dengue, Heart attack, and Malaria, the F1-scores are above 0.9, reflecting strong classification capabilities. These diseases likely have distinctive features that the model effectively captures.
Diseases such as Common cold (F1 = 0.88) and Hypertension (F1 = 0.92) show moderately good results. While they are above average, there might be some overlap with other conditions that slightly reduces precision or recall. A few diseases, such as Hepatitis D (F1 = 0.071), Hepatitis E (F1 = 0.235), and Typhoid (F1 = 0.6), have notably lower F1-scores. This suggests that the model struggles to identify these diseases, due to insufficient or ambiguous data, class imbalance, or overlapping symptoms with other conditions. Some diseases, like Cervical spondylosis (Recall = 0.739) and Hyperthyroidism (Recall = 0.708), show lower recall, indicating that the model misses a significant number of actual cases. Diseases with high precision but lower recall (e.g., Chronic cholestasis, Hepatitis A) reflect a tendency to classify conservatively, prioritizing accuracy over sensitivity. The support column highlights the number of instances for each disease. Diseases with lower support (e.g., Peptic ulcer disease with 17 cases) might contribute to the model's inability to generalize effectively for these categories. For diseases like Migraine (F1 = 0.677) and Tuberculosis (F1 = 0.491), moderate to poor performance could also stem from data limitations or overlap with other diseases.
Comparison with related work
In this section, we compare our medical recommendation system with existing research to highlight the differences, advantages, and limitations of our approach relative to current methodologies in this field. Table 3 lists the titles or categories of the different research studies and methodologies being analyzed and compared.
This table provides the comparative analysis of studies focusing on different methodologies, datasets, and results for various ML and DL tasks.
The proposed model achieves high accuracy (95.12%) using support vector classification (SVC) and random forest (RF), demonstrating its strength in symptom-disease classification. The incorporation of additional contextual data such as medications, precautions, and workouts adds practical relevance and utility beyond pure predictive accuracy. The dataset's size (132 attributes, 4920 instances) reflects a comprehensive setup but also hints at potential challenges related to complexity and feature redundancy.
A study [14] compared traditional ML and ensemble techniques for coronary heart disease (CHD) prediction. Stacking achieves the highest accuracy (75.1%), but all methods underperform compared to the other studies in the table. The large dataset (70,000 instances) supports robustness, but the relatively low accuracy suggests that CHD prediction might require more sophisticated or domain-specific approaches. Ensemble techniques like bagging and boosting show similar performance, indicating limited gains from method diversification in this context.
Fig. 4 Distribution of diseases
Fig. 5 Model evaluation metrics
Fig. 6 Confusion matrix for the binary classification model performance
Deployment
For deployment, the classification model was integrated into a React.js-based web application that provides a clean and interactive user interface. The React front end includes input fields for entering symptoms, a button to initiate the prediction, and sections that display the predicted disease along with relevant details such as its description, precautions, medications, recommended workouts, and diets. This web interface, connected to a Node.js backend, offers an accessible and responsive way for users to interact with the medication prediction and recommendation system.
Fig. 7 shows a screenshot of the React.js user interface for the symptom checker application. The interface includes input fields for entering symptoms, a button to initiate the check, a progress bar indicating processing, and a text area that displays the predicted disease along with relevant information such as description, precautions, medications, workouts, and diets.
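The request flow described above can be sketched as follows: symptoms entered by the user are encoded as a binary feature vector, passed to the classifier, and the predicted disease is joined with its stored details. This is only an illustrative sketch written in Python for brevity, although the paper describes a Node.js backend; the symptom vocabulary, the `DETAILS` table, the function names, and the stub classifier are all hypothetical, and the placeholder strings are not medical advice.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical symptom vocabulary; the paper's dataset has 132 attributes.
SYMPTOMS: List[str] = ["itching", "skin_rash", "fatigue", "high_fever", "headache"]

# Hypothetical lookup table with placeholder text, not real medical advice.
DETAILS: Dict[str, Dict[str, object]] = {
    "ExampleDisease": {
        "description": "placeholder description",
        "precautions": ["placeholder precaution"],
        "medications": ["placeholder medication"],
        "workouts": ["placeholder workout"],
        "diets": ["placeholder diet"],
    }
}


def encode_symptoms(selected: Sequence[str],
                    vocabulary: Sequence[str] = SYMPTOMS) -> List[int]:
    """Turn user-entered symptom names into a binary feature vector."""
    chosen = {s.strip().lower().replace(" ", "_") for s in selected}
    unknown = chosen - set(vocabulary)
    if unknown:
        raise ValueError(f"Unknown symptoms: {sorted(unknown)}")
    return [1 if name in chosen else 0 for name in vocabulary]


def build_response(selected: Sequence[str],
                   predict: Callable[[List[int]], str]) -> Dict[str, object]:
    """Predict a disease and attach its stored details for the front end."""
    disease = predict(encode_symptoms(selected))
    return {"disease": disease, **DETAILS.get(disease, {})}


def stub_predict(features: List[int]) -> str:
    """Stand-in for the trained classifier (hypothetical)."""
    return "ExampleDisease"


response = build_response(["Itching", "skin rash"], stub_predict)
print(response["disease"], response["medications"])
```

Keeping the encoding step separate from the prediction step means the same vocabulary order can be reused at training time and at serving time, which avoids silently misaligned feature columns.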
Conclusion
This study presents a robust machine learning-based health-care recommendation system that integrates accurate disease prediction with practical, personalized medical advice. By leveraging an RF model on a comprehensive symptom-disease dataset, the system achieved an accuracy of 95.12%, outperforming existing approaches, including ensemble methods and hybrid techniques. Beyond diagnosis, the system's ability to provide tailored recommendations (such as medication, dietary plans, and exercise regimens) significantly enhances its practical utility, making it a valuable tool for patient-centered care and proactive health management.
Comparative analysis with similar studies highlights the system's scalability and effectiveness, even when addressing diseases of varying complexity. Unlike methods relying on limited datasets or narrowly focused techniques, this system balances accuracy, adaptability, and usability. The integration of real-world health recommendations with predictive analytics demonstrates the potential of machine learning to transform health-care delivery by bridging the gap between diagnosis and treatment.
Table 3 Comparison between key findings and observations
| Study | Methodology | Key findings | Dataset | Used techniques | Best accuracy |
| --- | --- | --- | --- | --- | --- |
| Proposed model | Machine learning with symptom and disease dataset | RF and SVC achieved the highest accuracy | All datasets used in the code and training of the model are available in [1] | RF, SVC, GradientBoosting, KNeighbors, MultinomialNB | RF = 95.12%; SVC = 95.05%; GradientBoosting = 91.93%; KNeighbors = 95.08%; MultinomialNB = 95.06% |
| - | Data collection (EHRs, real-time monitoring), feature selection, Random Forest Classifier, adaptive DSS, cross-validation | High accuracy, improved emergency decision-making, robust feature use, outperforms other algorithms | Hospital EHRs | RF, adaptive DSS, feature selection, cross-validation, compared with Decision Tree, SVM, KNN | RF = ~95-100% |
| - | Data collection from diverse health sources, preprocessing, feature engineering | System predicts diseases (Arrhythmia, Sleep Apnea, Insomnia, Stroke) with >90% | Diverse health data including EHRs, wearable device data, patient-reported outcomes | Logistic Regression, Random Forest, Voting Classifier for disease prediction | All models (Logistic Regression, Random Forest, Voting Classifier) achieved >90% |
| - | Data preprocessing, training six deep learning algorithms (e.g., MLP, Gradient Boosting) | LIME enhances transparency by explaining predictions; key features identified: High-BP for heart disease | Health-related data (EHRs, test reports, treatment histories) for heart disease and diabetes | DL and LIME for local interpretability | LIME maintains high accuracy with added interpretability; comparable studies report 78-94.96% |
| [6] | Survey data collection, feature engineering (6 factors), train multi-output models (Random Forest, KNN, AdaBoost, XGBoost) | Random Forest achieves 93% accuracy | Survey data | Random Forest, KNN, AdaBoost, XGBoost | RF = 93% |
Fig. 7 Deployment output, users will enter their values to get classified
References
- Hassan, Basma M., and Shahd Mohamed Elagamy. "Personalized medical recommendation system with machine learning." Neural Computing and Applications (2025): 1-17.
- Zoha Fatima, Dr. Syed Asadullah Hussaini, & Dr. L.K Suresh Kumar. (2024). Optimize Emergency Medication Decisions Through Random Forest & Adaptive Decision Support System for a Personalized Drug Recommendation Framework. The Bioscan, 19(Special Issue-1), 707-713.
- Parkar, I., & Parkar, I. (2025). Personalized Health Recommendation System using Machine Learning [Computer software]. GitHub. Retrieved from IbrahimParkar/Personalized_Health_Recommendation_System
- Wu, Y.; Zhang, L.; Bhatti, U.A.; Huang, M. Interpretable Machine Learning for Personalized Medical Recommendations: A LIME-Based Approach. Diagnostics 2023, 13, 2681. https://doi.org/10.3390/diagnostics13162681
- Published in the 12th International Symposium on Information and Communication Technology (SOICT 2023)
- Sharma, I.P.; Nguyen, T.V.; Singh, S.A.; Ongwere, T. Predicting an Optimal Medication/Prescription Regimen for Patient Discordant Chronic Comorbidities Using Multi-Output Models. Information 2024, 15, 31. https://doi.org/10.3390/info15010031
- Jackins, V., Vimal, S., Kaliappan, M. et al. AI-based smart prediction of clinical disease using random forest classifier and Naive Bayes. J Supercomput 77, 5198-5219 (2021). https://doi.org/10.1007/s11227-020-03481-x
- Mao C, Yao L, Luo Y. MedGCN: Medication recommendation and lab test imputation via graph convolutional networks. J Biomed Inform. 2022 Mar;127:104000. doi: 10.1016/j.jbi.2022.104000. Epub 2022 Jan 29. PMID: 35104644; PMCID: PMC8901567
- Katzman, J.L., Shaham, U., Cloninger, A. et al. DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med Res Methodol 18, 24 (2018). https://doi.org/10.1186/s12874-018-0482-1
- Ko H et al (2022) A survey of recommendation systems: recommendation models, techniques, and application fields. Electronics 11(1):141
- Lambay MA, Mohideen SP (2020) Big data analytics for healthcare recommendation systems. In: 2020 International conference on system, computation, automation and networking (ICSCAN). IEEE.
- Alqahtani A et al (2022) Cardiovascular disease detection using ensemble learning. Comput Intell Neurosci 1:5267498
- Sharma D, Singh Aujla G, Bajaj R (2021) Deep neuro-fuzzy approach for risk and severity prediction using recommendation systems in connected health care. Trans Emerg Telecommun Technol 32(7):e4159
- Agarwal A et al (2021) Classification model for accuracy and intrusion detection using machine learning approach. PeerJ Comput Sci 7:e437
