DOI : https://doi.org/10.5281/zenodo.18901389
- Open Access

- Authors : Chunduri Madhurya, Seema Kumari, Kaitha Benoni, Dongari Sai Kiran
- Paper ID : IJERTV15IS020853
- Volume & Issue : Volume 15, Issue 02, February 2026
- Published (First Online): 07-03-2026
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License:
This work is licensed under a Creative Commons Attribution 4.0 International License
Multilingual Sentiment Analysis using Random Forest
Chunduri Madhurya
Computer Science and Engineering (CSE), Vardhaman College of Engineering, Shamshabad
Kaitha Benoni
Computer Science and Engineering (CSE), Vardhaman College of Engineering, Shamshabad
Seema Kumari
Computer Science and Engineering (CSE), Vardhaman College of Engineering, Shamshabad
Dongari Sai Kiran
Computer Science and Engineering (CSE), Vardhaman College of Engineering, Shamshabad
Abstract – The main aim of this multilingual customer sentiment analysis model is the classification of reviews written in Telugu and Hindi. We apply different machine-learning algorithms to classify the reviews as positive, negative, or neutral, so that businesses can understand public opinion and adjust their marketing strategies. An extensive dataset of Telugu and Hindi reviews was generated using different AI tools. The dataset was preprocessed using techniques such as tokenization, stop-word elimination, and normalization, which made the textual data clean and standardized. The text was then represented as numerical vectors using Term Frequency-Inverse Document Frequency (TF-IDF) to make it ready for machine learning algorithms.
A number of classifiers were also used in this study, including: Logistic Regression, Support Vector Machine (SVM), Naive Bayes, Decision Tree, and Random Forest. Each model was trained and tested on the dataset and their performance was compared. Among all the models, the Random Forest classifier achieved the best results, demonstrating superior capability in handling complex multilingual data. This research emphasizes the potential of incorporating underrepresented languages like Telugu and Hindi into sentiment analysis systems and contributes to the broader scope of multilingual natural language processing applications.
Index Terms – Multilingual Sentiment Analysis, Customer Reviews, Random Forest, Natural Language Processing, TF-IDF, Text Classification
- INTRODUCTION
Review sentiment analysis has become a powerful tool for companies to analyze customer opinions and make data-driven decisions. By identifying whether a review is positive, negative, or neutral, companies can better understand consumer sentiment, refine their marketing strategies, improve products, and enhance customer service. [1] Additionally, sentiment analysis (SA) plays a crucial role in brand reputation management, helping businesses respond effectively to feedback and maintain customer trust. [2]
With the emergence of smartphones and digital platforms, people are no longer merely reading information; they're actively sharing their views and experiences on the internet. E-commerce websites have become a major hub for customer feedback, with thousands of reviews being posted daily. These reviews hold valuable insights into consumer preferences, product satisfaction, and market trends, but the sheer volume of data makes it nearly impossible for businesses to process them manually. [3]
Sentiment analysis has evolved from rule-based and lexicon-based methods to advanced machine learning and deep learning techniques, including SVM, Naïve Bayes, Decision Trees, Logistic Regression, CNNs, and other neural networks. [4]
The importance of multilingual sentiment analysis lies in its ability to process and interpret opinions in diverse languages, enabling businesses to understand customer sentiments from a broader demographic. This study aims to identify the best approaches for sentiment analysis across languages, with potential applications in improving customer engagement, product innovation, and business growth. [5]
- LITERATURE SURVEY
In this section, a survey of the literature on multilingual sentiment analysis of reviews is presented, summarized in Table I.
Alzahrani et al. [6] developed a model using LSTM and CNN-LSTM for sentiment analysis that can detect consumer sentiment for improving market strategies, and obtained good accuracy on Amazon product reviews.
Fu Shang et al. [7] enhanced e-commerce recommendations based on user reviews using deep learning models such as Neural Collaborative Filtering and Hierarchical Attention Networks. Performance was measured using NDCG@K on large-scale e-commerce user reviews.
Arodh Lal Karn et al. [8] proposed a customer-centric hybrid recommendation system that integrates hybrid recommendation models with sentiment analysis for e-commerce. Precision, recall, and F1-score were used in the assessment; the reviews were collected from online social networking and e-commerce websites.
Aribowo et al. [9] have analyzed Indonesian YouTube comments using various machine learning models for sentiment
| Author | Title | Methodology and Dataset | Metrics | Limitations |
|---|---|---|---|---|
| Mohammad Eid Alzahrani et al. | Developing an Intelligent System with Deep Learning Algorithms for Sentiment Analysis of E-Commerce Product Reviews | LSTM and CNN-LSTM models on Amazon website reviews | LSTM accuracy, CNN-LSTM accuracy | Dataset specificity; class imbalance; focus on text only |
| Fu Shang, Jiatai Shi et al. | Enhancing E-Commerce Recommendation Systems with Deep Learning-based Sentiment Analysis of User Reviews | Hierarchical Attention Network, Neural Collaborative Filtering on large-scale e-commerce reviews | NDCG@K | Cold-start issue; dataset bias; interpretability |
| Arodh Lal Karn et al. | Hybrid recommendation system for E-Commerce applications by integrating hybrid sentiment analysis | Hybrid Recommendation Models (HRM) with Sentiment Analysis (SA) on online social networks and e-commerce apps | Precision, Recall, F1-score | Dynamic sentiment; data sparsity; sentiment errors |
| Aribowo et al. | Evaluation of preprocessing and tree-based machine learning for sentiment analysis | TF-IDF, Extra Trees on Indonesian YouTube comments | Accuracy | High complexity; slang; noisy data; spelling errors |
| Savci, Pinar et al. | Sentiment analysis in multilanguage E-commerce data | RNN, CNN, LSTM on e-commerce app data | Accuracy, Precision, Recall, F1-score | Class imbalance; dataset size; language variability |
| Rizkya et al. | Sentiment Analysis of Shopee Reviews using Naive Bayes | Naive Bayes on Shopee review data | Accuracy, Precision, Recall, F1-score | Class imbalance; neutral sentiment handling |
| Aattouri et al. | Enterprise callbot modeling using ML | SVM, RF, LR, NB on call center data | F1-score, response time, Cohen's Kappa | No context understanding; lack of emotional intelligence |
| Fang et al. | Sentiment analysis using Chinese BERT and BiLSTM | Chinese BERT + CNN-BiLSTM on Chinese e-commerce reviews | Accuracy, F1-score | High computational cost; scalability issues |
| Bin Harunasir et al. | Amazon review sentiment analysis using ML | MNB, RF, LSTM, CNN with TF-IDF on Amazon product reviews | Confusion matrix, ROC, AUC | Context understanding (LSTM); resource-heavy |
| Alzahrani et al. | Sentiment analysis with LSTM and CNN-LSTM | LSTM and CNN-LSTM on Amazon e-commerce reviews | Accuracy of LSTM, CNN-LSTM | Dataset specificity; class imbalance; narrow textual focus |
analysis. Standard preprocessing and feature extraction were applied using the count vectorizer and TF-IDF, and the Extra Tree classifier was used for the accuracy evaluation.
Savci et al. [10] used machine learning and deep learning techniques to perform sentiment analysis on Arabic e-commerce data. The BERT-base multilingual cased model was evaluated using precision, alongside SVM accuracy.
Rizkya et al. [11] used the Naïve Bayes classifier for sentiment analysis of Shopee reviews, employing preprocessing and classification techniques. Statistics such as accuracy, precision, recall, and F1-score were used to assess performance.
Aattouri et al. [12] evaluated AI-based callbots using real call center data, applying models such as KNN, SVM, RF, LR, and NB. F1-score, response time, and Cohen's Kappa were used to assess the models' performance.
Fang et al. [13] have proposed a sentiment analysis model using Chinese-BERT with Whole Word Masking and a fused CNN-BiLSTM architecture on Chinese e-commerce reviews. Performance was assessed using accuracy and F1-score.
Bin Harunasir et al. [14] performed sentiment analysis on Amazon product reviews using supervised machine learning models. They compared MNB, RF, LSTM, and CNN, using the TF-IDF transformer and vectorizer, and evaluated performance using metrics derived from the confusion matrix, ROC, and AUC.
Sana Riaz et al. [15] developed a model using machine learning approaches including LR, SVM, NB, RFC, and KNN for sentiment analysis that can detect consumer sentiment to improve market strategies, on a multilingual dataset of Roman Urdu and Sindhi e-commerce reviews.
TABLE I: Survey of Sentiment Analysis
- METHODOLOGY
This methodology focuses on performing sentiment analysis on customer comments written in the Telugu and Hindi languages. The goal is to classify these comments into positive, neutral, or negative sentiments using a combination of Machine Learning, Deep Learning, and Pre-Trained Models.
The flow diagram in Fig. 1 below illustrates the steps of the machine learning pipeline built around a Random Forest model.
Fig. 1: Flow Diagram
A. Data Collection
The dataset used for this sentiment analysis, converted-reviews.csv, is a self-generated collection of 261 user reviews designed specifically for analyzing customer opinions, shown in Fig. 2. The dataset contains two columns.
- Rating – A numerical score between 1.0 and 5.0, given by the user.
Fig. 4: Cleaned Dataset
- Review Text – The original, unprocessed customer review text.
Fig. 2: Dataset
Here, we're preparing the data for our model by separating the reviews and ratings from each other. The Review Text column holds the actual feedback or reviews given by individuals, and we place that in X. The Rating column indicates how the individual rated the product or service, and we place that in y. So essentially, the input is X (what people said), and the output is y (how they felt about it). This helps the model grasp the connection between the text and the rating.
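As a minimal sketch of this X/y separation (the sample rows below are invented; only the Review Text and Rating column names come from the dataset description):

```python
import pandas as pd

# Invented sample rows standing in for converted-reviews.csv,
# which has a review-text column and a rating column.
df = pd.DataFrame({
    "ReviewText": ["बहुत अच्छा उत्पाद", "చాలా బాగుంది", "खराब सेवा"],
    "Rating": [5.0, 4.5, 1.0],
})

X = df["ReviewText"]  # input: what the customer wrote
y = df["Rating"]      # target: how the customer rated it
print(len(X), len(y))
```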
The pie chart in Fig. 3 below indicates that 37.5% of the reviews are negative, 25.0% are neutral, and 37.5% are positive. That is to say, positive and negative feedback occur in equal proportion, while a smaller share is neutral.
Fig. 3: Pie chart for Rating Category Distribution
B. Data pre-processing
This data preprocessing step prepares the raw text data for sentiment analysis. In multilingual settings, text comes in different languages and often includes noise such as URLs, emojis, special characters, and inconsistent casing that the model does not need.
We clean this review text by:
- Removing emojis
- Removing links and extra punctuation
- Keeping meaningful words and characters in Hindi, Telugu, and other languages
The cleaned dataset, as shown in Fig. 4, has been corrected by removing faults and redundant details and is now ready for analysis: missing values are filled, errors corrected, emojis removed, extra punctuation stripped, and duplicates removed.
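A minimal cleaning sketch along these lines, assuming the goal is to keep Devanagari (Hindi), Telugu, and basic Latin characters while stripping links, emojis, and extra punctuation (the exact character ranges are our assumption, not taken from the paper):

```python
import re

# Unicode blocks: Devanagari (Hindi) U+0900-U+097F, Telugu U+0C00-U+0C7F.
URL = re.compile(r"https?://\S+")
KEEP = re.compile(r"[^\u0900-\u097F\u0C00-\u0C7Fa-zA-Z0-9\s]")

def clean_review(text: str) -> str:
    text = URL.sub(" ", text)    # strip links
    text = KEEP.sub(" ", text)   # strip emojis and extra punctuation
    text = text.lower()          # normalize casing (affects Latin only)
    return re.sub(r"\s+", " ", text).strip()

print(clean_review("Great 👍 https://x.co फोन बहुत अच्छा!!!"))
```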
C. Split into Train/Test

During data preprocessing for multilingual sentiment analysis, the dataset was split into training and test subsets to objectively measure the model's performance. A conventional 80:20 split was used: 80% of the data trained the model and 20% was held out for testing. This lets the model learn from a wide variety of multilingual reviews while keeping a separate portion of unseen data on which to test its generalization capacity. The split was stratified, ensuring that the original sentiment class distribution remained the same in both sets to avoid data imbalance.
The figures below show the cleaned reviews split into training and testing datasets: Fig. 5 shows the 80% of the data used to train the model, and Fig. 6 the 20% used to test the model's accuracy.
Fig. 5: Cleaned Train Dataset
Fig. 6: Cleaned Test Dataset
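The stratified 80:20 split described above can be sketched with scikit-learn's train_test_split (toy labels below; the real pipeline would pass the cleaned reviews and their sentiment classes):

```python
from sklearn.model_selection import train_test_split

# Toy stand-in data: 20 reviews with an imbalanced class mix.
texts = ["review %d" % i for i in range(20)]
labels = ["positive"] * 8 + ["negative"] * 8 + ["neutral"] * 4

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels,
    test_size=0.20,     # 80:20 split, as in the paper
    stratify=labels,    # keep class proportions equal in both sets
    random_state=42,    # reproducibility
)
print(len(X_train), len(X_test))
```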
D. Text Vectorization
A critical step in preparing text data for machine learning models is converting raw text into a form algorithms can operate on. For this research, the multilingual review text was vectorized using the Term Frequency-Inverse Document Frequency (TF-IDF) method, which translates text into numerical feature vectors. TF-IDF reflects the significance of a word within a review and across the overall dataset, helping the model distinguish merely frequent words from contextually relevant ones. Prior to applying TF-IDF, the text was normalized by removing punctuation, converting to lowercase, and tokenizing.
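A small TF-IDF sketch with scikit-learn (whitespace tokenization is our own choice here, since the default token pattern can split Indic words at vowel signs; the sample reviews are invented):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented Telugu/Hindi reviews.
reviews = ["బాగుంది చాలా బాగుంది", "खराब उत्पाद", "बहुत अच्छा उत्पाद"]

# Whitespace tokenization keeps Telugu/Hindi tokens intact.
vec = TfidfVectorizer(tokenizer=str.split, token_pattern=None)
X = vec.fit_transform(reviews)

# One row per review, one column per unique term.
print(X.shape)
print(sorted(vec.vocabulary_))
```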
Fig. 7: Words Distribution of Train dataset
The plots in Fig. 7 and Fig. 8 display the distribution of sentence lengths, in number of words, over the training and test sets of the multilingual sentiment analysis data. Sentence lengths in the training set vary between 2 and 5 words, with an average of roughly 2.96 words, as shown in Fig. 7.
Fig. 8: Words Distribution of Test dataset
The most frequent are 2-word sentences, followed closely by 3-word entries, meaning that the majority of user reviews are very brief. The test set follows the same pattern, with sentence lengths likewise ranging from 2 to 5 words and a slightly higher mean of 3.12 words, as shown in Fig. 8.
E. Model Training
After vectorization, training a model with Random Forest Regression involves building an ensemble of decision trees, each trained on a random subset of the data. The idea behind
this method is to combine the predictions of multiple trees to improve accuracy and reduce the risk of overfitting.
Each tree learns patterns by minimizing prediction error, and their outputs are averaged to make final predictions. This approach reduces overfitting, improves accuracy, and provides stable results. The model is trained on labeled data (X-train-vec, y-train) and performs well in complex or noisy regression tasks by combining the strengths of many weak learners.
The model was trained on 80% of the data, with the remaining 20% reserved for testing. 100 trees (n_estimators=100) were used in the forest to balance accuracy and computational speed. The classifier was trained on the TF-IDF feature vectors obtained from the multilingual reviews, enabling it to generalize patterns and differences between positive, neutral, and negative sentiments in various languages.
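Putting the pieces together, a hedged sketch of the training step with the n_estimators=100 setting stated above (all data below is invented toy data, not the paper's dataset):

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Invented toy reviews and ratings standing in for the real dataset.
texts = ["అద్భుతం", "ठीक ठाक", "खराब", "బాగుంది", "बेकार", "बहुत अच्छा"] * 5
ratings = [5.0, 3.0, 1.0, 4.5, 1.5, 5.0] * 5

# TF-IDF features, whitespace-tokenized for the Indic scripts.
vec = TfidfVectorizer(tokenizer=str.split, token_pattern=None)
X = vec.fit_transform(texts)

X_train, X_test, y_train, y_test = train_test_split(
    X, ratings, test_size=0.2, random_state=42)

# 100 trees, matching the paper's n_estimators=100 setting.
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
preds = model.predict(X_test)
print(preds.shape)
```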
After training the Random Forest Regression model, we evaluated its predictive performance using several standard regression metrics. The Mean Squared Error (MSE) calculates the average of the squared deviations between the actual values $y_i$ and the predicted values $\hat{y}_i$:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where $n$ is the total number of observations, $y_i$ is the actual value of the $i$-th observation, and $\hat{y}_i$ is the predicted value for the same observation.
The percentage of the dependent variable's variance that can be predicted from the independent variables is given by the R-squared score ($R^2$):

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

where $\hat{y}_i$ is the predicted value, $\bar{y}$ is the average of all actual values, and $y_i$ is the actual value of the $i$-th observation.
The average of the absolute discrepancies between the predicted and actual values is known as the Mean Absolute Error (MAE):

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$$

where $n$ is the total number of observations and $y_i$ and $\hat{y}_i$ are the actual and predicted values for the $i$-th observation, respectively.
The median of all absolute errors is given by the Median Absolute Error (MedAE):

$$\mathrm{MedAE} = \mathrm{median}(|y_i - \hat{y}_i|)$$
The Explained Variance Score (EVS) measures how much of the target variable's variance can be accounted for by the model:

$$\mathrm{EVS} = 1 - \frac{\mathrm{Var}(y - \hat{y})}{\mathrm{Var}(y)}$$

where $y$ and $\hat{y}$ are vectors of actual and predicted values, respectively; $\mathrm{Var}(y)$ denotes the variance of the actual values, while $\mathrm{Var}(y - \hat{y})$ is the variance of the residuals.
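All of these regression metrics are available in scikit-learn; a small sketch with toy numbers (not the paper's results):

```python
import numpy as np
from sklearn.metrics import (mean_squared_error, mean_absolute_error,
                             median_absolute_error, r2_score,
                             explained_variance_score)

# Toy actual and predicted ratings (invented, not the paper's data).
y_true = np.array([5.0, 3.0, 1.0, 4.0])
y_pred = np.array([4.5, 3.5, 1.5, 4.0])

print("MSE   :", mean_squared_error(y_true, y_pred))    # 0.1875
print("MAE   :", mean_absolute_error(y_true, y_pred))   # 0.375
print("MedAE :", median_absolute_error(y_true, y_pred)) # 0.5
print("R2    :", round(r2_score(y_true, y_pred), 4))    # 0.9143
print("EVS   :", round(explained_variance_score(y_true, y_pred), 4))  # 0.9214
```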
On evaluation, Random Forest Regression showed robust performance, with training and testing accuracy of 82.69% and 71.70%, respectively. Key regression measurements were an MSE of 0.9861, R² of 0.3849, MAE of 0.7064, MedAE of 0.5237, and EVS of 0.3889. These outcomes demonstrate the model's dependability and efficiency in handling multilingual sentiment analysis with little overfitting.
F. Metrics
We used Accuracy, Support, Precision, Recall, and F1-score to assess the classification model's overall performance across the whole dataset.
Out of all predicted positive instances, the precision measures the percentage of correctly predicted positive instances:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

where $TP$ represents the true positives, and $FP$ denotes the false positives (incorrectly predicted positives).
The recall, sometimes referred to as sensitivity, measures how well the model can identify every real positive instance:

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

where $FN$ stands for false negatives (missed positives) and $TP$ is the number of true positives.
The F1-score balances the two metrics by taking the harmonic mean of precision and recall. When there is an unequal distribution of classes, it is especially helpful. The formula is:
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
This measure is especially helpful in imbalanced datasets, so that neither precision nor recall is neglected.
The model's overall correctness across all classes is measured by its accuracy. It is computed as:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where $TP$ and $TN$ are the true positives and true negatives (correctly predicted instances), while $FP$ and $FN$ represent the false positives and false negatives (incorrect predictions).
The number of real instances for every class in the dataset is represented by the Support. It is expressed as:
Support = Number of actual instances in each class
These measures provided a complete picture of the predictive accuracy, strengths, and weaknesses of the model, highlighting its capacity to maintain performance across all classes, especially in imbalanced cases.
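These classification metrics can be obtained in one call via scikit-learn's classification_report (toy labels below, not the paper's predictions):

```python
from sklearn.metrics import accuracy_score, classification_report

# Toy ground-truth and predicted sentiment labels.
y_true = ["pos", "pos", "neg", "neu", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "neu", "neg", "pos"]

print(round(accuracy_score(y_true, y_pred), 4))  # 0.8333
# Per-class precision, recall, F1-score, and support in one table.
print(classification_report(y_true, y_pred))
```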
- RESULTS AND DISCUSSION
We used a confusion matrix as a key diagnostic tool to evaluate the Random Forest classifier's multilingual sentiment analysis performance. By examining the confusion matrix, we can see which sentiment classes the model handles well and where adjustments are needed, enabling better model tuning and evaluation. Standard evaluation metrics, including accuracy and a comprehensive classification report with precision, recall, and F1-score for each sentiment class, are then applied to the predictions (converted back to the original scale) and the ground-truth labels. Additionally, a confusion matrix is computed to show the distribution of correct and incorrect predictions over all classes.
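A confusion matrix of this kind can be computed with scikit-learn (toy labels below, not the paper's data; rows are true classes, columns are predicted classes):

```python
from sklearn.metrics import confusion_matrix

labels = ["negative", "neutral", "positive"]
# Invented ground-truth and predicted labels.
y_true = ["negative", "positive", "positive", "neutral", "negative"]
y_pred = ["negative", "positive", "negative", "neutral", "positive"]

# Rows = true classes, columns = predicted classes, in `labels` order.
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)
```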
Fig. 9: Confusion Matrix of Train dataset
Fig. 9 shows that on the training set, 26 negative labels were predicted correctly, 34 were wrongly predicted as positive, and no neutral labels were predicted correctly. Positive labels were mostly accurate, with 123 correct predictions. On the test set in Fig. 10, 6 negative labels were correct, 10 were wrongly predicted as positive, and only 2 neutral labels were correct. Positive labels were mostly correct, with 34 accurate predictions.
Fig. 10: Confusion Matrix of Test dataset
The plots display the evolution of the Random Forest classifier's training and testing accuracy against simulated epochs for multilingual sentiment analysis. Since Random Forest is a non-iterative ensemble model, simulated epochs were created by incrementally training the model with different data splits or staged parameter tuning. The X-axis represents these simulated epochs, while the Y-axis represents accuracy scores. The training accuracy curve in Fig. 11 captures how closely the model fits the training data, while the testing accuracy in Fig. 12 captures generalization performance on new multilingual reviews. The proximity of the two curves implies a well-balanced model with little overfitting across languages.
Fig. 11: Train Accuracy vs Simulated Epochs
Fig. 12: Test Accuracy vs Simulated Epochs
The graph in Fig. 13 below indicates the performance of the model on unseen data as it is being trained, while Fig. 14 shows the test loss, measured by the Mean Squared Error (MSE): the average squared discrepancy between predicted and actual values in the test set. For every epoch, the model predicts on the test set, and the MSE is computed as the average of the squared errors between actual and predicted values.
Fig. 13: Train Loss vs Epochs
Fig. 14: Test Loss vs Epochs
Among the accuracy levels in Table II, Random Forest performed best with a high of 0.75, confirming its superiority for classifying multilingual sentiment. Logistic Regression came close behind with an accuracy of 0.65, performing marginally better than SVM and Naïve Bayes, each at a similar 0.63. The lowest accuracy, 0.56, was recorded by the Decision Tree model, making it the worst performer among the models tested for this task. Random Forest was thus shown to be the best model for the multilingual sentiment analysis task.
| Model | Precision | Recall | F1-score | Support | Accuracy |
|---|---|---|---|---|---|
| Random Forest | 0.75 | 0.75 | 0.71 | 51 | 0.75 |
| SVM (Support Vector Machine) | 0.56 | 0.63 | 0.58 | 51 | 0.63 |
| Naïve Bayes | 0.56 | 0.63 | 0.58 | 51 | 0.63 |
| Decision Tree | 0.55 | 0.56 | 0.54 | 51 | 0.56 |
| Logistic Regression | 0.57 | 0.65 | 0.60 | 51 | 0.65 |

TABLE II: Performance comparison of classification models
- CONCLUSION
This study demonstrates the effectiveness of machine learning algorithms for multilingual sentiment analysis tasks, with Random Forest being the most accurate (0.75). Although existing models are effective, there is a definite need for more efficient and accurate approaches, particularly for languages like Telugu and Hindi. The results highlight the model's robustness in handling diverse multilingual textual data, confirming its suitability for sentiment analysis and its ability to mitigate class imbalance through ensemble learning, leading to more balanced and accurate predictions for tasks involving multiple languages.
REFERENCES
- X. Lin, Sentiment analysis of e-commerce customer reviews based on natural language processing, in Proceedings of the 2020 2nd International Conference on Big Data and Artificial Intelligence, pp. 32–36, 2020.
- G. Usha and L. Dharmanna, Sentiment analysis on business data using machine learning, in 2021 Second International Conference on Smart Technologies in Computing, Electrical and Electronics (ICSTCEE), pp. 1–6, IEEE, 2021.
- M. K. Ranjan, S. K. Tiwari, and A. M. Sattar, Automatic real time sentiment analysis of online shopping application: Generic model, Authorea Preprints, 2023.
- N. K. Singh, D. S. Tomar, and A. K. Sangaiah, Sentiment analysis: a review and comparative analysis over social media, Journal of Ambient Intelligence and Humanized Computing, vol. 11, no. 1, pp. 97–117, 2020.
- M. M. Aguero-Torales, J. I. A. Salas, and A. G. López-Herrera, Deep learning and multilingual sentiment analysis on social media data: An overview, Applied Soft Computing, vol. 107, p. 107373, 2021.
- M. E. Alzahrani, T. H. Aldhyani, S. N. Alsubari, M. M. Althobaiti, and A. Fahad, Developing an intelligent system with deep learning algorithms for sentiment analysis of e-commerce product reviews, Computational Intelligence and Neuroscience, vol. 2022, no. 1, p. 3840071, 2022.
- F. Shang, J. Shi, Y. Shi, and S. Zhou, Enhancing e-commerce recommendation systems with deep learning-based sentiment analysis of user reviews, International Journal of Engineering and Management Research, vol. 14, no. 4, pp. 19–34, 2024.
- A. L. Karn, R. K. Karna, B. R. Kondamudi, G. Bagale, D. A. Pustokhin, I. V. Pustokhina, and S. Sengan, Retracted article: Customer centric hybrid recommendation system for e-commerce applications by integrating hybrid sentiment analysis, Electronic Commerce Research, vol. 23, no. 1, pp. 279–314, 2023.
- A. Aribowo, H. Basiron, N. Herman, and S. Khomsah, An evaluation of preprocessing steps and tree-based ensemble machine learning for analysing sentiment on Indonesian YouTube comments, International Journal of Advanced Trends in Computer Science and Engineering, vol. 9, no. 5, 2020.
- P. Savci and B. Das, Prediction of the customers' interests using sentiment analysis in e-commerce data for comparison of Arabic, English, and Turkish languages, Journal of King Saud University-Computer and Information Sciences, vol. 35, no. 3, pp. 227–237, 2023.
- A. T. Rizkya, R. Rianto, and A. I. Gufroni, Implementation of the Naive Bayes classifier for sentiment analysis of Shopee e-commerce application review data on the Google Play Store, International Journal of Applied Information Systems and Informatics (JAISI), vol. 1, no. 1, 2023.
- I. Aattouri, H. Mouncif, and M. Rida, Modeling of an artificial intelligence based enterprise callbot with natural language processing and machine learning algorithms, IAES International Journal of Artificial Intelligence, vol. 12, no. 2, p. 943, 2023.
- H. Fang, G. Jiang, and D. Li, Sentiment analysis based on Chinese BERT and fused deep neural networks for sentence-level Chinese e-commerce product reviews, Systems Science & Control Engineering, vol. 10, no. 1, pp. 802–810, 2022.
- M. F. bin Harunasir, N. Palanichamy, S.-C. Haw, and K.-W. Ng, Sentiment analysis of Amazon product reviews by supervised machine learning models, Journal of Advances in Information Technology, vol. 14, no. 4, pp. 857–862, 2023.
- S. Riaz, S. Natha, A. A. Chandio, M. Leghari, and A. J. Syed, Sentiment analysis of multilingual Roman text for e-commerce reviews using machine learning approaches, VFAST Transactions on Software Engineering, vol. 13, no. 1, pp. 131–140, 2025.
