DOI: 10.17577/IJERTCONV14IS010083 - Open Access

- Authors: Mrs. Jayashree J, Mr. Rohit Durgappa Kattimani, Ms. Khushal Bahubali Patil, Mr. Tejang Chintamani Dandekar, Mr. Rahul Suresh Nair
- Paper ID: IJERTCONV14IS010083
- Volume & Issue: Volume 14, Issue 01, Techprints 9.0
- Published (First Online): 01-03-2026
- ISSN (Online): 2278-0181
- Publisher Name: IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
Detection of Malicious Social Media Bots Using a Multimodal Explainable Framework
Mrs. Jayashree J
Assistant Professor, Department of MCA, AJ Institute of Engineering and Technology, Mangalore, India

Mr. Rohit Durgappa Kattimani
Department of MCA, AJ Institute of Engineering and Technology, Mangalore, India

Ms. Khushal Bahubali Patil
Department of MCA, AJ Institute of Engineering and Technology, Mangalore, India

Mr. Tejang Chintamani Dandekar
Department of MCA, AJ Institute of Engineering and Technology, Mangalore, India

Mr. Rahul Suresh Nair
Department of MCA, AJ Institute of Engineering and Technology, Mangalore, India
Abstract – Social media platforms are increasingly misused by fake or automated accounts known as bots. These bots spread misinformation, manipulate online visibility, and impersonate real users, posing serious threats to digital trust and platform safety. Traditional bot detection techniques often rely on single-source data (such as user behavior or posts) and use complex, non-explainable models, making them ineffective across multilingual and cross-platform scenarios. This research aims to enhance bot detection by reviewing recent advancements and proposing a multimodal, explainable framework. We explore models that utilize diverse data such as account metadata, text content, and user interactions. Notable tools include RoBERTa, CatBoost, and R-GCN, alongside explainable AI methods such as SHAP and LIME that provide transparency in model predictions. Furthermore, we propose an improved architecture that integrates Apache Kafka for real-time stream processing with a Transformer-GNN hybrid model for enhanced feature learning. Our comparative analysis shows that combining behavioral, content, and graph features leads to higher detection accuracy than using individual sources alone. SHAP analysis further identifies the key features influencing model decisions. The results demonstrate that integrating real-time data, explainability, and multimodal learning can produce robust, interpretable, and scalable bot detection systems. Our approach is not limited to X (formerly Twitter) but is adaptable to other platforms as well. This study supports the development of safer and more trustworthy AI-driven solutions for combating malicious social media bots.
Keywords: Social media bots, Bot detection, SHAP, RoBERTa, CatBoost, R-GCN, GNN, Apache Kafka, Explainable AI, Online safety.
INTRODUCTION
In today's world, social media platforms like X (formerly Twitter), Instagram, and Telegram are widely used for sharing information, connecting with others, and holding public conversations. But these platforms are also misused by automated accounts called bots or spam bots. These bots can post, follow, like, comment, or even chat with users without any human controlling them. Bots are dangerous because they can spread fake news, create false trends, sway public opinion, or trick people into thinking they are real users. They also send spam messages, harmful links, or unwanted content. As these bots become smarter, they behave more like real people, which makes them hard to detect with older methods. In the past, detection relied on simple rules or basic account details such as follower counts or posting frequency. Later, Machine Learning (ML) and Deep Learning (DL) models were used to analyze user behavior and connections. But many of these methods focus on only one type of data (such as text or network connections), and they do not explain how they make decisions. That makes it hard to catch bots that operate across platforms, try to hide, or use many languages. Bots are hard to catch because they keep changing how they act: they learn to look more human and avoid detection. So we need smarter systems that use different kinds of data and can explain how they find bots.
Impact of Bot Activity on the Real World
Real-life incidents show how harmful bots can be. These automated accounts have influenced elections, spread health misinformation, and tricked social media users into scams. Below are some examples that highlight why bot detection systems are urgently needed:
- COVID-19 Vaccine Myths: Bots shared false health information during the pandemic, leading to fear and confusion about vaccination.
- Celebrity Scam Bots: Fake profiles of Elon Musk and other celebrities promoted Bitcoin scams, tricking people into sending money.
- Election Misinformation (USA, 2020): Bots were used to spread fake news during the U.S. presidential election, creating confusion and mistrust among voters.
- Russian Bot Interference: Thousands of fake accounts controlled by groups like the Internet Research Agency were used to influence political discussions in the U.S.
These real-world events highlight the need for advanced and reliable bot detection systems to ensure digital safety and trust.
In this research, we aim to:
- Find bots by looking at how users behave, what they post, and who they interact with.
- Compare different ML and DL models used in bot detection.
- Improve bot detection by using models that give clear explanations of how they work.
- Present a combined approach, pairing Transformers with Graph Neural Networks (GNNs) to learn from both content and connections, along with a real-time detection method that uses Apache Kafka for live data streaming.
- Make AI decisions more comprehensible and reliable by using explainability tools such as SHAP.

This research is important because it helps build better and more trustworthy systems for detecting bots, making social media safer and helping people trust what they see online.
LITERATURE REVIEW
In order to combat misinformation, the first paper offers a thorough multimodal framework for X bot detection that combines textual content analysis, graph-based network behavior, and user profile features. It draws attention to shortcomings in earlier research, including out-of-date datasets, sparse feature sets, and a lack of reproducibility. Using the TwiBot-22 dataset, the proposed feature-rich model, named "The More The Merrier (TMTM)", combines semantic and graph-based signals and surpasses earlier state-of-the-art models by 5.48% in accuracy. The paper highlights the benefit of integrating feature-based, text-based, and graph-based techniques for a robust bot detection system [1].

The second study focuses on explainable machine learning techniques for detecting spambots and fake followers on social networks. Rather than depending on black-box models, it proposes a framework that uses several explainable AI techniques, such as SHAP and LIME, to provide transparency in predictions, and it emphasizes how interpretability helps determine which characteristics drive classification decisions. Trained on the Cresci-15 and Cresci-17 datasets, the model outperforms current techniques in both performance and reliability. It highlights how important explainability is to fostering confidence in automated bot detection systems, particularly during sensitive events such as elections [2].

Based on a review of 534 research articles narrowed down to 49 key documents, the third paper surveys the state of social media bot detection techniques. It divides detection methods into several paradigms, such as hybrid approaches, machine learning, and graph analysis, and covers challenges like bot concealment, methodological errors in previous research, and the development of advanced bot strategies. It draws attention to how bots are used in public discourse distortion, political manipulation, and the dissemination of false information, and it concludes by highlighting key research gaps and the need for flexible, open, multi-method bot detection systems [3].

Finally, to increase accuracy and reliability, the fourth paper presents CB-MTE, a bot detection framework that combines text, graph, and metadata features. By fusing semantic embeddings (via DistilBERT), behavioral portraits, and structural patterns from graph embeddings, it overcomes the drawbacks of single-source approaches. A manifold learning step ensures effective feature fusion, and the final classification is done with CatBoost. On the TwiBot-22 dataset, the method performs noticeably better than both conventional and newer models, particularly in identifying coordinated or dynamically changing bot behaviors. The work emphasizes adaptability across fields like politics, entertainment, and medicine [4].
PROPOSED METHODOLOGY
The methodology used in this study aims to overcome the drawbacks noted in earlier studies and provide a useful framework that integrates real-time streaming detection, multimodal data inputs, and model interpretability. Data collection, preprocessing, feature extraction, model integration, and evaluation are the five primary steps in the process.
Data Collection
The benchmark datasets TwiBot-20, TwiBot-22, Cresci-15, and Cresci-17, which include labeled examples of genuine users and bots from X, are the main source of data for our study. These datasets enable thorough multimodal analysis and include user profiles, tweet content, and user interaction graphs. Apache Kafka was used to stream social media data in real time, simulating the deployment of bot detection systems in real-world streaming scenarios.
Fig 3.1.1 Multimodal X Bot Detection
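The Kafka ingestion step above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the topic name, broker address, and event field names are assumptions, and the consumer loop assumes the kafka-python package is installed and a broker is running.

```python
import json

def parse_status(raw_bytes):
    """Decode one streamed status event into the fields the detector
    consumes. All field names here are illustrative placeholders."""
    event = json.loads(raw_bytes.decode("utf-8"))
    return {
        "user_id": event.get("user_id"),
        "text": event.get("text", ""),
        "followers": event.get("followers", 0),
        "following": event.get("following", 0),
    }

def consume_forever(bootstrap="localhost:9092", topic="social-stream"):
    """Blocking consumer loop (not executed here; requires a live
    Kafka broker and the kafka-python package)."""
    from kafka import KafkaConsumer
    consumer = KafkaConsumer(topic, bootstrap_servers=bootstrap)
    for msg in consumer:
        # Each parsed status would be handed to the detection model.
        yield parse_status(msg.value)
```

Decoupling `parse_status` from the consumer loop lets the same parsing logic be reused for batch datasets such as TwiBot-22 and for the live stream.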
Preprocessing Data
A preprocessing workflow designed to handle the different data modalities is applied to the gathered data. This includes:
- Text cleaning (removing mentions, hashtags, and URLs)
- Metadata formatting (such as normalizing follower counts)
- Graph construction (building networks of user interactions)
- Support for multiple languages through tokenizers compatible with transformer-based models (such as RoBERTa)
Fig 3.2.2 X Data Preprocessing
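The text-cleaning step could look like the following minimal sketch; the exact regular expressions are assumptions, and the cleaned text would then be passed to a multilingual transformer tokenizer.

```python
import re

# Patterns for the three artifact types removed during cleaning.
MENTION = re.compile(r"@\w+")
HASHTAG = re.compile(r"#\w+")
URL = re.compile(r"https?://\S+")

def clean_tweet(text: str) -> str:
    """Strip URLs, @mentions, and #hashtags, then collapse whitespace."""
    for pattern in (URL, MENTION, HASHTAG):
        text = pattern.sub(" ", text)
    return " ".join(text.split())

# Example: clean_tweet("Check https://t.co/abc @bob #spam now")
# leaves only the plain words "Check now".
```

Removing URLs before mentions and hashtags avoids mangling links that happen to contain `#` fragments.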
Extraction of Features
We extract three types of features:
- Textual features: semantic embeddings from the transformer-based RoBERTa model.
- Graph-based features: extracted with Relational Graph Convolutional Networks (R-GCN).
- Metadata features: tweet timing, follower/following ratio, retweet ratio, and related account statistics.
Fig 3.3.3 X Data Preprocessing

Model Evaluation
Standard metrics such as accuracy, precision, recall, F1-score, and ROC-AUC are used to assess the models. Additionally, we evaluate model generalization across platforms and compare interpretability using SHAP values. Kafka's real-time performance keeps latency low enough for practical use.
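The metadata features listed above could be derived as in this sketch. The input field names are illustrative, not the datasets' actual schema, and the guards against division by zero are a design choice rather than something the paper specifies.

```python
def metadata_features(profile: dict) -> dict:
    """Derive simple metadata signals from a raw profile record.
    Field names are illustrative placeholders."""
    followers = profile.get("followers", 0)
    following = profile.get("following", 0)
    tweets = profile.get("tweet_count", 0)
    retweets = profile.get("retweet_count", 0)
    age_days = max(profile.get("account_age_days", 1), 1)
    return {
        # max(..., 1) avoids division by zero for brand-new accounts.
        "follower_following_ratio": followers / max(following, 1),
        "retweet_ratio": retweets / max(tweets, 1),
        "tweets_per_day": tweets / age_days,
    }
```

Ratios like these are cheap to compute per account and complement the heavier RoBERTa and R-GCN embeddings.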
Design and Integration of Models
We suggest a hybrid Transformer-GNN architecture that combines the advantages of R-GCN for graphs and RoBERTa for text in order to overcome the shortcomings of current systems. A manifold learning layer is used to fuse the extracted features. CatBoost, which was selected for its performance and ability to handle heterogeneous data, is used for classification.
- This hybrid model is applied to Kafka streaming data.
- Explainability is provided by SHAP, which indicates which features led to a bot prediction.
Fig 3.4.4 X Bot Detection with Hybrid Transformer-GNN
Fig 3.5.5 Model Evaluation
EXPERIMENTAL ANALYSIS
A collection of multimodal features taken from actual social media datasets, such as text content, user metadata, and network activity, were used to assess the suggested bot detection framework. To evaluate the accuracy and efficacy of several models from the literature in identifying bots, we implemented RoBERTa, CatBoost, R-GCN, and the suggested Transformer-GNN hybrid model. Using libraries like Scikit-learn, Transformers, CatBoost, NetworkX, and SHAP for explainability, the experiment was set up in a Python environment. The models were trained and tested using datasets like TwiBot-22, Cresci-15, and Cresci-17. Standard performance metrics, such as accuracy, confusion matrix, and feature importance, were used for evaluation.
Confusion Matrix Analysis
The confusion matrices shown below present the classification performance of our models on the test set.
Figure 4.1.1: Confusion Matrix without Normalization
Confusion matrix, without normalization:
[[13  0  0]
 [ 0 10  6]
 [ 0  0  0]]
Figure 4.1.2: Normalized Confusion Matrix
Normalized confusion matrix:
[[1.   0.   0.  ]
 [0.   0.62 0.38]
 [0.   0.   1.  ]]
These visualizations make it easier to see how many genuine users and bots were correctly or incorrectly classified. The normalized matrix in Figure 4.1.2 provides a more transparent cross-class comparison.
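Row normalization divides each row of the raw confusion matrix by that true class's total, so each row sums to 1. A minimal sketch (one defensible convention, used here, leaves all-zero rows at zero rather than assigning them a value):

```python
import numpy as np

def normalized_confusion(cm):
    """Row-normalize a confusion matrix so each true class sums to 1.
    Rows with no samples (classes absent from the test set) stay zero."""
    cm = np.asarray(cm, dtype=float)
    row_sums = cm.sum(axis=1, keepdims=True)
    return np.divide(cm, row_sums, out=np.zeros_like(cm),
                     where=row_sums != 0)

# Illustrative raw counts with one class absent from the test set.
raw = np.array([[13, 0, 0],
                [0, 10, 6],
                [0, 0, 0]])
norm = normalized_confusion(raw)
```

For the middle row, 10/16 = 0.625 and 6/16 = 0.375, matching the rounded 0.62/0.38 entries shown in the normalized matrix.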
Model Performance Comparison
Each model was evaluated based on overall accuracy. The Transformer-GNN hybrid outperformed the others, demonstrating its ability to learn both content and connection patterns effectively.
Figure 4.2.1: Accuracy Comparison of Different Models

Model           | Accuracy (%)
----------------|-------------
RoBERTa         | 88.2
CatBoost        | 89.1
R-GCN           | 90.5
Transformer-GNN | 93.4
Feature Importance Analysis
For interpretability, SHAP (SHapley Additive exPlanations) was used to identify which features contributed the most to bot classification in the CatBoost model.
Figure 4.3.1: SHAP Feature Importance Plot
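In practice the shap library computes these attributions efficiently; to make the idea concrete without that dependency, here is an exact brute-force Shapley computation for a tiny model. The replace-absent-features-with-background convention mirrors the interventional value function SHAP is built on; the example model is illustrative only.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, background):
    """Exact Shapley attributions for model f at point x.
    Features absent from a coalition are set to `background`."""
    d = len(x)
    phi = [0.0] * d

    def v(subset):
        # Value of a coalition: evaluate f with only `subset` present.
        z = [x[i] if i in subset else background[i] for i in range(d)]
        return f(z)

    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            for S in combinations(others, size):
                S = set(S)
                # Classic Shapley weight |S|! (d-|S|-1)! / d!
                w = factorial(len(S)) * factorial(d - len(S) - 1) / factorial(d)
                phi[i] += w * (v(S | {i}) - v(S))
    return phi
```

For a linear model f(z) = 2*z0 + 3*z1 explained at x = [1, 2] against background [0, 0], the attributions come out to exactly 2 and 6, i.e. coefficient times the feature's deviation from background, which is what a SHAP summary plot aggregates over many points.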
Feature Set Comparison
To analyze the impact of different types of features, we compared three scenarios:
- Behavioral features only (e.g., tweet frequency, login times)
- Content features only (e.g., text embeddings)
- Combined features (behavioral, content, and graph signals together)

Feature Set     | Accuracy (%)
----------------|-------------
Behavioral Only | 84.3
Content Only    | 85.1
Combined        | 93.4

Figure 4.4.1: Performance of Different Feature Sets
Discussion of Findings
The Transformer-GNN hybrid achieved the highest accuracy (93.4%), demonstrating the strength of combining graph structure with text semantics.
Explainability tools like SHAP clarified which features influenced predictions, increasing trust and transparency.
Using combined (multimodal) features consistently outperformed using content or behavior alone.
Challenges included model training time, data imbalance, and handling noise in real-time data streams.
Fig 4.5.1: Summary of Findings in Bot Detection Study
CONCLUSION
Using a multimodal approach that combines textual content, user behavior, and graph-based interactions, this study presents a reliable and explainable framework for identifying malicious bots on social media. Drawing on benchmark datasets such as TwiBot-22, Cresci-15, and Cresci-17, and advanced algorithms including RoBERTa, CatBoost, R-GCN, and a hybrid Transformer-GNN architecture, the study shows that combining multiple sources of data greatly increases detection accuracy. The proposed system ensures high performance and transparency by utilizing Apache Kafka for real-time streaming and SHAP for model interpretability.
The end-to-end framework, implemented in Python with open-source libraries and covering everything from real-time data ingestion to explainable prediction, provides an accurate and scalable solution to online disinformation and account manipulation. The Transformer-GNN hybrid model achieved the highest accuracy at 93.4%, while SHAP-based insights provided transparency into feature contributions. Future research can investigate unsupervised detection of previously unseen bot behaviors, extend detection to platforms other than X, and improve streaming performance for even quicker bot response mechanisms.
REFERENCES
[1] O. Arranz-Escudero, L. Quijano-Sanchez, and F. Liberatore, "Enhancing misinformation countermeasures: a multimodal approach to X bot detection," Social Network Analysis and Mining, vol. 15, no. 26, 2025. DOI: 10.1007/s13278-025-01435-w
[2] D. Javed, N. Z. Jhanjhi, N. A. Khan, S. K. Ray, A. Al-Dhaqm, and V. R. Kebande, "Identification of spambots and fake followers on social network via interpretable AI-based machine learning," IEEE Access, vol. 13, pp. 52246-52261, Mar. 2025. DOI: 10.1109/ACCESS.2025.3551993
[3] B. Rodi, "Social media bot detection research: review of literature," arXiv preprint, Mar. 2025. [Online]. Available: https://arxiv.org/abs/2503.22838
[4] M. Cheng, Y. Xiao, T. Huang, C. Lei, and C. Zhang, "CB-MTE: Social bot detection via multi-source heterogeneous feature fusion," Sensors, vol. 25, no. 11, p. 3549, Jun. 2025. DOI: 10.3390/s25113549
