DOI: 10.17577/IJERTCONV14IS010083 - Open Access

- Authors: Mrs. Jayashree J, Mr. Rohit Durgappa Kattimani, Ms. Khushal Bahubali Patil, Mr. Tejang Chintamani Dandekar, Mr. Rahul Suresh Nair
- Paper ID: IJERTCONV14IS010083
- Volume & Issue: Volume 14, Issue 01, Techprints 9.0
- Published (First Online): 01-03-2026
- ISSN (Online): 2278-0181
- Publisher Name: IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
Detection of Malicious Social Media Bots Using a Multimodal Explainable Framework
Mrs. Jayashree J
Assistant Professor, Department of MCA, AJ Institute of Engineering and Technology, Mangalore, India

Mr. Rohit Durgappa Kattimani
Department of MCA, AJ Institute of Engineering and Technology, Mangalore, India

Ms. Khushal Bahubali Patil
Department of MCA, AJ Institute of Engineering and Technology, Mangalore, India

Mr. Tejang Chintamani Dandekar
Department of MCA, AJ Institute of Engineering and Technology, Mangalore, India

Mr. Rahul Suresh Nair
Department of MCA, AJ Institute of Engineering and Technology, Mangalore, India
Abstract – Social media platforms are increasingly misused by fake or automated accounts known as bots. These bots spread misinformation, manipulate online visibility, and impersonate real users, posing serious threats to digital trust and platform safety. Traditional bot detection techniques often rely on single-source data (such as user behavior or posts) and use complex, non-explainable models, making them ineffective across multilingual and cross-platform scenarios. This research aims to enhance bot detection by reviewing recent advancements and proposing a multimodal, explainable framework. We explore models that utilize diverse data such as account metadata, text content, and user interactions. Notable tools include RoBERTa, CatBoost, and R-GCN, alongside explainable AI methods such as SHAP and LIME that provide transparency in model predictions. Furthermore, we propose an improved architecture that integrates Apache Kafka for real-time stream processing with a Transformer-GNN hybrid model for enhanced feature learning. Our comparative analysis shows that combining behavioral, content, and graph features leads to higher detection accuracy than using individual sources alone. SHAP analysis further identifies the key features influencing model decisions. The results demonstrate that integrating real-time data, explainability, and multimodal learning can produce robust, interpretable, and scalable bot detection systems. Our approach is not limited to X (formerly Twitter) but is adaptable to other platforms as well. This study supports the development of safer and more trustworthy AI-driven solutions for combating malicious social media bots.
Keywords: Social media bots, Bot detection, SHAP, RoBERTa, CatBoost, R-GCN, GNN, Apache Kafka, Explainable AI, Online safety.
INTRODUCTION
In today's world, social media platforms like X (formerly Twitter), Instagram, and Telegram are widely used for sharing information, connecting with others, and holding public conversations. But these platforms are also misused by automated accounts called bots or spam bots. These bots can post, follow, like, comment, or even chat with users without any human controlling them. Bots are dangerous because they can spread fake news, create false trends, sway public opinion, or trick people into thinking they are real users. They also send spam messages, harmful links, or unwanted content. As these bots become smarter, they behave more like real people, which makes them hard to detect with older methods. In the past, detection relied on simple rules or basic account details such as follower counts or posting frequency. Later, Machine Learning (ML) and Deep Learning (DL) models were used to analyze user behavior and connections. But many of these methods focus on only one type of data (such as text or network connections), and they do not explain how they make decisions. That makes it hard to catch bots that operate across platforms, try to hide, or use many languages. Bots are hard to catch because they keep changing how they act: they learn to look more human and avoid detection. So we need smarter systems that use different kinds of data and can explain how they find bots.
Impact of Bot Activity on the Real World
Real-life incidents show how harmful bots can be. These automated accounts have influenced elections, spread health misinformation, and tricked social media users into scams. Below are some examples that highlight why bot detection systems are urgently needed:
- COVID-19 Vaccine Myths: Bots shared false health information during the pandemic, leading to fear and confusion about vaccination.
- Celebrity Scam Bots: Fake profiles of Elon Musk and other celebrities promoted Bitcoin scams, tricking people into sending money.
- Election Misinformation (USA, 2020): Bots were used to spread fake news during the U.S. presidential election, creating confusion and mistrust among voters.
- Russian Bot Interference: Thousands of fake accounts controlled by groups like the Internet Research Agency were used to influence political discussions in the U.S.
These real-world events highlight the need for advanced and reliable bot detection systems to ensure digital safety and trust.
In this research, we aim to:
- Find bots by looking at how users behave, what they post, and who they interact with.
- Compare different ML and DL models used in bot detection.
- Improve bot detection by using models that give clear explanations of how they work.
- Present a combined approach, pairing Transformers with Graph Neural Networks (GNNs) to learn from both content and connections, along with a real-time detection method that uses Apache Kafka for live data streaming.
- Make AI decisions more comprehensible and reliable by using explainability tools such as SHAP.

This research is important because it helps build better and more trustworthy systems for detecting bots, making social media safer and helping people trust what they see online.
LITERATURE REVIEW
In order to combat misinformation, the first paper offers a thorough multimodal framework for X bot detection that combines textual content analysis, graph-based network behavior, and user profile features. It draws attention to shortcomings in earlier research, including out-of-date datasets, sparse feature sets, and a lack of reproducibility. Using the TwiBot-22 dataset, the proposed feature-rich model, named "The More The Merrier (TMTM)", combines semantic and graph-based signals and surpasses earlier state-of-the-art models by 5.48% in accuracy. The paper highlights the benefit of integrating feature-based, text-based, and graph-based techniques for a robust bot detection system [1].

The second study focuses on explainable machine learning techniques for detecting spambots and fake followers on social networks. Rather than depending on black-box models, it proposes a framework that uses several explainable AI techniques, such as SHAP and LIME, to provide transparency in predictions, and it emphasizes how interpretability helps determine which characteristics drive classification decisions. Trained on the Cresci-15 and Cresci-17 datasets, the model outperforms current techniques in both performance and reliability. It highlights how important explainability is to fostering confidence in automated bot detection systems, particularly during sensitive events such as elections [2].

Based on a review of 534 research articles narrowed down to 49 key documents, the third paper surveys the state of social media bot detection techniques. It divides detection methods into several paradigms, such as hybrid approaches, machine learning, and graph analysis, and covers challenges like bot concealment, methodological errors in previous research, and the development of advanced bot strategies. It draws attention to how bots are used in public discourse distortion, political manipulation, and the dissemination of false information, and it concludes by highlighting key research gaps and the need for flexible, open, multi-method bot detection systems [3].

Finally, to increase accuracy and reliability, the fourth paper presents CB-MTE, a bot detection framework that combines text, graph, and metadata features. By fusing semantic embeddings (via DistilBERT), behavioral portraits, and structural patterns from graph embeddings, it overcomes the drawbacks of single-source approaches. A manifold learning step ensures effective feature fusion, and the final classification is done with CatBoost. On the TwiBot-22 dataset, the method performs noticeably better than both conventional and newer models, particularly in identifying coordinated or dynamically changing bot behaviors. The work emphasizes adaptability across fields like politics, entertainment, and medicine [4].
PROPOSED METHODOLOGY
The methodology used in this study aims to overcome the drawbacks noted in earlier studies and provide a useful framework that integrates real-time streaming detection, multimodal data inputs, and model interpretability. Data collection, preprocessing, feature extraction, model integration, and evaluation are the five primary steps in the process.
Data Collection
The benchmark datasets TwiBot-20, TwiBot-22, Cresci-15, and Cresci-17, which include labeled examples of genuine users and bots from X, are the main source of data for our study. These datasets enable thorough multimodal analysis and include user profiles, tweet content, and user interaction graphs. Apache Kafka was used to stream social media data in real time, simulating the deployment of bot detection systems in real-world streaming scenarios.
Fig 3.1.1 Multimodal X Bot Detection
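The Kafka ingestion step above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the topic name, broker address, and event field names are assumptions, and the consumer loop assumes the kafka-python package is installed and a broker is running.

```python
import json

def parse_status(raw_bytes):
    """Decode one streamed status event into the fields the detector
    consumes. All field names here are illustrative placeholders."""
    event = json.loads(raw_bytes.decode("utf-8"))
    return {
        "user_id": event.get("user_id"),
        "text": event.get("text", ""),
        "followers": event.get("followers", 0),
        "following": event.get("following", 0),
    }

def consume_forever(bootstrap="localhost:9092", topic="social-stream"):
    """Blocking consumer loop (not executed here; requires a live
    Kafka broker and the kafka-python package)."""
    from kafka import KafkaConsumer
    consumer = KafkaConsumer(topic, bootstrap_servers=bootstrap)
    for msg in consumer:
        # Each parsed status would be handed to the detection model.
        yield parse_status(msg.value)
```

Decoupling `parse_status` from the consumer loop lets the same parsing logic be reused for batch datasets such as TwiBot-22 and for the live stream.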
Preprocessing Data
A preprocessing workflow designed to handle the different data modalities is applied to the gathered data. This includes:
- Text cleaning (removing mentions, hashtags, and URLs)
- Metadata formatting (such as normalizing follower counts)
- Graph construction (building networks of user interactions)
- Support for multiple languages through tokenizers compatible with transformer-based models (such as RoBERTa)
Fig 3.2.2 X Data Preprocessing
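The text-cleaning step could look like the following minimal sketch; the exact regular expressions are assumptions, and the cleaned text would then be passed to a multilingual transformer tokenizer.

```python
import re

# Patterns for the three artifact types removed during cleaning.
MENTION = re.compile(r"@\w+")
HASHTAG = re.compile(r"#\w+")
URL = re.compile(r"https?://\S+")

def clean_tweet(text: str) -> str:
    """Strip URLs, @mentions, and #hashtags, then collapse whitespace."""
    for pattern in (URL, MENTION, HASHTAG):
        text = pattern.sub(" ", text)
    return " ".join(text.split())

# Example: clean_tweet("Check https://t.co/abc @bob #spam now")
# leaves only the plain words "Check now".
```

Removing URLs before mentions and hashtags avoids mangling links that happen to contain `#` fragments.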
Extraction of Features
We extract three types of features:
- Textual features: semantic embeddings from the transformer-based RoBERTa model.
- Graph-based features: extracted with Relational Graph Convolutional Networks (R-GCN).
- Metadata features: tweet timing, follower/following ratio, retweet ratio, and related account statistics.
Fig 3.3.3 X Data Preprocessing

Model Evaluation
Standard metrics such as accuracy, precision, recall, F1-score, and ROC-AUC are used to assess the models. Additionally, we evaluate model generalization across platforms and compare interpretability using SHAP values. Kafka's real-time performance keeps latency low enough for practical use.
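The metadata features listed above could be derived as in this sketch. The input field names are illustrative, not the datasets' actual schema, and the guards against division by zero are a design choice rather than something the paper specifies.

```python
def metadata_features(profile: dict) -> dict:
    """Derive simple metadata signals from a raw profile record.
    Field names are illustrative placeholders."""
    followers = profile.get("followers", 0)
    following = profile.get("following", 0)
    tweets = profile.get("tweet_count", 0)
    retweets = profile.get("retweet_count", 0)
    age_days = max(profile.get("account_age_days", 1), 1)
    return {
        # max(..., 1) avoids division by zero for brand-new accounts.
        "follower_following_ratio": followers / max(following, 1),
        "retweet_ratio": retweets / max(tweets, 1),
        "tweets_per_day": tweets / age_days,
    }
```

Ratios like these are cheap to compute per account and complement the heavier RoBERTa and R-GCN embeddings.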
Design and Integration of Models
We suggest a hybrid Transformer-GNN architecture that combines the advantages of R-GCN for graphs and RoBERTa for text in order to overcome the shortcomings of current systems. A manifold learning layer is used to fuse the extracted features. CatBoost, which was selected for its performance and ability to handle heterogeneous data, is used for classification.
- This hybrid model is applied to Kafka streaming data.
- Explainability is provided by SHAP, which indicates which features led to a bot prediction.
Fig 3.4.4 X Bot Detection with Hybrid Transformer-GNN
Fig 3.5.5 Model Evaluation
EXPERIMENTAL ANALYSIS
A collection of multimodal features taken from actual social media datasets, such as text content, user metadata, and network activity, were used to assess the suggested bot detection framework. To evaluate the accuracy and efficacy of several models from the literature in identifying bots, we implemented RoBERTa, CatBoost, R-GCN, and the suggested Transformer-GNN hybrid model. Using libraries like Scikit-learn, Transformers, CatBoost, NetworkX, and SHAP for explainability, the experiment was set up in a Python environment. The models were trained and tested using datasets like TwiBot-22, Cresci-15, and Cresci-17. Standard performance metrics, such as accuracy, confusion matrix, and feature importance, were used for evaluation.
Confusion Matrix Analysis
The confusion matrices shown below present the classification performance of our models on the test set.
Figure 4.1.1: Confusion Matrix without Normalization
Confusion matrix, without normalization:
[[13  0  0]
 [ 0 10  6]
 [ 0  0  0]]
Figure 4.1.2: Normalized Confusion Matrix
Normalized confusion matrix:
[[1.   0.   0.  ]
 [0.   0.62 0.38]
 [0.   0.   1.  ]]
These visualizations make it easier to see how many genuine users and bots were correctly or incorrectly classified. The normalized matrix in Figure 4.1.2 provides a more transparent cross-class comparison.
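Row normalization divides each row of the raw confusion matrix by that true class's total, so each row sums to 1. A minimal sketch (one defensible convention, used here, leaves all-zero rows at zero rather than assigning them a value):

```python
import numpy as np

def normalized_confusion(cm):
    """Row-normalize a confusion matrix so each true class sums to 1.
    Rows with no samples (classes absent from the test set) stay zero."""
    cm = np.asarray(cm, dtype=float)
    row_sums = cm.sum(axis=1, keepdims=True)
    return np.divide(cm, row_sums, out=np.zeros_like(cm),
                     where=row_sums != 0)

# Illustrative raw counts with one class absent from the test set.
raw = np.array([[13, 0, 0],
                [0, 10, 6],
                [0, 0, 0]])
norm = normalized_confusion(raw)
```

For the middle row, 10/16 = 0.625 and 6/16 = 0.375, matching the rounded 0.62/0.38 entries shown in the normalized matrix.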
Model Performance Comparison
Each model was evaluated based on overall accuracy. The Transformer-GNN hybrid outperformed the others, demonstrating its ability to learn both content and connection patterns effectively.
Figure 4.2.1: Accuracy Comparison of Different Models

Model           | Accuracy (%)
----------------|-------------
RoBERTa         | 88.2
CatBoost        | 89.1
R-GCN           | 90.5
Transformer-GNN | 93.4
Feature Importance Analysis
For interpretability, SHAP (SHapley Additive exPlanations) was used to identify which features contributed the most to bot classification in the CatBoost model.
Figure 4.3.1: SHAP Feature Importance Plot
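In practice the shap library computes these attributions efficiently; to make the idea concrete without that dependency, here is an exact brute-force Shapley computation for a tiny model. The replace-absent-features-with-background convention mirrors the interventional value function SHAP is built on; the example model is illustrative only.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, background):
    """Exact Shapley attributions for model f at point x.
    Features absent from a coalition are set to `background`."""
    d = len(x)
    phi = [0.0] * d

    def v(subset):
        # Value of a coalition: evaluate f with only `subset` present.
        z = [x[i] if i in subset else background[i] for i in range(d)]
        return f(z)

    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            for S in combinations(others, size):
                S = set(S)
                # Classic Shapley weight |S|! (d-|S|-1)! / d!
                w = factorial(len(S)) * factorial(d - len(S) - 1) / factorial(d)
                phi[i] += w * (v(S | {i}) - v(S))
    return phi
```

For a linear model f(z) = 2*z0 + 3*z1 explained at x = [1, 2] against background [0, 0], the attributions come out to exactly 2 and 6, i.e. coefficient times the feature's deviation from background, which is what a SHAP summary plot aggregates over many points.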
Feature Set Comparison
To analyze the impact of different types of features, we compared three scenarios:
- Behavioral features only (e.g., tweet frequency, login times)
- Content features only (e.g., text embeddings)
- Combined features (behavioral, content, and graph signals together)

Feature Set     | Accuracy (%)
----------------|-------------
Behavioral Only | 84.3
Content Only    | 85.1
Combined        | 93.4

Figure 4.4.1: Performance of Different Feature Sets
Discussion of Findings
The Transformer-GNN hybrid achieved the highest accuracy (93.4%), demonstrating the strength of combining graph structure with text semantics.
Explainability tools like SHAP clarified which features influenced predictions, increasing trust and transparency.
Using combined (multimodal) features consistently outperformed using content or behavior alone.
Challenges included model training time, data imbalance, and handling noise in real-time data streams.
Fig 4.5.1: Summary of Findings in Bot Detection Study
CONCLUSION
Using a multimodal approach that combines textual content, user behavior, and graph-based interactions, this study presents a reliable and explainable framework for identifying malicious bots on social media. Drawing on benchmark datasets such as TwiBot-22, Cresci-15, and Cresci-17, and advanced algorithms including RoBERTa, CatBoost, R-GCN, and a hybrid Transformer-GNN architecture, the study shows that combining multiple sources of data greatly increases detection accuracy. The proposed system ensures high performance and transparency by utilizing Apache Kafka for real-time streaming and SHAP for model interpretability.
The end-to-end framework, implemented in Python with open-source libraries and covering everything from real-time data ingestion to explainable prediction, provides an accurate and scalable solution to online disinformation and account manipulation. The Transformer-GNN hybrid model achieved the highest accuracy at 93.4%, while SHAP-based insights provided transparency into feature contributions. Future research can investigate unsupervised detection of previously unseen bot behaviors, extend detection to platforms other than X, and improve streaming performance for even quicker bot response mechanisms.
REFERENCES
[1] O. Arranz-Escudero, L. Quijano-Sanchez, and F. Liberatore, "Enhancing misinformation countermeasures: a multimodal approach to X bot detection," Social Network Analysis and Mining, vol. 15, no. 26, 2025. DOI: 10.1007/s13278-025-01435-w
[2] D. Javed, N. Z. Jhanjhi, N. A. Khan, S. K. Ray, A. Al-Dhaqm, and V. R. Kebande, "Identification of spambots and fake followers on social network via interpretable AI-based machine learning," IEEE Access, vol. 13, pp. 52246-52261, Mar. 2025. DOI: 10.1109/ACCESS.2025.3551993
[3] B. Rodi, "Social media bot detection research: review of literature," arXiv preprint, Mar. 2025. [Online]. Available: https://arxiv.org/abs/2503.22838
[4] M. Cheng, Y. Xiao, T. Huang, C. Lei, and C. Zhang, "CB-MTE: Social bot detection via multi-source heterogeneous feature fusion," Sensors, vol. 25, no. 11, p. 3549, Jun. 2025. DOI: 10.3390/s25113549
