Fake News Detection In Social Media

DOI : 10.17577/IJERTV12IS060029


Preetham H, Prithviraj T Chavan, Pranav R and Prathik Vittal
Department of Computer Science and Engineering
B.M.S. College of Engineering, Bengaluru, Karnataka, India

Mr. Vikranth B M
Department of Computer Science and Engineering
Visveswaraya Technological University, Belgaum
Bengaluru, Karnataka, India

Abstract: In this study, we propose an ensemble classifier that combines LSTM, Naive Bayes, random forest, and SVM algorithms to effectively detect fake news in social media. By analyzing key features extracted from article titles and content, such as click bait and misleading language, the system assigns a binary score to evaluate credibility. Experimental results demonstrate superior performance in terms of accuracy, precision, and recall compared to individual classifiers. Our research offers a practical tool for individuals, social media platforms, and news organizations to combat fake news, promoting trustworthy information dissemination and fostering an informed online community with integrity in public discourse.

Index Terms: Fake News detection, Word2Vec, Tokenisation, Lemmatization, Stemming, LSTM, Naive Bayes, Random Forest, SVM, Click bait, Computational efficiency, Real-time, Keypoints.

  1. INTRODUCTION

    The rise of social media has revolutionized the way we access and consume news. With a substantial increase in the number of active social media users, platforms like Facebook, Twitter, and Instagram have become popular sources for news content. This shift in news consumption patterns has brought both benefits and challenges. While social media offers convenience and accessibility, it has also given rise to the proliferation of fake news or deliberately misleading information. As a result, ensuring the accuracy and credibility of news articles shared on social media has become a critical concern.

    In countries like India, the percentage of active social media users has witnessed a significant surge, indicating a growing reliance on these platforms for news consumption. However, the standards and quality of news articles on social media often fall short compared to traditional news organizations. The ease of publishing and sharing news online, coupled with the speed of dissemination through social media, has paved the way for the dissemination of fake news, driven by motives such as financial gain or political manipulation.

    The detrimental effects of fake news on the public and the overall news ecosystem necessitate the development of efficient methods to identify and combat this issue. Social media platforms pose unique challenges for fake news detection, rendering traditional detection algorithms ineffective or irrelevant. Hence, there is a pressing need for innovative approaches that leverage artificial intelligence, natural language processing, and machine learning techniques to discern between genuine and fake news articles.

    In this paper, we present a comprehensive system that addresses the challenge of detecting fake news on social media platforms. Our system utilizes a hybrid approach, combining concepts from natural language processing and neural networks, to perform binary classification of news articles. By analyzing various linguistic features and employing different natural language processing models, we aim to accurately determine the credibility of news content. Additionally, our system enables users to classify news articles as fake or real and provides checks for the authenticity of the sources or websites publishing the news.

    The primary objective of this research is to improve the accuracy and reliability of fake news detection on social media platforms. By empowering users to make informed decisions about the credibility of news articles, we aim to mitigate the negative impact of fake news on public discourse and ensure a more trustworthy and informed online community.

  2. PROBLEM STATEMENT

    The proliferation of fake news in social media poses a significant threat to the integrity of information and the trustworthiness of news sources. The lack of reliable tools to detect and counter fake news leads to the spread of misinformation, which can have severe consequences, including the erosion of democracy, public health, and social stability. Therefore, there is a need for a system that can accurately detect and classify fake news. Through this project, we propose to work on different probabilistic and machine learning approaches to make a hybrid fake news detection technique.

  3. LITERATURE SURVEY

    In [1], the authors discuss various deep learning models, datasets, evaluation metrics, and future directions in the field. They emphasize the importance of combating the spread of fake news and highlight the applicability of deep learning techniques such as CNNs, RNNs, and Transformers. The review provides valuable insights into the current state-of-the-art methods and serves as a valuable resource for researchers and practitioners in the field of fake news detection.

    In [2], the study reviews existing research in the field and presents a novel approach using a CNN-LSTM architecture. The authors discuss the significance of stance detection in combating fake news and highlight the limitations of traditional methods. They present an in-depth analysis of the proposed CNN-LSTM model and its effectiveness in accurately classifying the stance of news articles. This literature survey provides insights into the advancements in deep learning for fake news detection and contributes to the development of more robust detection systems.

    In [3], the study provides a comprehensive literature survey of existing research in the field, highlighting the importance of fake news detection and the limitations of traditional methods. The authors propose the OPCNN-FAKE model, which leverages CNN architecture and optimization techniques to improve the accuracy of fake news classification. The survey sheds light on the advancements in CNN-based approaches for fake news detection and contributes to the ongoing efforts in building effective and efficient detection systems.

    In [4], the study provides an overview of the current research landscape, emphasizing the significance of fake news detection and the challenges posed by social media platforms. The authors propose a novel approach that combines linguistic analysis and knowledge-based techniques to improve the accuracy of fake news detection. The survey highlights the importance of integrating multiple methodologies and knowledge sources in combating fake news and contributes to the development of effective detection mechanisms.

    In [5], the study reviews existing research in the field, highlighting the challenges of fake news detection and the importance of considering multiple modalities. The authors propose a novel network architecture that incorporates cross-modal attention mechanisms to capture interdependencies between textual and visual information. The survey provides insights into the advancements in multi-modal approaches for fake news detection and contributes to the development of more accurate and robust detection systems.

    In [6], the study provides an extensive overview of existing research, categorizing and analyzing various approaches used for fake news classification. The authors cover a wide range of techniques, including linguistic analysis, deep learning, machine learning, and ensemble methods. They discuss the strengths and limitations of each approach, along with implementation aspects. This survey serves as a valuable resource for researchers and practitioners, providing insights into the diverse landscape of fake news classification techniques and offering guidance for future research in the field.

    In [7], the study explores the roles of different stakeholders, including governments, social media platforms, fact-checking organizations, and individuals, in addressing the issue of fake news. The authors analyze various strategies and technologies employed to detect and mitigate the spread of fake news, such as machine learning algorithms, natural language processing techniques, and user feedback systems. This survey serves as a valuable resource, offering insights into the multifaceted nature of fake news and providing a foundation for developing effective strategies and solutions to counter its impact.

    In [8], the study focuses on the unique challenges posed by the Arabic language and reviews existing research in the field. The author introduces JointBert, a novel approach that combines joint modeling and BERT-based language representation to enhance the accuracy of Arabic fake news detection. The survey highlights the advancements in Arabic fake news detection techniques and contributes to the development of specialized methods for addressing the specific characteristics of the Arabic language in combating the spread of misinformation.

    In [9], the study explores the existing research in the field and reviews various ensemble techniques employed for detecting and combating fake news. The authors discuss the advantages of ensemble methods in improving the accuracy and robustness of fake news detection models. The survey serves as a valuable resource, highlighting the advancements in machine learning ensemble techniques for fake news detection and offering insights into potential solutions for addressing the challenges posed by misinformation.

    In [10], the study examines the existing research landscape and surveys the techniques and methodologies employed to identify fake news spreaders, including cyborgs, bots, and humans. The authors discuss various features, data sources, and machine learning algorithms utilized for detecting and distinguishing between different types of fake news spreaders. The survey provides insights into the advancements in identifying and understanding the behavior of individuals involved in spreading fake news, contributing to the development of effective countermeasures against the propagation of misinformation.

    In [11], the study explores existing research in the field and reviews various techniques and methodologies used for fake news detection. The authors discuss the importance of feature selection and optimization in improving the performance of fake news detection systems. The survey provides insights into the advancements in feature-based classification approaches and their effectiveness in accurately identifying and categorizing fake news. This research serves as a valuable resource, guiding the development of optimized classification methods for combatting the spread of misinformation.

    In [12], the study explores existing research in the field and reviews different approaches and methodologies employed for detecting deceptive reviews. The authors discuss the importance of utilizing deception theories in developing a unified detection model for fake online reviews. The survey provides insights into the advancements in identifying and distinguishing between genuine and fake reviews, contributing to the development of more accurate and robust detection methods to combat the prevalence of deceptive practices in online platforms.

    In [13], the study explores existing research in the field and reviews various techniques and methodologies utilized for fake news detection. The authors highlight the importance of incorporating multi-view attention mechanisms to capture diverse information sources and improve the accuracy of fake news detection models. The survey provides insights into the advancements in utilizing attention networks for analyzing social media content and identifying misleading or fabricated information. This research contributes to the development of effective approaches for combating the spread of fake news on social media platforms.

    In [14], the study explores existing research in the field and reviews different techniques and methodologies employed for detecting and verifying the authenticity of media content. The authors emphasize the significance of combining NLP techniques with blockchain technology to enhance the accuracy and transparency of fake media detection systems. The survey provides insights into the advancements in leveraging NLP and blockchain for combating the spread of misinformation through manipulated media. This research contributes to the development of robust and trustworthy methods for identifying and preventing the dissemination of fake media.

  4. METHODOLOGY

    This section outlines the methodology for building a fake news detection system using LSTM, SVM, Naive Bayes, and Random Forest classifiers. The proposed system follows a series of steps, including data collection, pre-processing, feature extraction, model training, ensemble classifier creation, evaluation, refinement, and deployment.

    1. Data Collection

      To develop an effective fake news detection system, a diverse and representative dataset of news articles is essential. The dataset should encompass various topics and sources, including both real and fake news articles. Collecting such a dataset involves leveraging existing publicly available datasets, web scraping, and manual annotation. The size of the dataset should be sufficiently large to ensure robust model training and evaluation.

    2. Pre-Processing

      The collected dataset requires pre-processing to remove irrelevant information and convert the raw text data into a suitable format for subsequent analysis. This involves several steps, such as removing URLs, hashtags, and emoticons, tokenizing the text into individual words, converting words to lowercase, removing stop words, applying stemming or lemmatization to reduce word variations, and eliminating special characters or punctuation marks. The pre-processed text data serves as the input for feature extraction and model training.

      Pseudocode for pre-processing:

      1. Remove irrelevant information (e.g., URLs, hashtags, emoticons)

      2. Tokenize the text data into individual words

      3. Convert words to lowercase and remove stop words

      4. Apply stemming or lemmatization to reduce word variations

      5. Remove any remaining special characters or punctuation marks

      6. Return the preprocessed data
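The six steps above can be sketched in Python using only the standard library. This is a minimal illustration, not the system's actual code: the stop-word list, the regular expressions, and the `naive_stem` helper are all assumptions made for the example, and lowercasing and character cleanup are applied before tokenization for convenience. A real pipeline would more likely rely on an established stemmer or lemmatizer.

```python
import re

# Tiny illustrative stop-word list (an assumption; a real system would use a fuller list).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "in", "and", "this", "that"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def naive_stem(word):
    """Strip a few common English suffixes (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Apply the six pre-processing steps and return a list of tokens."""
    text = URL_RE.sub(" ", text)          # step 1: drop URLs
    text = HASHTAG_RE.sub(" ", text)      # step 1: drop hashtags
    text = text.lower()                   # step 3: lowercase
    text = NON_ALPHA_RE.sub(" ", text)    # steps 1 and 5: drop emoticons, digits, punctuation
    tokens = text.split()                 # step 2: tokenize on whitespace
    tokens = [t for t in tokens if t not in STOP_WORDS]  # step 3: remove stop words
    return [naive_stem(t) for t in tokens]               # step 4: stem


print(preprocess("BREAKING: Aliens landed in London!!! :) #shocking https://example.com/x"))
```

The suffix stripper is deliberately crude and will mangle some words, which is one reason a lemmatizer may be preferable when the vocabulary matters for later feature extraction.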

    3. Feature Extraction

      Feature extraction plays a crucial role in capturing the essence of the pre-processed text data. In this step, relevant features are extracted to represent the news articles and serve as input for the machine learning models. Various techniques can be employed, such as continuous bag-of-words, skip-grams, or word embeddings. These techniques aim to capture the semantic information and contextual relationships between words in the news articles.

      Pseudocode for feature extraction (using Word2Vec):

      1. Initialize a Word2Vec model with specified parameters

      2. Train the Word2Vec model on the preprocessed text data

      3. Obtain word embeddings for each word in the data

      4. Average the word embeddings to obtain feature vectors for each news article

      5. Return the feature vectors
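Steps 3 to 5 of the pseudocode above, looking up word embeddings and averaging them into one vector per article, can be sketched as follows. Training the Word2Vec model itself (steps 1 and 2) is not shown; `TOY_EMBEDDINGS` is a hypothetical three-dimensional lookup table standing in for a trained model, and `article_vector` is a name chosen for this illustration.

```python
# Toy 3-dimensional embedding table standing in for a trained Word2Vec model
# (hypothetical values; real embeddings typically have 100 or more dimensions).
TOY_EMBEDDINGS = {
    "alien": [1.0, 0.0, 2.0],
    "land": [3.0, 2.0, 0.0],
}


def article_vector(tokens, embeddings, dim):
    """Average the vectors of known tokens; return a zero vector if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Out-of-vocabulary tokens such as "unknown" are skipped rather than counted.
print(article_vector(["alien", "land", "unknown"], TOY_EMBEDDINGS, 3))
```

Averaging discards word order, which is acceptable for classifiers that expect a fixed-length vector; a sequence model such as an LSTM would more naturally consume the per-word embeddings directly.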

    4. Model Training

      Once the feature vectors are obtained, individual LSTM, SVM, Naive Bayes, and Random Forest models are trained on the dataset. Model training involves partitioning the dataset into training and validation sets, tuning the hyperparameters of each model, and optimizing their performance. The training process seeks to leverage the extracted features to learn the patterns and characteristics of real and fake news articles.
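The partitioning step mentioned above can be illustrated with a small stratified split, which keeps the ratio of real to fake articles roughly equal in both sets. This is a sketch under assumptions: the paper does not state the split ratio or whether stratification was used, so the `validation_fraction` default of 0.2, the fixed `seed`, and the function name are all illustrative.

```python
import random
from collections import defaultdict


def stratified_split(features, labels, validation_fraction=0.2, seed=0):
    """Split data into (train, validation) pairs, preserving the label ratio."""
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, val_idx = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        # Hold out at least one example per class for validation.
        n_val = max(1, round(len(indices) * validation_fraction))
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])

    def pick(idx):
        return [features[i] for i in idx], [labels[i] for i in idx]

    return pick(train_idx), pick(val_idx)


# Ten placeholder "feature vectors" (plain integers here), five per class.
(train_x, train_y), (val_x, val_y) = stratified_split(list(range(10)), [0] * 5 + [1] * 5)
print(len(train_x), len(val_x))
```

The same validation set can then be reused to tune each model's hyperparameters and to compare the four classifiers on equal footing.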

    5. Ensemble Classifier Creation

      To enhance the performance and robustness of the fake news detection system, an ensemble classifier is created by combining the predictions from the individual models. This integration of multiple classifiers leverages their diverse capabilities and helps mitigate individual model biases. The ensemble classifier can employ techniques such as majority voting or weighted averaging to aggregate the predictions and make the final determination of whether a news article is real or fake.
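The two aggregation techniques named above can be sketched as follows. The function names, the label convention (1 = fake, 0 = real), and the 0.5 threshold are assumptions for this illustration; the inputs are taken to be the per-model outputs for a single article.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def weighted_average(fake_probabilities, weights, threshold=0.5):
    """Average per-model P(fake) using the weights, then threshold (1 = fake)."""
    total = sum(weights)
    score = sum(p * w for p, w in zip(fake_probabilities, weights)) / total
    return (1 if score >= threshold else 0), score


# Hard labels from four models, in a fixed order such as LSTM, NB, RF, SVM.
print(majority_vote([1, 0, 1, 1]))
# Soft outputs from the same four models, first with equal weights.
print(weighted_average([0.9, 0.4, 0.6, 0.2], [1, 1, 1, 1]))
```

With an even number of models, hard voting can tie, which is one practical argument for weighted averaging of probabilities when the individual classifiers can produce them.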

    6. Evaluation

      The performance of the ensemble classifier is evaluated using appropriate evaluation metrics, such as accuracy, precision, recall, and F1 score. The evaluation process involves assessing the system's ability to correctly classify real and fake news articles. To ensure reliable evaluation, the

  5. PROPOSED SYSTEM

    In this paper, we propose a comprehensive system for detecting fake news in social media using a hybrid ensemble classifier approach. By leveraging the strengths of multiple machine learning algorithms, we aim to improve the accuracy and robustness of fake news detection while mitigating the risk of overfitting.

    The proposed system consists of several key components. Firstly, we preprocess the social media data, eliminating irrelevant information and formatting the text data for machine learning algorithms. We employ Word2Vec algorithms to extract relevant features from the text data, which are crucial for accurate fake news detection.

    To achieve accurate classification, our system utilizes four distinct machine learning algorithms: Long Short-Term Memory (LSTM), Naive Bayes, Random Forest, and Support Vector Machine (SVM). Each classifier takes a different approach: Naive Bayes uses probabilistic modeling, Random Forest aggregates an ensemble of decision trees, SVM fits a maximum-margin decision boundary, and our LSTM network learns sequential patterns and features common to fake news titles.

    The output of each classifier is combined using a weighted voting scheme, where the weights are determined based on individual classifier performance on a validation set. This ensemble classifier approach ensures a robust and accurate final classification result.
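One way the weighting described above could work is to set each classifier's weight to its validation accuracy, normalized so the weights sum to one. The paper does not specify the exact formula, so this is an assumed scheme; the model names and predictions below are made-up placeholders, not results.

```python
def accuracy(predictions, truth):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)


def validation_weights(validation_predictions, truth):
    """Map each model name to its validation accuracy, normalized to sum to 1."""
    raw = {name: accuracy(preds, truth) for name, preds in validation_predictions.items()}
    total = sum(raw.values())
    if total == 0:  # every model was wrong everywhere: fall back to equal weights
        return {name: 1 / len(raw) for name in raw}
    return {name: value / total for name, value in raw.items()}


def weighted_vote(votes, weights):
    """Return 1 (fake) if the total weight behind 'fake' votes exceeds 'real'."""
    fake_weight = sum(weights[name] for name, vote in votes.items() if vote == 1)
    real_weight = sum(weights[name] for name, vote in votes.items() if vote == 0)
    return 1 if fake_weight > real_weight else 0


# Placeholder validation labels and per-model predictions (1 = fake, 0 = real).
truth = [1, 0, 1, 0]
val_preds = {
    "lstm": [1, 0, 1, 0],
    "nb": [1, 1, 1, 0],
    "rf": [0, 0, 1, 0],
    "svm": [0, 1, 1, 0],
}
weights = validation_weights(val_preds, truth)
print(weighted_vote({"lstm": 0, "nb": 1, "rf": 0, "svm": 1}, weights))
```

In this toy case the four models split two against two, and the higher-accuracy pair decides the outcome, which is the behavior a weighted scheme is meant to provide over a plain majority vote.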

    Our proposed system not only offers improved performance compared to single classifiers but also addresses the challenge of fake news detection in social media, where conventional techniques often fall short. By combining the strengths of different algorithms, we aim to provide a reliable tool for identifying and combatting the spread of fake news, thereby contributing to the integrity of information and promoting trust in social media platforms.

    To evaluate the effectiveness of our proposed system, extensive experiments will be conducted on real-world datasets, comparing its performance against state-of-the-art methods. The results will demonstrate the superiority of our hybrid ensemble classifier approach in detecting fake news, paving the way for more effective measures to combat misinformation and promote information credibility in the era of social media.

  6. PROSPECTIVE IMPLEMENTATION PLAN

    The prospective implementation plan for the project involves a systematic approach to developing a robust fake news detection system. The plan encompasses various stages, starting with the collection of a diverse and representative dataset of news articles. This dataset will serve as the foundation for training and evaluating the models.

    Once the dataset is obtained, the next step is to preprocess the data, removing irrelevant information and converting the text into a format suitable for analysis. Techniques such as text cleaning, tokenization, and stemming will be applied to ensure the quality and consistency of the data.

    The project will involve training multiple models, including Long Short-Term Memory (LSTM), Support Vector Machine (SVM), Naive Bayes, and Random Forest. Each model will be trained using the extracted features and fine-tuned to optimize its performance. The training process will involve splitting the dataset into training and validation sets, evaluating the models' performance using appropriate metrics, and iteratively refining the models to achieve the best possible accuracy and generalization.

    To enhance the predictive power and reliability of the system, an ensemble classifier will be constructed by combining the predictions of the individual models. Techniques such as majority voting or weighted averaging will be employed to make the final classification decision.

    Following the model development phase, the plan entails the implementation of a real-time deployment platform. This platform will enable the detection of fake news in social media as it is encountered, providing timely and accurate results to users. The platform will be designed for scalability, efficiency, and ease of use.

    Throughout the implementation process, rigorous testing and evaluation will be conducted to ensure the effectiveness and robustness of the developed system. Performance metrics such as accuracy, precision, recall, and F1 score will be used to assess the system's performance against a benchmark dataset.
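The four metrics named above follow from the counts of true and false positives and negatives. The sketch below treats "fake" (label 1) as the positive class, which is an assumption; the function name and the sample labels are illustrative and do not represent results from this work.

```python
def classification_metrics(predictions, truth, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(predictions, truth))
    tp = sum(p == positive and t == positive for p, t in pairs)
    fp = sum(p == positive and t != positive for p, t in pairs)
    fn = sum(p != positive and t == positive for p, t in pairs)
    tn = sum(p != positive and t != positive for p, t in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative labels only: 3 fake and 5 real articles.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
predictions = [1, 1, 0, 1, 0, 0, 0, 0]
print(classification_metrics(predictions, truth))
```

Reporting precision and recall alongside accuracy matters when the benchmark is imbalanced, since a classifier that labels everything as real can still score high accuracy while missing every fake article.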

    Additionally, documentation of the implementation process, including the methodologies, algorithms, and code, will be prepared to ensure reproducibility and facilitate future enhancements or extensions of the system.

    Effective communication and collaboration among team members will be essential throughout the implementation process. Regular meetings, progress updates, and feedback sessions will be conducted to ensure the project stays on track and meets the specified objectives.

  7. CONCLUSIONS

In conclusion, the prevalence of fake news in today's society and its rapid dissemination through social media platforms highlight the urgent need for effective detection methods. This survey paper explores the use of natural language processing (NLP) techniques and machine learning algorithms to tackle the problem of fake news detection in social media. We propose an ensemble classifier that combines the strengths of LSTM, Naive Bayes, Random Forest, and SVM algorithms to accurately identify fake news in social media feeds. The availability of public datasets, such as those found on platforms like Kaggle, facilitates the training of models. The process involves tokenization, data purification, data preprocessing, and text encoding. Through a combination of supervised and unsupervised learning approaches, each piece of text in the corpus is classified using a binary score representing the credibility of the article. This comprehensive survey provides insights into the state-of-the-art techniques used in fake news detection and offers valuable guidance for future research in this field.

ACKNOWLEDGMENT

We would like to thank Mr. Vikranth for his valuable comments and suggestions, which improved the quality of the paper, and for helping us review our work regularly. We would also like to thank the Department of Computer Science and Engineering, B.M.S. College of Engineering, for giving us the opportunity and encouragement to write this paper.

REFERENCES

[1] M. F. Mridha, A. J. Keya, M. A. Hamid, M. M. Monowar and M. S. Rahman, "A Comprehensive Review on Fake News Detection With Deep Learning," in IEEE Access, vol. 9, pp. 156151-156170, 2021, doi: 10.1109/ACCESS.2021.3129329.

[2] M. Umer, Z. Imtiaz, S. Ullah, A. Mehmood, G. S. Choi and B. -W. On, "Fake News Stance Detection Using Deep Learning Architecture (CNN-LSTM)," in IEEE Access, vol. 8, pp. 156695-156706, 2020, doi: 10.1109/ACCESS.2020.3019735.

[3] H. Saleh, A. Alharbi and S. H. Alsamhi, "OPCNN-FAKE: Optimized Convolutional Neural Network for Fake News Detection," in IEEE Access, vol. 9, pp. 129471-129489, 2021, doi: 10.1109/ACCESS.2021.3112806.

[4] N. Seddari, A. Derhab, M. Belaoued, W. Halboob, J. Al-Muhtadi and A. Bouras, "A Hybrid Linguistic and Knowledge-Based Analysis Approach for Fake News Detection on Social Media," in IEEE Access, vol. 10, pp. 62097-62109, 2022, doi: 10.1109/ACCESS.2022.3181184.

[5] L. Ying, H. Yu, J. Wang, Y. Ji and S. Qian, "Multi-Level Multi-Modal Cross-Attention Network for Fake News Detection," in IEEE Access, vol. 9, pp. 132363-132373, 2021, doi: 10.1109/ACCESS.2021.3114093.

[6] D. Rohera et al., "A Taxonomy of Fake News Classification Techniques: Survey and Implementation Aspects," in IEEE Access, vol. 10, pp. 30367-30394, 2022, doi: 10.1109/ACCESS.2022.3159651.

[7] A. Gupta et al., "Combating Fake News: Stakeholder Interventions and Potential Solutions," in IEEE Access, vol. 10, pp. 78268-78289, 2022, doi: 10.1109/ACCESS.2022.3193670.

[8] W. Shishah, "JointBert for Detecting Arabic Fake News," in IEEE Access, vol. 10, pp. 71951-71960, 2022, doi: 10.1109/ACCESS.2022.3185083.

[9] I. Ahmad, Md. Yousaf, S. Yousaf and Md. Ovais Ahmad, "Fake News Detection Using Machine Learning Ensemble Methods," in Elsevier Journal, vol. 3, pp. 70882-70901, 2022, doi: 10.1155/2020/8885861.

[10] W. Shahid, Y. Li, D. Staples, G. Amin, S. Hakak and A. Ghorbani, "Are You a Cyborg, Bot or Human? A Survey on Detecting Fake News Spreaders," in IEEE Access, vol. 10, pp. 27069-27083, 2022, doi: 10.1109/ACCESS.2022.3157724.

[11] Ravish, R. Katarya, D. Dahiya and S. Checker, "Fake News Detection System Using Featured-Based Optimized MSVM Classification," in IEEE Access, vol. 10, pp. 113184-113199, 2022, doi: 10.1109/ACCESS.2022.3216892.

[12] M. Abdulqader, A. Namoun and Y. Alsaawy, "Fake Online Reviews: A Unified Detection Model Using Deception Theories," in IEEE Access, vol. 10, pp. 128622-128655, 2022, doi: 10.1109/ACCESS.2022.3227631.

[13] S. Ni, J. Li and H. -Y. Kao, "MVAN: Multi-View Attention Networks for Fake News Detection on Social Media," in IEEE Access, vol. 9, pp. 106907-106917, 2021, doi: 10.1109/ACCESS.2021.3100245.

[14] Z. Shahbazi and Y. -C. Byun, "Fake Media Detection Based on Natural Language Processing and Blockchain Approaches," in IEEE Access, vol. 9, pp. 128442-128453, 2021, doi: 10.1109/ACCESS.2021.3112607.