
Fake Review Detection System

DOI: 10.17577/IJERTCONV14IS010094

Sonika. L

Student, St Joseph Engineering College, Mangaluru

Nishmitha. J

Assistant Professor, St Joseph Engineering College, Mangaluru

Abstract – With the growth of e-commerce platforms, online reviews of businesses have become a vital factor shaping consumer decisions and corporate practices. However, the rise of fraudulent or deceptive reviews written to sway public opinion and gain unfair advantage has raised serious concerns about integrity and reliability. In this article, we present a system for detecting fake reviews using natural language processing (NLP) techniques. After tokenization, stopword removal, and lemmatization, the system converts each review into a vector using TF-IDF (term frequency-inverse document frequency). An artificial neural network (ANN) then analyzes these vectors to determine whether a review is genuine or fraudulent. The system runs as a web application with a robust backend that uses MySQL for managing user data, and Flask with TensorFlow for its machine learning functions. Users can register, access the platform, and receive live classification feedback. The detection model is trained on real-world data, including Yelp reviews. Experimental results show that combining linguistic and behavioral features improves the precision of identifying fraudulent reviews. By automating the identification of manipulated content, this approach helps businesses maintain a strong reputation and boost consumer confidence, elevating the overall quality and credibility of user-generated content and giving e-commerce platforms an effective, adaptive solution to review fraud.

Keywords: Fake Review Detection, Natural Language Processing (NLP), Artificial Neural Network (ANN), TF-IDF, Machine Learning, E-commerce, Review Fraud, Text Classification, Flask, MySQL, TensorFlow, User-Generated Content, Review Authenticity, Behavioral Analysis, Scalable Web Application

I. INTRODUCTION

In the current digital economy, e-commerce platforms have transformed the way customers evaluate and purchase products and services. Online reviews are central to this change: they function as digital word of mouth and serve as an important measure of product quality, service reliability, and overall customer satisfaction. Customers generally treat these user-generated opinions as being as trustworthy as personal recommendations and rely on them when making purchasing decisions. Reviews not only help customers choose products but also help businesses enhance their marketing strategies, refine their products, and strengthen their relationships with their customers. However, a serious and growing issue created by the rise of online review platforms is the widespread posting of false or misleading reviews.

These false reviews, often created by marketers sponsored by rivals or by automated systems, are designed to manipulate public perception. They can unfairly harm competitors' reputations or inflate the perceived legitimacy of products. Research shows that a significant share of the reviews posted on sites such as Yelp, Amazon, and TripAdvisor are deceptive, undermining trust across the e-commerce landscape. To address this problem, this research proposes a machine learning based web tool that combines reviewer behavior analysis with natural language processing (NLP).

To clean and normalize review text, the system applies standard text preprocessing methods such as lowercasing, tokenization, and stopword removal. The text is then converted into a numerical feature vector using term frequency-inverse document frequency (TF-IDF), which measures the importance of terms across documents. An artificial neural network (ANN) classifier then distinguishes genuine from fake reviews based on these features. This approach stands out because it considers behavioral metadata, such as review length, posting frequency, and review timing, alongside textual features; this allows the model to more effectively pinpoint subtle patterns linked to fraudulent activity. Built with Flask as the backend, the design handles user authentication and data storage through MySQL and integrates TensorFlow into a web application capable of real-time prediction. The model was trained on real data from platforms such as Yelp and shows a significant improvement in classification accuracy when text analysis is combined with behavioral features. By validating reviews automatically, e-commerce platforms can strengthen the reliability of user-generated content, rebuild consumer trust, and reduce the impact of fraud through a trustworthy, scalable, and effective solution.

  1. Literature Survey

    The literature highlights the drawbacks of previous systems, including low accuracy, high complexity, and limited use of reviewer behavioral variables, while emphasizing the growing impact of fake reviews on e-commerce. Traditional methods relied primarily on simple lexical similarity or sentiment analysis to detect deceptive reviews. Recent research applies machine learning algorithms such as KNN, Naive Bayes, Logistic Regression, Random Forest, SVM, XGBoost, and AdaBoost. In many cases, reviewer behavior (review length, posting frequency, temporal patterns), TF-IDF vectors, sentiment scores, and content validation are used as features. Some models achieve more than 97% accuracy, and deep linguistic features (e.g., LIWC, POS tags, subjectivity scores) have been shown to significantly improve classification performance. The literature also discusses the difficulty of building balanced, realistic datasets and considers deep learning models (CNN, LSTM).

    The Support Vector Machine (SVM) remains a reliable classifier for sentiment analysis and fake review detection and continues to attract research attention. Acknowledging the growing concern about fraudulent or deceptive reviews that affect product reputation, the authors of this study highlight the important role online reviews play in consumer decisions. According to the literature, prior work divides features into two main groups: textual and behavioral. Textual features analyze the linguistic content of reviews using methods such as sentiment analysis, n-grams, and bag-of-words, while behavioral features focus on reviewer behaviors such as writing style, rating patterns, and emotional cues. SVM, Naive Bayes, KNN, and ensemble decision approaches are among the supervised machine learning classifiers used in earlier publications, most of which initially focused exclusively on content. Nevertheless, subsequent research has shown that incorporating behavioral data significantly improves detection accuracy. The authors identify a gap in the literature: previous research rarely goes deep into extracting specific behavioral traits from reviews. Their article fills this gap by engineering new behavioral features and demonstrating their beneficial effect on fake review detection performance when combined with traditional textual features.

    The results highlight how multifaceted features and machine learning engineering can separate honest reviews from fraudulent ones. Review authenticity verification has become an important research theme, as fake opinions influence customer trust and company reputation. To detect deceptive content in online reviews, the surveyed research explores various machine learning methods and sentiment analysis methodologies. Elmurngi and Gherbi (2018) use supervised classifiers, particularly SVM, Naive Bayes, KNN, KStar, and decision trees, to classify sentiment at the document level on movie review datasets, finding SVM to be the most accurate. Goel et al. (2021) combine classifiers such as Logistic Regression, Random Forest, and SVM with review-centric features (unigrams, bigrams, trigrams, etc.) and reviewer-centric features (e.g., reviewer identifiers, rating deviation) to improve fake review classification; SVM and deep learning models achieve excellent results. Kurkute et al. (2020) highlight the advantages of SVM over purely text-based methods, provide a detailed evaluation of detection models, and outline future multimodal directions, again emphasizing the use of SVM to predict fake reviews. Each study underscores the importance of careful feature engineering covering both textual and behavioral data, and shows that models such as SVM regularly achieve high accuracy in distinguishing authentic from fraudulent reviews. Collectively, these studies demonstrate the importance of machine learning, especially supervised methods, in increasing the accuracy of fake review detection systems.

  2. PROPOSED METHOD

    1. DATA COLLECTION

      The algorithm was trained on a dataset of labeled product reviews classified as either authentic or fraudulent. A uniform preprocessing procedure was applied to each review to guarantee consistency and reduce noise. First, all text was converted to lowercase to remove case sensitivity. Regular expressions were then used to eliminate non-alphabetic characters, such as punctuation and numbers. Next, using the NLTK stopword list, common English stopwords like "the," "is," and "and" were removed, since they carry little semantic relevance for classification. Finally, the WordNet lemmatizer was applied to reduce words to their root or base form.

      By preserving only the most significant and distinctive tokens, this preprocessing improves the quality of the features extracted for model training.

      2. FEATURE EXTRACTION USING TF-IDF

      Following preprocessing, Term Frequency-Inverse Document Frequency (TF-IDF) vectorization was employed to transform the cleaned textual data into numerical form. By giving greater weight to terms that are uncommon or distinctive in the dataset and decreasing the influence of frequently appearing words with lower discriminatory power, TF-IDF captures the relevance of words within individual reviews. This transformation enables the model to better distinguish between real and fraudulent reviews based on linguistic patterns. The same feature representation is preserved across the training and deployment stages by serializing the trained TF-IDF vectorizer with Python's pickle module and reloading it in the deployed application for real-time input transformation.
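A minimal sketch of this step with scikit-learn's `TfidfVectorizer` on a toy corpus; the paper serializes the fitted vectorizer to a `.pkl` file, which is shown here in-memory with `pickle.dumps`/`pickle.loads`:

```python
import pickle
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "great product fast shipping",     # genuine-style review
    "best best best product buy now",  # spam-style repetition
]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(corpus)  # reviews -> sparse TF-IDF matrix

# Serialize at training time (the paper writes this to a .pkl file) ...
blob = pickle.dumps(vectorizer)

# ... and reload in the deployed app so new input shares the same vocabulary.
loaded = pickle.loads(blob)
vec = loaded.transform(["great product"])
print(vec.shape)  # (1, size of the training vocabulary)
```

Reusing the pickled vectorizer is what guarantees that a review submitted at prediction time is mapped into exactly the feature space the ANN was trained on.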

      3. MODEL ARCHITECTURE

        The classifier is an Artificial Neural Network (ANN) built with TensorFlow and Keras. The model begins with an input layer whose dimensionality matches the TF-IDF vectors produced from the preprocessed text. One or more hidden layers with ReLU (Rectified Linear Unit) activation functions then learn intricate, nonlinear patterns in the dataset. The final layer uses a sigmoid activation function to classify a review as original (OR, genuine) or computer-generated (CG, fake). The model is trained on the labeled dataset and assessed using key performance measures such as recall, accuracy, and precision.
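A minimal Keras sketch of the described architecture; the layer widths and the TF-IDF dimensionality are assumptions, since the paper does not state them:

```python
import tensorflow as tf

NUM_FEATURES = 5000  # dimensionality of the TF-IDF vectors (assumed)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(NUM_FEATURES,)),
    tf.keras.layers.Dense(128, activation="relu"),   # hidden layer 1
    tf.keras.layers.Dense(64, activation="relu"),    # hidden layer 2
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(review is genuine)
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",  # binary OR-vs-CG objective
              metrics=["accuracy"])
model.summary()
```

The single sigmoid unit outputs a probability that is thresholded (typically at 0.5) into the OR/CG decision described later in the paper.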

      4. WEB APPLICATION DEVELOPMENT

        The Flask framework was used to create an interactive and user-friendly web application for the fake review detection system. The application includes a secure user registration and login system; user credentials are safeguarded by hashing passwords with SHA-256 before storing them in the database. After authenticating, users are given a straightforward interface to submit product reviews for examination. A submitted review passes through the same preprocessing workflow as during model training and is then vectorized using the preloaded TF-IDF model. The resulting feature vector is fed to the trained Artificial Neural Network (ANN), which produces a classification. The system then provides the user with real-time feedback by displaying the result, indicating whether the review is "CG (Computer Generated, Fake)" or "OR (Original, Genuine)."
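The prediction flow might look roughly like the following Flask sketch; the route name, the placeholder `classify_probability` function, and the simplified cleaning step are all illustrative stand-ins for the paper's actual preprocessing, TF-IDF, and ANN artifacts:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def classify_probability(cleaned_text: str) -> float:
    # Placeholder for the real pipeline: tfidf.transform(...) followed by
    # the ANN's sigmoid output. Always returns "genuine" in this sketch.
    return 0.9

@app.route("/predict", methods=["POST"])
def predict():
    review = request.form.get("review", "")
    cleaned = review.lower()  # stands in for the full preprocessing pipeline
    prob = classify_probability(cleaned)
    # Threshold the sigmoid probability into the paper's two labels
    label = ("OR (Original Generated Genuine)" if prob >= 0.5
             else "CG (Computer Generated Fake)")
    return jsonify({"label": label, "probability": prob})

# Exercise the route with Flask's built-in test client
client = app.test_client()
resp = client.post("/predict", data={"review": "Great product, works well!"})
print(resp.get_json())
```

In the deployed system this endpoint would sit behind the login check, so only authenticated users can reach it.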

      5. DATABASE INTEGRATION

      Within the system, a MySQL relational database handles user administration and authentication. The user table of the database schema, described in the fake-review.sql file, stores the user ID, username, email, and passwords securely hashed with the SHA-256 algorithm. This approach ensures that all private information is shielded from unauthorized access. By preserving user session data and enabling secure access management, the database supports functions such as duplicate registration prevention, login persistence, and logout handling. Integration with the Flask web framework provides smooth communication between the front end and the backend database, promoting a dependable and secure user experience.
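A self-contained sketch of the described user table and SHA-256 password handling, using sqlite3 here so the example runs without a MySQL server; the column names and the `register` helper are assumptions, not the actual schema from fake-review.sql:

```python
import hashlib
import sqlite3

# In-memory stand-in for the MySQL users table described in the paper
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id       INTEGER PRIMARY KEY,
        username TEXT UNIQUE NOT NULL,  -- UNIQUE blocks duplicate registration
        email    TEXT NOT NULL,
        pwd_hash TEXT NOT NULL          -- SHA-256 hex digest, never plaintext
    )
""")

def register(username: str, email: str, password: str) -> bool:
    """Store a new user; reject duplicates via the UNIQUE constraint."""
    pwd_hash = hashlib.sha256(password.encode()).hexdigest()
    try:
        conn.execute(
            "INSERT INTO users (username, email, pwd_hash) VALUES (?, ?, ?)",
            (username, email, pwd_hash),
        )
        return True
    except sqlite3.IntegrityError:
        return False

print(register("alice", "alice@example.com", "s3cret"))  # True
print(register("alice", "alice@example.com", "other"))   # False (duplicate)
```

Hashing at registration time means the plaintext password never touches the database; a login check recomputes the SHA-256 digest of the submitted password and compares it to the stored hash.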

    2. ARCHITECTURE DIAGRAM

      The architecture diagram for fake review detection illustrates a systematic way to identify misleading reviews using machine learning and natural language processing (NLP). A dataset containing varied user reviews undergoes several preprocessing phases: collecting pertinent data points, tokenization that divides text into words or tokens, lemmatization or stemming to reduce words to their base or root form, and feature extraction that converts textual data into numerical formats suited to a machine learning model (for example, TF-IDF or word embeddings). The processed dataset is labeled to distinguish genuine from fraudulent reviews; this may involve manual annotation or the use of pre-labeled data.

      The data is then divided into training and testing subsets, allowing prediction models to be built and evaluated. When analyzing and classifying fresh, unseen reviews, the trained model produces detection results that classify each review as authentic or fraudulent. This architecture provides a structured pipeline that automates fake review detection and increases the reliability of online review systems.
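The train/test split step can be sketched with scikit-learn; the 80/20 ratio and the toy reviews are assumptions:

```python
from sklearn.model_selection import train_test_split

reviews = ["great item", "buy buy buy now", "works as described", "best best best"]
labels  = [1, 0, 1, 0]  # 1 = genuine (OR), 0 = fake (CG)

# Hold out 20% of the labeled reviews for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    reviews, labels, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))
```

Fixing `random_state` makes the split reproducible, which matters when comparing candidate models on the same held-out reviews.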

      The pipeline can be further refined by the choice of machine learning model (such as logistic regression, SVM, or deep learning methods such as LSTM and BERT) and by the inclusion of performance assessments covering precision, recall, and the F1 score.

      FIG 1. ARCHITECTURE DIAGRAM

    3. DATA PREPROCESSING

      Careful preprocessing of the review text was completed to guarantee high-quality, consistent input data for training. First, all text was converted to lowercase to remove case-based inconsistencies. Regular expressions were used to strip non-alphabetic symbols such as punctuation, numbers, and special characters, reducing noise. The text was then separated into individual words using tokenization, and common English stopwords with little semantic content, such as "the," "is," and "and," were removed using the NLTK stopword list. The WordNet lemmatizer was then applied to reduce each word to its dictionary or base form (e.g., "running" becomes "run"), ensuring that different forms of the same word are processed uniformly. A Term Frequency-Inverse Document Frequency (TF-IDF) vectorizer was then used to transform the cleaned text into numerical vectors. By assigning each word a weight based on its frequency within a review relative to its frequency across the whole dataset, this method highlights significant terms and lessens the influence of words that appear too frequently.

      Following preprocessing and vectorization, the data was used to train the artificial neural network (ANN) model. This same preprocessing pipeline was also incorporated into the deployed web application to ensure that reviews submitted by real users are treated identically.

    4. MACHINE LEARNING MODELS FOR PREDICTIVE ANALYSIS

    An artificial neural network (ANN), a supervised machine learning model, is the foundation of the predictive engine. It was trained to distinguish between real and fake online reviews. The model uses deep learning techniques to find complex patterns in textual data. Input features are extracted from a dataset annotated with genuine and fraudulent reviews using TF-IDF (term frequency-inverse document frequency) vectorization.

    This transformation allows the model to capture the relative importance of words across reviews and concentrate on significant portions of the text. The ANN's input layer matches the dimensionality of the TF-IDF features. One or more hidden layers with ReLU activation functions are followed by a final sigmoid activation function for binary classification at the output layer. Whether a review was written by a computer or by a user is determined by the probability derived from the sigmoid output. The model is trained with binary cross-entropy as the loss function and a gradient-based adaptive optimizer to achieve effective convergence. Once incorporated into the web platform, this model performs real-time prediction.

    Every time a user submits a review, the system performs the same preprocessing and vectorization steps before passing the input to the artificial neural network (ANN) for prediction. Users then receive an interpretable label indicating whether the review is likely authentic or fake. The model's performance was evaluated comprehensively in terms of KPIs such as recall, accuracy, and precision. Because ANNs can identify nonlinear relationships in data, the system classifies free-form, unstructured review text well, drawing on both linguistic and behavioral cues.

  3. RESULT

    A combination of qualitative analysis of deployment behavior and quantitative performance indicators was used to validate the effectiveness of the proposed fake review detection system. The labeled data was preprocessed and vectorized using TF-IDF, and the artificial neural network (ANN) was then trained to distinguish between real and fake reviews. The model was trained over several epochs with carefully tuned hyperparameters, yielding consistent performance throughout the training phase. With 94% training accuracy and 92% validation accuracy, the ANN shows a strong fit and good generalization ability. Prediction quality was assessed using several classification metrics: the model achieved 92% accuracy, 91% precision, 90% recall, and a 90.5% F1 score on the test set. These results show that the model distinguishes the fake and authentic classes accurately and in a balanced way. This was also confirmed by the confusion matrix, which showed that the model correctly predicted most samples in both groups, with a low false positive rate (genuine reviews mistakenly flagged as fake) and a low false negative rate (fake reviews mistakenly marked as authentic).
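The metric types reported above can be computed with scikit-learn as follows; the labels here are toy values for illustration, not the paper's actual predictions:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # 1 = genuine (OR), 0 = fake (CG)
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0]  # toy model outputs

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))  # rows = true class, cols = predicted
```

The off-diagonal cells of the confusion matrix are exactly the false positives and false negatives discussed in the text, so a model with a balanced error profile shows small values in both.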

    This means that, in addition to identifying misleading content, the model can be trusted to preserve authentic comments, which is important for maintaining user trust on a real platform. The accuracy and loss for the training and validation sets across all epochs are presented as curves; these plots demonstrate the stability of the ANN, indicating that training was generally stable and that training and validation accuracy did not diverge significantly. The model was integrated into a Flask-based interface that enabled users to obtain predictions instantly during web application testing. Input data was preprocessed, vectorized, and classified in real time, with a typical response time under 1 second, indicating the model's applicability in real-world settings. Predictions were presented with interpretable labels, "CG (Computer Generated, Fake)" or "OR (Original, Genuine)", allowing end users to understand the results. System safety and integrity were also evaluated: user authentication is handled by a secure login system backed by a MySQL database, with SHA-256 hashing for stored passwords, and the prediction interface is available only to authenticated users, ensuring limited access and data integrity.

    FIG 2. CONFUSION MATRIX

  4. DISCUSSION

This study shows how a combination of machine learning methods and natural language processing (NLP) can effectively detect deceptive information in online reviews. Using TF-IDF feature extraction and an artificial neural network (ANN) for classification, the system was able to accurately separate genuine reviews from fraudulent ones. Beyond technical performance, the solution is implemented as a secure web application with user authentication and data protection handled through a MySQL database, providing confidentiality. During experimental testing, the model delivered solid results in terms of accuracy, precision, and recall, demonstrating its ability to reliably detect both fake and genuine reviews.

The output is simple and clear, with easy-to-understand labels such as "CG" or "OR" that make the results transparent to users. Nevertheless, the quality and distribution of the training data play a critical role in overall system performance. Because the current model focuses primarily on textual features, fake reviews that carefully mimic genuine writing styles can evade detection. Furthermore, although the system performs well on the tested dataset, language differences across sectors such as healthcare, finance, and hospitality can reduce its accuracy. To adapt the model for wider use, it must be fine-tuned or retrained on larger and more diverse datasets. Future improvements could enhance classification by incorporating behavioral context, including reviewer history, IP addresses, and timestamps. Overall, this study shows that a combination of deep learning, secure web technologies, and natural language processing (NLP) provides a reliable and forward-thinking solution to the growing problem of online disinformation. The modular architecture not only facilitates future expansion but also offers flexibility in real-world deployments across various e-commerce platforms.

V. CONCLUSION

This study successfully illustrates how to develop a system that combines TF-IDF vectorization, natural language processing (NLP), and artificial neural network (ANN) deep learning to identify fraudulent reviews. Using a Flask web application with MySQL-backed authentication, the system can consistently tell the difference between computer-generated ("CG", fake) and original ("OR", genuine) reviews, and its clear, user-friendly interface and output improve usability and interpretability. Functional testing confirms that the system makes accurate predictions in real time and consistently presents results in an understandable form. The platform's modular, evolving architecture can be integrated into real commerce systems and content moderation workflows. To improve further, future work will focus on adding auxiliary features such as reviewer metadata (e.g., timestamps, profiles, IP logs) to improve detection of subtle and complex fake reviews. Expanding multilingual support and fine-tuning models for specific domains (e.g., healthcare, finance) would broaden applicability. Furthermore, adopting more advanced architectures, such as LSTM- and transformer-based models (such as BERT), could improve contextual understanding, alongside real-time API integration, dashboard reporting, and regular retraining.

VI. REFERENCES

  1. Elmogy, Ahmed M., et al. "Fake reviews detection using supervised machine learning." International Journal of Advanced Computer Science and Applications 12.1 (2021).

  2. Ennaouri, Mohammed, and Ahmed Zellou. "Machine learning approaches for fake reviews detection: A systematic literature review." Journal of Web Engineering 22.5 (2023): 821-848.

  3. Alsubari, S. Nagi, et al. "Data analytics for the identification of fake reviews using supervised learning." Computers, Materials & Continua 70.2 (2022): 3189-3204.

  4. Le, Huy, and Ben Kim. "Detection of fake reviews on social media using machine learning algorithms." Issues in Information Systems 21.1 (2020): 185-194.

  5. Elmurngi, Elshrif, and Abdelouahed Gherbi. "Fake reviews detection on movie reviews through sentiment analysis using supervised learning techniques." International Journal on Advances in Systems and Measurements 11.1 (2018): 196-207.

  6. Gadewar, Amol, et al. "Online Fake Review Detection Based on Machine Learning Techniques." (2023).