AI-Based Plagiarism Detection System

Faizan Shaikh; Shakila Siddavatam

doi:10.17577/IJERTCONV14IS020037

NCRTCS - 2026 (Volume 14 – Issue 02)

Open Access
Authors : Faizan Shaikh, Shakila Siddavatam
Paper ID : IJERTCONV14IS020037
Volume & Issue : Volume 14, Issue 02, NCRTCS – 2026
Published (First Online) : 21-04-2026
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License

Faizan Shaikh

Master's Student, Department of Computer Science

Abeda Inamdar Senior College, Pune, India

Shakila Siddavatam

Head of Department of Computer Science

Abeda Inamdar Senior College, Pune, India

Abstract – Plagiarism has become a serious concern in academic, research, and digital content environments due to the rapid growth of online information and the ease with which content can be reused. Most existing plagiarism detection tools rely on traditional techniques such as keyword matching and surface-level text comparison. While these methods are effective in detecting direct copying, they often fail to identify deeper forms of plagiarism where the original meaning remains unchanged despite modified wording. This limitation reduces the reliability of plagiarism evaluation and raises concerns related to academic integrity. This study presents an AI-based plagiarism detection system that focuses on analyzing the meaning of text rather than matching exact words. A design and development research approach was followed to build a system capable of comparing text at the sentence level across multiple languages. The system represents sentences in a shared semantic space, enabling meaningful comparison of content without relying on translation-based techniques. Experimental results show that the proposed system effectively detects paraphrased and cross-language plagiarism that conventional tools are unable to identify. The system highlights similar sentences and generates measurable outputs such as plagiarism percentage and unique content percentage. Performance evaluation indicates that the system operates efficiently on standard computer hardware, making it suitable for use in academic institutions. The study concludes that meaning-based plagiarism detection provides more accurate and reliable results than traditional keyword-based approaches. Future work may include expanding the reference dataset and integrating online source comparison to further improve detection accuracy.

Keywords – Semantic Plagiarism Detection, Artificial Intelligence, Text Similarity, Multilingual Text Analysis, Sentence-Level Similarity Analysis

INTRODUCTION

In recent years, writing original academic content has become more difficult because most information is easily available online. Students and researchers regularly consult digital sources for understanding concepts, which sometimes results in ideas being reused without proper awareness of originality requirements. This has made plagiarism a frequent concern in academic work. Existing plagiarism detection tools mostly focus on matching words or sentence patterns. Such tools can identify copied text, but they struggle when the same idea is expressed in a different way. Content that is paraphrased, rearranged, or translated often passes undetected, even though the meaning remains unchanged. To handle this limitation, plagiarism detection needs to focus on understanding meaning instead of comparing words. With recent progress in artificial intelligence, it is now possible to analyze text at a semantic level. Based on this idea, the proposed work develops an AI-based plagiarism detection system that identifies semantic and cross-lingual plagiarism more reliably.

Addressing the weaknesses of conventional plagiarism detection systems requires a shift from simple text matching toward deeper interpretation of content. Recent progress in artificial intelligence has enabled techniques that examine contextual meaning and relationships between words and sentences. Such approaches make it possible to recognize similarity even when ideas are rewritten, reorganized, or expressed in another language. Inspired by these advancements, this work proposes an AI-driven plagiarism detection system that emphasizes semantic understanding and cross-lingual comparison. The objective is to deliver accurate plagiarism identification while ensuring that the system remains efficient and suitable for everyday academic use on standard computing platforms.

Problem Statement

Checking originality has become challenging because the same idea can be expressed in many ways. Writers can rewrite information after referring to different sources, which changes the wording but not the underlying meaning. In such cases, plagiarism may still exist even though the text does not appear identical. Most existing detection tools depend on word matching and basic text comparison; these techniques fail when content is paraphrased or translated into another language, so such tools often generate inaccurate and unreliable reports. Therefore, there is a need for an efficient plagiarism detection tool that performs meaning-based analysis, handles multiple languages, and is lightweight enough to run on standard computing machines.
Significance

This study is significant as it addresses the limitation of traditional plagiarism detection tools in identifying paraphrased content where the meaning remains unchanged. By focusing on meaning-based analysis rather than keyword matching, the proposed approach improves plagiarism detection accuracy and supports fair academic evaluation.

Proposed Solution

The proposed solution is an AI-based plagiarism detection system that analyzes text based on semantic meaning rather than exact word matching. The system compares sentences using meaning-level representations, allowing it to detect paraphrased and restructured content accurately. It also supports multilingual text analysis for cross-language plagiarism detection. The approach is designed to be lightweight and efficient, making it suitable for academic use on standard computer systems.

LITERATURE REVIEW

Roșu et al. proposed a plagiarism detection approach based on transformer-based deep learning models such as BERT and RoBERTa, focusing on sentence-level semantic representations to identify paraphrased plagiarism more accurately than traditional keyword-based methods [1]. Their results demonstrated improved detection performance; however, the approach required high computational resources, limiting its applicability in lightweight academic environments.

Bidgoli, Abdous, and Piroozfar introduced a hybrid method for cross-lingual semantic textual similarity using multilingual transformer models without relying on machine translation [2]. The proposed approach enabled effective similarity detection between sentences written in different languages and achieved high accuracy. Despite its effectiveness, the use of multiple transformer models increased system complexity, making real-time implementation challenging.

Another study by Vyas et al. presented an integrated semantic similarity and plagiarism checking system that combined semantic analysis, real plagiarism detection, online plagiarism detection, and textual entailment within a single framework [3]. The system improved overall plagiarism detection accuracy and provided detailed similarity analysis. However, the reliance on multiple models and features increased computational overhead, highlighting the need for a more efficient and simplified plagiarism detection solution.

2.1 Research Gap

Existing plagiarism detection studies demonstrate improved accuracy using semantic and transformer-based approaches, including cross-lingual analysis. However, most proposed systems are computationally complex and difficult to deploy in lightweight academic environments. There remains a research gap in developing an efficient, meaning-based plagiarism detection system that balances accuracy with practical usability on standard computing systems.

METHODOLOGY (DEVELOPMENT PROCESS)

Methodology

This study adopts a development-oriented research methodology to design and implement an AI-based plagiarism detection system. The proposed system focuses on identifying plagiarism based on semantic similarity rather than direct word matching. The overall process is organized into sequential stages, including text acquisition, preprocessing, semantic representation, similarity measurement, and result generation. This structured workflow ensures clarity, efficiency, and suitability for academic plagiarism detection tasks.
Input and Text Processing

The process begins with accepting textual input from the user for analysis. The input text undergoes preprocessing to eliminate unnecessary characters, redundant spacing, and formatting inconsistencies. Following preprocessing, the cleaned text is segmented into individual sentences to enable fine-grained analysis. Additionally, the language of the input text is automatically identified, allowing the system to handle multilingual content effectively.
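The cleaning and sentence-segmentation steps described above can be illustrated with a minimal sketch. The paper does not specify its preprocessing rules, so the function names (`clean_text`, `split_sentences`) and the set of sentence-ending punctuation marks are assumptions made for illustration only. Language identification is left out because it would normally rely on an external library or model.

```python
import re
import unicodedata


def clean_text(raw: str) -> str:
    """Normalize Unicode, drop control characters, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", raw)
    # Remove non-printable control characters but keep ordinary whitespace.
    text = "".join(ch for ch in text if ch.isprintable() or ch.isspace())
    # Collapse runs of spaces, tabs, and newlines into a single space.
    return re.sub(r"\s+", " ", text).strip()


# Assumed sentence terminators: Latin punctuation plus the Devanagari danda.
# Other scripts would need additional marks.
_SENTENCE_END = re.compile(r"(?<=[.!?\u0964])\s+")


def split_sentences(text: str) -> list[str]:
    """Split cleaned text into sentences on terminal punctuation."""
    parts = _SENTENCE_END.split(clean_text(text))
    return [p for p in parts if p]
```

A punctuation-based splitter of this kind breaks incorrectly on abbreviations such as "e.g.", so a production system would likely use a trained sentence segmenter instead.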
Semantic Analysis

Once preprocessing is completed, each sentence is transformed into a vector-based representation that reflects its semantic meaning. This transformation is achieved using transformer-based sentence embedding techniques. By encoding contextual information, the system can recognize similarities between sentences even when different vocabulary or sentence structures are used, which is essential for detecting paraphrased plagiarism.
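As an illustration of the embedding step, the sketch below wraps a sentence encoder behind a small function. The paper states only that a lightweight MiniLM-based model is used; the specific checkpoint name `paraphrase-multilingual-MiniLM-L12-v2`, the use of the `sentence-transformers` package, and the helper names `load_default_encoder` and `embed_sentences` are assumptions, not the authors' confirmed implementation.

```python
from typing import Protocol, Sequence


class Encoder(Protocol):
    """Anything exposing an encode() method, e.g. a SentenceTransformer."""

    def encode(self, sentences: Sequence[str], **kwargs): ...


# Assumed checkpoint: a publicly available multilingual MiniLM model.
DEFAULT_MODEL = "paraphrase-multilingual-MiniLM-L12-v2"


def load_default_encoder() -> Encoder:
    """Load the assumed model; requires the sentence-transformers package."""
    from sentence_transformers import SentenceTransformer  # imported lazily

    return SentenceTransformer(DEFAULT_MODEL)


def embed_sentences(sentences, encoder=None):
    """Return one vector (list of floats) per sentence."""
    if not sentences:
        return []
    if encoder is None:
        encoder = load_default_encoder()
    # normalize_embeddings=True asks the encoder for unit-length vectors.
    vectors = encoder.encode(list(sentences), normalize_embeddings=True)
    return [[float(x) for x in row] for row in vectors]
```

Passing the encoder as a parameter keeps the model load out of the request path and lets the function be exercised with a stub encoder when the model is not installed.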
Similarity Evaluation and Result Generation

The generated semantic vectors are compared with reference content using cosine similarity to quantify the degree of similarity between sentences. A predefined similarity threshold is used to classify content as plagiarized or original. Based on these comparisons, the system computes plagiarism and originality percentages and generates a comprehensive plagiarism report that is presented to the user.
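The comparison and scoring logic can be sketched as follows. The paper does not state its threshold value or the exact percentage formula, so the default threshold of 0.8, the definition of plagiarism percentage as the share of flagged sentences, and the function name `score_document` are assumptions chosen for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction; treat as dissimilar
    return dot / (norm_u * norm_v)


def score_document(input_vecs, reference_vecs, threshold=0.8):
    """Flag each input sentence whose best reference match meets the threshold."""
    flags = []
    for vec in input_vecs:
        best = max(
            (cosine_similarity(vec, ref) for ref in reference_vecs),
            default=0.0,
        )
        flags.append(best >= threshold)
    # Assumed definition: percentage of input sentences that were flagged.
    plagiarism_pct = 100.0 * sum(flags) / len(flags) if flags else 0.0
    return {
        "flags": flags,
        "plagiarism_pct": plagiarism_pct,
        "unique_pct": 100.0 - plagiarism_pct,
    }
```

The per-sentence flags are what a result page would use to highlight matched sentences, while the two percentages summarize the document as a whole.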
Database Methodology

The system includes a database component to support structured data management during plagiarism analysis. The database is used to store reference documents required for similarity comparison, as well as generated plagiarism reports and analysis history. This storage mechanism enables efficient retrieval of reference content and maintains result consistency across multiple analyses. A lightweight database design is adopted to minimize system overhead and ensure smooth operation in standard academic computing environments.
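A possible storage layout for reference content and reports is sketched below using Python's built-in `sqlite3` module. The paper names SQLite and Django but does not describe its schema, so the table names, column names, JSON encoding of embeddings, and helper functions are all hypothetical; a Django project would more likely express the same tables as ORM models.

```python
import json
import sqlite3

# Hypothetical schema: one table for reference sentences, one for reports.
SCHEMA = """
CREATE TABLE IF NOT EXISTS reference_sentence (
    id        INTEGER PRIMARY KEY,
    source    TEXT NOT NULL,
    sentence  TEXT NOT NULL,
    embedding TEXT NOT NULL  -- JSON-encoded list of floats
);
CREATE TABLE IF NOT EXISTS report (
    id             INTEGER PRIMARY KEY,
    created_at     TEXT DEFAULT CURRENT_TIMESTAMP,
    plagiarism_pct REAL NOT NULL,
    unique_pct     REAL NOT NULL
);
"""


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_reference(conn, source, sentence, embedding):
    """Store one reference sentence together with its embedding."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO reference_sentence (source, sentence, embedding) "
            "VALUES (?, ?, ?)",
            (source, sentence, json.dumps(embedding)),
        )


def load_references(conn):
    """Return (sentence, embedding) pairs in insertion order."""
    rows = conn.execute(
        "SELECT sentence, embedding FROM reference_sentence ORDER BY id"
    )
    return [(sentence, json.loads(blob)) for sentence, blob in rows]


def save_report(conn, plagiarism_pct, unique_pct):
    """Persist a report summary and return its row id."""
    with conn:
        cur = conn.execute(
            "INSERT INTO report (plagiarism_pct, unique_pct) VALUES (?, ?)",
            (plagiarism_pct, unique_pct),
        )
    return cur.lastrowid
```

Storing embeddings alongside the reference sentences avoids re-encoding the reference set on every check, which matters on the standard hardware the system targets.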

Fig. 1. System Architecture of the Proposed Plagiarism Detection System

4 TECHNOLOGIES USED

The proposed plagiarism detection system is developed using a combination of web technologies and machine learning techniques to support semantic text analysis and system functionality. The selected technologies facilitate text input handling, backend processing, semantic similarity computation, and data storage within a unified framework. These technologies are chosen to maintain system simplicity while enabling accurate meaning-based plagiarism detection. Table 1 presents the primary technologies employed at different levels of the proposed system.

Category	Technology Used
Frontend Interface	HTML, CSS, JavaScript
Backend Framework	Django
Programming Language	Python
Semantic Modeling	Lightweight Transformer-based Sentence Embedding Model (MiniLM-based)
Similarity Measure	Cosine Similarity
Database	SQLite

Table 1. Technology Stack for AI Plagiarism Detection

User Interface

The proposed plagiarism detection system provides a simple and user-friendly web-based interface to ensure ease of use for academic users. The interface is designed to operate efficiently on desktop and laptop systems, allowing users to perform plagiarism checks without requiring technical expertise. Emphasis is placed on simplicity, clarity, and smooth navigation so that users can easily submit content and view analysis results.
User Interface Overview

The system consists of the following interfaces:

Registration Page – Allows new users to create their account.

Login Page – Allows registered users to log in securely to access the plagiarism detection system.

Dashboard Page – Acts as the main control panel where users can navigate to different features of the system, such as text submission, file upload, history, statistics, and help.

Text / File Upload Page – Enables users to either paste text directly or upload supported files for plagiarism checking. This page collects the input data and forwards it to the backend for processing.

Result Page – Displays the plagiarism analysis results, including plagiarism percentage, unique content percentage, detected language, and highlighted plagiarized text for better understanding.
User Interface and Screens

Fig. 2. Registration Page

Fig. 3. Login Page

Fig. 4. Dashboard

Fig. 5. Upload Page

Fig. 6. Result Page

DISCUSSION

Future Scope

Integration with Online and Academic Sources

In future, the system can be extended to compare input text with online sources and academic repositories to improve plagiarism detection beyond locally stored reference content.

Support for Advanced Models and File Formats

The system can be enhanced by using more advanced transformer models and by supporting additional file formats such as PDF and DOCX for wider academic use.

Scalability and Platform Integration

Future versions may focus on cloud deployment and integration with learning management systems to support multi-user access and large-scale academic environments.

CONCLUSION

This research presented the design and implementation of an AI-based plagiarism detection system that focuses on identifying plagiarism through semantic similarity rather than exact word matching. In modern academic writing, plagiarism frequently appears in the form of paraphrasing or translated content, which reduces the effectiveness of traditional keyword-based detection techniques [1]. The proposed system addresses this challenge by analyzing text at the sentence level and comparing semantic meaning using transformer-based representations.

The system follows a structured workflow that includes text preprocessing, language identification, semantic embedding generation, and similarity evaluation. By adopting a meaning-based approach, the system is capable of detecting similarity even when sentence structure or wording is modified, thereby improving detection accuracy compared to conventional methods [2].

Another important contribution of this work is its lightweight and practical design, which allows the system to operate efficiently on standard computing environments. The simple web-based interface further supports ease of use in academic settings. Although the system currently relies on local reference datasets and fixed similarity thresholds, it demonstrates effective performance for academic plagiarism analysis. Overall, this research highlights the potential of semantic techniques to enhance plagiarism detection and support academic integrity [3].
REFERENCES

[1] R. Roșu, A. S. Stoica, P. S. Popescu, and C. M. Mihăescu, "NLP based deep learning approach for plagiarism detection," International Journal of User-System Interaction, vol. 13, no. 1, pp. 48–60, Jan. 2020.
[2] B. Minaei Bidgoli, M. Abdous, and P. Piroozfar, "A hybrid method for cross-lingual semantic textual similarity," Research Square, Jul. 2023.
[3] K. Vyas, A. Kumar, R. Banerjee, and D. B. Chakraborty, "Semantic similarity and plagiarism checker," Research Square, Nov. 2023.
