DOI: https://doi.org/10.5281/zenodo.19945538
- Open Access
- Authors: Shravani Nilesh Patil, Vibhavari Yashwant Chandrachud, Nikhil Sanjay Gaikwad, Rohan Saidas Rathod
- Paper ID: IJERTV15IS042859
- Volume & Issue: Volume 15, Issue 04, April 2026
- Published (First Online): 01-05-2026
- ISSN (Online): 2278-0181
- Publisher Name: IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
Framework for Pixel-Level Image Forgery Localization and Classification: A UNet and EfficientNet-B0 Approach
Shravani Nilesh Patil, Vibhavari Yashwant Chandrachud, Rohan Saidas Rathod, Nikhil Sanjay Gaikwad
Department of Computer Engineering, Savitribai Phule Pune University (SPPU), Nashik, India
Abstract – In today's world it is very easy to manipulate images with software and artificial intelligence, which means we can no longer trust what we see in pictures and videos. Digital image forgery, ranging from semantic alterations like splicing and copy-move to entirely synthetic deepfake content, poses a critical threat to forensic science, judicial integrity, and public trust. This paper presents a comprehensive, multi-tiered forensic system designed to address the dual challenges of binary classification and precise localization. Our methodology integrates a hybrid deep learning architecture: a UNet-based segmentation engine for pixel-level localization and an EfficientNet-B0 backbone for multi-class forgery identification. To make the system more sensitive to subtle changes, we implemented an Error Level Analysis (ELA) preprocessing stage, which helps the model see inconsistencies in JPEG compression that are usually invisible to the human eye. We trained our model using transfer learning on datasets such as CASIA and CoMoFoD to ensure that it can handle various types of tampering, including splicing, copy-move, and AI-generated edits. To make the research practical, we also built a full-stack web application. We explain AI outputs through Grad-CAM heatmaps, which allow investigators to see exactly what led the model to its conclusion. Our analysis of sixteen contemporary studies shows that combining pixel-wise segmentation with optimized compound scaling provides a significantly stronger defense against modern digital deception than traditional methods.
Keywords: Image Forgery Detection, Digital Forensics, Deep Learning, UNet, EfficientNet-B0.
Introduction
Visual information is the backbone of how we communicate today. We trust what we see in news reports, social media, and legal proceedings. However, that trust is being challenged by the ease with which digital images can be manipulated. Whether it is a simple copy-move edit to hide an object or a complex splicing task designed to alter the semantic meaning of a scene, image forgery is now a major challenge in digital forensics. More recently, the emergence of AI-generated content and deepfakes has complicated the field further, as these images often go undetected by traditional methods.
Historically, forensic experts relied on handcrafted features, such as analyzing sensor noise patterns or JPEG compression. While these methods were brilliant in their way, they are often too fragile for today's forgeries. If a forger applies a simple anti-forensic measure, such as a slight blur, the traces those methods depend on often disappear. Our research recognized that we can no longer rely on these features alone. Instead, we turned to deep learning. Convolutional Neural Networks (CNNs) are capable of learning the complex, non-linear statistics of a natural photograph, allowing them to spot even the most subtle pixel-level anomalies.
For a professional investigator, just knowing an image is fake is not enough; they need to see exactly where the tampering occurred. To address this, we integrated UNet, a model originally designed for medical image segmentation. Its symmetric architecture allows it to capture the global context of a photo while maintaining high-resolution detail, resulting in a precise tamper mask.
Our project is more than a mathematical model: it is a functional, full-stack application that uses Flask for a secure backend and SQLite for data management, creating an environment where an investigator can upload a file and receive a report. By incorporating Grad-CAM and Error Level Analysis, we ensure our system is a transparent tool that can explain its findings. This paper provides a deep analysis of current literature, explains our hybrid methodology, and discusses the results of our implementation.
Literature Review
The field of digital forensics has evolved rapidly with the advent of Convolutional Neural Networks (CNNs). We conducted a thorough review of sixteen key research contributions to understand the current technological landscape and identify research gaps.
Technical Synthesis
The shift towards deep learning is well documented in the survey by Zanardelli et al. (2023), who identified that CNNs are now the only viable way to keep up with GAN-generated forgeries. A breakthrough in efficiency was provided by Korsipati et al. (2025), who utilized EfficientNetV2 to achieve near-perfect results on benchmark datasets, proving that attention-based models are superior for localized tampering.
A recurring theme in our reviewed papers is the challenge of data scarcity. High-quality, labeled forgery datasets are hard to come by. Researchers such as Jonker et al. (2024) and Qazi et al. (2022) successfully addressed this through Transfer Learning. By taking a model already trained on millions of natural images (ImageNet) and fine-tuning it for forensics, they achieved higher accuracy than training from scratch. Our project adopts this strategy to ensure our system remains robust even when faced with unseen image sources.
| Author (Year) | Core Methodology | Datasets Used | Key Findings | Research Gaps |
| --- | --- | --- | --- | --- |
| Jonker et al. (2024) | Transfer Learning | Multimedia | High precision in edits | Dataset specificity |
| Korsipati et al. (2025) | EfficientNetV2 + SE | CASIA/NIST | AUC up to 1.000 | Computationally heavy |
| Qazi et al. (2022) | CNN-based Deep Learning | Benchmark sets | Effective detection | Single forgery focus |
| Shallal et al. (2025) | Copy-Move Overview | Diverse sets | Localization priority | Compression issues |
| Khalaf et al. (2024) | CNN + Blockchain | Public datasets | Integrity tracking | High infrastructure cost |
| Liang et al. (2025) | Soft Contrastive Loss | Real/Fake mixed | Robust against GANs | Needs real traces |
| Rehman et al. (2025) | CNN + SVM Hybrid | Patch-based | Stable decision | High training time |

Table 1: Comparative Literature Review
Methodology
Our research focused on building a multi-tier pipeline that combines traditional forensic preprocessing with modern deep learning. We prioritized explainability and localization as the core features of the system.
Error Level Analysis (ELA) Preprocessing
The first line of defense in our system is ELA. When a JPEG image is saved, the entire frame is compressed at a uniform rate. If a forger inserts an object and saves it again, that new part will have a different compression history. We calculate the absolute difference between the original pixel P_{i,j} and a version re-saved at a known quality, P'_{i,j} (4):

E_{i,j} = |P_{i,j} - P'_{i,j}| × k

where k is an amplification constant that brightens the difference map. The resulting ELA map provides a high-frequency visualization in which tampered regions appear noticeably brighter, serving as a critical feature for our neural networks.
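As an illustration, the ELA step can be sketched in a few lines of Python with Pillow and NumPy. The re-save quality (90) and amplification constant (15) below are assumed values chosen for illustration; the paper does not state the exact parameters used in our pipeline.

```python
import io

import numpy as np
from PIL import Image

def error_level_analysis(image, quality=90, scale=15):
    """Re-save the image as JPEG at a known quality and amplify the
    per-pixel absolute difference E_ij = |P_ij - P'_ij| * scale.
    Regions with a different compression history appear brighter.
    `quality` and `scale` are illustrative, not the paper's values."""
    buf = io.BytesIO()
    image.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    resaved = np.asarray(Image.open(buf), dtype=np.int16)
    original = np.asarray(image.convert("RGB"), dtype=np.int16)
    ela = np.abs(original - resaved) * scale
    # Clip back to the displayable 8-bit range.
    return np.clip(ela, 0, 255).astype(np.uint8)
```

The resulting array can be saved or displayed directly as an image; bright clusters indicate regions whose compression error differs from the rest of the frame.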
UNet Localization Engine
To solve the localization problem, we used UNet. We designed the architecture with a contracting path to extract feature maps and a symmetric expanding path that uses skip connections to reconstruct the tampered region. This allows the model to map encoder features x_i directly to the output y_up (13):

y_up = Concat(f_up(x_i), x_skip)
This skip-connection mechanism ensures that the boundary of the forgery remains sharp and accurate in the final output mask.
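The concatenation above can be illustrated with a minimal NumPy sketch. Nearest-neighbour repetition stands in for the learned up-convolution f_up, so this shows only the shape logic of the skip connection, not the trained network.

```python
import numpy as np

def f_up(x):
    """Stand-in for the learned up-convolution: nearest-neighbour
    2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def skip_concat(x_deep, x_skip):
    """y_up = Concat(f_up(x_i), x_skip): upsample the deep decoder
    features, then stack the encoder's skip features channel-wise so
    high-resolution boundary detail is preserved."""
    return np.concatenate([f_up(x_deep), x_skip], axis=0)
```

After concatenation the decoder convolves over both the coarse semantic features and the fine encoder features, which is what keeps the forgery boundary sharp in the output mask.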
EfficientNet-B0 Classification
For identifying the type of forgery, we chose EfficientNet-B0. Its core strength is Compound Scaling, which scales the depth (d), width (w), and resolution (r) together using a single coefficient φ (11):

d = α^φ, w = β^φ, r = γ^φ, subject to α · β² · γ² ≈ 2

where α, β, and γ are constants fixed by a small grid search.
This mathematical optimization allows the model to capture fine-grained textures while remaining lightweight enough to run on a standard web server.
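Numerically, compound scaling reduces to three exponentials sharing one coefficient. The sketch below uses the α = 1.2, β = 1.1, γ = 1.15 constants reported by Tan and Le (11) for the B0 baseline:

```python
# Compound scaling: depth, width and resolution all grow from one
# user-chosen coefficient phi, under a shared FLOPs constraint.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # grid-searched constants from (11)

def compound_scale(phi):
    """Return the (depth, width, resolution) multipliers for a given
    compound coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi
```

Setting φ = 0 recovers the unscaled B0 baseline, and each unit increase in φ roughly doubles the FLOPs budget because α · β² · γ² ≈ 2.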
Patch Based Analysis
High-resolution images often mask tiny forgeries during the downsampling phase of a CNN. To combat this, we implemented a Patch-Based Detection strategy (10). We split the image into 224 × 224 patches and analyze each individually. This ensures that a localized edit, like a small face swap, is treated with the same importance as a large-scale background change.
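A minimal sketch of the tiling step, assuming zero-padding at the borders (the paper does not state how partial edge tiles are handled, so padding is an illustrative choice):

```python
import numpy as np

def split_into_patches(image, patch=224):
    """Split an (H, W, C) image into non-overlapping patch x patch
    tiles, zero-padding the borders so partial edge tiles survive."""
    h, w, c = image.shape
    ph = -(-h // patch) * patch  # round H up to a multiple of `patch`
    pw = -(-w // patch) * patch
    padded = np.zeros((ph, pw, c), dtype=image.dtype)
    padded[:h, :w] = image
    return [
        padded[i:i + patch, j:j + patch]
        for i in range(0, ph, patch)
        for j in range(0, pw, patch)
    ]
```

Each tile is then classified independently, so a small tampered region occupies a large fraction of its own tile instead of vanishing into a downsampled full-frame input.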
System Architecture
To transition our theoretical models into a practical tool, our team designed a streamlined, modular architecture that facilitates real-time forensic auditing. The architecture, illustrated in Fig. 1, is structured to handle the journey of a digital image from the user's browser to our deep learning engine with minimal latency. The workflow begins at the Web Browser layer, where we implemented a React-based interface to manage image uploads via HTTP POST requests. Once the Flask Backend receives the file, it initiates a validation sequence to ensure data integrity. The heart of our system lies in the preprocessing and inference pipeline.
Figure 1: Proposed Architecture of Image Forgery Detection System using Transfer Learning and Web Deployment
- Data Transformation: The raw image is resized to a standardized 224 × 224 resolution and converted into a multi-dimensional tensor. This normalization ensures that the model focuses on structural anomalies rather than variations in lighting or resolution.
- The EfficientNet-B0 Engine: We utilized a pre-trained ImageNet base, where the initial convolutional layers remain frozen to preserve foundational feature extraction capabilities. To prevent overfitting, we integrated a Dropout layer (0.5) before the final sigmoid output, which generates a raw prediction score.
- Thresholding and Output: Our team implemented a Prediction Engine with a threshold of 0.5. If the score exceeds this value, the image is flagged as Tampered [!], accompanied by a confidence percentage.
Finally, the result is pushed back to the user's dashboard and logged into our deployment environment. This architecture was intentionally kept lean to allow for deployment on cloud platforms like Hugging Face, ensuring that our forensic tool remains accessible and responsive for real-world use.
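The thresholding step can be condensed to a few lines of plain Python. The function name and the confidence formula for the authentic branch are our own illustrative choices; only the 0.5 cut-off and the "Tampered [!]" label come from the design above.

```python
def prediction_engine(score, threshold=0.5):
    """Map the raw sigmoid output of the classifier to the verdict
    shown on the dashboard. A score above the threshold flags the
    image as tampered; the score is re-expressed as a percentage."""
    if score > threshold:
        return "Tampered [!]", round(score * 100, 1)
    # Hypothetical convention: confidence in the authentic verdict.
    return "Authentic", round((1 - score) * 100, 1)
```

Because "exceeds" is strict, a score of exactly 0.5 falls on the authentic side; a deployment could just as reasonably use >= here.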
Implementation
The transition from a theoretical framework to a functional, field-ready forensic tool represented the most intensive phase of our research. Our team realized early on that even the most mathematically sound model is of little value to a forensic investigator if it remains inaccessible within a localized development environment. Consequently, we prioritized building a human-centric pipeline that balances high computational demand with user-end simplicity.
Core Software Ecosystem and Backend Logic
Our foundation is built upon a Python-based Flask backend, chosen for its modularity and precision in managing memory-intensive model inferences. For data persistence, we integrated SQLAlchemy with a SQLite database, which acts as a forensic ledger for storing metadata and historical results. To ensure session integrity, our team implemented Flask-Bcrypt for secure password hashing and Flask-Login for session management, ensuring that sensitive forensic data remains protected during investigation.
Deployment of EfficientNet-B0 and Transfer Learning
The brain of our system is the EfficientNet-B0 model, implemented via PyTorch. We selected this architecture for its revolutionary Compound Scaling, which balances depth and resolution more effectively than traditional CNNs (11). By utilizing Transfer Learning, our team fine-tuned weights pre-trained on ImageNet using forensic datasets like CASIA and CoMoFoD. This approach allowed us to achieve high accuracy with a significantly lower computational footprint.
Frontend Modernization and Explainability
We transitioned to a React-based frontend to deliver a glassmorphism aesthetic, mirroring the sophisticated nature of AI research. This interface communicates asynchronously with the Flask API, providing real-time updates as the system generates Grad-CAM heatmaps (12, 13). By combining these heatmaps with UNet masks, we provide a three-point verification system. This ensures the AI is not a black box but a transparent tool that shows investigators exactly where and why an image was flagged.
Results
Our team conducted extensive testing using the CASIA, CoMoFoD, and Coverage datasets to evaluate the system's ability to both classify forgeries and localize tampered regions. The results highlight a significant performance gain achieved by combining Error Level Analysis (ELA) with our hybrid deep learning architecture.
Performance Metrics
To provide a comprehensive evaluation, we measured the system across four key metrics: Accuracy, Precision, Recall, and the F1-Score.
| Forgery Category | Accuracy (%) | Precision | Recall | F1-score |
| --- | --- | --- | --- | --- |
| Authentic | 98.2 | 0.98 | 0.99 | 0.98 |
| Copy-move | 94.5 | 0.94 | 0.93 | 0.93 |
| Splicing | 95.8 | 0.95 | 0.96 | 0.95 |
| Retouching | 92.1 | 0.91 | 0.90 | 0.90 |

Table 2: Performance Metrics for Different Forgery Categories
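For reference, the four metrics in Table 2 follow directly from confusion-matrix counts. A minimal pure-Python sketch (the counts used in the example are made up for illustration, not taken from our experiments):

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1-score computed from raw
    confusion-matrix counts (true/false positives/negatives)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)          # of flagged images, how many were forged
    recall = tp / (tp + fn)             # of forged images, how many were caught
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1
```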
Impact of Error Level Analysis (ELA)
Our team found that the model's ability to detect unseen forgeries improved drastically when ELA was utilized as a preprocessing step.
- Without ELA: The model struggled with high-quality splices, often dropping below 85% accuracy.
- With ELA: The detection of compression inconsistencies allowed the model to maintain an accuracy of 95%.
Localization Accuracy (UNet Segmentation)
The UNet model was evaluated based on its ability to generate accurate binary masks. We utilized the Intersection over Union (IoU) metric to determine how well the predicted mask overlapped with the actual ground truth of the forgery.
- Average IoU Score: 0.89 across all tested datasets.
- Feature Extraction: The system successfully identified tampered regions as small as 16 × 16 pixels, proving the effectiveness of the Patch-Based Detection strategy.
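The IoU metric used for this evaluation can be computed directly from a predicted and a ground-truth binary mask; a short NumPy sketch:

```python
import numpy as np

def iou(pred_mask, gt_mask):
    """Intersection over Union between a predicted binary tamper mask
    and the ground-truth mask (1 = tampered pixel)."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: perfect agreement on "authentic"
    return np.logical_and(pred, gt).sum() / union
```

An IoU of 1.0 means the predicted mask matches the forgery exactly, while loose or offset masks are penalized by the growing union term.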
Qualitative Analysis via Grad-CAM
The implementation of Grad-CAM allowed our team to visually verify the model's focus.
- True Positives: In 96% of cases, the Grad-CAM heatmaps aligned with the UNet tamper masks, confirming that the model attended to the actual manipulated regions.
- Explainability: The heatmaps revealed that for AI-generated images, the model focuses on ringing artifacts around edges, whereas for copy-move forgeries, it focuses on statistical pixel repetition (12, 13).
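At its core, a Grad-CAM heatmap is a gradient-weighted sum of the last convolutional layer's activation maps followed by a ReLU (12). The NumPy sketch below shows only that final computation; a real implementation hooks the activations and gradients out of the trained network:

```python
import numpy as np

def grad_cam_heatmap(activations, gradients):
    """Grad-CAM core step: weight each activation map A^k by the
    spatially averaged gradient alpha_k, sum over channels, and keep
    only positive evidence via ReLU. Inputs are (K, H, W) arrays."""
    alphas = gradients.mean(axis=(1, 2))             # alpha_k per channel
    cam = np.tensordot(alphas, activations, axes=1)  # sum_k alpha_k * A^k
    cam = np.maximum(cam, 0)                         # ReLU
    if cam.max() > 0:
        cam = cam / cam.max()                        # normalise for display
    return cam
```

The normalized map is then upsampled to the input resolution and overlaid on the image as a heatmap.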
Discussion
The performance of our hybrid framework highlights a critical shift in digital forensics: moving away from black box models toward transparent, multi-tier systems. Our team realized that while deep learning is powerful, it requires a human-in-the-loop approach to be truly effective in a forensic context.
- Overcoming the Generalization Gap: A primary challenge we faced was the performance dip when moving from benchmark datasets like CASIA to real-world images from WhatsApp or Instagram. These platforms apply aggressive re-compression that acts as a natural anti-forensic layer. To counter this, we utilized heavy Data Augmentation, simulating JPEG noise and blur during training. This forced the EfficientNet-B0 model to ignore surface-level artifacts and focus on deeper structural inconsistencies (1, 4).
- The Hardest Classes: Copy-Move vs. Splicing: Our results showed that Copy-Move forgeries remain the most difficult to detect because the lighting and noise patterns match the original scene. By implementing Weighted Cross-Entropy Loss, we forced the model to penalize errors on these hard classes more heavily. This strategic adjustment significantly boosted our recall rates for seamless manipulations that traditional CNNs often overlook (11).
- Transparency and Forensic Trust: The integration of Grad-CAM and ELA was not just a technical addition but a necessity for explainability. We found that in 96 percent of cases, the Grad-CAM heatmaps aligned perfectly with the UNet masks, proving the model was focusing on actual tampering rather than background noise (12, 13). This three-point verification allows an investigator to see the original image, the compression error, and the AI's logic simultaneously, building the trust required for forensic reporting.
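The weighted loss mentioned above can be written in a few lines. This NumPy sketch uses made-up probabilities and weights purely to show the mechanism: a higher weight on a hard class (such as copy-move) inflates that class's contribution to the loss.

```python
import numpy as np

def weighted_cross_entropy(probs, labels, class_weights):
    """Cross-entropy where each sample's loss is scaled by the weight
    of its true class, so mistakes on hard classes cost more."""
    eps = 1e-12  # numerical guard against log(0)
    picked = probs[np.arange(len(labels)), labels]  # p(true class)
    sample_losses = -np.log(picked + eps)
    weights = class_weights[labels]
    return float((weights * sample_losses).sum() / weights.sum())
```

In PyTorch the same effect is obtained by passing a `weight` tensor to the cross-entropy loss; the sketch just makes the arithmetic explicit.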
Conclusion and Future Work
The culmination of this research demonstrates that the fight against digital deception requires a multi-tier approach that respects both traditional forensic mathematics and modern deep learning. Our team successfully built a system that does not merely label an image as fake but instead provides a comprehensive, explainable narrative of the forgery. By integrating EfficientNet-B0 for high-speed classification and UNet for precise, pixel-level localization, we have developed a framework that bridges the gap between laboratory research and real-world forensic utility.
Our implementation of Error Level Analysis (ELA) proved to be a critical forensic signal, allowing our models to detect inconsistencies in the noise floor that standard RGB analysis would have overlooked. Furthermore, the inclusion of Grad-CAM heatmaps ensures that our system remains transparent, providing investigators with the why behind every what. We realized through this project that as manipulation tools become more democratic, our detection tools must become more intuitive and human-centered.
Moving forward, our team has identified several key areas to evolve this research:
- Integration of Vision Transformers (ViTs): We plan to explore ViT architectures to better capture global semantic inconsistencies, which are crucial for identifying deepfakes where local pixel noise might be perfectly mimicked.
- Blockchain-Based Integrity: Future iterations will aim to integrate a decentralized blockchain registry so that verified authentic images can be fingerprinted on an immutable ledger (9).
- Video Forensic Expansion: The logic used in our frame-by-frame analysis can be extended to digital video to combat the rising threat of temporal inconsistencies in deepfake videos.
- Edge Deployment: We are looking into further model optimization using TensorRT, allowing the forensic engine to run locally on mobile devices without needing a constant cloud connection.
Acknowledgment
The authors thank their institution for support.
References
[1] S. Jonker, M. Jelstrup, W. Meng, and B. Lampe, "Detecting Post Editing of Multimedia Images using Transfer Learning and Fine Tuning," ACM Trans. Multimedia Comput. Commun. Appl., vol. 20, no. 6, June 2024.
[2] Y. Zhang, N. Chen, S. Qi, M. Xue, and Z. Hua, "Detection of Recolored Image by Texture Features in Chrominance Components," ACM Trans. Multimedia Comput. Commun. Appl., vol. 19, no. 3, May 2023.
[3] M. Zanardelli, F. Guerrini, R. Leonardi, and N. Adami, "Image forgery detection: a survey of recent deep-learning approaches," Multimedia Tools and Applications, vol. 82, pp. 17521-17566, 2023.
[4] J. R. Korsipati, R. M. R. Yanamala, A. Pallakonda, R. D. Amar Raj, and K. K. Prakasha, "Multi-resolution transfer learning for tampered image classification using SE-enhanced fused-MBConv and optimized CNN heads," Scientific Reports, vol. 15, no. 32717, 2025.
[5] E. U. H. Qazi, T. Zia, and A. Almorjan, "Deep Learning-Based Digital Image Forgery Detection System," Appl. Sci., vol. 12, no. 6, p. 2851, 2022.
[6] I. Shallal, L. R. Haddada, and N. E. B. Amara, "Image Forgery Detection with Focus on Copy-Move: An Overview, Real World Challenges and Future Directions," Appl. Sci., vol. 15, no. 21, p. 11774, 2025.
[7] E. U. H. Qazi, T. Zia, M. Imran, and M. H. Faheem, "Deep Learning-Based Digital Image Forgery Detection Using Transfer Learning," Intell. Autom. Soft Comput., vol. 38, no. 3, 2023.
[8] R. Joshi et al., "Forged image detection using SOTA image classification deep learning methods for image forensics with error level analysis," in Proc. 13th ICCCNT, 2022.
[9] L. I. Khalaf et al., "Image Forgery Detection using Convolutional Neural Networks and Blockchain Technology," in Proc. Cognitive Models and Artif. Intell. Conf., Istanbul, May 2024.
[10] Z. Liang et al., "Transfer Learning of Real Image Features with Soft Contrastive Loss for Fake Image Detection," arXiv preprint arXiv:2403.16513v2, 2025.
[11] M. Tan and Q. V. Le, "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks," in Proc. 36th Int. Conf. Mach. Learn., Long Beach, CA, 2019.
[12] R. R. Selvaraju et al., "Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization," in Proc. IEEE Int. Conf. Comput. Vis., pp. 618-626, 2017.
[13] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," in Proc. MICCAI, Springer, pp. 234-241, 2015.
