
GAN-based Super-Resolution for Image Enhancement using Multiple Loss Function

DOI : https://doi.org/10.5281/zenodo.19594560

Ishan Lodwal, Sherry Verma, Dev Singh Ahluwalia, Ananya Khurana

School of Engineering and Technology, Sushant University

Gurgaon, India

Abstract – Single-image super-resolution (SISR) aims to reconstruct high-resolution (HR) images from low-resolution (LR) inputs and is critical for applications such as medical imaging, surveillance, and remote sensing, where loss of fine details can significantly impact downstream tasks. However, conventional interpolation-based methods fail to recover high-frequency textures, resulting in overly smooth and visually degraded outputs. To address this limitation, this paper proposes a Generative Adversarial Network (GAN)-based 4× super-resolution framework inspired by ESRGAN, incorporating a Residual-in-Residual Dense Block (RRDB)-based generator and a PatchGAN discriminator to enhance texture reconstruction and perceptual realism. A hybrid loss function combining Charbonnier, perceptual, edge-aware, SSIM, and adversarial losses is designed to jointly optimize pixel-level accuracy, structural consistency, and visual fidelity, thereby overcoming the limitations of single-objective optimization. Experimental evaluation on the benchmark datasets Set5, Set14, BSD100, Urban100, General100, and Manga109 demonstrates that the proposed method achieves competitive perceptual performance, particularly in reconstructing high-frequency and structured patterns, validating the effectiveness of the proposed multi-loss GAN framework.

Keywords – Super-resolution, GAN, RRDB, perceptual loss, image enhancement.

  1. INTRODUCTION

    Images are often captured or stored at low resolutions due to limitations in imaging devices, bandwidth constraints, and storage requirements. Enhancing such images is critical for numerous applications including medical diagnostics, satellite imaging, forensic analysis, and video streaming.

    Traditional interpolation techniques such as nearest-neighbor, bilinear, and bicubic interpolation are computationally efficient but inherently incapable of reconstructing fine textures and structural details, leading to blurred outputs [1].

    Deep learning approaches, particularly convolutional neural networks (CNNs), have significantly improved super-resolution performance by learning complex mappings from LR to HR images [1]. Furthermore, Generative Adversarial Networks (GANs) enhance perceptual quality by generating realistic textures and high-frequency details [2], [3].

    In this work, a GAN-based 4× super-resolution framework is proposed to improve image quality while preserving texture consistency and structural integrity.

    The main contributions of this paper are as follows:

    1. Design of an RRDB-based generator integrated with a PatchGAN discriminator.

    2. Formulation of a hybrid loss function combining multiple perceptual and structural objectives.

    3. Extensive evaluation on multiple benchmark datasets.

    4. Comparative analysis with state-of-the-art super-resolution models.

  2. LITERATURE REVIEW

    Recent advancements in single-image super-resolution (SISR) have been largely driven by deep learning-based approaches, particularly convolutional neural networks (CNNs) and generative adversarial networks (GANs). Early methods primarily focused on minimizing pixel-wise reconstruction error, while later approaches emphasized perceptual quality and texture realism.

    Dong et al. [1] introduced SRCNN, the first end-to-end CNN-based super-resolution model, which significantly improved reconstruction accuracy in terms of PSNR. However, due to its reliance on pixel-wise loss functions, the generated images often lacked high-frequency details and appeared overly smooth.

    To address the limitations of pixel-based optimization, Ledig et al. [2] proposed SRGAN, which incorporated adversarial learning and perceptual loss functions. This approach enabled the generation of sharper and more visually realistic images, although it introduced challenges such as training instability and the presence of artifacts.

    Building upon this, Wang et al. [3] developed ESRGAN, which improved training stability and perceptual quality through architectural enhancements such as Residual-in-Residual Dense Blocks (RRDB) and refined adversarial learning strategies. ESRGAN achieved state-of-the-art perceptual performance and serves as a strong baseline for modern super-resolution methods.

    In parallel, Lim et al. [4] proposed EDSR, which focused on maximizing PSNR using deep residual networks. While EDSR demonstrated strong quantitative performance, it lacked perceptual realism due to the absence of adversarial training.

    Additionally, Johnson et al. [5] introduced perceptual loss functions based on deep feature representations, enabling better texture reconstruction, while Zhang et al. [6] proposed LPIPS as a perceptual similarity metric aligned with human visual perception.

    Despite these advancements, existing methods often struggle to simultaneously optimize pixel-level accuracy, structural consistency, and perceptual realism. Motivated by this limitation, the proposed approach builds upon ESRGAN [3] by incorporating additional structural constraints, including edge-aware and SSIM losses, to enhance texture preservation and overall image quality.

  3. METHODOLOGY

    1. System Overview

      The proposed framework follows a GAN-based architecture in which the generator produces super-resolved images from LR inputs, and the discriminator distinguishes between real HR images and generated images. This adversarial process encourages the generation of perceptually realistic outputs.
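This alternating adversarial process can be sketched as a minimal PyTorch training step. The two networks below are tiny hypothetical stand-ins (not the RRDB generator or PatchGAN discriminator defined later), used only to show how the generator and discriminator updates interleave:

```python
import torch

# Hypothetical stand-ins for the generator and discriminator described later.
G = torch.nn.Sequential(torch.nn.Upsample(scale_factor=4, mode="bicubic"),
                        torch.nn.Conv2d(3, 3, 3, padding=1))
D = torch.nn.Sequential(torch.nn.Conv2d(3, 1, 4, stride=2, padding=1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = torch.nn.BCEWithLogitsLoss()

lr_img = torch.rand(2, 3, 24, 24)    # LR input batch
hr_img = torch.rand(2, 3, 96, 96)    # matching HR targets

# Discriminator step: real patches -> 1, generated patches -> 0.
sr = G(lr_img).detach()
d_loss = bce(D(hr_img), torch.ones_like(D(hr_img))) + bce(D(sr), torch.zeros_like(D(sr)))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: try to fool the discriminator (content losses are added in the full model).
sr = G(lr_img)
g_loss = bce(D(sr), torch.ones_like(D(sr)))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```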

    2. Dataset Preparation

      The model is trained on the DIV2K dataset (800 images) and Flickr2K dataset (2650 images), which were obtained via publicly available repositories on the Kaggle platform. Evaluation is conducted on standard benchmark datasets including Set5, Set14, BSD100, Urban100, General100, and Manga109.

      Low-resolution images are synthetically generated using bicubic downsampling along with additional degradations such as Gaussian blur, noise injection, and JPEG compression to simulate real-world scenarios.
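A simplified version of this degradation pipeline might look as follows; the noise level is an illustrative placeholder, and the Gaussian-blur and JPEG-compression stages of the full pipeline are omitted for brevity:

```python
import torch
import torch.nn.functional as F

def degrade(hr, scale=4, noise_sigma=0.01):
    """Synthesize an LR image from an HR tensor of shape (N, C, H, W) in [0, 1].

    Bicubic downsampling plus additive Gaussian noise; blur and JPEG stages
    of the full degradation model are omitted in this sketch.
    """
    lr = F.interpolate(hr, scale_factor=1 / scale, mode="bicubic", align_corners=False)
    lr = lr + noise_sigma * torch.randn_like(lr)   # noise injection
    return lr.clamp(0.0, 1.0)

hr = torch.rand(1, 3, 128, 128)
lr = degrade(hr)   # 4x smaller: (1, 3, 32, 32)
```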

    3. Network Architecture

      1. Generator

        The generator is based on RRDB blocks, consisting of:

        • Initial convolution layer

        • 16 RRDB blocks

        • PixelShuffle upsampling layers

        • Final reconstruction layer

        RRDB combines dense connections and residual learning to stabilize training and improve feature representation [3].
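The generator structure above can be sketched in PyTorch as follows. The feature width (64), dense-block growth channels (32), and residual scaling factor (0.2) are assumptions in the spirit of ESRGAN [3], not values reported in this paper:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Five convolutions with dense connections, ESRGAN-style."""
    def __init__(self, nf=64, gc=32):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(nf + i * gc, gc if i < 4 else nf, 3, 1, 1) for i in range(5)
        )
        self.lrelu = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        feats = [x]
        for i, conv in enumerate(self.convs):
            out = conv(torch.cat(feats, dim=1))
            if i < 4:
                feats.append(self.lrelu(out))
        return x + 0.2 * out                      # residual scaling

class RRDB(nn.Module):
    """Residual-in-Residual Dense Block: three dense blocks plus a long skip."""
    def __init__(self, nf=64):
        super().__init__()
        self.blocks = nn.Sequential(DenseBlock(nf), DenseBlock(nf), DenseBlock(nf))

    def forward(self, x):
        return x + 0.2 * self.blocks(x)

class Generator(nn.Module):
    def __init__(self, nf=64, n_blocks=16, scale=4):
        super().__init__()
        self.head = nn.Conv2d(3, nf, 3, 1, 1)     # initial convolution layer
        self.body = nn.Sequential(*[RRDB(nf) for _ in range(n_blocks)])
        up = []
        for _ in range(scale // 2):               # two 2x PixelShuffle stages for 4x
            up += [nn.Conv2d(nf, nf * 4, 3, 1, 1), nn.PixelShuffle(2), nn.LeakyReLU(0.2)]
        self.up = nn.Sequential(*up)
        self.tail = nn.Conv2d(nf, 3, 3, 1, 1)     # final reconstruction layer

    def forward(self, x):
        f = self.head(x)
        return self.tail(self.up(self.body(f) + f))

# Two blocks only, to keep the sketch cheap to run; the paper uses 16.
sr = Generator(n_blocks=2)(torch.rand(1, 3, 24, 24))   # -> (1, 3, 96, 96)
```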

      2. Discriminator

        A PatchGAN-based discriminator is employed, consisting of convolutional layers with spectral normalization. It outputs a probability map that evaluates local image patches, thereby enforcing high-frequency realism.
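A minimal sketch of such a discriminator is shown below; the depth, channel widths, and kernel sizes are assumptions, since the paper only specifies the PatchGAN structure and spectral normalization:

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class PatchDiscriminator(nn.Module):
    """Strided convolutions with spectral normalization, ending in a
    1-channel map of per-patch real/fake logits."""
    def __init__(self, nf=64):
        super().__init__()
        def block(c_in, c_out, stride):
            return [spectral_norm(nn.Conv2d(c_in, c_out, 4, stride, 1)),
                    nn.LeakyReLU(0.2, inplace=True)]
        layers = block(3, nf, 2) + block(nf, nf * 2, 2) + block(nf * 2, nf * 4, 2)
        layers += [spectral_norm(nn.Conv2d(nf * 4, 1, 4, 1, 1))]   # per-patch logits
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

logits = PatchDiscriminator()(torch.rand(2, 3, 96, 96))   # a map of logits, one per patch
```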

    4. Loss Function Formulation

      To achieve both perceptual quality and structural accuracy, a hybrid loss function is used.
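Before the individual terms are detailed, the overall shape of such a hybrid objective is simply a weighted sum. The weights below are illustrative placeholders only; the paper's actual dynamic weighting schedule is described under Training Details:

```python
def total_loss(losses, weights):
    """Weighted sum of the individual loss terms."""
    return sum(weights[name] * value for name, value in losses.items())

# Illustrative weights only -- not the paper's schedule.
weights = {"charbonnier": 0.15, "perceptual": 0.6, "adversarial": 5e-3,
           "edge": 0.05, "ssim": 0.05}
loss = total_loss({"charbonnier": 0.02, "perceptual": 0.5, "adversarial": 0.7,
                   "edge": 0.1, "ssim": 0.05}, weights)
```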

      1. Charbonnier Loss

        The Charbonnier loss is a robust variant of the L1 loss that reduces sensitivity to outliers and ensures stable pixel-level reconstruction.
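In its standard form, the Charbonnier loss is sqrt((x - y)² + ε²) averaged over pixels; a minimal implementation:

```python
import torch

def charbonnier_loss(sr, hr, eps=1e-3):
    """Smooth, outlier-robust L1 variant: mean(sqrt((sr - hr)^2 + eps^2))."""
    return torch.sqrt((sr - hr) ** 2 + eps ** 2).mean()

x = torch.zeros(1, 3, 8, 8)
y = torch.ones(1, 3, 8, 8)
loss = charbonnier_loss(x, y)   # ~= 1.0 for a uniform error of 1
```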

      2. Perceptual Loss

        The perceptual loss computes the difference between high- level feature representations extracted from a pretrained VGG network, enabling better texture and detail reconstruction.

        Fig. 1. Overall system pipeline of the GAN-based super-resolution model

      3. Edge-Aware Loss

        The edge-aware loss enforces consistency in image gradients, helping preserve sharp edges and structural boundaries in the reconstructed image.
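One common realization compares Sobel gradient maps of the two images; the specific gradient operator is an assumption, since the paper does not name one:

```python
import torch
import torch.nn.functional as F

# 3x3 Sobel kernels (an illustrative choice of gradient operator).
_SOBEL_X = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
_SOBEL_Y = _SOBEL_X.transpose(2, 3)

def gradients(img):
    """Per-channel horizontal and vertical gradients of an (N, C, H, W) image."""
    c = img.shape[1]
    gx = F.conv2d(img, _SOBEL_X.repeat(c, 1, 1, 1), padding=1, groups=c)
    gy = F.conv2d(img, _SOBEL_Y.repeat(c, 1, 1, 1), padding=1, groups=c)
    return gx, gy

def edge_loss(sr, hr):
    """L1 distance between the gradient maps of the SR output and HR target."""
    sx, sy = gradients(sr)
    hx, hy = gradients(hr)
    return (sx - hx).abs().mean() + (sy - hy).abs().mean()

a = torch.rand(1, 3, 32, 32)
b = torch.rand(1, 3, 32, 32)
loss = edge_loss(a, b)   # zero when the two images match exactly
```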

      4. SSIM Loss

        The SSIM loss measures structural similarity between images, ensuring preservation of luminance, contrast, and structural information.
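A compact SSIM sketch is shown below. Note the simplification: the standard SSIM uses an 11×11 Gaussian window, whereas this version uses a uniform box window to stay short; the constants c1 and c2 are the usual stabilizers:

```python
import torch
import torch.nn.functional as F

def ssim(x, y, window=7, c1=0.01 ** 2, c2=0.03 ** 2):
    """Mean SSIM with a uniform averaging window (a simplification of the
    Gaussian-windowed original)."""
    mu_x = F.avg_pool2d(x, window, stride=1)
    mu_y = F.avg_pool2d(y, window, stride=1)
    var_x = F.avg_pool2d(x * x, window, stride=1) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, window, stride=1) - mu_y ** 2
    cov = F.avg_pool2d(x * y, window, stride=1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return (num / den).mean()

def ssim_loss(sr, hr):
    return 1.0 - ssim(sr, hr)   # minimize 1 - SSIM

a = torch.rand(1, 3, 32, 32)
```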

      5. Adversarial Loss

        The adversarial loss encourages the generator to produce realistic images by minimizing the difference between generated and real image distributions.
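A sketch of the generator- and discriminator-side terms in the standard binary cross-entropy form is given below; whether the model uses this vanilla form or a relativistic variant is not specified in the paper, so treat this as an assumption. Label smoothing on real patches follows the ranges given under Training Details:

```python
import torch
import torch.nn.functional as F

def adversarial_g_loss(fake_logits):
    """Generator side: push D's per-patch logits on SR images toward 'real'."""
    return F.binary_cross_entropy_with_logits(fake_logits,
                                              torch.ones_like(fake_logits))

def adversarial_d_loss(real_logits, fake_logits, real_label=0.9):
    """Discriminator side, with one-sided label smoothing on real patches."""
    real = F.binary_cross_entropy_with_logits(
        real_logits, torch.full_like(real_logits, real_label))
    fake = F.binary_cross_entropy_with_logits(
        fake_logits, torch.zeros_like(fake_logits))
    return 0.5 * (real + fake)

g = adversarial_g_loss(torch.zeros(2, 1, 11, 11))   # logits of 0 give BCE = ln 2
```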

      6. Total Loss Function

        The total loss is a weighted combination of all individual loss components, balancing pixel accuracy, perceptual quality, and structural fidelity.

    5. Training Details

      1. Epochs: 600

      2. Batch size: 16

      3. Optimizer: Adam for both the generator and the discriminator (β1 = 0.9, β2 = 0.99)

      4. Learning rate: Cosine annealing schedule with linear warm-up for the first 5 epochs, reaching a maximum learning rate of 1 × 10

      5. Adversarial training: Introduced after 10,000 training steps and gradually increased over 80,000 steps using step-based scheduling

      6. Loss weighting: Dynamic weighting strategy in which the pixel loss weight decreases to 0.15, the perceptual loss weight increases up to 0.6, and the adversarial loss weight increases up to 3 × 10³, while the edge-aware and SSIM loss weights are fixed at 0.05

      7. Discriminator training: Updated after the adversarial phase begins, at alternating mini-batch intervals, with label smoothing (real: 0.8–1.0, fake: 0.0–0.2)

      8. Evaluation metrics: PSNR, SSIM, and LPIPS [6]

      9. Model selection: Best model selected based on the lowest validation LPIPS

      10. Checkpointing: Model checkpoints saved every 10 epochs, with additional saving of the best-performing model

  4. RESULTS AND DISCUSSION

      1. Quantitative Results

        Table I. PSNR, SSIM, and LPIPS results of our model on various benchmarks

        Dataset      PSNR (dB)   SSIM     LPIPS
        Set5         29.07       0.8547   0.1058
        Set14        26.35       0.8422   0.1854
        BSD100       25.54       0.6774   0.2477
        Urban100     23.72       0.9452   0.1797
        Manga109     27.37       0.9681   0.0789
        General100   28.65       0.8631   0.1235
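For reference, the PSNR values reported in Table I follow the standard definition, 10·log10(MAX² / MSE), which can be computed as:

```python
import torch

def psnr(sr, hr, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    mse = torch.mean((sr - hr) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)

hr = torch.zeros(1, 3, 8, 8)
sr = torch.full_like(hr, 0.1)   # uniform error of 0.1 -> MSE = 0.01
value = psnr(sr, hr)            # 10 * log10(1 / 0.01) = 20 dB
```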

      2. Qualitative Results

        Fig. 2. Qualitative comparison on the Set5 dataset: (a) Baby, (b) Bird, (c) Butterfly, (d) Woman, and (e) Head. Each row shows (from left to right) the low-resolution (LR) input, bicubic upsampling, the proposed super-resolved (SR) output, and the ground-truth (HR) image.

        The proposed model generates visually sharper images with improved texture fidelity, particularly for structured and high-frequency patterns such as cartoons and illustrations. However, performance decreases for complex natural scenes due to variations in texture distribution.

      3. Comparison with State-of-the-Art Models

    Tables II and III compare the performance of the proposed model with ESRGAN [3] and MOBOSR [8], where ESRGAN serves as a widely adopted GAN-based baseline and MOBOSR represents a recent optimization-based approach.

    Table II. Comparison of LPIPS (lower is better) with ESRGAN and MOBOSR

    Dataset      Our Model   ESRGAN   MOBOSR
    Set5         0.1058      0.0750   0.0745
    Set14        0.1854      0.1341   0.1359
    Urban100     0.1797      0.1228   0.1324
    BSD100       0.2477      0.1617   0.1719
    Manga109     0.0789      0.0647   0.0675
    General100   0.1235      0.0876   0.0894

    Table III. Comparison of PSNR (dB, ×4) with ESRGAN and MOBOSR

    Dataset      Our Model   ESRGAN   MOBOSR
    Set5         29.07       29.80    30.69
    Set14        26.35       25.51    26.81
    Urban100     23.72       23.74    24.49
    BSD100       25.54       25.21    26.04
    Manga109     27.37       27.36    28.21
    General100   28.65       28.86    29.66

    The proposed method achieves LPIPS scores that are competitive with ESRGAN and MOBOSR [8], with the smallest gaps on the Manga109 and Set5 datasets. While somewhat behind on some natural-image datasets, the performance remains within acceptable range of these baselines.

    The quantitative PSNR (×4) comparison in Table III covers the Set5, Set14, Urban100, BSD100, Manga109, and General100 datasets. The proposed model achieves competitive performance, outperforming ESRGAN on Set14, BSD100, and Manga109 while remaining comparable on Set5 and Urban100. MOBOSR consistently achieves the highest PSNR across all datasets, indicating superior reconstruction capability. Nevertheless, the proposed model demonstrates balanced performance across diverse datasets, highlighting its effectiveness and generalization ability for image super-resolution tasks.

  5. CONCLUSION AND FUTURE WORK

This paper presented a GAN-based 4× super-resolution framework utilizing an RRDB generator and PatchGAN discriminator with a hybrid loss function. Experimental results demonstrate strong perceptual performance, particularly for high-frequency datasets.

Future work includes:

  • Exploring transformer-based architectures such as SwinIR

  • Integrating diffusion-based models

  • Deploying the system for real-time applications

REFERENCES

  1. C. Dong et al., "Learning a Deep Convolutional Network for Image Super-Resolution," ECCV, 2014.

  2. C. Ledig et al., "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network," CVPR, 2017.

  3. X. Wang et al., "ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks," ECCVW, 2018.

  4. B. Lim et al., "Enhanced Deep Residual Networks for Single Image Super-Resolution," CVPRW, 2017.

  5. J. Johnson et al., "Perceptual Losses for Real-Time Style Transfer and Super-Resolution," ECCV, 2016.

  6. R. Zhang et al., "The Unreasonable Effectiveness of Deep Features as a Perceptual Metric," CVPR, 2018.

  7. E. Agustsson and R. Timofte, "NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study," CVPRW, 2017.

  8. X. Zhang et al., "Perceptual-Distortion Balanced Image Super-Resolution," arXiv:2409.03179, 2024.