PermaFusionNet: A Hybrid Deep Learning Framework for Multi-Modal Permafrost Degradation Mapping using SAR and Optical Remote Sensing Data

Rohit Kumar Singh; Suneel Kumar; Rakesh Sahu

doi:10.17577/IJERTCONV14IS050076

IIRA 5.0 - 2026 (Volume 14 - Issue 05)

Open Access
Authors : Rohit Kumar Singh, Suneel Kumar, Rakesh Sahu
Paper ID : IJERTCONV14IS050076
Volume & Issue : Volume 14, Issue 05, IIRA 5.0 (2026)
Published (First Online) : 24-05-2026
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License

Rohit Kumar Singh¹,²*, Suneel Kumar³, Rakesh Sahu⁴

¹ Department of Computer Science & Engineering, IFTM University, Moradabad

² Department of Computer Science & Engineering, Moradabad Institute of Technology, Moradabad

³ Department of Computer Applications, IFTM University, Moradabad

⁴ School of Computing Science and Engineering, Bennett University, Greater Noida

ABSTRACT

Permafrost degradation, driven by accelerating climate change, poses severe environmental and infrastructural risks across Arctic and sub-Arctic regions. Accurate and large-scale monitoring is critical yet remains challenging due to the heterogeneous nature of permafrost landscapes and the limitations of single-sensor observations. In this paper, we propose PermaFusionNet, a novel hybrid deep learning architecture that integrates multi-temporal Synthetic Aperture Radar (SAR) data from Sentinel-1 with multispectral optical imagery from Sentinel-2 for comprehensive permafrost degradation mapping. Our architecture combines a dual-branch convolutional encoder for spatial feature extraction, a cross-modal attention module for adaptive fusion of SAR and optical features, and a temporal recurrent decoder leveraging Long Short-Term Memory (LSTM) units to capture seasonal and inter-annual degradation dynamics. Experiments conducted over the Tibetan Plateau and Siberian lowlands demonstrate that PermaFusionNet achieves an overall accuracy of 91.3% and an F1-score of 0.887 on permafrost degradation classification, outperforming state-of-the-art methods by 4.7 percentage points. Ablation studies confirm the contribution of each architectural component, and qualitative results show robust detection of thermokarst lake expansion, retrogressive thaw slumps, and active layer deepening zones.

Keywords: permafrost degradation, deep learning, SAR, multi-modal fusion, remote sensing, Sentinel-1, Sentinel-2, climate change monitoring.

INTRODUCTION

Permafrost, defined as ground that remains frozen for at least two consecutive years, underlies approximately 24% of the Northern Hemisphere's land surface [1]. As Arctic temperatures rise at nearly twice the global average rate, permafrost is thawing at unprecedented rates, releasing vast stores of carbon dioxide and methane, destabilizing terrain, and threatening critical infrastructure across Russia, Canada, and the Tibetan Plateau [2]. Remote monitoring of permafrost degradation is therefore an urgent scientific and societal priority.

Traditional approaches to permafrost assessment rely on in situ borehole measurements and surface temperature sensors, which, while accurate, are spatially sparse and logistically expensive to maintain over the vast extents of permafrost regions [3]. Satellite remote sensing offers a scalable alternative, and recent advances in deep learning have substantially improved the capacity to extract meaningful information from large-scale geospatial data. However, existing methods typically leverage either SAR or optical sensors in isolation, failing to exploit the complementary physical information available from multi-modal observations.

SAR data are sensitive to soil moisture, surface roughness, and subsidence associated with ground ice melt, all key signatures of permafrost degradation, and are unaffected by cloud cover, enabling year-round monitoring [4]. Optical imagery, in contrast, provides rich spectral information on surface vegetation, standing water, and exposed soil, allowing detection of thermokarst features and land cover change [5]. Fusion of these two modalities represents a largely untapped opportunity for improved permafrost mapping.

In this work, we present PermaFusionNet, a hybrid deep learning framework specifically designed for multi-modal permafrost degradation mapping. Our main contributions are:
- A dual-branch encoder architecture that independently extracts SAR backscatter and optical spectral features before adaptive cross-modal attention-based fusion.
- A temporal LSTM decoder that models multi-year time-series dynamics to distinguish progressive degradation from seasonal freeze-thaw signals.
- A curated benchmark dataset covering two permafrost zones: the Tibetan Plateau and the West Siberian Lowlands, including ground-truth labels derived from field surveys and high-resolution ancillary data.
- Comprehensive evaluation against five state-of-the-art baselines, with extensive ablation studies validating each design choice.
  
The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 describes the study area and dataset. Section 4 details the proposed architecture. Section 5 presents experimental results. Section 6 discusses implications, and Section 7 concludes.

RELATED WORK

Multi-Modal Fusion Approaches

Multi-modal fusion strategies broadly fall into early (feature concatenation), late (decision-level), and intermediate (attention-based) fusion paradigms. For permafrost-specific tasks, Wang et al. [13] combined Sentinel-1 SAR and Landsat optical data using a simple early fusion CNN, achieving moderate accuracy on thermokarst detection. More recently, Chen et al. [14] proposed a cross-attention transformer for SAR-optical fusion in urban change detection, motivating our adoption of a similar attention mechanism adapted to the permafrost domain. To our knowledge, PermaFusionNet is the first architecture to jointly address spatial multi-modal fusion and temporal sequence modeling for permafrost degradation mapping.
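
The three fusion paradigms can be contrasted on a single pixel. The sketch below is illustrative only: the feature values, the `query` vector (which would be learned in a real network), and all function names are hypothetical, and none of this reproduces the models of [13], [14], or PermaFusionNet.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def early_fusion(sar, opt):
    """Early fusion: concatenate the two feature vectors before modelling."""
    return sar + opt

def late_fusion(p_sar, p_opt):
    """Late fusion: average class probabilities from two separate models."""
    return [(a + b) / 2 for a, b in zip(p_sar, p_opt)]

def attention_fusion(sar, opt, query):
    """Intermediate fusion: softmax-weighted sum of the two feature vectors."""
    weights = softmax([dot(query, sar), dot(query, opt)])
    fused = [weights[0] * s + weights[1] * o for s, o in zip(sar, opt)]
    return fused, weights

# Hypothetical 2-D features for one pixel
sar_feat = [0.2, 0.8]
opt_feat = [0.6, 0.4]

concat = early_fusion(sar_feat, opt_feat)        # 4 concatenated values
decision = late_fusion([0.7, 0.3], [0.5, 0.5])   # averaged class probabilities
fused, weights = attention_fusion(sar_feat, opt_feat, query=[1.0, 0.0])
```

Early and late fusion treat the two sensors identically at every pixel, whereas the attention variant produces per-input weights that sum to one and can shift toward whichever modality scores higher.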

STUDY AREA AND DATASET

Study Areas

We selected two geographically and climatically distinct permafrost zones. The Tibetan Plateau (26–40°N, 73–105°E) represents high-altitude continuous and discontinuous permafrost at elevations above 4,000 m. The West Siberian Lowlands (58–70°N, 60–90°E) constitute one of the largest expanses of lowland peatland permafrost on Earth. Together, these regions encompass a range of terrain types, degradation intensities, and seasonal dynamics.

Remote Sensing Data

SAR data were acquired from the Sentinel-1A/B C-band SAR constellation (5.6 cm wavelength) in Interferometric Wide (IW) swath mode, providing dual-polarization (VV and VH) backscatter composites at 10 m spatial resolution. Time series spanning 2018–2023 were compiled, with monthly composites generated to reduce speckle noise. Optical data were sourced from Sentinel-2 MultiSpectral Instrument (MSI) Level-2A surface reflectance products, providing 13 spectral bands at 10–60 m resolution. Cloud-free seasonal composites were constructed using the median pixel compositing approach. Both datasets were co-registered to a common 10 m UTM grid.
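
Median pixel compositing can be sketched per pixel as follows. This is a minimal illustration, assuming each acquisition comes with a boolean cloud mask; the function name `median_composite` and the toy reflectance values are hypothetical and are not taken from the authors' processing chain.

```python
from statistics import median

def median_composite(stack, cloud_masks):
    """Per-pixel median over time, ignoring observations flagged as cloudy.

    stack:       list of T images, each a list of rows of reflectance values
    cloud_masks: same shape, True where the pixel is cloud-contaminated
    Returns one image; pixels with no clear observation are set to None.
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            clear = [img[r][c] for img, mask in zip(stack, cloud_masks)
                     if not mask[r][c]]
            if clear:
                out[r][c] = median(clear)
    return out

# Three 1x2 acquisitions; the second date has a bright cloud over pixel 0
stack = [[[0.10, 0.30]], [[0.90, 0.32]], [[0.12, 0.28]]]
masks = [[[False, False]], [[True, False]], [[False, False]]]
composite = median_composite(stack, masks)
```

Because the median ignores extreme values, a single unmasked cloud or shadow in a season rarely survives into the composite, which is the usual reason for preferring it over the mean.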

Ground Truth and Labeling

Ground truth labels for four degradation classes were compiled from: (1) field survey data collected during 2019–2022 campaigns (1,247 GPS-located observation points); (2) high-resolution Planet imagery (3 m) interpreted by domain experts; and (3) existing permafrost inventory maps. The four classes are: (a) Stable permafrost, (b) Active-layer deepening, (c) Thermokarst / retrogressive thaw slumps, and (d) Complete degradation / talik formation. Table 1 summarizes the dataset statistics.

Table 1: Dataset Statistics

Class	Training	Validation	Test	Total Patches
Stable Permafrost	3,840	960	1,200	6,000
Active Layer Deepening	2,560	640	800	4,000
Thermokarst / RTS	1,920	480	600	3,000
Complete Degradation	1,280	320	400	2,000
Total	9,600	2,400	3,000	15,000
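
The split proportions and class shares implied by Table 1 can be recomputed directly from the patch counts. The snippet below only re-derives ratios from the table; the variable names are illustrative.

```python
# Patch counts from Table 1: (training, validation, test)
table1 = {
    "Stable Permafrost":      (3840, 960, 1200),
    "Active Layer Deepening": (2560, 640, 800),
    "Thermokarst / RTS":      (1920, 480, 600),
    "Complete Degradation":   (1280, 320, 400),
}

grand_total = sum(sum(counts) for counts in table1.values())

split_fractions = {}
class_shares = {}
for name, counts in table1.items():
    class_total = sum(counts)
    # Fraction of this class assigned to each split
    split_fractions[name] = tuple(round(n / class_total, 2) for n in counts)
    # Fraction of the whole dataset occupied by this class
    class_shares[name] = round(class_total / grand_total, 3)

for name in table1:
    print(name, split_fractions[name], class_shares[name])
```

Every class follows the same 64/16/20 split, so the imbalance (40% stable versus about 13% complete degradation) is identical in the training, validation, and test sets.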

PROPOSED METHOD: PERMAFUSIONNET

Training Protocol

The network is trained end-to-end using the AdamW optimizer with initial learning rate 1×10, weight decay 1×10, and a cosine annealing schedule over 100 epochs. We employ a combined loss of cross-entropy and Dice loss (weighted equally) to address class imbalance. Data augmentation includes random horizontal/vertical flips, rotation (±180°), and speckle noise injection for SAR channels. Batch size is 16, with gradient clipping at norm 1.0.
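
The loss and schedule described here can be sketched in plain Python. This is a hedged illustration, not the authors' training code: the soft Dice formulation, the smoothing term `eps`, the 0.5/0.5 averaging used for "weighted equally", and the zero floor `min_lr` are assumptions, and the base learning rate is passed in as an argument rather than hard-coded.

```python
import math

def cross_entropy(probs, target):
    """Mean negative log-likelihood; probs[i] is a class distribution for pixel i."""
    return -sum(math.log(max(p[t], 1e-12)) for p, t in zip(probs, target)) / len(target)

def dice_loss(probs, target, num_classes, eps=1e-6):
    """Soft Dice loss averaged over classes (eps is an assumed smoothing term)."""
    total = 0.0
    for k in range(num_classes):
        pred_k = [p[k] for p in probs]
        true_k = [1.0 if t == k else 0.0 for t in target]
        overlap = sum(a * b for a, b in zip(pred_k, true_k))
        total += 1.0 - (2.0 * overlap + eps) / (sum(pred_k) + sum(true_k) + eps)
    return total / num_classes

def combined_loss(probs, target, num_classes=4):
    """Equal-weight mix of cross-entropy and Dice (assumed 0.5 / 0.5)."""
    return 0.5 * cross_entropy(probs, target) + 0.5 * dice_loss(probs, target, num_classes)

def cosine_lr(epoch, base_lr, total_epochs=100, min_lr=0.0):
    """Cosine annealing from base_lr at epoch 0 down to min_lr at total_epochs."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)

# Two pixels, four classes; probabilities are made up for illustration
probs = [[0.7, 0.1, 0.1, 0.1], [0.2, 0.6, 0.1, 0.1]]
target = [0, 1]
loss = combined_loss(probs, target)

# Schedule in relative units (base_lr = 1.0) at the start, middle, and end
schedule = [cosine_lr(e, base_lr=1.0) for e in (0, 50, 100)]
```

Cross-entropy is dominated by the frequent classes, while Dice is computed per class and then averaged, so rare classes such as complete degradation contribute equally to that term.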

EXPERIMENTAL RESULTS

Baselines

We compare PermaFusionNet against five baselines: (1) Random Forest (RF) on hand-crafted features, (2) Single-modal SAR U-Net, (3) Single-modal Optical U-Net, (4) Early Fusion CNN (EF-CNN) [13], and (5) Cross-Attention Transformer (CA-Transformer) [14].

Quantitative Results

Table 2 reports overall accuracy (OA), mean F1-score, and per-class F1 on the held-out test set.
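
For reference, overall accuracy and mean F1 can both be derived from a confusion matrix. The sketch below uses the standard definitions with a toy two-class matrix; the function names and the counts are illustrative and are not drawn from the paper's evaluation code.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal (rows = reference, columns = prediction)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

def per_class_f1(cm):
    """F1 = 2TP / (2TP + FP + FN) for each class."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, reference differs
        fn = sum(cm[k]) - tp                       # reference k, predicted differs
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores

def mean_f1(cm):
    """Unweighted (macro) average of the per-class F1 scores."""
    scores = per_class_f1(cm)
    return sum(scores) / len(scores)

# Toy confusion matrix: 10 reference samples per class
cm = [[8, 2],
      [1, 9]]
oa = overall_accuracy(cm)
f1 = per_class_f1(cm)
mf1 = mean_f1(cm)
```

Because the mean F1 is unweighted, a minority class counts as much as the majority class, which makes it a useful complement to OA on an imbalanced test set.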

Table 2: Comparison with State-of-the-Art Methods

Method	OA (%)	mF1	F1-Stable	F1-Active	F1-Thermo	F1-Degrad.
Random Forest	71.4	0.698	0.812	0.681	0.634	0.665
SAR U-Net	78.9	0.771	0.844	0.763	0.715	0.762
Optical U-Net	76.2	0.744	0.821	0.731	0.694	0.730
EF-CNN [13]	82.3	0.809	0.873	0.798	0.771	0.794
CA-Transformer [14]	86.6	0.848	0.901	0.845	0.813	0.833
PermaFusionNet (Ours)	91.3	0.887	0.934	0.882	0.861	0.871

PermaFusionNet achieves an OA of 91.3% and mF1 of 0.887, outperforming the strongest baseline (CA-Transformer) by 4.7 OA points and 0.039 mF1. The most pronounced gains are observed for the thermokarst class (F1: +0.048), which is particularly challenging due to its heterogeneous morphology and spectral similarity to standing water bodies. The temporal LSTM component is especially beneficial here, as it captures the rapid expansion dynamics of thermokarst lakes across seasons.
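
The margins quoted above follow from Table 2, and the reported mF1 values equal the unweighted mean of the four per-class F1 scores. The check below recomputes both; it uses only numbers from the table, and the variable names are illustrative.

```python
# Table 2: (OA %, reported mF1, [F1 stable, active, thermokarst, degraded])
table2 = {
    "Random Forest":  (71.4, 0.698, [0.812, 0.681, 0.634, 0.665]),
    "SAR U-Net":      (78.9, 0.771, [0.844, 0.763, 0.715, 0.762]),
    "Optical U-Net":  (76.2, 0.744, [0.821, 0.731, 0.694, 0.730]),
    "EF-CNN":         (82.3, 0.809, [0.873, 0.798, 0.771, 0.794]),
    "CA-Transformer": (86.6, 0.848, [0.901, 0.845, 0.813, 0.833]),
    "PermaFusionNet": (91.3, 0.887, [0.934, 0.882, 0.861, 0.871]),
}
classes = ["stable", "active", "thermokarst", "degraded"]

# Macro-average of the per-class scores, rounded like the table
recomputed_mf1 = {name: round(sum(f1) / len(f1), 3)
                  for name, (_, _, f1) in table2.items()}

ours, best = table2["PermaFusionNet"], table2["CA-Transformer"]
oa_gain = round(ours[0] - best[0], 1)
mf1_gain = round(ours[1] - best[1], 3)
class_gain = {c: round(a - b, 3) for c, a, b in zip(classes, ours[2], best[2])}
largest = max(class_gain, key=class_gain.get)
```

The per-class gains over CA-Transformer come out as 0.033, 0.037, 0.048, and 0.038, so the thermokarst class does show the largest improvement.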

Ablation Study

Table 3 reports results for progressively ablated variants of PermaFusionNet to isolate the contribution of each design component.

Table 3: Ablation Study Results

Configuration	OA (%)	mF1	ΔOA	ΔmF1
Full model	91.3	0.887
w/o temporal LSTM (single timestep)	86.7	0.841	-4.6	-0.046
w/o cross-modal attention (simple concat)	88.1	0.858	-3.2	-0.029
w/o dual-branch (single shared encoder)	89.0	0.866	-2.3	-0.021
w/o skip connections	87.4	0.849	-3.9	-0.038

All components contribute positively, with the temporal LSTM yielding the largest individual gain (+4.6 OA), confirming the critical importance of modeling multi-year degradation trajectories rather than treating each time step independently.
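
The difference columns of Table 3 can be recomputed from the OA and mF1 columns and used to rank the components by how much their removal costs. The snippet uses only values from the table; the dictionary keys are shortened labels.

```python
# Full model and ablated variants from Table 3: (OA %, mF1)
full = (91.3, 0.887)
ablations = {
    "w/o temporal LSTM":         (86.7, 0.841),
    "w/o cross-modal attention": (88.1, 0.858),
    "w/o dual-branch":           (89.0, 0.866),
    "w/o skip connections":      (87.4, 0.849),
}

# Change relative to the full model (negative means worse)
deltas = {
    name: (round(oa - full[0], 1), round(mf1 - full[1], 3))
    for name, (oa, mf1) in ablations.items()
}

# Largest OA drop first
ranked = sorted(deltas, key=lambda name: deltas[name][0])
for name in ranked:
    print(f"{name}: dOA={deltas[name][0]}, dmF1={deltas[name][1]}")
```

By OA drop the order is temporal LSTM, skip connections, cross-modal attention, and dual-branch encoder. Each variant removes one component at a time, so the drops are not additive.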

DISCUSSION

The strong performance of PermaFusionNet on the thermokarst class has direct implications for large-scale permafrost carbon cycle modeling, as thermokarst features are a primary conduit for greenhouse gas release from thawing organic soil. Our temporal modeling component is particularly valuable in distinguishing seasonal freeze-thaw cycles, which cause reversible SAR backscatter changes, from true multi-year degradation trends, a critical disambiguation that purely spatial or single-timestep methods cannot reliably achieve.
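
The difference between a reversible seasonal cycle and a multi-year trend can be illustrated without a neural network. The sketch below is not the LSTM decoder; it is a deliberately simple stand-in that averages a synthetic monthly backscatter series by year, which cancels a periodic freeze-thaw signal, and then fits a least-squares slope to the annual means. All values, including the -12 dB baseline and the 3 dB seasonal amplitude, are synthetic assumptions.

```python
import math

def synthetic_series(years=6, trend_per_year=0.0, seasonal_amp=3.0):
    """Monthly backscatter in dB: seasonal sine plus an optional linear drift."""
    return [
        -12.0
        + seasonal_amp * math.sin(2 * math.pi * month / 12)
        + trend_per_year * month / 12
        for month in range(years * 12)
    ]

def annual_means(series, months_per_year=12):
    """Average each complete year; a purely periodic signal averages out."""
    n_years = len(series) // months_per_year
    return [
        sum(series[y * months_per_year:(y + 1) * months_per_year]) / months_per_year
        for y in range(n_years)
    ]

def linear_slope(values):
    """Ordinary least-squares slope of values against their index."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

# Six years of monthly composites, matching a 2018-2023 record length
stable = synthetic_series(trend_per_year=0.0)
degrading = synthetic_series(trend_per_year=-0.5)

slope_stable = linear_slope(annual_means(stable))        # close to 0 dB/year
slope_degrading = linear_slope(annual_means(degrading))  # close to -0.5 dB/year
```

Both series swing by several dB within a year, so a single-date comparison cannot tell them apart; only the multi-year view separates them, which is the role the temporal decoder plays in the full model.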

The cross-modal attention mechanism provides interpretable modality weighting maps. Inspection reveals that in winter months, when snow cover saturates optical bands, the network automatically up-weights SAR features. Conversely, during summer, optical spectral indices dominate in vegetated tundra zones, while SAR dominates in areas of high soil moisture associated with active layer thaw. This adaptive behavior is a key advantage over static early fusion baselines.

Limitations of the current work include reliance on labeled ground truth that, despite extensive field surveys, remains geographically clustered around accessible sites. Future work will explore semi-supervised and self-supervised pre-training strategies on unlabeled SAR-optical time series, as well as extension to pan-Arctic scale mapping using distributed computing pipelines. Integration of additional sensors such as ALOS-2 L-band SAR and ICESat-2 altimetry holds promise for improved detection of subsidence in ice-rich permafrost zones.

CONCLUSION

We have presented PermaFusionNet, a hybrid deep learning architecture for permafrost degradation mapping from multi-modal SAR and optical satellite imagery. By combining a modality-specific dual-branch encoder, cross-modal attention fusion, and a temporal ConvLSTM decoder, our model achieves state-of-the-art performance on a multi-site benchmark spanning the Tibetan Plateau and West Siberian Lowlands. With an overall accuracy of 91.3% and mF1 of 0.887, PermaFusionNet advances the frontier of AI-driven cryosphere monitoring and offers a scalable, automated tool for tracking permafrost degradation in a warming world.

REFERENCES

[1] Obu, J. et al. (2019). Northern Hemisphere permafrost map based on TTOP modelling. Earth-Science Reviews, 193, 299–316.
[2] Biskaborn, B.K. et al. (2019). Permafrost is warming at a global scale. Nature Communications, 10(1), 264.
[3] Smith, S.L. et al. (2022). Permafrost monitoring and the need for a global network. Frontiers in Earth Science, 10, 893000.
[4] Zwieback, S., & Berg, A.A. (2019). Fine-scale SAR soil moisture estimation in the subarctic tundra. IEEE TGRS, 57(9), 6545–6556.
[5] Grosse, G. et al. (2013). Vulnerability and feedbacks of permafrost to climate change. Eos, Transactions AGU, 94(51), 469–476.
[6] Kim, Y. et al. (2012). Freeze-thaw status and active layer thickness retrieved from AMSR-E. IEEE TGRS, 50(11), 4351–4362.
[7] Short, N. et al. (2014). Application of InSAR to Arctic permafrost slope. Canadian Journal of Earth Sciences, 51(6), 559–570.
[8] Nitze, I. et al. (2018). Remote sensing quantifies widespread abundance of permafrost region disturbances across the Arctic and Subarctic. Nature Communications, 9(1), 5423.
[9] Ma, L. et al. (2019). Deep learning in remote sensing applications: A meta-analysis and review. ISPRS JPRS, 152, 166–177.
[10] Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. MICCAI, LNCS 9351, 234–241.
[11] Rußwurm, M., & Körner, M. (2018). Multi-temporal land cover classification with sequential recurrent encoders. ISPRS IJGI, 7(4), 129.
[12] Wang, Y. et al. (2022). SatViT: Pretraining transformers for Earth observation. IEEE GRSL, 19, 5612305.
[13] Wang, L. et al. (2021). Thermokarst lake detection using multi-temporal SAR and optical data fusion. Remote Sensing of Environment, 265, 112676.
[14] Chen, H. et al. (2023). Cross-attention transformer for SAR-optical change detection. IEEE TGRS, 61, 5204418.
[15] Shi, X. et al. (2015). Convolutional LSTM network: A machine learning approach for precipitation nowcasting. NeurIPS, 28, 802–810.
