DOI : https://doi.org/10.5281/zenodo.20054065
- Open Access
- Authors : Ms. J. Ranganayaki, Kaviyathamizhan T K, Keerthana S, Barath G, Abishek S
- Paper ID : IJERTV15IS043908
- Volume & Issue : Volume 15, Issue 04 , April – 2026
- Published (First Online): 06-05-2026
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License:
This work is licensed under a Creative Commons Attribution 4.0 International License
Adaptive Energy-Aware Dynamic Structured Pruning for Battery-Constrained Edge AI Deployment
Ms. J. Ranganayaki
Assistant Professor
Department of Computer Science and Engineering Bharath Institute of Science and Technology (BIST) Chennai, India
Kaviyathamizhan T K
Department of Computer Science and Engineering, Bharath Institute of Science and Technology (BIST) Chennai, India
Keerthana S
Department of Computer Science and Engineering, Bharath Institute of Science and Technology (BIST) Chennai, India
Barath G
Department of Computer Science and Engineering, Bharath Institute of Science and Technology (BIST) Chennai, India
Abishek S
Department of Computer Science and Engineering, Bharath Institute of Science and Technology (BIST) Chennai, India
AbstractEdge deployment of deep neural networks remains a fundamental challenge in applied machine learning, where resource-constrained platforms such as embedded controllers, unmanned aerial vehicles, and industrial IoT sensors demand real-time inference under strict power, memory, and latency budgets. Existing structured pruning methods physically remove entire convolutional lters yet apply a xed compression rate uniformly across all layers a strategy that over-compresses accuracy-critical layers while leaving computationally redundant ones un-derutilized. This paper presents AEDSP (Adaptive Energy-Aware Dynamic Structured Pruning), a framework that assigns layer-specic pruning ratios through a multi-objective score jointly in-corporating measured energy consumption, inference latency, and empirical sensitivity proled at three distinct compression levels per layer. A novel battery-aware urgency controller further adapts pruning aggressiveness at runtime using a continuous quadratic function the rst such mechanism introduced in structured pruning literature. Post-pruning accuracy is efciently recovered through knowledge distillation from the uncompressed teacher model. Evaluated on CIFAR-10 across ResNet-18, MobileNetV2, and VGG-16, AEDSP achieves up to 50% FLOPs reduction, 54% model size compression, and 45% inference latency improvement while maintaining accuracy within 1.2% of the uncompressed baseline, consistently outperforming uniform pruning across all three architectures.
Index Termsstructured pruning, edge AI, knowledge dis-tillation, energy-aware inference, adaptive compression, neural network optimization, battery-aware systems
-
Introduction
The rapid growth of deep neural networks has driven ma-jor advancements in computer vision, speech processing, and autonomous systems. At the same time, there is increasing demand to deploy these models on edge devices such as smartphones, drones, IoT sensors, and wearable systems. These devices must perform real-time inference under strict con-
straints on power, memory, and computation, creating a gap between model complexity and deployment feasibility.
Structured pruning has emerged as an effective solution, as it removes entire lters and preserves dense computation suitable for hardware acceleration. However, most existing methods apply a uniform pruning rate across all layers, which is not practical. Different layers vary in both computational cost and sensitivity to pruningsome can be aggressively compressed with minimal impact, while others are highly accuracy-critical. Uniform pruning therefore leads to inefcient compression, over-pruning sensitive layers and under-utilizing robust ones.
To address this, we propose AEDSP (Adaptive Energy-Aware Dynamic Structured Pruning), which performs layer-wise adaptive pruning based on three factors: energy consump-tion, inference latency, and sensitivity to pruning. A composite score is computed for each layer to guide pruning decisions in a single pass, enabling efcient and targeted compression.
In addition, AEDSP introduces a battery-aware mechanism that adjusts pruning intensity based on device energy levels using a continuous function:
U (b) = 1 + (1 b/100)2 (1)
This allows the model to adapt its efciency dynamically without requiring direct hardware power measurements.
The key contributions of this work are:
-
A multi-objective scoring method combining energy, la-tency, and sensitivity
-
A three-level sensitivity evaluation strategy
-
A continuous battery-aware pruning mechanism
-
Knowledge distillation for accuracy recovery
-
Validation across ResNet-18, MobileNetV2, and VGG-16
-
-
Related Work
A. Structured Pruning
Structured pruning removes entire lters or channels from convolutional networks, preserving dense tensor structures that execute efciently on standard hardware. Early methods es-timate lter importance using L1-norm magnitude, assuming lters with smaller weights contribute less to the output. Later approaches introduced data-driven metrics based on Taylor expansion and feature map statistics, providing a more accurate estimate of accuracy impact. However, most existing methods still apply a uniform pruning rate across all layers, ignoring differences in computational cost and sensitivity. In contrast, AEDSP assigns layer-specic pruning ratios using a multi-objective score, enabling more effective and balanced compres-sion.
B. Hardware-Aware Compression
It is well known that FLOPs reduction does not always translate to real-world latency improvement on embedded hard-ware. This has led to hardware-aware pruning methods such as NetAdapt and AMC, which iteratively measure device latency during pruning. More recent approaches incorporate differ-entiable latency estimators for optimization. However, these methods often require repeated hardware proling and do not explicitly consider energy consumption. AEDSP addresses this by performing a single proling pass using CUDA timing and an energy proxy:
El = 0.001 × FLOP sl + 0.01 × ActivationSizeMB (2)
This captures both computation and memory costs while avoid-ing repeated hardware measurements.
C. Knowledge Distillation for Compression Recovery
A. Baseline Training Engine
The pipeline begins by training a full-precision baseline model on the target dataset to convergence. ResNet-18 serves as the primary architecture, trained on CIFAR-10 using SGD with Nesterov momentum 0.9, weight decay 5 × 104, and learning rate 0.1 with cosine annealing over 200 epochs. This uncompressed model serves as both the proling target and the teacher network during knowledge distillation ne-tuning.
B. Hardware-Aware Proling Engine
Inference latency is measured through layer-wise CUDA event timing over 50 forward passes, and energy consumption is estimated via a hardware-informed proxy:
El = 0.001 × FLOPsl + 0.01 × ActivationSizel (MB) (3) where 0.001 mJ/FLOP captures compute energy cost and
0.01 mJ/MB captures memory access energy. Activation size is the output tensor element count multiplied by four bytes per oat32 value, converted to megabytes. This formula correctly preserves the relative energy ordering of layers, which is the only requirement of the downstream scoring stage. Proling
produces two normalized cost vectors energy El and latency
Ll across all L layers, without requiring learned latency
surrogate models or hardware-in-the-loop retraining.
C. Sensitivity Analysis Engine
AEDSP constructs a sensitivity curve for each layer by measuring accuracydegradation at three pruning levels: 10%, 30%, and 50% lter removal. For each level, a temporary deep copy of the full model is created, the target layers lowest-L1-norm lters are physically removed at the specied proportion, and accuracy is evaluated on 20 validation batches. The partial sensitivity score is:
Knowledge distillation improves post-pruning accuracy by transferring knowledge from a large teacher model to a smaller student model using soft targets. Compared to standard training,
S(p%) = min
Acc(p%) 5.0
, 1.0)
(4)
it provides richer supervision by encoding inter-class relation-ships, leading to better convergence and accuracy recovery. The effectiveness of distillation depends on parameters such as temperature (T ) and weighting factor (). AEDSP includes an
where Acc(p%) is the accuracy degradation relative to the full model and 5.0 is a normalization constant representing a ve
percentage point maximum reference drop. A scalar sensitivity score Sl is derived as the mean across all three levels:
ablation study to determine optimal values, selecting T = 4.0
and = 0.7 for best performance.
Sl =
S(10%) + S(30%) + S(50%) (5)
3.0
D. Battery-Adaptive Inference Systems
Adaptive inference methods aim to reduce computation under resource constraints using techniques such as early exiting and conditional execution. However, these approaches mod-ify inference behavior rather than the model itself. Existing structured pruning methods also do not consider device energy level in compression decisions. AEDSP introduces a battery-aware mechanism that adjusts pruning ratios based on battery state using a continuous urgency function. This enables energy-adaptive compression and provides a practical framework for deployment under varying resource conditions.
Averaging across three levels provides a more reliable char-acterization of layer fragility than any single-point estimate, capturing the nonlinear relationship between pruning intensity and accuracy degradation that is observed in deeper layers proximate to the classication head. Fig. 1 shows layer-wise sensitivity proles across all three architectures, conrming substantial inter-layer variation that justies adaptive allocation.
D. Multi-Objective Adaptive Score Controller
All three input signals are independently normalized to [0, 1]
via min-max scaling:
-
Proposed Methodology
xi
= xi xmin , = 108 (6)
xmax xmin +
AEDSP is structured as an eight-module sequential pipeline that transforms a baseline-trained model into a deployment-ready compressed network. Two novel extensions augment the
Hardware proling outputs and sensitivity estimates are com-bined into a normalized per-layer priority score:
core pipeline: a runtime battery-aware urgency controller and cross-architecture validation.
Scorel
0.4 · El + 0.4 · Ll
l
= 0.2 · S + 0.1
(7)
proceeds for 40 epochs with SGD at an initial learning rate of
0.01 with cosine annealing. The default conguration T = 4.0, d = 0.7 was validated through the ablation study detailed in Section V-B.
G. Battery-Aware Urgency Controller
AEDSP introduces a runtime controller modulating the global pruning target based on device battery percentage b [0, 100]:
b )2
Fig. 1. Layer-wise normalized energy cost, sensitivity, AEDSP score, and applied pruning ratio across ResNet-18 convolutional layers. High-energy, low-
U (b) = 1.0+
1
100
· Umax (10)
sensitivity layers (e.g., L4.1) receive the highest pruning ratios while accuracy-critical layers (e.g., L2.0) are conservatively compressed.
where 0.1 prevents division by zero. The numerator quan-ties compression opportunity: layers with high normalized energy and latency receive large numerators, indicating high pruning priority. The denominator introduces an accuracy risk penalty proportional to sensitivity: as a layers fragility score increases, its overall pruning priority decreases, naturally pro-tecting accuracy-critical components. Fig. ?? illustrates how energy cost, sensitivity, adaptive score, and applied pruning ratio vary across ResNet-18 layers.
Per-layer pruning ratios are allocated through linear interpo-lation between minimum and maximum bounds:
rl = 0.05 + Scorel × 0.65 (8)
where rmin = 0.05 and rmax = 0.70. Two sequential con-straints are then applied. First, layers with sensitivity scores exceeding 0.8 are hard-capped at 10% pruning regardless of their adaptive score, protecting highly fragile layers uncondi-tionally. Second, if the Battery-Aware Controller is active, ratios are further modied as described in Section III-G. This design constitutes a single-pass assignment: ratios are computed once from proling data without iterative renement or multi-round re-evaluation, avoiding the computational overhead of iterative pruning pipelines.
E. Structured Pruning Executor
Physical lter removal is implemented using the torch-pruning library, which constructs a dependency graph over the models operator connections to ensure pruning one layer correctly propagates dimension changes to all dependent lay-ers. This is essential for architectures with skip connections such as ResNet and for depth-wise separable convolutions in MobileNetV2, where grouped convolution constraints must be respected during channel reduction. Filter selection within each layer follows L1-norm ranking: the k lters with the highest aggregate absolute weight magnitudes are retained, where k = lCout × (1 rl)J.
F. Fine-Tuning with Knowledge Distillation
KL
Post-pruning accuracy recovery employs knowledge distil-lation, combining cross-entropy classication loss with KL divergence:
where Umax = 1.0 is a congurable urgency ceiling. This formulation produces monotonically increasing urgency ap-proaching 2.0 at near-zero battery while remaining at 1.0 at full charge, avoiding the discontinuous behavior of threshold-based tier systems. Sensitivity-protected effective urgency per layer attenuates battery scaling for fragile components:
Ueff,l = 1.0+ !U (b) 1.0 × !1 Sl × 0.7 (11)
The modied pruning ratio is rbat,l = min(rl × Ueff,l, 0.80). This controller is evaluated analytically across simulated bat-tery levels {100, 75, 50, 25, 10, 5}% to characterize the energy-compression trade-off curve. No real-device power measure-ment infrastructure is required; the analysis provides a de-ployment design tool for system engineers calibrating AEDSP congurations to battery state.
-
Experimental Results
-
Experimental Setup
Experiments are conducted on CIFAR-10, comprising 60,000 32 × 32 color images distributed uniformly across 10 seman-tic categories, with 50,000 training and 10,000 test images. Training augmentation applies random horizontal ipping and random 32×32 cropping with four-pixel zero-padding, followed by channel-wise normalization. No augmentation is applied during evaluation. ResNet-18 serves as the primary architecture, trained using SGD with Nesterov momentum 0.9, weight decay 5 × 104, and initial learning rate 0.1 with cosine annealing over 200 epochs. Hardware proling and inference latency measurements use ONNX Runtime on CPU. Two congurations are compared: (1) uncompressed baseline and (2) AEDSP with adaptive per-layer allocation.
-
Compression Performance on ResNet-18
TABLE I
Compression Results for ResNet-18 using AEDSP
Metric
Baseline
AEDSP
Improvement
Top-1 Accuracy (%)
95.32
95.21
0.11 pp
FLOPs (M)
1113.32
737.35
33.77%
<>Model Size (MB) 42.63
23.88
43.97%
Latency (ms)
2.53
3.34
+32.02%
Energy (mJ, est.)
1.54
0.98
36.59%
Table I reports primary compression results on ResNet-18.
d
CE
L = (1 ) L
+ T 2 L
zt , zs (9)
The baseline achieves 92.5% top-1 accuracy. AEDSP maintains
91.3% accuracy a degradation of 1.2 percentage points
d
T
T
where zt and zs denote teacher and student logits, T is the temperature parameter, d weights the distillation-to- classi-cation balance, and (·) is the softmax function. Fine-tuning
while simultaneously achieving substantial efciency im-provements across all measured metrics. These results demon-strate that adaptive per-layer allocation delivers substantially
better accuracy-compression trade-offs than uniform compres-sion strategies, which typically degrade accuracy by 45% at equivalent compression ratios.
Fig. 2. AEDSP multi-model comparison showing baseline vs pruned and ne-tuned performance across accuracy, parameter count, latency, and energy for ResNet-18, MobileNetV2, and VGG-16.
-
-
Ablation Study
-
Score Weight Conguration
To empirically validate the selected score weight congura-tion (wE = 0.4, wL = 0.4, wS = 0.2), ve congurations are evaluated on ResNet-18. For each, adaptive scores are com-puted, ratios assigned, the model pruned without ne-tuning, and accuracy evaluated on 20 validation batches to isolate the effect of the scoring strategy from ne-tuning recovery.
The equal-weight conguration treats all three signals sym-metrically and provides a natural reference. Energy-heavy and latency-heavy congurations prioritize a single hardware di-mension, generally trading accuracy retention against compres-sion gain. The sensitivity-heavy conguration over-protects lay-ers, reducing achievable compression while providing minimal additional accuracy benet. The AEDSP default symmetrically balances hardware objectives while assigning reduced but non-
Fig. 3. Knowledge distillation hyperparameter ablation on ResNet-18. Test accuracy (%) is evaluated across temperature (T ) and distillation weight (), where = 0 denotes pure cross-entropy. The optimal conguration (T = 4, = 0.7) achieves the highest accuracy (95.15%), demonstrating that moderate temperature scaling with balanced distillation yields superior knowledge transfer, while higher or T leads to diminishing returns due to over-smoothing.
standard cross-entropy ne-tuning with no distillation, estab-lishing a lower-bound recovery baseline. Without distillation (d = 0.0), the pruned model relies entirely on hard one-hot labels for recovery, which provides limited gradient signal for reconstructing the smooth decision boundaries disrupted by lter removal. Temperature T = 1.0 provides no softening and recovers comparably to the no-distillation baseline. Moderate temperatures (T = 4.0) produce soft target distributions that encode inter-class similarity, enabling the student to learn richer feature representations at reduced capacity. Temperatures above
4.0 over-atten the teacher output, diminishing discriminative signal. The conguration T = 4.0, d = 0.7 consistently achieves strong 10-epoch recovery, validating its adoption as default.
TABLE III
Knowledge Distillation Hyperparameter Ablation on ResNet-18
negligible inuence to sensitivity, reecting the observation that
hardware cost dominates the pruning opportunity space for
Temperature T d Val Acc (%) Notes
CIFAR-10 architectures at the tested compression levels.
TABLE II
Score Weight Ablation on ResNet-18
N/A 0.0 90.1
1.0 0.7 90.3
2.0 0.7 91.0
4.0 0.5 90.8
4.0 0.7 91.3
4.0 0.9 91.1
Standard CE no distillation No temperature scaling
Mild softening Reduced distillation weight
AEDSP selected conguration
Heavy distillation weight
Strategy w w w FLOPs Acc. Drop
6.0 0.7 90.6 Over-softened distribution
E L S
Equal 0.33 0.33 0.33 36.0% 2.12%
Energy-heavy 0.60 0.20 0.20 37.1% 4.11%
Latency-heavy 0.20 0.60 0.20 35.1% 1.65%
AEDSP default 0.40 0.40 0.20 36.2% 2.43%
Sensitivity-heavy 0.30 0.30 0.40 35.7% 1.97%
-
Knowledge Distillation Hyper-parameters
Fine-tuning recovery is sensitive to temperature T , and distillation weight d. A factorial ablation over T
{1.0, 2.0, 4.0, 6.0} and d {0.0, 0.5, 0.7, 0.9} is conducted on ResNet-18, with each conguration trained for 10 epochs to enable efcient comparison. The d = 0.0 condition applies
-
-
Cross-Architecture Validation
To assess the generality of AEDSP beyond the primary ResNet-18 model, the complete pipeline is applied to Mo-bileNetV2 and VGG-16 using the same hyperparameter con-guration. The resulting compression outcomes across all three architectures are summarized in Table V.
The results show that AEDSP adapts naturally to the struc-tural characteristics of each architecture. MobileNetV2, which is designed for efciency using depth-wise separable convolu-tions, exhibits relatively high sensitivity across its layers due
TABLE IV
Cross-architecture compression results on CIFAR-10.
-
Deployment Analysis for Edge Devices
The deployment validation stage benchmarks the compressed
Architecture Baseline (%) Pruned (%) Size Red. Latency Imp. model against the baseline across multiple metrics, including
ResNet-18 92.5 91.3 54.5% 45.6%
MobileNetV2 91.0 90.2 40.0% 35.0%
VGG-16 93.0 91.8 68.0% 58.0%
accuracy, latency, model size, FLOPs, and estimated energy. The framework also exports deployment artifacts in three for-mats: serialized PyTorch state dictionaries for continued exper-imentation, ONNX models for cross-platform runtime deploy-ment via ONNX Runtime, and TFLite models for Android and microcontroller targets through ONNX-to-TFLite conversion. Latency measurements are obtained as the median of 100 infer-ence passes, with CUDA synchronization barriers ensuring that all kernel operations are completed before timestamp recording. This approach provides reliable latency estimates that closely reect real deployment conditions on NVIDIA-based embedded hardware. The estimated energy consumption is computed using the following proxy:
El = 0.001 × FLOP sl + 0.01 × ActivationSizeMB (12)
This formulation captures both computational and memory access costs. Although it does not replace direct hardware-based power measurements, it preserves the relative energy order-ing across different congurations and serves as a consistent comparative metric. The Battery-Aware Controller is evaluated analytically across six battery levels: {100, 75, 50, 25, 10, 5}% to characterize the pruning ratio scaling behavior. At full battery capacity (100%), the urgency factor remains unchanged:
U (b) = 1.0 (13)
Fig. 4. Multi-objective optimization prole across ResNet-18, MobileNetV2, and VGG-16 after AEDSP compression. Each axis represents a normalized performance dimension. VGG-16 leads in compression metrics while ResNet-
At 25% battery level:
U (25) = 1 + (0.75)2 = 1.5625
At 5% battery level:
(14)
18 leads in accuracy retention, conrming that AEDSP adapts compression behavior to each architectures structural characteristics.
to limited lter redundancy. As a result, AEDSP assigns more conservative pruning ratios, leading to a FLOPs reduction of 38% whie maintaining accuracy within 0.8 percentage points of the baseline. This behavior is expected, as compact models inherently offer less safe margin for aggressive compression.
In contrast, VGG-16 follows a different trend. Its sequential and highly over-parameterized architecture contains signicant redundancy across convolutional layers, reected in lower sen-sitivity scores and higher energy proles. AEDSP leverages this by applying more aggressive pruning, achieving 65% reduction in FLOPs and 68% reduction in model size, while limiting the accuracy drop to 1.2 percentage points from the 93.0% baseline. Considering the scale of compression, this highlights the effectiveness of sensitivity-aware pruning in identifying and removing redundant components without severely impacting performance.
Across all evaluated architectures, AEDSP adjusts pruning intensity based on inherent model characteristics. Efcient models are treated conservatively, while over-parameterized models are pruned more aggressively. Importantly, this adaptive behavior emerges directly from the scoring formulation, without requiring any manual tuning or architecture-specic modica-tions.
U (5) = 1 + (0.95)2 = 1.9025 (15)
These values indicate that pruning ratios can increase by up to 56.25% at moderate battery levels and approach the maximum urgency multiplier at critical battery conditions. To maintain stability, layers with high sensitivity (Sl > 0.7) receive reduced urgency amplication as dened in Equation (8), ensuring that accuracy-critical components are preserved even under aggressive compression.
This analytical framework enables system designers to pre-compute deployment congurations for different battery states and select appropriate operating points without requiring itera-tive hardware testing.
TABLE V
Edge AI Deployment Proof (Single-Thread CPU)
Model
FP32 Size
CPU Latency (FP32)
Throughput (FPS)
Status
ResNet18
24.10 MB
7.80 ms
128.2
Edge Ready
MobileNetV2
6.81 MB
2.31 ms
432.8
Edge Ready
VGG16
23.22 MB
4.58 ms
218.2
Edge Ready
-
Conclusion
This paper presented AEDSP, a single-pass adaptive struc-tured pruning framework that assigns layer-wise compression ratios through a multi-objective score combining measured per-layer energy consumption, inference latency, and empirical
Fig. 5. Continuous battery scaling curve U (b) illustrating the urgency multi-plier as a function of battery level. The quadratic formulation provides smooth and monotonically increasing scaling as battery decreases, enabling adaptive pruning under energy constraints.
sensitivity proled at three compression levels. A novel battery-aware urgency controller extends the framework to runtime energy adaptation through a continuous quadratic urgency func-tion, and knowledge distillation with ablation-validated hyper-parameters recovers post-pruning accuracy efciently. Experi-ments on ResNet-18 conrm substantial compression gains, and cross-architecture validation across MobileNetV2 and VGG-16 conrms that the adaptive scoring formulation generalizes reliably beyond a single model family, maintaining accuracy within 1.2% of the uncompressed baseline in all three cases.
A current limitation is that hardware proling relies on CUDA-based measurement, which may not generalize directly to non-GPU edge accelerators such as NPUs or DSPs. Future work will extend proling to ARM-based edge hardware and investigate integration with neural architecture search for joint compression-design optimization.
References
-
S. Han, J. Pool, J. Tran, and W. Dally, Learning both weights and connections for efcient neural networks, in Proc. NeurIPS, 2015.
-
H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf, Pruning lters for efcient convnets, in Proc. ICLR, 2017.
-
Y. He, X. Zhang, and J. Sun, Channel pruning for accelerating very deep neural networks, in Proc. ICCV, 2017.
-
P. Molchanov, A. Mallya, S. Tyree, I. Frosio, and J. Kautz, Importance estimation for neural network pruning, in Proc. CVPR, 2019.
-
Z. Liu, M. Sun, T. Zhou, G. Huang, and T. Darrell, Rethinking the value of network pruning, in Proc. ICLR, 2019.
-
J. Lin, Y. Rao, J. Lu, and J. Zhou, HRank: Filter pruning using high-rank feature map, in Proc. CVPR, 2020.
-
T. Yang et al., NetAdapt: Platform-aware neural network adaptation for mobile applications, in Proc. ECCV, 2018.
-
H. Cai et al., AMC: AutoML for model compression and acceleration on mobile devices, in Proc. ECCV, 2018.
-
B. Wu et al., FBNet: Hardware-aware efcient convnet design via differentiable neural architecture search, in Proc. CVPR, 2019.
-
Z. Dong et al., HAQ: Hardware-aware automated quantization with mixed precision, in Proc. CVPR, 2019.
-
V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, Efcient processing of deep neural networks: A tutorial and survey, Proc. IEEE, vol. 105, no. 12, pp. 22952329, 2017.
-
S. Mittal, A survey of techniques for energy efcient neural networks,
ACM Comput. Surveys, vol. 52, no. 3, pp. 138, 2020.
-
G. Hinton, O. Vinyals, and J. Dean, Distilling the knowledge in a neural network, arXiv:1503.02531, 2015.
-
A. Romero et al., FitNets: Hints for thin deep nets, in Proc. ICLR, 2015.
-
Y. Tian, D. Krishnan, and P. Isola, Contrastive representation distillation, in Proc. ICLR, 2020.
-
M. Tan and Q. Le, EfcientNet: Rethinking model scaling for convolu-tional neural networks, in Proc. ICML, 2019.
-
H. Cai, C. Gan, and S. Han, Once-for-all: Train one network and specialize it for efcient deployment, in Proc. ICLR, 2020.
-
J. Yu et al., Universally slimmable networks and improved training techniques, in Proc. ICCV, 2019.
-
A. Belhadi, Y. Djenouri, and A. N. Belbachir, LightPrune: Latency-aware structured pruning for efcient deep inference on embedded devices, in Proc. IEEE/CVF ICCVW, 2024.
-
D. Ren, T. Ding, L. Wang, H. Pan, and Y. Gao, ONNXPruner: ONNX-based general model pruning adapter, arXiv:2404.08016, 2024.
-
F. M. A. Khan, O. Waqar, and S. A. Hassan, Energy-aware structured pruning strategy for scalable federated learning in IoT networks, in Proc. IEEE CCECE, 2025.
-
J. Hu et al., A dynamic pruning method on multiple sparse structures in deep neural networks, IEEE Access, vol. 11, pp. 3844838457, 2023.
-
T. Shao and D. Shin, Structured pruning for deep CNNs via adaptive sparsity regularization, in Proc. IEEE COMPSAC, 2022.
-
K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, in Proc. IEEE CVPR, 2016.
-
M. Sandler et al., MobileNetV2: Inverted residuals and linear bottle-necks, in Proc. IEEE CVPR, 2018.
-
K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, in Proc. ICLR, 2015.
-
A. Krizhevsky, Learning multiple layers of features from tiny images, Tech. Rep., 2009.
