Reinforcement Learning-Based Adaptive Edge Task Offloading for Smart City Applications

doi:https://doi.org/10.5281/zenodo.20338966

Volume 15, Issue 05 (May 2026)


Open Access
Authors : Khalid, Prashant Baghmar
Paper ID : IJERTV15IS051596
Volume & Issue : Volume 15, Issue 05 , May – 2026
Published (First Online): 22-05-2026
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Khalid

Lecturer, Department of Electronics Engineering Government Polytechnic College Mandore, Jodhpur, Rajasthan, India

Prashant Baghmar

Lecturer, Department of Electronics Engineering Government Polytechnic College Jodhpur, Rajasthan, India

Abstract: The increasing deployment of Internet of Things (IoT) devices in smart city environments has generated significant demand for low-latency and energy-efficient computational frameworks. Although edge computing reduces communication delay by processing tasks closer to end users, efficient task offloading remains a major challenge due to dynamic network conditions, limited edge resources, and varying computational workloads. To address these issues, this paper proposes a Reinforcement Learning (RL)-based adaptive edge task offloading framework for smart city applications. The proposed system uses a Deep Q-Network (DQN) agent to allocate computational tasks among local devices, edge servers, and cloud infrastructure according to real-time resource availability, bandwidth conditions, and server workload. The objective is to minimize execution latency, reduce energy consumption, and improve edge resource utilization. The framework was implemented and evaluated in a Google Colab-based simulation environment under varying IoT traffic conditions. Experimental results show that the proposed RL-based approach achieves lower latency, a higher task success rate (96.8% versus 91.6% for greedy edge selection, the strongest baseline), and improved workload balancing compared with conventional local execution, random offloading, cloud-only execution, and greedy edge selection methods. The results confirm the effectiveness of reinforcement learning for intelligent edge resource management in next-generation smart city networks.

Index Terms: Edge Computing, Reinforcement Learning, Smart City, Task Offloading, Deep Q-Network, IoT, Resource Allocation.

Introduction

The rapid growth of Internet of Things (IoT) devices and smart city infrastructures has significantly increased the demand for real-time data processing and low-latency communication systems [4], [5]. Smart city applications such as intelligent traffic management, surveillance monitoring, environmental sensing, healthcare systems, and emergency response services continuously generate massive volumes of data that require efficient computational resources for timely processing [3], [7]. Traditional cloud computing architectures often experience high communication delay, bandwidth congestion, and increased energy consumption due to centralized processing, making them less suitable for latency-sensitive smart city applications [5], [12].

Edge computing has emerged as a promising solution to these limitations by bringing computational resources closer to end users [5], [12]. By processing tasks at nearby edge servers instead of distant cloud data centers, edge computing reduces communication latency, improves response time, and enhances resource utilization [4], [10]. However, efficient task offloading in dynamic edge environments remains a major challenge due to fluctuating network conditions, limited computational resources, varying workload distribution, and heterogeneous IoT traffic patterns [1], [6].

Recently, Reinforcement Learning (RL) has gained considerable attention for solving dynamic optimization problems in wireless networks and edge computing systems [1], [2]. RL enables intelligent agents to learn optimal decision-making policies through continuous interaction with the environment without requiring explicit system modeling [8], [9]. This adaptive learning capability makes RL highly suitable for edge task scheduling and resource allocation problems [14].

Several recent studies have explored RL-based task offloading strategies in mobile edge computing systems [8], [9]. Although existing approaches demonstrate improved resource allocation performance, many frameworks still suffer from scalability limitations, inefficient workload balancing, and increased latency under dynamic smart city traffic conditions [1], [2]. Therefore, the development of intelligent and adaptive edge task offloading mechanisms remains an important research challenge.

Motivated by these challenges, this paper proposes a Reinforcement Learning-based adaptive edge task offloading framework for smart city applications. The proposed system employs a Deep Q-Network (DQN) agent to dynamically allocate computational tasks among local devices, edge servers, and cloud infrastructure based on real-time network conditions and resource availability. The proposed framework aims to minimize execution latency, reduce energy consumption, and improve task success rate while ensuring efficient edge resource utilization.

The major contributions of this work are summarized as follows:
1. Development of an RL-based adaptive edge task offloading framework for smart city environments.
2. Integration of Deep Q-Network learning for intelligent and dynamic task allocation.
3. Optimization of latency, energy consumption, and workload balancing in edge computing systems.
4. Comparative performance evaluation against conventional offloading approaches under varying network conditions.
The remainder of the paper is organized as follows. Section II presents the proposed methodology and system model. Section III describes the simulation setup and experimental configuration. Section IV discusses the obtained results and performance analysis. Finally, Section V concludes the paper and outlines future research directions.
Methodology

This section presents the proposed Reinforcement Learning (RL)-based adaptive edge task offloading framework for smart city applications. The framework is designed to allocate computational tasks among local devices, nearby edge servers, and centralized cloud infrastructure in order to minimize execution latency, energy consumption, and network congestion.
1. System Architecture
  
  The proposed system, shown in Fig. 1, consists of three major layers:
  1. IoT Device Layer
  2. Edge Computing Layer
  3. Cloud Layer
  In the smart city environment, multiple Internet of Things (IoT) devices continuously generate computational tasks such as traffic monitoring, environmental sensing, surveillance analytics, and smart healthcare requests. These tasks are forwarded to the edge orchestration module, where the RL agent determines the optimal execution location.
  
  The edge layer contains multiple heterogeneous edge servers positioned near end users to provide low-latency computation. The cloud layer serves as a centralized resource with high computational capability but larger communication delay.
  
  The RL agent dynamically learns the optimal offloading strategy according to the current network state, server workload, and communication conditions.
2. Task Generation Model
  
  Let the smart city network contain $N$ IoT devices, represented as
  
  $$D = \{d_1, d_2, d_3, \ldots, d_N\}$$
  
  Each device generates a computational task $T_i$ characterized by:
  
  $$T_i = (S_i, C_i, L_i)$$
  
  where:
  - $S_i$ denotes the input data size (MB),
  - $C_i$ represents the required CPU cycles,
  - $L_i$ indicates the latency sensitivity.
  Tasks are generated dynamically following a Poisson distribution to simulate real-time smart city traffic conditions.
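
The task model above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the authors' simulator: the `Task` class, the per-device arrival rate, the uniform sampling ranges, the reading of the cycle range as 100 to 1000 megacycles, and the treatment of $L_i$ as a deadline in seconds are all hypothetical choices.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Task:
    size_mb: float          # S_i: input data size in MB
    cpu_cycles: float       # C_i: required CPU cycles
    latency_limit_s: float  # L_i: modeled here as a deadline in seconds (assumption)


def poisson_sample(rate: float, rng: random.Random) -> int:
    """Draw one Poisson(rate) count using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def generate_tasks(num_devices: int, rate: float, rng: random.Random) -> list:
    """Generate one time slot of tasks: each device emits Poisson(rate) tasks."""
    tasks = []
    for _ in range(num_devices):
        for _ in range(poisson_sample(rate, rng)):
            tasks.append(Task(
                size_mb=rng.uniform(1.0, 10.0),        # assumed range, in MB
                cpu_cycles=rng.uniform(100e6, 1000e6),  # assumed range, in cycles
                latency_limit_s=rng.uniform(0.1, 1.0),  # hypothetical deadline range
            ))
    return tasks


rng = random.Random(0)
slot_tasks = generate_tasks(num_devices=50, rate=2.0, rng=rng)
print(f"{len(slot_tasks)} tasks generated in this slot")
```

Seeding the generator keeps a run reproducible, which helps when comparing offloading strategies on identical traffic.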
3. Edge Computing Model
  
  Assume the edge layer contains $M$ edge servers, represented as
  
  $$E = \{e_1, e_2, e_3, \ldots, e_M\}$$
  
  Each edge server has limited computational resources:
  
  $$R_j = (f_j, q_j, b_j)$$
  
  where:
  - $f_j$ is the CPU processing frequency,
  - $q_j$ is the current queue length,
  - $b_j$ is the available bandwidth.
  The RL agent continuously monitors these parameters before making offloading decisions.
4. Task Ofoading Strategy
  
  For each generated task, the agent selects one of the following execution modes:
  1. Local execution
  2. Edge server execution
  3. Cloud execution
  
  The decision variable is defined as:
  
  $$a_t \in \{0, 1, 2, \ldots, M, M+1\}$$
  
  where:
  - $a_t = 0$ indicates local execution,
  - $a_t = j$ with $1 \le j \le M$ indicates offloading to edge server $e_j$,
  - $a_t = M+1$ indicates cloud execution.
  Figure 1. Proposed RL-Based Adaptive Edge Task Offloading Framework for Smart City Applications.
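
The integer encoding of the decision variable (0 for local, $j$ for edge server $e_j$, $M+1$ for cloud) maps directly to a small decoding helper. The function name and the returned labels below are hypothetical and serve only to illustrate the indexing.

```python
def decode_action(action: int, num_edge_servers: int) -> str:
    """Map the integer decision variable a_t to an execution target label."""
    if action == 0:
        return "local"
    if 1 <= action <= num_edge_servers:
        return f"edge_{action}"  # edge server e_j with j = action
    if action == num_edge_servers + 1:
        return "cloud"
    raise ValueError(f"action {action} is outside 0..{num_edge_servers + 1}")


# With M = 5 edge servers there are M + 2 = 7 possible actions.
for a in range(7):
    print(a, decode_action(a, num_edge_servers=5))
```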
5. Latency Model
  
  The total latency for task execution consists of transmission delay, computation delay, and queue waiting time.
  
  The transmission delay is computed as:
  
  $$T_{tx} = \frac{S_i}{B}$$
  
  where $B$ denotes the communication bandwidth. The computation delay is expressed as:
  
  $$T_{comp} = \frac{C_i}{f}$$
  
  Thus, the total execution latency becomes:
  
  $$T_{total} = T_{tx} + T_{comp} + T_q$$
  
  where $T_q$ represents the queue waiting time.
6. Energy Consumption Model
  
  Energy consumption is calculated to evaluate the efficiency of task execution.
  
  For local execution:
  
  $$E_{local} = \kappa f^2 C_i$$
  
  where:
  - $\kappa$ denotes the effective switched capacitance,
  - $f$ is the processing frequency.
  For offloaded tasks, transmission energy is calculated as:
  
  $$E_{tx} = P_t \times T_{tx}$$
  
  where $P_t$ denotes the transmission power.
  
  The total energy consumption becomes:
  
  $$E_{total} = E_{local} + E_{tx}$$
7. Reinforcement Learning Formulation
  
  The adaptive task offloading problem is modeled as a Markov Decision Process (MDP).
  1. State Space. The state at time step $t$ is represented as:
    
    $$s_t = (q_t, b_t, f_t, l_t)$$
    
    where:
    - $q_t$ represents the edge server queue status,
    - $b_t$ denotes the available bandwidth,
    - $f_t$ indicates computational resource availability,
    - $l_t$ denotes the task latency requirement.
  2. Action Space. The action space corresponds to the possible offloading destinations:
    
    $$A = \{\text{Local}, \text{Edge}_1, \text{Edge}_2, \ldots, \text{Edge}_M, \text{Cloud}\}$$
  3. Reward Function. The reward function is designed to minimize latency and energy consumption while maximizing successful task execution.
    
    The reward is defined as:
    
    $$R_t = -(\omega_1 T_{total} + \omega_2 E_{total})$$
    
    where:
    - $\omega_1$ controls latency importance,
    - $\omega_2$ controls energy importance.
    
    Because the reward is the negative of the weighted cost, higher rewards correspond to better offloading decisions.
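
To make the latency, energy, and reward definitions concrete, the sketch below evaluates them for one task executed locally and once more when offloaded to an edge server. All numeric values are assumptions chosen for illustration (an effective switched capacitance of 1e-27, a 0.5 W transmit power, equal weights of 0.5, a 1 GHz device, a 3 GHz edge server, a 50 Mbps link), as is the unit conversion from MB and Mbps to seconds; none of them are reported in the paper.

```python
def transmission_delay(size_mb: float, bandwidth_mbps: float) -> float:
    """T_tx = S_i / B, converting megabytes to megabits (x8) to get seconds."""
    return size_mb * 8.0 / bandwidth_mbps


def computation_delay(cycles: float, freq_hz: float) -> float:
    """T_comp = C_i / f."""
    return cycles / freq_hz


def local_energy(kappa: float, freq_hz: float, cycles: float) -> float:
    """E_local = kappa * f^2 * C_i."""
    return kappa * freq_hz ** 2 * cycles


def transmission_energy(power_w: float, t_tx: float) -> float:
    """E_tx = P_t * T_tx."""
    return power_w * t_tx


def reward(t_total: float, e_total: float, w_latency: float, w_energy: float) -> float:
    """Negative weighted cost: lower latency and energy give a higher reward."""
    return -(w_latency * t_total + w_energy * e_total)


# Hypothetical task: 5 MB of input data, 5e8 CPU cycles.
SIZE_MB, CYCLES = 5.0, 5e8
KAPPA, TX_POWER_W = 1e-27, 0.5  # assumed constants
W_LATENCY, W_ENERGY = 0.5, 0.5  # assumed weights

# Local execution on a 1 GHz device: no transmission, no queueing.
t_local = computation_delay(CYCLES, 1e9)
e_local = local_energy(KAPPA, 1e9, CYCLES)
r_local = reward(t_local, e_local, W_LATENCY, W_ENERGY)

# Offloading to a 3 GHz edge server over 50 Mbps with 0.05 s queue wait.
t_tx = transmission_delay(SIZE_MB, 50.0)
t_edge = t_tx + computation_delay(CYCLES, 3e9) + 0.05
e_edge = transmission_energy(TX_POWER_W, t_tx)
r_edge = reward(t_edge, e_edge, W_LATENCY, W_ENERGY)

print(f"local: latency={t_local:.3f}s energy={e_local:.3f}J reward={r_local:.3f}")
print(f"edge:  latency={t_edge:.3f}s energy={e_edge:.3f}J reward={r_edge:.3f}")
```

With these particular numbers the transmission time dominates, so local execution earns the higher reward; a faster link or a larger compute demand would tip the balance toward the edge server, which is the trade-off the agent has to learn.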
8. Deep Q-Network (DQN) Based Learning
  
  A Deep Q-Network (DQN) is employed to learn the optimal offloading policy.
  
  The Q-value update equation is given by:
  
  $$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$
  
  where:
  - $\alpha$ is the learning rate,
  - $\gamma$ is the discount factor,
  - $r_t$ denotes the immediate reward.
  The neural network approximates the optimal Q-function and continuously improves the offloading policy through iterative interactions with the environment.
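
The update rule can be illustrated with a tabular stand-in, where a dictionary plays the role that the neural network plays in a DQN. This is a simplification for clarity, not the authors' implementation: a DQN instead adjusts network weights toward the same temporal-difference target. The default values 0.001 and 0.95 follow the learning rate and discount factor listed in the paper's simulation parameters; the state labels are hypothetical.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.001, gamma=0.95):
    """Apply one Q-learning step and return the updated Q(state, action)."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next       # r_t + gamma * max_a Q(s_{t+1}, a)
    td_error = td_target - q[(state, action)]    # bracketed term in the update rule
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
actions = list(range(7))      # 0 = local, 1..5 = edge servers, 6 = cloud

new_value = q_update(q_table, "s0", 1, reward=-0.5, next_state="s1", actions=actions)
print(new_value)
```

Starting from an all-zero table, a reward of -0.5 moves the entry by only alpha times the error, which shows how small a step a learning rate of 0.001 takes per update.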
9. Simulation Environment
  
  The proposed framework is implemented in Google Colab using Python-based simulation tools. The simulation environment includes:
  - NumPy for numerical computation,
  - Gymnasium for RL environment modeling,
  - PyTorch for DQN implementation,
  - Matplotlib for visualization,
  - Scikit-learn for performance evaluation.
  The smart city environment is simulated with varying numbers of IoT devices, edge servers, communication bandwidths, and task arrival rates.
10. Performance Evaluation Metrics
  
  The proposed framework is evaluated using the following metrics:
  1. Average task execution latency
  2. Energy consumption
  3. Task success rate
  4. Edge server utilization
  5. Network throughput
  6. RL convergence reward
  The obtained results are compared with conventional task offloading approaches such as local-only execution and random offloading strategies.
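
The baseline strategies compared in this paper (local-only, random offloading, cloud-only, and greedy edge selection) are simple decision rules and can be sketched as follows. The function names and the shared signature are hypothetical. The greedy rule here considers only queue length and ignores server distance, which is a simplification of the "nearest server with minimum queue length" description.

```python
import random


def local_only(queues, rng):
    """Always execute on the device (action 0)."""
    return 0


def random_offloading(queues, rng):
    """Pick one of the M edge servers uniformly at random (actions 1..M)."""
    return rng.randint(1, len(queues))


def cloud_only(queues, rng):
    """Always send the task to the cloud (action M + 1)."""
    return len(queues) + 1


def greedy_edge(queues, rng):
    """Pick the edge server with the shortest queue; ties go to the lowest index."""
    shortest = min(range(len(queues)), key=queues.__getitem__)
    return shortest + 1  # edge servers are numbered from 1


queue_lengths = [4, 2, 7, 2, 9]  # hypothetical queue state for M = 5 servers
rng = random.Random(0)
for policy in (local_only, random_offloading, cloud_only, greedy_edge):
    print(policy.__name__, policy(queue_lengths, rng))
```

Giving every rule the same signature lets a simulator swap a baseline for the learned policy without changing the evaluation loop.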

Simulation Setup and Experimental Configuration

This section describes the simulation environment, network configuration, reinforcement learning parameters, and evaluation settings used to validate the proposed adaptive edge task offloading framework.

Simulation Environment

The proposed framework is implemented using Google Colab with Python-based scientific and machine learning libraries. The simulation environment emulates a smart city edge computing scenario consisting of multiple IoT devices, edge servers, and a centralized cloud server.

The implementation utilizes the following software tools and libraries:
- Python 3.10
- NumPy
- PyTorch
- Gymnasium
- Matplotlib
- Scikit-learn
The RL environment is designed to simulate dynamic task arrivals, varying communication bandwidth, queue congestion, and heterogeneous edge server capacities.

Network Configuration

The simulated smart city network consists of multiple IoT devices connected to nearby edge servers through wireless communication links. The edge servers are connected to a centralized cloud infrastructure through high-speed backbone communication.

The simulation parameters are summarized in Table 1.

TABLE 1. Simulation Parameters

Parameter	Value
Number of IoT Devices	50–200
Number of Edge Servers	5
Cloud Server	1
Task Arrival Distribution	Poisson
Task Size	1–10 MB
CPU Cycles per Task	100–1000 MHz
Bandwidth Range	5–100 Mbps
Edge CPU Frequency	2–4 GHz
Cloud CPU Frequency	10 GHz
Simulation Episodes	500
Discount Factor (γ)	0.95
Learning Rate (α)	0.001
Replay Buffer Size	10000
Batch Size	64

IoT Task Modeling

The IoT devices generate computational tasks dynamically according to real-time smart city events. Each task has its own computational complexity and latency requirements.

The generated tasks include:
- Traffic monitoring tasks
- Video surveillance analytics
- Environmental sensing data
- Emergency alert processing
- Smart healthcare monitoring
The task generation process follows a stochastic Poisson distribution in order to emulate realistic urban traffic conditions.
Edge Server Configuration

Each edge server is modeled with limited computational resources and dynamic queue conditions. The available computational capacity changes continuously according to the incoming task load.

The processing capacity of each edge server is represented as:

$$C_{edge} = f_{edge} \times t$$

where:
- $f_{edge}$ denotes the CPU processing frequency,
- $t$ represents the available execution time.
The queue state of each edge node is updated dynamically after every offloading decision.
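
As a quick worked example of the capacity relation, the snippet below computes how many cycles a server can deliver in a time window and how many equally sized tasks fit into it. The 3 GHz frequency, 0.5 s window, and 5e8-cycle task are hypothetical values picked for round numbers.

```python
def edge_capacity_cycles(freq_hz: float, exec_time_s: float) -> float:
    """C_edge = f_edge * t: CPU cycles available within the time window."""
    return freq_hz * exec_time_s


def tasks_that_fit(freq_hz: float, exec_time_s: float, cycles_per_task: float) -> int:
    """Number of whole tasks of a given size the window can accommodate."""
    return int(edge_capacity_cycles(freq_hz, exec_time_s) // cycles_per_task)


capacity = edge_capacity_cycles(3e9, 0.5)
print(capacity, tasks_that_fit(3e9, 0.5, 5e8))
```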
Communication Model

Wireless communication between IoT devices and edge servers experiences varying transmission conditions due to network congestion and bandwidth fluctuations.

The achievable data transmission rate is computed using Shannon's capacity equation:

$$R = B \log_2(1 + \mathrm{SNR})$$

where:
- $R$ denotes the achievable transmission rate,
- $B$ represents the channel bandwidth,
- SNR denotes the signal-to-noise ratio.
The communication latency varies according to bandwidth availability and network traffic intensity.
Reinforcement Learning Configuration

The adaptive offloading framework employs a Deep Q-Network (DQN) agent to learn optimal offloading policies.

The DQN architecture consists of:
- Input Layer
- Two Fully Connected Hidden Layers
- Output Action Layer

The hidden layers use Rectified Linear Unit (ReLU) activation functions for nonlinear feature extraction.

The RL agent follows an ε-greedy exploration strategy:
- Initial exploration rate (ε) = 1.0
- Minimum exploration rate = 0.01
- Exploration decay factor = 0.995
Experience replay is incorporated to stabilize the learning process and improve convergence performance.
Baseline Comparison Methods

The proposed RL-based adaptive task offloading framework is compared with the following baseline approaches:
1. Local Execution Only: All tasks are processed locally without offloading.
2. Random Offloading: Tasks are randomly assigned to available edge servers.
3. Cloud-Only Execution: All tasks are transmitted to the centralized cloud server.
4. Greedy Edge Selection: Tasks are assigned to the nearest edge server with minimum queue length.
Performance Evaluation Criteria

The effectiveness of the proposed framework is evaluated using multiple performance indicators.
1. Average Latency. The average execution delay of all tasks is computed as:
  
  $$L_{avg} = \frac{1}{N} \sum_{i=1}^{N} T_i$$
  
  where:
  - $N$ represents the total number of tasks,
  - $T_i$ denotes the execution latency of task $i$.
2. Average Energy Consumption. The overall energy efficiency is evaluated as:
  
  $$E_{avg} = \frac{1}{N} \sum_{i=1}^{N} E_i$$
  
  where $E_i$ represents the energy consumed by task $i$.
3. Task Success Rate. The successful task completion ratio is calculated as:
  
  $$\text{Success Rate} = \frac{N_{success}}{N_{total}} \times 100$$
  
  where:
  - $N_{success}$ denotes successfully completed tasks,
  - $N_{total}$ represents total generated tasks.
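
The three evaluation criteria reduce to short aggregate computations over per-task records. The sketch below assumes that a task counts as successful when its latency does not exceed its deadline; the paper does not state its success criterion, so that rule, the function names, and the sample values are hypothetical.

```python
def average(values):
    """Arithmetic mean, used for both L_avg and E_avg."""
    return sum(values) / len(values)


def success_rate(latencies, deadlines):
    """Percentage of tasks that finish within their deadline (assumed criterion)."""
    successes = sum(1 for t, d in zip(latencies, deadlines) if t <= d)
    return successes / len(latencies) * 100.0


# Hypothetical per-task measurements from one evaluation run.
latencies_s = [0.2, 0.5, 0.9, 0.4]
energies_j = [0.10, 0.25, 0.40, 0.05]
deadlines_s = [0.5, 0.5, 0.5, 0.5]

print(f"L_avg = {average(latencies_s):.3f} s")
print(f"E_avg = {average(energies_j):.3f} J")
print(f"Success rate = {success_rate(latencies_s, deadlines_s):.1f} %")
```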
Experimental Workflow

The complete experimental procedure consists of the following steps:
1. Smart city environment initialization
2. IoT task generation
3. State observation by RL agent
4. Adaptive task offloading decision
5. Task execution at selected node
6. Reward calculation
7. DQN parameter update
8. Performance metric evaluation
The training process continues iteratively until the RL agent converges to an optimal offloading policy.
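
The decision step in this workflow relies on ε-greedy exploration. The sketch below implements the selection rule and the decay schedule using the reported settings (initial rate 1.0, minimum 0.01, decay factor 0.995). Whether the decay is applied once per episode or once per step is not stated in the paper; this sketch assumes once per episode, and the function names are hypothetical.

```python
import random

EPS_START, EPS_MIN, EPS_DECAY = 1.0, 0.01, 0.995


def decay_epsilon(eps: float) -> float:
    """Multiply by the decay factor, never dropping below the minimum rate."""
    return max(EPS_MIN, eps * EPS_DECAY)


def select_action(q_values, eps: float, rng: random.Random) -> int:
    """Explore with probability eps, otherwise pick the highest-valued action."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def epsilon_after(episodes: int) -> float:
    """Exploration rate after a given number of per-episode decay steps."""
    eps = EPS_START
    for _ in range(episodes):
        eps = decay_epsilon(eps)
    return eps


rng = random.Random(0)
print(select_action([0.1, 0.9, 0.3], eps=0.0, rng=rng))  # greedy choice
print(f"epsilon after 500 episodes: {epsilon_after(500):.4f}")
```

Under the per-episode assumption, the exploration rate after 500 episodes is still roughly 0.08, well above the 0.01 floor, so the agent would keep exploring through the end of training; a per-step schedule would reach the floor much sooner.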

Results and Discussion

This section presents the performance evaluation of the proposed Reinforcement Learning (RL)-based adaptive edge task offloading framework. The obtained results are analyzed in terms of latency reduction, energy efficiency, task success rate, and learning convergence under varying smart city network conditions.
1. Training Convergence Analysis
  
  The Deep Q-Network (DQN) agent was trained for 500 simulation episodes to learn the optimal task offloading policy. During training, the cumulative reward gradually increased as the agent learned efficient resource allocation strategies.
  
  Fig. 2 illustrates the convergence behavior of the RL agent.
  
  Figure 2. Training reward convergence of the proposed DQN-based offloading framework.
  
  Initially, the reward values fluctuate significantly due to exploration of different actions. As training progresses, the reward stabilizes and converges toward an optimal policy, indicating successful learning of adaptive offloading decisions.
2. Latency Performance Analysis
  
  Task execution latency is one of the most critical performance indicators in smart city edge computing systems. The proposed RL-based framework dynamically selects execution nodes according to network congestion and server workload, thereby minimizing overall delay.
  
  Fig. 3 compares the average latency obtained by different task offloading approaches.
  
  Figure 3. Comparison of average execution latency for different offloading strategies.
  
  The proposed framework achieves the lowest execution latency compared with local execution, random offloading, and cloud-only execution methods. This improvement is mainly due to the intelligent edge resource selection and adaptive decision-making capability of the RL agent.
  
  The cloud-only strategy experiences the highest latency because of long-distance communication overhead, whereas local execution suffers from limited device computational capability.
3. Energy Consumption Analysis
  
  Energy efficiency is another important requirement in edge-enabled smart city environments. The proposed framework minimizes unnecessary transmissions and selects nearby edge resources to reduce communication energy consumption.
  
  Fig. 4 presents the average energy consumption under different task execution strategies.
  
  Figure 4. Average energy consumption comparison among different methods.
  
  The RL-based framework demonstrates lower energy consumption compared with cloud-centric approaches. Since the RL agent learns efficient offloading policies, the number of long-distance transmissions is reduced considerably.
4. Task Success Rate Analysis
  
  The task success rate indicates the reliability of the proposed edge computing framework under dynamic network conditions.
  
  The task success rate is computed as:
  
  $$\text{Success Rate} = \frac{N_{success}}{N_{total}} \times 100$$
  
  where:
  - $N_{success}$ represents successfully completed tasks,
  - $N_{total}$ denotes total generated tasks.
  Table 2 summarizes the task completion performance of different methods.
  
  TABLE 2. Task Success Rate Comparison
  
  Method	Success Rate (%)
  Local Execution	81.4
  Random Offloading	85.7
  Cloud-Only Execution	88.3
  Greedy Edge Selection	91.6
  Proposed RL-Based Framework	96.8
  
  The proposed framework achieves the highest task success rate (96.8%), 5.2 percentage points above greedy edge selection (91.6%), because the RL agent continuously adapts to varying queue lengths, bandwidth conditions, and computational loads.
5. Edge Server Utilization Analysis
  
  Efficient utilization of edge resources is essential for maintaining balanced workload distribution across the network.
  
  Fig. 5 illustrates the edge server utilization for different offloading strategies.
  
  Figure 5. Edge server utilization under different task allocation strategies.
  
  The proposed adaptive framework distributes computational tasks more evenly among edge nodes, thereby preventing server overload and reducing queue congestion.
6. Impact of Number of IoT Devices
  
  To evaluate scalability, the number of IoT devices was varied from 50 to 200. The corresponding latency performance is illustrated in Fig. 6.
  
  Figure 6. Latency variation with increasing number of IoT devices.
  
  As the number of devices increases, all methods experience increased latency due to higher network traffic. However, the proposed RL-based framework maintains significantly lower latency compared with baseline methods because of its adaptive task scheduling capability.
7. Discussion
  
  The obtained results demonstrate that the proposed RL-based adaptive task offloading framework effectively improves edge computing performance in smart city environments.
  
  The major observations from the experimental analysis are summarized as follows:
  1. The DQN agent successfully learns optimal task offloading policies through continuous interaction with the environment.
  2. Adaptive offloading significantly reduces execution latency compared with traditional static allocation methods.
  3. Intelligent edge selection minimizes communication overhead and improves energy efficiency.
  4. Dynamic workload balancing enhances edge server utilization and reduces queue congestion.
  5. The proposed framework maintains stable performance even under increasing IoT traffic conditions.
  The integration of reinforcement learning with edge computing provides a promising solution for next-generation smart city infrastructures requiring intelligent, scalable, and low-latency computation services.
Conclusion and Future Scope

This paper presented a Reinforcement Learning (RL)-based adaptive edge task offloading framework for smart city applications. The proposed framework used a Deep Q-Network (DQN) agent to dynamically allocate computational tasks among local devices, edge servers, and cloud infrastructure based on network conditions, server workload, and bandwidth availability. Unlike conventional static offloading approaches, the proposed method continuously learned task allocation policies to minimize execution latency, reduce energy consumption, and improve edge resource utilization. Experimental results demonstrated that the RL-based framework achieved lower latency, higher task success rate, and improved workload balancing compared with local execution, random offloading, cloud-only execution, and greedy edge selection methods. The adaptive learning capability of the RL agent enabled efficient handling of dynamic IoT traffic conditions in smart city environments. Furthermore, the Google Colab-based implementation provided a scalable and cost-effective simulation platform for evaluating intelligent edge computing strategies. Future work may focus on advanced reinforcement learning models, mobility-aware edge scheduling, multi-agent coordination, and integration of security-aware mechanisms for improving the reliability and scalability of next-generation edge computing systems.

References

[1] D. Hortelano, I. de Miguel, R. J. Durán, J. C. Aguado, N. Merayo, L. Ruiz, A. Asensio, X. Masip-Bruin, P. Fernández, R. M. Lorenzo, and E. Abril, "A comprehensive survey on reinforcement-learning-based computation offloading techniques in edge computing systems," Journal of Network and Computer Applications, vol. 216, p. 103669, 2023.
[2] P. Peng, X. Liu, and Y. Zhang, "A survey on computation offloading in edge systems: From the perspective of deep reinforcement learning approaches," Computer Science Review, vol. 53, p. 100656, 2024.
[3] M. Chen, Y. Hao, K. Hwang, L. Wang, and L. Wang, "Disease prediction by machine learning over big data from healthcare communities," IEEE Access, vol. 5, pp. 8869–8879, 2017.
[4] Y. Mao, C. You, J. Zhang, K. Huang, and K. B. Letaief, "A survey on mobile edge computing: The communication perspective," IEEE Communications Surveys & Tutorials, vol. 19, no. 4, pp. 2322–2358, 2017.
[5] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu, "Edge computing: Vision and challenges," IEEE Internet of Things Journal, vol. 3, no. 5, pp. 637–646, 2016.
[6] S. Wang, J. Xu, N. Zhang, Y. Liu, and X. Shen, "Dynamic offloading for mobile edge computing with energy harvesting devices," IEEE Journal on Selected Areas in Communications, vol. 34, no. 12, pp. 3590–3605, 2016.
[7] Q. Pham, F. Fang, V. N. Ha, M. Le, Z. Ding, L. B. Le, W. J. Hwang, and J. S. Kim, "A survey of multi-access edge computing in 5G and beyond: Fundamentals, technology integration, and state-of-the-art," IEEE Access, vol. 8, pp. 116974–117017, 2020.
[8] Z. Ning, P. Dong, X. Wang, J. Rodrigues, and F. Xia, "Deep reinforcement learning for vehicular edge computing: An intelligent offloading system," ACM Transactions on Internet Technology, vol. 19, no. 2, pp. 1–24, 2019.
[9] J. Wang, J. Hu, G. Min, A. Y. Zomaya, and N. Georgalas, "Fast adaptive task offloading in edge computing based on meta reinforcement learning," IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 1, pp. 242–253, 2022.
[10] S. Sardellitti, G. Scutari, and S. Barbarossa, "Joint optimization of radio and computational resources for multicell mobile-edge computing," IEEE Transactions on Signal and Information Processing over Networks, vol. 1, no. 2, pp. 89–103, 2015.
[11] Y. Sun, S. Zhou, and J. Xu, "EMM: Energy-aware mobility management for mobile edge computing in ultra dense networks," IEEE Journal on Selected Areas in Communications, vol. 35, no. 11, pp. 2637–2646, 2017.
[12] M. Satyanarayanan, "The emergence of edge computing," Computer, vol. 50, no. 1, pp. 30–39, 2017.
[13] X. Chen, "Decentralized computation offloading game for mobile cloud computing," IEEE Transactions on Parallel and Distributed Systems, vol. 26, no. 4, pp. 974–983, 2015.
[14] Z. Luo, Y. Wang, and H. Liu, "Reinforcement learning-based computation offloading in edge computing: Principles, methods, and challenges," Alexandria Engineering Journal, vol. 95, pp. 437–460, 2024.
[15] Q. Pham, L. B. Le, S. Chung, and W. Hwang, "Mobile edge computing with wireless backhaul: Joint task offloading and resource allocation," IEEE Access, vol. 7, pp. 16444–16459, 2019.
