Deep Reinforcement Learning Driven Energy Harvesting and Power Allocation for Sustainable Wireless Body Area Networks

Namirrah Fathima A; Dr. Kannan G

doi:10.17577/IJERTCONV14IS060165

ACSCON - 2026 (Volume 14 - Issue 06)


Open Access
Authors : Namirrah Fathima A, Dr. Kannan G
Paper ID : IJERTCONV14IS060165
Volume & Issue : Volume 14, Issue 06, ACSCON – 2026
Published (First Online) : 16-06-2026
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Namirrah Fathima A Research Scholar

Department of Electronic and communication

B. S. Abdur Rahman Crescent Institute of Science and Technology

{email address: Fathima.namirrah00@gmail.com}

Supervisor : Dr. Kannan G Associate professor

Department of Electronic and communication

B. S. Abdur Rahman Crescent Institute of Science and Technology

{email address: kannan@crescent.education }

Abstract- Wireless Body Area Networks (WBANs) are essential for continuous health monitoring, but unreliable energy availability, unpredictable user movement, weak routing, and security risks limit their practical deployment. This paper proposes DL-WBAN, a framework that applies deep learning techniques to energy harvesting, routing, medium access control, and data security to ensure secure and energy-efficient operation. Data are first collected from the body by a set of physiological sensors and then processed by a CLSTM algorithm, which uses previous sensor state information and current power consumption to enable adaptive power distribution. To handle postural movement, the Improved Mobility Aware Multi-criteria Decision Making (IMA-MCDM) method dynamically selects stable next-hop forwarders. The Inspired Moth Flame Optimization (IMFO) then chooses relay nodes from multiple candidates based on their ability to reduce packet loss and latency. The Priority-Adaptive Fuzzy MAC (PA-MAC) and a Markov Decision Process (MDP) dynamically assign time slots and power to each relay node, adapting to real-time conditions. The Hamilton Energy-Efficient Routing (HEER), using Linear Programming (LP), optimizes routing to minimize energy expenditure and delay when transmitting messages between relay nodes. The framework combines dynamic Bio-key generation with a Message Authentication Code to transfer messages securely. Experimental results show that the proposed WBAN framework achieves 25% higher throughput and 18% better energy efficiency. The framework improves on existing frameworks in energy efficiency, communication reliability, and data security for WBANs while accounting for changes in body movement.

Index terms- Bio-Sensor nodes, Cipher text, Energy consumption, Markov Decision Process (MDP), Security, and Wireless Body Area Networks (WBAN).

INTRODUCTION

Wireless communication technologies have become omnipresent in our social and personal lives, with the Internet of Things (IoT) playing an important role. In particular, well-being and healthcare are developing as important application domains that benefit greatly from IoT improvements. WBANs, which are networks of sensors that gather data on human activities, play an important role in such IoT-based scenarios [1, 2] and are an emerging networking and communication technology for health monitoring. A WBAN monitors and measures biometrics by integrating sensors and actuators on or within the human body. It transmits data wirelessly to a device outside the body and facilitates data transfer between sensors and a collector using a short-range wireless communication protocol [3-5].

WBANs offer an efficient solution: they are networks of wearable devices designed for deploying many biomedical sensor nodes (SNs) in, on, or around the human body. A WBAN assesses physiological characteristics such as temperature, oxygen saturation, and heart rate [6]. By supporting both medical and non-medical applications with reporting services and cost-effective monitoring, the WBAN improves quality of life. The WBAN centralizes patient data for medical diagnosis, and the data must reach the end station safely and reliably [7]. A main characteristic of the WBAN is that the signal route varies regularly due to random changes in human position, which demands significant adaptive capability from the WBAN system [8]. Power management, the design of sensors with low energy consumption, and the fine-tuning of network settings for optimal power use are additional significant challenges [9]. Energy efficiency is one of the main issues, particularly for implanted sensors that need to operate for extended periods without a battery change [10].

Consequently, the WBAN network becomes unreliable and may result in life-threatening situations if the biomedical devices are unable to function as needed because their batteries are completely depleted [11]. Another important area of interest in WBAN is energy harvesting from human activity and the surrounding environment, for which researchers have offered a variety of methods. WBANs are implemented using fundamental methods called Medium Access Control (MAC) protocols. Taking into account particular network needs, these protocols are crucial for coordinating network node broadcasts [12]. Quality of service (QoS) factors, such as low latency, high throughput, and low communication overhead, are important for time-critical applications, such as healthcare data about patients. Due to their mobility, WBAN users may experience disconnections that can affect the successful delivery of processed health data [13].

This research suggests an intelligent design of WBAN systems to increase performance in dynamic environments by improving delivery of data and ensuring reliability through more efficient use of energy.
1. Motivation & Objectives
  
  WBANs must cope with limited battery power, patient movement, continually changing channels, and security problems, which lead to unreliable data transmission and shorter system lifetime. Relay selection, energy management, routing consistency, and secure data delivery all strongly influence performance. Existing approaches show the following limitations:
  - Lack of prior information: Existing approaches do not consider past states, which limits predictive accuracy and adaptability in WBANs.
  - Neglected sensor node mobility: Current studies do not account for sensor node movement caused by posture changes, which affects how well a node can choose its best forwarder and hinders successful data transmission.
  - Inefficient relay node selection: Earlier research has not determined the optimal number of relay nodes, which leads to laborious selection procedures and degrades both network performance and the total energy used for relaying.
  - Unpredictable node traffic: Unstable node traffic makes static channel and time frame allocation ineffective, resulting in suboptimal resource utilization.
  - High propagation delay: In previous research, multi-hop communication experiences significant propagation delay, which reduces the effectiveness of real-time data transmission.
  - Energy constraints: Limited energy capacity and volatile data transfer both contribute to an unstable network.
  - Encoded key vulnerabilities: Encoded keys have a short lifespan and can be breached; a compromised key endangers the confidentiality and integrity of the transmitted data.
    
    The proposed solution addresses the issues arising from body mobility and inconsistent traffic, and improves the performance, reliability, and security of data transported over the WBAN by integrating these functions into a single architecture. The objectives are as follows:
  - To develop history-aware methods that integrate the prior state of the sensor node for enhanced decision-making.
  - To develop a mobility-aware scheme that adapts to postural changes and enhances forwarder selection in WBANs.
  - To propose an effective approach for determining the optimal number of relay nodes with minimized computation time.
  - To develop a dynamic traffic-aware approach that improves the efficiency of channel and time frame allocation.
  - To create strategies for reducing propagation delay in multi-hop communication.
  - To enhance energy efficiency and ensure stable data transmission in WBAN environments.
  - To improve the key generation mechanism so as to extend key lifespan and strengthen security against potential threats.
2. Research Contribution
  
  This work proposes a framework for energy harvesting and power allocation governed by deep reinforcement learning, with the ultimate goal of maximizing the sustainability of WBANs. The aims are to maximize energy efficiency and the reliability of data transfer while providing dynamic resource allocation in response to conditions such as node uncertainties and limited node energy. The contributions of this work are as follows:
  - Data are collected from the biosensors in the WBAN.
  - A CLSTM model uses the prior sensor state to dynamically predict and compute power for energy harvesting with minimum energy depletion in the network.
  - The IMA-MCDM method dynamically selects the best data forwarding paths based on posture changes, signal strength, and residual energy.
  - The IMFO algorithm dynamically classifies and selects the best relay nodes in order to minimize packet loss and retransmission latency.
  - The PA-MAC protocol, in combination with an MDP, optimizes time slots with respect to traffic priority, energy consumption, and throughput.
  - The HEER protocol, together with an LP model, minimizes energy consumption and multi-hop propagation delay.
  - The Bio-Key with Message Authentication Code protects patient data against cyber threats and ensures tamper-proof transmission.
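The last contribution pairs a key with a Message Authentication Code. As a minimal sketch of that idea, the snippet below derives a key from biometric feature bytes and tags a sensor reading with HMAC-SHA256 from the Python standard library. The function names, the feature encoding, and the nonce-keyed derivation step are assumptions made for illustration; this section does not specify the paper's Dynamic Bio-key generation procedure.

```python
import hashlib
import hmac


def derive_bio_key(bio_features: bytes, nonce: bytes) -> bytes:
    """Hypothetical key derivation: mix quantized biometric features
    with a fresh nonce so that each session gets a different key."""
    return hmac.new(nonce, bio_features, hashlib.sha256).digest()


def tag_message(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 authentication tag for the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_message(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(tag_message(key, message), tag)
```

A receiver that holds the same key accepts a reading only if `verify_message` returns `True`, so any modification of the payload in transit is detected. Refreshing the nonce changes the derived key without changing the underlying biometric features.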
3. Paper Organization
The rest of this paper is organized as follows. Section II reviews earlier publications and identifies research gaps, while Section III discusses the limits of present methodologies. Section IV presents the mathematical formulation of the system model. Section V covers the approach of the suggested model in full, including diagrams and mathematical representations. Section VI presents the experimental findings and a comparison of the suggested and current methods. Section VII concludes the paper and outlines future work.

LITERATURE SURVEY

This section reviews prior research on WBANs, covering energy-efficient communication, mobility-aware routing, relay selection, MAC scheduling, and security for healthcare monitoring, all of which have been explored to increase network lifetime and reliability. However, comparatively little research combines mobility, adaptive MAC, energy harvesting, and secure data transmission, which motivates this work.

In [14], the sample rate determination problem is first formulated as a "Markov decision process (MDP)" and then solved with a "deep deterministic policy gradient algorithm (DDPG)". The approach is called "deep reinforcement learning-based dynamic sampling" (DRDS). The governing parameter that determines the sample rate considers energy and information variability factors. The "long short-term memory" (LSTM) model forecasts the data change rate in the next round alongside the energy factor. Implementing the complete DRDS in a centralized manner means transmitting every sensor's information to the server, which raises security concerns; future work may move towards distributed approaches such as federated reinforcement learning in WBANs for personal and secure applications.

The article [15] examines the joint optimization of computation offloading and resource allocation (JCORA) in mobile edge computing (MEC) for healthcare service provision. The authors define JCORA as an MDP and propose a "deep deterministic policy gradient-based WBAN offloading strategy (DDPG-WOS)" intended to reduce time delay and energy usage over interfering transmission channels. The strategy relies on MEC to improve transmission capacity and reduce the computational pressure on a single WBAN. DDPG-WOS also improves the offloading decision process by taking into account the channel state, transmission quality, compute capacity, and energy consumed. Future research will focus on addressing security and privacy issues in WBANs for the benefit of patient-centered health care. Improving data protection and access control would go a long way toward ensuring the reliability and security of the medical services provided.

The relay selection scheme discussed here takes into consideration both the energy harvesting effectiveness and the demand. Each sensor decides on its capability to work as a potential relay node depending on the energy harvesting efficiency it experiences, and computes the power threshold for transmitting its data. Based on the instant network status, the coordinator optimizes the cooperative transmission strategies. From testing, the cooperative scheme has shown better performance over the classical single-hop scheme, concerning packet arrival and reception rates. In unsaturated networks, cooperation among the sensors may be further improved by enhancing the super frame structures to conserve energy and support efficient data transmission. Further, optimization of these processes may be explored using deep learning-based coordination techniques [16].

Authors in [17] developed the "Simple Energy Efficient and Bandwidth Aware Routing Protocol" (SEBA) for routing in WBANs. To reduce energy usage, maximize network lifetime, and improve routing dependability, the scheme uses several node metrics in route selection, such as available bandwidth, energy harvesting, draining rate, remaining energy, and the number of hops. SEBA also implements a mechanism for dynamically adjusting the route according to energy usage. These measures minimize route failures and route discoveries and balance energy dissipation between the sensor nodes. Future research will need to overcome the limitation of considering only a small number of parameters. Q-learning could also be incorporated to enhance Quality of Service and energy efficiency.

The work in [18] improves energy efficiency in WBANs with a weight-based algorithm called the "weight-based next neighbor selection algorithm (WBNN)". WBNN assigns weights to routes based on the distance between sensor pairs: a large weight is assigned to a large distance and a low weight to a short distance. The total weighted value of each route is then calculated. During forwarding, the sensor nodes use the path with the least weight, which implies low energy consumption. Other routing algorithms and clustering techniques used for data sharing at different levels of the network and sensors are also presented. Future work will try to reduce time delay and optimize WBAN resources using machine-learning algorithms.
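The WBNN idea described above (weight each hop by distance, sum the weights along a route, forward on the lightest route) can be sketched in a few lines. This is an illustration only: using the Euclidean distance directly as the hop weight, and the names `route_weight`, `select_route`, and `positions`, are assumptions made here rather than details taken from [18].

```python
import math


def route_weight(route, positions):
    """Total weight of a route: the sum of hop distances, so a longer
    hop contributes a larger weight (hypothetical weight mapping)."""
    return sum(
        math.dist(positions[a], positions[b])
        for a, b in zip(route, route[1:])
    )


def select_route(routes, positions):
    """Return the candidate route with the smallest total weight."""
    return min(routes, key=lambda r: route_weight(r, positions))


# Example: two candidate routes from sensor "s" to the sink.
positions = {"s": (0.0, 0.0), "a": (1.0, 0.0), "b": (0.0, 3.0), "sink": (2.0, 0.0)}
routes = [["s", "a", "sink"], ["s", "b", "sink"]]
best = select_route(routes, positions)  # the route through "a" is lighter
```

Since transmit energy generally grows with distance, picking the lightest route serves as a proxy for picking the route with the lowest energy cost.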

This article [19] proposes the Energy Efficient Sustainable Network Using Network Optimization Technique (EES-NOT) to optimize the WBAN's energy efficiency. To improve the network's dependability and Quality of Service (QoS), EES-NOT is linked with the adaptive scheduling protocol for energy efficiency (ASPEE). A secured routing protocol for energy efficiency (SRPEE) also offers optimum route selection and lessens network congestion. The existing system does not effectively incorporate advanced technologies such as big data analytics, machine learning, deep learning, and blockchain. This deficiency restricts its interactivity and hampers overall network performance.

The authors of [20] suggest a WBAN system equipped with the QoS Effective Protocol (QoSEP) to efficiently route important packets to the target MS. To optimize the quality of the services provided, they employ Ant Colony Optimization for Critical Packet Routing to determine the shortest path with the lowest energy usage, minimal latency, and maximum throughput. Scheduling is then carried out to speed up emergency packet transmission by giving precedence to the crucial packets. Lastly, they compare the suggested strategy with various current methodologies to assess its effectiveness. AI-based detection of intrusion and adversarial tampering during routing could be integrated to further enhance QoS-effective data transmission in WBANs.

The work in [21] presents a cross-layer media access control technique for radio frequency-powered energy harvesting WBANs based on a revised superframe structure that gives the coordinator flexibility for rescheduling. The sensors harvest energy from radio frequency signals transmitted by the manager and adopt the time switching (TS) approach. The sensors are also assigned a sleep transmission power adjustment scheme that depends on the network environment and energy harvesting performance. Energy efficiency is enhanced because more packets are delivered per joule. For improved network performance, the coordinator determines the duration of the energy harvesting period to balance channel resources against sensor energy requirements. Future work will focus on energy-efficient transmission through cooperation among nearby sensors, where sensors close to each other act as relays to balance energy harvesting with network performance. This cooperative relay mechanism will also allow further investigation of the sleep adjustment scheme.
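The time switching idea in [21] can be made concrete with the textbook harvest-then-transmit model. The expressions below are a generic sketch, not the formulation used in [21]. All symbols are assumptions introduced here: $T$ is the frame length, $\alpha$ the fraction of the frame spent harvesting, $\eta$ the RF-to-DC conversion efficiency, $P_c$ the transmit power of the RF source, $h$ the channel gain, $N_{\text{pkt}}$ the number of delivered packets, and $E_{\text{cons}}$ the energy consumed.

```latex
E_h = \eta \, P_c \, |h|^2 \, \alpha T, \qquad 0 \le \alpha \le 1,
\qquad
T_{\text{tx}} = (1 - \alpha)\, T,
\qquad
\mathrm{EE} = \frac{N_{\text{pkt}}}{E_{\text{cons}}} \quad [\text{packets/J}]
```

Raising $\alpha$ harvests more energy but shortens the transmission time $T_{\text{tx}}$, which is the trade-off the coordinator balances when it sets the duration of the harvesting period.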

The article [22] proposes traffic prioritized load balanced scheduling (TPLBS), which prioritizes traffic in WBAN systems while balancing load across the general service queues according to the IEEE 802.15.6 standard. The aim is to train the queues such that early packet dropping improves WBAN throughput. The approach takes both priority and packet origin into account, ensuring that no packet waits unduly long in a queue before going to the access point or being forwarded. The simulation is carried out in Castalia to analyze and compare performance against competing schemes. The simulations assume that WBANs, personal devices (PDs), and access points (APs) are static, thus ignoring the effects of mobility. Future work should focus on transient link quality assessment alongside the impact of mobility on WBAN performance.

The work in [23] proposes a routing strategy termed simple energy-aware and reliable routing (SEAR) for reliable data transmission over a WBAN. Forwarding decisions are built on the remaining energy of the sensor nodes, data importance, and hop count from the sink node, so that the appropriate forwarder node is selected adaptively. In addition, the protocol defines a route reliability factor (RRF) to achieve the best transmission between any source sensor node (SN) and the sink node. RRF routes the data via the path with the highest leftover energy and the minimum hop count. SEAR therefore delivers single-hop and multi-hop transmissions more efficiently, which improves reliability, reduces SN power usage, and contributes to maximizing system lifetime. Integrating all of this into a secure IoT framework that supports communication between WBANs and smart homes while safeguarding the privacy and security of health information remains a daunting task.
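The selection rule attributed to RRF above (highest leftover energy, then fewest hops) can be illustrated with a lexicographic comparison. The snippet is a simplified sketch under stated assumptions: the `Candidate` fields and the tie-breaking order are hypothetical, data importance is ignored, and the actual RRF formula of [23] is not reproduced.

```python
from typing import NamedTuple, Sequence


class Candidate(NamedTuple):
    node_id: str
    residual_energy_j: float  # leftover energy in joules (assumed unit)
    hops_to_sink: int


def pick_forwarder(candidates: Sequence[Candidate]) -> Candidate:
    """Prefer the highest residual energy; break ties with the
    smallest hop count to the sink."""
    if not candidates:
        raise ValueError("no forwarder candidates available")
    return max(candidates, key=lambda c: (c.residual_energy_j, -c.hops_to_sink))
```

With candidates A (0.8 J, 3 hops), B (0.8 J, 2 hops), and C (0.5 J, 1 hop), the rule picks B: it ties with A on energy and is one hop closer to the sink.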

The next generation of the SCSS algorithm aims at an efficient way to identify the possible BSN candidates which can fill a spot on a slot under the superframe's Exclusive Access Period (EAP) slots. The selection being made for these BSNs incorporates MDP policy function and the runtime configurations. The approach uses DRL methodology to ensure maximum reward from the MDP processes. Overcoming the dead nodes situation will require the proposed algorithm to use energy harvesting strategies and sleep cycles. A theoretical analysis on throughput and loss performance will be made on the basis of "Two-state Markov Chain (TMC)" models. A fair proposal would be to combine reinforcement learning in future studies to optimize the proposed algorithm and examine scalability in complex WBANs by integrating advanced techniques such as machine learning or swarm intelligence for better performance [24].
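A two-state Markov chain analysis of the kind mentioned above usually starts from the steady-state probabilities of a good and a bad channel state. The sketch below computes them and the resulting long-run packet loss rate. It is a generic model, not the TMC analysis of [24]; the parameter names `p_gb` and `p_bg` (transition probabilities good to bad and bad to good) and the per-state loss rates are assumptions.

```python
def steady_state(p_gb: float, p_bg: float) -> tuple[float, float]:
    """Stationary probabilities (good, bad) of a two-state Markov chain
    with transition probabilities good->bad = p_gb and bad->good = p_bg."""
    if not (0.0 < p_gb <= 1.0 and 0.0 < p_bg <= 1.0):
        raise ValueError("transition probabilities must lie in (0, 1]")
    total = p_gb + p_bg
    return p_bg / total, p_gb / total


def average_loss(p_gb: float, p_bg: float,
                 loss_good: float = 0.0, loss_bad: float = 1.0) -> float:
    """Long-run packet loss rate: per-state loss weighted by the
    stationary probability of each state."""
    pi_good, pi_bad = steady_state(p_gb, p_bg)
    return pi_good * loss_good + pi_bad * loss_bad
```

For example, with `p_gb = 0.1` and `p_bg = 0.3` the channel is in the bad state a quarter of the time, so the default per-state loss rates give a long-run loss of about 0.25.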

To increase network performance, a prediction-based adaptive duty cycle (PADC) MAC protocol, known as PADC-MAC, has been suggested [25]. It uses a mathematical formulation to include information about harvested energy, both present and future. Additionally, it uses the nonlinear autoregressive (NAR) neural network, a machine learning model that produces high prediction accuracy in dynamic harvesting scenarios. Therefore, when there is an adequate influx of harvested energy, the receiver node can operate more aggressively. Future research should focus on extending PADC-MAC for multi-hop support in large-scale EH-WSNs and validating it using hardware testbeds for diverse data packet generation scenarios.

The work in [26] uses two approaches to overcome the problem of energy availability for cognitive radio devices: energy harvesting from the surrounding environment and DL-based forecasting of upcoming energy levels. Investigating model performance across different frequency bands, such as the WiFi band, is essential to assess adaptability and efficiency in diverse communication environments.

The author of [27] proposes the "energy-efficient extensive game-theory-based algorithm" (ESEGA), a paradigm for controlling the nodes' transmission power according to the power of their neighbor sensor nodes and the co-channel interference from their coordinators. The optimization reduces both total energy and the Transmission Power Level. Layer 2 coordinator operations are modeled with the new entrant incumbent game theory. The work reduces the Transmission Power Level of first-tier nodes based on the conditions of the second-tier coordinators. The major remaining challenge is to use game theory techniques to minimize interference among multiple coordinators and allocate resources efficiently in a WBAN with further deteriorated channel conditions.

Authors in [28] suggest a technique called deep reinforcement learning-based dynamic sampling (DRDS). It first formulates the sampling rate calculation as a Markov decision process (MDP) and then applies a deep deterministic policy gradient algorithm (DDPG) to solve it. When calculating the sample rate, the approach takes energy and data fluctuation into account. For the energy aspect, two factors are considered: the super-capacitor's voltage level and the quantity of surrounding light. For data variability, a long short-term memory (LSTM) model forecasts the pace at which the data will change in the following cycle. Future research will examine the viability of distributed techniques, including federated reinforcement learning, in WBANs to meet security and privacy requirements.

In [29], a Hybrid Clustering Approach for Extending WBAN Lifetime (HCEL), an energy-efficient routing system, is presented. HCEL uses a utility function to choose parent nodes according to received signal strength indicator (RSSI), proximity to the sink node, and residual energy (RE). A limited number of serving nodes and an energy threshold value are also included in the parent node selection procedure. The primary objective is to extend the lifespan of every node in the network. However, the framework does not account for node mobility, so system performance can suffer during body movement, which limits the benefits of WBAN technology for health care in environments that require secure transmission of information.

This study [30] proposes an Improved Quality of service aware Routing Protocol (IM-QRP) for WBAN-based health monitoring systems (HMS) to remotely monitor elderly or chronically unwell patients in residential and hospital settings. Convolutional neural networks are used outside the WBAN environment to assess medical information for intelligent decision-making and diagnosis. Because it lacks an integrated IoT deployment, the framework does not fully interact with its external environment and the supporting infrastructure, which limits real-time monitoring and data management over a larger area.

A multi-channel MAC protocol with a Markov decision process is suggested in [31] to make the best use of scarce energy resources and prolong the lifetime of devices inside the network. By using distinct channels for communication between biomedical devices and access points (APs), this protocol reduces latency and increases energy efficiency, throughput, and network longevity. A Markov decision process is used to investigate the best communication techniques between biomedical devices and APs and to represent the system's transition phases stochastically. In addition, a back-off technique, a time-slot allocation scheme, and an adaptive power allocation scheme are created to reduce delays, energy usage, and time-slot waste. However, the system does not incorporate advanced resource optimization methods, which could improve overall WBAN performance and adaptability.
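To show how an MDP can drive power allocation under an energy budget, the sketch below solves a toy model with value iteration. It is not the model of [31] or of the proposed PA-MAC: the battery levels, the three actions (sleep, low power, high power), their energy costs and success probabilities, and the single-unit harvesting probability are all assumptions chosen for illustration.

```python
def value_iteration(max_battery=5, harvest_prob=0.5, gamma=0.9, tol=1e-8):
    """Solve a toy battery MDP.

    State:  battery level 0..max_battery.
    Action: 0 = sleep, 1 = low power, 2 = high power (hypothetical).
    Reward: probability that the packet sent in this slot is delivered.
    After acting, one energy unit is harvested with prob. harvest_prob.
    """
    # action -> (energy cost in units, delivery success probability)
    actions = {0: (0, 0.0), 1: (1, 0.6), 2: (2, 0.9)}
    values = [0.0] * (max_battery + 1)
    while True:
        new_values, policy = [], []
        for battery in range(max_battery + 1):
            best_q, best_action = float("-inf"), None
            for action, (cost, p_success) in actions.items():
                if cost > battery:
                    continue  # not enough stored energy for this action
                after = battery - cost
                charged = min(after + 1, max_battery)  # battery is capped
                q = p_success + gamma * (
                    harvest_prob * values[charged]
                    + (1.0 - harvest_prob) * values[after]
                )
                if q > best_q:
                    best_q, best_action = q, action
            new_values.append(best_q)
            policy.append(best_action)
        delta = max(abs(n - o) for n, o in zip(new_values, values))
        values = new_values
        if delta < tol:
            return values, policy


values, policy = value_iteration()
```

`policy[b]` gives the action chosen at battery level `b`. At level 0 only sleeping is feasible, and the state values never decrease with battery level, because any action available at a lower level is also available at a higher one.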

This article [32] presents a WBAN-IoT blockchain-assisted delay and energy-aware healthcare monitoring (B-DEAH) system. Dual sinks are used for both environmental and body sensors to transmit packets on a regular basis and in an emergency. The work involves several steps: an expanded version of the PRESENT algorithm is suggested for patient key registration; cluster creation and cluster head selection are implemented using the spotted hyena optimizer; the MOORA algorithm is then used to build cluster-based routing; and the four-Q curve asymmetric algorithm is used to deploy and authenticate the patient block agent (PBA) for data transmission. However, this work does not develop mobility-aware handoff protocols to support seamless connection management as the body moves. Furthermore, no methods were developed to manage the power of each body sensor through duty cycling in MAC scheduling.

The study in [33] presents a framework for creating a sustainable health monitoring node. A prototype Health Monitoring Energy System (HeMeS) tool is developed using thorough analytical models and used to illustrate system design space exploration for different patient types, taking into account environmental parameters, electronic load activity levels, and system size limitations. The system has not examined the possible application of AI-based methods for dynamic energy balancing and optimizing system costs.

In [34], the authors discuss the architecture of a new cryptographic protocol that secures sensitive information transfer across a WBAN while guaranteeing three security services: integrity, secrecy, and authentication. They additionally consider a keyless sensor authentication mechanism to determine whether the devices are on the same person's body. To ensure that the protocol is correctly constructed and offers adequate security, a formal study of the protocol is conducted using cryptographic protocol verification tools. Nonetheless, the protocol is analyzed strictly for its formal security and lacks an evaluation of its performance in an actual WBAN environment. A summary of the previous work is provided in Table 1.

TABLE 1

Summary of existing works

References	Objectives	Algorithms or Methods used	Limitations
[14]	To resolve the dynamic sampling rate determination problem in WBANs while considering energy efficiency and data variability.	DRDS; DDPG formulated as an MDP; LSTM.	Requires distributed or federated learning for privacy preservation.
[15]	To optimize computation offloading and resource allocation in WBAN-assisted MEC for healthcare.	DDPG-based WBAN offloading strategy; MDP-based JCORA optimization.	Does not address security and privacy issues in WBAN-MEC integration.
[16]	To enhance packet arrival and reception rates in WBANs using cooperative communication with energy harvesting.	Power threshold-based relay decision; energy-aware cooperative relay selection.	The superframe structure and DL-based coordination for unsaturated networks need further improvement.
[17]	To extend network lifetime and minimize energy consumption through effective routing in WBANs.	SEBA routing protocol; bandwidth, energy harvesting, and hop count metrics.	Considers only a limited set of routing parameters and does not explore QoS improvement using reinforcement learning.
[18]	To improve energy efficiency in WBAN routing through distance-cognizant neighbor selection.	WBNN	Does not address resource optimization and time delay.
[19]	To improve reliability, energy efficiency, and QoS in WBANs.	EES-NOT; ASPEE schedules and SRPEE routing.	Some advanced technologies are not integrated.
[20]	To offer QoS-efficient routing for critical data in WBAN healthcare applications.	Ant Colony Optimization; QoSEP protocol.	Does not consider AI-based adversarial attack mitigation or intrusion detection.
[21]	To enhance energy efficiency in RF-powered energy-harvesting WBANs using MAC-layer optimization.	Cross-layer MAC protocol; revised superframe structure; time-switching energy harvesting.	Does not implement cooperative relay-based energy balancing.
[22]	To select traffic and balance load in WBANs based on IEEE 802.15.6.	TPLBS queue training and priority-based scheduling.	Assumes a static WBAN topology and ignores mobility and transient link quality.
[23]	To attain stable and energy-aware routing in WBANs.	SEAR; RRF	Privacy preservation and integration with secure IoT frameworks remain challenging.
[24]	To select BSN candidates effectively for superframe slot allocation in WBANs.	DRL-based SCSS algorithm; MDP policy function; two-state Markov chain analysis.	Energy harvesting and sleep-cycle optimization still require the integration of RL.
[25]	To enhance network performance in energy-harvesting WBANs using adaptive duty cycling.	PADC-MAC; NAR neural network.	Lacks hardware test-bed validation and multi-hop support.
[26]	To overcome energy availability issues in cognitive radio devices using prediction.	Energy harvesting with deep learning-based energy forecasting.	Does not evaluate multiple frequency bands.
[27]	To decrease transmission power and interference in WBANs using game theory.	ESEGA	Resource allocation and inter-coordinator interference remain challenging in deteriorated channels.
[28]	To dynamically adapt sampling rates in WBANs considering data variability and energy.	DRDS using DDPG and LSTM-based prediction.	Does not explore distributed and privacy-aware implementations.
[29]	To extend the lifetime of WBAN nodes through energy-efficient routing.	HCEL, RSSI	Does not consider node mobility, which can affect system performance during body movement.
[30]	To enhance QoS-aware routing for WBAN-based health monitoring systems.	IM-QRP, CNN	Real-time monitoring is limited.
[31]	To increase energy efficiency and prolong network lifetime in WBANs.	Multi-channel MAC protocol; MDP; adaptive power allocation.	Does not integrate advanced resource optimization techniques.
[32]	To design a delay- and energy-aware healthcare monitoring system using blockchain-assisted WBAN-IoT integration.	B-DEAH framework; PRESENT algorithm; MOORA algorithm.	Does not implement energy management of body sensors for MAC scheduling.
[33]	To develop a sustainable health monitoring node using optimized energy.	HeMeS tool; analytical modeling; design space exploration.	Does not explore AI-based techniques for dynamic energy balancing.
[34]	To secure sensitive communications in WBANs with authentication and confidentiality.	Keyless sensor authentication; new cryptographic protocol.	The protocol is evaluated only through formal security analysis.

PROBLEM STATEMENT

This section gives an overview of the specific problems identified in the most recent research.

The study in [35] presents a comprehensive path selection scheme that combines the "Energy Enrichment Multi-Hop Routing (EEMR) protocol" with an original meta-heuristic approach to optimize network lifetime. EEMR is a two-stage process. The first stage, the "Enhanced Flower Bee Optimization Algorithm (EFBOA)", aims to extend WBAN lifetime by forming a clustered network. The second stage, "Dynamic Local Hunting and Location Discarding (DLH-LD)", picks the fastest route out of several options. The main limitation of this study is outlined below:
- The approach keeps no history: it considers only the present state of the nodes and ignores earlier states.
  
  The authors of [36] propose the Energy Aware Forwarder Selection Technique (EAFST), which chooses efficient forwarder nodes to increase the stability period and network lifespan of the WBAN. EAFST arranges the sensors in concentric circles with the coordinator at the center. A high-priority sensor does not relay data from other sensors; instead, it sends its data straight to the coordinator. For low- or medium-priority nodes in a lower ring, EAFST uses residual energy, queue size, and path loss to choose a forwarder. This considerably lowers data transmission latency and the energy load on high-priority nodes. To further reduce energy consumption, once the forwarder's residual energy falls below the threshold value, the forwarder's duty is rotated among the low- and medium-priority nodes. Several issues identified in this research include:
- Their scheme fails to account for sensor node mobility caused by postural changes and its impact on forwarder selection.
- Their approach experiences high propagation delay due to multi-hop communication.
  
  The work in [37] provides an adaptive cuckoo search-based method (ACSM) that places relay nodes using an efficient fitness function and an adjustable step size proportional to that fitness function. Simulation results compare the set of relay nodes produced by ACSM with those of other techniques. They show that the method not only uses less energy than its alternatives but also distributes the load evenly across the relay nodes. The major constraint encountered in this research is outlined below:
- Identifying the optimal number of relay nodes is challenging and time-consuming.
  
  To improve energy efficiency and provide real-time emergency data delivery, the authors of [38] suggest a time-sharing multichannel MAC for WBANs. The goal is an effective time-sharing multichannel MAC protocol that achieves low interference and energy-efficient transmission. Each node is statically assigned time slots and channels so that it can wake up at its designated time and send its data. Castalia is used to simulate the method and compare its performance with that of its alternatives. The essential drawback of this research is as follows:
- Node traffic is unpredictable, which can make static channel and time-frame allocation inefficient.
  
  The authors of [39] propose an energy-efficient and reliable routing protocol called EERR-RLFL. For EERR-RLFL, they develop a node rank division mechanism that accounts for the heterogeneous nodes of WBANs. The sensor nodes are arranged into ranks according to three classifications of their effect on link quality. Next, they develop an algorithm for Link Quality Evaluation through Fuzzy Logic (FLLQE). It uses fuzzy evaluation to consider the combined effect of several factors when estimating the quality of the link between two nodes, which then provides the foundation for route selection. During data routing based on FLLQE, they use a hybrid data routing mode in which the FN is determined initially and the globally optimized routing path is then selected through reinforcement learning. Some of the problems identified in this research are as follows:
- The wireless body area network suffers from limited energy availability and unreliable data transmission.
  
  In [40], unique keys are generated from ECG values obtained from the MIT-BIH Arrhythmia database. Four individual unique keys are created and then used for encryption. The uniqueness and randomness of the generated keys were shown by testing, including frequency validation within a block. The limitation of this research is described as follows:
- The generated encryption keys have a limited lifespan, making them vulnerable to attacks if compromised.
Research solutions: The proposed WBAN architecture uses a CLSTM model to predict energy consumption and allocate energy dynamically, while an IMA-MCDM (AHP-SAW) scheme manages mobility-aware path selection. The IMFO algorithm finds relay nodes that minimize the likelihood of packet loss. A PA-MAC with MDP is expected to provide an efficient mechanism for dynamically allocating both time slots and power. In addition, using LP, the HEER methodology cuts energy waste and service interruptions. Confidentiality in WBANs is assured through the application of MeAC and Dynamic Bio-key.
SYSTEM MODEL

This section presents the elements of the proposed WBAN architecture through a system-level mathematical model covering data sensing, energy dynamics, mobility effects, relay selection, routing, MAC scheduling, and security. The WBAN is represented as a graph:

$G = (V, E)$ (1)

Here, $E$ and $V$ characterize the wireless links and the relay nodes. Every bio-sensor node $i \in V$ continuously acquires physiological parameter readings and produces a data packet $D_i(t)$ at each time instant $t$, which it sends over multi-hop paths with relay assistance to a designated sink in order to conserve transmission energy and maximize the network's lifespan. The graph captures the ever-changing connectivity between the nodes caused by changes in body position. The operating condition of node $i$ at time $t$ is described as:

$S_i(t) = [E_i(t), B_i(t), \lambda_i(t)]$ (2)

Here, $E_i(t)$ and $\lambda_i(t)$ denote the residual energy and the packet arrival rate, and $B_i(t)$ represents the buffer occupancy. This state vector is the input from which the CLSTM learns the temporal dependency of the system's energy consumption and traffic behavior. Power allocation, MAC scheduling, and routing share the same decision logic because they operate on the same state representation. The energy dynamics of node $i$ follow

$E_i(t+1) = E_i(t) - P_i(t)\,\tau_i(t) + H_i(t)$ (3)

$H_i(t)$, $P_i(t)$, and $\tau_i(t)$ represent the harvested energy, the transmission power, and the MAC time slot, respectively. The equation relates the energy consumed to the energy stored. It allows energy resources to be balanced: harvested energy limits rapid depletion and lengthens node lifetime, while past energy usage shapes the $P_i(t)$ value used as a CLSTM input. The posture-induced displacement is measured as follows:

$\delta_j(t) = \lVert \mathbf{r}_j(t) - \mathbf{r}_j(t-1) \rVert$ (4)

The vector $\mathbf{r}_j(t)$ indicates the node's position. Displacement caused by body movement affects overall link stability: as displacement increases, so does the chance of channel variation and packet loss. The IMA-MCDM process therefore employs a mobility-aware method to decide how to forward packets across an existing link. The posture stability factor of the node is defined as follows:

$\varphi_j(t) = \frac{1}{1 + \delta_j(t)}$ (5)

The instantaneous quality of the link between node $k$ and candidate node $j$ at time $t$, $\psi_{kj}(t)$, is defined as the ratio of the received signal power $P^{rx}_{kj}(t)$ to the noise variance $\sigma_n^2$, which characterizes the wireless link's SNR:

$\psi_{kj}(t) = \frac{P^{rx}_{kj}(t)}{\sigma_n^2}$ (6)
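As a minimal sketch of the node-level quantities in Eqns (3) to (6), the following Python functions compute the energy update, the posture-induced displacement, the posture stability factor, and the link quality. The function names and the sample readings are illustrative assumptions and do not come from the paper.

```python
import math

def energy_update(e_now, p_tx, slot, harvested):
    # Eqn (3): E(t+1) = E(t) - P(t) * tau(t) + H(t)
    return e_now - p_tx * slot + harvested

def displacement(r_now, r_prev):
    # Eqn (4): Euclidean distance between consecutive positions
    return math.dist(r_now, r_prev)

def posture_stability(delta):
    # Eqn (5): stability falls from 1 toward 0 as displacement grows
    return 1.0 / (1.0 + delta)

def link_quality(p_rx, noise_var):
    # Eqn (6): received power over noise variance (an SNR)
    return p_rx / noise_var

# Hypothetical readings for one candidate node (units are illustrative)
delta = displacement((3.0, 4.0, 0.0), (0.0, 0.0, 0.0))
phi = posture_stability(delta)
psi = link_quality(p_rx=2e-6, noise_var=1e-7)
e_next = energy_update(e_now=1.0, p_tx=0.5, slot=0.2, harvested=0.05)
```

A node that has moved farther since the previous slot receives a smaller stability factor, which is how mobility later lowers its forwarding score.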

Link reliability is measured by the signal-to-noise ratio, an important input to both mobility-aware forwarding and relay selection. Links with higher signal-to-noise ratios need fewer retransmissions and consume less energy. The energy sustainability index of node $j$ is defined as

$\mathcal{E}_j(t) = \frac{E_j^{rem}(t)}{E_j^{max}}$ (7)

Here, $\mathcal{E}_j(t)$ denotes the energy sustainability index of the sensor node and indicates its relative remaining energy level; $E_j^{rem}(t)$ and $E_j^{max}$ refer to the residual energy and the maximum energy capacity of the sensor node. This ensures that nodes with inadequate residual energy are penalized in subsequent decision-making. The overall mobility-aware score is

$\theta_j(t) = \sum_{m=1}^{3} V_m \cdot z_{j,m}(t)$ (8)

In this equation, $\theta_j(t)$ denotes the overall mobility-aware score of candidate node $j$ at time instant $t$, and $m$ is the index of the decision criterion. $V_m$ is the AHP-derived weight of the $m$th decision criterion, and $z_{j,m}(t)$ is the normalized value of the $m$th metric for node $j$ at time $t$. This weighted score combines link quality, posture stability, and energy sustainability into a single decision value. The next hop is then selected as

$j_t = \arg\max_{j} \theta_j(t)$ (9)

where $j_t$ is the selected optimal next-hop sensor node and $\arg\max_j(\cdot)$ returns the node index $j$ that yields the maximum value. This selection is updated whenever the posture changes, which ensures reliable data forwarding with minimal energy depletion and packet loss. Relay fitness is computed as

$F_r = \alpha_1 \psi_{kj}(t) + \alpha_2 \mathcal{E}_j(t) + \alpha_3 \varphi_j(t)$ (10)

Here, $\psi_{kj}(t)$, $\varphi_j(t)$, and $\mathcal{E}_j(t)$ indicate the link quality, posture stability, and normalized residual energy, and $\alpha_1$, $\alpha_2$, and $\alpha_3$ are the weighting coefficients. This equation ensures that relay nodes with low delay, greater reliability, and adequate energy are preferred. Time-slot assignment in the PA-MAC protocol is computed as:

$F_j(t) = \mathcal{F}(\eta_i(t), \beta_i(t), \alpha_i(t))$ (11)

where $F_j(t)$ is the fuzzy output score and $\mathcal{F}(\cdot)$ is the fuzzy inference engine that merges the three input ratios, evaluating multiple conditions simultaneously. A scheduling preference level is generated for every node. The HEER routing protocol computes the routing cost as follows:

$C_p = \sum_{(i,j) \in p} \left( \omega_1 E^{tx}_{ij} + \omega_2 D_{ij} \right)$ (12)

Here, $C_p$ is the total routing cost of path $p$, $D_{ij}$ denotes the propagation and queuing delay on the link, and $E^{tx}_{ij}$ denotes the energy consumed for transmission from node $i$ to node $j$. The ciphertext generated at node $i$ is:

$C_i(t) = E_{K_i(t)}(D_i(t))$ (13)

where $C_i(t)$ is the ciphertext, $D_i(t)$ is the original physiological data packet collected by the sensor node, and $E_{K_i(t)}(\cdot)$ is the encryption function that uses the current Bio-Key $K_i(t)$. This ensures the confidentiality of patient data during wireless transmission. If packets are intercepted, the absence of a static key prevents the data from being recovered and evaluated.
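To illustrate the HEER routing cost of Eqn (12), the sketch below assumes the cost is a weighted sum of per-link transmission energy and delay over the hops of a path. The paper optimizes routing with Linear Programming; this example simply enumerates a few candidate paths, which is a simplification and not the authors' method. The node names, link values, and weights are hypothetical.

```python
def path_cost(path, e_tx, delay, w1=0.5, w2=0.5):
    # Eqn (12): sum of w1 * E_tx[i, j] + w2 * D[i, j] over consecutive hops
    return sum(w1 * e_tx[(i, j)] + w2 * delay[(i, j)]
               for i, j in zip(path, path[1:]))

def cheapest_path(paths, e_tx, delay, w1=0.5, w2=0.5):
    # Pick the candidate path with the lowest total cost
    return min(paths, key=lambda p: path_cost(p, e_tx, delay, w1, w2))

# Hypothetical per-link transmission energy and delay
e_tx = {("s", "r1"): 2.0, ("r1", "sink"): 1.0, ("s", "sink"): 5.0}
delay = {("s", "r1"): 1.0, ("r1", "sink"): 1.0, ("s", "sink"): 1.0}

relay_path = ["s", "r1", "sink"]
direct_path = ["s", "sink"]
best = cheapest_path([relay_path, direct_path], e_tx, delay)
```

With these values the two-hop path through the relay is cheaper than the direct link, which matches the motivation for relay-assisted forwarding.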
PROPOSED METHODOLOGY

The aim of this research is to develop a deep reinforcement learning-driven energy harvesting and power allocation framework that improves the sustainability of wireless body area networks. The key steps are listed below, and the overall architecture is shown in Fig.1.

Data collection
Priority based Sensor Nodes
Mobility of sensor nodes on postural changes
Classifying the optimal Relay nodes
Dynamic power allocation
Energy harvesting based routing in WBAN
Security

Fig.1 Overall System Architecture

Data Collection

The WBAN scenario consists of sixteen bio-sensor nodes placed at various locations on the patient's body to monitor vital signs. The body sensors measure physiological signals such as heart rate, blood pressure, and oxygen level. The network architecture includes a single sink node at a fixed position that aggregates data from the sensors. Because of energy constraints, direct wireless communication between the sensor nodes and the sink would not be efficient. We therefore introduce relay nodes, with fifteen to twenty candidate placements across the body. These relay nodes forward and aggregate data from individual sensors, which would otherwise spend that energy directly, so that communication still occurs at a low energy cost to each sensor. Fig.2 represents the data collection.

Fig.2 Data collection
Priority of sensor nodes:-

We apply the Convolutional Long Short-Term Memory (CLSTM) model to harvest and allocate power optimally inside the WBAN. In this deep learning paradigm, the historical states of the sensor nodes are integrated in a single model run, so that power is predicted and allocated dynamically. Learning from previous energy usage lets devices with high energy consumption receive the charge they need while avoiding excessive drain of battery-powered networks.

At a given time instant $t$, the state vector of the operational condition of sensor node $i$ is

$s_i(t) = [E_i(t), R_i(t), Q_i(t), S_i(t)]$ (14)

In the above equation, $E_i(t)$ is the residual energy of node $i$, $R_i(t)$ and $Q_i(t)$ are the data generation rate and the buffer queue length, and $S_i(t)$ is the received signal strength. Together, these parameters describe the communication status and energy of each node.

To capture the temporal behavior, historical states from the previous $T$ time slots are collected and organized as:

$X(t) = \{s_i(t - T + 1), \ldots, s_i(t)\}$ (15)

This historical matrix allows energy usage trends and communication patterns to be learned over time. To extract spatial correlations among sensor nodes, the historical input $X(t)$ is processed by the convolutional layer:

$F(t) = \sigma(X(t) * W + b)$ (16)

where $W$ is the convolution kernel that captures inter-node relationships, and $\sigma(\cdot)$ and $b$ are the nonlinear activation function and the bias term. This step identifies spatial dependencies such as correlated energy depletion across nearby sensors. The extracted features $F(t)$ are passed to the LSTM unit to model temporal dependencies. The LSTM updates its internal memory as

$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$ (17)

The cell memory $C_t$ stores the long-term energy patterns; the gates $f_t$ and $i_t$ control how much past information is retained and how much new information is added. $\tilde{C}_t$ is the candidate energy trend learned from the current inputs, and $\odot$ denotes element-wise multiplication. The LSTM output in the hidden state is:

$h_t = o_t \odot \tanh(C_t)$ (18)

Here, $o_t$ controls the flow of the output, and $h_t$ represents the sensor's learned energy behavior over time. Using the hidden state $h_t$, the CLSTM forecasts each sensor's residual energy in the next time slot:

$\hat{E}_i(t+1) = f(h_t)$ (19)

In this equation, $\hat{E}_i(t+1)$ is the predicted energy level of sensor $i$, and $f(\cdot)$ is the nonlinear mapping function learned by the CLSTM. The actual energy consumption of a sensor during transmission is computed as:

$E_i^{cons}(t) = P_i(t) \cdot \Delta t$ (20)

where $P_i(t)$ and $\Delta t$ are the transmission power and the transmission duration. Sensors naturally consume more energy with greater data transmission or poor channel conditions.

Based on the estimated energy $\hat{E}_i(t+1)$, the transmission power for the following slot is allocated as follows:

$P_i^{alloc}(t) = \min\left(P_{max},\ \frac{\hat{E}_i(t+1) - E_{th}}{\Delta t}\right)$ (21)

Here, $P_{max}$ and $E_{th}$ are the maximum allowable power and the minimum energy threshold that prevents node failure. More energy is allocated to sensors that are expected to sustain higher energy depletion.

The updated energy level at each sensor is estimated as:

$E_i(t+1) = E_i(t) - E_i^{cons}(t) + E_i^{harv}(t)$ (22)

In this equation, $E_i^{harv}(t)$ is the energy harvested from ambient sources. This guarantees sustainable sensor operation without significant battery depletion.
Mobility of Sensor nodes under postural changes

One of the major challenges during WBAN deployment for patients is the effect of posture changes on the placement of the sensor nodes while the patient moves about a hospital or home. To account for these mobility dynamics, we include the Improved Mobility Aware Multi-Criteria Decision Making (IMA-MCDM) method. It uses the Analytic Hierarchy Process and Simple Additive Weighting (AHP-SAW) techniques to select the optimal next hop dynamically. Evaluating multiple factors (signal strength, link stability, and residual energy) ensures reliable data forwarding during continuous body movement. The IMA-MCDM-based mobility-aware next-hop selection is given in Algorithm 1.
1. Posture-Induced Mobility Modeling
  
  The relative position of sensor nodes can differ over time due to the patient's movements, such as sitting, walking, or lying down. The mobility status of a candidate forwarding node $j$ at time instant $\tau$ is represented as follows:
  
  $P_j(\tau) = [x_j(\tau), y_j(\tau), z_j(\tau)]$ (23)
  
  In the above equation, $P_j(\tau)$ is the three-dimensional spatial position vector, and $x_j$, $y_j$, $z_j$ are the spatial coordinates of node $j$ on the body. This spatial representation permits node displacement to be tracked across consecutive time slots.
  
  The physical displacement of node $j$ is measured by $\delta_j(t)$, formulated in Eqn (4), where greater displacement indicates increased mobility and possible link instability. A posture stability coefficient is defined in Eqn (5) in order to normalize this effect for routing decisions.
  
  Thus, the more a sensor node moves because the patient's body posture is changing, the lower that node's stability value will be. In contrast, sensor nodes that move minimally have higher $\varphi_j(t)$ values, indicating that they are situated in a stable area of the patient's body. The coefficient $\varphi_j(t)$ normalizes the effect of mobility on the capability to transmit data between sensor nodes that have maintained reliable communication paths during the patient's movement.
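The CLSTM-driven power allocation rule of Eqns (20) to (22) can be sketched as follows. The predicted energy `e_pred` stands in for the CLSTM output of Eqn (19) and is supplied here as a plain number; the floor at zero power is an added assumption for the case where the prediction falls below the threshold, and all numeric values are hypothetical.

```python
def allocate_power(e_pred, e_th, dt, p_max):
    # Eqn (21): min(P_max, (E_pred - E_th) / dt)
    # Flooring at 0 is an assumption, not stated in the paper.
    return max(0.0, min(p_max, (e_pred - e_th) / dt))

def step_energy(e_now, p_tx, dt, e_harv):
    # Eqn (20): consumed energy = power * duration
    e_cons = p_tx * dt
    # Eqn (22): updated energy = current - consumed + harvested
    return e_now - e_cons + e_harv

# Hypothetical values: predicted energy, threshold, slot length, power cap
p_limited = allocate_power(e_pred=1.0, e_th=0.2, dt=2.0, p_max=0.5)
p_capped = allocate_power(e_pred=5.0, e_th=0.2, dt=2.0, p_max=0.5)
p_off = allocate_power(e_pred=0.1, e_th=0.2, dt=2.0, p_max=0.5)
e_next = step_energy(e_now=1.0, p_tx=p_limited, dt=2.0, e_harv=0.1)
```

The three calls show the three regimes of the rule: power limited by the energy margin, power capped at the maximum, and no transmission when the predicted energy is below the threshold.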
2. Mobility-Aware link and Energy Evaluation
  
  In Eqn (6), the higher the value of $\psi_{kj}(t)$, the more robust and reliable the wireless link. The lower the value of $\psi_{kj}(t)$, the less reliable the link, owing to disturbances caused by body shadowing, attenuation of the wireless channel, or posture changes of the user. Link robustness is used to evaluate the best candidate for next-hop selection while the user is moving.
  
  Posture can affect connectivity over time, so link sustainability is evaluated as:
  
  $K_j(t) = \frac{T_j^{stable}(t)}{T_{window}}$ (24)
  
  Here, $K_j(t)$ represents the temporal reliability of the wireless link. It is computed as the ratio of the cumulative duration for which the link remains connected, $T_j^{stable}(t)$, to the total observation interval $T_{window}$. This ratio captures how long the link stays stable during the observation window. The node's energy capability is described in Eqn (7).
3. AHP-Based Priority Weight Computation
  
  The analytic hierarchy process is applied to determine the relative influence of link quality, energy, and mobility. The pairwise comparison matrix $Q$ is constructed as follows:
  
  $Q = \begin{bmatrix} 1 & q_{12} & q_{13} \\ q_{21} & 1 & q_{23} \\ q_{31} & q_{32} & 1 \end{bmatrix}$ (25)
  
  Each element $q_{mn}$ expresses the significance of criterion $m$ relative to criterion $n$. The normalized priority vector is obtained as follows:
  
  $\mathbf{v} = [V_{\varphi}, V_{\psi}, V_{\mathcal{E}}]$ (26)
  
  $\mathbf{v}$ is the AHP-derived weight vector of the decision criteria. The terms $V_{\varphi}$, $V_{\psi}$, and $V_{\mathcal{E}}$ denote the weights of the posture stability, link quality, and energy sustainability criteria. These weights quantify the impact of posture stability, signal quality, and energy availability. The consistency of the decision logic is checked by:
  
  $CR = \frac{\lambda_{max} - 3}{2 \cdot RI}$ (27)
  
  $CR$ is the consistency ratio used to validate the dependability of the AHP judgments, and $\lambda_{max}$ is the maximum eigenvalue of the AHP pairwise comparison matrix. The constant 3 is the number of decision criteria considered in the AHP model, and 2 is the corresponding normalization factor. The random index $RI$ gives the average consistency of randomly generated matrices. A consistency ratio below the threshold before aggregation ensures dependable weighting.
4. SAW-Based Dynamic Next-Hop Selection
  
  To enable fair aggregation, each mobility-related metric is normalized:
  
  $z_{j,m}(t) = \frac{Z_{j,m}(t) - \min(Z_m)}{\max(Z_m) - \min(Z_m)}$ (28)
  
  where $Z_{j,m}(t)$ is the $m$th metric of node $j$. Using simple additive weighting, the overall mobility-aware forwarding score is then computed as in Eqn (8), and the optimal next-hop node is selected as defined in Eqn (9). Postural changes of the mobile sensor nodes are shown in Fig.3.
  
  Algorithm 1: IMA-MCDM-Based Mobility-Aware Next-Hop Selection
  
  Input: Set of candidate nodes $J$, position vectors $\mathbf{r}_j(t)$, received power $P^{rx}_{kj}(t)$, residual energy $E_j^{rem}(t)$, observation window $T_{window}$
  1. Initialize decision criteria for all $j \in J$
  2. For each candidate node $j \in J$ do
  3. Compute displacement $\delta_j(t) = \lVert \mathbf{r}_j(t) - \mathbf{r}_j(t-1) \rVert$
  4. Compute posture stability coefficient $\varphi_j(t) = 1/(1 + \delta_j(t))$
  5. Compute link quality $\psi_{kj}(t) = P^{rx}_{kj}(t)/\sigma_n^2$
  6. Compute link stability ratio $K_j(t) = T_j^{stable}(t)/T_{window}$
  7. Compute energy sustainability index $\mathcal{E}_j(t) = E_j^{rem}(t)/E_j^{max}$
  8. End for
  9. Construct AHP pairwise comparison matrix $Q$
  10. Derive priority weight vector $\mathbf{v} = [V_{\varphi}, V_{\psi}, V_{\mathcal{E}}]$
  11. Compute consistency ratio $CR$
  12. If $CR > 0.1$ then revise comparison matrix
  13. Normalize decision metrics $z_{j,m}(t)$ for all $j$ and criteria $m$
  14. For each candidate node $j \in J$ do
  15. Compute SAW score $\theta_j(t) = \sum_{m=1}^{3} V_m \cdot z_{j,m}(t)$
  16. End for
  17. Select optimal next-hop node $j_t = \arg\max_j \theta_j(t)$
  18. Return $j_t$

Fig.3 Postural changes on mobility sensor nodes
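The steps of Algorithm 1 can be sketched in Python with NumPy as shown below. The sketch assumes the AHP weights are taken from the principal eigenvector of the pairwise comparison matrix and uses Saaty's random index of 0.58 for a 3x3 matrix with a consistency threshold of 0.1; the comparison matrix and the candidate metrics are hypothetical, and the per-node metrics (posture stability, link quality, energy index) are passed in already computed.

```python
import numpy as np

RI_3 = 0.58  # Saaty's random index for a 3x3 comparison matrix

def ahp_weights(Q):
    # Weights from the principal eigenvector; CR follows Eqn (27)
    vals, vecs = np.linalg.eig(Q)
    k = int(np.argmax(vals.real))
    w = np.abs(vecs[:, k].real)
    w = w / w.sum()
    cr = (vals[k].real - 3) / (2 * RI_3)
    return w, cr

def min_max(col):
    # Eqn (28): min-max normalization of one metric across candidates
    span = col.max() - col.min()
    return np.zeros_like(col) if span == 0 else (col - col.min()) / span

def select_next_hop(metrics, Q, cr_limit=0.1):
    # metrics: one row per candidate, columns = (phi, psi, energy index)
    w, cr = ahp_weights(Q)
    if cr > cr_limit:
        raise ValueError("inconsistent pairwise matrix; revise Q")
    z = np.column_stack([min_max(metrics[:, m])
                         for m in range(metrics.shape[1])])
    scores = z @ w                      # Eqn (8): SAW score per candidate
    return int(np.argmax(scores)), scores  # Eqn (9)

# Hypothetical judgments: link quality twice as important as the others
Q = np.array([[1.0, 0.5, 1.0],
              [2.0, 1.0, 2.0],
              [1.0, 0.5, 1.0]])
# Hypothetical candidates: (posture stability, link quality, energy index)
metrics = np.array([[0.9, 10.0, 0.5],
                    [0.5, 30.0, 0.8],
                    [0.2, 20.0, 0.2]])
weights, cr = ahp_weights(Q)
best, scores = select_next_hop(metrics, Q)
```

In this example the second candidate is selected: it is not the most stable node, but its strong link and high residual energy outweigh that under the assumed weights.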
Classifying the Optimal Relay Nodes

Further, we use Inspired Moth Flame Optimization (IMFO) for optimal selection of relay nodes to achieve efficient data transmission. It dynamically identifies the best relay nodes so that packet loss and latency in the system are kept low. This reduces unnecessary retransmissions, thereby improving network quality and lifetime in WBAN devices.
1. Relay Candidate Initialization and Fitness Modeling
  
  Relay candidates found after identifying stable forwarding paths using posture-aware mobility optimization are collected for further optimization. The total number of potential relay nodes is $R$, and each potential relay node is represented as follows:
  
  $\mathbf{y}_r = [Y_r^{(1)}, Y_r^{(2)}, Y_r^{(3)}], \quad r = 1, 2, 3, \ldots, R$ (29)
  
  Here, $Y_r^{(1)}$, $Y_r^{(2)}$, and $Y_r^{(3)}$ denote the packet delivery capability, the transmission latency factor, and the residual energy level, and $\mathbf{y}_r$ is the solution vector corresponding to relay node $r$. Together, these factors determine a node's suitability for serving as a relay.
  
  A multi-objective fitness function, described in Eqn (10), is used to estimate the fitness of each relay candidate.
2. IMFO Position Update and Flame Guidance
  
  Candidate relay nodes are identified with Inspired Moth-Flame Optimization, in which the candidates are represented by moths that move toward their flames, which represent the optimal nodes. The position of flame $k$ is denoted by $f_k^{(t)}$, and the position of moth $r$ at iteration $t$ by $y_r^{(t)}$. The new position of each moth is determined with the logarithmic spiral function:
  
  $y_r^{(t+1)} = D_r \cdot e^{bl} \cdot \cos(2\pi l) + f_k^{(t)}$ (30)
  
  In this equation, $D_r = |f_k^{(t)} - y_r^{(t)}|$ is the distance between the flame and the moth, $b$ is the constant defining the spiral shape, and $l \in [-1, 1]$ is a random number. This term permits both exploration and exploitation by gradually guiding relay candidates toward regions of optimal fitness.
  
  To ensure convergence, the number of flames is decreased adaptively as:
  
  $K(t) = \left[ R - t \cdot \frac{R - 1}{T_{max}} \right]$ (31)
  
  Here, $K(t)$ is the number of flames at iteration $t$, and $T_{max}$ is the maximum number of iterations. This adaptive reduction avoids premature stagnation and improves convergence speed.
3. Optimal Relay Selection and Classification
  
  After the updates, relay nodes are ranked by their final fitness scores. The best relay node is chosen as:
  
  $r^{*} = \arg\max_{r} (F_r)$ (32)
  
  $r^{*}$ is the index of the relay node attaining the highest fitness. This selection ensures that the preferred relay node offers balanced energy consumption, minimal latency, and high packet delivery. Relay nodes are finally classified by a relay eligibility requirement:
  
  $\mathcal{R}_{opt} = \{ r \mid F_r \ge \tau_F \}$ (33)
  
  Here, $\mathcal{R}_{opt}$ and $\tau_F$ denote the optimal relay set and the predefined fitness threshold. Nodes that meet this requirement are selected for forwarding data; nodes that do not are not used. This procedure is expected to reduce packet loss, extend network lifetime, and decrease delay in WBAN systems.
Dynamic power Allocation

Efficient communication in a WBAN relies heavily on dynamic power allocation and Medium Access Control (MAC) scheduling. The fuzzy logic-based dynamic time slot allocation mechanism considers all the important input variables (energy ratio, buffer ratio, and packet arrival ratio) so that dynamically assigned time slots reduce waste and improve network usage, which in turn improves communication efficiency. The Priority-Adaptive Fuzzy MAC (PA-MAC) protocol is proposed for this purpose; it schedules time slots according to traffic priority, average transmission duration, throughput, and energy consumption. With an incorporated Markov Decision Process (MDP), the system learns and adapts over time, adjusting time slot distribution and power control to maximize energy efficiency. In this way, PA-MAC adaptively allocates resources to the varying needs of the network while also reducing transmission delay and making better use of the channel.
1. Modeling network state parameters
  
  Once the best relay nodes are found, the network must adapt its power and scheduling decisions to current conditions. To accomplish this, the PA-MAC protocol models the buffer, energy, and traffic conditions of each node in real time to reflect communication needs.
  
  $\eta_i(t) = \frac{E_i^{rem}(t)}{E_i^{max}}$ (34)
  
  where $\eta_i(t)$ is the energy ratio of node $i$ at time $t$. This ratio normalizes energy availability between 0 and 1. During scheduling, nodes with lower $\eta_i(t)$ are treated as energy-critical.
  
  $\beta_i(t) = \frac{Q_i(t)}{Q_i^{max}}$ (35)
  
  Here, $\beta_i(t)$ is the buffer ratio of node $i$, and $Q_i(t)$ and $Q_i^{max}$ are the number of packets currently waiting in the buffer and the maximum capacity of the buffer. This metric shows the node's level of congestion. Greater values indicate urgency to transmit data to avoid packet loss.
  
  $\alpha_i(t) = \frac{\lambda_i(t)}{\lambda_i^{max}}$ (36)
  
  Here, $\alpha_i(t)$ is the packet arrival ratio, and $\lambda_i(t)$ and $\lambda_i^{max}$ are the current packet arrival rate and the maximum supported arrival rate. This ratio identifies emergency traffic conditions and burstiness. MAC scheduling must take the traffic priority into account.
2. Fuzzy Logic-Based Time Slot Allocation
  
  To translate the network state parameters into adaptive scheduling decisions, the PA-MAC protocol employs fuzzy logic. This gives the MAC layer the capability to deal with the uncertainties and nonlinear relationships among the buffer, energy, and traffic states, as established in Eqn (11).
  
  $\tau_i(t) = \frac{\sum_{k=1}^{K} \mu_k \tau_k}{\sum_{k=1}^{K} \mu_k}$ (37)
  
  In this term, $\tau_i(t)$ denotes the defuzzified time slot duration, and $\tau_k$ and $\mu_k$ are the slot value and membership degree of the $k$th fuzzy rule. The weighted average ensures smooth slot adaptation. Longer transmission slots are given to nodes with greater priority.
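A minimal sketch of the weighted-average defuzzification in Eqn (37) is given below. The three-rule base, the slot values, and the use of `min` as the rule firing strength are hypothetical choices made only to produce membership degrees; the paper does not specify its rule base here.

```python
def defuzzify(rules):
    # Eqn (37): weighted average of slot values tau_k with weights mu_k
    total = sum(mu for mu, _ in rules)
    if total == 0:
        return 0.0
    return sum(mu * tau for mu, tau in rules) / total

def slot_duration(eta, beta, alpha, short=1.0, medium=2.0, long=4.0):
    # Hypothetical rule base (not from the paper); inputs are the
    # energy, buffer, and arrival ratios, each in [0, 1].
    rules = [
        (min(beta, alpha), long),           # congested and bursty: long slot
        (min(1 - beta, 1 - alpha), short),  # idle: short slot
        (1 - eta, medium),                  # energy-critical: medium slot
    ]
    return defuzzify(rules)

tau_even = defuzzify([(0.5, 2.0), (0.5, 4.0)])
tau_node = slot_duration(eta=0.5, beta=0.8, alpha=0.6)
```

Because the output is a weighted average, small changes in the input ratios shift the slot duration gradually instead of switching abruptly between rule outputs.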
2. Priority-Adaptive Power Allocation
  
  Once the time slots are assigned, the transmission power is adjusted according to the available energy so that communication remains efficient and reliable. PA-MAC dynamically adapts the power to the length of the assigned time slot and to the residual energy remaining in the battery.
  
  $$E_i^{alloc}(t) = \eta_i(t) \cdot \tau_i(t) \cdot P_{ref} \quad (38)$$
  
  The energy allocated to node $i$ is represented by $E_i^{alloc}(t)$, where $\eta_i(t)$ is the energy ratio of the node, $\tau_i(t)$ is the assigned slot duration, and $P_{ref}$ is the reference power level. Tying the allocated energy to both energy availability and slot length supports efficient use of battery power and prevents excessive drain of the batteries of low-energy nodes.
  
  $$P_i(t) = \frac{E_i^{alloc}(t)}{\tau_i(t)} \quad (39)$$
  
  where $P_i(t)$ is the actual transmission power of node $i$. The power is obtained by dividing the allocated energy by the slot duration, which keeps power control consistent with scheduling and lets the power adapt automatically to traffic demand.
  
  $$P_{\min} \le P_i(t) \le P_{\max} \quad (40)$$
  
  In this equation, $P_{\max}$ and $P_{\min}$ are the maximum and minimum admissible transmission powers. This control mechanism provides link reliability and prevents interference and hardware damage. It also prevents nodes from depleting their energy aggressively. Bounded power control therefore improves the stability of the network.
3. Markov Decision Process-Based Adaptation
  
  PA-MAC integrates a Markov Decision Process (MDP) to provide long-term optimization. By using the MDP to interact with the network environment, PA-MAC learns the optimal scheduling and power strategies.
  
  $$s_i(t) = [\eta_i(t), \beta_i(t), \alpha_i(t)] \quad (41)$$
  
  The MDP state of node $i$ is represented as $s_i(t)$. It captures the energy level, buffer occupancy, and traffic level. This compact state representation allows adaptive, state-based decision making.
  
  $$a_i(t) = [\tau_i(t), P_i(t)] \quad (42)$$
  
  The action consists of the selected time slot length and transmission power, since both parameters significantly affect the overall throughput and the total delay incurred by the transmitted message. The MDP evaluates actions on the basis of long-term rewards.
  
  $$R_i(t) = \omega_1 T_i(t) - \omega_2 D_i(t) - \omega_3 E_i^{cons}(t) \quad (43)$$
  
  Here $R_i(t)$ is the instantaneous reward of node $i$, and $T_i(t)$, $D_i(t)$, and $E_i^{cons}(t)$ are the throughput, delay, and energy consumption. The weights $\omega_1$, $\omega_2$, and $\omega_3$ balance the performance objectives. This reward guides the node toward effective communication.
5. Optimal Policy Learning and Scheduling Decision
  
  By maximizing the long-term cumulative reward, the MDP adapts over time as the network dynamics change.
  
  $$V(s_i) = \max_{a_i} \Big[ R_i(t) + \gamma \sum_{s_i'} P(s_i' \mid s_i, a_i) V(s_i') \Big] \quad (44)$$
  
  The value function of a state $s_i$ is denoted by $V(s_i)$, $\gamma$ is the discount factor that weights the importance of future rewards, and $P(s_i' \mid s_i, a_i)$ is the state transition probability. This equation assesses the long-term benefit of actions.
  
  $$a_i^*(t) = \arg\max_{a_i} V(s_i) \quad (45)$$
  
  Here $a_i^*(t)$ is the optimal scheduling action, that is, the action that attains the maximum in Eqn (44). It is obtained through an Optimal Scheduling Policy (OSP), which optimizes both time slots and power control. This completes the adaptive mechanism of the PA-MAC dynamic adaptation process.
Energy harvesting based routing in WBAN

The Hamilton Energy-Efficient Routing (HEER) protocol addresses the energy constraints of WBAN devices. It optimizes routes on the basis of available energy and link quality, which considerably reduces multi-hop propagation delay. We also introduce a linear programming (LP) model that minimizes energy consumption while supporting efficient and fast bidirectional data transfer. As a result, energy harvesting nodes operate efficiently and extend the operational life of the entire system. The combined HEER-LP scheme is used to minimize both energy usage and delay.
1. Energy-Aware and Link-Quality-Based Route Modeling
  
  Routing decisions are made after dynamic power allocation and MAC scheduling, so as to ensure energy sustainability over time and maintain reliable communication. The HEER routing protocol combines harvested energy availability and link quality in its routing decisions.
  
  $$X_{ij}(t) = \frac{E_j^{harv}(t)}{E_j^{rem}(t)} \cdot L_{ij}(t) \quad (46)$$
  
  Here $X_{ij}(t)$ denotes the routing suitability metric from node $i$ to node $j$ at time $t$, $E_j^{harv}(t)$ and $E_j^{rem}(t)$ are the harvested and residual energy of the candidate next hop $j$, and $L_{ij}(t)$ is the wireless link quality between nodes $i$ and $j$. This metric gives priority to next-hop nodes that maintain stable connectivity and harvest enough energy.
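  A next-hop choice based on the suitability metric of Eqn (46) can be sketched as follows. The sketch reads the metric as the ratio of harvested to residual energy at the candidate node, scaled by link quality; that reading, the function names, and the candidate values are assumptions made for illustration only.

```python
def suitability(e_harv, e_rem, link_quality):
    """Routing suitability X_ij(t), read here as (E_harv / E_rem) * L_ij."""
    if e_rem <= 0:
        raise ValueError("residual energy must be positive")
    return (e_harv / e_rem) * link_quality


def best_next_hop(candidates):
    """Return the candidate id with the largest suitability.

    candidates maps a node id to (e_harv, e_rem, link_quality).
    """
    if not candidates:
        raise ValueError("no candidate next hops")
    return max(candidates, key=lambda j: suitability(*candidates[j]))


# Hypothetical neighbours of a sensor node (energy in joules, quality in [0, 1]).
candidates = {
    "relay_a": (0.2, 1.0, 0.9),  # good link, little harvested energy
    "relay_b": (0.5, 1.0, 0.6),  # weaker link, more harvested energy
    "relay_c": (0.1, 2.0, 0.8),  # low harvest relative to residual energy
}

print(best_next_hop(candidates))
```

  In this example the node with the weaker link still wins because its harvested energy is high enough to outweigh the link-quality difference, which shows how the two factors trade off in a product-form metric.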
2. Multi-Hop Delay and Energy Cost Evaluation
  
  To decrease end-to-end delay, HEER estimates the cumulative cost of multi-hop routes. The overall routing cost of a path $p$ is figured in Eqn (12). The delay of a single hop is
  
  $$D_{ij} = \frac{d_{ij}}{R_{ij}} + Q_j \quad (47)$$
  
  In this formula, $d_{ij}$, $R_{ij}$, and $Q_j$ are the data packet size, the transmission rate, and the queuing delay at node $j$. The formulation captures both transmission and buffering delays, which allows HEER to avoid overloaded or slow relay nodes.
3. Linear Programming-Based Energy Optimization
  
  $$\min \sum_{(i,j)} E_{ij}^{tx} \cdot x_{ij} \quad (48)$$
  
  The binary decision variable indicating whether the link $(i, j)$ is selected is represented as $x_{ij}$. The objective reduces the total transmission energy across the selected route and ensures energy-efficient data forwarding. Constraints on flow and energy harvesting:
  
  $$\sum_{j} x_{ij} - \sum_{k} x_{ki} = b_i \quad (49)$$
  
  where $b_i$ indicates the traffic flow balance at node $i$. This constraint ensures effective source-to-sink routing and avoids routing loops and packet loss.
  
  $$E_i^{cons}(t) \le E_i^{harv}(t) + E_i^{rem}(t) \quad (50)$$
  
  This constraint ensures that the energy expended does not exceed the sum of the harvested and the remaining energy. It enforces sustainable routing decisions: nodes with inadequate energy are avoided.
4. Optimal Route Selection and Enhanced Network Lifetime
  
  Following the solution of the LP model, HEER determines the optimal path as:
  
  $$p^* = \arg\min_{p} C_p \quad (51)$$
  
  where $p^*$ is the optimal routing path with the lowest energy-delay cost. This path ensures fast and energy-efficient data delivery and reduces multi-hop propagation delay. The network lifetime is estimated as follows:
  
  $$T_{life} = \min_i \left( \frac{E_i^{rem}(t) + E_i^{harv}(t)}{E_i^{cons}(t)} \right) \quad (52)$$
  
  In this formulation, $T_{life}$ denotes the operational lifetime of the WBAN. It depends on the consumption rate, the residual energy, and the harvested energy. Network lifetime is greatly increased when HEER and LP optimization are combined. Algorithm 2 illustrates energy harvesting-based routing using HEER-LP.
  
  Algorithm 2: Energy harvesting-based routing using HEER-LP
  Input: Set of WBAN nodes $N$, residual energy $E_i^{rem}(t)$, harvested energy rate $H_i(t)$, link quality $L_{ij}(t)$, traffic demand $D_i(t)$
  1. Initialize network graph $G(N, \mathcal{L})$
  2. For each node $i \in N$ do
  3.   Update available energy: $E_i^{tot}(t) = E_i^{rem}(t) + H_i(t)$
  4. End for
  5. For each communication link $(i, j) \in \mathcal{L}$ do
  6.   Compute link cost: $C_{ij}(t) = \frac{1}{L_{ij}(t)} + \frac{1}{E_i^{tot}(t)}$
  7. End for
  8. Formulate Linear Programming objective: $\min \sum_{(i,j) \in \mathcal{L}} C_{ij}(t) \cdot x_{ij}$
  9. Subject to:
     - flow conservation constraints
     - energy availability constraints: $\sum_{j} x_{ij} \cdot P_{ij}(t) \le E_i^{tot}(t)$
     - delay bound constraints
  10. Solve LP to obtain optimal routing variables $x_{ij}^*$
  11. Extract routing path: $p^* = \{(i, j) \mid x_{ij}^* = 1\}$
  12. Transmit data along $p^*$
  13. Update residual energy: $E_i^{rem}(t + 1) = E_i^{tot}(t) - E_i^{cons}(t)$
  14. Return $p^*$
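  The route selection of Eqns (48) to (52) can be illustrated with a small, self-contained sketch. It does not solve the LP itself. For a single source-to-sink flow with non-negative link costs, the flow-conservation problem has a shortest path as an optimal solution, so Dijkstra's algorithm is swapped in here as a stand-in, and the energy constraint of Eqn (50) is applied only as a per-link feasibility filter. The topology, node names, and energy values are hypothetical.

```python
import heapq


def min_energy_path(links, energy_budget, source, sink):
    """Find the path with the lowest total transmission energy.

    links: {(i, j): e_tx}, the energy needed to send over link (i, j).
    energy_budget: {node: E_rem + E_harv}; a link is usable only if its
    sender can afford it (feasibility filter in the spirit of Eqn (50)).
    Returns (total_energy, path), or (inf, []) if the sink is unreachable.
    """
    graph = {}
    for (i, j), e_tx in links.items():
        if e_tx <= energy_budget.get(i, 0.0):
            graph.setdefault(i, []).append((j, e_tx))

    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == sink:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, e_tx in graph.get(node, []):
            new_cost = cost + e_tx
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))

    if sink not in best:
        return float("inf"), []
    path = [sink]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return best[sink], path[::-1]


def network_lifetime(e_rem, e_harv, e_cons):
    """Eqn (52): the node that drains first bounds the network lifetime."""
    return min(
        ((e_rem[n] + e_harv[n]) / e_cons[n] for n in e_cons if e_cons[n] > 0),
        default=float("inf"),
    )


# Hypothetical topology: one sensor, two relays, one sink.
links = {
    ("s1", "r1"): 2.0,
    ("s1", "r2"): 1.0,
    ("r1", "sink"): 1.0,
    ("r2", "sink"): 3.0,  # r2 cannot afford this link (budget 2.0)
    ("s1", "sink"): 5.0,
}
energy_budget = {"s1": 10.0, "r1": 4.0, "r2": 2.0}

cost, path = min_energy_path(links, energy_budget, "s1", "sink")
lifetime = network_lifetime(
    e_rem={"s1": 8.0, "r1": 3.0},
    e_harv={"s1": 2.0, "r1": 1.0},
    e_cons={"s1": 2.0, "r1": 1.0},
)
print(cost, path, lifetime)
```

  The relay with the cheapest first hop is rejected because it lacks the energy for its onward link, so the route goes through the other relay. A full treatment with delay bounds would need an LP or integer-programming solver rather than this shortcut.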
Security

Security is a major concern in WBANs because sensitive biometric data is passed from patients to hospital servers. Static encryption keys have a serious drawback: an attacker can analyze the encrypted traffic over an extended period to infer the keys. In this context, we present a new Bio-key update scheme that refreshes the encryption keys at random intervals, so that long-term attacks based on traffic analysis do not succeed. A Message Authentication Code (MAC) is also included for the integrity and authentication of the data. It acts as a gate that prevents unauthorized modification of the data and reduces the risk associated with key compromise. Together, the Bio-key update and MAC authentication provide a secure means of protecting patient information from potential cyber threats, and they facilitate safe and efficient transfer of WBAN data to cloud storage and analysis services.
1. Bio-key Generation and Dynamic Update Mechanism
  
  Because the physiological data gathered in a WBAN is highly sensitive, static cryptographic keys are exposed to long-term traffic analysis attacks. To address this weakness, we propose a Bio-key update mechanism that generates encryption keys dynamically from biometric features and updates them at randomly chosen intervals.
  
  $$K_i(t) = \mathcal{H}(B_i(t) \,\|\, \pi_i(t) \,\|\, \rho_i) \quad (53)$$
  
  In this formulation, $K_i(t)$ is the encryption key generated for sensor node $i$ at time $t$, $B_i(t)$ denotes the biometric feature vector taken from physiological signs such as ECG or heart rate, and $\pi_i(t)$ and $\rho_i$ indicate the time-dependent nonce and the node-specific salt. The hash function $\mathcal{H}(\cdot)$ provides randomness and irreversibility, which makes key prediction infeasible. Because keys are created dynamically, attackers cannot correlate long-term encrypted traffic to obtain the cryptographic keys. The key update interval is randomized as follows to further improve robustness:
  
  $$\Delta t_i \sim \mathcal{U}(t_{\min}, t_{\max}) \quad (54)$$
  
  The key refresh interval for node $i$ is denoted by $\Delta t_i$ and is drawn from a uniform distribution between $t_{\min}$ and $t_{\max}$. Because of this randomness, key updates do not follow any predictable pattern, so traffic monitoring alone does not yield enough information for cryptanalysis. This decreases the chance of a successful attack in the long term.
2. Data Encryption and Secure Transmission Process
  
  After the Bio-key has been generated, the sensor data packets are encrypted before they are sent to the relay nodes or the sink. The encrypted data packet can be expressed in Eqn (13).
  
  The following equation can be used to measure the resistance against key compromise:
  
  $$P_{comp} = \frac{\mathbb{E}[\Delta t_i]}{T_{attack}} \quad (55)$$
  
  In this equation, $P_{comp}$ represents the probability of key compromise, $\mathbb{E}[\Delta t_i]$ is the expected key update interval, and $T_{attack}$ is the analysis time the attacker requires. Because keys are refreshed frequently and at random, $\mathbb{E}[\Delta t_i]$ stays small, which pushes $P_{comp}$ toward a negligible value. The system therefore remains secure even under persistent eavesdropping attempts.
3. Message Authentication Code for Integrity and Authentication
  
  A message authentication code (MeAC) is integrated with encryption to protect data integrity and verify the authenticity of the message source. The MeAC tag for each packet is calculated as follows:
  
  $$MeAC_i(t) = \mathcal{H}(K_i(t) \,\|\, C_i(t)) \quad (56)$$
  
  where $MeAC_i(t)$ represents the authentication tag generated by node $i$ and $C_i(t)$ is the ciphertext. Any change to the ciphertext invalidates the authentication tag, so this mechanism prevents both tampering and impersonation attacks. Integrity verification at the receiver end is performed as follows:
  
  $$MeAC_i(t) \stackrel{?}{=} \mathcal{H}(K_i(t) \,\|\, C_i(t)) \quad (57)$$
  
  This check lets the receiver confirm that the received packet is the same packet that was sent. Only packets with a valid authentication tag are accepted; all other packets are discarded immediately. The process functions as a security checkpoint for the WBAN data and prevents unauthorized changes to it. Fig.4 shows the process of security.
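  The key derivation, randomized refresh, and tag verification of Eqns (53), (54), (56), and (57) can be sketched with the Python standard library. This is an illustration under stated assumptions rather than the authors' implementation: SHA-256 is assumed for the hash function, the biometric features are assumed to be already serialized to bytes, and the cipher is not specified in the text, so the ciphertext is an opaque placeholder.

```python
import hashlib
import hmac
import secrets


def derive_bio_key(biometric: bytes, nonce: bytes, salt: bytes) -> bytes:
    """Eqn (53): K = H(B || nonce || salt), with SHA-256 assumed as H."""
    return hashlib.sha256(biometric + nonce + salt).digest()


def next_update_interval(t_min: float, t_max: float) -> float:
    """Eqn (54): draw the key refresh interval uniformly from [t_min, t_max]."""
    return secrets.SystemRandom().uniform(t_min, t_max)


def make_tag(key: bytes, ciphertext: bytes) -> bytes:
    """Eqn (56): tag = H(K || C)."""
    return hashlib.sha256(key + ciphertext).digest()


def verify_tag(key: bytes, ciphertext: bytes, tag: bytes) -> bool:
    """Eqn (57): recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, ciphertext), tag)


# Hypothetical inputs for one sensor node.
biometric = b"ecg-feature-bytes"      # placeholder for serialized features
nonce = secrets.token_bytes(16)       # time-dependent nonce
salt = b"node-07"                     # node-specific salt

key = derive_bio_key(biometric, nonce, salt)
ciphertext = b"opaque-ciphertext"     # produced by an unspecified cipher
tag = make_tag(key, ciphertext)

print(verify_tag(key, ciphertext, tag))          # untouched packet
print(verify_tag(key, ciphertext + b"x", tag))   # tampered packet
print(next_update_interval(5.0, 10.0))           # seconds until next refresh
```

  The sketch follows the concatenate-then-hash form written in the equations. In a deployed system the standard HMAC construction (available in Python's `hmac` module) is the usual choice, because a bare hash of key and message with SHA-256 is open to length-extension attacks, and fields should be length-prefixed so that concatenation is unambiguous.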
  
  [Flowchart: biometric data collection at the WBAN sensor nodes, Bio-key generation, Bio-key update decision (update the Bio-key or continue using the current key), encryption of the biometric data with the selected key, MeAC generation, attachment of the MeAC to the encrypted data, secure data transmission to the hospital server]

Fig.4 Process of the security

VI. EXPERIMENTAL RESULTS

This section discusses the experimental evaluation of the proposed deep reinforcement learning-based energy harvesting and power allocation approach for WBANs. The performance of the proposed methodology is evaluated with both qualitative and quantitative metrics to show its effectiveness across dynamic network conditions. The methodologies included in the framework are historical state learning, mobility-aware forwarder selection, optimal relay node categorization, adaptive power allocation, and energy-efficient routing. The proposed methodology was assessed under various traffic loads, node densities, and mobility scenarios. The results indicate that the proposed framework is suitable for long-term WBAN use, as it shows improvements in several metrics compared with current WBAN approaches.

Simulation Setup

To simulate the proposed WBAN architecture, a realistic body sensor network model is used, consisting of multiple biosensor nodes connected to relay nodes, which are in turn connected to a central sink node. The simulation accounts for the effect of the person's posture on the density of the sensor network. It also considers a variety of other factors, including but not limited to: the residual energy of the nodes; the state of the buffers; the quality of the links between nodes (based on signal strength and coherence); the amount of energy each sensor node and relay node is able to harvest from external sources; and other real-life WBAN processes. Finally, the proposed deep reinforcement learning model determines the best routing strategy for distributing each node's available power based on data collected by the nodes regarding their historic states. To compare the performance of the proposed framework against HCEL, SEBA, and other state-of-the-art techniques under the same simulation environment, five performance metrics were recorded and analyzed. A summary of the simulation parameters and system configurations can be found in Table 2.

TABLE 2

System Specification

Software requirements	OS	Ubuntu 22.04
Software requirements	Development Tool	NS-3.35 and Python 3.10.12
Hardware requirements	RAM	8 GB or above
Hardware requirements	Processor	2.5 GHz or above
Hardware requirements	Storage	10 GB or above

The hardware and software resources listed above were provisioned to run the experiments. We developed the learning and optimization modules of the software component using the NS-3.35 simulation environment under Ubuntu 22.04 with Python 3.10.12. The simulated WBAN needs a minimum of 8 GB of RAM, a 2.5 GHz processor, and 10 GB of free disk space to carry out an accurate performance evaluation of the proposed WBAN framework under dynamic network conditions.

Comparative analysis

The proposed framework uses deep reinforcement learning to harvest energy and to allocate that energy according to an objective function. It is compared against three existing techniques (HCEL, SEBA, and EAFST) in terms of throughput, end-to-end delay, energy consumption, packet delivery ratio, and path loss. The results indicate that the proposed framework reduces path loss compared with the existing techniques and, consequently, extends the communication lifetime and improves reliability in WBANs.

Traffic bit rate (bits/sec) vs. end-to-end delay(s)

End-to-end delay is the total time a data packet takes to travel from the source sensor node to the sink node. WBAN applications require a low end-to-end delay because physiological data must be delivered in real time. The numerical outcomes of the traffic bit rate (bits/sec) vs. end-to-end delay (s) are found in Table 3 and Fig.5.

TABLE 3

Traffic bit rate (bits/sec) vs. end-to-end delay(s)

Traffic bit rate (bits/sec)	HCEL	EAFST	Proposed method
0	0.30	0.25	0.20
2	0.24	0.20	0.15
4	0.22	0.18	0.11
6	0.18	0.14	0.08
8	0.13	0.05	0.02

Fig.5 Numerical outcomes of Traffic bit rate (bits/sec) vs. end-to-end delay(s)

The figure shows that the proposed method provides the lowest end-to-end delay at every traffic bit rate, ranging from 0.02 s to 0.20 s. The EAFST method shows moderate end-to-end delays of 0.05 s to 0.25 s, and the HCEL method has the highest end-to-end delays of 0.13 s to 0.30 s. Consequently, the proposed method effectively reduces transmission latency as traffic increases.

Simulation time (s) vs. Energy consumption (J)

WBAN energy consumption is the aggregate energy consumed by the relay and sensor nodes for processing and data transmission over the simulation period. Lower energy consumption is preferred because it directly determines the continued operation and lifetime of the WBAN. Table 4 and Fig.6 illustrate the numerical outcomes of simulation time (s) vs. energy consumption.

TABLE 4

Simulation time (s) vs. Energy Consumption (J)

Simulation time (s)	SEBA	EAFST	Proposed method
0	75000	70000	48000
5	70000	62000	30000
10	60000	50000	28000
15	55000	40000	22000
20	50000	35000	20000
25	43000	30000	15000

Fig.6 Numerical outcomes of Simulation time (s) vs. Energy consumption (J)

The figure shows that the proposed method consumes less energy than both the EAFST method and SEBA throughout the simulation period. The principal reason is that the proposed method manages the energy harvesting and power allocation processes efficiently, which reduces overall energy waste. In contrast, EAFST and SEBA consume more energy over time because their power control and routing strategies are less adaptive. The proposed method therefore exhibits better energy efficiency than EAFST and SEBA in this WBAN environment.
Number of nodes vs. throughput (kbps)

Throughput is the rate of successful data delivery over the communication link, measured in kilobits per second (kbps). Higher throughput indicates greater network efficiency and better use of the bandwidth in the WBAN. The outcomes of the number of nodes vs. throughput are shown in Table 5 and Fig.7.

TABLE 5

Number of nodes vs. throughput (kbps)

Number of Nodes	SEBA	EAFST	Proposed method
2	25	42	50
4	30	50	55
6	35	54	68
8	40	62	70
10	45	68	85

Fig.7 Numerical outcomes of Number of nodes vs. throughput (kbps)

The results show that the proposed approach achieves higher throughput than EAFST and SEBA and that its throughput keeps increasing with network size: it ranges from 50 kbps to 85 kbps for the proposed method, from 42 kbps to 68 kbps for the EAFST method, and from 25 kbps to 45 kbps for SEBA. These results demonstrate the proposed method's ability to maintain a higher level of data transfer as the network grows.
Number of nodes vs. path loss

In WBANs, path loss quantifies the loss of signal strength between transmitter and receiver. A lower path loss indicates better signal quality and lower power requirements for transmitting that signal. Table 6 and Fig.8 present the numerical outcomes of the number of nodes vs. path loss.

TABLE 6

Number of nodes vs. path loss

Number of Nodes	HCEL	EAFST	Proposed method
2	50	45	40
4	45	38	30
6	42	35	20
8	38	30	15
10	40	20	10

Fig.8 Numerical outcomes of Number of nodes vs. path loss

As shown in the figure, the proposed method has the lowest path loss at all node densities, from 40 dB down to 10 dB, while the EAFST method ranges from 45 dB down to 20 dB and the HCEL method has the highest path loss, between 50 dB and 38 dB. These results show that the proposed method provides better communication reliability and energy efficiency in WBAN settings.
Traffic bit rate (bits/sec) vs. Packet delivery ratio (%)

In WBANs, the packet delivery ratio (PDR) is the ratio of packets correctly delivered at the sink node to packets sent, and thus indicates the reliability and success rate of data transfer. A high PDR ensures reliable communication and accurate health monitoring. Table 7 and Fig.9 present the numerical outcomes of the traffic bit rate (bits/sec) vs. the packet delivery ratio (%).

TABLE 7

Traffic bit rate (bits/sec) vs. packet delivery ratio (%)

Traffic bit rate (bits/sec)	HCEL	EAFST	Proposed method
0	10	20	28
2	25	38	45
4	30	35	40
6	45	30	48
8	40	55	68

Fig.9 Numerical outcomes of Traffic bit rate (bits/sec) vs. Packet delivery ratio (%)

The proposed technique outperforms the other solutions at all tested traffic rates, delivering up to 68% of packets versus up to 55% for the EAFST method and up to 45% for the HCEL method. The proposed technique is therefore a better alternative than the existing methods for WBAN applications, capable of supporting a greater number of active users while maintaining an acceptable level of packet transmission reliability.
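As a quick arithmetic check on the tabulated values, the relative gain of the proposed method over a baseline can be computed row by row. The sketch below uses the throughput values of Table 5 and the packet delivery ratios of Table 7; the helper name `relative_gain` is introduced here purely for illustration.

```python
def relative_gain(proposed, baseline):
    """Percentage improvement of the proposed value over the baseline value."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (proposed - baseline) / baseline


# Table 5: number of nodes vs. throughput (kbps).
nodes = [2, 4, 6, 8, 10]
throughput = {
    "SEBA": [25, 30, 35, 40, 45],
    "EAFST": [42, 50, 54, 62, 68],
    "Proposed": [50, 55, 68, 70, 85],
}

# Table 7: traffic bit rate vs. packet delivery ratio (%).
rates = [0, 2, 4, 6, 8]
pdr = {
    "HCEL": [10, 25, 30, 45, 40],
    "EAFST": [20, 38, 35, 30, 55],
    "Proposed": [28, 45, 40, 48, 68],
}

for n, base, prop in zip(nodes, throughput["EAFST"], throughput["Proposed"]):
    print(f"{n} nodes: throughput gain over EAFST = {relative_gain(prop, base):.1f}%")

for r, base, prop in zip(rates, pdr["EAFST"], pdr["Proposed"]):
    print(f"rate {r}: PDR gain over EAFST = {relative_gain(prop, base):.1f}%")
```

At 10 nodes, for example, the throughput gain over EAFST works out to (85 - 68) / 68 = 25%. The per-row gains vary, so a single headline percentage depends on which row or average is taken.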

Research Summary

This research starts by implementing a WBAN that uses 11 different types of wearable sensors positioned at strategic locations around the body, along with two gateways and a single centralized database server, to facilitate accurate data collection and processing. While the sensors collect data from the body, they also collect various physiological and environmental metrics in real time to reproduce on-body communication behavior accurately. These metrics include Received Signal Strength Indicator (RSSI) measurements, which are stored in an RSSI dataset generated by the WBAN. Using this data, a CLSTM model predicts the state of the network before the actual transmission of data, so that power can be allocated and energy harvested from the body in a way that keeps sufficient energy available across different body postures and mobility scenarios. The best paths for forwarding data are then selected with the IMA-MCDM methodology, which develops reliable and adaptable routes by considering changes in posture, fluctuations in signal quality, and the remaining battery life of the sensor nodes. To select optimal nodes for forwarding packets, the IMFO algorithm explores and exploits the solution search space effectively, thus reducing the number of lost packets and the time needed to resend packets. After that, an MDP combined with the PA-MAC protocol dynamically allocates power to each transmitting node in the network. This protocol adjusts the priority of packet transmissions for each node based on the current network state and the urgency of the transmitted data. Hamilton Energy-Efficient Routing, in conjunction with an LP model, then minimizes the multi-hop propagation delay and the overall energy used by the network. To enhance the security of patient information, a Bio-key is used together with a Message Authentication Code to protect patient records from cybercriminals and tampering. Lastly, performance data on throughput, end-to-end delay, path loss, energy consumption, and packet delivery ratio are collected under a variety of network conditions to support the performance assessment of the system.

VII. CONCLUSION

In this work, we have developed and tested a smart architecture based on deep reinforcement learning for improving wireless body area networks (WBANs). In particular, we address several issues that prevent WBANs from operating optimally under dynamic body movement and varying traffic levels: how to harvest energy, how to allocate transmit power dynamically, how to route data under device mobility, and how to transmit data securely and reliably. A combination of historical state-based learning, optimal relay selection, and adaptive resource management is used to provide WBANs with better performance, energy efficiency, and reliability. A wide range of simulation tests has shown that the proposed technique improves important performance measures such as packet delivery ratio, end-to-end delay, path loss, energy consumption, and throughput compared with other methods, including EAFST, SEBA, and HCEL. The improved performance indicates that the proposed framework is appropriate for long-term, dependable, energy-conserving healthcare monitoring applications. Research is under way to confirm and validate the proposed framework on real-world hardware that uses wearable sensors. This research may include the integration of federated or distributed learning techniques to improve privacy and scalability, and it will incorporate edge-assisted intelligence for real-time medical decision making. Future research may also examine multi-patient WBAN scenarios as well as heterogeneous sensor deployments. Maximizing the performance of the ultra-low-power devices used in wearables will require considerably more research and development effort.

References

Demrozi, F., Turetta, C., Kindt, P. H., Chiarani, F., Bacchin, R. A., Valè, N., … & Pravadelli, G. (2023). A low-cost wireless body area network for human activity recognition in healthy life and medical applications. IEEE Transactions on Emerging Topics in Computing, 11(4), 839-850.
Olatinwo, D. D., Abu-Mahfouz, A. M., Hancke, G. P., & Myburgh, H. C. (2023). Energy-efficient multichannel hybrid MAC protocol for IoT-enabled WBAN systems. IEEE Sensors Journal, 23(22), 27967-27983.
Mokhtar, B., Kandas, I., Gamal, M., Omran, N., Hassanin, A. H., & Shehata, N. (2023). Nano-enriched self-powered wireless body area network for sustainable health monitoring services. Sensors, 23(5), 2633.
Ahmad, N., Shahzad, B., Arif, M., Izdrui, D., Ungurean, I., & Geman, O. (2022). An energy-efficient framework for WBAN in health care domain. Journal of Sensors, 2022(1), 5823461.
Selvaprabhu, P., Chinnadurai, S., Tamilarasan, I., Venkatesan, R., & Kumaravelu, V. B. (2022). Priority-based resource allocation and energy harvesting for WBAN smart health. Wireless Communications and Mobile Computing, 2022(1), 8294149.
Mahapatra, R., Sethi, D., & Mishra, K. (2025). Enhancing Healthcare with WBAN and Digital Twins: A Machine Learning Approach for Predictive Health Monitoring. IEEE Access.
Memon, S., Wang, J., Ahmed, A., Rajab, A., Al Reshan, M. S., Shaikh, A., & Rajput, M. A. (2023). Enhanced probabilistic route stability (EPRS) protocol for healthcare applications of WBAN. IEEE Access, 11, 4466-4477.
Pan, Y., Ren, Y., & Mu, J. (2024). Deep Incremental Learning-Driven Human Body Channel Prediction With Adaptive Relay Selection for Enhanced WBAN Performance. IEEE Access.
Preethichandra, D. M. G., Piyathilaka, L., Izhar, U., Samarasinghe, R., & De Silva, L. C. (2023). Wireless body area networks and their applications: A review. IEEE Access, 11, 9202-9220.
Boumaiz, M., Ghazi, M. E., Bouayad, A., Balboul, Y., & El Bekkali, M. (2025). Energy-efficient strategies in wireless body area networks: A comprehensive survey. IoT, 6(3), 49.
Olatinwo, D. D., Abu-Mahfouz, A. M., Hancke, G. P., & Myburgh, H. C. (2023). Energy efficient priority-based hybrid MAC protocol for IoT-enabled WBAN systems. IEEE Sensors Journal, 23(12), 13524-13538.
Herculano, J., Pereira, W., Guimarães, M., Cotrim, R., de Sá, A., Assis, F., … & Gorender, S. (2024). MAC approaches to communication efficiency and reliability under dynamic network traffic in wireless body area networks: a review. Computing, 106(8), 2785-2809.
Arafat, M. Y., Pan, S., & Bak, E. (2024). An adaptive reinforcement learning-based mobility-aware routing for heterogeneous wireless body area networks. IEEE Sensors Journal.
Mohammadi, R., & Shirmohammadi, Z. (2024). Optimizing energy harvesting in wireless body area networks: A deep reinforcement learning approach to dynamic sampling. Alexandria Engineering Journal, 109, 157-175.
Chen, Y., Han, S., Chen, G., Yin, J., Wang, K. N., & Cao, J. (2023). A deep reinforcement learning-based wireless body area network offloading optimization strategy for healthcare services. Health Information Science and Systems, 11(1), 8.
Hu, J., Xu, G., Hu, L., & Li, S. (2023). A cooperative transmission scheme in radio frequency energy-harvesting WBANs. Sustainability, 15(10), 8367.
Abdullah, A. M. (2024). Energy-efficient aware and predicting bandwidth estimation routing protocol for hybrid communication in wireless body area networks. Cluster Computing, 27(4), 4187-4206.
Jing, Y., Peng, H., & Liu, Z. (2024). WBNN: a weight-based next neighbor selection algorithm for wireless body area network. Soft Computing, 28(2), 1803-1818.
Kamruzzaman, M. M., & Alruwaili, O. (2022). Energy efficient sustainable wireless body area network design using network optimization with smart grid and renewable energy systems. Energy Reports, 8, 3780-3788.
Dhanvijay, M. M., & Patil, S. C. (2022). Energy aware MAC protocol with mobility management in wireless body area network. Peer-to-peer Networking and Applications, 15(1), 426-443.
Hu, J., Xu, G., Hu, L., Li, S., & Xing, Y. (2022). An adaptive energy efficient MAC protocol for RF energy harvesting WBANs. IEEE Transactions on Communications, 71(1), 473-484.
Samal, T., & Kabat, M. R. (2022). A prioritized traffic scheduling with load balancing in wireless body area networks. Journal of King Saud university-computer and information sciences, 34(8), 5448-5455.
Abdullah, A. M. (2024). An efficient energy-aware and reliable routing protocol to enhance the performance of wireless body area networks. The Journal of Supercomputing, 80(10), 14773-14798.
Mekathoti, V. K., & Nithya, B. (2025). Superframe contention slot scheduling (SCSS): deep reinforcement learning-based time slot allocation for wireless body area network. Telecommunication Systems, 88(1), 35.
Sarang, S., Stojanović, G. M., Drieberg, M., Stankovski, S., Bingi, K., & Jeoti, V. (2023). Machine learning prediction based adaptive duty cycle MAC protocol for solar energy harvesting wireless sensor networks. IEEE Access, 11, 17536-17554.
Umeonwuka, O. O., Adejumobi, B. S., & Shongwe, T. (2024). Deep learning-assisted energy prediction modeling for energy harvesting in wireless cognitive radio devices. IEEE Access, 12, 8700-8720.
Ayeesha Nasreen, M., & Ravindran, S. (2022). Energy saving mechanism using extensive game theory technique in wireless body area network (ES-EG). In Computational Vision and Bio-Inspired Computing: Proceedings of ICCVBIC 2021 (pp. 431-448). Singapore: Springer Singapore.
Mohammadi, R., & Shirmohammadi, Z. (2024). Optimizing energy harvesting in wireless body area networks: A deep reinforcement learning approach to dynamic sampling. Alexandria Engineering Journal, 109, 157-175.
Helal, H., Sallabi, F., Sharaf, M. A., Harous, S., Hayajneh, M., & Khater, H. (2024). HCEL: hybrid clustering approach for extending WBAN lifetime. Mathematics, 12(7), 1067.
Ahmad, N., Awan, M. D., Khiyal, M. S. H., Babar, M. I., Abdelmaboud, A., Ibrahim, H. A., & Hamed, N. O. (2022). Improved QoS aware routing protocol (IM-QRP) for WBAN based healthcare monitoring system. IEEE Access, 10, 121864-121885.
Olatinwo, D. D., Abu-Mahfouz, A. M., Hancke, G. P., & Myburgh, H. C. (2024). Markov decision process based energy aware MAC protocol for IoT WBAN systems. IEEE Sensors Journal.
Anbarasan, H. S., & Natarajan, J. (2022). Blockchain based delay and energy harvest aware healthcare monitoring system in WBAN environment. Sensors, 22(15), 5763.
Sharone, M., & Muhtaroglu, A. (2023). Patient-Centered design method for self-powered and cost-optimized health monitors. IEEE Access, 11, 125055-125063.
Delgado-Vargas, K. A., Gallegos-Garcia, G., & Escamilla-Ambrosio, P. J. (2023). Cryptographic protocol with keyless sensors authentication for WBAN in healthcare applications. Applied Sciences, 13(3), 1675.
Pradeep, R., & Kavithaa, G. (2024). Network lifetime optimization and route selection strategy towards energy enrichment in wireless body area networks. Peer-to-Peer Networking and Applications, 17(3), 1158-1168.
Pradhan, P. P., Revanthkumar, V., & Bhattacharjee, S. (2024). Energy aware forwarder selection in wireless body area networks to enhance stability and lifetime. Wireless Networks, 1-13.
Samal, T. K., Patra, S. C., & Kabat, M. R. (2022). An adaptive cuckoo search based algorithm for placement of relay nodes in wireless body area networks. Journal of King Saud University-Computer and Information Sciences, 34(5), 1845-1856.
Samal, T., & Kabat, M. R. (2022). Energy-efficient time-sharing multichannel MAC protocol for wireless body area networks. Arabian Journal for Science and Engineering, 47(2), 1791-1804.
Guo, W., Wang, Y., Gan, Y., & Lu, T. (2022). Energy efficient and reliable routing in wireless body area networks based on reinforcement learning and fuzzy logic. Wireless Networks, 28(6), 2669-2693.
Divya, S., Prema, K. V., & Muniyal, B. (2023). Efficient key generation techniques for wireless body area network. International Journal of Wireless Information Networks, 30(3), 270-281.

I. INTRODUCTION

Lack of Prior Information: Existing approaches do not consider past states, which limits predictive accuracy and adaptability in WBANs.

Neglects Sensor Node Mobility: Current studies do not account for sensor node movement caused by postural changes. This movement degrades a node's ability to choose its best forwarder and therefore hinders successful data transmission.

Unpredictable Node Traffic Management: Node traffic is unstable, so static channel and time-frame allocation is ineffective and results in suboptimal resource utilization.

High Propagation Delay: In previous research, multi-hop communication experiences a high propagation delay.

Energy Constraints: WBANs are inherently limited by the small energy capacity of their nodes and by the volatility of their data transfer; both issues contribute to an unstable network.
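
The traffic-management gap listed above can be illustrated with a small numerical sketch. The code below is not the PA-MAC or MDP scheme of this paper; it is a minimal example, under assumed values, of why a fixed equal split of time slots wastes capacity when one node is bursty. The function names, the queue lengths in `demands`, the one-packet-per-slot rule, and the frame size `total_slots` are all hypothetical choices made for illustration.

```python
def static_allocation(num_nodes, total_slots):
    """Split the frame evenly across nodes, ignoring per-node traffic."""
    base, extra = divmod(total_slots, num_nodes)
    return [base + (1 if i < extra else 0) for i in range(num_nodes)]


def demand_aware_allocation(demands, total_slots):
    """Assign slots in proportion to queued packets (largest-remainder rounding)."""
    total_demand = sum(demands)
    if total_demand == 0:
        # No traffic information: fall back to the even split.
        return static_allocation(len(demands), total_slots)
    quotas = [d * total_slots / total_demand for d in demands]
    slots = [int(q) for q in quotas]
    leftover = total_slots - sum(slots)
    # Hand the remaining slots to the nodes with the largest fractional quota.
    order = sorted(range(len(demands)),
                   key=lambda i: quotas[i] - slots[i], reverse=True)
    for i in order[:leftover]:
        slots[i] += 1
    return slots


def served_packets(demands, slots):
    """Assume one packet per slot; a node cannot send more than it has queued."""
    return sum(min(d, s) for d, s in zip(demands, slots))


# Hypothetical queue lengths: one bursty sensor and three lightly loaded ones.
demands = [9, 1, 1, 1]
total_slots = 8

static = static_allocation(len(demands), total_slots)
adaptive = demand_aware_allocation(demands, total_slots)

print("static  :", static, "served:", served_packets(demands, static))
print("adaptive:", adaptive, "served:", served_packets(demands, adaptive))
```

With these assumed numbers the even split delivers 5 packets in an 8-slot frame, because three slots assigned to lightly loaded nodes stay idle, while the proportional split fills all 8 slots. A learning-based allocator would additionally have to predict demand rather than observe it directly, which this sketch does not attempt.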

II. LITERATURE SURVEY

TABLE 1

Summary of existing works

III. PROBLEM STATEMENT

IV. SYSTEM MODEL

V. PROPOSED METHODOLOGY

Fig.1 Overall System Architecture

Data Collection

Fig.2 Data collection

Priority of sensor nodes

Mobility of Sensor nodes under postural changes

Fig.3 Postural changes on mobility sensor nodes

Classifying the Optimal Relay Nodes

Dynamic power Allocation

Energy harvesting based routing in WBAN

Security

TABLE 2

System Specification

Fig.4 Process of the security

VI. EXPERIMENTAL RESULTS

Simulation Setup

Comparative analysis

TABLE 3

Traffic bit rate (bits/sec) vs. end-to-end delay(s)

Fig.5 Numerical outcomes of Traffic bit rate (bits/sec) vs. end-to-end delay(s)

TABLE 4

Simulation time (s) vs. Energy Consumption (J)

Fig.6 Numerical outcomes of Simulation time (s) vs. Energy consumption (J)

TABLE 5

Number of nodes vs. throughput (kbps)

Fig.7 Numerical outcomes of Number of nodes vs. throughput (kbps)

TABLE 6

Number of nodes vs. path loss

Fig.8 Numerical outcomes of Number of nodes vs. path loss

TABLE 7

Traffic bit rate (bits/sec) vs. packet delivery ratio (%)

Fig.9 Numerical outcomes of Traffic bit rate (bits/sec) vs. packet delivery ratio (%)

Research Summary

VII. CONCLUSION

References