Efficient Approach for Resource Provisioning to Manage Workload in Cloud Environment

Cloud computing is a paradigm that applies parallel or distributed computing, or both, to manage its resources. The cloud is built from physical or virtualized resources hosted in centralized or distributed large-scale data centers. Provisioning the resources in a data center involves scheduling tasks so as to enhance the performance of cloud services at runtime. Existing resource provisioning algorithms improve the CPU utilization and turnaround time of cloud resources, but at the cost of latency and energy overhead. This work addresses task scheduling for workload management in a large-scale cloud computing environment, employing data-intensive applications as the experimental workload. An Efficient Approach for Resource Provisioning (EARP) model is developed for task scheduling by employing Dynamic Voltage and Frequency Scaling (DVFS). The EARP model aims to minimize processing time and energy consumption by effectively utilizing system resources.


I. INTRODUCTION
Ever since its invention, the Internet has revolutionized the way people access information and communicate with each other. It has evolved from an information-access system into a set of service-oriented computing platforms; one such platform is cloud computing. Cloud computing, as a business administration and computation framework, has attracted increasing attention from both industry and academia. It is modeled by combining a huge number of computational resources into a shared resource pool, thereby achieving large-scale and efficient resource usage over the Internet with minimal cost and management effort.
Resource provisioning involves mapping and scheduling tasks onto virtual machines, which in turn are mapped and scheduled onto physical servers [22, 23]. A Cloud Data Center (CDC) maintains physical and virtual resources with diverse capabilities: processors, memory, network devices, disks, cooling equipment, and so on. The operational cost of CDCs has become a challenge, and minimizing their energy consumption is of vital importance. Cloud resources are managed by scheduling tasks through workflow and workload scheduling techniques.
Workflow scheduling has been widely used for large-scale data-intensive processing and for scientific application services deployed in cloud environments. A workflow is established by tasks and sub-tasks together with the data dependencies among them. It can be decomposed into a Directed Acyclic Graph (DAG) in which nodes correspond to sub-tasks and edges denote the dependencies among sub-tasks. Many grid workflow execution frameworks, such as ASKALON and Pegasus, support workload execution on cloud computing environments [16].
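The DAG structure described above can be made concrete with a small sketch. The following illustrative Python snippet (not part of the paper's implementation) represents a workflow as task and edge sets and derives a dependency-respecting execution order via Kahn's topological sort:

```python
from collections import defaultdict, deque

def topological_order(tasks, edges):
    """Return a valid execution order for a DAG workflow.

    tasks: iterable of task names; edges: list of (parent, child)
    dependency pairs. Raises ValueError if the graph has a cycle.
    """
    indegree = {t: 0 for t in tasks}
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    # tasks with no unfinished predecessors are ready to run
    ready = deque(t for t in tasks if indegree[t] == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for c in children[t]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    if len(order) != len(tasks):
        raise ValueError("dependency graph contains a cycle")
    return order
```

Any scheduler operating on such a workflow must dispatch tasks consistently with this partial order.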
Cloud infrastructure services are among the most widely used cloud service designs; they offer users the ability to provision Computational Nodes (CN), generally called instances, within the cloud environment. A cloud user can access a virtually boundless number of CNs, which significantly reduces the total cost of ownership for processing the workload [1]. In general, these services are provisioned under a Service Level Agreement (SLA) constraint that defines and characterizes the Quality of Service (QoS); the cloud service provider can thus charge its customers according to the QoS requested and the amount of time consumed. Discovering the right methods for allocating tasks in a large-scale cloud environment is therefore a challenging problem.
In the workload scheduling process, users submit their jobs to the cloud scheduler. The scheduler queries the cloud information service for the status and properties of the available resources and allocates tasks to resources according to the task requirements; it may also assign multiple user tasks to multiple virtual machines. Workload scheduling can be performed in different ways based on different parameters: tasks can be statically allocated to resources at compile time or dynamically allocated at runtime. A good workflow scheduling algorithm improves the CPU utilization, turnaround time, and cumulative throughput of cloud resources. However, very limited work employs DVFS while meeting the QoS constraints of workload scheduling. This research presents an efficient resource provisioning model that minimizes processing time and energy consumption by employing DVFS to utilize system resources effectively in a cloud computing environment.
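As an illustration of the task-to-VM mapping described above, the sketch below implements a simple greedy heuristic (longest task first onto the earliest-free VM). This is a generic baseline for intuition only, not the scheduling policy proposed in this paper:

```python
import heapq

def schedule_tasks(task_lengths, vm_speeds):
    """Greedily assign independent tasks to the VM that will finish
    them earliest, tracked with a min-heap of VM free times.

    task_lengths: task sizes (e.g. million instructions);
    vm_speeds: processing rate of each VM.
    Returns (assignment, makespan), where assignment holds
    (task_id, vm_id, finish_time) triples.
    """
    # heap entries: (time the VM becomes free, vm index)
    heap = [(0.0, i) for i in range(len(vm_speeds))]
    heapq.heapify(heap)
    assignment = []
    # dispatching longest tasks first tends to shorten the makespan
    for tid, length in sorted(enumerate(task_lengths),
                              key=lambda x: -x[1]):
        free_at, vm = heapq.heappop(heap)
        finish = free_at + length / vm_speeds[vm]
        assignment.append((tid, vm, finish))
        heapq.heappush(heap, (finish, vm))
    makespan = max(f for _, _, f in assignment)
    return assignment, makespan
```

A dynamic scheduler would apply the same mapping step at runtime as tasks arrive, rather than over a fixed batch.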
• The EARP model schedules tasks so as to minimize the execution time and power consumption of scientific workflow execution.
The rest of the paper is organized as follows: Section II discusses the literature survey. Section III defines the workload scheduling problem. Section IV presents the proposed Efficient Approach for Resource Provisioning (EARP) for dynamic workload execution in a cloud computing environment. Section V presents the experimental analysis. The last section concludes the work and outlines future research directions.

II. LITERATURE SURVEY
In recent times, a great deal of work has been carried out on the workflow scheduling problem in homogeneous computing environments. Adami et al. [2] address the major issue of how to allocate virtual computing resources dynamically, based on application QoS prerequisites, energy, and cost savings, by optimizing the number of computing servers in use. To address this problem, they present a cost-effective and dynamic virtual computing node allocation design that employs the existence of a Nash equilibrium. Dasgupta et al. [3] employ a Genetic Algorithm (GA) to trade off load balancing against makespan reduction. However, neither [2] nor [3] considers minimizing the energy consumed in executing the workload, which induces a higher execution cost. To address the energy-efficiency, performance, and reliability requirements of modern big data processing frameworks, Doppa et al. [4] consider a self-aware multi-core framework that autonomously optimizes performance parameters, allowing computation to proceed dynamically in accordance with user QoS/SLA prerequisites, resource availability, energy constraints, and performance requirements. This adaptation spans from the application level (scheduling and task mapping) down to the core level (power gating and DVFS).
Further, large-scale distributed computing environments such as clouds, comprising collections of Virtual Computing Machines (VCM) or processing cores, offer data storage and computing strategies at large scale [5]. These strategies incur huge computational expense and impact the environment, owing to high energy dissipation at the various stages of storage and computation [6]. K. Li [7] observes that the fastest supercomputer in China, comprising sixteen thousand nodes, consumes about 17,808 kilowatts (kW) of power. Energy dissipation is a noteworthy issue that influences the development and utilization of computational frameworks. In large-scale cloud environments, a parallel application with precedence-constrained jobs is described by a Directed Acyclic Graph (DAG), in which the nodes represent the jobs and the edges the messages exchanged among jobs [8], [9], [10]. Numerous recent studies aim to limit energy dissipation while fulfilling the jobs' SLA requirements [11], [12]. Nonetheless, these studies are confined to independent jobs. As large-scale frameworks continue to improve, DAG-based workflows with precedence-constrained jobs grow in size.
The problem of scheduling jobs on multiprocessing environments is NP-hard [13]. Various meta-heuristic methods, for example Genetic Algorithms (GA), Ant Colony Optimization (ACO), and simulated annealing, are broadly utilized in DAG-based data-intensive and scientific workflow scheduling [14], [15], [16]. These methods generally produce superior scheduling quality compared with heuristic methods, but at the cost of poor search efficiency and frequent strategy recomputation [17]. Khorramnejad et al. [18] discuss improving performance, in terms of reducing the computation cost of processing multimedia information, by combining merging, prefetching, and workflow scheduling. They model the processing cost, response time minimization, and optimization problems, and, considering cost, response time, computational resource allocation, and queueing stability limitations, present a heuristic workload scheduling method. Chunlin and Jianhang [19] show that significant challenges exist for workload scheduling on hybrid cloud platforms: multiple Cloud Service Providers (CSP), large-scale workloads, and how to deploy and port services to the cloud environment within a minimal fiscal budget. To assure superior resource utilization, they present large-scale workload scheduling in a private cloud environment; in addition, to assure that workload execution completes within the deadline constraint, they present a workload scheduling mechanism using a Back Propagation Neural Network (BPNN) for the hybrid cloud environment. Junlong Zhou et al. [20] propose workflow scheduling designs that consider the fiscal budget and computation time on a hybrid cloud platform. Their first scheduling model uses a single-objective (SO) function, the Deadline-Constrained Cost Optimization (DCOH) model, which reduces the fiscal budget of workload scheduling subject to a deadline prerequisite.
Second, they present a multi-objective (MO) workload scheduling method, the Multi-Objective Optimization Method (MOH), designed to trade off fiscal budget against execution time when scheduling workload execution. However, these models are not efficient at minimizing the energy of scientific workflow execution in large-scale computing environments.
In the cloud Infrastructure-as-a-Service market, customers procure cloud services offered by CSPs to carry out their workload executions. Every workload generally comprises certain deadline prerequisites for guaranteeing QoS, and poor QoS imposes strict penalties on the CSP. The CSP generally charges its customers according to the QoS requested and the execution time. Therefore, to earn better profitability with assured QoS, minimizing execution time while assuring the QoS of workload execution is a major objective for CSPs. However, existing workload scheduling algorithms assume that the makespan (i.e., computation time) of the tasks in a data-intensive workload application is fixed; this hypothesis generally does not hold under actual environmental conditions [20]. Moreover, Cloud Servers (CS) have already started to support DVFS, which existing workload scheduling models do not employ [20].
For DVFS-enabled CSs, reducing the cost, energy, and execution time of workload execution without affecting system performance remains an open challenge, which motivates the model proposed in this work.

III. WORKLOAD SCHEDULING PROBLEM DEFINITION
In this section the workload scheduling problem is described. A general way of describing a workload is to use a DAG: a workflow is a DAG. A workload is represented as a DAG W = (T, D), where T denotes the task set, T = {t_1, t_2, ..., t_n}, and D denotes the data or task-control dependencies, with (t_i, t_j) ∈ D when t_j depends on t_i. The weight given to each task denotes its reference makespan, i.e., the time to process the task on a virtual computing node with a certain configuration, and the edge weights denote the amount of data, in bits, to be transferred between tasks. The reference makespan of t_i is denoted w(t_i), and the amount of data to be transmitted from t_i to t_j is denoted d(t_i, t_j). In addition, the set of all predecessors of a task t_i is defined as pred(t_i) = {t_j | (t_j, t_i) ∈ D}. For a given workload, t_entry denotes the entry task, satisfying pred(t_entry) = ∅, and t_exit denotes the exit task, satisfying that there exists no t_j ∈ T with t_exit ∈ pred(t_j).
The majority of existing workload scheduling methods require a DAG with a single t_entry and a single t_exit. This can be satisfied by adding a pseudo t_entry, and likewise a pseudo t_exit, with zero weight to the DAG. Thus, this work considers that each workload used possesses a single t_entry and a single t_exit [16]. Finally, the workload scheduling in the next section aims to minimize the processing energy and communication energy by employing DVFS.
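The pseudo-entry/exit construction can be sketched as follows; the function name and edge representation are illustrative, not taken from the paper:

```python
def add_pseudo_tasks(weights, edges):
    """Ensure the DAG has a single entry and a single exit task by
    adding zero-weight pseudo tasks, as assumed by the EARP model.

    weights: dict task -> reference makespan; edges: list of
    (parent, child, data_bits). Returns new (weights, edges) with
    pseudo tasks added only when needed.
    """
    tasks = set(weights)
    has_parent = {c for _, c, _ in edges}
    has_child = {p for p, _, _ in edges}
    entries = sorted(tasks - has_parent)  # tasks with no predecessor
    exits = sorted(tasks - has_child)     # tasks with no successor
    weights = dict(weights)
    edges = list(edges)
    if len(entries) != 1:
        weights["t_entry"] = 0.0
        edges += [("t_entry", t, 0) for t in entries]
    if len(exits) != 1:
        weights["t_exit"] = 0.0
        edges += [(t, "t_exit", 0) for t in exits]
    return weights, edges
```

Because the pseudo tasks carry zero weight and zero-bit edges, they leave the schedule's makespan and energy unchanged.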

IV. AN EFFICIENT APPROACH FOR RESOURCE PROVISIONING OF DYNAMIC WORKLOAD EXECUTION IN CLOUD COMPUTING ENVIRONMENT
In this section an Efficient Approach for Resource Provisioning (EARP) for dynamic workload execution in a cloud computing environment is proposed. The model employs the DVFS technique to schedule large-scale data-intensive workloads in the cloud. Task scheduling in EARP is handled by employing distinct frequencies, each with a respective time slot, for every computing node. The proposed EARP scheduling model is described in Algorithm 1.

Algorithm 1: Efficient Approach for Resource Provisioning of dynamic workload execution in a cloud computing environment.
In the EARP model, the load is balanced by executing tasks equally across the computing nodes, as given in Eq. (1), where L denotes the resultant processed information in bits, n the number of servers (computing nodes), and b the information block size; each computing node (CN) is a processor offering a number of discrete frequencies ranging between its highest and lowest frequency. The end-point association link bandwidth is given in Eq. (2), where each computing node interacts with the scheduler via a traffic-free, reliable connection with a given bandwidth rate.
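As a rough illustration of the block-based load balancing idea (the paper's exact Eq. (1) is not reproduced here, so the scheme below is an assumption), the following sketch splits an arriving load into fixed-size blocks and deals them across the computing nodes:

```python
def split_load(total_bits, block_bits, n_nodes):
    """Split an arriving workload into fixed-size blocks and deal
    them round-robin across the computing nodes, so each node
    receives a nearly equal share of the total bits.

    Returns a list giving the bits assigned to each node.
    """
    n_blocks = -(-total_bits // block_bits)  # ceiling division
    per_node = [0] * n_nodes
    for b in range(n_blocks):
        # the last block may be smaller than block_bits
        bits = min(block_bits, total_bits - b * block_bits)
        per_node[b % n_nodes] += bits
    return per_node
```

Per-node shares never differ by more than one block, which keeps the subsequent per-node frequency selection balanced.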
Combining Eq. (1) and Eq. (2) yields Eq. (3), in which the two terms represent, respectively, the total computation time and the interaction energy of a computing node for the permitted block processing and interaction time under the deadline constraint. The interaction energy consumption depends on the one-way interaction delay, expressed in Eq. (4), which arises over the end-to-end virtual connections. Under the DVFS technique, the operating frequency of every computing node lies within a small range of discrete frequencies. The optimal operating frequency can be selected by switching the CPU frequencies of the computing nodes over various possible time periods. However, the existence of discrete frequencies gives rise to a non-convex problem, which is addressed as discussed below.
First, let every computing node switch from its current discrete frequency to the succeeding discrete frequency to finish its task load. The time is thus distributed into K + 1 distinct, initially unknown time variables; therefore, for the K discrete frequencies of each computing node, the corresponding time slots are unknown. Every component of the time vector defines the length of the period during which the computing node operates at the corresponding frequency. The system keeps a record of the working servers so that it can assign to them the next tasks arriving from the gateway; this information must be forwarded across all information processing centers and servers so that the average energy consumption can be reduced by minimizing the execution time. The problem can therefore be expressed as Eq. (5), where η and ℂ denote the active gate percentage and the effective load capacitance, respectively. Equation (5) defines the combined computation and interaction energy cost, in which the cost of switching frequencies under the arriving task load is also considered, subject to satisfying Eqs. (6), (7), (8), and (9).
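The per-node energy accounting behind Eq. (5) can be illustrated with the widely used cubic dynamic-power model, P = η·C·f³; the function below is an illustrative assumption, not the paper's exact formulation:

```python
def dvfs_energy(freqs, slots, eta, cap):
    """Computation energy for one node that runs slot k at discrete
    frequency freqs[k] for slots[k] seconds, under the cubic
    dynamic-power model P = eta * cap * f**3 (eta: active gate
    percentage, cap: effective load capacitance).

    Returns (energy, cycles): total energy consumed and total work
    completed (frequency x time, summed over the slots).
    """
    energy = sum(eta * cap * f ** 3 * t for f, t in zip(freqs, slots))
    cycles = sum(f * t for f, t in zip(freqs, slots))  # work done
    return energy, cycles
```

Because energy grows cubically with frequency while completed work grows only linearly, stretching execution over lower-frequency slots saves energy whenever the deadline permits, which is the lever the EARP formulation exploits.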
Equation (6) indicates that the sum of the products of each CN's computing rate with its respective time slot must equal the arriving task load. Equations (7) and (8) introduce a factor representing the maximum time permitted for processing: the total computation and interaction time under the deadline constraint is split into two parts, shown in Eq. (7) and Eq. (8), respectively.
Equation (7) represents the computational cost.
Equation (8) represents the interaction cost.
Equation (9) states that the volume of information transferred through an information processing center must not surpass the total capacity of the information network center. This constraint provides end-to-end bandwidth load matching and also fine-tunes each computing node's bandwidth according to its assigned task load.
Further, to reduce the non-convexity, this work divides the energy into two components, computation cost and interaction cost, which can be scheduled separately to achieve efficient scheduling and execution and hence minimize the energy consumption. The computational optimization problem is expressed in Eq. (10). From the above observations, Eq. (10) is linear in the control parameter and can be solved using Eq. (6) and Eq. (7). Similarly, the interaction-aware non-convex optimization problem is expressed in Eq. (11), and can be addressed using Eq. (8) and Eq. (9), or alternatively via Eq. (12). The overall optimization problem in Eq. (3) is then addressed using Eq. (13). In this way all the energy components are optimized while the performance of the model is maintained at a very high level. Hence the tradeoff between performance and energy consumption is achieved by the proposed EARP, as demonstrated experimentally in the next section.
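The decomposition into a separately solvable computation sub-problem can be illustrated as follows: for a single node with discrete DVFS frequencies, the energy-minimal feasible choice under a cubic power model is simply the lowest frequency that meets the deadline. This sketch illustrates the principle only; it is not the paper's Eq. (10) solver:

```python
def pick_frequency(load_cycles, freqs, deadline):
    """Solve the per-node computation sub-problem: among the node's
    discrete DVFS frequencies, pick the lowest one that still
    finishes `load_cycles` of work within `deadline` seconds (lower
    f means lower f**3 power, hence lower energy for the same work).

    Returns the chosen frequency, or None if even the highest
    frequency misses the deadline.
    """
    for f in sorted(freqs):
        if load_cycles / f <= deadline:
            return f
    return None
```

Solving this independently per node, with the interaction sub-problem handled on the bandwidth side, mirrors the two-event decomposition described above.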

V. RESULTS AND ANALYSIS
The proposed EARP and the existing resource provisioning algorithms for workload execution in a cloud environment are simulated using CloudSim. The simulation scenario compares the proposed EARP with the existing scheduling algorithms [5], [6], [16], [20] on the basis of energy consumed and execution time. The work considers various data sets, data centers, and virtual machines allocated to different hosts. The analysis shows that finding ways to allocate task loads to every embedded processor, and to effectively decrease the energy consumption of each processor, is of essential significance. In recent times the demand for embedded processors has risen sharply, owing to the widespread use of digital instruments, network tools, portable gadgets, information devices, and the like, which employ techniques such as multimedia signal processing; superior performance of embedded devices has therefore become a mandatory requirement. These processors come with two major drawbacks affecting their efficiency: first, they consume a high amount of power; second, they lack balance between performance requirements and power consumption. Therefore, in this section various results on performance and power consumption are demonstrated for the existing techniques and the proposed DVFS-based EARP [6]. Execution time is evaluated for 30, 50, 100, and 1000 jobs, and graphs are plotted against time, number of jobs, power consumption, and other parameters. The model is tested on the Inspiral scientific dataset because it is a data-intensive workflow characterized by CPU-intensive tasks that require an enormous amount of memory.
The Inspiral workflow is used for analyzing information collected from the coalescence of compact binary systems such as black holes and binary neutron stars [21]. A sample structure of the Inspiral workflow is shown in Fig. 1. The proposed model is implemented in the Java programming language and deployed on a 64-bit quad-core processor with 8 GB RAM running the Windows 10 operating system. Fig. 4 shows the performance comparison of the average execution time attained by EARP over the existing resource allocation models [16], [20]. The first experiment is carried out against an existing scheduling model that aims to achieve a good tradeoff between energy minimization and meeting the workload QoS constraint; the EARP model achieves much better performance than the existing DVFS model [6], with an execution time improvement of 83.4% over the existing DVFS-based workload scheduling model [6]. Further experiments evaluate the average execution time of each task of the Inspiral workflow using EARP versus the existing workflow scheduling models [6], [16], [20]. The results show that EARP improves processing time by 30.29% over the workload scheduling model of [6] and by 88.41% over the models of [16] and [20]. EARP also reduces the energy (i.e., power) consumption of scientific workflow execution by 41.35% compared with the existing DVFS-based workload scheduling model [6]. Overall, the proposed EARP model achieves good tradeoffs between minimizing execution time and energy for executing CPU- and memory-intensive tasks.

VI. CONCLUSION
This paper presents a survey of various state-of-the-art QoS- and energy-efficient real-time dynamic large-scale workload scheduling algorithms. From the analysis it can be concluded that finding ways to allocate task loads to every processor, and to effectively decrease the energy consumption of each processor, is of essential significance. Therefore, a solution to the difficulty of achieving a tradeoff between performance and energy consumption for virtual machines in a cloud environment is provided via the Efficient Approach for Resource Provisioning (EARP) based on the DVFS technique. The results are demonstrated in terms of execution time and the reduction in power consumption required by the processors. From the results it can be seen that the EARP model achieves much better performance than the existing workload scheduling models.