Workflow Scheduling Algorithms in Cloud Computing

Download Full-Text PDF Cite this Publication

Text Only Version

Workflow Scheduling Algorithms in Cloud Computing

Savio Vaz, Alisha Crystal DAlmeda ,Santhosh B

Dept. of MCA, AIMIT, St Aloysius College(Autonomous), Mangalore

Abstract – Cloud computing is a new terminology which is achieved by Distributed, Parallel and Grid computing. Cloud computing provides different types of resources like hardware and software as services via internet. Efficient task scheduling mechanism can meet users requirements, and improve the resource utilization, thereby enhancing the overall performance of the cloud computing environment. In this paper we have considered workflow scheduling and conducted comparative analysis of workflow scheduling algorithms. Here Tasks require minimum completion time, better performance and total cost of execution is considered.

  1. INTRODUCTION

    Cloud computing now is known as a provider of dynamic, scalable on demand, virtualized resources over the internet. It also makes it possible to access applications and associated data from anywhere. Companies are able to rent resources from cloud for storage and other computational purposes so that their infrastructure cost can be reduced significantly. They can also make use of software, platforms and infrastructures as a services, based on pay- as-you-go model.

  2. INTRODUCTION TO WORKFLOW MANAGEMENT SYSTEM

    Workflow Management System (WMS) is used for management of executing the workflow tasks on the computing resources. The major components of WMS are shown in Figure 1.

    Figure 1: Elements of a WMS [1]

    Workflow design describes how components of workflow can be defined and composed. The elements of workflow design as shown in Figure 2.

    Figure 2: Taxonomy of Workflow Design [2]

    A workflow structure describe the relationship between the tasks of the workflow. A workflow structure can be of two types Directed Acyclic Graph (DAG) and Non DAG. Workflow structure can further be categorized as Sequence, Parallelism and Choice in the DAG based structure. In the sequence structure, the tasks execute in a series. A new task can only be executed if the previous task has been successfully executed. In parallelism structure, tasks of workflow can executes concurrently. In choice structure, the workflow tasks can be executed in series as well as concurrently. The Non DAG based structure, contains all the DAG based patterns along with Iteration pattern. In Iteration pattern, the task can executes in an iterative manner. A complex workflow can be formed with the combination of these four types of patterns in multiple ways. Workflow model defines the workflow, its tasks and its structure. Workflow models are of two types; abstract and concrete. Abstract workflow describe the workflow in an abstract form which means that specific resources are not referred for task execution. Concrete workflow describes the workflow tasks binded to specific resources. Workflow composition system allows users to accumulate component to the workflow. In the user directed system, user can modify the workflow structure directly by editing in the workflow. But in automatic, system generates workflow automatically. User directed system is further categorized in Language based modeling and graph based modeling. In former, user describe the workflow using some markup language such as XML. It is a difficult approach because user have to remember a lot of language syntax. Graph based modeling is simple and easy way to modify the workflow with the help of basic graph elements [1].

    2.1 WORKFLOW SCHEDULING

    Mapping and management of workflow tasks execution on shared resources is done with the help of workflow scheduling. So workflow scheduling finds a correct sequences of task execution which obeys the business constraint. The elements of workflow scheduling are shown in the Figure 3

    Figure 3: Elements of Workflow Scheduling [2]

    Scheduling Architecture is very important in case of quality, performance and scalability of the system. In the centralized workflow environment, scheduling decisions for all the tasks in the workflow is done by a single central scheduler. There is no central controller for multiple schedulers in decentralized approach but communication between schedulers is possible. Each scheduler schedules the workflow to the resource having less load. On the other hand in hierarchical scheduling there is a central manager. Not only Workflow execution is

    Controlled by this central manager but also the sub- workflows are assigned to the lower-level schedulers [1]. Scheduling decisions which are taken on the basis of task or sub workflow are known as local decisions and which are taken by keeping in mind the whole workflow are called global decisions. Global decisions based scheduling gives better overall results because only one task or sub workflow is considered in local decision scheduling.

    Transformation of abstract models to concrete models is done in two ways: static and dynamic. Concrete models are generated before the execution with the help of current available information about the execution environment and the dynamic change of resource is not considered in the static scheme. Static schemes further categorized in two types; user-directed and simulation-based. In the user directed planning scheme, decision about resource mapping and scheduling is done on the basis of users knowledge, preferences and performance criteria. But in case of simulation based, a best schedule can be achieved by simulating task execution on resources before the workflow execution starts. Both static and dynamic information about resources used for making scheduling decisions at run-time are consider in case of dynamic scheme. For dynamic scheme, there are two approaches prediction-based and just-in-time scheduling is used. In prediction-based, dynamic information is used along with some results on basis of prediction and in simulation-based static

    scheduling, in which prediction of performance of execution of tasks using resources is done along with generation of optimal schedule before the execution of task is done. But due to this the initial schedule changed during the execution. Just in-time scheduling makes decision at execution time. Scheduling strategies are categorized into performance driven, market driven and trust driven. In case of performance driven strategies, focus is to achieve the highest performance for the user defined QoS parameter. So the workflows tasks mapped with resources gives the optimal performance. Most of the performance driven scheduling strategies focuses to maximize the makespan of the workflow. Market driven Strategies focuses on resource availability, allocation cost and quality for the budget and deadline. A market model is used to schedule the workflow tasks to the available resource dynamically which results in less cost. Trust driven focuses on security and reputation of the resources and tasks are scheduled based on the trust by considering these two parameters [1].

    3 WORKFLOW TASK CLUSTERING

    Task clustering is basically used to combine the small sized tasks of the workflow into a comparatively large sized task to minimize the makespan of the workflow by reducing the impact of queue wait time. So task clustering restructure the workflow by combining the small sized tasks into a cluster and execute this cluster as a single task. Due to clustering the number of task in the workflow is reduced and there is less queue wait time as compared to small sized tasks. Task clustering is categorized as level- and label based clustering [3], vertical, blocked and balanced clustering [4]. Figure 4 represents a simple wrkflow. Figure 5 describes horizontal clustering in level 2 and level 4 with each cluster contains two tasks.

    Level-based clustering is also called horizontal clustering. In this type of clustering the tasks are independent and the tasks of the same level of the workflow can be combined together.

    In vertical clustering, tasks of the same pipeline can be combined together. Blocked clustering represents the combination of both horizontal and vertical clustering in the same workflow. Figure 6 represents vertical clustering, in which tasks in the same pipeline can be

    Combined together to form a cluster. Each cluster of vertical clustering contains three tasks. Figure 7 shows the blocked clustering, in which horizontal and vertical clustering can be combined together. Tasks in the same level of workflow can have different execution time and whenever these tasks are combined without the consideration of their runtime variance then it causes the problem of load imbalance, i.e., some cluster may contain the smaller tasks and other may have larger tasks. Due to this inappropriate clustering, workflow overhead is produced. Another problem is data dependency between the tasks of workflow in case of level-based clustering which is called dependency imbalance.

    1. SCHEDULING ALGORITHMS

      The main objective of scheduling algorithm is to achieve the best system throughput with proper resource utilization and high performance by obeying the users specified QoS parameter. There are various time comparison based scheduling heuristics exists in the grid and Cloud computing environment, which schedules the tasks by comparing the arrival time or execution time of the tasks. In the following section these scheduling heuristics are described.

        1. First Come First Serve (FCFS)

          In this algorithm, tasks are compared on the basis of their arrival time and the task which comes first in the ready queue is served first. Advantage of this algorithm is its simplicity and fast execution behavior. But the main disadvantage of this algorithm is that sometimes due to the execution of a longer job, which comes in the queue first, small jobs have to wait for its completion. Due to this problem the waiting time of tasks increased and overall performance of the workflow execution decreases.

        2. Min-min

          In this algorithm, small task is executed first so that large task delays for long time. Algorithm begins with by sorting the set of all unmapped tasks in increasing order of their completion time. Then the tasks having the minimum completion is scheduled from the unmapped task set and the mapped task has been removed from unmapped task list, and the process repeats until all the tasks of unmapped list is mapped to the corresponding available resources [5].

        3. Max-min

          In this algorithm, large task is executed first so that small task delays for long time. This algorithm is very similar to Min-min algorithm, instead of sorting the task in the increasing order of completion time. This algorithms sorts the tasks in decreasing order of their completion time. Then the task with the overall maximum completion time is selected from this task list and scheduled to the corresponding available resource. Then the scheduled task has been removed from unmapped task set and the process repeats until all tasks of unmapped list is mapped [5].

        4. Minimum Completion Time (MCT)

      In this algorithm, task that takes least time to complete is allocated to a machine randomly. So MCT behaves somewhat like Min-min. However, Min-min algorithm considers all the unmapped tasks during each mapping decision but on the other hand MCT considers only one task at a time [6].

    2. EXPERIMENTAL RESULTS

      The paper uses workflowsim simulator[7] check the efficiency and correctness of the workflow-scheduling algorithms,

        1. Simulation setup

          Tested with datacenter with the following properties

          and tested with 20 VMs..

        2. Analysis of results

      The fig 8 shows workflow of task dependency in CyberShake_50 and fig 9 shows simulation results when MINMIN simulation algorithm is used. Here the total cost is calculated using the formula

      cost= (communication cost + computation cost) = (data (both input and output) *

      * unit cost of data + runtime * cpu cost) for each task

      Fig 8: CyberShake_50 task dependency graph

      Fig 9: simulation results when MINMIN simulation is used

      The table 1 and table 2 shows the makespan and total cost of scheduling of tasks using the algorithms FCFS,MCT

      ,MAXMIN and MINMIN with Montage_100 and CyberShake_50 datasets

      Table 1 : Makespan and Total cost of various scheduling algorithms using Montage_100 dataset

      Table 2: Make span and Total cost of various scheduling algorithms using CyberShake_50 data set

      The Experimental result shows that MAXMIN is the more efficient with respect to the makespan and total cost in comparison to the other algorithms FCFS, MCT and MINMIN

    3. CONCLUSIONS

Minimizing Makspan is one of the main objective in most of the scheduling algorithm. The Experimental result show that selecting the largest task and assigning to the resource which gives minimum execution time is the most efficient when compared with FCFS, MCT, MINMIN. These algorithm still can be improved by optimising multiple attributes.

REFERENCES

  1. J. Yu and R. Buyya, A Taxonomy of Workflow Management Systems for Grid Computing, journel of Grid Computing, Springer, pp. 171-200, 2005.

  2. J. Yu and R. Buyya, A taxonomy of scientific workflow systems for grid computing, SIGMOD Record, Vol. 34, no. 3, pp. 44-49, 2005.

  3. G. Singh et al. Workflow task clustering for best effort systems with Pegasus, In Proceedings of the 15th ACM Mardi Gras conference: From lightweight mash-ups to lambda grids: Understanding the spectrum of distributed computing requirements, applications, tools, infrastructures, interoperability, and the incremental adoption of key capabilities. ACM, 2008.

  4. W. Chen, R. Silva, E. Deelman, and R. Sakellariou, Balanced Task Clustering in Scientific Workflows, In proceedings IEEE 9th International Conference on eScience, pp. 188-195. 2013.

  5. D. Tracy et. al. A comparison of eleven static heuristics for mapping a class of independent tasks onto heterogeneous distributed computing systems, Journal of Parallel and Distributed computing, vol.61, no. 6, pp. 810-837, 2001.

  6. F. Dong and S.G. Akl, Scheduling Algorithms for Grid Computing: State of the Art and Open Problems, Technical Report of the Open Issues in Grid Scheduling Workshop, School of Computing, University Kingston, Ontario, January 2006.

  7. W. Chen and E. Deelman, WorkflowSim: A toolkit for simulating scientific workflows in distributed environments, E-Science (e- Science), 2012 IEEE 8th International Conference on. IEEE, 2012.

Leave a Reply

Your email address will not be published. Required fields are marked *