Different types of systems model for Dynamic Load balancing

DOI : 10.17577/IJERTV2IS3102


Mohammad Haroon, Research Scholar, TMU Moradabad

Prof (Dr) Mohammad Husain, Director, JIT Barabanki

Abstract: This paper presents a scalable, extensible system model for load balancing in a computational grid. The system model defines the basic framework for the load-balancing algorithms. It consists of a grid architecture model, a job queue model, a communication model, a job model, a job migration model and a performance objective. The grid architecture model provides a representation and organization of system resources. The job queue model provides a two-level architecture for the job-waiting queue at each grid site. The communication model provides an estimate of the expected communication costs for message exchange and job transfer among grid sites. The job model provides a representation for jobs and defines the job information needed by the load-balancing algorithms. The job migration model considers techniques for reducing the opportunities for site thrashing and job starvation. The performance metric for evaluating our load-balancing algorithms is given in the performance objective.

Architecture model

It is assumed that the grid system consists of a collection of sites S connected by a communication network (Figure 1). The set S contains n sites, labelled s1, ..., sn. Logically, the architecture is hierarchical and is divided into four levels: the grid, site, cluster and node levels. The capacity of resource management is different at different levels. A node can be a workstation or a processor. The other three levels are now discussed.

Figure 1 Logical view of the grid architecture; G, S, C are grid, site and cluster levels, respectively.
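As an illustration only, the following Python sketch shows one way to represent the four-level organisation just described (grid, site, cluster, node); the class and field names are ours, not part of the model.

# Illustrative sketch (not from the paper) of the grid/site/cluster/node hierarchy.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    cpu_speed: float        # computational units per unit of time

@dataclass
class Cluster:
    name: str
    nodes: List[Node] = field(default_factory=list)

@dataclass
class Site:
    name: str
    cluster: Cluster        # each site contains a processor cluster
    broker: str = "site-broker"   # the site broker is modelled here only by name

@dataclass
class Grid:
    sites: List[Site] = field(default_factory=list)

# Example: a grid with two sites, each holding a small cluster.
grid = Grid(sites=[
    Site("s1", Cluster("c1", [Node("n1", 2.0), Node("n2", 2.0)])),
    Site("s2", Cluster("c2", [Node("n3", 1.0)])),
])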

Cluster level

The cluster level contains a cluster of processors. The processors in a processor cluster share communication bandwidth and are protected by firewalls from the outside world. Processor clusters include tightly coupled multiprocessors such as a Sequent (in which processors communicate via shared memory), distributed-memory multicomputers such as the Paragon, and loosely coupled workstations such as a Sun 4 cluster (in which processors communicate via message passing).

The management of jobs at cluster level has been addressed by many research and commercial systems, including Condor, Load Sharing Facility (LSF), Portable Batch System (PBS), Load Leveler, Sun Grid Engine/CODEINE, Maui, MOSIX and COSY. A comprehensive review of seven commercial packages and 12 research packages is given in the literature.

Site level

The site is an organizational entity. Each site contains a processor cluster and has a broker, denoted by a circle in Figure 1. On the one hand, each site si can be regarded as a whole system, and all of its nodes have a common objective. On the other hand, a site si has full, central control over the resources of its own nodes, but cannot directly operate the resources of nodes in other sites. In this view, all nodes within the same site are cooperative.

The site model can be extended to support a more sophisticated architecture. For example, a site may contain multiple administration domains. Each site has the freedom to choose the number of hierarchical levels and of clusters or resources belonging to each level, such that these numbers will best satisfy its management goals. To clarify the statement and emphasize our main ideas, we simplify the model of a grid site to one computing node with a single processor. Our scheduling can easily be extended to accommodate these more complicated cases.

Role of site brokers

The site broker handles all communications with other site brokers via core grid middleware on behalf of the local site, and acts as a grid scheduler. It also handles all communication with the local scheduler on behalf of remote sites. Site brokers are software processes that can run on a computer node in a cluster or on a separate server node. When that node fails, a predetermined backup node becomes the site broker. The focus of this work is on the design of algorithmic mechanisms for grid schedulers.

System heterogeneity can be of different kinds, for example processor speed, memory and disk I/O. A simpler and more practical solution is to use CPU speed alone. It is reasonable to assume that a machine with a powerful CPU will have matching memory and I/O resources. The sites in the grid system may have different processing power. The processing power of a site si is denoted APWi. For i not equal to j, APWi may differ from APWj. APWi is expressed as the number of computational units that the site can execute per unit of time. If a site has more than one processor, its processing power is measured as the average processing power across all processors within that site. The most common measure of heterogeneity used in the literature is the ratio of the processing power of the system nodes [54]. APWi then means the ratio of the average processing power of site si to the average processing power of the slowest site sj in the system; in other words, a job that takes one unit of time on site si requires APWi units of time on site sj.
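As a rough illustration of the APW notation, the sketch below (with hypothetical CPU speeds) computes APWi as the average processing power of a site's processors and expresses heterogeneity as the ratio to the slowest site.

# Hypothetical illustration of APW and the heterogeneity ratio.
cpu_speeds = {                 # computational units per unit of time
    "s1": [4.0, 4.0, 2.0],     # site s1 has three processors
    "s2": [1.0],               # site s2 has a single, slower processor
}

apw = {site: sum(speeds) / len(speeds) for site, speeds in cpu_speeds.items()}
slowest = min(apw.values())

# Ratio form: a job taking 1 time unit on site si takes apw[si] / slowest
# time units on the slowest site.
ratios = {site: apw[site] / slowest for site in apw}
print(apw, ratios)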

Grid level

All sites at the grid level are organised in a fully distributed way. There is no central broker in the computational grid. The sites themselves form a completely connected graph (Figure 3.1). The grid sites are mutually independent. Each grid site communicates only with a subset of grid sites while maintaining load information.

Job queue model

We assume that there is a global job-waiting queue at each site that holds those jobs waiting to be assigned to the local job management system or to a remote grid site (Figure 3.2). Jobs that are submitted to the site are first placed in this queue. The site broker determines whether the jobs in the global job-waiting queue are processed at the local site or at remote sites. If a job is to be processed at the local site, it is transferred to the underlying job management system at cluster level within the site. We use GJQ(si) to denote the global job-waiting queue in site si. The jobs in the global job-waiting queue are processed in first-come-first-served order.

The job-waiting queue at site level is different from the job-waiting queue at cluster level. We use a job-waiting queue at site level for the following reasons.

The implementation complexity of pulling a job from the job-waiting queue managed by the cluster-level job management system is reduced.

Different load-balancing algorithms can be implemented at site level without any interference with the job management system at cluster level. This incurs no extra work for the underlying job management system.

This approach leads to a flexible and portable solution for existing grid job management systems. It is a compromise between the benefits obtained from load-balancing algorithms applied at site level and the implementation complexity introduced by modifying the job management system running at cluster level. Although a trend is emerging in which vendors adopt a grid perspective on scheduling by combining local and grid schedulers into a single scheduler, these systems do not interoperate and are not yet widely used.
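As a small illustration of the two-level queue model above, the following Python sketch shows jobs entering a site's global job-waiting queue GJQ(si) in FCFS order and being handed either to the local cluster-level manager or to a remote site. The broker decision rule here (preferring the shorter backlog) is an assumption made for illustration, not the paper's algorithm.

# Illustrative sketch of the site-level global job-waiting queue GJQ(si).
from collections import deque

class GlobalJobQueue:
    def __init__(self, site_name):
        self.site = site_name
        self.queue = deque()          # FCFS global job-waiting queue

    def submit(self, job):
        self.queue.append(job)        # newly arrived jobs join the tail

    def dispatch(self, local_backlog, remote_backlogs):
        """Pop the head job and decide where it should be processed."""
        if not self.queue:
            return None
        job = self.queue.popleft()
        best_remote = min(remote_backlogs, key=remote_backlogs.get)
        if local_backlog <= remote_backlogs[best_remote]:
            return (job, self.site)   # hand to the local cluster-level manager
        return (job, best_remote)     # transfer to a remote site

gjq = GlobalJobQueue("s1")
gjq.submit("j1")
print(gjq.dispatch(local_backlog=3, remote_backlogs={"s2": 1, "s3": 5}))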

Communication model

The sites S are fully interconnected, such that there is at least one communication path between any two sites in S. The only way that inter-site communication can occur is through message passing. There is a non-trivial transfer delay on the communication network between the sites, and this delay differs between different pairs of sites. The underlying network protocol guarantees that messages sent across the network are received in the order sent. The sites are interconnected by point-to-point links. There is no efficient broadcasting service available.

In general, the network performance between any site pair (si, sj) is represented by two parameters: a transfer delay TDij and a data transmission rate BWij. The communication time for sending a message of Z bytes between these sites is then given by

TDij + Z/BWij

where Z/BWij is the transmission time. The two parameters abstractly represent the total time for traversing all of the links on the path between si and sj. BWij is expressed as an effective data transfer rate in bytes per unit of time, or is characterised in terms of Kb/s. TDij includes a startup cost and the delays incurred by contention at intermediate links on the path between si and sj. TDij and BWij can be dynamically forecast by what is known as the Network Weather Service [38]. Other research has addressed estimating the host distance between any two IP addresses.
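A minimal sketch of the communication-time estimate above; the delay and bandwidth values are placeholders (in practice they would be forecast, for example by a service such as the Network Weather Service).

# Communication time = transfer delay + transmission time (TD_ij + Z / BW_ij).
def comm_time(transfer_delay, bandwidth, message_bytes):
    """Transfer delay plus transmission time for a Z-byte message."""
    return transfer_delay + message_bytes / bandwidth

# Example: 0.05 s delay, 1 MB/s effective bandwidth, 2 MB message.
print(comm_time(transfer_delay=0.05, bandwidth=1_000_000, message_bytes=2_000_000))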

Job model

For any site si in S, jobs arrive at si. We assume that the arrival of jobs is a random process with an average delay of 1/lambda between two successive arrivals (e.g., the arrivals could be a Poisson process with rate lambda; that is, the delay between two successive arrivals follows an Exponential law with the same rate). The jobs are assumed to be computationally intensive, mutually independent, and executable at any site. Job execution is not time-shared, but dedicated. As soon as a job arrives, it must be assigned to exactly one site for processing. When a job is completed, the executing site returns the results to the originating site of the job. We use J to denote the set of all jobs generated at S, J = {j1, ..., jr}.
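A small illustration of the arrival assumption: exponential inter-arrival delays with mean 1/lambda produce a Poisson arrival process. The rate used below is arbitrary.

# Illustrative job-arrival generator for a Poisson process with rate `rate`.
import random

def arrival_times(rate, count, seed=0):
    """Return `count` arrival times; inter-arrival delays are Exp(rate)."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(count):
        t += rng.expovariate(rate)   # inter-arrival delay ~ Exponential(rate)
        times.append(t)
    return times

print(arrival_times(rate=2.0, count=5))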

The following parameters related to a job are created automatically by the system:

bornSite(ji) denotes the originating site of the job ji.

exeSite(ji) denotes the executing site of the job ji.

arrTime(ji) denotes the arrival time of job ji, which is the time when the job is generated at bornSite(ji).

endTime(ji) denotes the finish time of ji; this includes the job communication time from bornSite(ji) to exeSite(ji), the waiting time queued at exeSite(ji), the processing time at exeSite(ji), and the communication time it takes to return the processing results from exeSite(ji) to bornSite(ji).

respTime(ji) denotes the response time of ji, where respTime(ji) = endTime(ji) - arrTime(ji).

Each job jx that arrives at a grid site si is represented by two parameters: the amount of computation and the amount of communication. The values for these two parameters may be unknown or can be estimated using prediction techniques. The amount of computation normally has one of the following formats.

An expected execution time ETC(jx, sstd), that is, the time the job would take on a standard platform (with an APW equal to 1). On a site si with processing power APWi, the expected execution time of the job, ETC(jx, si), will therefore be ETC(jx, sstd)/APWi. We assume that the expected execution time ETC(jx, sstd) follows some probabilistic distribution (for instance, an Exponential, Hyperexponential or Bounded Pareto distribution).

The number of computational units in a job jx is denoted NCUx. Thus, the expected execution time for the job jx on site si is NCUx/APWi.
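The two computation formats above amount to the same scaling by the site's processing power; the sketch below uses illustrative values.

# The two formats for the amount of computation, scaled by the site's APW.
def etc_on_site(etc_std, apw):
    """Format 1: expected execution time scaled from a standard platform (APW = 1)."""
    return etc_std / apw

def exec_time_from_ncu(ncu, apw):
    """Format 2: number of computational units divided by the site's APW."""
    return ncu / apw

print(etc_on_site(etc_std=100.0, apw=2.5))    # 40.0 time units on a faster site
print(exec_time_from_ncu(ncu=100, apw=2.5))   # 40.0 time units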

In a grid environment, the files associated with a job need to be transferred over much slower Internet links if the job is scheduled to run at a remote site. Therefore, the cost of file transfers, or the amount of communication, must be considered in the scheduling algorithm. The amount of communication is calculated in one of two ways.

A: The file size of a job jx includes the input file size A1x and the output file size A2x. Assume that, on average, A1x bytes are required to profile a job and A2x bytes are required to return a response for the job. A1x and A2x are represented as the number of packets that need to be transferred. Thus, the communication time needed to transfer job jx is denoted as follows:

commTime(jx) = Tcomjx(si, sj) + Tcomjx(sj, si)

Tcomjx(si, sj) = TDij + A1x/BWij

Tcomjx(sj, si) = TDji + A2x/BWji

where Tcomjx(si, sj) denotes the communication time of the job jx from si to sj, and Tcomjx(sj, si) denotes the communication time it takes to return the processing results from sj to si.
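A minimal sketch of this file-transfer cost: A1x bytes of input are sent from si to sj and A2x bytes of results are returned; the delay and bandwidth values are placeholders.

# commTime(jx) = Tcom(si, sj) + Tcom(sj, si), built from the link parameters.
def tcom(td, bw, size_bytes):
    return td + size_bytes / bw

def comm_time_job(td_ij, bw_ij, td_ji, bw_ji, a1, a2):
    """Total communication time for sending a job and returning its results."""
    return tcom(td_ij, bw_ij, a1) + tcom(td_ji, bw_ji, a2)

print(comm_time_job(td_ij=0.05, bw_ij=500_000, td_ji=0.05, bw_ji=500_000,
                    a1=1_000_000, a2=200_000))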

B: The communication time for running a job at a remote site is set to the computation time divided by CCR, where CCR is the computation-to-communication ratio. By using a range of CCR values, different communication times incurred in transit can be accommodated. The computation time is the expected execution time ETC(jx, sstd). The communication time means the total of the communication time of transferring a job from its bornSite(ji) to its final exeSite(ji) and the communication time of sending the execution results from its exeSite(ji) back to its bornSite(ji).
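A short illustration of the CCR-based estimate: the communication time is the standard-platform computation time divided by the computation-to-communication ratio. The CCR values below are arbitrary.

# Communication time derived from the computation time and a chosen CCR.
def comm_time_from_ccr(etc_std, ccr):
    return etc_std / ccr

for ccr in (0.5, 1.0, 10.0):            # communication-heavy ... computation-heavy
    print(ccr, comm_time_from_ccr(etc_std=100.0, ccr=ccr))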

Job migration model

Due to changes in the load situation that might occur while a job is in transit, a job may have to make several moves before it reaches the final destination where it will be processed. Thus, we assume that a job jx may be transferred from one site to another several times before it can start.

The age of a job is set to 1 when it is moved for the first time, and is incremented by 1 each time the job is moved again. The ageing factor is a constant that can be adjusted empirically to change the extent to which ageing affects the operation of the scheduler.

This approach promotes the position of a transferred job in the global job queue of the receiving site sx, instead of adding it at the end of the queue. This can considerably reduce the probability that the job will be transferred again, and helps to minimise its response time. We used this approach throughout our simulations to improve the performance of the proposed algorithm.

A more conservative approach was used to reduce the rate at which jobs are moved from one site to another. This can be achieved by restricting the maximum number of jobs transmitted between sites to one job at any given time. This approach is more robust and requires minimal processing time at each site.
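As an illustration of the ageing and promotion idea above, the sketch below tracks how many times a job has been moved and inserts a migrated job ahead of jobs that have never been transferred. The specific insertion rule is an assumption made only to show the mechanism, not the paper's exact policy.

# Illustrative ageing/promotion mechanism for migrated jobs.
from collections import deque

def receive_transferred_job(queue, job, ages):
    """Insert a migrated job ahead of jobs that have never been transferred."""
    ages[job] = ages.get(job, 0) + 1          # age 1 on first move, +1 per further move
    pos = next((i for i, j in enumerate(queue) if ages.get(j, 0) < ages[job]),
               len(queue))
    queue.insert(pos, job)

queue, ages = deque(["j1", "j2"]), {}
receive_transferred_job(queue, "j9", ages)    # j9 arrives after being migrated
print(list(queue), ages)                      # j9 is promoted ahead of j1 and j2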

Performance objective

Our major objective is to minimize the average (overall) response time for a collection of jobs, denoted here as ART. Minimizing the ART of the jobs submitted for processing in a parallel/distributed system is a critical performance metric for improving the overall performance of the system. Many load-balancing algorithms have striven to meet this objective of minimising the ART.

The average response time for a collection of jobs is defined by:

ART = ( Sum_{i=1}^{u} respTime(ji) ) / u

where u represents the total number of jobs completed for evaluation purposes. Note that u < r.

To evaluate the performance of the algorithms developed in Chapters 4-6, we define the improvement factor of an algorithm F over another algorithm G, in terms of the average response time of jobs, as follows:

improvement factor = (ART(G) - ART(F)) / ART(G)

where ART(F) denotes the average response time of jobs using algorithm F, and ART(G) denotes the average response time of jobs using algorithm G. A positive value of the improvement factor indicates an improvement, while a negative value implies degradation. The value of the improvement factor is expressed as a percentage (%).
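A small sketch of the two evaluation formulas above, using made-up response times for the completed jobs.

# Average response time over completed jobs, and the improvement factor of F over G.
def art(response_times):
    """ART = sum of response times of completed jobs / number of completed jobs."""
    return sum(response_times) / len(response_times)

def improvement_factor(art_g, art_f):
    """(ART(G) - ART(F)) / ART(G); positive means F improves on G."""
    return (art_g - art_f) / art_g

art_f = art([10.0, 12.0, 8.0])
art_g = art([14.0, 15.0, 13.0])
print(f"{improvement_factor(art_g, art_f) * 100:.1f}%")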

Conclusion:

This paper has described a model for representing grid resource architecture and a model for representing the job queue. A communication model and a job model were also presented; these two models define the information needed to construct cost functions for computation and communication. The job migration considerations and the major performance objective were then discussed.

References:

1: Load Sharing Facility. http://www.platform.com/Products/Platf orm.LSF.Family/.

2 Portable Batch System. http://www.openpbs.org/.

3: Sun Grid Engine / CODEINE. http://www.sun.com/software/gridware/i ndex.xml.

4: Load Leveler. http://www- 03.ibm.com/systems/clusters/software/lo adleveler.html.

5: COSY.http://www.ccrl- nece.de/~falk/COSY/cosy.shtml.

6: Condor. http://www.cs.wisc.edu/condor/.

7: MOSIX. http://www.mosix.org/.

8: A. Barak and A. Shiloh, The MOSIX2 management system for Linux clusters and organizational grids, white paper, March 2007.

9: R. Wolski, N. Spring, J. Hayes, The network weather service: A distributed resource performance forecasting service for metacomputing, Journal of Future Generation Computing Systems, 15 (5-6) (1999) 757-768.

10: P. Francis, S. Jamin, C. Jin, Y. Jin, D. Raz, Y. Shavitt, L. Zhang, IDMaps: a global internet host distance estimation service, IEEE/ACM Transactions on Networking.

11: A. Agrawal, H. Casanova, Clustering hosts in P2P and global computing platforms, in: Proceedings of the 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 12-15 May 2003, pp. 367-373.
