 Open Access
 Total Downloads : 903
 Authors : Punita Panwar, Varun P. Saxena, Aditi Sharma, Vijay Kumar Sharma
 Paper ID : IJERTV2IS1025
 Volume & Issue : Volume 02, Issue 01 (January 2013)
 Published (First Online): 30012013
 ISSN (Online) : 22780181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
Load Balancing using NQueens Problem
Load Balancing using NQueens Problem
Punita Panwar [1],Varun P. Saxena[2],Aditi Sharma [3],Vijay Kumar Sharma[4] [1]M.Tech (p), Rajasthan Institute of Engineering & Technology Jaipur.
[2] Assistant Professor, Govt. Women Engineering College Ajmer. [3]R.N. Modi Engineering College Kota. [4]Rajasthan Institute of Engineering & Technology Jaipur.Abstract– A distributed system can be viewed as a collection of computing and communication resources shared by active users. These resources are distributed and possibly owned by different agents or organization. When the demand for computing power increases the load balancing problem becomes important. The purpose of load balancing is to improve the performance of a distributed system through an appropriate distribution of the application load. Load balancing is a way to keep processor utilization as even as possible. A general formulation of this problem is as follows: Given a large number of jobs, find the allocation of jobs to computers optimizing a given objective function.

INTRODUCTION

T
T
HE load balancing is the process by which elements of the data domain are assigned to processors, with the same
two goals of maximizing the processors' utilization, and minimizing the total execution time. . Loadbalancing refers most often, to the dynamic distribution of data among the processors.
We have an application for which the data domain is composed of N units, and we want to have it solved on a network of P processors. The data domain offers great flexibility in the way it can be decomposed. Assuming that P divides N evenly, each processor can start with an equal number of data units. Let's assume, furthermore, that the processors are arranged in a chain, with P1 serving as a host interface. Assuming that each processor is sending intermediary results to the host as soon as it obtains them, we
have a situation where P1 must spend a large amount of time shuffling data from the other processors in the chain to the host. This involvement in data communication affects all the processors, and lessens as we move closer to PN. Hence, P1 will spend less time processing its own data than P2 does, which it turns, will spend less time than P3, and so on, until we reach PN which can devote all of its time computing and processing its own data. As a result PN will probably finish first and run out of data before the others. We can then expect PN1 to be the next to finish, and so on.
Fig: I
For this problem of load balancing here we implement a solution by Nqueen problem.
The N queens problem is to place N queens on an N by N chessboard, so that no queens can take each other. Because queens can move horizontally, vertically, and diagonally, this means that there can be only one queen per row and one per column, and that no two queens can find themselves on the same diagonal. Finding a solution for a dimension N requires creating a search tree where each node represents a valid position of a queen on the chess board. Nodes at the first level correspond to one queen on the N by N board. Nodes at the second level represent boards containing two queens in valid locations, and so on. When a tree of depth N is found then we have solution for positioning N queens on the board.
Partitioning in this case can be done by assigning sub trees to the individual processors, allowing the search to be done in parallel on different sub trees. The program stops when a processor reaches a terminal leaf (success), or when all the sub trees have been visited without ever reaching a terminal leaf (no solutions). For problems that exhibit such irregular domains that grow and shrink during the computation, partitioning must often be carried out dynamically by techniques that ensure that each processor has some work to do, and that progress is made towards a solution.
The pseudocode of the sequential solution is as follows: beginArray () //diagonals and columns marking them empty call to addQueen proceeding
addQueen() //place a queen on the following row
row++
for each column do(i:1..N)
test if a queen can be placed on column i.
If true then
mark the column and diagonals as filled.
If is the last row then New solution found If not
Call addQueen proceeding
The solution presented above shows a nonlinear growth in the complexity as the size of the board increases.
Fig: II


LOAD BALANCING
The distributed system consists of independent workstations connected usually by a local area network. Users of the system submit jobs to their computers at random times. In such a system some computers are heavily loaded while others have available processing capacity. The goal of the load distributing schema is to transfer the load at heavily loaded machines to idle computers, hence balance the load at the computers and increase the overall system performance The parallel program is set up to search for a solution containing N queens on an N by N chessboard by positioning
Each processor contains an array representing the chess board, and from the knowledge of that first position it deduces all the allowed positions of the second queen on Row

Each one of these positions represents the leaf of a unary tree of height 2, and the processor records all these trees in a heap, keeping the last tree found as the current tree to which it will try to add a new level.
As the processor progresses it may find that the next row of the chessboard is completely covered by the queens already on the board, and the current tree cannot be further extended. The current tree my therefore been thrown out and a new tree must be obtained from the heap. Because the program maintains the heap as a stack, and because the program always increases the height of its current tree, the tree on top of the stack is always the taller one (other trees in the stack may have the same height). This way, when a processor has reached an impasse and gets a new tree from the heap, it gets one that has the fewest leaves to add, hence a partial solution with the highest number of queens already in place. The load in each transputer is balanced by a managerworkers scheme, where the root transputer hosts the manager. The manager accesses its workers via a virtual star network (network of virtual channels) that can be mapped over any physical network.
The ManagerWorkers paradigm is easy to implement. Virtual channels allow the creation of a starshaped network on any physical network with very little effort. The Manager sits at the center of the star and enjoys direct access to the workers. For larger networks, however, this simplicity may not be acceptable due to the delays that may dramatically reduce the processors' utilization. In such cases, distributed methods for loadbalancing may be more attractive, and because they require a synchronization that involves only near neighbors, lower performance penalty can be expected in general. In addition, the processor originally implementing the Master can now be given a bigger share of the computation.
Algorithm LoadUnbalance Nqueens (Metrics C1, Metrices C2, queen Ii)

Maximum work load Lmax and minimum work load Lmin of the processors.
If Lmax=1000 and Lmin=500

Unbalance the load in between Lmax and Lmin, Metrics is defined as
C1 = Lmax / Lmin C1 = 1000/500
C1 = 2

Average work done is
a queen on successiverows, starting with the top row of the board, and going down one row at a time. At startup, a processor is given one of the many possible starting positions of the first queen on the first row (N total positions exist).
=
=1

Unbalance related to the average work done, Metrics C2 into account the deviation percentage of the work done by the processors in relation to the average of the work done

C2 = (W * 100) / Tprom
If C2= 0 then the obtained balance is the optimal.
If C2 = 50 it means that each processor deviates a 50 % of the work that it should carry out if it had an optimal balance.
The study of the load unbalance has been initiated for a type of parallel systems, focusing on the adjustment of the algorithm to the supporting architecture for load balancing.


The parallel algorithm for the same is intuitive. Assign first configuration of the queens to each node and run the sequential program on the same. If p < n/2 then each node will attain a maximum of n/2p initial configurations.
Each node with an initial configuration will run a Depth First Search. Any state cannot be further expanded if there are no nonconflicting positions for the queen in the next column and the program would need to backtrack.
Managerworker paradigm has been invoked for this parallelization. Node with rank 0 is assumed to be the manager while the rest are workers. The workflow is as follows:

Manager waits for request for work. A worker sends request for work if its idle.

Manager assigns tasks to the requesting worker.

Manager waits for another request until no more work.

On assignment of a task, the worker will process it and send back all the results.


For our purpose, tuning can be applied to finding the best interval of time between loadbalancing periods. Hence the performance of the loadbalanced application was dependent on the heuristic used to balance the load, and on the update interval. Because the RobinHood heuristic we choose here always brings back the two nodes that support the extreme loads, it must be run often to make sure the gap between these nodes (possibly different every update) does not increase. Tuning can take the form of a simple experiment where the parameter of interest is defined at run time of the Nqueen problem for N=27 and various interval lengths.
For board size = 12 using 7 processors: Got 7 slots. Node = 1: Results computed = 2139
Node = 5: Results computed = 2330 Node = 3: Results computed = 2139 Node = 6: Results computed = 2718 Node = 2: Results computed = 2437 Node = 4: Results computed = 2437
Total number of Results = 14200
Fig: III
Tuning the Nqueen problem for N=27, and intervals ranging from 500 sec. to 50 sec. Each execution time is the average of two runs.
Similarly for board size = 15 using 8 processors: Node = 2: Results computed = 357770
Node = 7: Results computed = 304450 Node = 6: Results computed = 323927 Node = 4: Results computed = 323927 Node = 5: Results computed = 332330 Node = 3: Results computed = 304450 Node = 1: Results computed = 332330
Total number of Results = 2279184

In this paper a fast and practical approach of load balancing for the Nqueens problem is used. By using n processors, a5 ms interval length provides the best execution time, 250.94 seconds, compared to 295.89 seconds for the asymptotic bounds for longer delays.

INMO88b Inmost, IMST800 Transputer, Document No. 42 1082 00, March 1988.

BRAW89 Brawer S. Introduction to Parallel Programming. San Diego, CA: Academic Press, Harcourt Brace Jovanich, Publishers, 1989.

Parallel Programming in C for the Transputer Â© D. ThiÃ©baut, 1995

AMDA67 Amdahl, G. M. "Validity of the singleprocessor approach to achieving large scale computing capabilities," AFIPS Conf. Proc. 30, AFIPS Press: 483485, 1967.

DALL92 Dally W. J. "Virtualchannel flow control," IEEE Trans. on Parallel and Dist. Syst., 3(2):194205, March 1992.

EAGE89 Eager D. L., J. Zahorjan, and E. D. Lazowska "Speedups versus efficiency in parallel systems," IEEE Trans. Computers 38:406 423, Mar 1989.

ELM86 Elmagarnid A. K. "A survey of distributed deadlock detection algorithms," SIGMOD Records, 15(3):3745, Sept. 1986.

FEIT91 Feitelson D. "Deadlock detection without waitfor graphs,"
Parallel Computing, 17:13771383, 1991.

FLYN72 Flynn, M., "Some computer organizations and their effectiveness," IEEE Trans. Comp. 21(9):948960, 1972.

GOSH91 Goshal D, G. Serazzi, and S. Triphath "The processor working set and its use in scheduling multiprocessor systems," IEEE Trans. Soft. Eng., 17(5):443453, May 1991.

GRÃ˜N91 GrÃ¸nning P., T. Qvist Nielson, and H. H. LÃ¸vengreen "Stepwise development of a distributed load balancing algorithm," Lecture Notes in Comp. Science, (486):151168, 1991.

GUNT81 Gunther K. D. "Prevention of deadlocks in packetswitched data transport systems," IEEE Trans. Commun., COM29:512524, Apr. 1981.

INMO88b Inmos, IMST800 Transputer, Document No. 42 1082 00, March 1988.

INT86 Intel, Introduction to the 80386. 231746001, April 1986.

KNAP88 Knapp E. "Deadlock detection in distributed databases," Technical Report, Depat. Computer Science, University of Texas, Jan. 1988.

KROG91 KrÃ¶ger B., R. LÃ¼ling, B. Monien, and O. Vorngerger. "An improved algorithm to detect communication deadlocks in distributed systems," in Lecture Notes in Computer Science, 486:90101.

KUNG88 Kung H. T. VLSI Array Processors. Englewood Cliffs, NJ: Prentice Hall, 1988.

LEST93 Lester B. P. The art of Parallel Programming, Englewood Cliffs, NJ: Prentice Hall, 1993.

THIE92 ThiÃ©baut D. J. Wolf, and H. Stone, "Improving diskcache performance with partitioning" IEEE Trans. Computers, 41(6), June 1992.

VONN45 Von Neumann, J. "First draft of a report on the EDVAC" Moore School, University of Pennsylvania, 1945.

WILS, Wilson P. "Highly concurrent systems using the transputer,"
Tech. Report, Inmos Corporation.

Adaptive Load Balancing for MPI Programs by Milind Bhandarker,
L.V. Kale, Eric De Sturler, Jay Hoeinger

Dynamic Load Balancing of Unbalanced Computations Using Message Passing by James Dinan, Stephen Olivier, Gerald Sabin, Jan Prins, P.
Sadayappan, and ChauWen Tseng

Common Search Strategies and Heuristics With Respect to the n queens

Problem by Sheldon Dealy