 Open Access
 Total Downloads : 416
 Authors : Vaishali Udar, Sanjeev Sharma
 Paper ID : IJERTV2IS50364
 Volume & Issue : Volume 02, Issue 05 (May 2013)
 Published (First Online): 31052013
 ISSN (Online) : 22780181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
Place and Route Algorithm Analysis for Field Programmable Gate Array (FPGA) Using KL Algorithm
Vaishali Udar
Department of Electronics and telecom. Engg Priyadarshini college of Engineering
Nagpur, India
Sanjeev Sharma
Department of Electronics and telecom. Engg Priyadarshini college of Engineering
Nagpur, India
Abstract: Efficient placement and routing algorithms play an important role in FPGA architecture research. Together, the placeandroute algorithms are responsible for producing a physical implementation of an application circuit on the FPGA hardware. This paper presents the KL Algorithm along with the reduction in the circuit as well as the implementation of the algorithm using multiplier which further reduces the cost and power and increases the performances. In the fast growing communication field, requirements of minimization are increasing to reduce the cost and timing of the integrated circuit. The KL partitioning algorithm has been implemented and the result has been observed on the processor based design.
Keywords: KL algorithm, Place and Route, FPGA
I INTRODUCTION
The process of placing and routing for an FPGA is generally not performed by a person, but uses a tool provided by the FPGA Vendor or another software manufacturer. The need for software tools is because of the complexity of the circuitry within the FPGA, and the function the designer wishes to perform. Generally the FPGA designflow map designs onto an SRAMbased FPGA consist of three phases. The first phase uses synthesizer which is used to transform a circuit model coded in a hardware description language into an RTL design. The second phase uses a technology mapper which transforms the RTL design into a gatelevel model composed of lookup tables (LUTs) and flip flops (FFs) and it binds them to the FPGAs resources (producing the technologymapped design). During the third phase,
the place and route algorithm use the technology mapped design to implement on FPGA.
The routing and placing operations may require a long time for execution in case of complex digital systems, because complex operations are required to determine and configure the required logical blocks within the programmable logic device, to interconnect them correctly, and to verify that the performance requirements specified during the design are ensured. The delay introduced by logic block and the delay introduced by interconnection can be analyzed by the use of efficient place and route algorithm.
The placement algorithms use a set of fixed modules and the netlist describing the connections between the various modules as their input. The output of the algorithms is the best possible position for each module based on various cost functions. We can have one or more cost functions depending on designs. The cost functions include maximum total wire length, wire routability, congestions, and performance and I/O pads locations.
II PLACING AND ROUTING
These operations are performed when an FPGA device is used for implementation. For designing, Placing is the process of selecting particular modules or logical blocks of the programmable logic device which will be used for implementing the various functions of the digital system. Routing consists in interconnecting these logical blocks using the available routing resources of the device.
III PARTITIONING ALGORITHM (KL ALGORITHM)
Basic purpose of partitioning is to simplify the overall design process. The circuit is decomposed into several sub circuit to make the design process manageable

Partitioning Algorithms:

Iterative partitioning algorithms

Spectral based partitioning algorithms

Net partitioning vs. module partitioning

Multiway partitioning

Multilevel partitioning


Iterative Partitioning Algorithms:

Greedy iterative improvement method
 [KernighanLin 1970]
 [FiducciaMattheyses 1982]
 [ krishnamurthy 1984]

Simulated Annealing
 [KirkpartrickGelattVecchi 1983]

KERNIGHANLIN (KL) ALGORITHM
The KL (KernighnanLin) algorithm was used for bisecting graph in VLSI layout which was first suggested in 1970 . The algorithm is an iterative algorithm; which Starts from a load balanced initial bisection, it will first calculate each vertex gain in the reduction of edgecut that may result if that vertex is moved from one partition of the graph to the other. For every inner iteration it moves the unlocked vertex having the highest gain, from the partition with more vertices to the partition which it requires which has less in number. Then the vertex is locked and the gains are updated.
The procedure is repeated until all of the vertices are locked even if the highest gain may be negative. The last few moves that had negative gains are then undone and the bisection is reverted to the one with the smallest edgecut so far in this iteration. Here one outer iteration of the KL algorithm is completed and the iterative procedure is restarted again. If an outer iteration will results in no reduction in the edge cut or load imbalance, then the algorithm is terminated.
If an outer iteration gives no reduction in the edgecut or load imbalance, the algorithm is terminated.
The KL algorithm is a local optimization algorithm, with a capability for getting moves with negative gain.

How Kl Works
Let we have a graph G (V, E), and let V be the set of nodes and the E set of edges.
The algorithm attempts to find a partition of V into two disjoint subsets A and B of equal size, or unequal such that the sum T of the weights of the edges between nodes in A and B is minimized.
Let Ia be the internal cost of a, that is, the sum of the costs of edges between a and other nodes in A, and let Eabe be the external cost of a, that is, the sum of the costs of edges between a and nodes in B. Furthermore, let Da, Da = Ea Ia be the difference between the external and internal costs of a. If a and b are interchanged, then the reduction in cost is
Told Tnew = Da + Db 2Ca,b
Where Ca,b is the cost of the possible edge between a and b.
The algorithm will try attempts to find an optimal series of interchange operations between elements of A and B which maximizes Told Tnew and then executes the operations, producing a partition of the graph to A and B[5].
We can try all possible bisections. Choose the best one. If there are 2n vertices, then numbers of possibilities are (2n)! / 2(n!)2 .For 4 vertices (A, B, C, D), possibilities are three:

X = (A, B) and Y = (C, D)

X = (A, C) and Y = (B, D)

X = (A, D) and Y = (B, C)


KL Algorithm Implementation
Figure 1 Example 2Bit Multiplier
Now the above application was converted to eight nodes as mentioned above in the designing aspects according to KL algorithm. The nodes are
Node 1 And gate having input A1B1 Node 2 Xor gate having S3 as output Node 3 And gate having S2 output. Node 4 Xor gate
Node 5 And gate having input A1B0 Node 6 And gate having input A0B1 Node 7 And gate
Node 8 And gate having input A0B0
Figure 2 Cut size = 3,
Largest value of G is G53=3 here we consider (a1 b1)
= (5, 3) and Al = Al – 5= (1, 2) and Bl = Bl – 3 = (4, 6,
7).New Al = (1,2,8) and Bl = (4,6,7). Both Al and Bl are not empty, and then we update D values in next step and repeat te procedure from step 3.
Step 4: Update D values of nodes connected to (5, 3).
l
l
The vertices connected to (5, 3) are vertex (1) in set A and vertices (4, 7) in set B. The new Dvalues for vertices of A and B are given by:
l
l
l
l
l
l
l
l
D1 = D1+2(C13)2(C15) =2 D4 = D4+2(C43)2(C45) = 1 D7 = D7+2(C75)2(C73) =2 D2 = D2+2(C52)2(C23) =1 D6 = D6+2(C63)2(C65) =2
Repeat step 3
l l
l l
G24 = D2 + D4 2C24 = 2 G26 = D l + D l 2C26 = 2

Algorithm Steps 2 6
G27 = D l + D l 2C27 =1
l l
l l
l l
l l
7 2
Step I: Initialization
Let the initial partition be a random division of vertices into the partition A= {1, 2, 3, 4} and B= {5,
6, 7, 8}.Here let Al=A= {1, 2,5,8} and Bl = {3,4,6,7}
Step 2: Compute D – values. D1 = E1I1 = 11 = 0
G14= D1 + D4 2C14= 1 G16= D1 + D6 2C16= 0 G17= D1 + D7 2C17= 4
l l
l l
Here the G value for G17 is large. Hence pair (a2, b2) is (1, 7).
Al = Al1 = (2, 8) and Bl = Bl 7 = (4, 6).
The new D values are
D ll= D l +2(C )2(C ) = 1
D2 = E2I2 = 01 = 1
2 2 21 27
D ll= D l +2(C )2(C
) = 1
D3 = E3I3 =10 = 1
4 4 47 41
D ll= D l +2(C )2(C
) = 0
ll ll
ll ll
D4 = E4I4 = 12 = 1
6 6 67 61
D5 = E5I5 = 20 = 2
G24= D2 + D4 2C24= 0
G = D ll + D ll 2C = 1
D6 = E6I6 = 02 = 2
26 2
6 26
D7 = E7I7 = 11 = 0
Step 3: compute gains
G23 = D2+D32C23 = 1+12(0) =0
G24 = D2+D42C24 =112(0) =2 G26 = D2+D62C26 = 122(0) = 3 G27 = D2+D72C27 =1+02(0) = 1 G13 = D1+D32C13 = 0+12(1) = 1 G14 = D1+D42C14 = 012(0) = 1 G16 = D1+D62C16 = 022(0) = 2 G17 = D1+D72C17 = 0+02(0) = 0 G53 = D5+D32C53 = 2+12(0) = 3 G54 = D5+D42C54 = 212(1) = 1 G56 = D5+D62C56= 222(0) = 0
G57 = D5+D72C57= 2+02(1) = 0
The last pair (a3, b3) is (1, 7) and the corresponding gain is G17.
Step 5: determine the values of X and y X = a1 = 1 and Y = b1=7
The new partition that will obtained from moving X
to B and Y to A is A={1,2,5,8} and B={3,4,6,7}.
The entire procedure is repeated again with this new partition as the initial partition. Verify that the second iteration of the algorithm is also the last, and that the best solution obtained is A={1,2,5,8} and B={3,4,6,7}.
The overall procedure is repeated with gain having maximum value was taken and then the cut size was calculated. There after the second pass was implemented, here we had locked the nodes with the maximum gain. This process is repeated for all the
passes and combinations. At the end of the above process we get the minimum cut size which in turns reduces the wire delay and increases the performance.
We observe from figure 1 that the cut size initially is
3 and thereafter finding the highest gain and swapping the nodes we had got the final partition to be as required.


CONCLUSIONS AND FUTURE WORK
The quality of the placeandroute algorithms has a direct bearing on the usefulness of the target FPGA architecture. The benefits of including powerful new features on an FPGA might be lost due to the inability of the placeandroute algorithms to fully exploit these features. Thus the advancement of FPGA architectures relies heavily on the development of efficient placeandroute algorithms. KL Algorithms increase the performance by reducing the wire delay. Further work is necessary in the use of klfeasible cuts for the optimization purpose. Analysis of an efficient algorithm for Place and route process would be done, in order to place the components efficiently and create a proper routing path between them on FPGAs. In this paper we have presented a new methodology for Digital circuits here example is considered as multiplier, which in turns reduces the area and increases the performance of the circuit type algorithms for the problem of hardware/software partitioning.
REFERENCES

Xilinx Inc., SpartanII 2.5 V FPGA Family: Introduction and Ordering Information, Xilinx Product Specification Datasheets, 2003.

Luca Sterpone, Student Member, IEEE, and Massimo Violante, Member, IEEE. A New ReliabilityOriented Place and Route Algorithm for SRAMBased FPGAs, IEEE TRANSACTIONS ON COMPUTERS, VOL. 55, NO. 6, JUNE 2006.

Chenguang Guo, Yanlong Zhang, Lei Chen, Tao Zhou, Xuewu Li, Min Wang, Zhiping WenDept. FPGA A Novel Application of FPGABased PartialDynamic Reconfiguration System with CBSC 2012 IEEE.

VirtexII Pro and VirtexII Pro X Platform FPGAs: Complete DataSheet, Xilinx Corporation, DS083 (v4.7) Nov. 5, 2007.

Osvaldo Martinello Jr, Felipe S. Marques, Renato P. Ribas, AndrÃ© I. Reis KLCuts:A New Approach for Logic Synthesis Targeting Multiple Output Blocks,777782.

Amr M. Fahim, LowPower HighPerformance Arithmetic Circuits and Architectures, IEEE Journal of SolidState Circuits, Vol. 37, No. 1, pp. 9094, January 2002.

Patterson and Hennessy, Computer Organization and Design, 3rd Edition, Morgan Kaufman, 2005.

John F. Wakerly Digital Design Principles and Practices. [9]S. Brown, FPGA Architecture Research: A Survey, IEEE Design and Test of Computers, pp. 915, Nov./Dec. 1996.

J. Rose, A. El Gamal, and A. SangiovanniVincetelli, Architecture of FieldProgrammable Gate Arrays, Proc. IEEE, vol. 81, no. 7, pp. 10131029, July 1993.

Amr M. Fahim, LowPower HighPerformance Arithmetic Circuits and Architectures, IEEE Journal of SolidState Circuits, Vol. 37, No. 1, pp. 9094, January 2002.

Xilinx, Achieving Higher System Performance with the Virtex5 Family of FPGAs, White Paper, 2006. http://www.xilinx.com.

Altera, Improving FPGA Performance and Area Using an Adaptive Logic Module, White Paper, 2004. http://www.altera.com.