 Open Access
 Total Downloads : 305
 Authors : Deepak Sharma, Parminder Singh
 Paper ID : IJERTV3IS041102
 Volume & Issue : Volume 03, Issue 04 (April 2014)
 Published (First Online): 01052014
 ISSN (Online) : 22780181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
A 65nm Parallel Prefix High Speed Tree based 64Bit Binary Comparator
International Journal of Engineering Research & Technology (IJERT)
ISSN: 22780181
Vol. 3 Issue 4, April – 2014
Er. Deepak sharma
Student MTECH, Yadavindra College of Engineering,
Punjabi University, Guru Kashi Campus,Talwandi Sabo.
Er. Parminder singh
Assistant Professor, Yadavindra College of Engineering,
Punjabi University, Guru Kashi Campus,Talwandi sabo.
Abstract We present a high speed and lower power consumption 64bit comparator based on parallel prefix tree based structure that leverages the final result outcome of the most significant bit, proceeding bitwise toward the least significant bit only when the input bits are equal. This method reduces complexity of design and dynamic power dissipation by eliminating unnecessary transitions. Simulation results show this proposed work is much faster and less power consumption design compare to previous comparator. The minimum measured time delay of proposed 64bit comparator is 231ps which is 62% and 49% less compare to implemented previous 64 bit comparators[7], [12] respectively and the average power consumption during operation of this 64bit comparator is 2.37Âµw which is 30% and 56% less compare to implemented previous 64bit comparators[7],

respectively. All simulations are done on Tanner Eda V13 using 65nm technology.

INTRODUCTION
The comparison of two bits is one of most widely used in computer systems, device interfaces for equality, scientific computation (graphics and image/signal processing), test circuit applications (jitter measurements, signature analyzers, and builtin self test circuits), and optimized equalityonly comparators for generalpurpose processor components (associative memories, loadstore queue buffers, translation lookaside buffers, branch target buffers, and many other CPU argument comparison blocks. Also it is an important arithmetic operation in DSP applications, digital filter. Digital signal processing (DSP) is one of most important units in electronic devices. Function of basic binary comparator is to compare the two bits suppose and it gives result in terms of A greater than B (A>B), A less than B (A<B) and A equal to B (A=B).
In past decade several comparators architecture are introduced. Mostly of these designs consume more power, take more time to operation and these were large in size. Recently treebased comparator that utilizes dynamic Manchester adders are introduced [7], [12]. These comparators had showed high performance compared to previous comparators. According to this structure comparator is divided into several building blocks and the output of ones output is given as input to another building block until final result output comes out. In this brief,
a new tree structure based 64bit comparator is modified from existing architecture [12] to improve its operating speed and
power efficiency from existing comparators is introduced in this work. Moreover, this work provides the highly performance 64bit comparator as comparison between comparators is shown and performance parameter for comparison taken are time delay, Average power consumption and number of transistor. This work is performed using Tanner Eda V13 as simulation tool software at 65nm technology modal file.
The rest of this brief is organized as follows: Section II reviews previous comparator works. Section III introduces and explains the proposed comparator implementation. IN section IV schematic view of all sets is shown and described. Section V describes the simulation setup and presents the delay and power simulation results. Finally, Section VI concludes with some final remarks and comments.

EXISTING COMPARATOR DESIGNS
A brief description of the design principles of existing comparator is provided here. Interested readers are referred to
[12] for a detailed overview. The comparison resolution module (which depicts the highlevel architecture of this design) is a novel MSBtoLSB parallelprefix tree structure that performs bitwise comparison of two Nbit operands A and B, denoted as AN1, AN2, . . ., A0 and BN1, BN2, . .., B0, where the subscripts range from N1 for the MSB to 0 for the LSB. The comparison resolution module performs the bitwise comparison asynchronously from left to right, such that the comparison logics computation is triggered only if all bits of greater significance are equal.
The parallel structure encodes the bitwise comparison results into two Nbit buses, the left bus and the right bus, each of which store the partial comparison result as each bit position is evaluated, such that
if Ak > Bk, then leftk = 1 and rightk = 0 if Ak < Bk , then leftk = 0 and rightk = 1 if Ak = Bk , then leftk = 0 and rightk = 0.
In addition, to reduce switching activities, as soon as a bitwise comparison is not equal, the bitwise comparison of every bit of lower significance is terminated and all such positions are set to zero on both buses, thus, there is never
more than one high bit on either bus. The decision module uses two ORnetworks to output the final comparison decision based on separate ORscans of all of the bits on the left bus (producing the L bit) and all of the bits on the right bus (producing the R
Fig. 1. Example 8b comparison.[12]
bit). If LR = 00, then A = B, if LR = 10 then A > B, if LR = 01 then A < B, and LR = 11 is not possible. An 8b comparison of input operands A = 01011101 and B = 01101001 is illustrated in Fig. 1. In the first step, a parallel prefix tree structure generates the encoded data on the left bus and right bus for each pair of corresponding bits from A and
B. In this example, A7 = 0 and B7 = 0 encodes as left7 = right7 = 0, A6 = 1, and B6 = 1 encodes as left6 = right6 = 0, and A5 = 0 and B5 = 1 encodes left5 = 0 and right5 = 1. At this point, since the bits are unequal, the comparison terminates and a final comparison decision can be made based on the first three bits evaluated. The parallel prefix structure forces all bits of lesser significance on each bus to 0, regardless of the remaining bit values in the operands. In the second step, the ORnetworks perform the bus ORscans, resulting in 0 and 1, respectively, and the final comparison decision is A > B.
The structure of this comparator is partitioned into five hierarchical prefixing sets, as depicted in Fig. 2, with the associated symbol presentations in Tables I, where each set
performs a specific function whose output serves as input to the next set, until the fifth set produces the output on the left bus and the right bus. All cells (components) within each set operate in parallel, which is a key feature to increase operating speed while minimizing the transitions to a minimal set of leftmost bits needed for a correct decision. This prefixing set structure bounds the components fanin and fan out regardless of comparator bitwidth and eliminates heavily loaded global signals with parasitic components, thus improving the operating speed and reducing power consumption. Additionally, the ORnetworks fanin and fan out is limited by partitioning the buses into 4b groupings of the input operands, thus reducing the capacitive load of each bus.
TABLE I
SYMBOL NOTATION AND DEFINITIONS [12]
Symbol (Cells)
Definition
N
Operand bitwidth
A
First input operand
B
R
Right bus result bit
L
Left bus result bit
Bitwise AND
Bitwise OR
T{}
Logic function of cell type
COMP{}
Complement function of set
This is based on using a novel parallel prefix tree (Tables I contain symbols). Each set or group of cells produces outputs that serve as inputs to the next set in the hierarchy, with the exception of set 1, whose outputs serve as inputs to several sets. Set 1 compares the Nbit operands A and B bitbybit, using a single level of type cells. The type cells provide a termination flag Dk to cells in sets 2 and 4, indicating whether the computation should terminate. These cells compute (where 0 k N 1)
: Dk = Ak Bk. (1)
Set 2 consists of 2type cells, which combine the termination flags for each of the four type cells from set 1 (each 2type cell combines the termination flags of one 4b partition) using NORlogic to limit the fanin and fanout to a maximum of four. The 2type cells either continue the comparison for bits of lesser significance if all four inputs are 0s, or terminate the comparison if a final decision can be made. For 0 m N/41, there is a total of N/4 2type cells, all functioning in parallel
i=4m
2:C2,m= COMP( 4m+3 Di) (2)
Set 3 consists of 3type cells, which are similar to 2type cells, but can have more logic levels, different inputs, and carry different triggering points. A 3type cell provides no comparison functionality; the cells sole purpose is to limit the fanin and fanout regardless of operand bitwidth. To limit the 3type cells local interconnect to four, the number
of levels in set 3 increases if the fanin exceeds four. Set 3 provides functionality similar to set 2 using the same NOR logic to continue or terminate the bitwise comparison
activity. If the comparison is terminated, set 3 signals set 4 to set the left bus and right bus bits to 0 for all bits of lower significance. For 0
Fig. 2. Implementation details for the comparison resolution module (sets 1 through 5) and the decision module of existing comparator design [12].
m N/4 1, there is a total of N/4 3type cells per level, with cell function and number of levels as
0
3:C3,m=COMP ( m C2,i) (3)
TABLE II
LOGIC GATE REPRESENTATIONS FOR SYMBOLS
From left to right, the first four 3type cells in set 3 combine the 4b partition comparison outcomes from the one, two, three, and four 4b partitions of set 2. Since the fourth 3type cell has a fanin of four, the number of levels in set 3 increases and set 3s fifth 3type cell combines the comparison outcomes of the first 16 MSBs with a fanin of only two and a fanout of one.
SED IN FIG. 2.[12]
: Yk = C3,[k/4]1Dk k1 k D
i=4 4 1
(4)
The number of inputs in the type cells increases from left to right in each partition, ending with a fanin of five. Thus, the type cells in set 4 determine whether set 5 propagates the bitwise comparison codes. Set 5 consists of N type cells (twoinput, 2bwide multiplexers). One input is (Ak, Bk) and the other is hardwired to 00. The select control input is based on the type cell output from set 4. We define the 2b as the leftbit code (Ak) and the rightbit code (Bk), where all leftbit codes and all rightbit codes combine to form the left bus and the right bus, respectively. The type cells compute (where 0 k N1)
: 1,0 =Yk Ã— Mk + Yk Ã— 00 (5)
The output 1,0 denotes the greaterthan, lessthan, or
equal to final comparison decision
If 1,0 = 00, for Ak = Bk
= 01, for Ak < Bk
= 10, for Ak > Bk .
Essentially, the 2b code 1,0 can be realized by ORing all left bits and all right bits separately, as shown in the decision module (Figs. 2 and 3), using an ORgate network in the form of NORNAND gates yielding a more optimum gate structure
f) Finally last modification is done in set 5, as this contains 2bit wide 2 inputs multiplexer and in this previous circuit shown for both MUX, one input is connected Ak, Bk and other is hardwired with ground or 00 and for this input it require only one NMOS to propagate 0 as NMOS transistor is good conductor to pass 0.
This comparator is based on using a novel parallel prefix
=
1 4j+3 1
1, k=4j
(6)
tree. These sets or group of cells generate outputs that are given as inputs to the next set in the hierarchy, except set 1,
1, k=4j
0 = 4j+3 0m (7)
The superscripts 1 and 0 in (6) and (7) denote the summation of the left and right bits, respectively, and the subscript 1 denotes the first level of ORlogic in the decision module that receives data directly from set 5. If we limit the fanin of each gate to four, the number LDM of the ORgate tree levels for the decision module is given by
LDM = [log4 N]. (8)

THE PROPOSED WORK
A novel scalable parallel prefix based structure that assisted the comparison outcome of the most significant bit, further propagate bitwise toward the least significant bit only when the compared bits are equal. This comparator is also divided in five sets similar to previous discussed comparator. The basic design principle is to similar to previous comparator [12]. The changes are done in sets and
its outputs given as inputs to more than one set. Detail working or function done by these sets is given below.
Set 1 contains the XOR gate. This set compares the Nbit A, B bitbybit, with a single level of N type cells. This type cells basically have a function of termination flag Dk to cells in sets 2 and 4, and these compute (where 0 k N
1)
Dk = Ak Bk. (9)
Set 2 consists of NOR type cells, this set contains the termination flags from each of the two set 1 (as two outputs of 2b partition of set 1 will be given as inputs to set 2 as a termination flags of one 2b partition) using NORlogic to limit the fan in and fan out to a maximum of two. These type cells either continue the comparison for bits of lesser significance if all two inputs are 0s, or terminate the comparison if a final decision can be made. For 0 m N/21, there is a total of N/2 this type of cell, all functioning in parallel. N is supposed number of N bit.
function of comparator. Changes done in existing architecture are given below:
C2,m= D2n D2n 1
(10)

The previous comparator architecture is divided into group of 4 parallel input bits and it works in group of four and in this proposed work the grouping of bits only 2. By doing this change the number of the maximum FAN IN and FAN OUT become three instead of five. This change helps comparator to work faster and reduce its design complexity.

Second change is done in set 1 of this comparator architecture [12]. In set 1of previous comparator XOR gate is used based on static logic and require 12 transistors to complete circuit and in this proposed comparator another type of XOR gate based on pass transistor gate is used. This XOR gate is taken from previous comparator [7]. In this work [7], XNOR gate is used and for proposed work complementary circuit of XNOR gate i.e. XOR gate is used. It takes only 6 transistors to complete circuit.

Due to change in grouping of bits, set 2 is also changed. Now this set becomes two input NOR instead of four input NOR gate.

In set 3 of proposed work, the pass transistor logic based AND gate is used instead of NOR gate used. It makes comparator faster, but this change does not change the function of comparator as inverter is removed from next stages.
Set 3 consists of AND type cells. This cell has two iputs and one output as a termination flag for set 4. As this cell is AND gate, it purely sensitive to 0. If any one of input is 0 it gives termination flag to stop process for the set 4. This set will differentiate for MSB from LSB. If in inputs bits MSB shows inequality. The comparison is terminated, set 3 signals set 4 to set the left bus and right bus bits to 0 for all bits of lower Significance. For 0 m N/2 1, there is a total of N/2 this type of cells per level, with cell function and number of levels as.
C3,m=C2,m* C3,m+1 (11)
Levels set 3 = ([log2 (N)]).
Set 4 consists of final decision type cells according to inputs from set 1, previous set1 and previous set4 and produces the select inputs of 2input multiplexer in set 5, which in turn drive both the left bus and the right bus. For this type cell and the 2b partition to which the cell belongs, bitwise comparison outcomes from set 1 provide information about the more significant bits in the cells set 5, that compute (0 k N 1)

In set 4 similarly due to change in grouping of bits also limit the FAN IN to three and it also reduce the size of this
S = C3,[k/2]1Dk k1 k D
i=2 2 1
(12)
circuit. Now this set work for maximum of 3 inputs instead of maximum 5 inputs used in reference architecture.
The number of inputs in this set increases from left to
right in each partition, ending with a fan in of five. Thus, set 5 in set 4 determine whether set 5 propagates the bitwise comparison. Set 5 consists of N type MUX (two input, 2bwide multiplexers). One input is (Ak, Bk) and the other is hardwired to 00. The select control input is output from
set 4, the 2b as the leftbit code (Ak) and the rightbit code (Bk), where all leftbit codes and all rightbit codes combine to form the left bus and the right bus, respectively. This cells compute (where 0 k N1).
Y0,1=Yk Ã— Mk + Yk Ã— 00 (13)
The output Y0,1 denotes the greaterthan, lessthan, or
equal to. Essentially, the 2b code Y0,1 can be realized by ORing all left bits and all right bits separately, as shown in the decision module, using an ORgate network in the form of NORNAND gates yielding a more optimum gate structure
Fig. 4. Schematic view of NOR gate set 2 of comparator
Figure 4 shows the schematic view for set 2 i.e. NOR gate using CMOS logic. This set contains 4 transistors. It has two inputs they are name as D0, D1 and one output provides inputs for set 4. Output of this circuit is


Schematic view of all sets and circuit of proposed
comparator
In this section the schematic view all sets and 4bit structure of comparator will be shown and discussed.
Fig. 3. Schematic view of XOR gate set 1 of comparator
Figure 3 shows the schematic view of first set i.e. XOR gate. This set contains 6 transistors instead of 12 used in previous comparator [12]. It has two inputs Ak, Bk and one output and provides inputs for set 2 and set 5. The relation between inputs and outputs are shown below.
Dk = A XOR B
C2,m = D0 D1
This output is activated when any of input is 1 from set 1. It has maximum of 2 fan in and maximum 2 fan out.
Fig. 5. Schematic view of AND gate set 3 of comparator
Figure 5 shows the schematic view of set 3. It uses AND gate logic using PASS transistor logic of comparator. This set contains 5 transistors. It has two inputs first is C2,m and second is C3,m+1. First input C2,m is from set 2 and m consider the number of set 2 is using. This circuit produces one output i.e. C3,m . The relation between inputs and outputs are shown below
C3,m = C3,m+1. C2,m
It has 2 fan in and 3 maximum fan out.
Fig. 6. Schematic view of second proposed 4bit comparator
Figure 6 shows the schematic view of second proposed 4bit
comparator. 4bit comparator requires 130 transistors to complete circuit. This circuit is shown for better understanding of architecture of circuit. Further it can be repeated again and again for more bit comparator. Figure 7 shows the schematic view of set 4 it uses NAND gate and inverters to inverts the inputs using CMOS logic. This set contains maximum 10 transistors. It has maximum three one of them is inverted inputs named as D0, D1, C3,m and one output S. Output of circuit is described below
S =D0+ D1 + C3,m
The D1 input is taken from previous set 1, D0 from current set 1 related to present input bits and C3,m from previous set 4. This set provides input to set 5 as select line for two 2:1 MUX. This circuit has maximum 3 fan in and 1 fan out.
Figure 8 shows the schematic view of set 5 i.e. two 2:1 MUX using CMOS logic of comparator. This set contains 8 transistors. It has three inputs Ak, Bk, one select line i.e. from set four and one output which provide output for decision blocks.
Y0 = S Ã— Ak + S Ã— 00 and Y1 = S Ã— Bk + S Ã— 00
Fig. 7. Schematic view of set 4 of comparator
TABLE III
COMPARISON OF SIMULATION RESULTS OF IMPLEMENTED 64BIT COMPARATOR DESIGNS.
SR.
NO.
Comparator
type
VDD
(V)
Tech.
file
Avg. power
Consumption (Âµw)
Delay
(ps)
Transistor
count
1
P.chuang et al. [7]
0.7
65nm
3.39
632
1206
2
S.A Hafeez et al. [12]
0.7
65nm
5.43
461
2864
3
Proposed work
0.7
65nm
2.37
231
2204
Fig. 8. Schematic view of Set 5 MUX.
This circuit has three PMOS transistors and five NMOS transistors at different channel width. The channel length of all transistors are kept 65nm.

Simulation results and waveform
All simulations are done by keeping same channel length of mosfet i.e. 65nm and different channel width which depend on type of circuits as shown in previous schematic views. The output waveform of comparator with respective to initial inputs is shown in figure 9. The simulated results of proposed 64bit comparator and two existed tree based 64bit comparators are shown in table III. The simulation of all comparators is done on TANNEREDA V13 software as a tool. The simulation environment for all implemented comparator is kept same as we use 65nm technology model file based parameter, same VDD supply i.e. 0.7v, same input bits pattern etc. Simulated results shows our comparators
better compare to other simulated comparators. The changes done on comparator makes comparator more efficient as due to change of grouping of bits from 4bit to 2bit make comparator more faster and reduce its complexity of architecture design and changes done in various sets reduce power consumption. The simulated results given in table III showed the operating speed of proposed 64bit comparator is 231ps which is 62% and 49% less compare to implemented previous 64bit comparators [7], [12] respectively and the average power consumption during operation of this 64bit comparator is 2.37Âµw which is 30% and 56% less compare to implemented previous 64bit comparators [7], [12] respectively.
Fig. 9. Waveform of proposed 64bit comparator

CONCLUSION

A new 65nm parallel prefix tree based 64bit comparator has been proposed. This design shows very good result and accurate outputs. The simulated results given in table III showed the operating speed of proposed 64bit comparator is 231ps which is 62% and 49% less compare to implemented previous 64bit comparators [7], [12] respectively and the average power consumption during operation of this 64bit comparator is 2.37Âµw which is 30% and 56% less compare to implemented previous 4bit comparators [7], [12] respectively. This comparator is very simple and easy to design and can be used for electronic circuits for better overall performance.
REFERENCES

J. L. Hennessy and D. A. Patterson, Computer Architecture a Quantitative Approach. Morgan Kaufmann Publsihers, 2002.

J. Eyre and J. Bier, DSP processors hit the mainstream, IEEE Computer, pp. 5159, 1999.

C. C. Wang, C. F. Wu, K. C. Tsai, 1 GHz 64bit highspeed comparator using ANT dynamic logic with twophase clocking IEE Proc. Computers and Digital Techniques, Volume 145, Issue 6, Nov.1998, pp:433 436.

C. H. Huang, J. S. Wang, Highperformance and powerefficient CMOS comparators IEEE Journal of SolidState Circuits, Volume 38, Issue 2, Feb. 2003, pp.:254 262.

H. M. Lam, C. Y. Tsui, Highperformance single clock cycle CMOS comparator IEE Electronics Letters, Volume 42, Issue 2, Jan. 2006, pp:75 77.

S. Perri, P. Corsonello, Fast LowCost Implementation of Single Clock Cycle Binary Comparator IEEE Transactions on Circuits and Systems II: Express Briefs, Volume 55, Issue 12, Dec. 2008, pp:1239 1243.

Pierce Chuang, David Li, and Manoj Sachdev, Fellow, IEEE A Low Power HighPerformance SingleCycle Tree Based 64Bit Binary Comparator IEEE Transactions On Circuits and SystemsII: Express Briefs, Vol.59, No. 2,February 2012.

S. Perri and P. Corsonello, Fast lowcost implementation of single clockcycle binary comparator, IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 55, no. 12, pp. 12391243, Dec. 2008.

F. Frustaci, S. Perri, M. Lanuzza, and P. Corsonello, A new lowpower highspeed singleclockcycle binary comparator, in Proc. IEEE Int.Symp. Circuits Syst., 2010, pp. 31 7320.

K. Furuya, Design methodologies of comparators based on parallel hardware algorithms, in Proc. ISCIT, 2010, pp. 591596.

N. Weste and D. Harris, CMOS VLSI Design: A Circuits and Systems Perspective., 3rd ed. Reading, MA: AddisonWesley, May 2004.

S. A. Hafeez, A. G. Ross and B. Parhami, Fellow, IEEE Scalable Digital CMOS Comparator Using a Parallel Prefix Tree IEEE Transaction on Very Large Scale Integration (VLSI) Systems September 13, 2012.
Deepak Sharma is currently pursuing MTech degree in Electronics & Communication Engineering at Department of E&CE of Yadavindra College of Engineering, Talwandi Sabo. His research interests are in high speed and low power digital VLSI circuit.
Er. Parminder Singh Jassal received the MTech degree in VLSI Design from C DAC, Mohali Punjab Technical University in 2006. He is currently working as Assistant Professor in Department of Electronics and Communication Engineering at Yadavindra College of Engineering, Punjabi University Guru kashi Campus, Talwandi Sabo. He was previously working at NIT Hamirpur as a VLSI faculty under special Manpower development in VLSI Area (SMDPII). His main research interests are in High speed digital VLSI circuits, Low power, synthesis and Simulation of digital circuits and FPGA Implementation.