 Authors : Debarshi Datta, Partha Mitra, Avisek Sen
 Paper ID : IJERTV1IS10422
 Volume & Issue : Volume 01, Issue 10 (December 2012)
Published (First Online): 28-12-2012
ISSN (Online) : 2278-0181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
Low Power 40-bit SQRT Carry Select Adder
Debarshi Datta¹, Partha Mitra², Avisek Sen³
¹,²,³Assistant Professor, SDET Brainware Group of Institutions
Abstract
Digital signal processors require high-speed, low-power multiplier-accumulator (MAC) units, and the adder circuit is the main building block of a DSP processor. However, digital adders suffer from carry propagation delay. To alleviate this problem, carry select adders (CSLAs) are used in computational units. There is scope to reduce the power consumption of the regular CSLA, and a simple gate-level modification of the regular CSLA is enough to do so. This paper proposes a modified 40-bit square-root CSLA (SQRT CSLA) architecture. Both the regular and the modified 40-bit CSLA are designed in TSMC 0.13 µm CMOS process technology. The proposed design has reduced area and power compared with the regular SQRT CSLA, with only a slight increase in delay.
Keywords: CSLA, DSP processor, Low power, MAC Unit, Power-Delay Product, VLSI

Introduction
Due to the rapid growth of portable electronic devices, low-power arithmetic circuits have become very important in the VLSI industry. The multiplier-accumulator (MAC) unit is the main building block of a DSP processor. The full adder, as part of the MAC unit, can significantly affect the efficiency of the whole system, so reducing the power consumption of the full adder circuit is necessary for low-power applications. Carry select adders are used in high-speed applications because they reduce the carry propagation delay.
The basic operation of the carry select adder (CSLA) is parallel computation: the CSLA generates multiple carries and partial sums [1], and the final sum and carry are selected by multiplexers (muxes).
Because the CSLA structure uses multiple pairs of ripple carry adders (RCAs), it is not area efficient. In this paper, we propose a modified CSLA architecture.
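To make the parallel-computation idea concrete, the Python sketch below models one CSLA block at bit level. It is an illustrative behavioral model, not the authors' circuit: two ripple carry adders compute the candidate results for carry-in 0 and carry-in 1, and the actual carry-in only selects between them, as the multiplexers do. All function names are hypothetical.

```python
def to_bits(value, width):
    """Little-endian list of the low `width` bits of a non-negative integer."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(a_bits, b_bits, cin):
    """Chain of full adders; returns (sum_bits, carry_out)."""
    sum_bits, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sum_bits, carry


def carry_select_block(a_bits, b_bits, cin):
    """One CSLA block: both candidate results exist before cin is known."""
    result_cin0 = ripple_carry_add(a_bits, b_bits, 0)  # RCA with Cin = 0
    result_cin1 = ripple_carry_add(a_bits, b_bits, 1)  # RCA with Cin = 1
    # The incoming carry only drives the multiplexer selection.
    return result_cin1 if cin else result_cin0


# 4-bit block: 9 + 7 + 1 = 17, i.e. sum bits 0001 with carry out 1.
sum_bits, cout = carry_select_block(to_bits(9, 4), to_bits(7, 4), 1)
print(from_bits(sum_bits), cout)  # 1 1
```

Only the final selection waits for the incoming carry; the cost is the duplicated RCA, which is the part the proposed method replaces.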
The proposed method uses a Binary to Excess-1 Converter (BEC) instead of the RCA with Cin = 1 in the regular CSLA. The main goal of the BEC logic is to use fewer logic gates than the n-bit full adder it replaces, so that the modified CSLA architecture has lower area and power consumption [2]–[4]. The details of the BEC logic are discussed in Section III.
This paper is organized as follows. Section II presents the delay and area evaluation methodology of the basic adder blocks. Section III describes the structure and function of the BEC logic. The SQRT CSLA has been chosen for comparison with the proposed design because it has a more balanced delay and requires lower power [5], [6]. The delay and area evaluation methodologies of the regular and modified SQRT CSLA are presented in Sections IV and V, respectively. Section VI reviews the results obtained from the simulations, and Section VII concludes this work.
Fig. 1. Delay evaluation of an XOR gate.
Delay and Area Evaluation Methodology of the Basic Adder Block
An XOR gate built from basic AND, OR, and inverter (AOI) gates is shown in Fig. 1. The gates between the dotted lines operate in parallel, and the number next to each gate indicates the delay contributed by that gate. For the delay and area evaluation, every basic gate is taken to have a delay of 1 unit and an area of 1 unit. The maximum delay of a logic block is found by adding the gate delays along its longest path. Based on this approach, the CSLA blocks of 2:1 mux, half adder (HA), and full adder (FA) are evaluated and listed in Table I.

TABLE I: DELAY AND AREA COUNT OF THE BASIC BLOCKS OF CSLA
Adder block    Delay    Area
XOR            3        5
2:1 Mux        3        4
Half Adder     3        6
Full Adder     6        13

3. BEC Logic Gate
Fig. 2. 4-bit BEC.
The proposed method uses BEC logic. The regular CSLA uses pairs of ripple carry adders (RCAs): one with initial carry Cin = 0 and the other with Cin = 1. The BEC is used instead of the RCA with Cin = 1 in order to reduce the area and power consumption of the regular CSLA. To replace an n-bit RCA, an (n+1)-bit BEC is required. The structure of a 4-bit BEC is shown in Fig. 2, and Table II shows its function table.
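Under the unit-gate model of Section II, the area argument can be made concrete. The Python sketch below counts gates for an n-bit RCA with Cin = 1 (n full adders, Table I) and for the (n+1)-bit BEC that replaces it. The BEC count is an assumption: one inverter for X0, one XOR for each higher bit, and the AND terms built as a chain of two-input ANDs. This reproduces the 3-bit BEC count used for modified group2 in Section V (1 NOT, 1 AND, 2 XOR, i.e. 12 gates), but it is our reading rather than a formula stated in the paper. Function names are hypothetical.

```python
# Gate counts per block from Table I (each AND, OR, or NOT counts as 1 unit).
AREA = {"XOR": 5, "FA": 13}


def rca_cin1_gates(n):
    """n-bit ripple carry adder with Cin = 1, built from n full adders."""
    return n * AREA["FA"]


def bec_gates(m):
    """m-bit BEC (m >= 2). Assumes X0 = ~B0 (one inverter), one XOR per
    higher bit, and the terms B0&B1, B0&B1&B2, ... built as a chain of
    (m - 2) two-input AND gates."""
    inverters = 1
    xors = m - 1
    ands = m - 2
    return inverters + xors * AREA["XOR"] + ands


for n in range(2, 6):
    print(f"{n}-bit RCA: {rca_cin1_gates(n):3d} gates, "
          f"{n + 1}-bit BEC: {bec_gates(n + 1):3d} gates")
```

Under these assumptions the BEC grows by about 6 gates per bit against 13 per bit for the RCA, which is the source of the area saving.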
Fig. 3 shows how the 4-bit BEC and an 8:4 multiplexer together perform the basic function of the CSLA. One input of the mux is the direct input (B3, B2, B1, and B0) and the other input is the BEC output. The two possible partial results are thus available in parallel, and the mux selects either the BEC output or the direct inputs according to the control signal Cin. The Boolean expressions of the 4-bit BEC are shown below (~ denotes NOT, & AND, and ^ XOR).
X0 = ~ B0, X1 = B0 ^ B1,
X2 = B2 ^ (B0 & B1), X3 = B3 ^ (B0 & B1 & B2).
Fig. 3. 4-bit BEC with 8:4 mux.
TABLE II: FUNCTION TABLE OF THE 4-BIT BEC
B[3:0]    X[3:0]
0000      0001
0001      0010
0010      0011
...       ...
1110      1111
1111      0000
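A quick behavioral check of the Boolean expressions against Table II can be sketched in Python: `bec4` evaluates X0 to X3 exactly as written above, and `mux_8_4` models the 8:4 mux of Fig. 3 as four 2:1 selections driven by Cin. Both names, and the little-endian bit order [B0, B1, B2, B3], are assumptions made for illustration.

```python
def bec4(b):
    """4-bit BEC. b = [B0, B1, B2, B3], each 0 or 1; returns [X0, X1, X2, X3]."""
    b0, b1, b2, b3 = b
    x0 = b0 ^ 1                 # X0 = ~B0
    x1 = b0 ^ b1                # X1 = B0 ^ B1
    x2 = b2 ^ (b0 & b1)         # X2 = B2 ^ (B0 & B1)
    x3 = b3 ^ (b0 & b1 & b2)    # X3 = B3 ^ (B0 & B1 & B2)
    return [x0, x1, x2, x3]


def mux_8_4(direct, converted, cin):
    """Four 2:1 muxes: the direct input when Cin = 0, the BEC output when Cin = 1."""
    return [x if cin else b for b, x in zip(direct, converted)]


def to_int(bits):
    return sum(bit << i for i, bit in enumerate(bits))


for value in range(16):
    b = [(value >> i) & 1 for i in range(4)]
    # Table II: the BEC adds one, wrapping 1111 -> 0000.
    assert to_int(bec4(b)) == (value + 1) % 16
    # With the mux, the block output is B + Cin (mod 16).
    for cin in (0, 1):
        assert to_int(mux_8_4(b, bec4(b), cin)) == (value + cin) % 16
print("BEC matches B + 1 (mod 16) for all 16 inputs")
```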

Delay and Area Evaluation Methodology of Regular 16-bit SQRT CSLA
The regular 16-bit SQRT CSLA structure is shown in Fig. 4. It has five groups of RCAs of different sizes. Fig. 5 shows the delay and area evaluation of each group; the numerals within [] specify the delay values. The steps leading to the evaluation are as follows.

Group2 [see Fig. 5(a)] requires two 2-bit RCAs. Using the delays of Table I, the selection input c1 [time (t) = 7] of the 6:3 mux arrives earlier than s3 [t = 8] and later than s2 [t = 6]. Thus, sum3 [t = 11] is the arrival time of s3 plus the mux delay [t = 3], and sum2 [t = 10] is the arrival time of c1 plus the mux delay.

The delays of group3 to group5 are determined, respectively, as follows:
{c6, sum[6:4]} = c3 [t = 10] + mux
{c10, sum[10:7]} = c6 [t = 13] + mux
{cout, sum[15:11]} = c10 [t = 16] + mux

In group2, one 2-bit RCA has 2 FA for Cin = 1 and the other has 1 FA and 1 HA for Cin = 0. Using the area counts of Table I, the total gate count is calculated as follows:
Gate count = 57 (FA + HA + Mux)
FA = 39 (3 * 13)
HA = 6 (1 * 6)
Mux = 12 (3 * 4)
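The group2 arithmetic can be reproduced in a few lines of Python. The sketch assumes the convention the text applies: a mux output becomes valid at the later of its select and data arrival times plus the Table I mux delay of 3. The arrival times c1, s2, and s3 are the ones quoted in the text; the helper name is hypothetical.

```python
MUX_DELAY = 3                              # Table I
AREA = {"FA": 13, "HA": 6, "MUX": 4}       # Table I


def mux_output_time(t_select, t_data):
    """Time at which a 2:1 mux output settles, under the text's convention."""
    return max(t_select, t_data) + MUX_DELAY


# Arrival times for regular group2 quoted in the text.
c1, s2, s3 = 7, 6, 8
sum2 = mux_output_time(c1, s2)   # the select c1 arrives last: 7 + 3
sum3 = mux_output_time(c1, s3)   # the data s3 arrives last:   8 + 3

# Two 2-bit RCAs (2 FA for Cin = 1; 1 FA + 1 HA for Cin = 0)
# and a 6:3 mux made of three 2:1 muxes.
gates = 3 * AREA["FA"] + 1 * AREA["HA"] + 3 * AREA["MUX"]
print(sum2, sum3, gates)  # 10 11 57
```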
Fig. 4. Regular 16-bit SQRT CSLA.
Fig. 5. Delay and area evaluation of regular SQRT CSLA: (a) group2, (b) group3, (c) group4, and (d) group5. F is Full Adder.

Similarly, the maximum delay and area of the other groups of the regular SQRT CSLA are evaluated and listed in Table III.
TABLE III: DELAY AND AREA COUNT OF REGULAR SQRT CSLA GROUPS
Group     Delay    Gate Count
Group2    11       57
Group3    13       87
Group4    16       117
Group5    19       147
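The remaining rows of Table III follow from the same two rules. The Python sketch below assumes that the group2 arrangement carries over to the wider groups (the Cin = 1 RCA uses only FAs, the Cin = 0 RCA uses one HA plus FAs, and there is one 2:1 mux per sum bit plus one for the carry out), and that each later group's RCA outputs are ready before its select carry arrives, as the group3 to group5 equations imply. These assumptions are our reading rather than statements in the text; with them, the sketch reproduces Table III. The helper names are hypothetical.

```python
MUX_DELAY = 3                              # Table I
AREA = {"FA": 13, "HA": 6, "MUX": 4}       # Table I


def regular_group_gates(width):
    """Gate count of a regular SQRT CSLA group of the given bit width.
    Assumed layout: Cin = 1 RCA with `width` FAs, Cin = 0 RCA with 1 HA and
    width - 1 FAs, and width + 1 two-input muxes (sum bits plus carry out)."""
    full_adders = width + (width - 1)
    return full_adders * AREA["FA"] + AREA["HA"] + (width + 1) * AREA["MUX"]


def group_delays(group2_delay, c3_arrival, later_groups):
    """Group2 delay followed by group3, group4, ...: each later group's
    carry out settles one mux delay after its select carry arrives."""
    delays = [group2_delay]
    carry = c3_arrival
    for _ in range(later_groups):
        carry += MUX_DELAY
        delays.append(carry)
    return delays


widths = [2, 3, 4, 5]  # group2..group5: sum[3:2], sum[6:4], sum[10:7], sum[15:11]
print([regular_group_gates(w) for w in widths])  # [57, 87, 117, 147]
print(group_delays(11, 10, 3))                   # [11, 13, 16, 19]
```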


Delay and Area Evaluation Methodology of Modified 16-bit SQRT CSLA
The modified 16-bit SQRT CSLA is shown in Fig. 6. In it, the RCA with Cin = 1 is replaced by BEC logic gates. The evaluation procedures are as follows:

Group2 [see Fig. 7(a)] has one 2-bit RCA, consisting of 1 FA and 1 HA, for Cin = 0. A 3-bit BEC is used in place of the other 2-bit RCA with Cin = 1; the 3-bit BEC adds one to the output of the 2-bit RCA. Using the delays of Table I, the selection input c1 [time (t) = 7] of the 6:3 mux arrives earlier than s3 [t = 9] and c3 [t = 10] and later than s2 [t = 4]. Thus, sum3 and the final c3 (outputs of the mux) depend on s3 plus the mux delay and on the partial c3 (input to the mux) plus the mux delay, respectively, while sum2 depends on c1 plus the mux delay.
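Applying the same convention (later of select and data arrival, plus the mux delay of 3) to the arrival times quoted for modified group2 gives the group delay listed in Table IV. A minimal Python sketch, with a hypothetical helper name:

```python
MUX_DELAY = 3  # Table I


def mux_output_time(t_select, t_data):
    """Later of select and data arrival, plus the mux delay (the text's convention)."""
    return max(t_select, t_data) + MUX_DELAY


# Arrival times for modified group2 quoted in the text.
c1, s2, s3, c3_partial = 7, 4, 9, 10
sum2 = mux_output_time(c1, s2)          # 10: c1 arrives last
sum3 = mux_output_time(c1, s3)          # 12: s3 arrives last
c3 = mux_output_time(c1, c3_partial)    # 13: the partial c3 arrives last
print(sum2, sum3, c3, max(sum2, sum3, c3))  # 10 12 13 13
```

The extra BEC stage delays s3 and the partial c3, so the group delay rises from 11 to 13.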

The area count of group2 is calculated as follows:
Gate count = 43 (FA + HA + Mux + BEC)
FA = 13 (1 * 13)
HA = 6 (1 * 6)
AND = 1
NOT = 1
XOR = 10 (2 * 5)
Mux = 12 (3 * 4)
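The group2 area count can be checked the same way, assuming the Table I block areas and 1 unit for each basic AND or NOT gate. The short Python sketch below also compares the result with the 57-gate regular group2 of Section IV.

```python
# Unit areas: Table I blocks, and 1 unit for each basic gate.
AREA = {"FA": 13, "HA": 6, "MUX": 4, "XOR": 5, "AND": 1, "NOT": 1}

rca_cin0 = 1 * AREA["FA"] + 1 * AREA["HA"]                   # 2-bit RCA, Cin = 0
bec3 = 1 * AREA["AND"] + 1 * AREA["NOT"] + 2 * AREA["XOR"]   # 3-bit BEC
mux = 3 * AREA["MUX"]                                        # 6:3 mux
modified_group2 = rca_cin0 + bec3 + mux

regular_group2 = 3 * AREA["FA"] + 1 * AREA["HA"] + 3 * AREA["MUX"]
print(modified_group2, regular_group2 - modified_group2)  # 43 14
```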
Fig. 6. Modified 16-bit SQRT CSLA. The parallel RCA with Cin = 1 is replaced with a BEC.
Fig. 7. Delay and area evaluation of modified SQRT CSLA: (a) group2, (b) group3, (c) group4, and (d) group5. H is Half Adder.

The maximum delay and the area of the modified SQRT CSLA are evaluated in Table IV.
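The overall comparison of the two 16-bit designs can be reproduced by summing the per-group figures of Table III and Table IV, as in the short Python sketch below (values copied from the tables). The gate saving comes to 113, and the delay increase of 11 is the sum of the per-group increases; the largest group delay rises from 19 to 22 gate delays.

```python
# (delay, gate count) per group, copied from Table III (regular) and Table IV (modified).
regular = {2: (11, 57), 3: (13, 87), 4: (16, 117), 5: (19, 147)}
modified = {2: (13, 43), 3: (16, 61), 4: (19, 84), 5: (22, 107)}

gates_saved = sum(g for _, g in regular.values()) - sum(g for _, g in modified.values())
extra_delay = sum(d for d, _ in modified.values()) - sum(d for d, _ in regular.values())
print(gates_saved, extra_delay)  # 113 11
```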
TABLE IV: DELAY AND AREA COUNT OF MODIFIED SQRT CSLA
Group     Delay    Gate Count
Group2    13       43
Group3    16       61
Group4    19       84
Group5    22       107

Comparing Tables III and IV, the proposed modified SQRT CSLA uses 113 fewer gates than the regular SQRT CSLA, with a total increase of only 11 gate delays summed over groups 2 to 5.

6. Simulation Results
The proposed 40-bit SQRT CSLA has been developed using TSMC 0.13 µm CMOS process technology.
TABLE V: COMPARISON OF THE REGULAR AND MODIFIED 40-BIT SQRT CSLA ARCHITECTURE
Type of Adder    Supply Voltage (V)    Delay (ns)    Switching Power (µW)    Power-Delay Product (10^-15 J)
Regular CSLA     1.5                   5.986         1283.7                  7684.2
Modified CSLA    1.5                   6.316         1057.5                  6488.8
Table V shows that the proposed architecture consumes less power, at the cost of a slight increase in propagation delay. The modified 40-bit SQRT CSLA improves the power-delay product (PDP) by 15.6%. The adder circuits are operated at 125 MHz with a supply voltage of 1.5 V.

Conclusion
In this paper, a modified 40-bit SQRT CSLA has been proposed for data path circuits (MAC units) in low-power DSP applications. Table V shows that the modified CSLA reduces the power-delay product (PDP) compared with the regular CSLA, with a slight increase in delay. Therefore, the modified 40-bit SQRT CSLA architecture can be used in low-power, high-speed DSP processors.

Acknowledgment
The authors would like to thank Advanced VLSI Design Lab, IIT Kharagpur for their cooperation and support.
References

[1] O. J. Bedrij, "Carry-select adder," IRE Trans. Electron. Comput., pp. 340–344, 1962.
[2] B. Ramkumar, H. M. Kittur, and P. M. Kannan, "ASIC implementation of modified faster carry save adder," Eur. J. Sci. Res., vol. 42, no. 1, pp. 53–58, 2010.
[3] T. Y. Chang and M. J. Hsiao, "Carry-select adder using single ripple carry adder," Electron. Lett., vol. 34, no. 22, pp. 2101–2103, Oct. 1998.
[4] Y. Kim and L.-S. Kim, "64-bit carry-select adder with reduced area," Electron. Lett., vol. 37, no. 10, pp. 614–615, May 2001.
[5] J. M. Rabaey, Digital Integrated Circuits: A Design Perspective. Upper Saddle River, NJ: Prentice Hall, 2001.
[6] Cadence, "Encounter user guide," Version 6.2.4, March 2008.