 Open Access
 Total Downloads : 1008
 Authors : P. Yasodha Devi, S. Kalaivani, S. Manimegalai, Andril Alagusabai
 Paper ID : IJERTV2IS110751
 Volume & Issue : Volume 02, Issue 11 (November 2013)
 Published (First Online): 02122013
 ISSN (Online) : 22780181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
Design of FixedWidth Multiplier Using BaughWooley Algorithm
P. Yasodha Devi1, S. Kalaivani2, S. Manimegalai3, Andril Alagusabai4
1(Department of ECE, Muthayammal College of Engineering, India) 2(Department of ECE, Muthayammal College of Engineering, India) 3(Department of ECE, Muthayammal College of Engineering, India) 4(Department of EEE, Muthayammal College of Engineering, India)
Abstract
This paper focuses on the design of fixedwidth multipliers with BaughWooley algorithm. The fixed width multipliers derived from BaughWooley algorithm produce nbit output product with nbit multiplier and nbit multiplicand. In previous papers the fixedwidth signed multipliers are designed by using both halfadders and fulladders. In this paper we have designed the fixed width BaughWooley multiplier by using only an array of fulladders and the final Ripple Carry Adder (RCA). We have implemented the Multiply and Accumulate (MAC) unit using fixed width signed multiplier and fixed width BaughWoolley multiplier. The main contribution of the proposed work is to reduce the area and power by designing the fixed width BaughWooley multiplier. We also present the comparison results of parameters like power, area, gates and delay.
Keywords: FixedWidth Multiplier, MAC Unit, Multiplication, Power.

Introduction
Multiplications occur frequently in digital signal processing systems, communication systems, and other application specific integrated circuits. The diversity of application areas for multipliers and the ubiquity of multiplication in digital systems exhibit a variety of requirements for speed, area, power consumption, and other specifications [1]. Parallel multipliers are fundamental building blocks in multimedia and digital signal processing systems. In many applications, the inputs and the outputs of the multiplier have the same bit width. These circuits are denoted in literature as fixedwidth multipliers or truncated multipliers. The most obvious way to design a fixedwidth multiplier
uses nÃ—n a fullwidth multiplier, whose output is truncated/rounded to n bits by discharging the less significant bits of the products [6].
In fixed width multiplication the product is truncated to nbits, the leastsignificant columns of the product matrix contribute little to the final result. To take advantage of this, truncated multipliers do not form all of the leastsignificant columns in the partialproduct matrix [5]. Output is the weighted sum of partial products. By eliminating more columns the area and power consumption of the arithmetic unit are significantly reduced & in many cases the delay also decreases [2][4]. Area saving of a fixed width multiplier can be achieved by directly truncating n least significant columns and preserving n most significant columns.
In the design of digital signal processing systems, where singleprecision results are required, the power dissipation and area of parallel multipliers can be significantly reduced by truncating the less significant columns and compensating to produce an approximate rounded product [7]. This dissertation presents the design of truncated multiplications of signed inputs utilizing BaughWooley algorithm, the negative fractional two's complement number system which solves an inherent problem of the conventional two's complement number system.
The BaughWooley (BW) algorithm is a relatively straightforward way of doing signed multiplications. The fixedwidth property, however, can be exploited to simplify the multiplier structure [9][10], with the aim of improving power and speed. Basically, one can discard some of the partialproducts in the partial products array to reduce the circuit complexity, at a price in terms of accuracy.

FIXEDWIDTH SIGNED MULTIPLIER
In previous work we designed the fixedwidth signed multipliers with linear compensation function by investigating in detail the effect of coefficients quantization [8]. New fixedwidth multipliers are obtained by varying the quantization schemes such as uniform and nonuniform quantization scheme. In uniform quantization scheme all coefficients are quantized with same bit. In nonuniform quantization scheme some of the coefficients are quantized with some bits and remaining coefficients are quantized with some another bits.
Fig. 1 Signed multiplication with negative and positive partial products
Fig. 1 shows the signed multiplication with negative and positive partial products.
Fixedwidth signed multiplier consumes more power and area. Since the multiplier are implemented by using both full adder and half adder [8]. In proposed work we reduced the power and area of the multiplier by designing the fixedwidth BaughWooley multiplier [11]. BaughWooley multiplier uses only full adders for implementation. The main contribution of the proposed work is to reduce the area and power by designing the fixedwidth BaughWooley multiplier. BaughWooley multiplier simplifies the multiplier structure and wiring layout.

FIXEDWIDTH BAUGHWOOLEY MULTIPLIER

Baugh – Wooley multiplier
It is an efficient way to handle the sign bit. Baugh Wooley multiplier uses only full adders. All bit products are generated in parallel & collected through an array of full adder & final Ripple Carry Adder (RCA). The creation of the reorganized partialproduct array comprises three steps:

The most significant bit (MSB) of the first N1 partialproduct rows and all bits of the last partial product row, except its MSB, are inverted.

A 1 is added to the Nth column.

The MSB of the final result is negated
Fig. 2 shows the partial product array diagram for nÃ—n Baugh Wooley multiplier.


Partialproduct generation
The PPs are generated using multiplexers or AND gates in an unsigned radix2 shiftandadd multiplication [12][13]. For multiplication of signed magnitude numbers, the unsigned multiplication core can be used for the magnitude part of the inputs, with an extension that the sign bit is computed separately by checking the two input operands sign bits.
The sign bit in twos complement numbers has a negative weight, the entry a7b0 term can be written in terms of a7b0, a7b0 = a7(1b0) a7 = a7b0 a7 (1)
Fig. 2 Partial product array diagram for nÃ—n Baugh
Wooley multiplier
where a multiplicand and b multiplier.
Hence, the term a7b0 is replaced with a7b0 and a7. If a7 is used instead of a7, the column sum increases by 2a7. Thus, a7 must be inserted in the next higher column in order to compensate the effect of 2a7. Thesameisdonefor a7b1, a7b2, a7b3, a7b4, a7b5, a7b6.
In each column, a7 and a7 cancel each other out. The p14 column gets a a7 entry, which is replaceable by a7 1.This can be repeated for all entries, yielding to the insertion of b7 in the p7 column, and b71 in the p14 column. There are two 1s in the 14th column now, which is equivalent to a 1 entry in p15 and that can be replaced with a 1 and a borrow into the nonexisting tenth column.
BaughWooley method increases the height of the longest column by 2, which may lead to a greater delay through the RCA tree [14]. In the given example of Fig. 3, column height changes from 8 to 9, requiring an extra RCA level. Removing b7 from eighth column and writing two b7 entries in the sixth column, which has only four entries, can reduce the extra delay caused by the additional RCA level. Thus, the maximum number of entries in one column becomes 9, which cn be implemented withfivelevel RCA tree.
Alternatively, all negatively weighted a7bj terms can be transferred to the bottom row, which leads to two negative numbers in the last two rows, where a subtraction operation from the sum of all the positive elements is necessary [15]. Instead of subtracting a7b, twos complement of a can be added b7 times. This method is known as the Modified BaughWooley algorithm.
Modified form of the Baugh – Wooley method, shown in Fig. 4, is more preferable since it does not increase the height of the columns in the matrix.
Fig. 1 depicts the product computation for 2s complement numbers. In order to avoid the complication that subtraction or sign extension will introduce in most applications [16], Baugh and Wooley proposed an efficient method for 2s complement multiplication that handles subtractions by taking the 2s complement of the terms to be subtracted. In 2s complement representation negation of an integer word
x can be obtained by the formula x = comp(x+1);
where comp(x) is bitwise inversion of the bits in word
x. Fig. 3 illustrates the PPM formation using Baugh Wooleys method. BaughWooleys strategy increases the maximum column height by 2 and this can
potentially increase the accumulation delay. Fig. 4 illustrates the modified BaughWooley multiplier.
.
Fig. 3 BaughWooley multiplier
The Modified BaughWooley algorithm removes the need for negatively weighted bits present in the traditional 2scomplement multiplication algorithm by modifying the most significant bit of each partial product and the last row of partial products, and by adding two extra bits to the partial product matrix. This allows for summation of the partial products without using special adders equipped to handle negative inputs and without increasing the height of a tree of 3input, 2output ripple carry adders.
Fig. 4 Modified BaughWooley multiplier
The leastsignificant columns of fullwidth Baugh Wooley multiplier shown in Fig. 4 is truncated to yield the 8bit output in 8Ã—8 multiplier. The uniform quantization scheme is applied to both fixedwidth signed multiplier and BaughWooley multiplier to reduce the partialproducts [17][20]. In uniform quantization scheme some of the coefficients of the partial products are quantized with two bits, while the remaining coefficients are quantized with single bit.
Since the truncated multipliers do not form all of the leastsignificant columns in the partial product matrix. Output is the weighted sum of partialproducts. The final output consists of 12 bits including truncated bits of four bits.


APPLICATION TO MAC UNIT
In order to verify the performances of proposed fixedwidth multipliers in a real DSP application, we have implemented a Multiply and Accumulate (MAC) unit, as shown in Fig. 5.
Fig. 5 Implementation of MAC using fixed width modified Baugh Wooley multiplier
In computing, especially digital signal processing, the multiplyaccumulate operation is a common step that computes the product of two numbers and adds that product to an accumulator. The hardware unit that performs the operation is known as a multiplier accumulator (MAC, or MAC unit); the operation itself is also often called a MAC or a MAC operation [1]. Refer to (2) the MAC operation modifies an accumulator a.
a a + (bÃ—c) (2)
The multiplication is computed by using a fixed width multiplier with n=8bits input and output bits (see Fig. 5), and 4 guard bits are added to the accumulator in order to avoid the overflow. The final output is rounded back to 8 bits [1]. The rounding is realized by initializing the accumulator with the rounding constant before each accumulation.
The MAC implementations using fixedwidth multipliers exhibit better performance compared to fixedwidth signed multipliers which uses both half adders and full adders. The parameters like area, delay, power and the number of gates used are reduced by using the fixedwidth Baugh Wooley multiplier.

RESULTS

We have obtained the simulation results by using Cadence digital tool ncsim which is a 180nm technology. We have taken the power report, area report, gates report in rclabs digital tool. Fixedwidth signed multiplier uses both full adder and half adder for implementation but the fixedwidth BaughWooley multiplier uses only full adder for implementation. So the parameters like power, area will be reduced for fixedwidth BaughWooley multiplier.
We have presented the comparison results of these parameters in the table below. Fig.6 shows the simulation result of 8bits fixedwidth signed multiplier using Cadence digital tool ncsim.
Table I, shows that the total power required for the implementation of MAC unit using both fixedwidth signed multiplier and BaughWooley multiplier. The total power is calculated by adding leakage power and dynamic power. The fixedwidth BaughWooley multiplier consumes less power than the fixedwidth signed multiplier which uses uniform quantization scheme. Table II and III shows the comparison report of area, and gates respectively.
Fig.6. Simulation result of 8bits fixedwidth signed multiplier
Fig 7. Simulation result of 8bits fixedwidth Baugh Wooley multiplier
Fig 7. shows the simulation result of 8bits fixed width BaughWooley multiplier using Cadence digital tool ncsim.
Fig.8. Implementation of MAC unit using 8bits fixedwidth signed multiplier
Fig.8 shows the simulation result of implementation of MAC unit using 8bits fixedwidth signed multiplier Cadence digital tool ncsim.
The power and area of the proposed fixedwidth BaughWooley multiplier is reduced than the fixed width signed multiplier. The design of fixedwidth BaughWooley multiplier uses only an array of full adders and a final RCA. Similarly the area and the number of gates used for the implementation will also be reduced for proposed fixedwidth BaughWooley multiplier.
Fig.9 shows the simulation result of implementation of MAC unit using 8bits fixedwidth BaughWooley multiplier using Cadence digital tool ncsim. The output of MAC unit is rounded back to 8bits.
Fig.9. Implementation of MAC unit using 8bits fixedwidth BaughWooley multiplier
TABLE I
POWER COMPARISON REPORT
Multiplier 
Leakage power (nw) 
Dynamic power (nw) 
Total power (nw) 
MAC unit using fixed signed width multiplier 
7804.050 
77143.432 
84947.482 
MAC unit using fixed width Baugh Wooley multiplier 
7583.413 
76272.773 
83856.187 
The parameters like power, area and the number of gates used for implementation of MAC unit are calculated using Cadence digital tool rclabs. The power, area and the number of gates used for the fixed width BaughWooley multiplier is reduced.
TABLE II
AREA COMPARISON REPORT
Type 
Cells 
Cell area (gate count) 
MAC unit using fixedwidth signed multiplier 
182 
1503 
MAC unit using fixedwidth BaughWooley multiplier 
186 
1495 
TABLE III
GATES COMPARISON REPORT
Multiplier 
Type 
Instances 
Area 
Sequential 
13 
132.653 

MAC unit 
Inverter 
20 
43.042 
using fixed 

width signed 
Buffer 
1 
2.822 
multiplier 

Logic 
148 
1324.411 

Total 
182 
1502.928 

Sequential 
13 
132.653 

MAC unit 
Inverter 
22 
47.275 
using fixed 

width Baugh 
Buffer 
1 
2.822 
Wooley 

multiplier 
Logic 
150 
1312.416 
Total 
186 
1495.166 
VIII. CONCLUSION
In this paper we have designed the fixedwidth multiplier using BaughWooley algorithm. The Baugh Wooley multiplier has the regular structure that simplifies the wiring and the layout. The BW algorithm is a relative straightforward way of doing signed multiplications. Both fixedwidth signed multiplier and fixedwidth BaughWooley multiplier have been designed using uniform quantization scheme. Fixed width signed multiplier is designed using both half adders and full adders which will leads to more power and area consumption. In fixedwidth BaughWooley multiplier only full adders and a final RCA are used. So the power and area consumed by the fixedwidth BaughWooley multiplier is reduced than the fixed width signed multiplier. Also the implementation of MAC unit using fixedwidth BaughWooley multiplier requires less area and power.
REFERENCES

Nicola Petra,Davide De Caro,Valeria Garofalo,Ettore Napoli,Antonio Giuseppe Mario Strollo,Design of Fixed Width Multipliers with Linear Compensation Function , IEEE Trans. Circuit Syst, vol 58,no.5, May 2011.

N. Petra, D. De Caro, and A.G.M. Strollo, Design of fixed width multipliers with minimum mean square error, in Proc. IEEE Eur. Conf.Circuits Theory Des. (ECCTD 2007), Sevilla, Spain, Aug. 2007, pp.464467.

R. Michard, A. Tisserand, and N. VeyratCharvillon, Carry prediction and selection for truncated multiplication, in Proc. Workshop Signal Process. Syst. (SiPS), Banff, AB, Canada, Oct. 2006, pp. 339344.

S. Das and S. P. Khatri, Generation of the optimal bitwidth topology of the Fast Hybrid adder in a Parallel Multiplier, in IEEE Int. Conf.Integr. Circuit Des. Technol. (ICICDT 07), May 2007, pp. 16.

J. E. Stine andO.M.Duverne, Variations on truncated multiplication,in Proc. Euromicro Symp. Digit. Syst. Des., 2003, pp. 112119.

C Magnus SjÃ¤lander and Per LarssonEdefors, Multiplication Acceleration Through Twin Precision. In IEEE Transactions on Very Large Scale Integration(VLSI) Systems, vol. 17, NO. 9, september 2009.

L. Van and C. Yang, Generalized lowerror areaefficient fixedwidth multipliers, IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 8, pp. 16081619, Aug. 2005.

V. Garofalo,Truncated binary multipliers with minimum mean square error: Analytical characterization, circuit implementation and applications, Ph. D. dissertation, Dept. Electron. Eng., Univ. Napoli Federico II, Napoli, Italy, 2009.

S. S. Kidambi, F. ElGuibaly, and A. Antonious, Areaefficient multipliers for digital signal processingapplications,IEEE Trans. Circuits and Systems II, Analog Digit. Signal Process., vol. 43, no. 2, pp. 9095,Feb. 1996.[10]

E. J. King and E. E. Swartzlander, Jr, Data dependent truncated scheme for parallel multiplication, in Proc. 31th Asilomar Conf. Signals, Circuits Syst., 1997, pp. 11781182.

H. Park and E. E. Swartzlander, Jr, Truncated multiplication with symmetric correction, in Proc. Asilomar Conf. Signals, Systems Comput. (ACSSC), Oct. 2006, pp. 931934.

S. R. Kuang and J. P. Wang, Lowerror configurable truncated multipliers for multiplyaccumulate applications, Electron. Lett., vol. 42, no. 16, pp. 904905, Aug. 2006.

D. De Caro and A. G. M. Strollo, Highperformance direct digital frequency synthesizers using piecewisepolynomial approximation, IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 2, pp. 324337, Feb. 2005.

V. G. Oklobdzija, D. Villeger, and S. S. Liu, A method for speed optimized partial product reduction and generation of fast parallel multipliers using an algorithmic approach, IEEE Trans. Comput., vol. 45, no. 3, pp. 294306, Mar. 1996.

P. F. Stelling, C. U. Martel, V. G. Oklobdzija, and R. Ravi, Optimal circuits far parallel multipliers, IEEE Trans. Comput., vol. 47, no. 3, pp. 273285, Mar. 1998.

N. Petra, D. De Caro, V. Garofalo, E. Napoli, and A. G. M. Strollo,Truncated binary multipliers with variable correction and minimum mean square error, IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57,no. 6, pp. 13121325, Jun. 2010.

Y. C. Lim, Singleprecision multiplier with reduced circuit complexity for signal processing applications, IEEE Trans. Comput., vol. 41, no. 10, pp. 13331336, Oct. 1992.

M. J. Schulte and E. E. Swartzlander, Jr, Truncated multiplication with correction constant (for DSP), in Proc. Workshop VLSI SignalProcess., Oct. 1993, pp. 388396.

J. M. Jou, S. R. Kuang, and R. D. Chen, Design of lowerror fixedwidth multipliers for DSP Applications, IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 46, no. 6, pp. 836842, Jun. 1999.

L. Van, S. Wang, and W. Feng, Design of the lower error fixedwidth multiplier and its application, IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 47, no. 10, pp. 11121118, Oct. 2000. Circuits Syst. I, Reg. Papers, vol. 57,no. 6, pp. 13121325, Jun. 2010.

Y. C. Lim, Singleprecision multiplier with reduced circuit complexity for signal processing applications, IEEE Trans. Comput., vol. 41, no. 10, pp. 13331336, Oct. 1992.

M. J. Schulte and E. E. Swartzlander, Jr, Truncated multiplication with correction constant (for DSP), in Proc. Workshop VLSI SignalProcess., Oct. 1993, pp. 388396.

J. M. Jou, S. R. Kuang, and R. D. Chen, Design of lowerror fixedwidth multipliers for DSP Applications, IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 46, no. 6, pp. 836842, Jun. 1999.

L. Van, S. Wang, and W. Feng, Design of the lower error fixedwidth multiplier and its application, IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 47, no. 10, pp. 11121118, Oct. 2000.