Design of Fixed-Width Multiplier Using Baugh-Wooley Algorithm

DOI : 10.17577/IJERTV2IS110751

Download Full-Text PDF Cite this Publication

Text Only Version

Design of Fixed-Width Multiplier Using Baugh-Wooley Algorithm

P. Yasodha Devi1, S. Kalaivani2, S. Manimegalai3, Andril Alagusabai4

1(Department of ECE, Muthayammal College of Engineering, India) 2(Department of ECE, Muthayammal College of Engineering, India) 3(Department of ECE, Muthayammal College of Engineering, India) 4(Department of EEE, Muthayammal College of Engineering, India)

Abstract

This paper focuses on the design of fixed-width multipliers with Baugh-Wooley algorithm. The fixed width multipliers derived from Baugh-Wooley algorithm produce n-bit output product with n-bit multiplier and n-bit multiplicand. In previous papers the fixed-width signed multipliers are designed by using both half-adders and full-adders. In this paper we have designed the fixed width Baugh-Wooley multiplier by using only an array of full-adders and the final Ripple Carry Adder (RCA). We have implemented the Multiply and Accumulate (MAC) unit using fixed width signed multiplier and fixed width Baugh-Woolley multiplier. The main contribution of the proposed work is to reduce the area and power by designing the fixed- width Baugh-Wooley multiplier. We also present the comparison results of parameters like power, area, gates and delay.

Keywords: Fixed-Width Multiplier, MAC Unit, Multiplication, Power.

  1. Introduction

    Multiplications occur frequently in digital signal processing systems, communication systems, and other application specific integrated circuits. The diversity of application areas for multipliers and the ubiquity of multiplication in digital systems exhibit a variety of requirements for speed, area, power consumption, and other specifications [1]. Parallel multipliers are fundamental building blocks in multimedia and digital signal processing systems. In many applications, the inputs and the outputs of the multiplier have the same bit width. These circuits are denoted in literature as fixed-width multipliers or truncated multipliers. The most obvious way to design a fixed-width multiplier

    uses n×n a full-width multiplier, whose output is truncated/rounded to n bits by discharging the less- significant bits of the products [6].

    In fixed width multiplication the product is truncated to n-bits, the least-significant columns of the product matrix contribute little to the final result. To take advantage of this, truncated multipliers do not form all of the least-significant columns in the partial-product matrix [5]. Output is the weighted sum of partial- products. By eliminating more columns the area and power consumption of the arithmetic unit are significantly reduced & in many cases the delay also decreases [2]-[4]. Area saving of a fixed width multiplier can be achieved by directly truncating n least significant columns and preserving n most significant columns.

    In the design of digital signal processing systems, where single-precision results are required, the power dissipation and area of parallel multipliers can be significantly reduced by truncating the less significant columns and compensating to produce an approximate rounded product [7]. This dissertation presents the design of truncated multiplications of signed inputs utilizing Baugh-Wooley algorithm, the negative fractional two's complement number system which solves an inherent problem of the conventional two's complement number system.

    The Baugh-Wooley (BW) algorithm is a relatively straightforward way of doing signed multiplications. The fixed-width property, however, can be exploited to simplify the multiplier structure [9]-[10], with the aim of improving power and speed. Basically, one can discard some of the partial-products in the partial- products array to reduce the circuit complexity, at a price in terms of accuracy.

    1. FIXED-WIDTH SIGNED MULTIPLIER

      In previous work we designed the fixed-width signed multipliers with linear compensation function by investigating in detail the effect of coefficients quantization [8]. New fixed-width multipliers are obtained by varying the quantization schemes such as uniform and nonuniform quantization scheme. In uniform quantization scheme all coefficients are quantized with same bit. In nonuniform quantization scheme some of the coefficients are quantized with some bits and remaining coefficients are quantized with some another bits.

      Fig. 1 Signed multiplication with negative and positive partial products

      Fig. 1 shows the signed multiplication with negative and positive partial products.

      Fixed-width signed multiplier consumes more power and area. Since the multiplier are implemented by using both full adder and half adder [8]. In proposed work we reduced the power and area of the multiplier by designing the fixed-width Baugh-Wooley multiplier [11]. Baugh-Wooley multiplier uses only full adders for implementation. The main contribution of the proposed work is to reduce the area and power by designing the fixed-width Baugh-Wooley multiplier. Baugh-Wooley multiplier simplifies the multiplier structure and wiring layout.

    2. FIXED-WIDTH BAUGH-WOOLEY MULTIPLIER

      1. Baugh – Wooley multiplier

        It is an efficient way to handle the sign bit. Baugh Wooley multiplier uses only full adders. All bit products are generated in parallel & collected through an array of full adder & final Ripple Carry Adder (RCA). The creation of the reorganized partial-product array comprises three steps:

        1. The most significant bit (MSB) of the first N-1 partial-product rows and all bits of the last partial- product row, except its MSB, are inverted.

        2. A 1 is added to the Nth column.

        3. The MSB of the final result is negated

          Fig. 2 shows the partial product array diagram for n×n Baugh Wooley multiplier.

      2. Partial-product generation

      The PPs are generated using multiplexers or AND gates in an unsigned radix-2 shift-and-add multiplication [12]-[13]. For multiplication of signed- magnitude numbers, the unsigned multiplication core can be used for the magnitude part of the inputs, with an extension that the sign bit is computed separately by checking the two input operands sign bits.

      The sign bit in twos complement numbers has a negative weight, the entry a7b0 term can be written in terms of a7b0, -a7b0 = a7(1-b0) a7 = a7b0 a7 (1)

      Fig. 2 Partial product array diagram for n×n Baugh

      Wooley multiplier

      where a- multiplicand and b- multiplier.

      Hence, the term a7b0 is replaced with a7b0 and a7. If a7 is used instead of a7, the column sum increases by 2a7. Thus, -a7 must be inserted in the next higher column in order to compensate the effect of 2a7. Thesameisdonefor a7b1, a7b2, a7b3, a7b4, a7b5, a7b6.

      In each column, a7 and a7 cancel each other out. The p14 column gets a a7 entry, which is replaceable by a7- 1.This can be repeated for all entries, yielding to the insertion of b7 in the p7 column, and b7-1 in the p14 column. There are two 1s in the 14th column now, which is equivalent to a 1 entry in p15 and that can be replaced with a 1 and a borrow into the non-existing tenth column.

      Baugh-Wooley method increases the height of the longest column by 2, which may lead to a greater delay through the RCA tree [14]. In the given example of Fig. 3, column height changes from 8 to 9, requiring an extra RCA level. Removing b7 from eighth column and writing two b7 entries in the sixth column, which has only four entries, can reduce the extra delay caused by the additional RCA level. Thus, the maximum number of entries in one column becomes 9, which cn be implemented withfive-level RCA tree.

      Alternatively, all negatively weighted a7bj terms can be transferred to the bottom row, which leads to two negative numbers in the last two rows, where a subtraction operation from the sum of all the positive elements is necessary [15]. Instead of subtracting a7b, twos complement of a can be added b7 times. This method is known as the Modified Baugh-Wooley algorithm.

      Modified form of the Baugh – Wooley method, shown in Fig. 4, is more preferable since it does not increase the height of the columns in the matrix.

      Fig. 1 depicts the product computation for 2s complement numbers. In order to avoid the complication that subtraction or sign extension will introduce in most applications [16], Baugh and Wooley proposed an efficient method for 2s complement multiplication that handles subtractions by taking the 2s complement of the terms to be subtracted. In 2s complement representation negation of an integer word

      x can be obtained by the formula -x = comp(x+1);

      where comp(x) is bitwise inversion of the bits in word

      x. Fig. 3 illustrates the PPM formation using Baugh- Wooleys method. Baugh-Wooleys strategy increases the maximum column height by 2 and this can

      potentially increase the accumulation delay. Fig. 4 illustrates the modified Baugh-Wooley multiplier.

      .

      Fig. 3 Baugh-Wooley multiplier

      The Modified Baugh-Wooley algorithm removes the need for negatively weighted bits present in the traditional 2s-complement multiplication algorithm by modifying the most significant bit of each partial product and the last row of partial products, and by adding two extra bits to the partial product matrix. This allows for summation of the partial products without using special adders equipped to handle negative inputs and without increasing the height of a tree of 3-input, 2-output ripple carry adders.

      Fig. 4 Modified Baugh-Wooley multiplier

      The least-significant columns of full-width Baugh- Wooley multiplier shown in Fig. 4 is truncated to yield the 8-bit output in 8×8 multiplier. The uniform quantization scheme is applied to both fixed-width signed multiplier and Baugh-Wooley multiplier to reduce the partial-products [17]-[20]. In uniform quantization scheme some of the coefficients of the partial products are quantized with two bits, while the remaining coefficients are quantized with single bit.

      Since the truncated multipliers do not form all of the least-significant columns in the partial product matrix. Output is the weighted sum of partial-products. The final output consists of 12 bits including truncated bits of four bits.

    3. APPLICATION TO MAC UNIT

      In order to verify the performances of proposed fixed-width multipliers in a real DSP application, we have implemented a Multiply and Accumulate (MAC) unit, as shown in Fig. 5.

      Fig. 5 Implementation of MAC using fixed width modified Baugh- Wooley multiplier

      In computing, especially digital signal processing, the multiplyaccumulate operation is a common step that computes the product of two numbers and adds that product to an accumulator. The hardware unit that performs the operation is known as a multiplier accumulator (MAC, or MAC unit); the operation itself is also often called a MAC or a MAC operation [1]. Refer to (2) the MAC operation modifies an accumulator a.

      a a + (b×c) (2)

      The multiplication is computed by using a fixed- width multiplier with n=8bits input and output bits (see Fig. 5), and 4 guard bits are added to the accumulator in order to avoid the overflow. The final output is rounded back to 8 bits [1]. The rounding is realized by initializing the accumulator with the rounding constant before each accumulation.

      The MAC implementations using fixed-width multipliers exhibit better performance compared to fixed-width signed multipliers which uses both half adders and full adders. The parameters like area, delay, power and the number of gates used are reduced by using the fixed-width Baugh- Wooley multiplier.

    4. RESULTS

We have obtained the simulation results by using Cadence digital tool ncsim which is a 180nm technology. We have taken the power report, area report, gates report in rclabs digital tool. Fixed-width signed multiplier uses both full adder and half adder for implementation but the fixed-width Baugh-Wooley multiplier uses only full adder for implementation. So the parameters like power, area will be reduced for fixed-width Baugh-Wooley multiplier.

We have presented the comparison results of these parameters in the table below. Fig.6 shows the simulation result of 8-bits fixed-width signed multiplier using Cadence digital tool ncsim.

Table I, shows that the total power required for the implementation of MAC unit using both fixed-width signed multiplier and Baugh-Wooley multiplier. The total power is calculated by adding leakage power and dynamic power. The fixed-width Baugh-Wooley multiplier consumes less power than the fixed-width signed multiplier which uses uniform quantization scheme. Table II and III shows the comparison report of area, and gates respectively.

Fig.6. Simulation result of 8-bits fixed-width signed multiplier

Fig 7. Simulation result of 8-bits fixed-width Baugh- Wooley multiplier

Fig 7. shows the simulation result of 8-bits fixed- width Baugh-Wooley multiplier using Cadence digital tool ncsim.

Fig.8. Implementation of MAC unit using 8-bits fixed-width signed multiplier

Fig.8 shows the simulation result of implementation of MAC unit using 8-bits fixed-width signed multiplier Cadence digital tool ncsim.

The power and area of the proposed fixed-width Baugh-Wooley multiplier is reduced than the fixed- width signed multiplier. The design of fixed-width Baugh-Wooley multiplier uses only an array of full adders and a final RCA. Similarly the area and the number of gates used for the implementation will also be reduced for proposed fixed-width Baugh-Wooley multiplier.

Fig.9 shows the simulation result of implementation of MAC unit using 8-bits fixed-width Baugh-Wooley multiplier using Cadence digital tool ncsim. The output of MAC unit is rounded back to 8-bits.

Fig.9. Implementation of MAC unit using 8-bits fixed-width Baugh-Wooley multiplier

TABLE I

POWER COMPARISON REPORT

Multiplier

Leakage power (nw)

Dynamic power (nw)

Total power (nw)

MAC unit

using fixed- signed width multiplier

7804.050

77143.432

84947.482

MAC unit

using fixed- width Baugh- Wooley multiplier

7583.413

76272.773

83856.187

The parameters like power, area and the number of gates used for implementation of MAC unit are calculated using Cadence digital tool rclabs. The power, area and the number of gates used for the fixed- width Baugh-Wooley multiplier is reduced.

TABLE II

AREA COMPARISON REPORT

Type

Cells

Cell area (gate count)

MAC unit using fixed-width signed multiplier

182

1503

MAC unit using fixed-width Baugh-Wooley multiplier

186

1495

TABLE III

GATES COMPARISON REPORT

Multiplier

Type

Instances

Area

Sequential

13

132.653

MAC unit

Inverter

20

43.042

using fixed-

width signed

Buffer

1

2.822

multiplier

Logic

148

1324.411

Total

182

1502.928

Sequential

13

132.653

MAC unit

Inverter

22

47.275

using fixed-

width Baugh-

Buffer

1

2.822

Wooley

multiplier

Logic

150

1312.416

Total

186

1495.166

VIII. CONCLUSION

In this paper we have designed the fixed-width multiplier using Baugh-Wooley algorithm. The Baugh- Wooley multiplier has the regular structure that simplifies the wiring and the layout. The BW algorithm is a relative straightforward way of doing signed multiplications. Both fixed-width signed multiplier and fixed-width Baugh-Wooley multiplier have been designed using uniform quantization scheme. Fixed- width signed multiplier is designed using both half adders and full adders which will leads to more power and area consumption. In fixed-width Baugh-Wooley multiplier only full adders and a final RCA are used. So the power and area consumed by the fixed-width Baugh-Wooley multiplier is reduced than the fixed- width signed multiplier. Also the implementation of MAC unit using fixed-width Baugh-Wooley multiplier requires less area and power.

REFERENCES

  1. Nicola Petra,Davide De Caro,Valeria Garofalo,Ettore Napoli,Antonio Giuseppe Mario Strollo,Design of Fixed Width Multipliers with Linear Compensation Function , IEEE Trans. Circuit Syst, vol 58,no.5, May 2011.

  2. N. Petra, D. De Caro, and A.G.M. Strollo, Design of fixed- width multipliers with minimum mean square error, in Proc. IEEE Eur. Conf.Circuits Theory Des. (ECCTD 2007), Sevilla, Spain, Aug. 2007, pp.464467.

  3. R. Michard, A. Tisserand, and N. Veyrat-Charvillon, Carry prediction and selection for truncated multiplication, in Proc. Workshop Signal Process. Syst. (SiPS), Banff, AB, Canada, Oct. 2006, pp. 339344.

  4. S. Das and S. P. Khatri, Generation of the optimal bit-width topology of the Fast Hybrid adder in a Parallel Multiplier, in IEEE Int. Conf.Integr. Circuit Des. Technol. (ICICDT 07), May 2007, pp. 16.

  5. J. E. Stine andO.M.Duverne, Variations on truncated multiplication,in Proc. Euromicro Symp. Digit. Syst. Des., 2003, pp. 112119.

  6. C Magnus Själander and Per Larsson-Edefors, Multiplication Acceleration Through Twin Precision. In IEEE Transactions on Very Large Scale Integration(VLSI) Systems, vol. 17, NO. 9, september 2009.

  7. L. Van and C. Yang, Generalized low-error area-efficient fixed-width multipliers, IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 8, pp. 16081619, Aug. 2005.

  8. V. Garofalo,Truncated binary multipliers with minimum mean square error: Analytical characterization, circuit implementation and applications, Ph. D. dissertation, Dept. Electron. Eng., Univ. Napoli Federico II, Napoli, Italy, 2009.

  9. S. S. Kidambi, F. El-Guibaly, and A. Antonious, Areaefficient multipliers for digital signal processingapplications,IEEE Trans. Circuits and Systems II, Analog Digit. Signal Process., vol. 43, no. 2, pp. 9095,Feb. 1996.[10]

  10. E. J. King and E. E. Swartzlander, Jr, Data dependent truncated scheme for parallel multiplication, in Proc. 31th Asilomar Conf. Signals, Circuits Syst., 1997, pp. 11781182.

  11. H. Park and E. E. Swartzlander, Jr, Truncated multiplication with symmetric correction, in Proc. Asilomar Conf. Signals, Systems Comput. (ACSSC), Oct. 2006, pp. 931934.

  12. S. R. Kuang and J. P. Wang, Low-error configurable truncated multipliers for multiply-accumulate applications, Electron. Lett., vol. 42, no. 16, pp. 904905, Aug. 2006.

  13. D. De Caro and A. G. M. Strollo, High-performance direct digital frequency synthesizers using piecewise-polynomial approximation, IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 2, pp. 324337, Feb. 2005.

  14. V. G. Oklobdzija, D. Villeger, and S. S. Liu, A method for speed optimized partial product reduction and generation of fast parallel multipliers using an algorithmic approach, IEEE Trans. Comput., vol. 45, no. 3, pp. 294306, Mar. 1996.

  15. P. F. Stelling, C. U. Martel, V. G. Oklobdzija, and R. Ravi, Optimal circuits far parallel multipliers, IEEE Trans. Comput., vol. 47, no. 3, pp. 273285, Mar. 1998.

  16. N. Petra, D. De Caro, V. Garofalo, E. Napoli, and A. G. M. Strollo,Truncated binary multipliers with variable correction and minimum mean square error, IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57,no. 6, pp. 13121325, Jun. 2010.

  17. Y. C. Lim, Single-precision multiplier with reduced circuit complexity for signal processing applications, IEEE Trans. Comput., vol. 41, no. 10, pp. 13331336, Oct. 1992.

  18. M. J. Schulte and E. E. Swartzlander, Jr, Truncated multiplication with correction constant (for DSP), in Proc. Workshop VLSI SignalProcess., Oct. 1993, pp. 388396.

  19. J. M. Jou, S. R. Kuang, and R. D. Chen, Design of low-error fixedwidth multipliers for DSP Applications, IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 46, no. 6, pp. 836842, Jun. 1999.

  20. L. Van, S. Wang, and W. Feng, Design of the lower error fixed-width multiplier and its application, IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 47, no. 10, pp. 11121118, Oct. 2000. Circuits Syst. I, Reg. Papers, vol. 57,no. 6, pp. 13121325, Jun. 2010.

  1. Y. C. Lim, Single-precision multiplier with reduced circuit complexity for signal processing applications, IEEE Trans. Comput., vol. 41, no. 10, pp. 13331336, Oct. 1992.

  2. M. J. Schulte and E. E. Swartzlander, Jr, Truncated multiplication with correction constant (for DSP), in Proc. Workshop VLSI SignalProcess., Oct. 1993, pp. 388396.

  3. J. M. Jou, S. R. Kuang, and R. D. Chen, Design of low-error fixedwidth multipliers for DSP Applications, IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 46, no. 6, pp. 836842, Jun. 1999.

  4. L. Van, S. Wang, and W. Feng, Design of the lower error fixed-width multiplier and its application, IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 47, no. 10, pp. 11121118, Oct. 2000.

Leave a Reply