Improved Performance Of Pipeline Scheme By Using Mac And Modified Booth Algorithm

DOI : 10.17577/IJERTV1IS7123

Download Full-Text PDF Cite this Publication

Text Only Version

Improved Performance Of Pipeline Scheme By Using Mac And Modified Booth Algorithm

*L. Sri Harish, **T. Venkata Lakshmi

*(Department Of Electronics And Communication, Gudlavalleru Engineering College, Gudlavalleru

** (Department Of Electronics And Communication, Gudlavalleru Engineering College, Gudlavalleru


As per in the previous work, the general booth algorithm and CLAs are used in MAC operation for getting the efficient output results through pipeline scheme. But the problem raises by the general booth algorithm and CLAs in previous work by that pipeline scheme performance will be decreased. To rectify these problems this proposed method has been introduced the modified booth algorithm and CSA. The proposed method introduces a parallel operation by generating the sign bits. The methods to reduce dynamic power consumption of a digital F.F.T for D.S.P applications. The overall performance was elevated as follows. The proposed CSA tree uses 1s-complement-based radix-2 modified Booths algorithm (MBA) and has the modified array for the sign generation in order to increase the bit density of the operands. The CSA propagates the carries to the least significant bits of the partial products and generates the least significant bits in advance to decrease the number of the input bits of the final adder. Also, the proposed MAC accumulates the intermediate results in the type of sum and carry bits instead of the output of the final adder, which made it possible to optimize the pipeline scheme to improve the performance.

Keywords – Carry save adder, DSP, MAC, Modified Booth Algorithm, Pipeline scheme.


    The nature of the work is introduced to improve the performance of pipeline scheme by using proposed MAC structure and modified booth algorithm. CLA (carry look ahead adder) is used in MAC structure and general booth algorithm is used in multiplier in existing work. By using general booth algorithm delays will occur and partial products increased by using CLA. So, to reduce the partial products

    modified booth algorithm is used and to reduce the delays .CSA is used in proposed method for speedup the performance of accumulation [1] in MAC structure.

  2. MAC

    1. Previous MAC

      In MAC structure the multiplier can be divided into four operation steps. First step- booth algorithm, second step partial product summation, third step final addition, fourth step accumulation as shown in Fig.1.

      Previous internal structure in serial operation

    2. General MAC

    The general hardware architecture of MAC is shown in Fig.2.

    It executes the multiplication operation by multiplying the input multiplier X and the multiplicand Y. This is added to the previous multiplication result Z as the accumulation step.

    The -bit 2s complement binary number can be expressed as

    If (1) is expressed in base-4 type redundant sign digit form in order to apply the radix-2 Booths algorithm, it would be [7].

    If (2) is used, multiplication can be expressed as

    If these equations are used, the afore-mentioned multiplicationaccumulation results can be expressed as

    Each of the two terms on the right-hand side of (5) is calculated independently and the final result is produced by adding the two results. The MAC architecture implemented by (5) is called the standard design [3].

  3. Proposed MAC

    CSA tree is used in proposed MAC Architecture for reducing the delays in accumulation step. The overall performance of the proposed MAC is improved by eliminating the accumulator itself by combining it with the CSA function. If the accumulator has been eliminated, the critical path is then determined by the final adder in the multiplier [2]. The basic method to improve the performance of the final adder is to decrease the number of input bits. In order to reduce this number of input bits, the multiple partial products are compressed into a sum and a carry by CSA.

    Fig.3 Pipelined hardware architecture of proposed MAC

    The number of bits of sums and carries are transferred to the final adder is reduced by adding the lower bits of sums and carries in advance within the range in which the overall performance will not be degraded. A 2-bit CLA is used to add the lower bits in the CSA. In addition, to increase the output rate when pipelining is applied, the sums and carries from the CSA are accumulated instead of the outputs from

    the final adder in the manner that the sum and carry from the CSA previous cycle are inputted to CSA.

    Fig.4 proposed arithmetic operation of multiplication and accumulation

    Proposed internal structure for parallel operation

  4. Proposed CSA Architecture

    The architecture of the hybrid-type CSA that complies with the operation of the proposed MAC is shown in Fig. 5, which performs 8 x 8-bit operation.

    Fig. 5 architecture of proposed CSA tree

    In Si is to simplify the sign expansion and Ni is to compensate 1s complement number into 2s complement number. S[i] and C[i] correspond to the i th bit of the feedback sum and carry. Z[i] is the i th bit of the sum of the lower bits for each partial product that were added in advance and Zl[i] is the previous result. In addition, Pj[i] corresponds to the i th bit of the j th partial product. Since the multiplier is for 8 bits, totally four partial products (P0[7:0]~P3[7:0]) are generated from the Booth encoder. D0y and dn/2-12N-2Y correspond to P0[7:0] and P3[7:0], respectively. This CSA requires at least four rows of FAs for the four partial products [3]. Thus, totally five FA rows are necessary since one more level of rows are needed for accumulation. For an n x n bit MAC operation, the level of CSA is (n/2+1). The white square in Fig. 5 represents an FA and the gray square is a half adder (HA). The rectangular symbol with five inputs is a 2-bit CLA with a carry input. The characteristics of the proposed CSA architecture have been summarized and briefly compared with other architectures as shown in table

    1. For the number system, the proposed CSA uses 1s complement, but ours uses a modified CSA array without sign extension.

    A. Hardware resource

    The hardware resources in Table II are the results from counting all the logic elements for a general 16- bit architecture.


    Modified booth algorithm

    Proposed method implemented in FFT processor


The proposed method reduces the partial products by using modified booth algorithm and reduces the delays by using proposed parallel CSA Architecture. So, by using this two methods the performance of pipeline scheme has been improved.


  1. Wen-Chang Yeh and Chein-Wei Jen, High- speed Booth encoded parallel multiplier design, IEEE Trans. on Computers, vol. 49, issue 7, pp. 692-701, July 2000.

  2. Jung-Yup Kang and Jean-Luc Gaudiot, A simple high-speed multiplier design, IEEE Trans. on Computers, vol. 55, issue 10, Oct. pp. 1253-1258, 2006.

  3. Shiann-Rong Kuang, Jiun-Ping Wang and Cang- Yuan Guo, Modified Booth multipliers with a

    regular partial product array, IEEE Trans. On Circuit and Systems, vol.56, Issue 5, pp. 404-408, May 2009.

  4. Li-rong Wang, Shyh-Jye Jou and Chung-Len Lee, A well-structured modified Booth multiplier design, Proc. of IEEE VLSI-DAT, pp. 85-88, April 2008.

  5. A. A. Khatibzadeh, K. Raahemifar and M. Ahmadi, A 1.8V 1.1GHz Novel Digital Multiplier, Proc. of IEEE CCECE, pp. 686-689, May 2005.

  6. S. Hus, V. Venkatraman, S. Mathew, H. Kaul, M. Anders, S. Dighe, W. Burleson and R. Krishnamurthy, A 2GHZ 13.6mW 12x9b mutiplier for energy efficient FFT accelerators, Proc. of IEEE ESSCIRC, pp. 199-202, Sept. 2005.

  7. Hwang-Cherng Chow and I-Chyn Wey, A 3.3V 1GHz high speed pipelined Booth multiplier, Proc. of IEEE ISCAS, vol. 1, pp. 457-460, May 2002.

Leave a Reply