CLA Based 32-Bit Signed Pipelined Multiplier


P. N. V. Siva Kumar #1 , M. Saranya #2 , Y. Sirisha #3 , K. Raja Rajeswari #4

#1,2,3,4 Assistant Professor

Vijaya Institute of Technology for Women Vijayawada

Abstract: Modern VLSI systems demand multipliers that combine high speed with small area and low power. Conventional binary multiplication takes three basic steps to produce the final product: partial product generation; reduction of the partial product rows (PPRs) with a partial product reduction tree until only two rows remain; and a final addition with a fast carry-propagate adder. In this work, Radix-4 Modified Booth Encoding (MBE) is used to generate the partial products, and the proposed 32-bit multiplier is pipelined. The main goal is to reduce the delay of wide multipliers and speed up computation. The proposed design is implemented in Xilinx ISE 14.7. The delay achieved for a 32×32-bit signed multiplication is 9.39 ns, with a maximum frequency of 106.455 MHz on the device 7a100tcsg324-3.

Keywords: PPR, Modified Booth Encoding, CLA, Pipelined Multiplier.


Integrated circuits are used in a wide range of applications, from small logic gates to complex circuits such as microprocessors. An integrated circuit typically consists of a large number of transistors and their interconnections on a single silicon chip or die. Integrated circuits are classified by the kind of signal they process, mainly into i) analog integrated circuits, ii) digital integrated circuits, iii) computer integrated circuits, and so on. Technology keeps improving with developments in VLSI. Very large scale integration (VLSI) refers to circuits that combine millions of transistors or other active devices on one chip. VLSI functions include memories, computers, digital signal processors and so on. Semiconductor technology is the process by which a circuit implementation is manufactured from design specifications.

Power consumption is an important design constraint in the implementation of a digital multiplier and must be addressed. This paper deals with area, power and speed constraints and targets better performance for signal processing applications. Power requirements are among the most crucial constraints in mobile computing, where they shorten battery life, limit devices through restricted power dissipation, or increase size and weight.


    In digital signal processing, the multiplier is one of the most important blocks. Digital multipliers are classified by their structure and by their operating process (for example, parallel or serial). A multiplication takes two inputs, the multiplicand and the multiplier, each of which may be positive or negative. The operation can be performed in different styles, which are shown in the figure below.

    Fig: 2.1. Classification of Multipliers

      1. Array Multipliers

        In array multipliers, the counters and compressors are connected in a serial fashion for all bit slices of the partial product (PP) parallelogram. As can be seen in Fig. 2.2, the array topology is a two-dimensional structure that fits nicely on the VLSI planar process. Full adders are used for the additions. The shifting of the partial products into their proper alignment is done by simple routing and does not require any logic. The overall structure can easily be compacted into a rectangle, resulting in a very efficient layout. There are several possible array topologies, including simple, double and higher-order arrays.

        Fig: 2.2. Array Multiplier mechanism

      2. Double array Multiplier

        The double array design is faster than the simple array. In this type of array, the delay of the simple array is roughly halved by adding the partial products in two parallel rows: one row adds the odd-numbered PPs while the other adds the even-numbered PPs. When all the partial products have been accumulated, the two partial sums are combined using a [4:2] compressor. The double array also consists of rows of [3:2] compressors; however, the output of each compressor is the input to the row after the next one. For N partial products, the delay required to reduce them is [N/2 - 2] [3:2] compressors + 1 [4:2] compressor.

      3. Higher-order Array

    In higher-order arrays, more additions are performed concurrently, thereby reducing the delay to produce the final result. The idea is to partition the array into more sub-arrays and use [4:2] compressors to combine the sub-arrays. This is accomplished by connecting progressively longer simple arrays together. A [4:2] compressor is placed between a simple array and the other arrays when the delay of the simple array is equal to the total delay of the combined arrays. Higher-order arrays are classified according to the number of partial products in each sub-array. For example, the [6, 6, 8, 10] array combines two simple arrays that each reduce six PPs using a [4:2] compressor. The resulting structure is combined with a simple array that reduces 8 PPs. Finally, the resulting structure is combined with a simple array that sums 10 PPs. Consider the [6, 6, 8] higher-order array. The delay of each simple array that reduces 6 PPs is four [3:2] compressor delays, and combining the two adds one [4:2] compressor delay. Taking one [4:2] compressor delay as roughly two [3:2] compressor delays, the resulting delay is

    4 [3:2] compressors + 1 [4:2] compressor ≈ 6 [3:2] compressors

    This is connected to the simple 8-PP array, which has a delay of six [3:2] compressors. Hence, the delay of the larger sub-array is approximately equal to that of the combined one.

    Fig: 2.3. Partial products addition using [6, 6, 8, 10] array
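    The delay bookkeeping for the array topologies can be checked with a short sketch. It is only an illustration: the function names are hypothetical, delays are counted in units of one [3:2] compressor, the N - 2 delay of a simple array is inferred from the 6-PP and 8-PP figures quoted in the text, and a [4:2] compressor is assumed to cost two [3:2] delays.

```python
# Delays are measured in [3:2] compressor delays.
# Assumption: one [4:2] compressor costs about two [3:2] compressor delays.
DELAY_4_2 = 2


def simple_array_delay(num_pps):
    """Simple array: num_pps rows need num_pps - 2 levels of [3:2] compressors."""
    return num_pps - 2


def double_array_delay(num_pps):
    """Double array: (N/2 - 2) [3:2] levels plus one [4:2] compressor."""
    return (num_pps // 2 - 2) + DELAY_4_2


def higher_order_delay(sub_arrays):
    """Merge the sub-arrays left to right, one [4:2] compressor per merge."""
    delay = simple_array_delay(sub_arrays[0])
    for size in sub_arrays[1:]:
        # The merge can start only when the slower of the two inputs is ready.
        delay = max(delay, simple_array_delay(size)) + DELAY_4_2
    return delay


if __name__ == "__main__":
    print(higher_order_delay([6, 6]))         # two 6-PP arrays combined
    print(simple_array_delay(8))              # the 8-PP array they meet
    print(higher_order_delay([6, 6, 8, 10]))  # the full example array
    print(simple_array_delay(30))             # same 30 PPs in one simple array
```

    Under these assumptions the two combined 6-PP arrays and the 8-PP array both finish after six [3:2] delays, which is the balance condition described above.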

    2.4. Tree multipliers

    To speed up the addition of partial products, a tree-based structure is used. In a tree architecture, the compressors are connected for each bit slice of the PP parallelogram, and they normally operate in parallel. Although trees are faster than arrays, both use the same number of compressors to reduce the partial products. The first tree structure was introduced by Wallace, who showed that PPs can be reduced by connecting [3:2] compressors in parallel in a tree topology. The regular trees include binary, balanced-delay and overturned-staircase trees as well as [9:2] compressors.

    2.4.1. Wallace Tree

    Wallace trees are irregular in the sense that the informal description does not specify a systematic method for interconnecting the compressors. Nevertheless, a Wallace tree is an efficient way of adding partial products in parallel. It operates in three steps:

    1. Multiply: Each bit of the multiplicand is ANDed with each bit of the multiplier, yielding $n^2$ results. Depending on the position of the multiplied bits, the wires carry different weights; for example, the wire for $a_2 b_3$ has weight $2^{2+3} = 32$.

    2. Addition: As long as there are three or more wires with the same weight, add a further layer. Take three wires of the same weight and feed them into a full adder; the result is one output wire of the same weight and one carry wire of the next higher weight. If two wires of the same weight are left, add them using a half adder, and if only one is left, connect it to the next layer.

    3. Group the wires into two numbers and add them in a conventional adder.

    Fig: 2.4. Typical Wallace Tree

    2.5. Booth Encoding

    Booth encoding, proposed by A. D. Booth in 1950, is a method for reducing the number of partial products. A binary number X consisting of m bits in 2's complement format can be described as

    $$X = -x_{m-1} 2^{m-1} + \sum_{i=0}^{m-2} x_i 2^i$$

    By considering three bits of X at a time, we can determine whether to add 0, ±Y or ±2Y to the partial product. The grouping of the X bits is shown in Fig. 2.5.

    Fig: 2.5. Multiplier bit grouping according to Booth Encoding

    The multiplier X is segmented into overlapping groups of three bits $(x_{i+1}, x_i, x_{i-1})$, and each group is associated with its own partial product row using Table 2.1. For each step i, the three multiplier bits $x_{2i-1}$, $x_{2i}$ and $x_{2i+1}$ are considered and the corresponding value of $d_i$ is obtained from Table 2.1.

    Table 2.1: Modified Booth encoding table

    1. A zero must always be concatenated to the right of X, i.e. $x_{-1}$ is taken to be 0.

    2. m must always be even.

    MBE has two unavoidable consequences: sign-extension prevention and negative encoding. Together these produce one additional partial product row, which requires more hardware and makes the system slower. The advantage of MBE is that the number of partial products is reduced to m/2, which in turn reduces the hardware burden and increases the speed of the multiplier.
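    The recoding can be illustrated with a minimal sketch. The dictionary below is the standard radix-4 modified Booth table, which is assumed here to match Table 2.1; the names `BOOTH_TABLE`, `booth_digits` and `digits_value` are hypothetical.

```python
# Standard radix-4 modified Booth table: (x_{2i+1}, x_{2i}, x_{2i-1}) -> d_i.
# Assumed to correspond to Table 2.1, which is not reproduced in the text.
BOOTH_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_digits(x, m):
    """Recode an m-bit two's complement value x into m/2 digits d_i."""
    if m % 2:
        raise ValueError("m must be even")
    # padded[0] is x_{-1} = 0, padded[j + 1] is bit x_j.
    padded = [0] + [(x >> j) & 1 for j in range(m)]
    digits = []
    for i in range(m // 2):
        group = (padded[2 * i + 2], padded[2 * i + 1], padded[2 * i])
        digits.append(BOOTH_TABLE[group])
    return digits


def digits_value(digits):
    """Value represented by the digits: sum of d_i * 4**i."""
    return sum(d * 4 ** i for i, d in enumerate(digits))


if __name__ == "__main__":
    print(booth_digits(6, 4))    # [-2, 2]  since -2 + 2*4 = 6
    print(booth_digits(-3, 4))   # [1, -1]  since  1 - 1*4 = -3
```

    Each digit selects one partial product row, so an m-bit multiplier yields m/2 rows instead of m.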


      1. Pipelining approach

        Pipelining is a popular technique that has been used in design for several years. It is an architectural option that designers use to reduce power. Arithmetic circuits such as adders and multipliers, which are key elements of the system data path, can be pipelined to improve performance.

        There are two types of pipeline architecture:

        i) Linear pipeline ii) Non-linear pipeline

        In a linear pipeline, all stages are connected in series and there is no feedback to an earlier stage. In a non-linear pipeline, the serially connected stages also have feedback paths. Pipelining improves design efficiency.

        If the pipeline has N stages, the first output is obtained after clock cycle N and the next output after cycle N + 1. Therefore, in the proposed three-stage multiplier, the first output is obtained at the third clock cycle, followed by one output per cycle for each subsequent input.
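        The timing described above can be written out as a small sketch. It assumes an ideal linear pipeline that accepts one new input per clock cycle and never stalls; the function names are hypothetical.

```python
def output_cycles(num_stages, num_inputs):
    """Clock cycle at which each result leaves an N-stage linear pipeline.

    Assumes input k (counting from 0) enters at cycle k + 1 and no stalls.
    """
    return [num_stages + k for k in range(num_inputs)]


def pipelined_total(num_stages, num_inputs):
    """Cycles needed to finish all inputs with pipelining."""
    return num_stages + num_inputs - 1


def unpipelined_total(num_stages, num_inputs):
    """Cycles needed if each input must finish before the next one starts."""
    return num_stages * num_inputs


if __name__ == "__main__":
    print(output_cycles(3, 4))      # [3, 4, 5, 6]
    print(pipelined_total(3, 4))    # 6
    print(unpipelined_total(3, 4))  # 12
```

        For the three-stage case the first product appears at cycle 3 and one further product appears in every cycle after that.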

        The design is divided into two major blocks.

        1. Partial Product Generation (Booth Encoding)

        2. Partial Product Addition.

        Fig: 3.1. Block diagram for pipelined multiplier based on MBE

      2. Radix-4 modified Booth Encoding

        The Booth multiplication algorithm, or Booth algorithm, is a method of multiplying binary numbers in two's complement notation. It is a simple method in which the multiplication is carried out by repeated additions. The main purpose of using MBE is to reduce the number of partial products relative to a conventional multiplier. If two n-bit operands are multiplied with the add-and-shift algorithm, the number of PPRs equals the number of multiplier bits.

        Consider a multiplicand X of n bits, represented as $X_{n-1} X_{n-2} \ldots X_2 X_1 X_0$, and a multiplier Y, also of n bits, represented as $Y_{n-1} Y_{n-2} \ldots Y_2 Y_1 Y_0$. Both operands are signed numbers.

        In this block, Modified Booth Encoding (MBE) is used to generate the PPRs.

        A radix-4 Booth-encoded modulo $2^n + 1$ multiplier has been proposed that uses a proprietary number representation suited only to the International Data Encryption Algorithm. Booth-encoded modulo $2^n + 1$ multipliers using diminished-1 and weighted-binary representations have also been described. However, the multiplexers (MUXs) employed to generate the modulo-reduced partial products and the correction factor increase the circuit area and power dissipation.

        Steps to generate partial products using radix-4 MBE:

        Step 1: Append a '0' to the right of the LSB of the multiplier Y.

        Step 2: Group the bits of the multiplier Y into overlapping groups of T = 3 bits and name each group $Z_k$, where $n-1 \ge k \ge 0$. Each group $Z_k$ has the form $(Y_{i+1} Y_i Y_{i-1})$, as shown in Fig. 3.2.

        Fig: 3.2. RADIX-4 grouping bits of multiplier

        Step 3: For each k, where $n-1 \ge k \ge 0$, look up the value $S_k$ in Table 3.1 that corresponds to the bit combination of group $Z_k$.

        Each combination generates one partial product. The operation applied depends on the multiplier bits, which is how the number of partial product rows (PPRs) is reduced.

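        The analytical expression of radix-4 MBE is given as

        $$X \times Y = X S_{n-1} 2^{2(n-1)} + X S_{n-2} 2^{2(n-2)} + \cdots + X S_1 2^{2} + X S_0 2^{0} \tag{1}$$

        $$X \times Y = PP_{n-1} 2^{2(n-1)} + PP_{n-2} 2^{2(n-2)} + \cdots + PP_1 2^{2} + PP_0 2^{0} \tag{2}$$

        $$PP_{n-1} = X \times S_{n-1},\quad PP_{n-2} = X \times S_{n-2},\quad \ldots,\quad PP_1 = X \times S_1,\quad PP_0 = X \times S_0$$

        where the $PP_k$ are called partial products and $n-1 \ge k \ge 0$. Note that in (1) and (2) the index runs over the three-bit groups of Y, so n here is the number of groups (16 for 32-bit operands), not the operand width.

        Step 4: Add these n partial product rows to obtain the final product.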

        Table 3.1: RADIX-4 BOOTH encoding
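        Equations (1) and (2) can be exercised with a small behavioural sketch. This is not the authors' HDL: the function names are hypothetical, and instead of a table lookup the selection value is computed with the standard closed form $S_k = -2 y_{2k+1} + y_{2k} + y_{2k-1}$ with $y_{-1} = 0$, which is assumed to agree with Table 3.1.

```python
def booth_select(y, k):
    """S_k = -2*y_{2k+1} + y_{2k} + y_{2k-1}, with y_{-1} = 0."""
    y_hi = (y >> (2 * k + 1)) & 1
    y_mid = (y >> (2 * k)) & 1
    y_lo = (y >> (2 * k - 1)) & 1 if k > 0 else 0
    return -2 * y_hi + y_mid + y_lo


def booth_multiply(x, y, width=32):
    """Multiply signed width-bit integers via radix-4 MBE partial products.

    Returns (product, partial_products), where partial_products[k] = X * S_k.
    """
    groups = width // 2                      # 16 groups for 32-bit operands
    partial_products = [x * booth_select(y, k) for k in range(groups)]
    # Equation (2): weight partial product k by 2**(2k) and sum.
    product = sum(pp << (2 * k) for k, pp in enumerate(partial_products))
    return product, partial_products


if __name__ == "__main__":
    product, pps = booth_multiply(12345, -6789)
    print(product == 12345 * -6789, len(pps))   # True 16
```

        Each $S_k$ is one of 0, ±1 or ±2, so every partial product is obtained from X by at most a one-bit shift and a negation.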

        1. Partial Products Generation

          RADIX-4 MBE requires sixteen 8×1 multiplexers for a 32-bit multiplication, because the 16 groups of multiplier bits generate 16 partial products.

          The partial product rows pp0 to pp15 are generated from the multiplier bit groups. These groups act as the select lines of the multiplexers, and each group uses a single multiplexer to generate its partial product.

          The select lines choose among eight combinations of operations at each multiplexer.

          Initially, the first two bits of the multiplier, with a 0 appended, act as the select lines of the first multiplexer, as shown in dot representation in Fig. 3.3. Once all 16 partial products have been generated, they are added using different adders, and one of the best adders is selected for the final addition.

          Fig: 3.3. Partial product generation using RADIX-4 MBE
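          A behavioural sketch of the multiplexer-based generation is given here for illustration only. It assumes that the eight multiplexer inputs are the usual radix-4 multiples (0, X, X, 2X, -2X, -X, -X, 0) and that each row is stored as a 64-bit two's complement word; the names `mux8`, `partial_product_rows` and `multiply_with_muxes` are hypothetical.

```python
MASK64 = (1 << 64) - 1  # rows are kept as 64-bit two's complement words


def mux8(select, x):
    """8:1 multiplexer: the 3-bit select value picks a multiple of x."""
    # Index is the group (y_{2k+1} y_{2k} y_{2k-1}) read as a binary number.
    inputs = [0, x, x, 2 * x, -2 * x, -x, -x, 0]
    return inputs[select]


def partial_product_rows(x, y, width=32):
    """Return width/2 shifted partial product rows for signed x and y."""
    # Append a 0 to the right of the LSB of y and keep width + 1 bits.
    y_ext = (y << 1) & ((1 << (width + 1)) - 1)
    rows = []
    for k in range(width // 2):
        select = (y_ext >> (2 * k)) & 0b111      # three overlapping bits
        rows.append((mux8(select, x) << (2 * k)) & MASK64)
    return rows


def multiply_with_muxes(x, y):
    """Sum the 16 rows modulo 2**64 and reinterpret the result as signed."""
    total = sum(partial_product_rows(x, y)) & MASK64
    return total - (1 << 64) if total >> 63 else total


if __name__ == "__main__":
    print(len(partial_product_rows(25, -3)))   # 16
    print(multiply_with_muxes(25, -3))         # -75
```

          Because the rows are masked to 64 bits, negative multiples appear sign-extended in the same way as they would in a fixed-width datapath.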

        2. Partial Products Addition

    After the partial products have been reduced, the partial product rows are added using the add-and-shift method. Different adders can be used for this addition; their truth tables and operation are the same as for the normal multiplication process. The partial product rows are added in one of two ways: sequentially, as shown in Fig. 3.4 for RADIX-4 MBE, or with a CLA-based architecture, as shown in Fig. 3.5 for RADIX-4 MBE.

    If the PPRs are added sequentially, a large number of clock cycles is required. The CLA-based architecture reduces the number of clock cycles.

    Fig: 3.4. Addition of PPRs in RADIX-4 sequential manner

    In the first approach, two partial product rows are added in the first clock cycle. In the second clock cycle, the sum from the previous stage is added to the next partial product row, and so on. The final product is available at the 16th clock cycle.

    In the second approach, all the partial products are added in groups of two using CLAs in the first clock cycle. In the second clock cycle, the sums generated by the previous stage are added in groups of two. In the third clock cycle, the sums generated by the previous stage are again added in groups of two, and so on. The final product is available at the fourth clock cycle.

    Fig: 3.5. CLA based architecture used for addition of RADIX-4 PPRs
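    The two addition schedules can be compared with a short sketch. It counts two-operand additions rather than modelling registers, so for 16 rows the sequential scheme shows 15 additions, while the text counts 16 clock cycles (presumably including a cycle that this sketch does not model). The function names are hypothetical.

```python
def sequential_sum(rows):
    """Accumulate one row per step; returns (total, number_of_additions)."""
    total, steps = rows[0], 0
    for row in rows[1:]:
        total += row
        steps += 1
    return total, steps


def tree_sum(rows):
    """Add rows in groups of two per level; returns (total, number_of_levels)."""
    level, levels = list(rows), 0
    while len(level) > 1:
        next_level = [level[i] + level[i + 1]
                      for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:               # an odd row passes through unchanged
            next_level.append(level[-1])
        level = next_level
        levels += 1
    return level[0], levels


if __name__ == "__main__":
    rows = list(range(1, 17))            # 16 stand-in partial product rows
    print(sequential_sum(rows))          # (136, 15)
    print(tree_sum(rows))                # (136, 4)
```

    With 16 rows the pairwise tree needs four levels (16 → 8 → 4 → 2 → 1), which matches the four clock cycles of the CLA-based approach.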

    Comparing the two approaches, the second uses CLA adders of different widths, chosen according to the widths of the inputs and outputs at each stage. In the proposed approach the adder widths are therefore varied as required; the 64-bit sequential approach and the plain tree-based approach were also evaluated for the complete operation.

    The area, power and delay change with the widths of the CLA adders used. The addition stage of the proposed multiplier was built with two CLA approaches: the first uses a 64-bit CLA adder, and the second uses tree-based pipelined CLA adders of 38, 44, 50, 56, 62, 64 and 65 bits.

    A 64-bit carry look-ahead adder is used for the addition because it is a fast adder with low propagation delay. The delay of the CLA is compared with that of other existing adders. The 64-bit adders are simulated in Xilinx ISE 14.7.
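    As an illustration of the carry look-ahead principle, the sketch below models a 64-bit adder built from 4-bit look-ahead blocks. The internal organisation of the authors' CLA is not given in the text, so the block size, the ripple of the carry between blocks and all function names are assumptions.

```python
def block_lookahead(g, p, c_in):
    """Carry look-ahead for one 4-bit block (bit lists are LSB first).

    g[i] = a_i AND b_i (generate), p[i] = a_i XOR b_i (propagate).
    Returns the carry into each bit plus the group generate and propagate.
    """
    c1 = g[0] | (p[0] & c_in)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c_in)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c_in))
    group_g = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
               | (p[3] & p[2] & p[1] & g[0]))
    group_p = p[3] & p[2] & p[1] & p[0]
    return [c_in, c1, c2, c3], group_g, group_p


def cla_add(a, b, width=64, carry_in=0):
    """Add two width-bit words; returns (sum modulo 2**width, carry_out)."""
    if width % 4:
        raise ValueError("width must be a multiple of 4")
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    total, carry = 0, carry_in
    for start in range(0, width, 4):
        g = [((a & b) >> (start + i)) & 1 for i in range(4)]
        p = [((a ^ b) >> (start + i)) & 1 for i in range(4)]
        carries, group_g, group_p = block_lookahead(g, p, carry)
        for i in range(4):
            total |= (p[i] ^ carries[i]) << (start + i)   # sum bit
        carry = group_g | (group_p & carry)               # block carry out
    return total, carry


if __name__ == "__main__":
    print(cla_add(123456789, 987654321))   # (1111111110, 0)
    print(cla_add(2**64 - 1, 1))           # (0, 1)
```

    Inside each block all carries are computed directly from the generate and propagate signals rather than waiting for the previous bit, which is where the speed advantage over a ripple-carry adder comes from.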


    On-chip debugging using ChipScope Pro:

    * In the Processes pane of ISE Project Navigator, double-click Analyze Design Using ChipScope. This opens the ChipScope Pro Analyzer.

    * Expand the VIO and ILA cores, click VIO Console, group each port, enter the input values and press Enter to see the output values.

    * Click Waveform in the ILA core, group each port, enter the input values and click the Play button in the toolbar to see the output values.

    Fig: 4.1. ChipScope Pro during final output verification

    Fig: 4.2. Final output

    Fig: 4.3. Waveform of final output


In this paper, a 32-bit signed pipelined multiplier has been designed. The design uses MBE for partial product generation, so the number of PPRs is reduced by half. Pipelining reduces the overall delay of the multiplier. The design is simulated using Xilinx ISE 14.7. The proposed pipelined multiplier is compared with a conventional multiplier in terms of delay and clock cycles, and it achieves a lower delay.


