 Open Access
 Authors : P. N. V. Siva Kumar , M. Saranya , Y. Sirisha , K. Raja Rajeswari
 Paper ID : IJERTV7IS050120
 Volume & Issue : Volume 07, Issue 05 (May 2018)
 Published (First Online): 15-05-2018
 ISSN (Online) : 2278-0181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
CLA Based 32-Bit Signed Pipelined Multiplier
P. N. V. Siva Kumar #1 , M. Saranya #2 , Y. Sirisha #3 , K. Raja Rajeswari #4
#1,2,3,4 Assistant Professor
Vijaya Institute of Technology for Women, Vijayawada
Abstract: Nowadays VLSI systems require high speed with less area and low power, so an efficient multiplier is an extreme need. In normal binary multiplication, three basic steps are required to get the final product. The first step is partial product generation, the second is addition of the partial product rows (PPRs) using a partial product reduction tree until two partial product rows remain, and the third is final addition using any fast carry-propagation adder. In this project, Radix-4 Modified Booth Encoding (MBE) is used to generate the partial products. The proposed 32-bit multiplier is based on pipelining. The main target is to reduce the delay of higher-bit multipliers and to speed up the computation. The proposed design is implemented in Xilinx ISE 14.7. The delay achieved is 9.39 ns for computing a 32 × 32 bit signed multiplication, with a maximum frequency of 106.455 MHz on the device 7a100tcsg324-3.
Keywords: PPR, Modified Booth Encoding, CLA, Pipelined Multiplier.
I. INTRODUCTION (INTEGRATED CIRCUITS)
Integrated circuits are widely used in applications ranging from small logic gates to more complex circuits such as microprocessors. An integrated circuit typically consists of a large number of transistors and their interconnections within a single silicon chip or die. Integrated circuits are classified by their signal processing into various types, mainly i) analog integrated circuits, ii) digital integrated circuits, iii) computer integrated circuits, etc. Technology is improving with developments in VLSI. Very large scale integration (VLSI) circuits combine millions of transistors or active devices. VLSI functions include memories, computers, Digital Signal Processors and so on. Semiconductor technology is the process by which a circuit implementation is manufactured from design specifications.
An important design constraint for the implementation of a digital multiplier is the power consumption of the device, which must be addressed. This paper deals with area, power and speed constraints and gives better performance for signal processing applications. Power requirements are among the most crucial constraints in mobile computing applications, where they shorten battery life, limit devices through restricted power dissipation, or increase size and weight.

II. ADVANCED EXISTING MULTIPLIER METHODS
In digital signal processing, one of the blocks of interest is the multiplier. Digital multipliers are classified by their structure and operating process (parallel or serial approach). A multiplication has two inputs, the multiplicand and the multiplier, each of which may be a positive or a negative number. The multiplication operation can be performed in different styles, which are shown in Fig. 2.1.
Fig: 2.1. Classification of Multipliers

2.1. Array Multipliers
In array multipliers, the counters and compressors are connected in a serial fashion for all bit slices of the partial product parallelogram. As can be seen in Figure 2.2, the array topology is a two-dimensional structure that fits nicely on the VLSI planar process. Full adders are used for the addition. The shifting of the partial products for proper alignment is performed by simple routing and does not require any logic. The overall structure can easily be compacted into a rectangle, resulting in a very efficient layout. There are several possible array topologies, including simple, double and higher-order arrays.
Fig: 2.2. Array Multiplier mechanism

2.2. Double array Multiplier
The double array design is faster than the simple array. In this type of array, the delay required to produce the result of the simple array can be halved by adding the partial products in two parallel rows. The odd-numbered PPs are added in one row while the other row adds the even-numbered PPs. When all the partial products have been accumulated, the two partial sums are combined using a [4:2] compressor. The double array also consists of rows of [3:2] compressors. However, the output of the counter is the input to the row after the next one. The delay required to reduce the partial products is [N/2 - 2] [3:2] compressors + 1 [4:2] compressor.

2.3. Higher-order Array
In higher-order arrays, more additions are performed concurrently, thereby reducing the delay to produce the final result. The idea is to partition the array into more sub-arrays and use [4:2] compressors to combine the sub-arrays. This is accomplished by connecting progressively longer simple arrays together. The [4:2] compressor is used between a simple array and the other arrays when the delay of the simple array is equal to the total delay of the combined arrays. Higher-order arrays are classified according to the number of partial products in each sub-array. For example, the [6, 6, 8, 10] array combines two simple arrays that each reduce six PPs using a [4:2] compressor. The resulting structure is combined with a simple array that reduces 8 PPs. Finally, the resulting structure is combined with a simple array that sums 10 PPs. Consider the [6, 6, 8] high-order array. The delay of each of the simple arrays that reduce 6 PPs is four [3:2] compressor delays, and combining them adds one [4:2] compressor delay. So the resulting delay is
4 [3:2] comp + 1 [4:2] comp = 6 [3:2] comp
This is connected to the simple 8-PP array, which has a delay of six [3:2] compressors. Hence, the delay of the larger sub-array is approximately equal to that of the combined one.
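The delay figures quoted for the simple, double and higher-order arrays can be reproduced with a small model. This is only a sketch under two assumptions taken from the text: a simple array reducing k PPs costs k - 2 [3:2] compressor delays, and one [4:2] compressor is counted as two [3:2] delays. The function names are illustrative and do not come from the paper.

```python
from typing import Sequence

C32 = 1          # one [3:2] compressor delay (unit of the model)
C42 = 2 * C32    # assumption from the text: one [4:2] counts as two [3:2] delays


def simple_array_delay(num_pps: int) -> int:
    """Simple array: k partial products need k - 2 rows of [3:2] compressors."""
    return (num_pps - 2) * C32


def double_array_delay(num_pps: int) -> int:
    """Two half-size arrays run in parallel, merged by one [4:2] compressor."""
    return (num_pps // 2 - 2) * C32 + C42


def higher_order_delay(subarrays: Sequence[int]) -> int:
    """Merge sub-arrays left to right, one [4:2] compressor per merge."""
    delay = simple_array_delay(subarrays[0])
    for size in subarrays[1:]:
        # The merge can start only when both of its inputs are ready.
        delay = max(delay, simple_array_delay(size)) + C42
    return delay


if __name__ == "__main__":
    print("simple, 16 PPs      :", simple_array_delay(16))
    print("double, 16 PPs      :", double_array_delay(16))
    print("[6, 6, 8] array     :", higher_order_delay([6, 6, 8]))
    print("[6, 6, 8, 10] array :", higher_order_delay([6, 6, 8, 10]))
```

In this model each newly attached simple array finishes at the same time as the structure it is merged with, which is the balancing condition described above.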
Fig: 2.3. Partial products addition using (6, 6, 8, 10) array
2.4. Tree multipliers
In order to speed up the addition of partial products, a tree-based structure is used. In a tree architecture, the compressors are connected in parallel for each bit slice of the PP parallelogram. Although trees are faster than arrays, both use the same number of compressors to reduce the partial products. The first tree structure was introduced by Wallace, who showed that PPs can be reduced by connecting [3:2] compressors in parallel in a tree topology. The regular trees include binary, balanced-delay and overturned-staircase trees as well as [9:2] compressors.
2.4.1. Wallace Tree
Wallace trees are irregular in the sense that the informal description does not specify a systematic method for the compressor interconnections. However, a Wallace tree is an efficient way of adding partial products in parallel. It operates in three steps:

i) Multiply: Each bit of the multiplicand is ANDed with each bit of the multiplier, yielding $n^2$ results. Depending on the positions of the multiplied bits, the wires carry different weights; for example, the wire carrying $a_2 b_3$ has weight $2^{2+3} = 32$.

ii) Addition: As long as there are three or more wires with the same weight, add a following layer. Take three wires of the same weight and input them into a full adder. The result is one output wire of the same weight and one output wire of the next higher weight. If two wires of the same weight are left, add them using a half adder, and if only one is left, connect it to the next layer.

iii) Group the wires into two numbers and add them in a conventional adder.
Fig: 2.4. Typical Wallace Tree
2.5. Booth Encoding
Booth encoding is a method for reducing the number of partial products, proposed by A. D. Booth in 1950. A binary number X consisting of m bits in 2's complement format can be described as
$X = -x_{m-1} 2^{m-1} + \sum_{i=0}^{m-2} x_i 2^{i}$
Considering three bits of X at a time, we can determine whether to add 0, ±Y or ±2Y to the partial product. The grouping of the bits of X is shown in Figure 2.5.
Fig: 2.5. Multiplier bit grouping according to Booth Encoding
The multiplier X is segmented into overlapping groups of three bits ($x_{i+1}$, $x_i$, $x_{i-1}$), and each group is associated with its own partial product row using Table 2.1. For each step i, the three multiplier bits $x_{2i-1}$, $x_{2i}$, $x_{2i+1}$ are considered and the corresponding value of $d_i$ is obtained from Table 2.1.
Table 2.1: Modified Booth encoding table

i) Zero must always be concatenated to the right of X, i.e. $x_{-1}$ is considered to be 0.

ii) m must always be even.
There are two unavoidable consequences of using MBE: sign-extension prevention and negative encoding. Together they produce one additional partial product row, which requires more hardware and makes the system slower. The advantage of MBE is that the number of partial products is reduced to m/2, which in turn reduces the hardware burden and increases the speed of the multiplier.
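The reduction to m/2 partial products follows from regrouping the 2's complement expansion of X. The derivation below is a sketch that uses the standard radix-4 recoding $d_i = x_{2i-1} + x_{2i} - 2x_{2i+1}$, which is assumed here to be what Table 2.1 tabulates, together with the two conditions above ($x_{-1} = 0$, m even).

```latex
\begin{align}
X &= -x_{m-1}\,2^{m-1} + \sum_{j=0}^{m-2} x_j\,2^{j} \\
  &= \sum_{i=0}^{m/2-1} \left( x_{2i-1} + x_{2i} - 2\,x_{2i+1} \right) 2^{2i},
     \qquad x_{-1} = 0 \\
  &= \sum_{i=0}^{m/2-1} d_i\,4^{i},
     \qquad d_i \in \{-2, -1, 0, +1, +2\}
\end{align}
% Check of the regrouping: an odd-indexed bit x_{2i+1} below the sign bit
% contributes -2 * 4^i from group i and +4^{i+1} from group i+1, i.e.
% 2^{2i+2} - 2^{2i+1} = 2^{2i+1}, its ordinary binary weight. The sign bit
% x_{m-1} appears only in the last group and so keeps the weight -2^{m-1}.
%
% Example, m = 4, X = 0110 (= 6):
%   d_0 = x_{-1} + x_0 - 2 x_1 = 0 + 0 - 2 = -2
%   d_1 = x_1 + x_2 - 2 x_3   = 1 + 1 - 0 = +2
%   X   = (-2) * 4^0 + (+2) * 4^1 = 6
```

Each digit $d_i$ selects one multiple of the multiplicand, so an m-bit multiplier yields m/2 rows instead of m.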


III. CONCEPTS USED FOR THE MODIFIED MULTIPLIER

Pipelining approach
Pipelining is a popular technique that has been used in the design sector for several years. It is an architectural option used by designers to reduce power. Arithmetic circuits such as adders and multipliers, which are key elements of the data path, can be pipelined to improve performance.
There are two types of pipeline architecture:
i) Linear pipeline ii) Non-linear pipeline
In a linear pipeline, all stages are connected in series with no feedback path; in a non-linear pipeline, the serially connected stages also have feedback paths. Pipelining improves design efficiency.
If the pipeline has N stages, the first output is obtained after N clock cycles and the next output after N + 1 cycles. Therefore, in the proposed three-stage multiplier, the first output is obtained at the third clock cycle, followed by one output per clock cycle for each subsequent input.
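The latency behaviour of an N-stage linear pipeline can be illustrated with a short cycle-by-cycle model. This is a sketch only: `run_pipeline` and the three arithmetic stages are hypothetical stand-ins, not the stages of the proposed multiplier, and `None` is used to represent an empty pipeline register.

```python
def run_pipeline(stages, inputs):
    """Simulate a linear pipeline; return (clock cycle, value) for each output."""
    n = len(stages)
    regs = [None] * n                        # one register after each stage
    outputs = []
    stream = list(inputs) + [None] * n       # extra idle cycles drain the pipe
    for cycle, item in enumerate(stream, start=1):
        new_regs = [None] * n
        # Stage 0 takes the new input, stage s takes the register of stage s-1.
        new_regs[0] = stages[0](item) if item is not None else None
        for s in range(1, n):
            prev = regs[s - 1]
            new_regs[s] = stages[s](prev) if prev is not None else None
        regs = new_regs                      # clock edge: all registers update
        if regs[-1] is not None:
            outputs.append((cycle, regs[-1]))
    return outputs


if __name__ == "__main__":
    three_stages = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
    for cycle, value in run_pipeline(three_stages, [1, 2, 3]):
        print(f"cycle {cycle}: output {value}")
```

With three stages the first result appears at cycle 3 and one further result appears on every following cycle, which is the N and N + 1 behaviour described above.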
The design is divided into two major blocks:

i) Partial Product Generation (Booth's Encoding)

ii) Partial Product Addition
Fig: 3.1. Block diagram for pipelined multiplier based on MBE


Radix-4 modified Booth Encoding
Booth's multiplication algorithm is a method of multiplying binary numbers in two's complement notation. It is a simple method in which the multiplication is performed by repeated addition operations. The main purpose of using MBE is to reduce the number of partial products relative to the conventional multiplier. If two n-bit operands are multiplied by the add-shift algorithm, the number of PPRs equals the number of multiplier bits.
Consider a multiplicand X of n bits, represented as $X_{n-1} X_{n-2} \ldots X_2 X_1 X_0$, and a multiplier Y, also of n bits, represented as $Y_{n-1} Y_{n-2} \ldots Y_2 Y_1 Y_0$. Both operands are signed numbers.
In this block, Modified Booth Encoding (MBE) is used to generate the PPRs.
In the literature, a radix-4 Booth encoded modulo $2^n + 1$ multiplier was proposed that uses a proprietary number representation suited only to the International Data Encryption Algorithm. Booth encoded modulo $2^n + 1$ multipliers using diminished-1 and weighted-binary representations have also been described. However, the multiplexers (MUXs) employed to generate the modulo-reduced partial products and the correction factor increase the circuit area and power dissipation.
Steps to generate partial products using radix-4 MBE:
Step 1: Append a '0' to the right of the LSB of the multiplier Y.
Step 2: Group the bits of the multiplier Y into overlapping groups of T = 3 bits, and name each group $Z_k$, where $n-1 \ge k \ge 0$. Each group $Z_k$ has the form ($Y_{i+1} Y_i Y_{i-1}$), as shown in Figure 3.2.
Figure 3.2: RADIX-4 grouping bits of multiplier
Step 3: For each k, where $n-1 \ge k \ge 0$, obtain the value $S_k$ from Table 3.1 according to the bit combination of group $Z_k$.
Each combination generates one partial product, so the operation performed at each step depends on the multiplier bits; this reduces the number of Partial Product Rows (PPRs).
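The analytical expression of radix-4 MBE is given as
$X \times Y = X \times S_{n-1} \times 2^{2(n-1)} + X \times S_{n-2} \times 2^{2(n-2)} + \cdots + X \times S_1 \times 2^{2} + X \times S_0 \times 2^{0}$ (1)
$X \times Y = PP_{n-1} \times 2^{2(n-1)} + PP_{n-2} \times 2^{2(n-2)} + \cdots + PP_1 \times 2^{2} + PP_0 \times 2^{0}$ (2)
where
$PP_{n-1} = X \times S_{n-1}$, $PP_{n-2} = X \times S_{n-2}$, ..., $PP_1 = X \times S_1$, $PP_0 = X \times S_0$.
The $PP_k$ are called partial products, with $n-1 \ge k \ge 0$.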
Step 4: Add these n partial product rows to get the final product.
Table 3.1: RADIX-4 Booth encoding
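Steps 1 to 4 and equations (1) and (2) can be checked numerically. The sketch below assumes the standard radix-4 digit rule $S_k = Y_{2k-1} + Y_{2k} - 2Y_{2k+1}$ in place of Table 3.1, and it treats n in equations (1) and (2) as the number of three-bit groups (16 for a 32-bit multiplier). Both function names are illustrative.

```python
def booth_radix4_digits(y: int, bits: int = 32) -> list:
    """Return the radix-4 Booth digits S_0 .. S_{bits/2 - 1} of a signed multiplier."""
    assert bits % 2 == 0, "the operand width must be even"
    u = y & ((1 << bits) - 1)        # two's complement bit pattern of y
    digits = []
    prev = 0                         # Step 1: the 0 appended right of the LSB
    for i in range(0, bits, 2):      # Step 2: overlapping 3-bit groups
        lo = (u >> i) & 1
        hi = (u >> (i + 1)) & 1
        digits.append(prev + lo - 2 * hi)   # Step 3: value in {-2,-1,0,1,2}
        prev = hi                    # top bit of this group overlaps the next
    return digits


def booth_multiply(x: int, y: int, bits: int = 32) -> int:
    """Evaluate equation (2): sum of PP_k * 2^(2k) with PP_k = X * S_k."""
    digits = booth_radix4_digits(y, bits)
    partial_products = [x * s for s in digits]
    return sum(pp * 4 ** k for k, pp in enumerate(partial_products))  # Step 4


if __name__ == "__main__":
    print(booth_radix4_digits(6, bits=4))      # digits of 0110
    print(booth_multiply(-12345, 6789))
```

A 32-bit multiplier always produces 16 digits, so only 16 partial product rows need to be added instead of 32.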

Partial Products Generation
RADIX-4 MBE requires sixteen 8×1 multiplexers for the 32-bit multiplication, because the 16 groups of multiplier bits generate 16 partial products.
The partial product rows pp0 to pp15 are generated from the multiplier bit groups. These groups act as the selection lines of the multiplexers, and each group uses a single multiplexer to generate its partial product.
The selection lines select among eight different combinations of operations at the different multiplexers.
Initially, the first two bits of the multiplier, with a 0 appended, act as the selection lines of the first multiplexer, as shown in dot representation in Figure 3.3. Once all 16 partial products have been generated, they are added using different adders, and one of the best adders is selected for the final addition.
Fig: 3.3. Partial product generation using RADIX-4 MBE
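The multiplexer-based generation can be modelled at the bit-vector level. This is a sketch, not the paper's RTL: the eight data inputs of each 8×1 multiplexer follow the standard radix-4 Booth assignment (0, +X, +X, +2X, -2X, -X, -X, 0), which is assumed to match Table 3.1, and every row is kept as a 64-bit two's complement pattern. All names are illustrative.

```python
MASK64 = (1 << 64) - 1


def mux8(select: int, x: int) -> int:
    """8x1 multiplexer: select is the 3-bit group (Y[i+1], Y[i], Y[i-1])."""
    data_inputs = [0, x, x, 2 * x, -2 * x, -x, -x, 0]
    return data_inputs[select]


def partial_product_rows(x: int, y: int, bits: int = 32) -> list:
    """Return the bits/2 shifted partial product rows as 64-bit patterns."""
    extended = (y & ((1 << bits) - 1)) << 1      # append 0 to the right of the LSB
    rows = []
    for k in range(bits // 2):
        select = (extended >> (2 * k)) & 0b111   # selection lines of mux k
        rows.append((mux8(select, x) << (2 * k)) & MASK64)
    return rows


def to_signed64(value: int) -> int:
    """Interpret a 64-bit pattern as a two's complement number."""
    return value - (1 << 64) if value & (1 << 63) else value


if __name__ == "__main__":
    rows = partial_product_rows(-3, 5)
    print(len(rows), "rows")
    print(to_signed64(sum(rows) & MASK64))       # product of -3 and 5
```

Summing the 16 rows modulo 2^64 gives the signed 64-bit product, so negative rows need no special handling beyond their two's complement encoding.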

Partial Products Addition

After the partial products have been reduced, the partial product rows are added using the add-and-shift method, for which different adders can be used. The truth tables and operation of these adders are the same as in the normal multiplication process. The partial product rows are added in one of two ways. The first is addition in a sequential manner, shown in Figure 3.4 for RADIX-4 MBE. The second uses a CLA based architecture, shown in Figure 3.5 for RADIX-4 MBE.
If the PPRs are added sequentially, the number of clock cycles required is large. With the CLA based architecture, the number of clock cycles is reduced.
Fig: 3.4. Addition of PPRs in RADIX-4 sequential manner
In the first approach, two partial product rows are added in the first clock cycle. In the second clock cycle, the sum generated by the previous stage is added to the next partial product row, and so on. The final product is obtained at the 16th clock cycle.
In the second approach, all the partial products are added in groups of two using CLAs in the first clock cycle. In the second clock cycle, the sums generated by the previous stage are added in groups of two. In the third clock cycle, the sums generated by the previous stage are again added in groups of two, and so on. The final product is obtained at the fourth clock cycle.
Fig: 3.5. CLA based architecture used for addition of RADIX-4 PPRs
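The difference between the two addition schemes can be seen by counting adder steps. The sketch below is illustrative only: it counts additions or adder levels rather than clock edges, so how the counts map onto the clock cycle figures quoted above depends on where the pipeline registers are placed, which is not modelled here.

```python
def sequential_sum(rows):
    """Accumulate the rows one after another; return (sum, number of additions)."""
    acc = rows[0]
    steps = 0
    for row in rows[1:]:
        acc += row
        steps += 1
    return acc, steps


def tree_sum(rows):
    """Add the rows pairwise, level by level; return (sum, number of levels)."""
    level = list(rows)
    levels = 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:               # an unpaired row passes to the next level
            nxt.append(level[-1])
        level = nxt
        levels += 1
    return level[0], levels


if __name__ == "__main__":
    rows = list(range(1, 17))            # 16 stand-in partial product rows
    print("sequential:", sequential_sum(rows))
    print("tree      :", tree_sum(rows))
```

For 16 rows the pairwise scheme needs four adder levels (16, 8, 4, 2, 1), which is consistent with the final product appearing at the fourth clock cycle.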
Comparing the two approaches, the second uses CLA adders of different bit widths, chosen according to the inputs and outputs of each stage. In the proposed approach, the widths of the CLA adders are changed according to requirement; the 64-bit sequential and the normal tree-based approaches were also examined for the entire operation.
Changes in area, power and delay can be observed when CLA adders of different bit widths are used. The addition process of the proposed multiplier involves two CLA approaches: the first uses a 64-bit CLA adder and the second uses 38-, 44-, 50-, 56-, 62-, 64- and 65-bit tree-based pipelined CLA adders.
For the addition, a 64-bit Carry Look-ahead Adder is used because it is a fast adder with low propagation delay. The delay of the CLA is compared with that of other existing 64-bit adders, simulated in Xilinx ISE 14.7.
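The carry look-ahead principle can be sketched with generate ($g_i = a_i b_i$) and propagate ($p_i = a_i \oplus b_i$) signals. The organisation below, 4-bit look-ahead blocks whose block carries are chained, is an assumption made for illustration; the paper does not describe the internal structure of its CLA, and the function names are hypothetical.

```python
def cla4(a: int, b: int, c0: int):
    """4-bit carry look-ahead block: return (4-bit sum, carry out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]      # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]    # propagate
    # Every carry is written directly in terms of g, p and c0 (no rippling).
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    total = sum((p[i] ^ carries[i]) << i for i in range(4))
    return total, c4


def cla_add(a: int, b: int, width: int = 64):
    """Add two width-bit patterns with chained 4-bit CLA blocks."""
    assert width % 4 == 0, "this sketch needs a width that is a multiple of 4"
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    result, carry = 0, 0
    for shift in range(0, width, 4):
        s, carry = cla4((a >> shift) & 0xF, (b >> shift) & 0xF, carry)
        result |= s << shift
    return result, carry


if __name__ == "__main__":
    print(cla_add(5, 7))
    print(cla_add(2**64 - 1, 1))
```

A hardware CLA would normally add a second look-ahead level across the blocks instead of chaining the block carries; that refinement is omitted to keep the sketch short.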


IV. SIMULATION RESULTS
On-Chip Debugging using ChipScope Pro:
* In ISE Project Navigator, double-click Analyze Design Using ChipScope in the Processes pane. This opens the ChipScope Pro Analyzer.
* Expand the VIO and ILA cores, click VIO Console, group each port, enter the input values and press Enter to see the output values.
* Click Waveform under the ILA core, group each port, enter the input values, and click the Play button in the toolbar to see the output values.
Fig: 4.1. ChipScope Pro during final output verification
Fig: 4.2. Final output
Fig: 4.3. Waveform of final output

V. CONCLUSION
In this paper, a 32-bit signed pipelined multiplier has been designed. The design uses MBE for partial product generation, hence the number of PPRs is reduced to half. Pipelining reduces the overall delay of the multiplier. The design is simulated using Xilinx ISE 14.7. The proposed pipelined multiplier is compared with the conventional multiplier in terms of delay and clock cycles, and it gives less delay than the conventional multiplier.