 Open Access
 Total Downloads : 13
 Authors : Karthikeyan K, Deepa P
 Paper ID : IJERTCONV3IS12052
 Volume & Issue : NCICCT – 2015 (Volume 3 – Issue 12)
 Published (First Online): 30072018
 ISSN (Online) : 22780181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
An Efficient Fused AddMultiply Unit Design Based on the SMB Recoder for Signed Digit
Karthikeyan K
PG Scholar, Department of ECE, Government College of Technology, Coimbatore, India
Deepa P
Assistant Professor, Department of ECE, Government College of Technology, Coimbatore, India
Abstract The basic operations behind the DSP applications are addition and multiplication unit. Here an efficient Fused AddMultiply (FAM) unit based on the Sum to Modified Booth (SMB) recoder for signed digit. To perform the addition of numbers in SMB, signed Half Adder (HA) and signed Full adders(FA) are used. By the use of various combination of Half and Full Adders, the three SMB recoder technique has been developed, named as SMB1, SMB2 and SMB3. In parallel, the Modified Booth (MB) recoding technique based on Radix4 encoding has been used to encode the added bits. The Radix4 algorithm reduces the number of intermediate products by half. So the area and the delay gets reduced. The recoded multiplicand from Radix4 is multiplied by the multiplier in 2s complement form to generate the intermediate product. The intermediate products are added through a Wallace Carry Save Adder (WCSA) tree along with the correction term to eliminate the errors. The result of the WCSA tree is given to a Fast Carry Look Ahead Adder to obtain the result.
Keywords AddMultiply operation, arithmetic circuits, Modified Booth recoding, VLSI design

INTRODUCTION
An common and very useful combinational logic circuit which can be constructed using just a few basic logic gates. The Adder uses AND and ExOR gates and allow us to add together single bit binary numbers, to produce outputs called the SUM of the addition and a CARRY called the Carryout, ( C out ) bit. One of the main uses for the Binary Adder is in arithmetic and counting circuits.
Fast multipliers are essential parts of DSP systems. The speed of multiply operation is very important in DSP as well as in the general purpose processors. Earlier days, multiplication was done by add, Subtract and shift operations. Multiplication can be done by repeated additions.
Multiply Accumulator unit has become one of the essential building blocks in digital signal processing applications such as filtering of digital signals, processing of speech signals and in cell phones. Existing MAC unit consists of multiplier and an accumulator that contains the sum of the previous consecutive products. Modern computers may contain a separate MAC, has a separate multiplier implemented in combinational logic followed by an adder and an accumulator stores the outcome. The output is given back to one input of the adder, on every clock cycle, the outcome of the multiplier is added to the content of the accumulator.
A Fused MultiplyAdd is a floatingpoint multiplyadd operation performed in single step, with single rounding. An
unfused MultiplyAdd would compute the product and round it to N significant bits, add the result to another number, and round back to N bits. A fast FMA can be faster and improve the accuracy of many computations that involve the accumulation of products. Fused MultiplyAdd can usually be relied on to give more accurate results.
In this paper, a new technique is exploited for direct recoding of two numbers in the MB representation of their sum. An optimized design of the AM operator is based on the fusion of the adder and the Modified Booth (MB) encoding unit into a single data path block by direct recoding of the sum to its MB representation. The Fused AddMultiply (FAM) component contains only one adder at the end (final adder of the parallel multiplier). So that area savings are observed and the critical path delay of the recoding process is reduced and decoupled from the bitwidth of its inputs.
The rest of the paper is organized as follows: In Section II, we discuss about the literature survey. Section III describes on the Fused AddMultiply (FAM) Unit. Section IV gives the simulation, synthesis and analysis of the FAM unit Section V concludes our work.

LITERATURE SURVEY
An array multiplier is a parallel multiplier. It does the shifting and addition operation in a single step. It adds the partial products parallel. The structure of array multiplier is very compact and the resulting layout is very economic This multiplier has simple operation but it expend more time, so it will make the multiplication process very slow [6].
High speed and low power DSP application make use of array multipliers in parallel. One kind of parallel array multiplier is Brauns multiplier. The delay of the multiplier depends on the delay of the full adders and also on the delay of the last stage adder. One of the major disadvantages of the multiplier is that the number of components required increases quadratically with the number of bits which makes the multiplier inefficient. [7].
BaughWooley algorithm is used for binary multiplication of Signed numbers. This algorithm creates AND terms and then they are passed through an array of halfadders and full adders with the Carryouts chained to the next most significant bit. Signed operands may also be multiplied using a Baugh Wooley multiplier.
Combinational Multipliers is used for the multiplication of two unsigned numbers and signed numbers. Generation of intermediate products is easy in binary multiplication. If the multiplier bit is a 1, the product is an correctly shifted copy of the multiplicand; if the multiplier bit is 0 then the product is 0. In most of the systems combinational multipliers are slow and take lot of area.
A Wallace tree multiplier is an efficient hardware implementation of a digital circuit that multiplies two integer values. Wallace tree reduces the number of partial products [5].
Wallace tree has three steps:

Multiply each bit of multiplier with same bit position of multiplicand. Depending on the position of the multiplier bits generated partial products have different weights.

Reduce the number of partial products to two by using layers of full and half adders.

After second step we get two rows of sum and carry, add these rows with conventional adders.
Wallace tree multiplier has small delay. The number of logic levels required to carry out the summation can be reduced with Wallace tree. The major disadvantages of Wallace tree is complex layout and has irregular wires.
The Booth algorithm forms the base of Signed number multiplication algorithms that are easier to implement at the hardware level and speed up signed multiplication. The algorithm developed by Booth is based upon recoding the multiplier, to a recoded value, without changing the multiplicand(x) unchanged. In Booth recoding, each digit of the multiplier may assume signed, unsigned and zero values. It uses a superior notation, called signed digit (SD) encoding, to express these signed digits and is given in table 2.1. In SD encoding +1 is expressed as 1 and 0 as 0, but 1 is expressed
as 1.
TABLE I. BOOTH RECODING TABLE FOR RADIX2
Y
Yi1
Recoded value (z)
0
0
0
0
1
+
1
0
–
1
1
0
Y
Yi1
Recoded value (z)
0
0
0
0
+
1
0
–
1
1
0
i
The Booth recoding procedure is as follows:

Add 0 to the right of lsb of multiplicand

Make pairing of 2 bits from right to left and mark corresponding recoded value.

00 or 11 0(do nothing)

01 +1 ( add multiplicand to partial product)

101 ( subtract multiplicand from partial product)
The Radix2 booth multiplier has two drawbacks:

Inconvenient while designing parallel multipliers

Inefficient when there is isolated number of 1's
The literature survey reveals that there are various add and multiply unit each has its own advantages and disadvantages, But there is no specific fused add.


EFFICIENT FUSED ADDMULTIPLY UNIT This chapter presents the design of an efficient Fused
Add Multiply (FAM) Unit using Sum to Modified Booth
(MB) Recoder for both Signed and Unsigned numbers. Addition and the encoded multiply units are fused into a single block. Hence the propagation delay is reduced. It uses Signed bit Half Adder and Signed bit Full Adder for adding the numbers instead of conventional adders. For multiplication, redundant Signed digit Radix – 4 method is used. It decreases the number of intermediate products by half when compared to the conventional radix2 booth multiplication method.
Fig 1. Block diagram of FusedAdd Multiply unit

Adder Unit
FAM uses signed Half Adder (HA) and signed Full Adder (FA) for adding the numbers considering that their inputs and outputs are signed.

Signed Half Adders HA*
The binary inputs are considered as a and b, both are positive .The outputs are represented as carry (c) and sum (s) of a HA. The sum is considered as negative.
TABLE II. TRUTH TABLE FOR HA*
a (+)
b (+)
Sum(s) ()
Carry (c)(+)
0
0
0
0
0
1
1
1
1
0
1
1
1
1
0
1
By kmap simplification, the sum and carry equations are given by,
Sum = a exor b (1)
Carry = a + b (2)
HA* DUAL OPERATION
The binary inputs are a and b, both are positive . The sum is considered as negative.
By kmap simplification, the sum and carry equations are given by, Sum = a exor b (3)
Carry = (~a) . b (4)
TABLE III. TRUTH TABLE FOR HA* DUAL OPERATION
a()
b()
Sum(+)
Carr y()
0
0
0
0
0
1
1
1
1
0
1
1
1
1
0
1
HA**
The binary inputs are considered as a (signed) and b (unsigned).The outputs are carry (c) and sum (s) of a HA** . Where the sum is considered negatively signed. And the output takes one of the values 0,1.
TABLE IV. TRUTH TABLE FOR HA**
p ()
q (+)
c(+)
s()
0
0
0
0
0
1
1
1
1
0
0
1
1
1
0
0
By kmap simplification, the sum and carry equations are given by,
Sum = p exor q (5)
Carry = (~p) . q (6)

Signed Full Adders
FA*
The binary inputs are considered as a (signed), b (unsigned),cin (carry in).The outputs are carry (c) and sum (s) of a FA*. Where the sum is considered negatively signed and the carry output is positively. And the output takes one of the values 0,1.
TABLE V. TRUTH TABLE FOR FA*
p (+)
q ()
cin (+)
c(+)
s(
)
0
0
0
0
0
0
0
1
1
1
0
1
0
0
1
0
1
1
0
0
1
0
0
1
1
1
0
1
1
0
1
1
0
0
0
1
1
1
1
1
By kmap simplification, the sum and carry equations are given by,
Sum= a exor b exor cin (7)
Carry = (a+(~b)) cin+ a.(~b) (8)
FA**
The binary inputs are a , b both are signed, cin(carry in).The outputs are carry (c) and sum (s) of FA. The sum is positive and the carry output is negative. The output takes one of the values {0,Â±1,Â±2}.
By kmap simplification, the sum and carry equations are given by,
Sum = a exor b exor cin (9)
Carry =(a+(~b)) cin+ a.(~b) (10)
TABLE VI TRUTH TABLE FOR FA**
p ()
q ()
cin (+)
c ()
s (+)
0
0
0
0
0
0
0
1
0
1
0
1
0
1
1
0
1
1
0
0
1
0
0
1
1
1
0
1
0
0
1
1
0
1
0
1
1
1
1
1


SMB Recoding Schemes For Signed & Unsigned Numbers
The three forms of SMB recoding scheme based on FA, FA*, FA**, HA, HA, HA** are

SMB1 Recoding Scheme

SMB2 Recoding Scheme

SMB3 Recoding Scheme

SMB1 Recoding Scheme
The first scheme of the proposed recoding technique is referred as SMB1 scheme and is given in Fig. 3.2. It uses FA and FA*.For adding two,2 bit numbers one set of FA and FA* used. For adding two,8 bit numbers 4 sets of FA and FA* are used. Sum bits from the FA & FA* and carry out from the FA* are used to form the encoded digits.
Fig.2. Block diagram for SMB1 Recoding Scheme for 8 bits
If sign=0, then the result of addition of two numbers is Step 3: If the value of a digit equals to 1, one=1, else one=0
positive. If sign=1, then the result of addition of two numbers is negative.
Step 4: If the value Equations for fin
din
=Y
din
=Y
X
ied B
X
ied B
Sign(S)
igit equals to 2, two=1, else two=0 o signals are given by,
of a d g enc
Xor
of a d g enc
Xor
ding
Y
ding
Y
)O
for
)O
for
(11)

SMB2 Recoding Scheme
The second scheme o the proposed recoding technique is referred as SMB2 and is given in Fig. 3.4.This scheme uses FA , HA , HA*.For adding two,2 bit numbers one set of FA,HA,HA* are used. For adding two,8 bit numbers 4 sets of FA,HA,HA* are used. Sum bits from the FA & HA* and carry out from the HA* are used to form the encoded digits.
Fig.2. Block diagram for SMB2 Recoding Scheme for 8 bits
Sign=0, if the result of addition of two numbers is positive.
Sign=1,if the result of addition of two numbers is negative.

SMB3 Recoding Scheme
The third scheme of the proposed recoding technique is referred as SMB3 and is given in Fig 3.6 .This scheme uses FA , HA* , HA**. For adding two,2 bit numbers one set of FA,HA*,HA** are used. For summing two,8 bit numbers 4 sets of FA,HA*,HA** are used. Sum bits from the FA & HA** and carry out from the HA** are used to form the encoded digits.
Fig.3. Block diagram for SMB3 Recoding Scheme for 8 bits
Sign=0, if the result of addition of two numbers is positive. Sign=1,if the result of addition of two numbers is negative.


Modified Booth Form
Modified Booth (MB) is a prevalent form used in multiplication. It is a redundant signeddigit radix4 encoding technique. The main advantage is that it reduces by half the
One=Y Y (12)
Two =(Y or ne* (13) The table for Modif ooth m is given in TABLE VII.
TABLE VII. MODIFIED BOOTH ENCODING TABLE
Binary value
Output value
Sign
One
Two
y2j+1
y2j
y2i1
0
0
0
0
0
0
0
0
0
1
+1
0
1
0
0
1
0
+1
0
1
0
0
1
1
+2
0
0
1
1
0
0
2
1
0
1
1
0
1
1
1
1
0
1
1
0
1
1
1
0
1
1
1
0
1
0
0

Partial Product Generation
the med
=
e,
the med
=
e,
lti pro
lti pro
plier duc
=
plier duc
=
ing, the r and th ts are ex
~p 2
ing, the r and th ts are ex
~p 2
ecode e par presse
+
ecode e par presse
+
tial d a
tial d a
s
s
pro
2
pro
2
d
d
u
u
,
p
,
p
After the encod d multiplicand is multiplied with mu cts are produced. Inter iate
PP X. y , , (14)
X
Yj
=
n=2
X
Yj
=
n=2
multipli encoded x xor s
k (input b
multipli encoded x xor s
k (input b
er
m o it
er
m o it
ultipl ne + width)
ultipl ne + width)
and
x xor
and
x xor
wo
wo
Wher
ic
o
o
p , s t (15)
o n ac
o n ac
;j
it o
;j
it o
i 0 t 1 0 to k1
x E h b f multiplier (0< i<n1)
s , ne , two Encoding signals

Wallace Carry Save Tree and Fast CLA adder
The multiplier architecture consists of intermediate product generation stage, reduction stage and addition stage. The dormancy of the Wallace tree multiplier can be decreased by scaling down the number of adders in the partial products reduction stage.
d t m is
2
d t m is
2
he tr exp
he tr exp
+
+
rror on te
c
rror on te
c
a r
a r
cati sse (1
cati sse (1
m ult
)
m ult
)
rror
2
rror
2
After generating the partial products, they are added along with the correction terms through a Wallace carry save adder (CSA Tree).Correction term is used for reducing the mean
number of intermediate products in multiplication comparing
square e n
un on e in
ipliers.
to any other radix2 representation. The basic step of MB recoding is as follows,
Step 1: Each digit is represented by three bits called sign, one and two
Step 2: sign=1, if the digit is negative otherwise sign=0
Correcti
CT= , .
re d as,
2 + (16)
Fig.4. Overall structure of Wallace Carry Save Adder Tree and Fast CLA for adding partial products
Where,
=
ne w
=
ne w
es
es
+
a
+
a
)
od
)
od
ng
ng
c , (one two s (17)
A. Comparison between Existing Method and FAM Unit
FAM is compared with the existing method for area, power
o ,t o ,s re enc i signals j rang from 0 to k1
The output of Wallace carry save tree is given to the Carry
ed as
P
ed as
P
,
,
Look Ahead. Propagate and generate terms are calculated and the final result is produced.
The final result (z) is express
z = X. Y = CT + P . 2 (18)
CT Correction Term PPj Partial Products


RESULTS AND DISCUSSION
The FAM has been analyzed for area, power and delay and is given in Table 4.1. From the analysis it could be observed that the SMB1, SMB2, and SMB3 show as competitive performance.
and delay and is given in table 4.2. From the table, it could be observed that area and delay is 24.20 % and 63 % less compared to the existing method with a minimum of 52.46% tradeoff in power.
Parameters
Existing Method
Proposed Method
Area(um2)
5618
4258
Power(mW)
0.441104
0.672498 (increased due to more number of signals)
Delay(ns)
8.142
3.004
Parameters
Existing Method
Proposed Method
Area(um2)
5618
4258
Power(mW)
0.441104
0.672498 (increased due to more number of signals)
Delay(ns)
8.142
3.004
TABLE IX. COMPARISON BETWEEN EXISTING METHOD AND FAM UNIT
TABLE VIII. ANALYSIS OF FAM
Different techniques
Area (um2)
Power (mW)
Delay (ns)
SMB1
4264
0.686815
8.573
SMB2
4258
0.687116
3.004
SMB3
4258
0.672498
6.566

CONCLUSION
The FAM units depending on three Sum To Modified Booth Recorder for Signed and unsigned Digit has been successfully simulated using Xilinx ISE Design Suite 14.2 and synthesized by usingCadence Design Tool in 180 nm technology. The Area, Power and Delay of the three FAM units based on three SMB techniques were analyzed and it was compared with the Existing architectures for AM unit. The suggested architecture is found to be efficient in terms of area and delay.
REFERENCES

Kostas Tsoumanis, Student Member, IEEE, Sotiris Xydis, Constantinos Efstathiou, Nikos Moschopoulos, and Kiamal Pekmestzi ,An Optimized Modified Booth Recoder for Efficient Design of the AddMultiply Operator, IEEE transactions on circuits and systemsi: regular papers, vol. 61, no. 4, april 2014

Dr DC Hendry,Multipliers – Carry Save and Wallace Tree

VLSI IP, Booth Multiplier Implementation of Booths Algorithm using Verilog RTL, http://www.vlsiip.com

NTU, Booth encoded multiplier,Graduate institute of electronics engineering

Sukhmeet Kaur, Suman and Manpreet Signh Manna, Implementation of Modified Booth Algorithm (Radix 4) and its Comparison with Booth Algorithm (Radix2), Advance in Electronic and Electric Engineering,ISSN 22311297, Volume 3, Number 6 (2013), pp. 683690,Â© Research India Publications

Soniya, Suresh Kumar,M.Tech. Student U.I.E.T., M.D.U. Rohtak Assistant Professor, U.I.E.T., M.D.U. Rohtak, A Review of Different Type of Multipliers and Multiplier Accumulator Unit, IJETTCS, Volume 2, Issue 4, July August 2013 ISSN 22786856

Anitha , Alekhya Nelapati, Lincy Jesima, V. Bagyaveereswaran, Comparative Study of High performance Brauns Multiplier using FPGAs, IEEE member, VIT
University, Vellore, IOSR Journal of Electronics and Communication Engineering (IOSRJECE) ISSN: 22782834 Volume 1, Issue 4 (MayJune 2012), PP 3337

Smiksha ECE, UIET A.P.,Analysis, Simulation and Comparison of Different Multiplier Algorithms, , International Journal of Management, IT & Engineering

Sukhmeet ,ECE, SSIET (P.T.U), Derabassi, Punjab, India,Implementation of Modified Booth Algorithm (Radix 4) and its Comparison with Booth Algorithm (Radix2), Advance in Electronic and Electric Engineering,ISSN 22311297, Volume 3, Number 6 (2013), pp. 683690 Â© Research India Publications http://www.ripublication.com/aeee.htm

Digital Logic Design, A Combinational Multiplier Using the Xilinx Spartan II FPGA

Sandeep Shrivastava, Implementation of Radix2 Booth Multiplier and Comparison with Radix4 Encoder Booth Multiplier, , International Journal on Emerging Technologies 2(1): 1416(2011) ISSN : 09758364

Sakshi Rajput, Asst. Professor, Deptt. of Electronics and Communication, Maharaja Surajmal Institute of Technology, New Delhi, India, High Speed and Reduced Power Radix2 Booth Multiplier, IJCEM International Journal of Computational Engineering & Management, Vol. 16 Issue 2, March 2013,ISSN (Online): 22307893www.IJCEM.org.