 Open Access
 Total Downloads : 314
 Authors : Yamini Gayathri T
 Paper ID : IJERTV6IS060395
 Volume & Issue : Volume 06, Issue 06 (June 2017)
 DOI : http://dx.doi.org/10.17577/IJERTV6IS060395
 Published (First Online): 23062017
 ISSN (Online) : 22780181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
An Efficient Multi Precision Floating Point Complex Multiplier Unit in FFT
Mrs. Yamini Gayathri T
Assistant Professor, ACS College of Engineering,
Department of ECE, Bangalore560074, India
Abstract Discrete Fourier Transform (DFT) is a fundamental digital signal processing domain transformation technique used in many applications for frequency analysis and frequency domain processing. Fast Fourier Transform (FFT) is an algorithm to compute discrete transform and its inverse. FFT computation involves addition, multiplications and subtraction. Multipliers are slow performing units. Multiplier has the disadvantages of large area, long propagation delay and consumes power. Low power multiplier design has an important role in the design of low power VLSI systems. As twiddle Factor contains a floating point complex number, the FFT needs a Floatingpoint complex multiplier for computations. In some biomedical applications, apart from power, area and speed, the importance has given to the accuracy. The main aim of our study is to improve the accuracy of the multipliers used by using a precision floating point complex multiplier in the unit. For FFT analysis, theverilog code should framed for the various combinations given in the methodology and the ASIC design Flow(Front end and Backend simulation) is carried out using Cadence Verilog ASIC Simulator using 90nm technology to find out power, area and delay.
Keywords DoublePrecision, twiddle factor, IEEE754 Format, katarsubha urdhva tiryagbhyam algorithm, MixedRadix, Floating point

INTRODUCTION
The Discrete Fourier Transform (DFT) decomposes a sequence of values into components of different frequencies (Time domain to Frequency domain). The Fast Fourier transform (FFT) is the efficient algorithm to compute the DFT and its inverse. The FFT was proposed by Cooley and tukey to efficiently reduce the time complexity to O(Nlog2N), where N denotes the FFT size. The hardware implementation of FFT classified into memorybased architecture and pipeline architecture.The Memory based architecture design composes of main processing element (PE) and several memory units. It consumes less power and less hardware cost comparing to other styles. The main disadvantage is long latency, low throughput and cant be parallelized. The Pipeline architecture has get rid of the disadvantages of the memory based style at the cost of an acceptable hardware overhead. The types of the pipeline architecture are single path delay feedback (SDF), single path delay commutator (SDC) and multiple delay commutator (MDC). In addition, pipeline structure is highly regular, which can be easily scaled and
parameterized when Hardware Description Language (HDL) is used in the design.

OBJECTIVE OF THE STUDY
The main aim of the study is to implement an efficient Floating point complex multiplier in FFT for biomedical applications. The tool used for implementation is Cadence Verilog ASIC simulator with 90nm technology. As the twiddle factor has a complex value, the complex addition and complex multiplication is used. The complex multiplication method is varied and used in twiddle factor multiplication of N point FFT. Then the comparison of power, area and speed has to make for the implemented design of an N point FFT. In addition to power area and speed, Accuracy is improved for the Multiplier by increasing the significant figures for the twiddle factor..

LITERATURE SURVEY
Booth encoded Wallace tree multiplier uses, booth encoding to increase the speed of algebra by reducing the number of partial products and Wallace tree for decreasing the number of levels of additions [1]. The three term fused dot product unit implemented in the paper achieved a reduction in area compared to conventional multiplier [2].Golub method can reduce the complexity with a slight increase in latency [2]. To obtain a high throughput, Mixed radix and delay Feedback style is used for implementing the 128 point FFT Processor [3].CORDIC is an efficient and economic approach to compute trigonometric functions using addition, subtraction and shifting operations [3]. The Number of multipliers in complex multipliers has reduced from 4 to 3 and the design using this multiplier proved to be superior to the existing system with complex multipliers of 4 relating to area and time consumption [4]. The Urdhva tiryagbhyam method used in the paper [5] can reduce the time delay by adding the partial products concurrently with the multiplication operations.The Multifunctional floating point multiplier design proposed in the paper [6] has a advantage of Accuracy at the cost of area. The pipelined architecture has advantage of high data throughput, relatively small area and a relatively simple control [7]. As Accuracy requirement decreases, the width of the multiplier decreases [8]. The multiplication of lower bit length mantissa consumes less amount of power [8]. Karatsubha Algorithm is best suited for higher bit length [8].

PROPOSED METHODOLOGY

Complex Multiplication method

Conventional Complex Multiplication
The two complex value a+jb and c+jd can be multiplied as follows.
(a+jb) X (c+jd) =(acbd) + j(bc+ad) (1)
Then, Conventional complex multiplier needs 4 multiplications and 1 subtraction and 1 addition.
Figure 1. Conventional Complex Multiplication

Golub Complex Multiplication
The two complex value a+jb and c+jd can be multiplied as follows.
(a+jb) X (c+jd) =(acbd) + j(bc+ad) (2)
The complex multiplication can be reduced as follows Real =acbd ad+ad
=a(cd)+d(ab) (3)
Imaginary=ad+bcbd+bd
=d(ab)+b(c+d) (4)
Then, Golub complex multiplier needs 3 multiplication and 2 subtractions and 3 additions.
Figure 2: Golub Complex Multiplication

Three term fused dot product unit:

Figure 3. Fused ADDSUBTRACT unit
Figure 4. Fused DOTPRODUCT unit
Figure 5. Three term fused dot product unit
Fig (3) shows the add and subtract unit using MUX reducing the hardware complexity.
Fig (4) shows the multiplication, add and subtract unit using MUX reducing the hardware complexity.
Fig (5) shows the three multiplications fused with add and subtract unit using MUX reducing the hardware complexity.
The Three terms fused dot product unit is used for conventional complex multiplications. If fused unit is used for golub complex multiplication, the complexity of architecture will become complex.

Multi precision floating point Multiplier
Floating point multiplication is one of the crucial operations in many application domains such as image processing, signal processing etc. Every application requires different working features like Precision, low power consumption, low latency, low area etc.
The proposed work based on the multi precision floating point complex multiplier involving six modes of operations.
Mode1: Auto mode
Mode2: doubleprecision
floating
multiplier
4.2.3 Mantissa Multiplications:
having The mantissa multiplication is the most important and
mantissa size of 8 bit
complex part of the floating point multiplication.
Mode3: doubleprecision
floating
multiplier
Multiplication operation requires more time compared to
mantissa size of 16 bit/p>
increase in delay and power. As double precision multiplier
Mode4: doubleprecision
floating
multiplier
having contains 53 bits of mantissa, it requires 53 X 53 multiplier.
For the implementation of 53 X 53 multiplier requires
mantissa size of 23 bit
much time for the operations.
Mode5: doubleprecision
floating
multiplier
having 4.3 Karatsuba urdhva tiryagbhyam algorithm
mantissa size of 23 bit
It is the combination of the karatsubha algorithm and
Mode6: doubleprecision
floating
multiplier
having urdhva tiryagbhyam algorithm.Kartsubha algorithm is used
mantissa size of 52 bit
for the operands of higher bit lengths, But at lower bit
having
addition. As the no of bits increases, there will be an
The mode with less number of mantissa bits consumes less amount of power. The mode is chosen according to the accuracy needed for the application. If the application needs low power, mode2/mode3 is chosen. If accuracy is given priority mode4, mode5 and mode6 is used in the increasing order of accuracy.
The double precision floating point IEEE754 format is as follows.
1
11
52
Sign Exponent Mantissa
The multi precision Floating point format for the proposed work is as below
3
1
11
52
Mode Sign Exponent Mantissa
The different mode select bit combinations for different modes is shown below:
Mode
Mode select bits
Mode1
000
Mode2
001
Mode3
010
Mode4
011
Mode5
100
Mode6
101
Table 1. Modes of Multiprecision floating point multiplier
A Floatingpoint number is represented in IEEE754 format as
Â±s x be (or) Â±significant x baseexponent

Sign calculation:
The MSB of floating point number represents the sign bit. The sign of the product will be positive if both the numbers are of same sign and will be negative if numbers are of opposite sign. This logic can be implemented using XOR gate.

Addition of exponents:
The input exponents are added together to get the product exponent. The simple ripple carry adder and ripple borrow subtractor is optimal for exponent addition.
lengths it is not suited as it has higher bit lengths. So urdhva tiryagbhyam algorithm is used for lower bit lengths and provides less area and low power advantages when it is used for the lower bit length binary multiplications as the partial products are added in ripple manner. To increase the efficient, ripple adder can be replaced by the carry select and carry save adders. Kartsubha algorithm uses divide and conquer approach where it breaks down the inputs into Most significant half and least significant half and then multiplication is performed. It reduces the number of multipliers required by replacing multiplication operations by addition operations. The Kartsubha algorithm is optimal if width of the inputs is more than 16 bits.In urdhva tiryagbhyam algorithm speed of the multiplications is increased by reducing the number of steps for multiplication using the concept of vertically and crosswise use in Vedic mathematics.
Katarsubha Algorithm:
X=10^(n/2) * a+ b (5)
Y=10^(n/2) * c+d (6)
X*Y=10^n *ac + 10^(n/2)*(ad+bc) +bd (7) Example:
X= 5678, a=56 b=78 Y=1234, c=12, d=34
Step1: ac=672 Step2: bd=2652
Step3: (a+b)(c+d)=134*46=6164
Step4: ad+bc= step3 value – step2 value – step1 value= 2840
Step5: 10^n *ac 6720000
+ bd 2652
+10^(n/2)*(ad+bc) 284000
ADD three values= 70006652 (which is X*Y)
Urdhva tiryagbhyam algorithm:

Actual meaning is vertically and crosswise

Example:

X=ab; Y=cd (8)

X*Y=ab x cd (9)


a b (multiplicand)
c d(multiplier)
Left:a*b=l ; Middle:(a*d)+(b*c)=m Right:b*d =n
X*Y = lmn

Booth Encoded Wallace Tree multiplier
The Booth Encoded Wallace Tree multiplier is a combination of booth tree multiplier and Wallace tree multiplier. This algorithm shows a better performance in terms of power and area than booth multiplier. The booth encoding is used to increase the speed of algebra by reducing the number of partial products and Wallace tree module for decreasing number of levels of addition.
Example: 21 X 25 = ???
21 = 1 0 1 0 1 1 25 = 1 0 0 1 1 1
1 0 1 0 1 1 21 (Multiplicand)
1 0 0 1 1 1 25 (Multiplier) Booths Recoded Multiplier:
1 0 +1 0 0 1
2 +2 1 (Bit Pair recoding)
24 22 20 (power of 2)
0 0 0 0 0 0 0 1 0 1 0 1 1(21)(20)=+21>A
1 1 1 1 0 1 0 1 1 0 0 0 +2(21)(22)=168>B 0 0 1 0 1 0 1
0 0 0 0 0 2(21)(24)=+672>C
0 0 1 0 0 0 0 0 1 1 0 1
Final Result = +525 (add the values )
Steps involved:

Booths Recoded Multiplier: Take 1 as LSB. Next Lsb: If the values of the multiplier are same, the next LSB is Zero. If the values changes from right to left , then it is +1 and the next change of value will give 1.

Bit Pair recoding: The values are taken from the recoding table for booth recoded multiplier values.

Bit Pair recoding, respective power of 2 values and multiplicand values are multiplied. The sum values for the three combinations A, B and Care added gives the final result.
Booth Pair
Recoded Bit pair
0
0
0
0
+1
+1
0
1
1
+1
0
+2
+1
+1
—
+1
1
+1
1
0
2
1
+1
1
1
1
—
Table 2. Modified booth Recoding


MixedRadix Algorithm
Radix FFT Algorithm has less number of complex multiplications compared with radix2 FFT algorithm which is the simplest form of all FFT algorithms. In order to save the number of complex multiplications, Radix8 Algorithm is chosen. Since the 128point FFT is not a power of 8, the MixedRadix FFT Algorithm, including Radix2 and Radix8 FFT algorithm is used.
Hence the hardware can be reduced by using the Mixed Radix Algorithm.

DIT vs DIF
The two main types of FFTs are the Decimation In Time (DIT) and Decimation In Frequency (DIF) varieties. Both calculate two butterfly outputs, X and Y , from two butterfly inputs, A and B, and a complex coefficient W. The DIT approach calculates the outputs using the equations: X = A+BW and Y = ABW, while the DIF approach calculates its outputs using: X = A + B and Y = (A B)W.

Twiddle Factor of a 64 Point FFT
The 64 points FFT design needs (W64)0 to (W64)63
64 complex values as shown in figure 4. The real (I) component is cosine and the imaginary components (Q) is – sine.
W64=ej(2/64) (10)
W64=cos(2 /64)jsn (2/64) (11) Real part = cos(2/64) (12) Imaginary part = sin (2/64) (13)
Figure 4.Twiddle Factor for 64 FFT.
TWIDDLE
FACTOR
REAL
PART
IMAGINARY
PART
0
W64
1
0
W 1
64
0.99518
0.00980
2
W64
0.98070
0.19509
3
W64
0.95694
0.29028
4
W64
0.92387
0.38268
Table 3. Few Samples of Twiddle Factor for 64 point FFT
The twiddle factor is of floating point complex number. In DIT Algorithm the twiddle factor has to multiply with the various outputs of the stages. To implement this, floating point complex multipliers is used. The values for twiddle factor are given in the table 1. The value here is restricted
to 4digits.If the significant figures are increased for the twiddle factors, the Accuracy of the FFT analysis can be increased. The floating point numbers as to be converted into an IEEE 754 format.

FFT in Biomedical Applications
Electroencephalography (EEG) is a mechanism of measuring electrical activity of the brain. Upon studying EEG signals, various health conditions can be monitored and diagnosed ex: brain diseases like Alzheimer, tumors, epilepsy, human behavior etc. These signals are recorded from various positions on scalp through electrodes and conductive media. Diagnostic results are made from the spectral content of EEG signal analysis. The sample must be given as an input to the DITFFT and the outputs is obtained using the Mixed Radix for 128 point FFT.
5. EXPECTED RESULTS
The FFT is implemented using a DIT algorithm. The power, area, delays and Accuracy must be compared for the following combinations.
From all the above Combinations, an efficient combination is applied for the applications for analysis according to the requirement of power, area, speed and Accuracy because always there will be a tradeoff between these parameters.
REFERENCES

Arathi Ajay, Dr.R.Mary Loudre , BITS Pilani, Dubai, UAE VLSI Implementation of an improved multiplier for FFT computation in Biomedical Applications2015 IEEE computer society Annual Symposium on VLSI.

Sangho Yun and Gerald. E. Sobelman, University of Minnesota, Minneapolis, MN 55455 USA, A Low Complexity FloatingPoint Complex Multiplier with a Threeterm DotProduct Unit 2014 IEEE

Namarata sarode, Rajeev Atluri, P.K Dakhole, MixedRadix and CORDIC Algorithm for implementation of FFT2015 IEEE

Dr.Uma B V, Harsha R Kamath, Mohith s, Sreekar V, Shravan Bhagirath, ECE, RVCE, Bangalore,India Area and time optimized Realization of 16 point FFT and IFFT Blocks by using IEEE754 Single precision complex floating point multiplier 2015 ICSCTI, IEEE

Srinivasa Rao, M.Kamaraju,Gudlavalleru Engineering College, AP, India An FPGA implementation of high speed and area efficient Doubleprecision Floating point multiplier using Urdhva tiryagbhyam technique2015 IEEE. Conference on PCCCTSG, Kurnool, AP,India.

De Liu, Mingjiang Wang, Yiwen Wang, Hang Su, Harbin Institute of technology Shenzhen graduate school Shenzhen, china A multi functional Floating Point Multiplier 2015 IEEE.

Guihua Liu1,Quanyuan Feng, Institute of Microelectronics, Southwest Jiaotong University, Southwest University of Science and Technology, ASIC Design of Lowpower Reconfigurable FFT Processor in 2007 IEEE pp 4446.

Arish S, R K Sharma,NIT, Kurushetra, India, Runtime reconfigurable multiprecision floating point multiplier design for high speed, lowpower applicationsIEEE 2015 SPIN.