 Open Access
 Total Downloads : 793
 Authors : Enis Ã‡Erri, Marsida Ibro
 Paper ID : IJERTV4IS020424
 Volume & Issue : Volume 04, Issue 02 (February 2015)
 Published (First Online): 23022015
 ISSN (Online) : 22780181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
FFT Implementation on FPGA using Butterfly Algorithm
Enis Ã‡ERRI1
Aleksander Moisiu University Faculty of Information Technology Durres, Albania
Marsida IBRO2
Aleksander Moisiu University Faculty of Information Technology Durres, Albania
Abstract Today digital signals processing operation requires several multiplication and for the same we need very fast multiplier for a wide range of requirements for hardware and speed. This paper presents a FFT implementation using FPGA for fast and area efficient digital multiplier based on Butterfly algorithm. FFT is an efficient tool in signal processing in the linear system analysis. Complex arithmetic modules like multiplier and powering units are now being extensively are used in design. The parallel pipelined technology is introduced to increase the throughput of the circuit at low frequency. Based on low power technology FFT power saving is achieved. For the purpose of this implementation, we have used the Altera EPF8282ALC844 element and SNDSP54B device platform FPGA.
Keywords DSP, FFT algorithm, Butterfly algorithm, FPGA
INTRODUCTION
In the discrete Fourier transform (DFT), both the input and the output consist of sequences of numbers defined at uniformly spaced point in time and frequency, respectively. Accordingly, the DFT lends itself directly to numerical evaluation on digital computers. Moreover, the computation can be implemented most efficiently using a class of algorithms, called Fast Fourier Transform (FFT) algorithms. The FFT refers to a class of efficient algorithms for computing the DFT. The algorithms are efficient in that they use a greatly reduced number of arithmetic operations as compared with the brute force computation of the DFT. Fourier transform transforms a timedomain function into the frequency domain. Inversely, the Inverse Fourier Transform converse a frequencydomain function into the time domain. For an aperiodic continuous signal, the continuous Fourier Transform is expressed by:
and the continuous inverse Fourier Transform is:
Equations (1) and (2) are known as the continuous Fourier Transform pair for aperiodic signals which related the time and frequency domains. Similarly form an aperiodic digital
sequence, the discretetime Fourier Transform (DTFT) is given by:
and the discretetime inverse Fourier Transform is:
Equations (3) and (4) are known as the discretetime Fourier Transform (DTFT) pair for aperiodic digital sequences which related the time and frequency domains. Although, equation
(3) gives the frequency spectrum of a signal, there are two implementation problems in practice. The first problem is associated with the limits of summation which extend from to , implying the length of the signal must be infinitely long. The second problem is associated with the frequency variable which is continuous implying there is an infinite number of frequency points to be computed [1, 9]. To overcome the first problem, the limit of the summation are reduced, thereby truncating an infinitely long signal to a finite length signal. This is known as windowing because only a portion of the actual discrete signal is available for the transform operation. To overcome the second problem, the number of frequency points to be computed is confined to a finite number and the selected frequency points should be spaced evenly over the range to. Taking these factors into account, equation (3) can be expressed in a new form to give the discrete Fourier Transform (DFT), that is:
where is called the twiddle factor and .
The twiddle has two important properties: symmetry and periodicity.
The smallest transform used in 2point DFT which is known as radix2. It processes a group of two samples. Radix2 is the fastest method for calculating FFT. These are amongst the one of large number of FFT algorithm being developed. Radix2 algorithm are useful if N is a regular power of 2(N=2^p). The term FFT is actually slightly ambiguous
because there are several commonly used FFT algorithms. There are two different radix2 algorithms, they are: decimationintime (DIT) and decimationinfrequency (DIF) algorithms. A butterfly unit block consisting of N/2 butterflies. Each one containing two (N/2)*16bits ROMs to store the sine and cosine of the twiddle factors, four 16*16 multipliers in 2s complement, six 32bits accumulators and two special operators to adequate the data format [2, 4].
1. DIT FFT Algorithm
The decimationintime FFT (DIT FFT) is a process of dividing the Npoint DFT into two (N/2)point DFTs by splitting the input samples into even and odd indexed samples. The two (N/2)point DFTs are then further divided in the same way into (N/2)point DFTs and this decomposition process continues until 2point DFTs are obtained [5]. This approach is demonstrated below. Since
, the following equation can be derived by rearranging equation (5) and then:
where , and are the DFTs of the even and odd indexed samples respectively. Although, equation (6) still needs to be valuated N times for k varying from 0 to N1, each summation only needs to be computed N/2 times for k varying from 0 to because and are repetitive with an internal interval of N/2 (periodicity property). Consequently, the original DFT computation time is reduced by approximately 50%.
Consequently by restricting k to the range 0 to , equation (6) can be rewritten in two parts; one part for the first half of the frequency points and other part for the second half of the frequency points
, that is
The implementation of equation (7) for an 8point DFT can be shown as a butterfly diagram as in Figure 1. Each butterfly takes a pair of inputs and generates a pair of outputs.
Figure 1. Butterfly diagram for 8point DFT with one decimation stage
Applying the same decomposition technique again to divide two N/2points DFTs into four N/4point DFTs by splitting the even and oddindexed sequences into four subsequences, see the stage 2 in Figure 2. Consequently, the same technique is applied to divide four N/4point DFT into eight N/8point DFTs, see stage 3 in Figure 2.
Figure 2. Butterfly diagram for a 8point DIT FFT
Each decomposition stage doubles the number of separate DFTs, but halves the number of points in DFT. In computing an Npoint DFT, this decimation process can be repeated times. The number of computation stages is seen to be 3 since
In the stage 1 are required N additions/substractions and the other two stages require N complex additions/substractions (or 2N additions/substractions) and N/2 complex multiplications (or 2N multiplications and N additions/substractions, since each complex multiplication requires 4 ordinary multiplications and 2 additions/substractions). Since there are stages for an Npoint FFT, the total number of multiplications is and the total number of additions is
. This can be compared with the
multiplications and additions required for the direct implementation of DFT.

DIF FFT Algorithm
In contrast to the DIT FFT which decomposes the DFT by recursively splitting the input samples in the time domain into subsequences, the decimationinfrequency FFT (DIF FFT) decomposes the DFT by recursively splitting the sequence elements in the frequency domain into smaller subsequences [5]. Dividing equation (5) into two N/2point DFTs by splitting the input samples into halves yields:
The implementation of equation (9) for a 8point DFT is shown as butterfly diagram in Figure 3.
Figure 3. Butterfly diagram for 8point DFT with one decimation stage/p>
In contrast to Figure 2, Figure 4 shows that DIF FFT has its input data sequence in natural order and the output sequence in bitreversed order. For a 512point FFT, 512points cosine and sine tables should be built to involve this computation.
Figure 4. Butterfly diagram for 8point DIF FFT

Implementation
To implement the computation of butterfly with C54x instructions we have considered equations (8) and (9). If letting , and , these two equations can be rewritten as:
Figure 5. DIT Radix 2 Butterfly
Since sequences P and Q and twiddle coefficient W are complex, we can divide these butterfly computations into two parts (real and imaginary). In practical computation, data of each part is stored in memory buffer. This is called inplace operation.
At stage 1, there are only two values of twiddle factor (k=0 and k=n/2) required for DFT computations. If k=0, then
since . If
, then .
At stage 2, there are four twiddle required for the butterfly computation , and . Complex computations are required for the butterfly computations after stage 3. For a 1024point DIT FFT 10 stages are needed to achieve the butterfly computations.
In DSP system, imaginary part must be placed by zero for FFT computations. This process is called packing. The unpacking process is the reverse operation of packing. It is used to unpack the FFT outputs to Npoint complex values because the FFT outputs are N/2point complex values. After unpacking, the 1024 complex values are used to compute the power values of FFT outputs [7]. Each power values is computed by the formula , where R is the real part and I is the imaginary part [6, 8].
In this implementation we have used a deterministic 3 kHz signal with lowlevel white noise created by MATLAB. Once the FFT processor is executed, the deterministic data will be loaded and stored in input data buffer 1C00H1FFFH.
Figure 6. Sine and cosine waves
Figure 7. Spectrum of input signal
Figure 8. Power of FFT signal spectrum

CONCLUSIONS
In this paper we have described the basic algorithm for radix
2 FFTs. Modeling and hardware description of FFT approaches such as Butterfly algorithms by VHDL were introduced and the realization of them on Altera EPF8282ALC844 chip was proposed. The power impact parameter of the technique has been observed and compared with the conventional FFT blocks to analyze the performance. This paper presented a new, very high speed FFT architecture based on the radix2 butterfly algorithm. A fully pipelined, processing core of a 1024point FFT has been implemented in FPGA. The results shows a very high operating frequencies and low latencies of the implementation.
REFERENCES

Arman Chahardahcherik, Yousef S. Kavian, Otto Strobel, and Ridha (2011). Implementing FFT Algorithms on FPGA (IJCSNS). International Journal of Computer Science and Network Security, Vol. 11, No.11, November 2011.

Aniket Shukla, Mayuresh Deshmukh (2012), Comparative Study of Various FFT Algorithm Implementation on FPGA, International Journal of Emerging Trends in Signal Processing Vol.1, Issue 1, November 2012.

Chandan.M,S.L.Pinjare, Chandra Mohan Umapthy Chandan M, S.L Pinjare (2012). Optimized FFT Design using Constant Coefficient Multiplier.International Journal of Emerging Technologyand Advanced Engineering, Vol.2, Issue 6, June 2012.

Niladri Mandal, Souragni Ghosh (2012). A Modified Fast FFT Algorithm for OFDM. International Journal of Soft Computing and Engineering (IJSCE), Vol.1, Issue6, January 2012.

K.Sreekanth Yadav, V.Charishma, , Neelima Koppala (2013).Design and simulation of 64 point FFT using Radix 4 algorithm for FPGA Implementation, International Journal of Engineering Trends and Technology Volume 4 Issue2 2013

M. Kannan and S.K. Srivatsa (2007). Low Power Hardware Implementation of High Speed FFT Core. Journal of Computer Science. Vol. 3, Issue 6, 2007.

S. He and M. Torkelson, Design and implementation of a 1024point pipeline fft processor, in Proc. IEEE Custom Integrated Circuits Conf.,
Santa Clara, CA, May 1114 1998, vol. 2, pp. 131134

Senthil Sivakumar M & Banupriya M & Arockia Jayadhas S (2012). Design of Low Power High Performance 16Point 2 Parallel Pipelined FFT Architecture. International Journal of Electronics,`Communication& Instrumentation Engineering Research and Development (IJECIERD). Vol. 2, Issue 3 September 2012.

Sneha N.Kherde, Meghana Hasamnis (2011). Efficient Design and Implementation of FFT. International Journal of Engineering Science & Technology, Vol. 3 Issue Sup, p10, February 2011.