 Open Access
 Authors : A Sreeharsha, B. Rambabu, Solomon Jv Gotham, A Venkata Krishna
 Paper ID : IJERTV2IS121104
 Volume & Issue : Volume 02, Issue 12 (December 2013)
Published (First Online): 28-12-2013
ISSN (Online) : 2278-0181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
FFT Processor For High Speed Applications
A Sreeharsha, B. Rambabu, Solomon JV Gotham, A Venkata Krishna
Abstract: A high-speed 64-point FFT processor based on FPGA is implemented using a modified Cooley-Tukey algorithm. The architecture uses an 8-point FFT in two stages along with precalculated twiddle factors, and employs complex multipliers and bit-parallel multipliers, to achieve an FFT processor that consumes considerably less power and is almost 8 times faster than existing architectures.
Keywords: 64-point FFT processor; FPGA; Cooley-Tukey algorithm
INTRODUCTION
The Fast Fourier Transform (FFT) is a popular algorithm in modern digital signal processing, telecommunications, radar and reconnaissance applications. Efficient FFT processors are required for orthogonal frequency division multiplexing (OFDM) systems such as IEEE 802.11a/g, WiMAX and Digital Video Broadcasting. FFT implementations on FPGA have used distributed arithmetic, complex multipliers, the CORDIC algorithm and global pipeline architectures. These techniques do not support high-speed applications.
The most commonly used FPGA-based FFT processors use a single butterfly architecture. Direct DFT computation has a time complexity of $O(N^2)$; the proposed modified Cooley-Tukey algorithm reduces this to $O(N \log_2 N)$, where N is the FFT size.
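As a rough worked example of the two growth rates for the 64-point case considered here (constant factors are ignored, so these are orders of magnitude rather than exact multiplier counts):

```latex
N = 64:\qquad N^2 = 64^2 = 4096, \qquad N\log_2 N = 64 \times 6 = 384, \qquad \frac{N^2}{N\log_2 N} = \frac{4096}{384} \approx 10.7
```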
There are various methods of implementing the FFT, which can be mainly classified into memory-based and pipeline architecture styles. The memory-based architecture is widely known as the single processing element (PE) approach, which consists of one main PE. This PE acts as a single-stage 8-point FFT. The pipelined architecture can overcome the problems of long latency and low throughput, at the cost of marginal hardware overhead, by reducing the number of stages compared to a 64-point FFT. Pipeline FFT processors come in two design types: one uses a single-path delay feedback (SDF) architecture and the other a multiple-path delay commutator (MDC). SDF uses less memory and reduces the computations involved in multiplications, so the SDF pipeline architecture is employed in this work, which is advantageous for low-power designs. The complex multipliers used in the processor are realized with shift-and-add operations.
In order to improve power consumption and chip area, a low-power radix-2 pipeline architecture is proposed for the FFT processor. The radix-2 DIF FFT described here requires fewer complex multipliers. Complex multipliers and bit-parallel multipliers are used to apply the twiddle factors.

The discrete Fourier transform (DFT) $X_k$ of an N-point discrete-time signal $x_n$ is defined as:
$$X_k = \sum_{n=0}^{N-1} x_n W_N^{nk}, \quad 0 \le k \le N-1, \tag{1}$$
where the twiddle factor $W_N^{nk} = e^{-j2\pi nk/N}$ is a power of the primitive N-th root of unity $W_N = e^{-j2\pi/N}$. Usually, the FFT analyzes an input signal sequence by using a decimation-in-frequency (DIF) or decimation-in-time (DIT) decomposition to construct the signal flow graph.
If N is the product of two factors, $N = N_1 N_2$, the indices n and k are given as
$$n = N_1 n_2 + n_1, \quad 0 \le n_2 \le N_2 - 1, \quad 0 \le n_1 \le N_1 - 1,$$
$$k = N_2 k_1 + k_2, \quad 0 \le k_2 \le N_2 - 1, \quad 0 \le k_1 \le N_1 - 1.$$
$W_N^{nk}$ can be split as
$$W_N^{nk} = e^{-j2\pi nk/N} = e^{-j2\pi (k_1 N_2 + k_2)(n_2 N_1 + n_1)/(N_1 N_2)} = W_{N_2}^{k_2 n_2} \, W_{N_1}^{k_1 n_1} \, W_N^{k_2 n_1}$$
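The split of the twiddle factor under the index mapping n = N1*n2 + n1, k = N2*k1 + k2 can be checked numerically. The following is a minimal sketch in Python, not part of the original design; the helper names `twiddle` and `max_split_error` are hypothetical, and the sign convention $W_N = e^{-j2\pi/N}$ is assumed.

```python
import cmath

def twiddle(N, e):
    """Return W_N^e = exp(-j*2*pi*e/N) (assumed sign convention)."""
    return cmath.exp(-2j * cmath.pi * e / N)

def max_split_error(N1, N2):
    """Largest |W_N^(nk) - W_N2^(k2 n2) * W_N1^(k1 n1) * W_N^(k2 n1)| over all indices."""
    N = N1 * N2
    worst = 0.0
    for n1 in range(N1):
        for n2 in range(N2):
            for k1 in range(N1):
                for k2 in range(N2):
                    n = N1 * n2 + n1          # input index mapping
                    k = N2 * k1 + k2          # output index mapping
                    lhs = twiddle(N, n * k)
                    rhs = (twiddle(N2, k2 * n2)
                           * twiddle(N1, k1 * n1)
                           * twiddle(N, k2 * n1))
                    worst = max(worst, abs(lhs - rhs))
    return worst

# For the 64-point case, N1 = N2 = 8; the error should be at rounding level.
err = max_split_error(8, 8)
```

The term with exponent k1*n2*N1*N2 drops out because it is an integer multiple of N, which is why only three factors remain.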

The PE approach has long latency and low throughput, and cannot be parallelized. This problem can be overcome by a parallel pipelined architecture that implements only a single 8-point FFT design. A 64-point FFT can be built from 8-point FFTs, eight in each of two stages. Hence, a single 8-point FFT is implemented first and reused wherever required, which reduces the design complexity. This design style reduces latency at the cost of a marginal hardware overhead, because an additional unit is required for the precalculated twiddle factors, which are stored in LUTs.
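The two-stage construction described above can be sketched in software as follows. This is an illustrative Python model under stated assumptions, not the authors' Verilog design: the function names `dft` and `fft64_two_stage` are hypothetical, the 8-point block is modelled by a direct DFT rather than a pipelined FFT, and floating-point arithmetic stands in for the fixed-point hardware.

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT; stands in for the reusable 8-point FFT block."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def fft64_two_stage(x, N1=8, N2=8):
    """64-point DFT from two stages of 8-point DFTs and a twiddle LUT."""
    N = N1 * N2
    assert len(x) == N
    # Precalculated twiddle LUT: lut[n1][k2] = W_N^(k2*n1)
    lut = [[cmath.exp(-2j * cmath.pi * k2 * n1 / N) for k2 in range(N2)]
           for n1 in range(N1)]
    # Stage 1: split the serial input into N1 channels (n = N1*n2 + n1)
    # and transform each channel over n2.
    stage1 = [dft(x[n1::N1]) for n1 in range(N1)]      # stage1[n1][k2]
    # Multiply each stage-1 output by its LUT coefficient.
    scaled = [[stage1[n1][k2] * lut[n1][k2] for k2 in range(N2)]
              for n1 in range(N1)]
    # Stage 2: transform over n1 for each k2; output index k = N2*k1 + k2.
    X = [0j] * N
    for k2 in range(N2):
        col = dft([scaled[n1][k2] for n1 in range(N1)])  # col[k1]
        for k1 in range(N1):
            X[N2 * k1 + k2] = col[k1]
    return X
```

Comparing `fft64_two_stage(x)` against `dft(x)` for any 64-sample input is a quick way to confirm that the index mapping and LUT contents agree.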
[Figure 1: block diagram with inputs Re_In and Im_In, a DEMUX, 8-point pipeline FFT blocks, coefficient multipliers, and a MUX driving the real and imaginary outputs.]
Fig. 1 Architecture of FFT Processor.
A DEMUX operation deserializes the input data into 8 parallel channels. A pipeline 8-point FFT is applied to this parallel data. The outputs of the pipeline FFT are then multiplied by 8 complex coefficients from the LUTs. Finally, another pipeline 8-point FFT processor is applied to the outputs of the 8 complex multipliers, and a MUX operation produces a serial output from the FPGA.
The pipelined architecture is illustrated in the form of a signal flow graph. The basic operation is a 2-point DFT butterfly of the following form:
$$X_{m+1}(p) = X_m(p) + X_m(q)\,W_N^{p} \tag{2}$$
$$X_{m+1}(q) = X_m(p) - X_m(q)\,W_N^{p} \tag{3}$$
Fig. 2 Signal Flow Graph

The latency is much lower than in existing architectures, which in turn allows operation at higher speeds. The gate counts and power consumption are reduced. In particular, the proposed design consumes considerably less active power. The functional simulation is verified using Verilog HDL. The design is also implemented on an FPGA chip, confirming that the architecture works well.
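The 2-point butterfly of Eqs. (2) and (3) can be sketched in software as below. This is a minimal Python illustration, not the SDF pipeline hardware: `butterfly` and `fft_radix2` are hypothetical names, and the recursive decimation-in-time structure is an assumption chosen because the butterfly, as written, multiplies by the twiddle factor before the add and subtract.

```python
import cmath

def butterfly(a, b, w):
    """Eqs. (2)-(3): return (a + b*w, a - b*w), sharing one complex multiply."""
    t = b * w
    return a + t, a - t

def fft_radix2(x):
    """Recursive radix-2 FFT built from the butterfly; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    even = fft_radix2(x[0::2])   # transform of even-indexed samples
    odd = fft_radix2(x[1::2])    # transform of odd-indexed samples
    out = [0j] * N
    for p in range(N // 2):
        w = cmath.exp(-2j * cmath.pi * p / N)   # twiddle factor W_N^p
        out[p], out[p + N // 2] = butterfly(even[p], odd[p], w)
    return out
```

For an 8-point input this performs three stages of four butterflies each, which is the structure a signal flow graph of an 8-point radix-2 FFT depicts.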
A 64-point FFT processor for low-power and high-speed applications is designed. A single 8-point FFT is reused for the operations, thereby greatly reducing the design and hardware complexity. The design is also time-efficient, as the parallel pipeline structure gives up to 8 times the operating frequency of existing designs. This shows that the design is achieved with minimal hardware cost and reduced power consumption.
A Sreeharsha received the B.Tech degree in Electronics and Communication Engineering from JNTUH in 2008. She is currently pursuing the M.Tech degree at Kaushik College of Engineering (JNTUK). She has 3 years of teaching experience as an Assistant Professor in engineering colleges affiliated to JNTUK. Her area of interest is VLSI design.
B Rambabu received the M.Tech degree in VLSI and Embedded Systems. He is currently working as an Assistant Professor at Kaushik College of Engineering (JNTUK) and has 6 years of teaching experience in JNTUK-affiliated colleges. His areas of interest include VLSI and embedded systems.
Solomon JV Gotham is Professor and HOD, Electronics and Communication, at Kaushik College of Engineering, Visakhapatnam. He has 22 years of teaching experience in various colleges.
A Venkata Krishna is an Assistant Professor at GVP College of Engineering for Women, Visakhapatnam. He has 2 years of industry experience in VLSI design and 8 years of teaching experience. His areas of interest include digital and analog IC design and low-power VLSI.