FFT Processor For High Speed Applications

DOI : 10.17577/IJERTV2IS121104


A Sreeharsha, B. Rambabu, Solomon JV gotham, A Venkata Krishna

Abstract: A high-speed 64-point FFT processor based on FPGA is implemented using a modified Cooley-Tukey algorithm. The architecture uses an 8-point FFT in two stages along with pre-calculated twiddle factors, and employs complex multipliers and bit-parallel multipliers, to achieve an FFT processor that consumes considerably low power and is almost 8 times faster than existing architectures.

Keywords: 64-point FFT processor; FPGA; Cooley-Tukey algorithm


The fast Fourier transform (FFT) is a popular algorithm in modern digital signal processing, telecommunications, radar, and reconnaissance applications. Efficient FFT processors are required for orthogonal frequency division multiplexing (OFDM) systems such as IEEE 802.11a/g, WiMAX, and Digital Video Broadcasting. FFT implementations on FPGA have used distributed arithmetic, complex multipliers, the CORDIC algorithm, and global pipeline architectures. These techniques do not support high-speed applications.

The most commonly used FPGA-based FFT processors use a single-butterfly architecture. Direct DFT computation has a time complexity of $N^2$; the proposed modified Cooley-Tukey algorithm reduces this to $N \log_2 N$, where $N$ is the FFT size.
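As a rough worked example for the 64-point case considered in this paper, the two complexity expressions can be evaluated directly. Treating them as operation counts is an assumption made here for illustration; it ignores constant factors.

```latex
\begin{align*}
N &= 64, \\
N^2 &= 64^2 = 4096, \\
N \log_2 N &= 64 \times 6 = 384, \\
\frac{N^2}{N \log_2 N} &= \frac{4096}{384} \approx 10.7 .
\end{align*}
```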

There are various ways to implement the FFT, which can be classified mainly into memory-based and pipeline architecture styles. The memory-based architecture is widely known as the single processing element (PE) approach; it consists of one main PE, which acts as a single-stage 8-point FFT. The pipelined architecture overcomes the long latency and low throughput of this approach, at the cost of a marginal hardware overhead, by using fewer stages than a direct 64-point FFT. Pipeline FFT processors come in two design types: the single-path delay feedback (SDF) architecture and the multiple-path delay commutator (MDC). SDF uses less memory and reduces the computation involved in multiplications, so the SDF pipeline architecture is employed in this work; it is advantageous for low-power designs. The complex multipliers used in the processor are realized with shift-and-add operations.

To reduce power consumption and chip area, a low-power radix-2 pipeline architecture is proposed for the FFT processor. The radix-2 DIF-FFT described here requires fewer complex multipliers.

Complex multipliers and bit-parallel multipliers are used to apply the pre-calculated twiddle factors.

  1. The discrete Fourier transform (DFT) $X_k$ of an $N$-point discrete-time signal $x_n$ is defined as:

    $X_k = \sum_{n=0}^{N-1} x_n W_N^{nk}, \quad 0 \le k \le N-1$, (1)

    where the twiddle factor $W_N^{nk} = e^{-j 2\pi nk/N}$ is a power of the primitive $N$-th root of unity $W_N = e^{-j 2\pi/N}$. Usually, the FFT analyzes an input signal sequence by using a decimation-in-frequency (DIF) or a decimation-in-time (DIT) decomposition to construct the signal flow graph.

    If $N$ is the product of two factors, $N = N_1 N_2$, the indices $n$ and $k$ are given as

    $n = N_1 n_2 + n_1$, $0 \le n_2 \le N_2 - 1$ and $0 \le n_1 \le N_1 - 1$,
    $k = N_2 k_1 + k_2$, $0 \le k_2 \le N_2 - 1$ and $0 \le k_1 \le N_1 - 1$.

    $W_N^{nk}$ can then be split as

    $W_N^{nk} = e^{-j 2\pi nk/N} = e^{-j 2\pi (k_1 N_2 + k_2)(n_2 N_1 + n_1)/N} = W_{N_2}^{k_2 n_2} W_{N_1}^{k_1 n_1} W_N^{k_2 n_1}$.

    The term $k_1 n_2 N_1 N_2 = k_1 n_2 N$ in the exponent contributes a factor of 1 and therefore drops out.
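The split of the twiddle factor for $N = N_1 N_2$ can be checked numerically. The following is a minimal Python sketch, not part of the paper's design; the function names `twiddle` and `max_factorization_error` are hypothetical, and the choice $N_1 = N_2 = 8$ in the usage line simply mirrors the 64-point case.

```python
import cmath


def twiddle(n_points: int, exponent: int) -> complex:
    """Return W_N^e = exp(-j*2*pi*e/N)."""
    return cmath.exp(-2j * cmath.pi * exponent / n_points)


def max_factorization_error(n1_size: int, n2_size: int) -> float:
    """Largest |W_N^{nk} - W_{N2}^{k2 n2} * W_{N1}^{k1 n1} * W_N^{k2 n1}|
    over every index pair, with n = N1*n2 + n1 and k = N2*k1 + k2."""
    n_points = n1_size * n2_size
    worst = 0.0
    for n1 in range(n1_size):
        for n2 in range(n2_size):
            n = n1_size * n2 + n1
            for k1 in range(n1_size):
                for k2 in range(n2_size):
                    k = n2_size * k1 + k2
                    lhs = twiddle(n_points, n * k)
                    rhs = (twiddle(n2_size, k2 * n2)
                           * twiddle(n1_size, k1 * n1)
                           * twiddle(n_points, k2 * n1))
                    worst = max(worst, abs(lhs - rhs))
    return worst


if __name__ == "__main__":
    # For N1 = N2 = 8 the error should be at floating-point rounding level.
    print(max_factorization_error(8, 8))
```

Only the last factor, $W_N^{k_2 n_1}$, depends on the full size $N$; the other two are twiddle factors of the smaller $N_2$-point and $N_1$-point transforms.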

  2. The PE approach has long latency and low throughput, and cannot be parallelized. These problems can be overcome by a parallel pipelined architecture that implements only a single 8-point FFT. A 64-point FFT can be implemented with eight 8-point FFTs. Hence, a single 8-point FFT is implemented first and reused wherever required, which reduces the design complexity. This design style reduces latency at the cost of a marginal hardware overhead, because an additional unit is required for the pre-calculated twiddle factors, which are stored in LUTs.
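A pre-calculated twiddle-factor LUT of the kind mentioned above could be generated offline along the following lines. This is a sketch under stated assumptions: the paper does not give a word length, so the signed Q1.15 fixed-point format (`FRACTION_BITS = 15`) is an assumption, and `quantize` and `build_twiddle_lut` are hypothetical helper names.

```python
import cmath

# Assumed word format: signed Q1.15. The paper does not state a word length.
FRACTION_BITS = 15


def quantize(value: float, fraction_bits: int = FRACTION_BITS) -> int:
    """Round to a signed fixed-point integer, saturating at the format limits."""
    scale = 1 << fraction_bits
    return max(-scale, min(scale - 1, round(value * scale)))


def build_twiddle_lut(n1_size: int = 8, n2_size: int = 8):
    """Return lut[n1][k2] = (real, imag) of W_N^{n1*k2} as fixed-point
    integers, where N = n1_size * n2_size."""
    n_points = n1_size * n2_size
    lut = []
    for n1 in range(n1_size):
        row = []
        for k2 in range(n2_size):
            w = cmath.exp(-2j * cmath.pi * n1 * k2 / n_points)
            row.append((quantize(w.real), quantize(w.imag)))
        lut.append(row)
    return lut


if __name__ == "__main__":
    table = build_twiddle_lut()
    print(len(table), len(table[0]))  # 8 rows of 8 entries
```

With this format the value 1.0 saturates to the largest positive code, which is one common way of handling the asymmetric two's-complement range.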

    The gate count and power consumption of the proposed design are reduced; in particular, it consumes considerably less active power. The functional simulation is verified using Verilog HDL. The design is also implemented on an FPGA chip, confirming that the architecture works well.



    Fig. 1 Architecture of FFT Processor.

    A DEMUX operation deserializes the input data into 8 parallel channels. A pipelined 8-point FFT is applied to this parallel data. The outputs of the pipelined FFT are then multiplied by 8 complex coefficients from the LUTs. Another pipelined 8-point FFT is applied to the outputs of the 8 complex multipliers. Finally, a MUX operation produces a serial output from the FPGA.
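The data flow described above (DEMUX, 8-point FFT, LUT multiplication, 8-point FFT, MUX) can be modelled in software as follows. This is a behavioural sketch in floating point, not the authors' hardware: `dft8` is a direct 8-point DFT standing in for the pipelined 8-point FFT block, and the function names and the index mapping $n = 8 n_2 + n_1$, $k = 8 k_1 + k_2$ are assumptions taken from the Cooley-Tukey decomposition.

```python
import cmath


def dft8(values):
    """Direct 8-point DFT, standing in for the pipelined 8-point FFT block."""
    return [sum(values[n] * cmath.exp(-2j * cmath.pi * n * k / 8)
                for n in range(8))
            for k in range(8)]


def fft64_two_stage(samples):
    """64-point DFT built from two stages of 8-point DFTs."""
    if len(samples) != 64:
        raise ValueError("expected exactly 64 samples")
    # DEMUX: channel n1 receives x[8*n2 + n1] for n2 = 0..7.
    channels = [samples[n1::8] for n1 in range(8)]
    # First stage: one 8-point DFT per channel, giving stage1[n1][k2].
    stage1 = [dft8(channel) for channel in channels]
    # LUT of inter-stage twiddle factors W_64^{n1*k2}.
    lut = [[cmath.exp(-2j * cmath.pi * n1 * k2 / 64) for k2 in range(8)]
           for n1 in range(8)]
    scaled = [[stage1[n1][k2] * lut[n1][k2] for k2 in range(8)]
              for n1 in range(8)]
    # Second stage: 8-point DFT across the channels for each k2.
    output = [0j] * 64
    for k2 in range(8):
        column = [scaled[n1][k2] for n1 in range(8)]
        result = dft8(column)  # indexed by k1
        for k1 in range(8):
            output[8 * k1 + k2] = result[k1]  # MUX back to serial order
    return output


def direct_dft(samples):
    """Reference O(N^2) DFT used only to check the two-stage result."""
    n_points = len(samples)
    return [sum(samples[n] * cmath.exp(-2j * cmath.pi * n * k / n_points)
                for n in range(n_points))
            for k in range(n_points)]
```

Comparing `fft64_two_stage(x)` with `direct_dft(x)` for any 64-sample list `x` should show agreement to within floating-point rounding.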

    The pipelined architecture is illustrated in the form of a signal flow graph. The basic operation is a 2-point DFT butterfly having the following form.

    $X_{m+1}(p) = X_m(p) + X_m(q)\,W_N^{p}$ (2)

    $X_{m+1}(q) = X_m(p) - X_m(q)\,W_N^{p}$ (3)

    Fig. 2 Signal Flow Graph
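The 2-point butterfly of equations (2) and (3) can be sketched in Python as below, together with an in-place radix-2 FFT that applies it stage by stage. This illustrates the general technique and is not the paper's pipeline: the bit-reversed input ordering and the twiddle indexing (`offset / step`) follow the textbook decimation-in-time convention and are assumptions, as are the function names.

```python
import cmath


def butterfly(upper, lower, twiddle):
    """2-point DFT butterfly: (upper + lower*twiddle, upper - lower*twiddle)."""
    product = lower * twiddle
    return upper + product, upper - product


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def fft_radix2(samples):
    """Iterative radix-2 FFT built entirely from 2-point butterflies."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    # Reorder the input into bit-reversed order before the first stage.
    data = [complex(samples[bit_reverse(i, bits)]) for i in range(n)]
    span = 1  # distance between the two butterfly inputs p and q
    while span < n:
        step = 2 * span
        for start in range(0, n, step):
            for offset in range(span):
                w = cmath.exp(-2j * cmath.pi * offset / step)
                p = start + offset
                q = p + span
                data[p], data[q] = butterfly(data[p], data[q], w)
        span = step
    return data
```

Each pass of the `while` loop corresponds to one column of butterflies in a signal flow graph; an $N$-point transform needs $\log_2 N$ such passes.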

  3. The latency is much lower than that of existing architectures, so the processor operates at higher speeds.

A 64-point FFT processor for low-power, high-speed applications is designed. A single 8-point FFT is reused for the operations, thereby greatly reducing the design and hardware complexity. The design is also time-efficient, as the parallel pipeline structure gives up to 8 times the operating frequency of existing designs. This shows that the design achieves minimal hardware cost and reduced power consumption.



A Sreeharsha received the B.Tech degree in Electronics and Communication Engineering from JNTUH in 2008. She is currently pursuing the M.Tech degree at Kaushik College of Engineering (JNTUK). She has 3 years of teaching experience as an Assistant Professor in engineering colleges affiliated to JNTUK. Her area of interest is VLSI design.

B Rambabu received the M.Tech degree in VLSI and Embedded System. He is currently working as an Assistant Professor in Kaushik college of engineering (JNTUK) and has a teaching experience of 6 years in JNTUK affiliated colleges. His areas of interest include VLSI and Embedded systems.


Professor and HOD, Electronics and communication in Kaushik college of engineering, Visakhapatnam. He has a teaching experience of 22 years in various colleges.


Assistant Professor in GVP College of Engineering for Women, Visakhapatnam. He has 2 years of industry experience in VLSI design and 8 years of teaching experience. His areas of interest include digital and analog IC design and low-power VLSI.
