Cordic Based Design for FFT Processor Architecture

Download Full-Text PDF Cite this Publication

Text Only Version

Cordic Based Design for FFT Processor Architecture

Sabna A M

PG Scholar, Dept .of ECE

Muslim Association College of Engineering Trivandrum, India

Mrs. Suma J John Assistant professor, Dept .of ECE

Muslim Association College of Engineering

Abstract In communication field, Fast Fourier Transform is one of the most computationally intensive and power hungry modules. Some of the vital applications of the Fast Fourier Transform includes Signal analysis, Sound filtering, Data compression, Image filtering etc. Design of FFT hardware is a challenging task while balancing design parameters such as speed, power, area, flexibility and scalability. The work in this paper proposes scalable radix2 – 8 point FFT processor architecture. The scalability of FFT processor achieved by using ping pong logic. The scalable FFT processor was designed and implemented using VHDL, simulated using Modelsim. The future work about radix4 FFT processor developed using cordic algorithm technique.

  1. INTRODUCTION

    The rapid development of broadband wireless applications is driving new solutions for high throughput, area efficient, and reliable communications in wireless fading environment. Almost every branch of engineering and science uses Fourier methods. The Fast Fourier Transform is one of the rudimentary operations in the field of digital signal and image processing. The Fast Fourier Transform is simply a fast computationally efficient way to calculate the Discrete Fourier Transform. The FFT algorithm was presented by Cooley and Tukey in with an aim to compute Discrete Fourier Transform with significant reduction in number of computations. In fact reduced computations due to FFT algorithm helped to decrease power consumption, area and increase system throughput.

    The work in this paper proposes a scalable radix2-8 point FFT processor architecture. Major components of FFT processor are Butterfly unit, data memory, twiddle factor memory, Interconnect, Address generation unit, Control unit. In the processor, two butterfly units are used to compute two outputs per clock cycle. Data memory storing data samples includes two sets of RAM called SetA and SetB. Twiddle factors are stored in a ROM. The link which connects butterfly units and address generation unit with data memory is known

    butterfly outputs are stored in SetA .The scalable FFT processor was designed and implemented using VHDL,simulated using ModelSim.

    The modification work propose a hardware efficient algorithm known as CORDIC for the implementation of radix4 FFT processor. The algorithm is already famous for its simplicity in design ,less hardware utilization and low power consumption. So its use will certainly improve the FFT processor performance.

    The paper is structured as follows: the Section II deals with basics of FFT, Section III describes FFT processor architecture in detail, Section IV explains dataow algorithm for FFT processor architecture, Section V explains results .

  2. FAST FOURIER TRANSFORM

The FFT algorithm was rst presented by Cooley and Tukey in with an aim to compute Discrete Fourier Trans- form (DFT) with signicant reduction in number of computations. In fact, reduced computations due to FFT algorithm helped to decrease power consumption, area and increase system throughput. Direct computation of N-point DFT would require N2 N complex additions and N2 complex multiplication operations according to equation given by

1

() = () (2/) (1)

=0

Where k = 0, 1, 2….N 1.

However, the using FFT algorithm total number of additions and multiplications reduces to N*log2(N) and N/2*log2(N) respectively. An N point FFT is given by

as interconnect. Since two memory sets are employed we require two interconnects which are called interconnectA and interconnectB. An address generation unit is required to

1

=0

=0

X(K)=2

2

(

(

(2) /2) +

provide address for data samples and twiddle factors. And a control unit is needed to co-ordinate and synchronize activities of rest of the components. The scalability of FFT processor acheived by using ping pong logic.The ping-pong logic is as follows: In even numbered stages, butterfly inputs are read from SetA and butterfly outputs are stored in SetB. In odd numbered stages, butterfly inputs are read from SetB and

1

1

2=0 (2 + 1)

(2/

Fig 1:Radix-2 DIF FFT butterfly diagram

  1. SCALABLE FFT PROCESSOR ARCHITECTURE

    The FFT processor is a xed point processor which supports N-point complex value radix-2 FFT computation. Data path of the processor is 16-bit wide and the architecture is congurable at design time for required maximum FFT size Nmax. Once the processor is congured for Nmax, at runtime it supports any radix-2 FFT size from 16 to Nmax. The memories (data memory and twiddle factor memory) are suitably chosen to support Nmax-point FFT computation. The FFT processor consists of following major components as shown in Fig. 2

    • Buttery unit

    • Data memory (RAM)

    • Twiddle factor memory (ROM)

    • Interconnect

    • Address generation unit

    • Control unit

    Fig 2: Scalable FFT processor block diagram

    In the processor, two buttery units are used to compute two outputs per clock cycle. Data memory storing data samples includes two sets of RAM called SetA and SetB. Twiddle factors are stored in a ROM. The link which connects buttery units and address generation unit with data memory is known as interconnect. Since, two memory sets are employed we require two interconnects which are called interconnectA and interconnectB. An address generation unit is required to provide address for data samples and twiddle factors. And a control unit is needed to co-ordinate and synchronize activities of rest of the components.

    Overall dataow through the processor is pipelined and follows ping-pong logic. The ping-pong logic is as follows: In even numbered stages, buttery inputs are read from SetA and buttery outputs are stored in SetB. In odd numbered stages, buttery inputs are read from SetB and buttery outputs are stored in SetA.

    1. Butterfly unit

      Fig 3: Butterfly unit

      The buttery unit was designed by J.Takala et al. to support radix-2 DIF buttery operation. We modied and customized it to support Q-14 xed point computation required for our implementation. Buttery unit is shown in Fig. 3 and the dotted lines indicate imaginary data. The inputs and outputs of

      buttery unit are 16-bit xed point values and it has three input and two output ports. Two input ports are shared between QR, QI and PR, PI respectively. The third input port is for twiddle factor which is shared between WR and WI. Two output ports available are shared between YR, YI and XR, XI respectively. The critical path of FFT processor includes a multiplier and an adder which is part of complex multiplier. When buttery unit is in operation, Q and P are read every alternate clock cycle and same is the case with WR and WI.

    2. Data memory (RAM)

      The data memory stores data samples required for FFT computation. It is RAM based memory and consists of two memory sets SetA and SetB. Each memory set contains 4 memory banks. SetA includes RAM0, RAM1, RAM2, RAM3 while SetB includes RAM4, RAM5, RAM6, RAM7. Memory banks are clocked dual port memories enabling access within the processor as well as from external interface. Four memor banks were chosen since they offer concurrent access to four data samples at any given time. The read-write operation latency is one clock cycle. Since, two buttery units operate in parallel concurrent access (read-write) to four data samples is necessary for high throughput. Smallest unit of data stored in memory is a 16-bit word. The real and imaginary parts of complex data are 8-bit values which are packed into a 16-bit value. The lower 8-bits contain imaginary part while upper 8- bits contain real part. Size of a memory bank is congurable at design time and it varies depending on maximum number of FFT points Nmax.

    3. Twiddle factor memory (ROM)

      The twiddle factor memory is a ROM, its a clocked dual port memory which supplies twiddle factors for buttery unit0 and for buttery unit1 .The maximum number of twiddle factors required for an N-point FFT computation is N/2

    4. Interconnect

      Interconnect is the link between buttery unit and data memory. An interconnect receives address from address generation unit and routes it to data memory. The data from buttery units to memory or from memory to buttery units are routed via interconnect. The data or address routed by interconnect are controlled through control signals issued from control unit. Internal architecture of an interconnect consists of multiplexers and registers. The multiplexers act as switches to connect appropriate inputs to outputs wherein the outputs are registered. Two such interconnects are utilized in the processor architecture and are named interconnectA and interconnectB. The interconnectA connects buttery unit0 and buttery unit1 with memory SetA while interconnectB connects buttery unit0 and buttery unit1 with memory SetB.

    5. Address generation unit

      Address generation unit generates addresses in each stage of FFT computation for reading input data samples, twiddle factors and storing output data samples. A novel address gen- eration scheme based on conict free access of operands was developed for scalable FFT processor architecture. The address generation scheme is capable of supporting address generation for an N-point FFT computation. It uses two basic m-bit counters.

    6. Control unit

    Control unit is the master of FFT processor. It generates control signals at proper timing to control and coordinate activities of all the other blocks in the processor.

  2. DATAFLOW ALGORITHM FOR FFT PROCESSOR

    A novel algorithm based on dataow in the FFT processor architecture. The dataow algorithm is presented as follows. In even numbered stages, input data samples to buttery units are read from SetA memory which are routed to buttery units via interconnectA. The twiddle factors are read from ROM and supplied to buttery units. After computation the output data samples are routed via interconnectB before they are stored in SetB memory. In odd numbered stages, input data samples to buttery units are read from SetB memory

    are routed to buttery units via interconnectB. The twiddle factors are read from ROM and supplied to buttery units. After computation the output data samples are routed via interconnectA before they are stored in SetA memory.

  3. RESULTS

    A scalable radix-2 N-point novel FFT processor archi- tecture based on a reasonable balance between performance, power, area, exibility and scalability parameters was pro- posed. The scalable FFT processor was designed, implemented using VHDL, simulated using ModelSim.

    .

    Fig 4:Output of 8 point FFT

    Fig 5 : Output of 8 point FFT

  4. CONCLUSION

A scalable radix-2 N-point novel FFT processor archi- tecture based on a reasonable balance between performance, power, area, exibility and scalability parameters was pro- posed. The scalable FFT processor was designed, implemented using VHDL, and simulated using ModelSim.

The future work involves radix 4 FFT processor using CORDIC Algorithm.

REFERENCES

  1. Mounir Arioua, Said Belkouch, Mohamed Agdad, Moha Mrabet Hassani,VHDLimplementation of an optimized 8-point FFT/IFFT processor in pipeline architecture for OFDM systemsDepartment of Electrical Engineering,ENSAM, Department of Physics,Faculty of Science Semlalia,2010 IEEE

  2. Vasantha Sudheer N, Venu Gopal BFPGA Implementation of 64 Point FFT ProcessorInternational Journal of Innovative Technology and Exploring Engineering (IJITEE)ISSN: 2278-3075, Volume-1, Issue-4,

    September 2012

  3. Asmita HaveliyaDesign and Simulation of 32-Point FFT Using Radix-2 Algorith forFPGA ImplementationDOI 10.1109 /ACCT.2012.43,2012 IEEE

  4. Gijung Yang, Yunho JungDesign of Low-Complexity FFT Processor for MIMO-OFDM Based SDR SystemIEICE Electronics Express, volum4, no.23 pp.750-754,Dec.2007.

  5. Deepak Revanna,Omer Anjum,Manuele Cucchi,Roberto Airoldi,Jari NurmiA Scalable FFT processor Architecture for OFDM Based Communication Ststems,Department of Electronics and communication

Engineering,2013IEEE

[6]. Y.-T. Lin, P.-Y. Tsai, and T.-D. Chiueh, Low-power variable-length fast fourier transform processor, IEEE Proceedings Computers and

Digital Techniques, vol. 152, no. 4, pp. 499 506, july 2005.

Leave a Reply

Your email address will not be published. Required fields are marked *