 Open Access
 Total Downloads : 448
 Authors : Srivatsava P S V, Sarada V
 Paper ID : IJERTV3IS031752
 Volume & Issue : Volume 03, Issue 03 (March 2014)
 Published (First Online): 28032014
 ISSN (Online) : 22780181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
Reconfigurable MDC Architecture Based FFT Processor
Srivatsava PSV 1, Sarada V2
Department of Electronics and Communication Engineering SRM University,Chennai,India.
Abstract – Fast Fourier Transform (FFT) is the crucial block in MIMOOFDM systems. Multipath Delay Commutator based pipelined architecture is used to for variable length MIMO OFDM systems by putting Ns data streams at the input. By implementing 4 stream data fixing radix4 at each pipelined stage with variable lengths of 16, 32, 64 and 128 respectively. To emphasize those, needs only one butterfly at each pipeline stage. So the input data stream needs to be scheduled properly before sending it to the processor. They can be used for major communication systems like WiMax, LTE etc..By implementing this MDC processor practically with multiple input multiple outputs data throughput can be increased and canbe achieved some major constraints like area and power expected to be decreased.
IndexTermsFast Fourier Transform (FFT), Multiple input multiple output (MIMO), Orthogonal frequency division multiplexing (OFDM), pipeline multipath delaycommutator (MDC).

INTRODUCTION
Discrete Fourier Transform (DFT) is used in many communication and signal processing systems[1]. But DFT is difficult to implement due to its computational complexity. So Fast Fourier Transform (FFT) is used to reduce the hard ware complexity. In the field of communications FFT has got wide range of applications in Orthogonal Frequency Division Multiplexing(OFDM) systems like WiMax or 3GPP long term evolution(LTE) also wireless modems etc.[2][3]. In OFDM systems Inverse Fast Fourier Transform(IFFT) converts modulated signals from frequency domain to time domain where as Fast Fourier Transform(FFT) is an vice versa. Use of multiple inputs multiple outputs for the OFDM devices data throughput can be increased drastically. In order to manage this we need to design a specific processor which can handle the complexity as the number of input paths increases.
In major of the FFT/IFFT processors pipelined type architecture is used in order to achieve good results. One among the pipelined architecture is continuous flow mixed radix algorithm (CFMR). Continuous Flow Mixed Radix
CFMR is used in MIMO systems; the required memory is increased in a trend proportional to 2xNs. Hardware reduction method such as complex multiplier rescheduling.Flexibility in terms of arithmetic complexity, memory sizes and computation cycles when compared with conventional methods.
Disadvantages:
Complexity of the CFMR becomes more as the input length i.e., Nsbecomesmore. Because of this memory area also becomes more.
Apart from in place memory scheduling the other two pipelined architectures that are mostly used are Single path Delay Feedback (SDF) architecture and Multipath Delay Commutator(MDC) architecture. SDF architecture uses feedback path to generate intermediate values in order to get output without delay [1]. The difference between the single path delay feedback (SDF) and the multiplepath delay commutator (MDC) architecture is the approach for data buffering. The multiplepath delay commutator (MDC) pipeline architecture adopts the DC approach [5] for data buffering. By using this approach, intermediate data are directly output to the next stage or coefficient multiplier instead of being written back in the MDC architecture. The input sequence has been break into r (depend on the radixr to be used) parallel data streams flowing forward with correct ordering for the data elements entering the butterfly unit by proper delays. The normal MDC structure radixr butterfly will be in idle state until the rthinput is received to the butterfly. So because of this 1/rth part of memory can be used. To overcome this, fix radix value should be the same as data input stream value. Considering a 4 path data stream so fix the value of butterfly unit as radix4 [6]. At each stage of FFT processor we use only one butterfly unit so that the memory utilization will be 100%. So by this throughput and memory utilization can be increased dramatically.

FFT ARCHITECTURE The Npoint FFT equation can be expressed as
N 1
nk
(CFMR) FFT processor that uses the mixed radix algorithm. This algorithm supports is an continuous flow of FFT inputs irrespective of the length. CFMR FFT uses two Nsample
X [k] FFT {x[n]} x[n]WN
n0
(1)
memories to create a continuous output data path[4]. One of the memories is used to calculate current FFT/IFFT values and the other stores the previously computed results and controls the output flow.
Advantages:
WhereW nk cos(2 nk / N) j sin(2 nk / N)
N
(2)
is the twiddle factor.
This paper mainly is an application of LTE and WiMax which requires four different lengths 16, 32, 64 and 128. For
larger communications systems the length of the input will be much higher of 2048, 1024,512 and 128 respectively in which the last three stages shares the same hardware.
128 point decomposition
4x4x8
512 point decomposition
4x4x4x8
1024 point decomposition
4x4x4x4x4
512 point decomposition
4x4x4x4x8
The decomposition of the above lengths can be given as shown in Table1.
Table.1.Decomposition of four different FFT/IFFT Lengths Based on the decomposition shown in Table.1 the equation
can be expressed as [1].
inputs has been sent to the processor. The radix8 can be designed at the last stage for N is a power of 2 with a flexibility of radix4 and radix2 combination. The block diagram of such an reconfigurable radix4/8 structure is shown in fig.2.
7 3 3 3 3
X [K ] {{{{ X [N ]W n1k1W N1k1}
n50 n40 n30 n20 n10
4 2048
*W n2k 2W N 2k 2 }W n3k 3W N 3k 3}W n4k 4W n5k 4 }W n5k 5
4
(3)
512 4 4
4 4 8
Where K = k1 + 4k2 + 16k3 + 64k4 + 256k5, k1 = 0 3, k2 = 0 3, k3 = 0 3, k4 = 0 3, k5 = 0 7, and N = 512n1 +
128n2 + 32n3 + 8n4 + n5, N1 = 128n2 + 32n3 +8n4 +n5, N2
= 32n3 +8n4 +n5, and N3 = 8n4 +n5.
Each brace includes computations of a radix4 butterfly and a twiddle factor multiplication. For nonpower of4 FFT/IFFT, a radix8 butterfly is placed at the last stage.
Hence, the last stage is configurable for both radix4 and radix8 computation. The proposed radix4/radix8 butterfly for the last stage is shown in Fig. 1.where a radix8 butterfly has the data path indicated by both solid and dashed lines, whereas a radix4 butterfly has the data path indicated by solid lines only. The regularity of the decomposition makes the processor scalable[1].
This means parameterized registertransferlevel source code is highly reusable to extend the number of stages for a large N. By choosing this type of architecture the hardware can be utilized efficiently by sharing of same structure at each stage of the processor.

DELAY COMMUTATORARCHITECTURE Storage elements are the crucial parts in MDC
architecture.For MIMOOFDM systems is inputs are scheduled properly such that memory utilization and also sharing of hardware can be increased from 25% to 100%. This makes processor is much suitable for MIMOOFDM systems [7]. So, go for memory scheduling principle in order to decrease the memory utilization on the architecture before the
Fig. 1. Signal Flow Graph for Proposed System
Fig.2.Reconfigurable Radix4/8 sructure.
Depending on the radix_8 enable signal the operation can be processed either it should work as radix4 or radix8. The value of iLSB is kept constant as high for entire operation.

Basic Radix4 MDC operation
The input sequence has been split into r (depend on the radix r to be used) parallel data streams flowing forward with correct ordering for the data elements entering the butterfly unit by proper delays. As we are using fixed radix4 the inputs are divided into 4 parallel streams [7]. The block diagram of Npoint radix4 DIF MDC pipeline FFT processor is shown in Fig 4. From the Fig.4, the elements between PEs consist of shift registers and a commutator switch. The goal of these elements is to form a proper set of data for the next PE. The R4MDC architecture contains 3log4N3multipliers,
8log4N adders and 2.5N 4 registers. The block diagram of MDC based architecture for MIMOOFDM systems is
showing in fig.3.
Fig.3 Block diagram of MDC based FFT processor for MIMOOFDM systems

Input stream b) Sorted inputs by memory scheduling c) unsorted outputs from processor d) sorted outputs


Input Memory Scheduling
As shown in fig.3 (a) initially the inputs streams are sent parallel which are four streams. All the four streams are of same length. Here the task is to convert the input streams as fig.3 (a) to fig.3 (b) i.e., the data in parallel is to be converted into serial inputs. In order to make this conversion, make use of memory banks or registers to store the sorted inputs and for the processing of the inputs to be scheduled properly.
Fig.4.The two stages of Npoint R4MDC Architecture There are totally 12 memory cells to store the 4 input streams similarly remaining streams first Â¾ parts each of 1/4th part stored in the 9 memory banks for the B, C and D streams respectively [8],[9]. Each stream divided into four sub parts a,b,c and d. Where a,b, and c are stored in 3 memory banksand fourth sub stream is directly fed to the radix4 butterfly unit which lies in the first stage of the architecture.


COMPLETE ARCHITECTURE OPERATION OF
THE MDC STRUCTURE
Thescheduled inputs are stored in the memory cells according to the stream wise. The first stream with scheduled inputs is sent to the stage1 radix4 butterfly unit. The corresponding generated outputs from the butterfly unit are sent to the FIFO registers in different clocks.
The FIFO units are fed to the MDC radix4 switch box structure as shown in fig.3. Later the crisscrossed outputs from the switch box circuit of the stage1 are passed to the stage2 butterfly unit structure via second set of FIFO units [9].The last stage of the MDC structure has reconfigurable radix4/8 structure such that depending on the input value length if the value is powers of 2 then it can be used as 8 if the value is powers of 4 then it can be used as radix4 butterfly structure.

SIMULATION RESULTS AND MODIFICATIONS
The current MDC structure consists of switch box circuits and FIFO units. The architecture can also be implemented using shift registers and multiplexers instead of FIFO and switch box units [11],[12]. The block diagram of such implementation is shown in fig.5. The MDC architecture is
implementedusing ModelsimVerilog. The fig.6. shows the simulation results of 16point MDC structure with the input sequence x(n)={0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3} and the corresponding output.Similarly fig.7.shows the last stage of
MDC structure when radix_8 enable signal is high i.e.,1. This means that the circuit operates in the mode of powers of 2 instead of powers of 4. Example, for an 32 point the above selection has to be done to obtain the results.
Fig.5.Basicdelaycommuataor block diagram
Fig.6. Simulated results of 16point MDC architecture output
Fig.7. Simulated results of butterfly when radix_8 enable is 1

CONCLUSION
For better performance of an MIMOOFDM the above MDC structured based FFT processor design makes the utilization of the memory cells more efficiently and also the utilization of the hardware can be increased. This type of FFT processor can be used in major of the communication systems like LTE, wireless modems. The power and area of this kind of architecture is less when compared with previous work reports that are available from standard architectures.

REFERENCES

KaiJiunYang,ShangHo Tsai, Senior Member, IEEE, and Gene C.H.Chuang, Member,IEEEMDC FFT/IFFT Processor with variable length for MIMOOFDM systems, IEEE transactions on very large scale integration(VLSI)systems, Vol.21.N0.4,April 2013.

VeryHighBitRate Digital Subscriber Line Transceiver 2(VDSL2), ITUTStandard G.993.2, Feb. 2006.

The Wireless LAN Media Access Control (MAC) and Physical Layer(PHY) Specifications, IEEE Standard 802.11, 1999

B. G. Jo and M. H. Sunwoo, New continuousflow mixedradix (CFMR) FFT processor using novel inplace strategy, IEEE Trans.Circuits Syst. I, Reg. Papers, vol. 52, no. 5, pp. 911919, May 2005.

S. He and M. Torkelson, A new approach to pipeline FFT processor, in Proc. IEEE Int.Parallel Process.Symp., Apr. 1996, pp. 766770.

John G. Proakis and DimitrisG.Manolakis Digital Signal Processing Principles, Algorithms and Applications Third Edition.

Chien SungLi Design and Implementation of Fast Fourier Transforms Processor

YuWei Lin and ChenYi Lee, Member, IEEE Design of an FFT/IFFT Processor for MIMO OFDM Systems, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMSI: REGULAR PAPERS, VOL. 54, NO. 4, APRIL 2007.

P. Y. Tsai and C. Y. Lin, A generalized conflictfree memory addressingscheme for continuousflow parallelprocessing FFT processors with rescheduling, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 12, pp. 22902302, Dec. 2011.

Y.W. Lin, H.Y.Liu, and C.Y. Lee, A 1GS/s FFT/IFFT processor for UWB applications, IEEE J. SolidState Circuits, vol. 40, no. 8, pp. 17261735, Aug. 2005.

E. E. Swartzlander, W. K. W. Young, and S. J. Joseph, A radix 4 delay commutator for fast Fourier transform processor implementation, IEEE J. SolidState Circuits, vol. 19, no. 5, pp. 702 709, Oct. 1984.

CheinSung Li and Prof. ShuennShyang Wang, Design and Implementation of VeriableLegth FFT Processors,Tatung University July 2006.