 Open Access
 Authors : Dr. D. Bhattacharya, Anil G L
 Paper ID : IJERTV1IS6344
 Volume & Issue : Volume 01, Issue 06 (August 2012)
Published (First Online): 30-08-2012
ISSN (Online) : 2278-0181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
Low-power and High-Speed 128-Point Pipeline FFT/IFFT Processor for OFDM Applications
Vol. 1 Issue 6, August – 2012
DR. D. BHATTACHARYA1, ANIL G L2

1 Professor, Department of ECE at Vel Tech Technical University, Chennai, India

2 PhD Scholar, Department of ECE at Vel Tech Technical University, Chennai, India
ABSTRACT
This paper presents a low-power, high-speed 128-point pipelined fast Fourier transform (FFT) and inverse fast Fourier transform (IFFT) processor for OFDM. The modified architecture also incorporates a ROM module and supports variable lengths from 128 to 2048 points for OFDM applications such as digital audio broadcasting (DAB), digital video broadcasting-terrestrial (DVB-T), asymmetric digital subscriber loop (ADSL) and very-high-speed digital subscriber loop (VDSL). The 128-point architecture consists of an optimized pipeline implementation based on radix-2 butterfly processing elements. To reduce power consumption and chip area, special current-mode SRAMs replace the shift registers in the delay lines. In low-power operation, with the supply voltage scaled down to 2.3 V, the processor consumes 176 mW while running at 17.8 MHz.
KEYWORDS
Low power, FFT, IFFT, OFDM

INTRODUCTION
The FFT (fast Fourier transform) and its inverse (IFFT) are key components of OFDM (orthogonal frequency division multiplexing) systems. Recently, the demand for long-length, high-speed and low-power FFTs has increased in OFDM applications. There are three main kinds of architecture for implementing an FFT processor. The first is the single-memory architecture, which has one processing element and one main memory and hence occupies a small area. The second is the dual-memory architecture, which has two memories. It achieves a higher throughput than the single-memory architecture because it can store butterfly outputs and read butterfly inputs at the same time. The third is the pipelined architecture, which is the subject of this paper.

The fast Fourier transform plays an important role in many digital signal processing (DSP) systems. Recent advances in semiconductor processing technology have enabled the deployment of dedicated FFT processors in applications such as telecommunications, speech and image processing. In OFDM communication systems in particular, the FFT and IFFT play a central role. The OFDM technique is effective in overcoming adverse channel effects [1, 2] and uses the spectrum efficiently. As a result, it has been widely adopted in wireline and wireless communication standards.
The OFDM technique has been adopted in several standards, such as digital audio broadcasting (DAB) [3], digital video broadcasting-terrestrial (DVB-T) [4], asymmetric digital subscriber line (ADSL) [5] and very-high-speed digital subscriber line (VDSL) [6]. Efficient, low-power VLSI implementation of FFT processors is therefore essential for the successful deployment of these OFDM-based systems. These standards require various FFT sizes, as shown in Table 1. Variable-length FFT hardware is therefore a crucial module in a low-cost solution for these communication systems.
The Cooley-Tukey N-point FFT algorithm requires O(N log N) computations, a huge saving over direct computation of the discrete Fourier transform (DFT), which requires O(N^2). However, hardware implementation of the algorithm is both computationally intensive, in terms of arithmetic operations, and communication-intensive, in terms of data swapping. For real-time FFT processing, O(log N) arithmetic operations are required per sample cycle. High-speed real-time processing can be accomplished in two ways. In the conventional general-purpose digital signal processor (DSP) approach, a single processor carries out the computation. It is driven at a high clock frequency, O(log N) times the data sample frequency. In the application-specific parallel or pipelined processor approach, the required operations are performed at a clock frequency equal to the sample frequency, and this approach usually consumes less power.
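To make the O(N log N) versus O(N^2) contrast concrete, here is a minimal recursive radix-2 decimation-in-time Cooley-Tukey FFT, checked against a direct DFT. It is a software illustration only, not the hardware pipeline described in this paper. It assumes N is a power of two, and the function names `dft`, `fft_radix2` and `complex_mults` are ours.

```python
import cmath

def dft(x):
    """Direct DFT: N^2 complex multiply-accumulates."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * z / n) for k in range(n))
            for z in range(n)]

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor W_N^k
        out[k] = even[k] + t             # butterfly upper output
        out[k + n // 2] = even[k] - t    # lower output uses W_N^(k+N/2) = -W_N^k
    return out

def complex_mults(n):
    """Multiplication counts: direct DFT (N^2) vs radix-2 FFT ((N/2) log2 N)."""
    return n * n, (n // 2) * (n.bit_length() - 1)

print(complex_mults(2048))
```

For N = 2048, `complex_mults` gives 4,194,304 multiplications for the direct DFT versus 11,264 for the radix-2 FFT. Every twiddle factor is counted, including trivial ones.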
In this paper, we aim to implement a low-power variable-length FFT processor. To this end, we adopt several optimization techniques in the circuit design to achieve an area- and power-efficient pipelined FFT processor.
Pipelined FFT/IFFT processor architecture

Radix-2 FFT/IFFT architecture
The radix-2 multipath delay commutator [7] is a pipelined implementation of the radix-2 FFT/IFFT algorithm. A radix-2 multipath delay commutator architecture with N = 8 is shown in Fig. 1. A commutator divides the input sequence into two parallel data streams. The two streams are then scheduled so that the butterfly operation in a processing element (PE) and the twiddle-factor multiplication can be executed. In total, log2 N − 2 multipliers, log2 N radix-2 butterfly units and 3N/2 − 2 delay elements are required. With a proper input buffering scheme, the processing element can work at 100% utilization.

The radix-2 single-path delay feedback architecture, shown in Fig. 2, uses the delay elements more efficiently by sharing the same storage between the butterfly outputs and inputs [8]. A single data stream goes through the multiplier at every stage. This architecture needs the same numbers of processing elements (PEs) and multipliers as the radix-2 multipath delay commutator architecture, but only N − 1 delay elements. Note that the butterfly units and multipliers work at 50% utilization, since they are bypassed half of the time.

Table 1: FFT sizes required by OFDM communication systems

| Communication system | FFT size (points) |
|---|---|
| ADSL | 512 |
| VDSL | 8192, 4096, 2048, 1024, 512 |
| DAB | 2048, 1024, 512, 256 |
| DVB-T | 8192, 2048 |
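The MDC and SDF resource counts quoted in this subsection can be tabulated with two small helpers. This is a sketch that encodes the formulas from the text: log2 N − 2 multipliers and log2 N butterflies for both architectures, 3N/2 − 2 delays for the MDC and N − 1 for the SDF. The per-stage SDF delay lengths N/2, N/4, ..., 1 are our addition; they follow the standard SDF arrangement and sum to the stated N − 1. The helper names are hypothetical.

```python
def r2mdc_resources(n):
    """Radix-2 multipath delay commutator: counts quoted in the text."""
    stages = n.bit_length() - 1          # log2 N for a power-of-two N
    return {"multipliers": stages - 2, "butterflies": stages,
            "delays": 3 * n // 2 - 2}

def r2sdf_resources(n):
    """Radix-2 single-path delay feedback: one feedback delay line per stage."""
    stages = n.bit_length() - 1
    delay_lines = [n >> (s + 1) for s in range(stages)]   # N/2, N/4, ..., 1
    return {"multipliers": stages - 2, "butterflies": stages,
            "delays": sum(delay_lines), "delay_lines": delay_lines}

print(r2mdc_resources(8))
print(r2sdf_resources(8))
```

For the N = 8 example of Fig. 1, the MDC needs 10 delay elements, while the SDF needs 4 + 2 + 1 = 7.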

Radix-2/4/8 FFT/IFFT algorithm and architecture
Fig. 1 Radix-2 multipath delay commutator FFT/IFFT architecture and PE

Fig. 2 Radix-2 single-path delay feedback FFT architecture and PE

The N-point DFT is formulated as

$$X(z) = \sum_{n=0}^{N-1} x(n)\, W_N^{nz}, \qquad z = 0, 1, 2, \ldots, N-1 \qquad (1)$$

where $W_N^{nz} = e^{-j2\pi nz/N}$. The basic concept underlying the radix-2 FFT/IFFT algorithm is the use of the symmetry between the twiddle factors $W_N^{nz}$ and $W_N^{nz+N/2}$, namely $W_N^{nz+N/2} = -W_N^{nz}$.
Exploiting the twiddle-factor symmetry further, the multiplications by the twiddle factors $W_N^{N/8}$, $W_N^{3N/8}$, $W_N^{5N/8}$ and $W_N^{7N/8}$ can be simplified, since their real and imaginary parts have equal magnitude. The complex multiplications by these four twiddle factors can be formulated as

$$(a + jb)\,W_N^{N/8} = -(a + jb)\,W_N^{5N/8} = \frac{\sqrt{2}}{2}\left[(a + b) + j(b - a)\right] \qquad (2)$$

$$(a + jb)\,W_N^{3N/8} = -(a + jb)\,W_N^{7N/8} = \frac{\sqrt{2}}{2}\left[(b - a) - j(a + b)\right] \qquad (3)$$

Note that each of these complex multiplications can be realized by two real multiplications and two additions.

The signal flow graph (SFG) of the radix-2/4/8 FFT/IFFT algorithm is shown in Fig. 3 [9]. Instead of a single radix-8 butterfly, the radix-2/4/8 algorithm implements the radix-8 butterfly using three radix-2 stages. Its SFG is therefore equivalent to that of the radix-2^3 algorithm [10]. A radix-2/4/8 architecture, obtained by modifying the radix-2 single-path delay feedback FFT/IFFT architecture, was proposed in [9]. There are three types of basic processing elements, called PE1, PE2 and PE3, each processing one FFT stage. The architecture is made up of a repeated cascade of PE1, PE2 and PE3 and a general complex multiplier for twiddle-factor multiplication. The number of delay elements needed decreases by half in every stage. The block diagrams of these three types of processing elements are illustrated in Fig. 4.
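Equations (2) and (3) can be checked numerically. The sketch below implements each product with exactly two real additions (a + b and b − a) and two real multiplications by √2/2, and compares the results against ordinary complex multiplication. The function names are ours, and floating point stands in for the chip's fixed-point arithmetic.

```python
import cmath
import math

C = math.sqrt(2) / 2  # common magnitude of Re and Im of W_N^(N/8), W_N^(3N/8), ...

def times_w_n8(a, b):
    """(a + jb) * W_N^(N/8) per Eq. (2); negate the result for W_N^(5N/8)."""
    return complex(C * (a + b), C * (b - a))

def times_w_3n8(a, b):
    """(a + jb) * W_N^(3N/8) per Eq. (3); negate the result for W_N^(7N/8)."""
    return complex(C * (b - a), -C * (a + b))

def w(k, n):
    """Reference twiddle factor W_n^k = exp(-j*2*pi*k/n)."""
    return cmath.exp(-2j * cmath.pi * k / n)

z = complex(3.0, -1.5)
print(times_w_n8(z.real, z.imag), z * w(8, 64))
```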

Proposed variable-length FFT/IFFT processor architecture
At the architecture level, to reduce power consumption and chip area, it is desirable to adopt the FFT algorithm with the least computational complexity and the architecture with the least hardware complexity. Fig. 5 depicts the block diagram of the proposed variable-length FFT processor, which is based on the radix-2/4/8 single-path delay feedback architecture. The proposed processor can perform FFT operations of three different lengths: 2048-point, 1024-point and 128-point. To accommodate different numbers of FFT stages, the first two stages are radix-2 PEs. These have the same structure as the PE3 unit in the radix-2/4/8 architecture. Each of the following three blocks is made up of a set of PE1, PE2 and PE3 and a twiddle-factor multiplier. If a 512-point FFT is executed, the input signals skip the first two stages under the control of multiplexer MUX2. If a 1024-point FFT is performed, only the first stage is bypassed, through MUX1.
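The stage-bypass scheme can be sketched as a small model. This is a hypothetical Python illustration; the stage names and the `active_stages` helper are ours. It encodes the structure described above: two radix-2 stages followed by three PE1/PE2/PE3 blocks. Each bypassed leading stage therefore halves the transform length, and the two multiplexers reach 2048, 1024 and 512 points. The model also lists the single-path delay-feedback memory per stage, which halves from stage to stage and totals N − 1 words.

```python
# Hypothetical stage list for the 2048-point pipeline of Fig. 5: two radix-2
# stages (PE3-type) followed by three blocks of PE1, PE2, PE3. The twiddle-factor
# multiplier after each block is not modeled here.
STAGES = ["R2-1", "R2-2"] + [f"{pe}/block{b}" for b in (1, 2, 3)
                             for pe in ("PE1", "PE2", "PE3")]

def active_stages(n):
    """Return (stage name, feedback delay length) for each stage of an n-point FFT.

    Models only the two bypass paths described in the text:
    MUX1 skips the first radix-2 stage, MUX2 skips the first two.
    """
    if n <= 0 or n & (n - 1):
        raise ValueError("FFT length must be a power of two")
    skip = len(STAGES) - (n.bit_length() - 1)   # number of leading stages bypassed
    if skip not in (0, 1, 2):
        raise ValueError("length not reachable through MUX1/MUX2")
    # In single-path delay feedback the delay halves at every stage: n/2, n/4, ..., 1.
    return [(name, n >> (i + 1)) for i, name in enumerate(STAGES[skip:])]

for n in (2048, 1024, 512):
    stages = active_stages(n)
    print(n, stages[0], "total delay =", sum(d for _, d in stages))
```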

Architecture considerations

Comparison of FFT architectures
Over the years, various FFT architectures have been proposed to provide fast and efficient implementations of the all-important FFT operation. Table 2 lists some computational features of the radix-2/4/8 FFT architecture used in the proposed IC and of several other recent architectures, comparing their computational complexity and memory requirements. It is apparent that the number of non-trivial complex multiplications decreases as the radix gets higher.

Fig. 3 Signal flow graph of the radix-2/4/8 FFT/IFFT algorithm

Fig. 4 Block diagrams of the PE units (PE1, PE2, PE3) in the radix-2/4/8 architecture

Fig. 5 Block diagram of proposed variable-length FFT/IFFT processor
In addition, in bit-parallel operation, higher-radix algorithms also achieve better hardware utilization in the multipliers. As for the adders in the butterfly units, implementing the higher-radix butterfly by concatenating radix-2 butterfly units, as in [10], achieves only 50% adder utilization. Note that in the digit-serial architectures [11, 12], the word length of the data in adders and multipliers is reduced to one digit, and thus fewer full adders are required. On the other hand, to achieve almost 100% utilization in adders and multipliers, these two architectures must restrict the word length of the signals to 4 digits to match the throughput of the radix-4 commutator. Nevertheless, the area of one complex multiplier far exceeds that of one complex adder. Using fewer complex multipliers can therefore save a great deal of silicon.

Table 2: Comparison of several FFT architectures

| Architecture | Arithmetic | Radix | Data flow | Complex adders | Complex adder utilization | Complex multipliers | Complex multiplier utilization | Data memory | Twiddle-factor ROM |
|---|---|---|---|---|---|---|---|---|---|
| Proposed chip | Bit-parallel | radix-2/4/8 | Feedback | 2 log2 N | 50% | log8 N − 1 | 87.5% | N − 1 | 0.25N |
| He & Torkelson [10] | Bit-parallel | radix-4 | Feedback | 4 log4 N | 50% | log4 N − 1 | 75% | N − 1 | N |
| Hui et al. [11] | Digit-serial | radix-4 | Feedforward | 12 log4 N | 100% | 3(log4 N − 1) | 100% | 2.5N | N |
| Chang & Parhi [12] | Digit-serial | radix-4 | Feedforward | 8(log4 N + 1) | 100% | 3(log4 N − 1) | 100% | 1.18N | 0.5N |
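To see how the Table 2 expressions scale, the helper below evaluates them for a given N. This is a sketch; the dictionary keys and the choice of N = 4096 are ours, picked because 4096 is a power of 8, so every logarithm is an integer. Each tuple holds (complex adders, complex multipliers, data memory words, twiddle-factor ROM words).

```python
def table2_counts(n):
    """Evaluate the Table 2 formulas for an n-point FFT.

    Returns (complex adders, complex multipliers, data memory words, ROM words).
    n should be a power of two; powers of 8 keep every count an integer.
    """
    log2n = n.bit_length() - 1
    log4n, log8n = log2n / 2, log2n / 3
    return {
        "Proposed chip": (2 * log2n, log8n - 1, n - 1, 0.25 * n),
        "He & Torkelson [10]": (4 * log4n, log4n - 1, n - 1, n),
        "Hui et al. [11]": (12 * log4n, 3 * (log4n - 1), 2.5 * n, n),
        "Chang & Parhi [12]": (8 * (log4n + 1), 3 * (log4n - 1), 1.18 * n, 0.5 * n),
    }

for name, counts in table2_counts(4096).items():
    print(name, counts)
```

For N = 4096, the proposed architecture needs 3 complex multipliers, against 5 for [10] and 15 each for [11] and [12]. Its twiddle-factor ROM holds 1024 words, compared with 4096 for [10] and [11].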
The feedback FFT architecture needs the least data memory, N − 1 words, whereas the feedforward architectures, as in [11, 12], require more memory elements. The other memory blocks are the look-up-table ROMs that store the twiddle factors. If the number of non-trivial complex multiplications is decreased, fewer twiddle-factor ROMs are needed. The twiddle-factor ROM for the first multiplier stores twiddle factors with a phase spacing of 2π/N; in later stages, the phase spacing increases. Exploiting the symmetry of the sine/cosine functions further reduces the ROM size. In the proposed chip, the twiddle-factor ROMs store only one-eighth of a cycle of the sine/cosine waveforms. They exploit the symmetry of all the twiddle factors, rather than only the redundancy within each group of $W^n$, $W^{2n}$ and $W^{3n}$, for n = 0, 1, ..., N/4, of the radix-4 algorithm as in [12]. Consequently, a smaller ROM table is built.
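A one-eighth-cycle table can reproduce every twiddle factor through the quadrant and π/4 symmetries of sine and cosine. The sketch below shows one common way to do this. The address mapping is our assumption, not the chip's documented ROM organization, and floats stand in for the 9-bit ROM words. The table holds N/8 + 1 (cos, sin) pairs.

```python
import math

def build_octant_rom(n):
    """One-eighth-cycle table: (cos, sin) of 2*pi*m/n for m = 0 .. n/8."""
    return [(math.cos(2 * math.pi * m / n), math.sin(2 * math.pi * m / n))
            for m in range(n // 8 + 1)]

def twiddle(k, n, rom):
    """Return W_n^k = exp(-j*2*pi*k/n) using only the octant table (n divisible by 8)."""
    quarter = n // 4
    q, r = divmod(k % n, quarter)   # quadrant index and offset within the quadrant
    if r <= n // 8:
        c, s = rom[r]               # angle already lies in the first octant
    else:
        s, c = rom[quarter - r]     # reflect about pi/4: cos x = sin(pi/2 - x)
    # Rotate by q * pi/2 using the quadrant symmetries of sine and cosine.
    cos_t, sin_t = ((c, s), (-s, c), (-c, -s), (s, -c))[q]
    return complex(cos_t, -sin_t)

rom = build_octant_rom(64)
print(len(rom), twiddle(12, 64, rom))
```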
In summary, the radix-2/4/8 algorithm yields a variable-length FFT processor with the least overall hardware complexity. Its adder and multiplier utilization is lower than that of the other architectures. We nevertheless adopted it because it strikes a balance between hardware complexity and computational efficiency.

Complex multiplier against CORDIC
The CORDIC algorithm has been used for twiddle-factor multiplication in FFT processors owing to its efficiency in vector rotation [13]. In this subsection, we evaluate and compare the performance and complexity of a CORDIC and a complex multiplier for phase rotation. In Table 3, the conventional CORDIC algorithm refers to the radix-2 CORDIC. The radix-2/4 CORDIC refers to the work in [14], which enhances operation speed and reduces the number of micro-rotation stages by 25%. The complex multiplier used in the proposed chip consists of three multiplications and five additions [15]. For a fair comparison, the precision is set to 16 bits in all algorithms. To avoid rounding-error propagation [14, 16], 19 bits are allocated in the data path of the CORDIC-based architectures.
The conventional CORDIC algorithm uses a ROM table that stores the rotation sequences as N/4 16-bit words covering the range [0, π/2]. Each micro-rotation stage requires two 19-bit adders, so the conventional CORDIC architecture needs 2 × 16 × 19 = 608 full adders for 16 micro-rotation stages. The final scaling stage performs an additional constant multiplication by the scaling factor 0.100110110111010 (binary), which needs 2 × 9 × 19 adders. Without pipelining, its critical-path delay is 19 × 16 times the full-adder delay (TFA) in the 16 micro-rotation stages, plus 28 TFA in the scaling stage.
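A behavioral sketch of the conventional radix-2 CORDIC rotation described above follows. It is a floating-point illustration only: it models neither the 19-bit fixed-point datapath nor the radix-2/4 scheme of [14], and the function names are ours. Each loop iteration corresponds to one micro-rotation stage with two add/subtract operations. `cordic_gain` computes the constant applied by the final scaling stage, and `conventional_cordic_adders` reproduces the full-adder counts quoted in the text.

```python
import math

def cordic_gain(iterations=16):
    """Product of 1/sqrt(1 + 2^-2i): the factor applied in the final scaling stage."""
    k = 1.0
    for i in range(iterations):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return k

def cordic_rotate(x, y, theta, iterations=16):
    """Rotate (x, y) by theta, |theta| <= pi/2, using shift-and-add micro-rotations."""
    z = theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0                            # direction of this micro-rotation
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i    # two adders per stage
        z -= d * math.atan(2.0 ** -i)                          # residual angle (arctan table)
    k = cordic_gain(iterations)
    return x * k, y * k                                        # final scaling stage

def conventional_cordic_adders(stages=16, width=19, scale_terms=9):
    """Full adders in the micro-rotation stages and in the scaling stage, per the text."""
    return 2 * stages * width, 2 * scale_terms * width

print(cordic_rotate(1.0, 0.0, math.pi / 3), conventional_cordic_adders())
```

With 16 iterations, the gain is about 0.60725. This is within 2 × 10^-5 of the binary scaling constant quoted above, 0.100110110111010 = 19898/32768 ≈ 0.60724.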
In [14], the ROM table is further reduced to N/8 words with 23 bits per word, owing to the higher radix adopted in the later stages. According to the authors, each stage is based on a similar cell. The cell contains a 4-2 adder/subtractor built from two-level carry-save adders (CSA) and uses a redundant arithmetic representation to improve performance. Two registers buffer the intermediate sum and carry in each stage, and two full adders are connected to perform the 4-2 compression. As a result, a total of 2 × 19 × 17 × 2 full adders are provided in the 17 stages, including additional micro-rotation-repetition stages and 2 scaling stages. Because every stage is pipelined, the critical-path delay is reduced to about 2TFA. The penalty is a large number of pipeline registers, (17 × 19 × 2 × 2). In fact, its CORDIC outputs are still in redundant arithmetic representation. Carry-look-ahead adders transform them back to binary format after the butterfly operation.
In the proposed chip, complex multiplication consists of five real additions and three real multiplications. The real additions are implemented by carry-select adders with a maximum delay of about 8TFA. They use 30 full adders for the first, 16-bit addition and 63 full adders for the last, 33-bit addition. Wallace-tree multipliers are adopted for the three 16 × 17 multiplications, which reduces the critical-path delay to 7TFA. One 16 × 17 Wallace-tree multiplier needs about 280 full adders, and two pipeline stages are inserted, one before and one after the multiplication.
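The text does not spell out how the three multiplications and five additions are arranged. The sketch below shows one standard factorization with exactly that operation count. The formulation and the name `cmul_3m5a` are our choice, and whether it matches the structure in [15] is an assumption. Integer inputs make the check against Python's complex product exact.

```python
def cmul_3m5a(a, b, c, d):
    """(a + jb) * (c + jd) with three real multiplications and five real additions."""
    k1 = c * (a + b)          # addition 1, multiplication 1
    k2 = a * (d - c)          # addition 2, multiplication 2
    k3 = b * (c + d)          # addition 3, multiplication 3
    return k1 - k3, k1 + k2   # additions 4 and 5: (real part, imaginary part)

print(cmul_3m5a(3, -4, 5, 7))   # (3 - 4j) * (5 + 7j)
```

When (c, d) is a twiddle factor, the terms c + d and d − c depend only on the ROM entry. A design could therefore precompute them, but the text does not state whether this chip does.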
The CORDIC algorithm may be too slow without pipelining, whereas Wallace-tree multiplication reduces the critical-path delay of the complex-multiplier approach. We weighed the overall speed-area trade-off, and this FFT processor targets low power consumption rather than high speed. We therefore use the complex multiplier for twiddle-factor multiplication.

Circuit design
To serve as a key component in OFDM communication systems, the variable-length FFT processor must be designed to reduce its power consumption as well as its chip area.

Word-length minimization
In the design of this application-specific variable-length FFT processor, the word lengths of the various signals are minimized according to their respective signal-to-noise ratio (SNR) requirements. To decide the optimal word length, input waveforms with Gaussian noise are fed to a fixed-point implementation of the FFT. The frequency-domain FFT output signals are obtained and the output SNR is computed. Figure 6a shows the output SNR against the FFT input word length under different input SNR conditions. Accordingly, the input word length is set to 9 bits. For the precision of the sine and cosine tables, Fig. 6b shows the output SNR against the twiddle-factor word length when the input signal has an SNR of 30 dB. A word length of 9 bits is thus chosen for the twiddle factors. The word-length minimization then proceeds module by module until the word lengths of all signals in the processor are determined; they are labeled in Fig. 5.

Fig. 6 Output SNR against word length of (a) the FFT processor input and (b) the twiddle factors

4.2 RAM-based delay line
A single-path delay feedback FFT processor needs several long and wide delay lines. Conventionally, delay lines are implemented as shift registers made up of cascaded data registers, as shown in Fig. 7a. At each clock edge, all data move forward in lock-step and approximately half of the registers change state, wasting much power. To save power and chip area, SRAM has been used to replace the shift registers. Since a read and a write must both be performed in one clock cycle, a dual-port memory would intuitively be required. Two single-port SRAMs are adopted in [17], whose authors claim that a single-port memory saves 33% in area over a dual-port memory. Here we use one single-port SRAM, designed manually, as shown in Fig. 7b. The read operation is performed in the first half of the clock cycle and the write operation in the second half. To prevent SRAM data access from becoming a critical path, two registers are inserted, one before the PE and the other after it. Furthermore, a ring counter is used instead of a conventional address decoder, since data to and from the SRAM are accessed sequentially. To further reduce power consumption, true-single-phase-clock (TSPC) flip-flops are used in the ring counters.

Fig. 7 (a) Conventional shift-register-based delay line and (b) proposed SRAM-based delay line

4.3 Current-mode SRAM
The current-mode technique has been used in reading SRAM cell contents. It has been proposed that the current-mode technique can also be applied to the write operation of SRAM to further reduce power consumption [18]. In current-mode read/write operations, the voltage swings of the SRAM bit lines and data lines can be kept very small. This significantly decreases the dynamic power dissipation.
The current-mode SRAM cell used here is based on that proposed in [18]. As depicted in Fig. 8a, it consists of seven transistors, one more than the conventional 6-transistor SRAM cell. An extra transistor, Meq, is inserted to equalize the output voltages of the two inverters before each write operation. A small current difference can therefore be sensed through the access transistors, which are controlled by the word-line enable signal, and then amplified by the inverters. When Meq is off, the cell behaves as a conventional 6T SRAM memory cell.

During write access, a current difference, ΔI, appears on the write data lines wdlp and wdln. The N-type current conveyor (shown in Fig. 8b) is enabled by the signal WY, and the currents are then conveyed to the bit lines blp and bln without attenuation. While WY is enabled, a virtual short circuit exists between the write data lines wdlp and wdln. The voltages at wdlp and wdln are both equal to VDD − (V1 + V2), which can be designed to approach the ground voltage. The voltage swing on the data lines can thus be kept as small as possible. The read operation in this SRAM uses a sense amplifier and a column decoder; the sense amplifier has the same structure as in a conventional SRAM. As in a conventional SRAM, a read access starts by enabling the word line, and a differential current drives the pair of bit lines. This current is steered to the sense amplifier, where the data are sensed and buffered.

Fig. 8 Schematic diagrams of (a) proposed 7T current-mode SRAM memory cell and (b) SRAM write circuitry using N-type current conveyor
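The single-port SRAM delay line of Section 4.2 can be modeled behaviorally as below. This is a hypothetical Python sketch; the class and method names are ours. Each `clock` call stands for one cycle: a one-hot ring counter selects a word, which is read in the first half-cycle and overwritten in the second. A depth-D memory therefore acts as a D-cycle delay while writing only one word per cycle. In a shift register, by contrast, every stage is clocked.

```python
class SramDelayLine:
    """Behavioral model of a depth-cycle delay line built on one single-port memory."""

    def __init__(self, depth, reset_value=0):
        self.mem = [reset_value] * depth
        self.ring = [1] + [0] * (depth - 1)   # one-hot ring counter drives the word lines

    def clock(self, data_in):
        row = self.ring.index(1)              # active word line (no address decoder)
        data_out = self.mem[row]              # first half-cycle: read the oldest sample
        self.mem[row] = data_in               # second half-cycle: write the new sample
        self.ring = self.ring[-1:] + self.ring[:-1]   # pass the token to the next row
        return data_out

line = SramDelayLine(4)
print([line.clock(v) for v in range(1, 11)])   # prints [0, 0, 0, 0, 1, 2, 3, 4, 5, 6]
```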
4.4 Complex multiplication and twiddle-factor ROM
In the proposed FFT processor, the radix-2/4/8 algorithm reduces each complex multiplication by $W_N^{N/8}$, $W_N^{3N/8}$, $W_N^{5N/8}$ and $W_N^{7N/8}$ to two real multiplications by the constant $\sqrt{2}/2$, as shown in (2) and (3). These can be further simplified to shift-and-add operations [9].
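As an illustration of the shift-and-add simplification, the constant √2/2 can be approximated by 181/256 = 0.70703125, since 181 = 128 + 32 + 16 + 4 + 1. This 8-bit fraction is our assumption for the sketch; the text does not give the chip's actual constant or shift pattern. Combined with the sums a + b and b − a of (2), it yields a multiplier-free product by $W_N^{N/8}$.

```python
import math

def times_half_sqrt2(x):
    """Approximate x * sqrt(2)/2 as x * 181/256 using only shifts and adds.

    181 = 128 + 32 + 16 + 4 + 1, so 181/256 = 0.70703125,
    about 7.6e-5 below sqrt(2)/2.
    """
    acc = (x << 7) + (x << 5) + (x << 4) + (x << 2) + x
    return acc >> 8   # arithmetic shift: floor division by 256

print(times_half_sqrt2(200), 200 * math.sqrt(2) / 2)
```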


Experimental results
The whole chip, except for the SRAM modules, was designed using a gate-level hardware description language. The critical path lies in the complex multiplier. The layouts of the SRAM modules, which contain the ring counters, the timing control units and the SRAM cells, were all designed manually. The proposed FFT processor is fabricated in a 0.35 µm CMOS process, and the chip's die photograph is shown in Fig. 10. The multipliers are marked MUT, with their corresponding twiddle-factor ROMs right beside them, and the processing elements are labeled Ux. Because of SRAM circuit overheads, all delay lines longer than 64 words are implemented in SRAM, while shorter ones are realized with registers. A brief summary of the chip is given in Table 3. Some of the ROM values were programmed incorrectly; apart from that error, the rest of the chip operates as designed. The FFT processor can operate at up to 17.8 MHz while dissipating 176 mW at a 2.3 V supply voltage. At a 3.3 V supply, it can operate at up to 45 MHz and consumes 640 mW.

The proposed chip is compared with several FFT processors [9, 17, 20, 21] in terms of FFT size, algorithm, process, supply voltage, power consumption, clock rate, execution time and area. These processors are fabricated in different CMOS technologies and compute different FFT sizes, so a fair comparison is not easy. We therefore adopted three indices and adjusted the numbers by estimation, assuming that all processors perform a 1024-point FFT. The first is the normalized area, a metric from [21], given by

$$\text{Normalized area} = \frac{\text{Area of 1024-point FFT}}{(\text{Technology}/0.35\,\mu\text{m})^{2}}$$

The second measures energy efficiency as the number of FFTs per unit energy:

$$\text{FFT/Energy} = \frac{\text{Technology}}{\text{Power of 1024-point FFT} \times \text{Execution time} \times 10^{6}}$$

The third metric, which considers both energy efficiency and speed, is the energy-time product, given by

$$\text{Energy} \times \text{Time} = \frac{\text{Execution time}}{\text{FFT/Energy}}$$
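A minimal sketch of the normalized-area and energy-time indices, transcribed from the formulas above, follows. The function names and example inputs are hypothetical. FFT/Energy is taken as an input in whatever units the comparison uses.

```python
def normalized_area(area_mm2, technology_um):
    """Normalized area = Area of 1024-point FFT / (Technology / 0.35 um)^2."""
    return area_mm2 / (technology_um / 0.35) ** 2

def energy_time(execution_time, fft_per_energy):
    """Energy x Time = Execution time / (FFT/Energy)."""
    return execution_time / fft_per_energy

# Hypothetical example: a 20 mm^2 design in a 0.175 um process.
print(normalized_area(20.0, 0.175))
```

Halving the feature size quadruples the normalized area, so the hypothetical 20 mm² design in 0.175 µm maps to 80 mm² at 0.35 µm.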
The comparison shows that the proposed chip has the smallest normalized area and the smallest energy-time product. The FFT processor in [21] has the best energy efficiency when operating at 1.1 V. However, its slow execution at that low voltage rules it out for the high-speed applications in Table 1.
Table 3 Chip summary

| Item | Value |
|---|---|
| Process | TSMC 0.35 µm 1P4M |
| Area | 3.9 mm × 5.5 mm |
| Transistor count | 598,078 |
| Maximum frequency | 45 MHz at 3.3 V |
| Power consumption (at highest speed) | 640 mW (at 45 MHz, 3.3 V) |
| Power consumption (at lowest voltage) | 176 mW (at 17.8 MHz, 2.3 V) |
| Package | 68 PGA |
Fig. 9 Block diagram of twiddle-factor ROM
Fig. 10 Die photograph of proposed FFT processor

Conclusions
In this paper, we have reported the design of an FFT/IFFT processor chip suitable for OFDM communication systems such as DAB, DVB-T, ADSL and VDSL. The chip performs complex FFTs/IFFTs of lengths 128/1024/2048. The proposed variable-length FFT processor achieves not only efficient hardware utilization but also low power consumption. Its single-path delay feedback FFT/IFFT architecture requires fewer delay elements, and the radix-2/4/8 FFT algorithm replaces some complex multipliers with shift-and-add operations. Other circuit techniques have also been applied to reduce complexity as well as power consumption. The chip was implemented in a 0.35 µm CMOS process. The measured results show that the chip can operate at up to 45 MHz under a 3.3 V supply voltage, consuming 640 mW. When the supply voltage is scaled down to 2.3 V, the processor consumes only 176 mW when running at 17.8 MHz.

References

[1] Bingham, J.A.C.: Multicarrier modulation for data transmission: an idea whose time has come, IEEE Commun. Mag., 1990, 28, (7), pp. 5-14

[2] Cimini, L.J.: Analysis and simulation of a digital mobile channel using orthogonal frequency division multiplexing, IEEE Trans. Commun., 1985, 33, (7), pp. 665-675

[3] ETSI EN 300 401 (v1.3.2): Radio broadcasting systems; digital audio broadcasting (DAB) to mobile, portable and fixed receivers, Sep. 2000

[4] ETSI EN 300 744 (v1.2.1): Digital video broadcasting (DVB); framing structure, channel coding and modulation for digital terrestrial television, Jul. 1999

[5] T1E1.4/98-007R4: Standards project for interfaces relating to carrier to customer connection of asymmetrical digital subscriber line (ADSL) equipment, Jun. 1998

[6] ETSI TS 101 270-2 (V1.1.1): Transmission and multiplexing (TM); access transmission systems on metallic access cables; very high speed digital subscriber line (VDSL); Part 2: Transceiver specification, Feb. 2001

[7] Rabiner, L.R., and Gold, B.: Theory and application of digital signal processing (Prentice-Hall, Inc., NJ, 1975)

[8] Groginsky, H.L., and Works, G.A.: A pipeline fast Fourier transform, IEEE Trans. Comput., 1970, 19, (11), pp. 1015-1019

[9] Jia, L., Gao, Y., Isoaho, J., and Tenhunen, H.: A new VLSI-oriented FFT algorithm and implementation, Proc. IEEE ASIC Conf., 1998, pp. 337-341

[10] He, S., and Torkelson, M.: Designing pipeline FFT processor for OFDM (de)modulation, Proc. IEEE URSI Int. Symp. Signals, Systems and Electronics, 1998, pp. 257-262

[11] Hui, C.C.W., Ding, T.J., and McCanny, J.V.: A 64-point Fourier transform chip for video motion compensation using phase correlation, IEEE J. Solid-State Circuits, 1996, 31, pp. 1751-1761

[12] Chang, Y.N., and Parhi, K.K.: An efficient pipelined FFT architecture, IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., 2003, 50, (6), pp. 322-325

[13] Hu, Y.H.: CORDIC-based VLSI architecture for digital signal processing, IEEE Signal Process. Mag., 1992, (4), pp. 16-35

[14] Sarmiento, R., Tobajas, F., de Armas, V., Esper-Chain, R., Lopez, J.F., Montiel-Nelson, J.A., and Nunez, A.: A CORDIC processor for FFT computation and its implementation using gallium arsenide technology, IEEE Trans. VLSI Syst., 1998, 6, (1), pp. 18-30

[15] Wenzler, A., and Luder, E.: New structures for complex multipliers and their noise analysis, Proc. IEEE Int. Symp. on Circuits and Systems, May 1995, Vol. 2, pp. 1432-1435

[16] Hu, Y.H.: The quantization effects of the CORDIC algorithm, IEEE Trans. Signal Process., 1992, 40, (4), pp. 834-844

[17] Li, W., and Wanhammar, L.: A pipeline FFT processor, Proc. Workshop Signal Processing Systems Design and Implementation, 1999, pp. 654-662

[18] Wang, J.S., Tseng, W., and Li, H.Y.: Low-power embedded SRAM with the current-mode write technique, IEEE J. Solid-State Circuits, 2000, 35, (1), pp. 119-124

[19] Tan, L.K., and Samueli, H.: A 200-MHz quadrature digital synthesizer/mixer in 0.8-µm CMOS, IEEE J. Solid-State Circuits, 1995, 30, (3), pp. 193-200

[20] Bidet, E., Castelain, D., Joanblanq, C., and Senn, P.: A fast single-chip implementation of 8192 complex point FFT, IEEE J. Solid-State Circuits, 1995, 30, (3), pp. 300-305

[21] Baas, B.M.: A low-power, high-performance, 1024-point FFT processor, IEEE J. Solid-State Circuits, 1999, 34, (3), pp. 380-387
Dr. D. Bhattacharya finished his Master's degree in 2002 at Calcutta University in the field of Electronics & Communication Engineering. He obtained his PhD from the Department of Communication System, Lancaster University, UK, as an international student in 2007. Presently, he is working as Professor in the Department of ECE at Vel Tech Technical University, Chennai. He has more than 5 years of experience in engineering education and 5 years of experience in research. He has worked at almost 5 renowned universities throughout Europe. Currently he is Head of the Telecom Centre of Excellence (TCOE) at Vel Tech University, where he continues his research activities.