Twiddle Factor Angle Generation using Modified Cordic for DSP Applications

DOI : 10.17577/IJERTCONV3IS16166

Download Full-Text PDF Cite this Publication

Text Only Version

Twiddle Factor Angle Generation using Modified Cordic for DSP Applications

P. Rengaprabhu

Associate Professor

Dept. of ECE Oxford Engg. College Trichy, TN, India

N. Sivarajan

Student (PG)

Dept. of ECE Oxford Engg. College Trichy, TN, India

B. Keerthana

Assistant Professor Dept. of ECE ACCET Karaikudi, TN, India

G. Seetharaman

Professor

Dept. of ECE Oxford Engg. College Trichy, TN, India

Abstract In this paper an area efficient approach is presented to design and implement a high speed Fast Fourier Transform for DSP applications. The design has been implemented using Modified Coordinate Rotation Digital Computer (CORDIC) algorithm which uses two elementary angles and ROM to store number of microrotation.The proposed design is synthesized and simulated with Quartus II software and implemented on FPGA. Using proposed CORDIC algorithm, enhanced performance is achieved in terms of speed, area and dynamic power constraints.

Key Words: CORDIC, DSP application, FPGA, FFT, Micro rotation, Twiddle factor

  1. INTRODUCTION

    CORDIC (COordinate Rotation Digital Computer) which was introduced in 1959 by Jack E. Volder. It is a Hardware Efficient iterative Algorithm for Circular Rotation and to compute trigonometric functions, hyperbolic functions, multiplications, divisions and data type conversions. Compared to other approaches, CORDIC is utilized where hardware multiplier is unavailable. (eg.microcontroller) where the Delay/Hardware cost is comparable to division or square rooting. Two basic CORDIC modes are: the rotation mode and the vectoring mode. For both modes the CORDIC algorithm can be realized as an iterative sequence of additions/subtractions and shift operations, which are rotations by a fixed rotation angle but with variable rotation direction [1]. Due to this simplicity, this algorithm is well suited for VLSI implementation. In this paper modified CORDIC algorithm is used to generate angle for twiddle factor on FFT computation. For FFT applications is known in advance will be used again and again. In these situations, the angle updating equation is evaluated in advance (off line), and the corresponding set of micro rotation is stored; instead of, in the memory. A particular advantage is that there will be no need to implement the angle updating formula, resulting in a cost saving of hardware. This paper is organized as follows: Section II gives the basics of CORDIC algorithm. Section III

    gives FFT computation and prior work. Section IV explains about the modified CORDIC based algorithm. Section V explains about scaling free CORDIC. Section VI shows the performance of proposed technique and the last section VII concludes the paper followed by references.

  2. CORDIC ALGORITHM

    The basic CORDIC algorithm can be explained as follows:

    Fig.1 CORDIC rotation

    As shown in Fig. 1, the rotation of a vector P0=[x0 y0] through an angle , to obtain a rotated vector Pn= [xn yn] could be performed by the matrix product Pn=RP0, where R is the rotation matrix [3].

    Each iteration, the matrix product is expressed as and the new coordinate is calculated by:

    — (1)

    The rotation matrix will be:

    — (12)

    Equation 1 can be rewritten as

    It allows only iterative rotations so that

    Then equation 3 and 4 becomes

    — (2)

    — (3)

    — (4)

    — (5)

    — (6)

    We can compute A offline for all n iterations. A approaches 1.647, if n goes to infinity. The corresponding architecture for this algorithm is shown below:

    The key ideas used in CORDIC to achieve simplicity are: (i) decomposition of the rotations into a sequence of elementary rotations through predefined angles and (ii) avoidance of scaling. The purpose of scaling [2] is to make the final coordinate has the same m-norm as the initial coordinate after rotation. A main research issue is to reduce the computation overhead due to the scaling operation.

    Depending on the direction of rotation, di =±1, which is determined by

    — (7)

    Here arctan (2-i) values are pre computed (a0, a1, …) and it can be scaled to binary number range, e.g. 2=256.If Zi <0, di

    =-1.otherwise di =+1; to evaluate Zi, simply use the Zi sign bit (MSB).It will be iteratively repeated for n times to get [xnyn].

    By factoring out the cosine term in Eq.2, the rotation matrix R can be rewritten as

    — (8)

    And can be interpreted as a product of a scale-factor K=[(1+tan2 )-1/2] with a pseudo rotation matrix Rc,given by

    — (9)

    The pseudo rotation operation rotates the vector P0 by an angle and changes its magnitude by a factor K= cos , which is equivalent to cos (tan-1(2-i)) to produce a pseudo-

    rotated vector Pn=RcP0.

    The cosine is symmetric:

    — (10)

    Then

    — (11)

    We can compute K offline for all n iterations. It approaches 0.6037, if n goes to infinity. In order to compensate the gain, we have to scale the result with the reciprocal value of the gain:

    Fig.2 Bit parallel Iterative CORDIC Hardware

  3. FAST FOURIER TRANSFORM

    An FFT computation is implemented with a complex multiplier but due to demand of higher point FFTs, the size of ROM in this implementation for the twiddle factors becomes the major concern with larger chip area [4, 5]. The aim of this paper is to design an FFT with modified CORDIC algorithms in order to replace complex multipliers through reduced ROM size.

    The Fourier transform for N-point is expressed by

    — (13)

    Where

    is the twiddle factor. As shown in Eq. 13, the key operation of FFT is Rotation of input x(n) by an angle corresponds to twiddle factor to get an output X(K) .It can be related to the CORDIC algorithm without any complex multiplications where old coordinate is rotated by elementary angle corresponds to target angle to get new coordinate. An FFT processor stores the twiddle factors in memory. CORDIC-based FFT doesnt have twiddle factors but stores the rotation angle in a memory bank. For radix-2, N Point, m- bit FFT, mN/2 bits memory needed to store N/2 angles [6]. But the modified CORDIC FFT design needs to store only number of shifts corresponding to micro rotation which is presented in next section where CORDIC uses only two elementary angles [7].

  4. PROPOSED CORDIC ALGORITHM

    In FFT, cosine and sine of angles have to be found which lies in the range of 0o to 180. The conventional CORDIC algorithms are used to handle rotations only in the range of angles [-99°, 99°]. Moreover, they are serial in nature and require a ROM to store the lookup table and hardware expensive barrel shifters. So we go for a reduced hardware CORDIC. Our objective is to reduce the complexity which is expected to result in considerable savings in FPGA resources (in comparison to conventional CORDIC). This is designed for rotation mode. The key idea in this is representing all the angles in the range of [0o, 180°] using combinations of two elementary angles 450 and 7.125o.

    In CORDIC computation, the basic idea is to decompose the desired angle into sum of predefined elementary angles such that the rotation can be accomplished with simple shift- and-add operations. For twiddle factor, predefined elementary angles are considered as 45 and 7.125[8].Then the number of shifts which are stored in ROM are the rotation values corresponding to angles 22.5o, 45o, 67.5o, 90o, 112.5o,135o, 157.5o, 180o. The set of predefined angles for target angle is represented by m0,m1,m2,m3,m4,m5 if six micro rotations are used where m0,m1,m2,m3,m4,m5 are ntegers which depend

    Scaling

    Fig.3.Optimized rotation with micro rotation

    on the value of target angle [1]. Instead of storing rotation angles we only need to store values of number of shifts. Then the corresponding rotations provide architecture for 16 point FFT butterfly with merely shifters and adder.

    The following table shows micro rotation values corresponding to the above given angles for FFT.

    Other simplifications performed by the Volders algorithm

    [2] is the removal of scaling from the iterative micro rotations leads to a pseudo-rotated vector instead of the desired rotated vector .Since the scale-factor does not depend on the direction of micro rotations, the final scale-factor converges to some value depending on number of micro rotation. Therefore, instead of scaling during each micro rotation, the magnitude of final output could be scaled by K.

    Table.1.Optimized rotation with six micro rotations

    Where n is the number of micro rotation.

    — (15)

    angle

    m0, c0

    m1

    ,c1

    m2

    ,c2

    m3,c3

    m4,c4

    m5

    ,c5

    22.5

    3,1

    3,1

    3,1

    45

    0,1

    67.5

    0,1

    3,1

    3,1

    3,1

    90

    0,1

    0,1

    112.5

    0,1

    0,1

    3,1

    3,1

    3,1

    135

    0,1

    0,1

    0,1

    157.5

    0,1

    0,1

    0,1

    3,1

    3,1

    3,1

    angle

    m0, c0

    m1

    ,c1

    m2

    ,c2

    m3,c3

    m4,c4

    m5

    ,c5

    22.5

    3,1

    3,1

    3,1

    45

    0,1

    67.5

    0,1

    3,1

    3,1

    3,1

    90

    0,1

    0,1

    112.5

    0,1

    0,1

    3,1

    3,1

    3,1

    135

    0,1

    0,1

    0,1

    157.5

    0,1

    0,1

    0,1

    3,1

    3,1

    3,1

  5. SCALING FREE CORDIC

    The details of the scaling free CORDIC algorithm are provided in the reference [9, 10]. The working equation of the scaling free CORDIC is given as

    –(16)

    Each of the elementary rotational stages of the scaling free

    C

    The architecture for modified CORDIC [1] is as shown below in the figure 3.

    ORDIC costs two adders and two shifters more compared to that of the conventional CORDIC. But it is performed in an alternate cycle to increase the speed. Furthermore, since this formulation completely eliminates the requirement of scale factor compensation circuit, the overall hardware complexity of the scaling free CORDIC is less than the conventional one.

    clk

    clock

    4' hF — address[3..0]

    arctan:q10

    q[23..0]

    clk

    dfp1:e3

    8' hFF —

    OPER A[31..0]

    B[31..0]

    asb1:q9

    car RES[31..0]

    d clk

    reset

    dfp2:e4

    q

    OPER A[7..0]

    B[7..0]

    asb:q8

    RES[7..0]

    x1[7..0]

    s z0[7..0]

    x0[7..0]

    y0[7..0]

    i[7..0]

    rst

    16' h0000–

    8' h00–

    s

    a[31..0]

    b[31..0]

    s a[7..0]

    b[7..0]

    mux21:q13

    mux2:q11

    o[31..0]

    o[7..0]

    reset d[31..0]

    clk reset d[7..0]

    dfp:e1

    q[31..0]

    q[7..0]

    rst data_in[7..0]

    i[7..0]

    shifter:q4

    data_out[7..0]

    OPER A[7..0]

    B[7..0]

    asb:q6

    RES[7..0]

    s a[7..0]

    b[7..0]

    mux2:q12

    o[7..0]

    clk reset d[7..0]

    dfp:e2

    q[7..0]

    rst data_in[7..0]

    i[7..0]

    shifter:q5

    data_out[7..0]

    z1[7..0]

    y1[7..0]

    Fig.4.Scaling free CORDIC Fig.6.RTL View for conventional CORDIC

  6. SYNTHESIS AND SIMULATION RESULTS Figure 5, 6 and 7 shows the RTL view for the proposed

    clk rst

    clk reset

    tff1:e3

    q[7..0]

    1' h0 —

    oper[2..1]

    IO_BUF (TRI)

    dfp:q3

    tristate_buffer1:d1

    oper

    asb:q8

    mux2:q2

    clk

    dfp:q7

    enable

    tristate_buffer2:d3

    output_x[7..0]

    oper

    asb:q6

    architecture, conventional CORDIC and scaling free CORDIC

    s x0[7..0]

    s a[7..0]

    mux2:q1

    o[7..0]

    clk

    reset d[7..0]

    q[7..0]

    enable

    input_x[7..0]

    output_x[7..0]

    A[7..0]

    RES[7..0]

    s

    a[7..0]

    b[7..0]

    o[7..0]

    reset

    d[7..0]

    q[7..0]

    input_x[7..0]

    shifter:q5

    A[7..0]

    RES[7..0]

    x2[7..0]

    respectively. Figure 8 shows the simulation result of proposed

    architecture followed by scaling free CORDIC simulation

    b[7..0]

    enable

    tristate_buffer1:d2

    output_x[7..0]

    tristate_buffer2:d4

    enable

    output_x[7..0]

    input_x[7..0]

    B[7..0]

    4'NC —

    rst i[7..0]

    data_in[7..0]

    data_out[7..0]

    B[7..0]

    result.

    Table.2.Result Analysis

    1'NC —

    scalingfree:z1

    clock

    q[3..0]

    input_x[7..0]

    4'NC —

    rst i[7..0]

    shifter:q4

    data_out[7..0]

    addr[4..0]

    address[5..0]

    data_in[7..0]

    Design

    Area

    Speed

    Conventional CORDIC

    192 (LEs)

    170.24MHz

    Scaling free CORDIC

    99 (LEs)

    174.03MHz

    Modified CORDIC (Proposed)

    82 (LEs)

    185.94MHz

    Design

    Area

    Speed

    Conventional CORDIC

    192 (LEs)

    170.24MHz

    Scaling free CORDIC

    99 (LEs)

    174.03MHz

    Modified CORDIC (Proposed)

    82 (LEs)

    185.94MHz

    y2[7..0]

    y0[7..0]

    Fig.7.RTL View for scaling free CORDIC

    s x0[7..0]

    s a[7..0]

    b[7..0]

    mux2:q1

    micro:z1

    o[7..0]

    clk reset d[7..0]

    dfp:q3

    q[7..0]

    4'NC —

    rst

    shifter:q5

    A[7..0]

    B[7..0]

    asb:q6

    RES[7..0]

    x2[7..0]

    clk addr[4..0]

    clock address[4..0]

    q[2..0]

    4'NC —

    rst

    shifter:q4

    A[7..0]

    asb:q8

    RES[7..0]

    s a[7..0]

    b[7..0]

    mux2:q2

    o[7..0]

    clk reset d[7..0]

    dfp:q7

    q[7..0]

    1' p– i[7..0]

    data_in[7..0]

    data_ou[7..0]

    1' p– i[7..0]

    data_in[7..0]

    data_out[7..0]

    B[7..0]

    y2[7..0]

    y0[7..0]

    rst

    Fig.5.RTL View for proposed architecture

    Fig.8 Simulation waveform for proposed architecture

    Fig.9 Simulation waveform for scaling free CORDIC

  7. CONCLUSION

In this paper, a modified CORDIC algorithm based design of twiddle factor angle generation for Fast Fourier Transform in DSP is presented. From the table 2, it is found that the Modified CORDIC algorithm based design shows better results with 2.34% lesser area and 1.09 times faster in FFT calculation than conventional one. Although it is used only for fixed angle, it is suitable for scaling free CORDIC by approximating sin i=2-i and cos i=1-2-i. The proposed design operates at a maximum frequency of 185.94 MHz for micro rotation along with efficient speed and area utilization than the conventional CORDIC which requires separate scaling architecture to provide cost effective solution. With reduced ROM size, dynamic power dissipation is reduced without delay penalties.

REFERENCES

  1. Pramod Kumar Meher, SeniorMember, IEEE, and Sang Yoon Park, Member,CORDIC Designs for fixed angle of rotation IEEE transactions on very large scale integration (VLSI) systems, vol. 21, no. 2, February 2013.

  2. J.E.Volder, The CORDIC trigonometric computing technique, IRE Transactions on Electronic Computers, vol. EC-8, pp. 330334, Sep.1959.

  3. P. K. Meher, J. Valls, T.-B. Juang, K. Sridharan, and K. Maharatna, 50 Years of CORDIC: Algorithms, Architectures, and Applications in IEEE transactions on circuits and systemsI: regular papers, vol. 56, no. 9, September 2009.

  4. J. S. Walther, A unified algorithm for elementary functions, in Proceedings 38th Spring Joint Computer Conference, Atlantic City, New Jersey, 1971, pp. 379385.

  5. Y. H. Hu, CORDIC-based VLSI architectures for digital signal processing, IEEE Signal Processing Magazine, vol. 9, no. 3, pp. 16 35,Jul. 1992.

  6. K. Maharatna, S. Banerjee, E. Grass, M. Krstic, and A. Troya, Modified virtually scaling free adaptive CORDIC rotator algorithm and architecture, IEEE Trans. Circuits Syst. for Video Technol., vol. 15, no.11, pp. 14631474, Nov. 2005.

  7. S.Karthick, P.Priya, S.Valarmathy, CORDIC based fft for signal processing System , International journal of advanced research in electrical, electronics and instrumentation engineering vol. 1, issue 6, December 2012.

  8. A.A.Mushina, T.Y.Siby, B.D.Parameshachari, V.R.Athira, Reduced hardware cordic based 16 point fft processor , International journal of advanced research in computer science, volume 4, no. 10, September- October 2013.

  9. E. Grass, B. Sarker and K. Maharatna, A Dual Mode Synchronous/Asynchronous CORDIC Processor, Proc.8th IEEE International Symposium on Asynchronous Circuits and Systems, pp. 76 83, Manchester, U. K., April 2002.

  10. K Maharatna, A. S. Dhars and Swapna Banerjee, A VLSI Array Architecture for Realization of DFT, DHT, DCT and DST, J. Signal Processing, vol. 81, pp. 1813 1822, 2001.

Leave a Reply