FPGA Implementation Of DFT By Systolic Arrays

DOI : 10.17577/IJERTV2IS4827

Download Full-Text PDF Cite this Publication

Text Only Version

FPGA Implementation Of DFT By Systolic Arrays

FPGA Implementation Of DFT By Systolic Arrays

Ramanjaneyulu A, Asst. Prof in Dept. of ECE, VITAE, Deshmuki-508284 Pavan Kumar K, Adhoc Lecturer in Dept. of ECE, JNTU, Hyderabad-500085

Field Programmable Gate Arrays (FPGAs) provide an efficient platform for physical hardware realization of digital signal processing (DSP) circuits. Discrete Fourier Transform (DFT) is a most frequently used block in many communication systems. In this work a recently developed high performance systolic DFT structure is realized in FPGA hardware. Virtex II device (XC2V8000) is used for physical implementation of this structure, which results in utilization of 26% slices. The ISE simulation results of circuit behavior is compared with the MATLAB based simulation outputs for verification of the DFT performance.

  1. Field Programmable Gate Arrays (FPGAs) with their concurrent processing capability are suitable for the implementation of many signal and image processing algorithms for diversified applications. The baseband processing in almost all modern communication system is being carried out in FPGAs. In communication systems, digital signal processing and image processing applications the discrete Fourier transform (DFT) is widely used functional block [1]. In order to convert time domain into frequency domain DFT is used in transmitter side and IDFT is used in receiver side of communication systems. For fast computation and parallel implementation of DFT, several algorithms have been proposed in literature [2], [3]. To achieve high computing speed and low computation complexity cyclic convolution and circular correlation are used. To maintain regularity in processing elements and to implement with less time systolic arrays are used [5].The N length DFT is converted into (N-1/2) point circular convolution. The systolic structures have same processing elements with control tag bits, based on which the addition or Subtraction operation is decided. In [8] a reduced complexity algorithm has been reported for

    This realisation in FPGA paves the way of integration of DFT with other baseband processing in FPGA for communication systems.

    In section 2, the details of algorithm to be implemented in Very High Speed Integrated Circuit Hardware Description Language (HDL) language is presented. Section 3, provides the FPGA implementation of DFT. In section 4, the simulation and synthesis results are discussed. The findings of this study are concluded in section 5.

  2. The objective of this work is to realize the DFT algorithm systolic array proposed in [8] in FPGA and observe the performance. The major governing equations of DFT algorithm are as follows.

    The DFT of a sequence y (n) for n=0, 1, 2, 3. . . N-1 is given as,

    X (k) A(k) jB(k), for 0 k N-1 (1a)

    N 1

    A(k) y(n) cos[4N kn] (2a)


    N 1

    B(k) y(n) sin[4N kn] (2b)


    for N / 2N .When N is even, we can reduce into sum of {N/2+1} terms as

    N /2

    A(k) a(n) cos[4N kn] (2c)


    computation of DFT in which a systolic architecture is also presented. In this work the above systolic array for DFT is implemented and the performance is verified.

    N 1

    B(k) b(n) sin[4N


    kn], for 0 k N-1 (2d)


    a(n) y(n) y(N n)

    b(n) y(n) y(N n),

    for 1 n (N/2)-1



    A(k) A1 (k) A2 (k)



    a(n) y(n) y(N n)

    b(n) y(n) y(N n),

    for 1 n (N/2)-1



    A(k) A1 (k) A2 (k)


    A(N / 2 k) A1 (k) A2 (k), for 0 k N/4 (7a) A(N / 2 k) A((N / 2) k), for 1 k N/4 (7b) A(N k) A(k), for 1 k N/4-1 (7c)

    B(k) B1 (k) B2 (k), for 0 k N 1 (4b)

    When (N/2) is even then N=4M, where M is an integer



    A1 (k) a1 (n) cos[2M kn] (5a)


    B(N / 2 k) B1 (k) B2 (k), for 0 k N/4 (7d)

    B(N / 2 k) B((N / 2) k), for 1 k N/4-1 (7e)

    B(N k) B(k), for 1 k N/4-1 (7f)

    1. A1 (2k) a1 (0) a1 (M ) S(k) (8a)

      M 1

      A2 (k) a2 (n) cos[M


      M 1

      k(2n 1)] (5b)

      A1 (2k 1) a1 (0) a1 (M ) D(k) (8b)

      for 1 k (M-1)/2

      B1 (k) b1 (n) sin[2M kn] (5c)


      M 1

      S ( )

      ( M 1)/ 2


      {signS ( , n)}s(n) cos[M ( , n)] (9a)

      B2 (k) b2 (n) sin[M


      k(2n 1)] (5d)

      D( )

      ( M 1)/ 2


      {signD ( , n)}d (n) cos[M ( , n)] (9b)

      for 1 k M-1

      The equations (5b), (5d) are discrete cosine and discrete sine transforms. The above equations are analysed by

      s(n) a1 (n) a1 (M n) (9c)

      round(2 n/M)

      round(2 n/M)

      d (n) a1 (n) a1 (M n) (9d)

      the methods followed in [6-7].

      signS ( , n) (1)




      A1 (0) a1 (n) (5e)


      ( , n) (1)round((2 1) n/M)





      A2 (0) a2 (n) (5f)

      M ( , n)

      [2 n)M / M , if (2 n)M M / 2



      M [((2 n) ] / M , if (2 n) M / 2

      B1 (0) B2 (0)=0

      a1 (n) a(2n)


      M M

      M ( , n)

      a (n)=a(2n+1)

      [((2 1)n)M / M , if ((2 1)n)M M / 2


      b1 (n)=b(2n)

      b2 (n)=b(2n+1), for 1 n (N/4)-1 (6a)

      a1 (0) y(0)

      a2 (0)=y(1)+y(N-1)

      b1 (0)=0

      b2 (0)=y(1)-y(N-1). (6b)

      The N components of A (k) and B (k) can be computed into {(N/4) +1} components

      M [((2 1)n)M ] / M , if (2 -1)n)M M / 2

      for 1 , (M-1)/2. (9f)

      When M is prime, the sequence S () and D () for 1, (M-1)/2 in (9a) and (9b), can be converted into two (M-1)/2 point circular convolution.

    2. Cooley Turkey algorithm is used for complex twiddle factor multiplication. If the complex twiddle factor is =C + S. Then the computed terms taken as C, C+S, C-S. For n bit complex multiplication the

      twiddle factor value is multiplied by 2n-1. The value is finally divided by 2n-1.

    3. The equations A1 (k), B1 (k), A2(k) and B2(k) converted into circular form. The conversion of A1 (k) into required circular convolution form for M=7 is shown for (9a) and (9b) . Similarly B1 (k) is converted into circular convolution form. The equations (5b) and (5d) are converted into circular convolution form by the methods followed in [6-7].

      S(1) cos(2 ) -cos(3 ) -cos( ) s(1)

      S(2) cos(3 ) -cos( ) cos(2 ) s(2)


      cos( ) cos(2 ) -cos(3 ) s(3)

      The dependence graph consists of 9 processing elements, arranged in 3 rows and columns. If tag =1 then addition operation is performed otherwise subtraction is performed. Each processing element has two multipliers and two adder/subtractor. One



      cos( ) cos(2 ) cos(3 ) d (1)

      cos(3 ) -cos( ) -cos(2 ) d (2)

      multiplier and one adder/subtractor are used to caculate real values and the other for imaginary.

      D(3) cos(2 ) -cos(3 ) cos( ) d (3)

      where = /7 and

      s(1) = a1(1) + a1(6)

      s(2) = a1(2) + a1(5)

      s(3) = a1(3) + a1(4)

      d(1) = a1(1) – a1(6)

      d(2) = a1(2) – a1(5)

      d(3) = a1(3) – a1(6)

    4. Every processing element in systolic structure is identical to each other. Based on tag bit value the addition or subtraction is decided. In systolic arrays the pipelining of the data is observed.

      If tag = 1 Xout = Xin + Yin. Zin Else Xout = Xin – Yin. Zin Yout = Yin, Zout = Zin

  3. A field-programmable gate array (FPGA) is a programmable integrated circuit designed to be configured by the customer or designer after manufacturing. The logic realization in FPGA is performed by describing the functionalities through

    VHDL. In this work DFT is basically divided into 8 sub modules. According to the equations all the modules are designed and are coded in VHDL language. For enabling proper testing in the simulation environment the structure as shown in fig.4 is implemented in FPGA. The data which is coming serially is converted into parallel data. The parallel data is given to respective DFT block and the DFT calculations are performed. The computed parallel DFT parallel data are converted to serial by the parallel to serial converter. Based on the tag bit in systolic structure the addition or the subtraction is used in each processing element. Ripple carry adders are used for less area usage. The targeted device is Virtex II xc2v8000. The design, synthesis and simulation is carried out by ISE simulator.

  4. The input real and imaginary values are taken as x_r and x_i. For every positive edge of clock the input is taken serially. After 28 clocks the input values are taken DFT block and processing is done. The output is serially taken as y_r and y_i for real and imaginary values respectively. In fig 6 the behavioural simulation results are depicted.

    Resources Used % utilization

    No. of Slice Flip Flops



    No. of Slices



    No. of 18*18 Multipliers

    No. of 4 input LUTs

    168 100

    23181 24

    No. of GCLKs 1 6

    Fig. 3. Complex multiplication result

    If (X +j Y) (C +j S) = R +j I then in the multiplication result R and I are expressed as R=Y(C – S) + (X – Y) C and I=X(C + S) (X – Y) C. For complex multiplication the values of C, C+S ,C-S are calculated and stored in memory. The behavioural simulation of twiddle factor multiplication is reported in fig.5.

    Fig. 4. Functional Simulation of 28 point DFT

This work presents FPGA implementation of an efficient systolic DFT structure. To minimize the area ripple carry adders and ripple carry subtractions are used in the design. 28-point DFT is considered for physical implementation in the Virtex II FPGA. The functional simulation results of ISE simulator and MATLAB environment are in well agreement with each other. The systolic DFT processor is of high throughput and low latency owing to its systolic structure.

  1. J. G. Proakis and D. G. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications. Upper Saddle River, NJ: Prentice- Hall, 1996.

  2. P. Duhamel and M. Vetterli, Fast Fourier transforms: A tutorial review and a state of the art, Signal Process., vol. 19, pp. 259299, 1990.

  3. R.N. Bracewell, The Fourier Transform and its Applications, McGraw-Hill, Third Edition, 2000.

  4. Fundamentals of Digital Image Processing. Englewood Cliffs,NJ: Prentice-Hall, 1989.

  5. H. T. Kung, Why systolic architectures,Comput. Mag., vol. 15, pp.3745.

  6. C. Cheng and K. K. Parhi, "A Novel Systolic Array Structure for DCT," IEEE Trans. Circuits and Systems II: Express Briefs, Accepted for future publication, 2005.

  7. D. F. Chiper, M. N. S. Swamy, M. O. Ahmad, and

    T. Stouraitis, "A systolic array architecture for the discrete sine transform," IEEE Trans. Signal Processing, vol. 50, pp. 2347-2354, 2002.

  8. P.K.Meher,Efficient Systolic Implementation of DFT Using aLow-Complexity Convolution-Like Formulation, IEEE Transactions on Circuits and Systemsii: Express Briefs, vol. 53, no. 8, August 2006.

  9. P. K. Meher, A New Convolutional Formulation of the DFT and Efficient Systolic Implementation, Submitted to IEEE International Region 10 Conference TENCON05, Melbourne, Australia- Nov2005.

Leave a Reply