FPGA Implementation of MIMO Based Hybrid QR Decomposition


  1. Backia Lakshmi PG Student

    Department of Electronics and Communication Engineering, Parisutham Institute of Technology and Science, Thanjavur, Tamilnadu, India.

    Mr. D. Sellathambi M.E., Assistant Professor

    Department of Electronics and Communication Engineering, Parisutham Institute of Technology and Science, Thanjavur, Tamilnadu, India.

    Abstract- This paper proposes a hybrid QR decomposition with a pipelined and parallel design that increases throughput and reduces latency. The 4G wireless standards require MIMO systems with large antenna configurations, high mobility, and high data rates. In MIMO systems, the main challenge is to implement a QR decomposition process that efficiently utilizes hardware resources. MIMO technology in LTE-Advanced is used to achieve the highest detection throughput, with data rates of 1 Gbps on the downlink. The proposed QR decomposition method is synthesized on a Xilinx XC6VLX550T-2FF1759. Test results for the FPGA implementation show that the proposed design achieves the lowest latency of 100 ns at 300 MHz and reduces area.

    Keywords- FPGA, Hybrid QR decomposition, K-best detector, LTE-A, MIMO (Multiple Input Multiple Output).


      The emerging 4G wireless technology standards, such as IEEE 802.11n and LTE-Advanced, require MIMO systems with high data rates (up to 1 Gbps), large constellation orders (64-QAM) and large antenna configurations (4×4 and 8×8). A multiple-input multiple-output (MIMO) system uses more than one antenna at both the transmitter and the receiver [3]. The main aim of using multiple antennas is to increase the data rate. LTE-A (Long Term Evolution-Advanced) uses a wide bandwidth, up to 100 MHz of spectrum, to support very high data rates, and it further improves capacity and coverage. LTE-Advanced can use up to 8×8 MIMO, with data rates of 1 Gbps on the downlink and 500 Mbps on the uplink.

      The baseband equivalent model can be described as

      $$ y = Hx + n \qquad (1) $$

      where N is the number of transmit antennas, M is the number of receive antennas, y is the M-dimensional received symbol vector, x is the N-dimensional transmitted symbol vector, n is the M-dimensional noise vector, and H is the M×N channel matrix.

      Channel estimation is required to provide information for further processing of the received signal. The channel estimator block provides the channel matrix H, which contains the complex channel gains between the different transmitter and receiver antennas. Written out element by element, (1) reads

      $$ \begin{bmatrix} y_1 \\ \vdots \\ y_M \end{bmatrix} = \begin{bmatrix} h_{11} & \cdots & h_{1N} \\ \vdots & \ddots & \vdots \\ h_{M1} & \cdots & h_{MN} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_N \end{bmatrix} + \begin{bmatrix} n_1 \\ \vdots \\ n_M \end{bmatrix} \qquad (2) $$

      For the ML detector, detecting x is difficult because of the multiplication of the channel matrix H and x. The problem is formulated as [5]

      $$ \hat{x} = \arg\min_{x} \| y - Hx \|^2 \qquad (3) $$

      We therefore use QR decomposition to simplify the matrix multiplication problem and to reduce the amount of computation needed.

      QR decomposition is required by most MIMO detection schemes to decompose the channel matrix H into a unitary matrix Q and an upper triangular matrix R [9]. The objective of this paper is to develop a hybrid QR decomposition with a pipelined design that allows low-complexity decomposition of large complex matrices, by reducing the number of computations required and by increasing their execution parallelism, in order to improve the throughput.

      $$ H = QR \qquad (4) $$

      For a 3×3 example, with p_{ij} denoting the entries of H,

      $$ H = \begin{bmatrix} p_{11} & p_{12} & p_{13} \\ p_{21} & p_{22} & p_{23} \\ p_{31} & p_{32} & p_{33} \end{bmatrix}, \quad Q = \begin{bmatrix} q_{11} & q_{12} & q_{13} \\ q_{21} & q_{22} & q_{23} \\ q_{31} & q_{32} & q_{33} \end{bmatrix}, \quad R = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ 0 & r_{22} & r_{23} \\ 0 & 0 & r_{33} \end{bmatrix} \qquad (5) $$

      The ML detection method minimizes the average error probability. The ML detector calculates the Euclidean distances (EDs) between the received signal vector and the lattice points Hx, and returns the vector x with the smallest distance, i.e. it minimizes

      $$ \hat{x} = \arg\min_{x} \| y - Hx \|^2 \qquad (6) $$

      Multiplying (1) by Q^H and using H = QR gives the equivalent triangular model

      $$ \tilde{y} = Rx + \tilde{n} \qquad (7) $$

      where \tilde{y} = Q^H y and \tilde{n} = Q^H n.

      The goal of the MIMO ML detection method is to find the closest transmitted vector x based on the observation.
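      The link between the distance minimized by the ML detector and the triangular model (7) can be written out explicitly. The following is a short derivation sketch, assuming a square channel matrix (M = N, as in the 4×4 case) and a unitary Q with H = QR. The symbol \tilde{y} = Q^H y and the row index i are notation chosen here for illustration.

```latex
\| y - Hx \|^2
  = \| Q^H ( y - QRx ) \|^2
  = \| \tilde{y} - Rx \|^2
  = \sum_{i=1}^{N} \Big| \tilde{y}_i - \sum_{j=i}^{N} r_{ij} x_j \Big|^2
```

      The first equality holds because multiplying by a unitary matrix does not change the Euclidean norm. Because R is upper triangular, the i-th term depends only on x_i, ..., x_N, so the distance can be accumulated row by row starting from i = N.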

      The ML detection method is not practical to implement for large constellation sizes (i.e., 64-QAM and larger) because of its exponential complexity. The number of candidate vectors, and with it the computational complexity, increases exponentially with the number of transmit antennas, so the problem cannot be solved by exhaustive search in polynomial time. Therefore, different detection schemes have been proposed as alternatives to the ML detector.
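      The exhaustive search behind ML detection can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a small QPSK alphabet instead of 64-QAM, and the channel matrix, the transmitted vector and the helper names (`matvec`, `ml_detect`) are hypothetical values chosen for the demo.

```python
from itertools import product

# Illustrative QPSK alphabet (the paper targets 64-QAM; QPSK keeps the demo small).
QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]


def matvec(H, x):
    """Multiply an M x N matrix (list of rows) by a length-N vector."""
    return [sum(h * xj for h, xj in zip(row, x)) for row in H]


def ml_detect(H, y, alphabet=QPSK):
    """Exhaustive ML search: try every candidate x and keep the closest Hx."""
    n_tx = len(H[0])
    best_x, best_dist, visited = None, float("inf"), 0
    for cand in product(alphabet, repeat=n_tx):
        visited += 1
        hx = matvec(H, cand)
        # Squared Euclidean distance between y and the lattice point Hx.
        dist = sum(abs(yi - hi) ** 2 for yi, hi in zip(y, hx))
        if dist < best_dist:
            best_x, best_dist = list(cand), dist
    return best_x, best_dist, visited


# Hypothetical 2 x 2 channel and transmitted vector, for illustration only.
H = [[1.0 + 0.5j, 0.3 - 0.2j],
     [0.2 + 0.1j, 0.9 - 0.4j]]
x_true = [1 - 1j, -1 + 1j]
y = matvec(H, x_true)  # noise-free observation

x_hat, dist, visited = ml_detect(H, y)
```

      The search visits every one of the |alphabet|^N candidates (16 in this demo). For 64-QAM with four transmit antennas the same loop would visit 64^4 = 16,777,216 candidates per received vector, which is why a reduced-complexity detector is needed.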

      We propose to use a K-best detector at the LTE-A receiver. The K-best algorithm is a breadth-first tree search that keeps, at each level, the K nodes with the smallest Euclidean distances [2]. It explores the tree from the root to the leaves by expanding each level and selecting the candidates with the lowest partial Euclidean distance (PED) in that level as the surviving nodes. Because R is upper triangular, multiplying R and x is easier than multiplying H and x.

      With the triangular factor R, the detection problem becomes

      $$ \hat{x} = \arg\min_{x} \| \tilde{y} - Rx \|^2 \qquad (8) $$

      where \tilde{y} = Q^H y.


      Householder transformations can annihilate multiple elements simultaneously by reflecting a multi-dimensional input vector onto a plane. However, a straightforward VLSI implementation of the Householder algorithm requires square-root, division and multiplication operations, which leads to very high hardware complexity. To resolve this issue, novel Householder algorithms have been presented that use sequences of simple Householder reflections, which can be implemented using simple arithmetic operations.

      The existing system applies hybrid QR decomposition with a semi-pipelined design to a 4×4 complex matrix, converting H into a unitary matrix Q and an upper triangular matrix R, and detects the signal with an ML (Maximum Likelihood) detector [6]. The hybrid QR decomposition uses only the Householder transformation algorithm and the Givens rotation algorithm, with the Givens rotations performed by the CORDIC algorithm. Householder transformations can be used to transform the input channel matrix to the final upper-triangular matrix by eliminating all of the elements below the diagonal in a column simultaneously. The CORDIC algorithm that performs the Givens rotations can also be used to compute a wide range of functions. Here, the CORDIC computation is used to reduce the matrix to upper triangular form [10].


      One of the objectives of this project is to develop a low-complexity matrix decomposition using hybrid QR decomposition. The proposed scheme should reduce the total number of operations required and should not cause significant degradation in the BER performance compared to ML detection. Another objective of this project is to develop an efficient FPGA implementation that offers high detection throughput and low power consumption. In this project we focus on a 4×4 complex matrix for a 64-QAM K-best detector, with a sustained detection throughput of up to 1 Gbps.


      In the proposed system, we apply hybrid QR decomposition with a pipelined and parallel design to a 4×4 complex channel matrix, and use this algorithm to split the channel matrix H into a unitary matrix (Q) and an upper triangular matrix (R). Fig. 1 shows a block diagram of the proposed system. The existing Householder reflection method requires large hardware area and computation time.

      Fig. 1: Block diagram of proposed system

      Pipelining reduces the critical path, which can be exploited either to increase the clock speed or to reduce power consumption at the same speed. However, the major use of pipelining is to increase throughput. In parallel processing, multiple outputs are computed in parallel in one clock period, so the speed increases with the level of parallelism. Combining both gives a higher-speed and lower-power design (using a lower supply voltage). The K-best detector is a breadth-first search algorithm that provides fixed throughput and reduced complexity [7]. It computes the partial Euclidean distances (PEDs) at each level and keeps the K best nodes from all the nodes [10].
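      The level-by-level K-best search can be sketched in software as follows. This is a simplified illustration under stated assumptions, not the hardware architecture of this paper: it assumes a square upper-triangular R and a rotated received vector (called `y_tilde` here), a QPSK alphabet instead of 64-QAM, and a plain sort to pick the survivors. The function name `k_best_detect` and all numeric values are hypothetical.

```python
def k_best_detect(R, y_tilde, alphabet, K):
    """Breadth-first K-best search over an upper-triangular system.

    R is an N x N upper-triangular matrix (list of rows) and y_tilde is the
    received vector after rotation by the unitary factor.
    Returns the best candidate vector and its accumulated distance.
    """
    n = len(R)
    # Each survivor is (ped, symbols); symbols holds the already decided
    # entries x[level+1 .. n-1] in order.
    survivors = [(0.0, [])]
    for level in range(n - 1, -1, -1):  # start at the last row of R
        children = []
        for ped, tail in survivors:
            # Contribution of the symbols decided on the lower rows.
            interference = sum(R[level][level + 1 + j] * s
                               for j, s in enumerate(tail))
            for s in alphabet:
                err = y_tilde[level] - interference - R[level][level] * s
                children.append((ped + abs(err) ** 2, [s] + tail))
        # Keep only the K children with the smallest PED.
        children.sort(key=lambda c: c[0])
        survivors = children[:K]
    best_ped, best_x = survivors[0]
    return best_x, best_ped


# Hypothetical 3 x 3 example, noise-free, for illustration only.
QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
R = [[2.0 + 0j, 0.5 - 0.3j, 0.1 + 0.4j],
     [0j, 1.5 + 0j, -0.2 + 0.6j],
     [0j, 0j, 1.2 + 0j]]
x_true = [1 + 1j, -1 - 1j, 1 - 1j]
y_tilde = [sum(r * s for r, s in zip(row, x_true)) for row in R]

x_hat, ped = k_best_detect(R, y_tilde, QPSK, K=4)
```

      Each level expands at most K survivors into K times |alphabet| children, so the work per level is fixed. This is the property that gives the detector its fixed throughput, in contrast to the exhaustive ML search.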


        Householder transformations can be used to transform the input channel matrix to the final upper-triangular matrix by eliminating all of the elements below the diagonal in a column simultaneously [1]. In the proposed design, the Householder algorithm processes the H matrix with parallel inputs to obtain Q and R, which reduces latency.
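        The way one Householder reflection clears a whole column can be illustrated with a small software sketch. It is only an illustration under simplifying assumptions: it uses a real-valued 3 × 3 matrix and floating-point square roots and divisions, whereas the design in this paper works on complex 4 × 4 matrices in hardware. The function names and the example matrix are hypothetical.

```python
import math


def householder_step(A, k):
    """Zero all entries below the diagonal in column k of A (real-valued)."""
    m = len(A)
    col = [A[i][k] for i in range(k, m)]
    norm = math.sqrt(sum(c * c for c in col))
    out = [row[:] for row in A]
    if norm == 0.0:
        return out
    # Reflection vector v = col + sign(col[0]) * ||col|| * e1
    v = col[:]
    v[0] += math.copysign(norm, col[0])
    vtv = sum(c * c for c in v)
    for j in range(len(A[0])):
        # Apply (I - 2 v v^T / v^T v) to rows k..m-1 of column j.
        dot = sum(v[i] * A[k + i][j] for i in range(len(v)))
        for i in range(len(v)):
            out[k + i][j] = A[k + i][j] - 2.0 * v[i] * dot / vtv
    return out


def householder_triangularize(A):
    """Apply one reflection per column to reach upper-triangular form."""
    R = [row[:] for row in A]
    for k in range(min(len(A) - 1, len(A[0]))):
        R = householder_step(R, k)
    return R


# Hypothetical example matrix, for illustration only.
A = [[4.0, 1.0, 2.0],
     [3.0, 5.0, 1.0],
     [0.0, 2.0, 6.0]]
R = householder_triangularize(A)
```

        A single call to `householder_step` clears every entry below the diagonal in its column at once, which is the contrast with the Givens rotation, which clears one entry per rotation.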


        A Givens rotation zeroes a particular entry of the given matrix, so it introduces zeros one at a time. The Givens rotations are performed using the CORDIC algorithm. The CORDIC (COordinate Rotation DIgital Computer) algorithm [12] is used in only two modes, rotation mode and vectoring mode. CORDIC does not use any square-root or divide operations; it uses only bit-shifts, additions, and subtractions. CORDIC rotations also work efficiently in pipelined architectures. We use CORDIC to determine the angle that a vector makes with the horizontal axis and then use this angle to rotate the vector such that its y component is annihilated.
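        The two CORDIC modes can be sketched in software as below. This is a floating-point model for illustration, not the fixed-point hardware datapath: multiplication by 2^-i stands in for a bit-shift, the iteration count of 16 is an assumption, and the final division by the CORDIC gain is done in software for readability. The function names are hypothetical.

```python
import math

N_ITER = 16  # assumed number of micro-rotations
# Angle table atan(2^-i); in hardware this would be a small lookup table.
ATAN_TABLE = [math.atan(2.0 ** -i) for i in range(N_ITER)]
# CORDIC gain: product of sqrt(1 + 2^-2i) over all iterations.
GAIN = 1.0
for i in range(N_ITER):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))


def cordic_vectoring(x, y):
    """Rotate (x, y) onto the x-axis; return (magnitude, angle). Needs x > 0."""
    angle = 0.0
    for i in range(N_ITER):
        d = -1.0 if y > 0 else 1.0  # rotate toward y = 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        angle -= d * ATAN_TABLE[i]
    return x / GAIN, angle


def cordic_rotation(x, y, angle):
    """Rotate (x, y) by `angle` radians (|angle| below about 1.74)."""
    z = angle
    for i in range(N_ITER):
        d = 1.0 if z >= 0 else -1.0  # drive the residual angle to zero
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ATAN_TABLE[i]
    return x / GAIN, y / GAIN


# Givens-style use: find the angle of (3, 4), then rotate a second pair
# (1, 2) by the opposite angle, as would be done for the rest of the rows.
mag, theta = cordic_vectoring(3.0, 4.0)
c2, d2 = cordic_rotation(1.0, 2.0, -theta)
```

        Vectoring mode plays the role of computing the Givens angle and annihilating the y component, and rotation mode applies the same angle to the remaining element pairs of the two rows involved.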


      Fig. 2 shows the architecture of the proposed QR decomposition with pipelined design. An unrolled processor and a global controller are used for all four stages [11]. In each stage, input-selection logic chooses the input operands for that stage's CORDIC processor every clock cycle [13]. At the output of each stage, the Householder CORDIC [4] outputs are redirected to the appropriate registers and held there until the current stage completes its computations and all outputs are ready to be passed to the next stage as inputs.

      Fig 2: Architecture diagram for hybrid QR decomposition with pipelined design

      Finally, we obtain the upper triangular matrix, which reduces the matrix multiplication complexity. The K-best detector then operates on this result, and the architecture achieves low latency.


      In this paper, the design is coded in VHDL using the Xilinx ISE 14.7 software.


        The Xilinx software tool is used to implement the hybrid QR decomposition that splits the complex matrix into a unitary matrix and an upper triangular matrix. We expect a further reduction in power compared to the existing technique. Fig. 3 shows the simulation result for the proposed method, which increases throughput and reduces latency.


      To demonstrate the performance of the proposed system, a Xilinx Virtex-6 FPGA was used to process the QR decomposition with a block size of 4 × 4. The design is coded in VHDL and synthesized on a Xilinx Virtex-6 device (XC6VLX550T) using the Xilinx ISE software. Fig. 4 shows an RTL view of the proposed method. According to the floorplan area synthesis result shown in Fig. 5, the proposed implementation occupies less area than the existing method. Table 1 shows the comparison report for the proposed method.

      Fig 3: simulation result for proposed system

      Table 1: Comparison Report

      S. No | Parameter | Shabany (Existing) | This work
      1 | QRD algorithm used | Hybrid QR decomposition | Hybrid QR decomposition with Pipelined and Parallel
      2 | System size and Mode | 4 × 4 | 4 × 4
      3 | Max clock Frequency | |
      4 | QRD latency | |
      5 | (logic utilization) | |
        | 1. Slice Reg | |
        | 2. Slice LUT | |
        | 3. LUT FF | |
        | 4. Bonded IOBs | |
        | 5. DSP48E1s | |


In this paper, the proposed hybrid QR decomposition method is synthesized and implemented on a Xilinx XC6VLX550T-2FF1759. It reduces the computation time and the hardware area compared to the previous QRD algorithm. The hybrid QR decomposition with pipelined and parallel design architecture achieves low latency. The proposed detection algorithm guarantees fixed throughput and reduced complexity. Increasing the number of K nodes gives better performance and provides high data rates with low latency. The FPGA implementation and test results demonstrate that it attains the lowest processing latency of 100 ns at 300 MHz for QRD of 4×4 complex matrices.
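The reported figures can be related by simple arithmetic. The following is a back-of-the-envelope sketch, assuming that the 100 ns latency is measured at the 300 MHz clock and that the 1 Gbps target refers to uncoded 64-QAM symbols on four transmit antennas. The cycle count and vector rate are derived here and are not stated in the paper.

```latex
\text{latency in cycles} = 100\,\text{ns} \times 300\,\text{MHz} = 30
\qquad
\text{bits per symbol vector} = 4 \times \log_2 64 = 24
\qquad
\frac{1\,\text{Gbps}}{24\,\text{bits}} \approx 41.7 \times 10^{6}\ \text{vectors per second}
```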

Fig 4: RTL view of proposed method

Fig 5: Floorplan design view for proposed method


I would like to thank my guide, Mr. D. Sellathambi M.E., Assistant Professor in the Electronics and Communication Engineering Department at the Parisutham Institute of Technology and Science, Thanjavur, for his help and guidance, which enabled us to propose this paper.


  1. G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed. Baltimore, MD: Johns Hopkins Univ. Press, 1996.

  2. K. Kalyani, S. Siva, D. Sellathambi, and S. Rajaram, "Parallel Distributed Arithmetic Based K-Best List Sphere Detection Algorithm for LTE Standard," in Proc. IConDM 2013, pp. 133-141.

  3. E. Biglieri, R. Calderbank, A. Constantinides, A. Goldsmith, A. Paulraj, and H. V. Poor, MIMO Wireless Communications. Cambridge Univ. Press, 2007.

  4. J. Delosme and S. Hsiao, "Householder CORDIC algorithms," IEEE Trans. Comput., vol. 44, no. 8, pp. 990-1001, Aug. 1995.

  5. M. O. Damen, H. E. Gamal, and G. Caire, "On maximum-likelihood detection and the search for the closest lattice point," IEEE Trans. Inf. Theory, vol. 49, no. 10, pp. 2389-2402, Oct. 2003.

  6. P. Salmela, A. Burian, H. Sorokin, and J. Takala, "Complex-valued QR decomposition implementation for MIMO receivers," in Proc. IEEE ICASSP 2008, Apr. 2008, pp. 1433-1436.

  7. K. W. Wong, C. Y. Tsui, R. S. K. Cheng, and W. H. Mow, "A VLSI architecture of a K-Best lattice decoding algorithm for MIMO channels," Proc. IEEE Int. Symp. Circuits Syst., vol. 3, pp. 273-276, May 2002.

  8. J. Volder, "The CORDIC trigonometric computing technique," IRE Trans. Electronic Computers, vol. 8, no. 3, pp. 330-334, Sep. 1959.

  9. Y. T. Hwang and W. D. Chen, "A low complexity complex QR factorization design for signal detection in MIMO OFDM system," in Proc. IEEE ISCAS 2008, May 2008, pp. 932-935.

  10. S. Chen, T. Zhang, and Y. Xin, "Relaxed K-best MIMO Signal Detector Design and VLSI Implementation," IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 15, no. 3, pp. 328-337, Mar. 2007.

  11. E. Deprettere, P. Dewilde, and R. Udo, "Pipelined CORDIC architectures for fast VLSI filtering and array processing," in IEEE International Conference on Acoustic, Speech, Signal Processing, ICASSP84, March 1984, vol. 9, pp. 250-253.

  12. R. Andraka, "A survey of CORDIC algorithms for FPGA based computers," in Proc. 1998 ACM/SIGDA 6th Int. Symp. Field Programmable Gate Arrays, Feb. 1998, pp. 191-200.

  13. Y. H. Hu, "CORDIC-based VLSI architectures for digital signal processing," IEEE Signal Processing Magazine, vol. 9, no. 3, pp. 16-35, 1992.
