Authors: M. Backia Lakshmi, D. Sellathambi
Paper ID: IJERTCONV3IS04043
Volume & Issue: NCRTET – 2015 (Volume 3 – Issue 04)
Published (First Online): 30-07-2018
ISSN (Online): 2278-0181
Publisher Name: IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License
FPGA Implementation of MIMO Based Hybrid QR Decomposition

M. Backia Lakshmi, PG Student
Department of Electronics and Communication Engineering, Parisutham Institute of Technology and Science, Thanjavur, Tamilnadu, India.
Mr. D. Sellathambi, M.E., Assistant Professor
Department of Electronics and Communication Engineering, Parisutham Institute of Technology and Science, Thanjavur, Tamilnadu, India.
Abstract: This paper proposes a hybrid QR decomposition with a pipelined and parallel design that increases throughput and reduces latency. 4G wireless standards require MIMO systems with large antenna configurations, high mobility, and high data rates. In such MIMO systems, the main challenge is to implement a QR decomposition process that uses hardware resources efficiently. MIMO technology allows LTE-Advanced to reach a detection throughput of up to 1 Gbps in the downlink. The proposed QR decomposition method is synthesized on a Xilinx XC6VLX550T-2FF1759. Test results for the FPGA implementation show that the proposed design achieves a latency of 100 ns at 300 MHz and reduces area.
Keywords: FPGA, hybrid QR decomposition, K-best detector, LTE-A, MIMO (multiple-input multiple-output).

INTRODUCTION
The emerging 4G wireless standards, such as IEEE 802.11n and LTE-Advanced, require MIMO systems with high data rates (up to 1 Gbps), large constellation orders (64-QAM), and large antenna configurations (4×4 and 8×8). A multiple-input multiple-output (MIMO) system uses more than one antenna at both the transmitter and the receiver [3]. The main aim of using multiple antennas is to increase the data rate. LTE-A (Long Term Evolution Advanced) uses a wide bandwidth, up to 100 MHz of spectrum, to support very high data rates, and it further improves capacity and coverage. LTE-Advanced can use up to 8×8 MIMO, with data rates of 1 Gbps in the downlink and 500 Mbps in the uplink.
The baseband equivalent model can be described as

$$ \mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n} \tag{1} $$

where N is the number of transmit antennas, M is the number of receive antennas, y is the M-dimensional received symbol vector, x is the N-dimensional transmitted symbol vector, n is the M-dimensional noise vector, and H is the M×N channel matrix. Written element by element, (1) becomes

$$ \begin{bmatrix} y_1 \\ \vdots \\ y_M \end{bmatrix} = \begin{bmatrix} h_{11} & \cdots & h_{1N} \\ \vdots & \ddots & \vdots \\ h_{M1} & \cdots & h_{MN} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_N \end{bmatrix} + \begin{bmatrix} n_1 \\ \vdots \\ n_M \end{bmatrix} \tag{2} $$

Channel estimation is required to provide information for further processing of the received signal. The channel estimator block provides the channel matrix H, which contains the complex channel gains between the different transmit and receive antennas.
The goal of MIMO ML detection is to find the transmitted vector x that is closest to the observation. The ML detector minimizes the average error probability. It calculates the Euclidean distances (EDs) between the received signal vector and the lattice points Hx, and it returns the vector x with the smallest distance. Detecting x in this way is difficult because every candidate requires the product of the channel matrix H and x. The problem is formulated as [5]

$$ \hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \lVert \mathbf{y} - \mathbf{H}\mathbf{x} \rVert^2 \tag{3} $$

ML detection is impractical to implement for large constellation sizes (i.e., 64-QAM and larger) because its complexity is exponential. If each entry of x is drawn from a constellation Ω with |Ω| points, the search in (3) covers |Ω|^N candidate vectors. The cost therefore grows exponentially with the number of transmit antennas. For 4×4 MIMO with 64-QAM, there are 64^4 ≈ 1.7 × 10^7 candidates, and no polynomial-time exact algorithm is known for the general problem. For this reason, alternative detection schemes have been proposed in place of the ML detector.
QR decomposition is therefore used to simplify this matrix-multiplication problem and to reduce the amount of computation needed. Most MIMO detection schemes require QR decomposition, which decomposes the channel matrix H into a unitary matrix Q and an upper triangular matrix R [9]. The objective of this paper is to develop a hybrid QR decomposition with a pipelined design. This design allows low-complexity decomposition of large complex matrices by reducing the number of computations required and increasing their execution parallelism, thereby improving throughput.

$$ \mathbf{H} = \mathbf{Q}\mathbf{R} \tag{4} $$

For a 3×3 example, with p_j denoting the j-th column of H, the factors have the form

$$ \mathbf{H} = \begin{bmatrix} \mathbf{p}_1 & \mathbf{p}_2 & \mathbf{p}_3 \end{bmatrix}, \quad \mathbf{Q} = \begin{bmatrix} q_{11} & q_{12} & q_{13} \\ q_{21} & q_{22} & q_{23} \\ q_{31} & q_{32} & q_{33} \end{bmatrix}, \quad \mathbf{R} = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ 0 & r_{22} & r_{23} \\ 0 & 0 & r_{33} \end{bmatrix} \tag{5} $$

Because Q is unitary, multiplying by Q^H preserves Euclidean distances:

$$ \lVert \mathbf{y} - \mathbf{H}\mathbf{x} \rVert^2 = \lVert \mathbf{Q}^H\mathbf{y} - \mathbf{R}\mathbf{x} \rVert^2 \tag{6} $$

where the rotated received vector is

$$ \tilde{\mathbf{y}} = \mathbf{Q}^H\mathbf{y} = \mathbf{R}\mathbf{x} + \tilde{\mathbf{n}}, \qquad \tilde{\mathbf{n}} = \mathbf{Q}^H\mathbf{n} \tag{7} $$
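R is upper triangular, so row i of the rotated system ỹ = Q^H y = Rx + ñ involves only the symbols x_i, ..., x_N. The pure-Python sketch below illustrates two consequences of that structure. It makes simplifying assumptions: floating-point complex numbers stand in for fixed-point hardware, and the function names `back_substitute` and `partial_distances` are hypothetical. First, a noiseless system can be solved row by row from the bottom up. Second, the squared distance ||ỹ − Rx||² accumulates one row, and therefore one tree level, at a time. The back-substitution shown here is plain zero-forcing. It only illustrates the triangular structure and is not the detector used in the paper.

```python
from typing import List, Sequence


def back_substitute(R: Sequence[Sequence[complex]], y_t: Sequence[complex]) -> List[complex]:
    """Solve R x = y_t for upper-triangular R, starting from the last row (zero-forcing)."""
    n = len(R)
    x: List[complex] = [0j] * n
    for i in range(n - 1, -1, -1):
        # row i only involves x[i] and the already-solved x[i+1:]
        acc = sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y_t[i] - acc) / R[i][i]
    return x


def partial_distances(R: Sequence[Sequence[complex]], y_t: Sequence[complex],
                      x: Sequence[complex]) -> List[float]:
    """Cumulative partial Euclidean distances, from the last row (root) to the first."""
    n = len(R)
    peds: List[float] = []
    total = 0.0
    for i in range(n - 1, -1, -1):
        e = y_t[i] - sum(R[i][j] * x[j] for j in range(i, n))
        total += abs(e) ** 2
        peds.append(total)
    return peds
```

The last entry returned by `partial_distances` equals ||ỹ − Rx||². The earlier entries are the partial distances that a tree search can compare before all symbols are fixed.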
Fig. 1 shows a block diagram of the proposed system, in which the channel matrix is decomposed into a unitary matrix (Q) and an upper triangular matrix (R). The existing Householder reflection method requires a large hardware area and long computation time. Householder transformations can annihilate multiple elements simultaneously by reflecting a multidimensional input vector across a hyperplane so that it lies along a coordinate axis. However, a straightforward VLSI implementation of the Householder algorithm requires square-root, division, and multiplication operations, and this leads to very high hardware complexity. To resolve this issue, Householder algorithms have been presented that use sequences of simple Householder reflections, which can be implemented with simple arithmetic operations.
We propose using a K-best detector at the LTE-A receiver. The K-best algorithm is a breadth-first search algorithm that keeps, at each level, the K nodes with the smallest partial Euclidean distances (PEDs) [2]. It explores the tree from the root to the leaves. At each level it expands every surviving node and keeps the candidates with the lowest PEDs as the surviving nodes of that level. Working with the product of R and x is much easier than working with the product of H and x. R is upper triangular, so each level of the tree adds exactly one row of the rotated system ỹ = Q^H y = Rx + ñ. The detector therefore searches for

$$ \hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \lVert \tilde{\mathbf{y}} - \mathbf{R}\mathbf{x} \rVert^2 \tag{8} $$
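The K-best search can be sketched in Python as a floating-point reference model. This is an illustration under stated assumptions, not the paper's hardware. The name `k_best_detect` is hypothetical, R and ỹ are plain nested lists of complex numbers, and the example uses QPSK for brevity instead of the 64-QAM targeted in the paper.

```python
import heapq
from typing import List, Sequence, Tuple


def k_best_detect(R: Sequence[Sequence[complex]], y_t: Sequence[complex],
                  constellation: Sequence[complex], K: int) -> Tuple[List[complex], float]:
    """Breadth-first K-best search on the triangular system y_t = R x + noise.

    Levels run from the last row of R (the root) to the first row (the leaves).
    Each survivor is (PED, tail), where tail holds the symbols already fixed for
    x[i+1], ..., x[n-1] at the current level i."""
    n = len(R)
    survivors: List[Tuple[float, List[complex]]] = [(0.0, [])]
    for i in range(n - 1, -1, -1):
        children = []
        for ped, tail in survivors:
            # interference from the symbols already detected at deeper levels
            known = sum(R[i][i + 1 + j] * s for j, s in enumerate(tail))
            for s in constellation:
                e = y_t[i] - known - R[i][i] * s
                children.append((ped + abs(e) ** 2, [s] + tail))
        # keep only the K children with the smallest partial Euclidean distance
        survivors = heapq.nsmallest(K, children, key=lambda c: c[0])
    best_ped, best_x = survivors[0]
    return best_x, best_ped


if __name__ == "__main__":
    qpsk = [complex(a, b) for a in (-1, 1) for b in (-1, 1)]
    R = [[2, 1 + 1j], [0, 1.5]]
    x_true = [1 + 1j, -1 + 1j]
    y_t = [sum(R[i][j] * x_true[j] for j in range(2)) for i in range(2)]
    print(k_best_detect(R, y_t, qpsk, K=2))
```

Sorting every child with `heapq.nsmallest` keeps the model short. A hardware design would likely replace this full sort with a cheaper selection scheme. If K is large enough that no child is ever pruned, the search reduces to exhaustive ML detection.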

BACKGROUND
The existing system processes a 4×4 complex matrix with hybrid QR decomposition using a semi-pipelined design. This design converts H into a unitary matrix Q and an upper triangular matrix R, and the signal is then detected with an ML (maximum likelihood) detector [6]. The hybrid QR decomposition combines only two algorithms: Householder transformations and Givens rotations. The Givens rotations are performed with the CORDIC algorithm. Householder transformations can transform the input channel matrix into the final upper triangular matrix by eliminating all of the elements below the diagonal in a column simultaneously. The CORDIC algorithm that performs the Givens rotations can compute a wide range of functions. The basic concept of CORDIC is to decompose a rotation into a sequence of elementary micro-rotations. When these rotations are applied to the channel matrix, they zero the sub-diagonal entries and yield the upper triangular matrix [10].

SCOPE OF RESEARCH
One objective of this project is to develop a low-complexity matrix decomposition using hybrid QR decomposition. The proposed scheme should reduce the total number of operations required without causing significant degradation in BER performance compared with ML detection. Another objective is to develop an efficient FPGA implementation that offers high detection throughput and low power consumption. This project focuses on 4×4 complex matrices for a 64-QAM K-best detector, with a sustained detection throughput of up to 1 Gbps.

PROPOSED METHODOLOGY
In the proposed system, we apply hybrid QR decomposition with a pipelined and parallel design to a 4×4 complex channel matrix. This algorithm splits the channel matrix H into a unitary matrix (Q) and an upper triangular matrix (R).
Fig. 1: Block diagram of proposed system
Pipelining shortens the critical path. This can be used either to increase the clock speed or to reduce power consumption at the same speed, but the main use of pipelining is to increase throughput. In parallel processing, multiple outputs are computed in parallel within one clock period, so speed increases with the level of parallelism. Combining both techniques gives a higher-speed design, or a lower-power design when the supply voltage is reduced. The K-best detector is a breadth-first search algorithm that provides fixed throughput and reduced complexity [7]. At each level, it computes the partial Euclidean distances (PEDs) of all expanded nodes and keeps the K best [10].
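A simple cycle count makes the throughput argument for pipelining concrete. The sketch assumes balanced stages that each take the same number of cycles. The four-stage case mirrors the four stages of the architecture described in this paper, but the cycles-per-stage figure and the function names are hypothetical.

```python
import math


def sequential_cycles(stages: int, cycles_per_stage: int, jobs: int) -> int:
    """No pipelining: each matrix occupies the whole datapath until it finishes."""
    return stages * cycles_per_stage * jobs


def pipelined_cycles(stages: int, cycles_per_stage: int, jobs: int) -> int:
    """Balanced pipeline: pay the fill latency once, then one result per stage interval."""
    if jobs == 0:
        return 0
    return stages * cycles_per_stage + (jobs - 1) * cycles_per_stage


def parallel_pipelined_cycles(stages: int, cycles_per_stage: int, jobs: int, lanes: int) -> int:
    """`lanes` identical pipelines share the matrices round-robin."""
    return pipelined_cycles(stages, cycles_per_stage, math.ceil(jobs / lanes))


if __name__ == "__main__":
    # hypothetical: 4 stages, 10 cycles per stage, 100 channel matrices
    print(sequential_cycles(4, 10, 100), pipelined_cycles(4, 10, 100),
          parallel_pipelined_cycles(4, 10, 100, 2))
```

In this model, pipelining leaves the latency of a single matrix unchanged (stages × cycles per stage), but it shortens the interval between results to a single stage. Adding parallel lanes divides the number of matrices that each pipeline must process.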

HOUSEHOLDER TRANSFORMATION
Householder transformations can transform the input channel matrix into the final upper triangular matrix by eliminating all of the elements below the diagonal in a column simultaneously [1]. In the proposed design, the Householder stage receives the entries of H as parallel inputs to obtain Q and R, which reduces latency.
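The sketch below is a floating-point reference model of textbook complex Householder QR, as described in Golub and Van Loan [1], written in pure Python. It is only an illustration and does not reproduce the paper's hybrid, pipelined architecture. The name `householder_qr` is hypothetical. The model also computes square roots and divisions explicitly, which is exactly the cost that hardware-oriented Householder variants try to avoid.

```python
import math


def householder_qr(H):
    """Complex Householder QR of an m x n matrix given as nested lists.

    Returns (Q, R) with H = Q R, Q unitary (m x m) and R upper triangular (m x n)."""
    m, n = len(H), len(H[0])
    R = [[complex(v) for v in row] for row in H]
    Q = [[complex(i == j) for j in range(m)] for i in range(m)]
    for k in range(min(m - 1, n)):
        x = [R[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(abs(c) ** 2 for c in x))
        if norm_x == 0.0:
            continue  # column is already zero below the diagonal
        # alpha = -e^{j*arg(x0)} * ||x|| avoids cancellation when forming v = x - alpha*e1
        phase = x[0] / abs(x[0]) if x[0] != 0 else 1.0
        alpha = -phase * norm_x
        v = x[:]
        v[0] -= alpha
        norm_v = math.sqrt(sum(abs(c) ** 2 for c in v))
        v = [c / norm_v for c in v]
        # R <- P R with P = I - 2 v v^H acting on rows k..m-1 (one column at a time)
        for j in range(k, n):
            dot = sum(v[i].conjugate() * R[k + i][j] for i in range(len(v)))
            for i in range(len(v)):
                R[k + i][j] -= 2 * v[i] * dot
        # Q <- Q P so that H = Q R holds throughout (P is Hermitian and unitary)
        for r in range(m):
            dot = sum(Q[r][k + i] * v[i] for i in range(len(v)))
            for i in range(len(v)):
                Q[r][k + i] -= 2 * dot * v[i].conjugate()
    for i in range(m):
        for j in range(min(i, n)):
            R[i][j] = 0j  # clear round-off left below the diagonal
    return Q, R
```

Each iteration zeroes all the entries below the diagonal of one column at once. This is the property that distinguishes Householder transformations from Givens rotations, which zero one entry at a time.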

GIVENS ROTATIONS
A Givens rotation zeroes one particular entry of a given matrix, so Givens rotations introduce zeros one at a time. In this design, the Givens rotations are performed with the CORDIC algorithm. The CORDIC (COordinate Rotation DIgital Computer) algorithm [12] uses two modes: rotation mode and vectoring mode. CORDIC needs no square-root or division operations, only bit shifts, additions, and subtractions, and CORDIC rotations work efficiently in pipelined architectures. In vectoring mode, CORDIC determines the angle that a vector makes with the horizontal axis and rotates the vector so that its y component is annihilated. In rotation mode, the same angle is then applied to the remaining elements of the two rows involved.
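The sketch below shows one way to carry out a Givens rotation with CORDIC using only shifts and additions. Vectoring mode on the pivot pair annihilates the lower element and records the micro-rotation directions. Rotation mode then replays those directions on the remaining column pairs. The sketch rests on several assumptions. The rows are real-valued; complex entries would need extra CORDIC steps to remove their phase first. It uses a fixed-point format with 16 fractional bits and 16 iterations, and the function names are hypothetical. The final gain correction is written as a floating-point division for readability. See [12] for a survey of hardware CORDIC variants.

```python
import math

FRAC_BITS = 16   # assumed fixed-point fraction width
ITERATIONS = 16  # assumed number of micro-rotations


def to_fixed(v: float) -> int:
    return int(round(v * (1 << FRAC_BITS)))


def to_float(v: int) -> float:
    return v / (1 << FRAC_BITS)


def cordic_vectoring(x: int, y: int, iterations: int = ITERATIONS):
    """Vectoring mode: rotate (x, y) onto the x-axis with shift-and-add steps.

    Returns (x_out, y_residual, directions). x_out is the magnitude times the CORDIC
    gain, and directions records each micro-rotation for replay. Assumes x >= 0."""
    directions = []
    for i in range(iterations):
        d = -1 if y > 0 else 1  # direction that drives y towards zero
        x, y = x - d * (y >> i), y + d * (x >> i)
        directions.append(d)
    return x, y, directions


def cordic_rotation(x: int, y: int, directions):
    """Rotation mode: replay the recorded micro-rotations on another (x, y) pair."""
    for i, d in enumerate(directions):
        x, y = x - d * (y >> i), y + d * (x >> i)
    return x, y


def givens_cordic(row_a, row_b, iterations: int = ITERATIONS):
    """Rotate two real-valued rows so that row_b[0] becomes (approximately) zero."""
    a = [to_fixed(v) for v in row_a]
    b = [to_fixed(v) for v in row_b]
    if a[0] < 0:
        # pre-rotation by pi keeps the vectoring input in the right half-plane
        a = [-v for v in a]
        b = [-v for v in b]
    x0, y0, directions = cordic_vectoring(a[0], b[0], iterations)
    new_a, new_b = [x0], [y0]
    for x, y in zip(a[1:], b[1:]):
        xr, yr = cordic_rotation(x, y, directions)
        new_a.append(xr)
        new_b.append(yr)
    # constant CORDIC gain K = prod sqrt(1 + 2^-2i); hardware folds 1/K into a constant
    gain = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(iterations))
    return [to_float(v) / gain for v in new_a], [to_float(v) / gain for v in new_b]
```

For the rows [3, 1] and [4, 2], the exact Givens result is [5, 2.2] and [0, 0.4]. The sketch matches this result to within the error set by the fixed-point width and the iteration count.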

ARCHITECTURE DESIGN
Fig. 2 shows the architecture of the proposed QR decomposition with a pipelined design. An unrolled processor and a global controller are used for all four stages [11]. In each stage, the controller selects the input operands for that stage's CORDIC processor every clock cycle [13]. At the output of each stage, the Householder CORDIC [4] outputs are redirected to the appropriate registers. They are held there until the current stage completes its computations and all outputs are ready to be passed to the next stage as inputs.
Fig 2: Architecture diagram for hybrid QR decomposition with pipelined design
The final result is the upper triangular matrix R, which reduces the complexity of the subsequent matrix multiplication. The K-best detector is then applied, and this architecture achieves low latency.


EXPERIMENTAL RESULTS
The design in this paper is coded in VHDL and implemented with the Xilinx ISE 14.7 software.

SIMULATION RESULTS
The Xilinx software tool is used to simulate the hybrid QR decomposition, which splits the complex matrix into a unitary matrix and an upper triangular matrix. We expect a further reduction in power compared with the existing technique. Fig. 3 shows the simulation result for the proposed method, which increases throughput and reduces latency.

SYNTHESIS RESULTS
To demonstrate the performance of the proposed system, a Xilinx Virtex-6 FPGA was used to process the QR decomposition with a block size of 4×4. The design is coded in VHDL and synthesized on a Xilinx Virtex-6 device (XC6VLX550T) using Xilinx ISE. Fig. 4 shows an RTL view of the proposed method. According to the floorplan area synthesis result shown in Fig. 5, the proposed implementation occupies less area than the existing method. Table 1 compares the proposed method with the existing design.
Fig 3: simulation result for proposed system
Table 1: Comparison Report

S. No | Parameter                  | Shabany (Existing)      | This work
1     | QRD algorithm used         | Hybrid QR decomposition | Hybrid QR decomposition with pipelined and parallel design
2     | System size and mode       | 4 × 4, Real             | 4 × 4, Real
3     | Max. clock frequency       | 278 MHz                 | 300 MHz
4     | QRD latency                | 144 ns                  | 100 ns
5     | Area (logic utilization):  |                         |
      |   Slice Reg                | 570                     | 7
      |   Slice LUT                | 2245                    | 1671
      |   LUT-FF pairs             | 194                     | 0
      |   Bonded IOBs              | 577                     | 432
      |   DSP48E1s                 | 318                     | 117
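As a consistency check on Table 1, the reported latencies can be converted into clock cycles. This assumes that each latency corresponds to a whole number of cycles at the listed maximum clock frequency. The paper does not state the cycle counts, so this is an inference:

```latex
\begin{align*}
N_{\text{cycles}} &= t_{\text{lat}} \cdot f_{\text{clk}} \\
\text{This work: } N_{\text{cycles}} &= 100\,\text{ns} \times 300\,\text{MHz} = 30 \\
\text{Existing: } N_{\text{cycles}} &= 144\,\text{ns} \times 278\,\text{MHz} \approx 40.03 \approx 40 \\
\text{Latency reduction} &= \frac{144 - 100}{144} \approx 30.6\,\%
\end{align*}
```

Under this assumption, the proposed design needs roughly ten fewer cycles and also runs at a higher clock frequency.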


CONCLUSION

In this paper, the proposed hybrid QR decomposition method is synthesized and implemented on a Xilinx XC6VLX550T-2FF1759. It reduces computation time and hardware area compared with the previous QRD design. We presented a hybrid QR decomposition with a pipelined and parallel architecture that achieves low latency. The proposed detection algorithm guarantees fixed throughput and reduced complexity. Increasing the number of retained nodes K improves detection performance, at the cost of higher complexity, while still providing high data rates with low latency. The FPGA implementation and test results demonstrate a processing latency of 100 ns at 300 MHz for the QRD of 4×4 complex matrices, lower than the 144 ns of the existing design.
Fig 4: RTL view of proposed method
Fig 5: Floorplan design view for proposed method
ACKNOWLEDGMENT
I would like to thank my guide, Mr. D. Sellathambi, M.E., Assistant Professor in the Department of Electronics and Communication Engineering at the Parisutham Institute of Technology and Science, Thanjavur, for his help and guidance in preparing this paper.
REFERENCES

[1] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed. Baltimore, MD: Johns Hopkins Univ. Press, 1996.

[2] K. Kalyani, S. Siva, D. Sellathambi, and S. Rajaram, "Parallel distributed arithmetic based K-best list sphere detection algorithm for LTE standard," in Proc. IConDM 2013, pp. 133-141.

[3] E. Biglieri, R. Calderbank, A. Constantinides, A. Goldsmith, A. Paulraj, and H. V. Poor, MIMO Wireless Communications. Cambridge Univ. Press, 2007.

[4] J. Delosme and S. Hsiao, "Householder CORDIC algorithms," IEEE Trans. Comput., vol. 44, no. 8, pp. 990-1001, Aug. 1995.

[5] M. O. Damen, H. El Gamal, and G. Caire, "On maximum-likelihood detection and the search for the closest lattice point," IEEE Trans. Inf. Theory, vol. 49, no. 10, pp. 2389-2402, Oct. 2003.

[6] P. Salmela, A. Burian, H. Sorokin, and J. Takala, "Complex-valued QR decomposition implementation for MIMO receivers," in Proc. IEEE ICASSP 2008, Apr. 2008, pp. 1433-1436.

[7] K. W. Wong, C. Y. Tsui, R. S. K. Cheng, and W. H. Mow, "A VLSI architecture of a K-best lattice decoding algorithm for MIMO channels," in Proc. IEEE Int. Symp. Circuits Syst., vol. 3, pp. 273-276, May 2002.

[8] J. Volder, "The CORDIC trigonometric computing technique," IRE Trans. Electronic Computers, vol. 8, no. 3, pp. 330-334, Sep. 1959.

[9] Y. T. Hwang and W. D. Chen, "A low complexity complex QR factorization design for signal detection in MIMO OFDM system," in Proc. IEEE ISCAS 2008, May 2008, pp. 932-935.

[10] S. Chen, T. Zhang, and Y. Xin, "Relaxed K-best MIMO signal detector design and VLSI implementation," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 3, pp. 328-337, Mar. 2007.

[11] E. Deprettere, P. Dewilde, and R. Udo, "Pipelined CORDIC architectures for fast VLSI filtering and array processing," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP '84), Mar. 1984, vol. 9, pp. 250-253.

[12] R. Andraka, "A survey of CORDIC algorithms for FPGA based computers," in Proc. 1998 ACM/SIGDA 6th Int. Symp. Field Programmable Gate Arrays, Feb. 1998, pp. 191-200.

[13] Y. H. Hu, "CORDIC-based VLSI architectures for digital signal processing," IEEE Signal Processing Magazine, vol. 9, no. 3, pp. 16-35, 1992.