An Experimental Analysis of MU-MIMO Precoding in Many-Antenna Base Stations

Risu Kumari; Sourav Kumar; Deepshikha Bhakat

doi:10.17577/IJERTCONV3IS25007

NCRAEEE - 2015 (Volume 3 - Issue 25)


Open Access
Paper ID : IJERTCONV3IS25007
Published (First Online): 30-07-2018
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


1Risu Kumari, 2Sourav Kumar, 3 Deepshikha Bhakat

Dept. Of. ETC, TAT, Bhubaneswar

Abstract :- Many-antenna base stations promise manyfold spectral capacity increases in theory. However, our recent experimental work has shown a significant performance gap between the traditional MU-MIMO linear precoding method, zero-forcing, and the method proposed for many-antenna base stations, conjugate. Thus, a critical question in the field of many-antenna base stations is: Under what scenarios, if any, does conjugate precoding outperform zero-forcing in real systems?

Towards answering this question, we leverage our experience in building many-antenna base stations to derive a model for the performance of linear precoders in real-world systems. We isolate the primary factors which discrepantly affect these linear precoders, then capture their complex interactions in an analytical model. By combining our real-world capacity results with this analytical model, we find new insight into the tradeoffs between conjugate and zero-forcing precoding. Our results suggest that conjugate will outperform zero-forcing when there are many simultaneous users, the users have high mobility, or the implementation employs less-capable hardware. We find that our model is not only useful for guiding the hardware design of base stations, but can also facilitate dynamically switching to the optimal linear precoding algorithm in realtime, through adaptive precoding.

Keywords:- Large-scale Antenna Systems (LSAS), Many-Antenna, Massive MIMO, Multi-User MIMO, Beamforming, Linear Precoding, Conjugate, Zero-forcing

INTRODUCTION

Recent work has proposed using many-antenna base stations to vastly improve spectral capacity in cellular networks by serving tens of users simultaneously. However, traditional linear precoding techniques do not scale up well with the number of antennas. For example, the predominant multi-user multiple input multiple output (MU-MIMO) linear precoding technique, zero-forcing, leverages a pseudo-inverse of the channel matrix to nullify interference within multiple spatial streams; this requires centralized processing, utilizes non-parallelizable algorithms, and has polynomial complexity with regard to both the number of base station antennas and users served. Thus, to overcome this scalability challenge, recent theoretical work proposed applying the simplest form of linear precoding, conjugate beamforming, to many-antenna base stations, and showed that as the number of base station antennas increases it approaches optimal [5]. A modified form of conjugate beamforming can not only be fully distributed and parallelized, but also has linear complexity with the number of base station antennas [8].

Unfortunately, our recent experimental work has shown that, even with a substantial number of base station antennas, conjugate performs significantly worse than zero-forcing. For example, it only achieves 45% capacity with 64 base station antennas [8]. However, these results only indicate the channel capacity after the channel state information (CSI) has been collected and the required computation completed; thus they neglect the computational overhead and the realtime requirements of a practical system. This leads us to an important question in the field of many-antenna base stations: Under what scenarios, if any, does conjugate precoding outperform zero-forcing in real systems?

Towards answering this question, we draw on our experience in building many-antenna base stations to isolate the key practical factors which affect the performance of a real-world system. At a high level these factors can be classified into two categories: environmental and design. The environmental factors include channel coherence and precoder spectral efficiency. These factors are completely independent of the base station implementation, and can be measured for a given location. The design factors include number of antennas and hardware capability.

These factors exhibit complex and nuanced interaction in practice. We derive an analytical model that captures this behavior to predict the achieved spectral capacity of linear precoding techniques in realtime systems. Using results from our implementation of a many-antenna base station, we leverage this model to identify and investigate the tradeoff points at which conjugate can outperform zero-forcing. We find that in a low-end, cost-effective base station, conjugate outperforms zero-forcing at coherence times of up to 38 ms, when serving a modest 15 users. However, this coherence tradeoff point is reduced substantially as the number of users decreases or the capability of the hardware increases. By utilizing our performance model, base station designers can optimize their cost vs. performance tradeoffs and tailor their design to fit specific deployments. Furthermore, since channel coherence and the number of users can vary substantially in real-world deployments, our results suggest that it will be advantageous for base stations to dynamically switch between precoding techniques to optimize capacity, which we call adaptive precoding.

The rest of this paper is organized as follows: We provide a brief background in Section 2. In Section 3 we discuss the factors which affect performance, then use them to build a performance model in Section 4. We leverage this model to predict tradeoff points between the precoding techniques, which we present with other results in Section 5. In Section 6 we discuss future work, followed by a brief overview of related work in Section 7, then conclude in Section 8.
BACKGROUND

There are many forms of MU-MIMO; we focus on linear precoding since other methods are computationally infeasible in practice, or do not take advantage of the potential capacity gains from many-antenna systems. Let $s$ denote a $K \times 1$ vector representing the data-bearing symbols to $K$ users. Linear precoding creates a downlink transmission vector $s'$ for $M$ antennas, by multiplying the original data vector $s$ by an $M \times K$ matrix $W$: $s' = W \cdot s$. In the uplink the data symbols from the $K$ terminals can be recovered similarly, by performing $s = W^T \cdot s'$.
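As a rough numerical illustration of linear precoding, the sketch below builds conjugate and zero-forcing weights for a small random channel and forms the effective K × K channel H^T·W seen by the users. It assumes the weight definitions given in this section (the conjugate of H, and the pseudo-inverse H*(H^T H*)^-1), drops the scalar c, fixes K = 2 so that a closed-form 2 × 2 inverse suffices, and draws a random Gaussian channel. All helper names are illustrative and do not come from the paper.

```python
import random

def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def conj(A):
    """Element-wise complex conjugate."""
    return [[x.conjugate() for x in row] for row in A]

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def inverse_2x2(A):
    """Closed-form inverse of a 2 x 2 matrix (enough for K = 2 users)."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def conjugate_weights(H):
    """Conjugate beamforming weights, W = H* (scalar c omitted)."""
    return conj(H)

def zero_forcing_weights(H):
    """Zero-forcing weights, W = H* (H^T H*)^-1 (scalar c omitted, K = 2 only)."""
    Hc = conj(H)
    return matmul(Hc, inverse_2x2(matmul(transpose(H), Hc)))

def effective_channel(H, W):
    """K x K matrix H^T W: entry (i, j) is how much of user j's symbol reaches user i."""
    return matmul(transpose(H), W)

# Hypothetical channel: M = 8 antennas, K = 2 users, i.i.d. complex Gaussian entries.
random.seed(1)
M, K = 8, 2
H = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(K)]
     for _ in range(M)]

G_conj = effective_channel(H, conjugate_weights(H))   # off-diagonals are non-zero
G_zf = effective_channel(H, zero_forcing_weights(H))  # numerically the identity
```

The off-diagonal entries of `G_conj` are the inter-user interference that conjugate ignores, whereas `G_zf` is the identity up to rounding, which is what "forcing interference to zero" means here.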

The beamforming weights, $W$, are computed according to the precoding algorithm; in this work we analyze the two predominant algorithms: conjugate and zero-forcing. Conjugate uses beamforming weights which are the complex conjugate of the channel matrix, $H$: $W_{conj} = cH^*$, where $c$ is a scalar constant, which maximizes the SNR to each user, regardless of interference. Zero-forcing calculates the beamweights as a pseudo-inverse of the channel matrix, $W_{zf} = cH^*(H^T H^*)^{-1}$, which forces inter-user interference to zero.

For more detailed background, we suggest [5, 8, 3].

PERFORMANCE FACTORS

The factors which affect the performance of base stations employing linear precoding can be classified as either environmental or design. The propagation environment affects the channel coherence and the precoder's spectral efficiency. The base station design determines the number of base station antennas, the number of users that can be served, and the precoding algorithm's latency. We next define each factor and its effect on performance, identify how the factors cause discrepant behavior in conjugate and zero-forcing precoding, and characterize them in real-world systems.

Environmental Factors
1. Channel coherence

Channel coherence describes how smooth the physical wireless channel is, in both time and frequency. Essentially, it determines how often CSI must be collected. If the channel changes too much over time, then the previously estimated channel state becomes useless. The duration of this interval is the coherence time. Similarly, one channel estimate is not valid for the entire spectrum. Thus, the channel state must be estimated at intervals across the entire wideband channel; the width of this interval is the coherence bandwidth.

Coherence time is determined by user mobility. Theoretical models simulate coherence time as the amount of time it takes the user, or something in the path of the user, to move 1/4 wavelength. For example, at a carrier frequency of 2.4 GHz (wavelength of 12.5 cm) a user moving at 140 mph has a coherence time of 500 µs. However, this neglects movement in the environment itself, and experimental evaluation has shown that vehicular mobility near users results in coherence intervals of less than 300 µs in the 2.4 GHz band [2]. Previous work based on LTE channel models often uses coherence times of approximately 1 ms [5].

Coherence bandwidth is the approximately flat frequency interval of the channel. Delay spread in multipath environments causes the channel's frequency response to become rough. However, channels can still be approximated as smooth over the coherence bandwidth, usually derived as the inverse of the delay spread. This effectively requires the channel to be estimated at regular intervals across the spectrum to obtain accurate CSI. In LTE models the coherence bandwidth is 210 kHz, as described in further detail in [5].

Channel coherence determines the latency of CSI acquisition and how long that CSI is valid. Since the CSI is only valid temporarily, the overhead of CSI collection and precoding computation results in a direct loss of capacity. More importantly, however, this overhead is fixed with respect to channel coherence time. Thus, as channel coherence is reduced, the relative capacity loss grows. Since conjugate and zero-forcing have drastically different computational overheads, they behave differently as coherence time varies.

2. Precoder Spectral Efficiency

Zero-forcing and conjugate provide vastly different spectral efficiencies during actual data transmissions [8]. We define precoder spectral efficiency as the capacity achieved (bps/Hz) using M antennas to serve K users in a given environment, neglecting all CSI and computational overhead. Because these factors are neglected, precoder spectral efficiency is independent of base station implementation (for a given M and K).

This spectral efficiency is determined by the propagation environment, specifically channel orthogonality, user distance, noise, and interference. It is important to note that the relative spectral efficiency of conjugate and zero-forcing varies significantly with SNR, as further explored in [9, 8]. However, zero-forcing is known to perform poorly in low SNR regimes, so a slightly modified form, often referred to as MMSE, should be used in these scenarios. MMSE has negligibly increased performance overhead when compared to zero-forcing, but performs much better at low SNRs, as shown nicely in [7]. While the relative performance to conjugate still varies with SNR, it is not as drastic.

One approach to approximate spectral efficiency is to measure each environmental property to create a channel model and simulate precoder spectral efficiency. Alternatively, we employ a more accurate approach that uses a many-antenna base station to measure spectral efficiency directly, thus capturing the combined effect of these properties on capacity.
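The quarter-wavelength rule for coherence time used in the channel coherence discussion can be checked with a few lines of arithmetic. The sketch below is only a sanity check of the 2.4 GHz, 140 mph example; the function name and the `fraction` parameter are illustrative assumptions, not notation from the paper.

```python
SPEED_OF_LIGHT_MPS = 299_792_458.0
MPH_TO_MPS = 0.44704  # exact conversion factor for miles per hour

def coherence_time(carrier_hz, speed_mps, fraction=0.25):
    """Time needed to move `fraction` of a wavelength at the given speed.

    fraction=0.25 encodes the quarter-wavelength rule quoted in the text.
    """
    wavelength_m = SPEED_OF_LIGHT_MPS / carrier_hz
    return fraction * wavelength_m / speed_mps

# 2.4 GHz carrier, user moving at 140 mph
t = coherence_time(2.4e9, 140 * MPH_TO_MPS)
print(f"{t * 1e6:.0f} us")  # close to the 500 us quoted in the text
```

Halving the speed doubles the coherence time, which is why the simulations later sweep coherence time over more than two orders of magnitude.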
Design Factors
1. Number of Antennas
  
The number of antennas, both on the base station and added with each additional user, drastically affects capacity in two ways. While more antennas increase spectral efficiency, they also increase CSI collection and precoding computation overhead, decreasing the amount of time available to send data.

Typically, each additional base station antenna provides a power gain (both by increasing the total transmit power and improving directionality), as well as a potential multiplexing gain (by increasing the possible number of users served simultaneously). However, when zero-forcing, each additional antenna also increases the amount of data sent to the central processor, increasing transport and processing overhead. In contrast, conjugate can be distributed in a manner requiring no additional overhead with more base station antennas.

Each additional user provides a multiplexing gain at the expense of a data slot being converted to a pilot slot, and less transmit power per user. However, in low coherence channels, it may be impossible to collect CSI for all available users and still have time left to send data, thus limiting the number of users that can be optimally served. Notably, the complexity and relative performance of each precoder grow at different rates with the number of base station antennas and users. Since zero-forcing has polynomial, unparallelizable complexity, it suffers more as M and K increase. This indicates that the optimal number of users to serve is dependent on the precoding technique due to these differences in computational overhead.
2. Hardware Capability

The base station's hardware determines computation and data transport latency. After CSI estimation, the base station must perform the linear precoding computation before data transmission. Any delay caused by this processing results in a direct capacity loss. All linear precoding techniques require the same computation to apply the beam weights. Additionally, even traditional baseband processing for wideband systems, such as OFDM, can cause substantial delay. However, since these overheads are common to both zero-forcing and conjugate, we omit them from our analysis as they do not provide additional insight into the performance tradeoffs; they essentially have the effect of further shortening the coherence time.

While conjugate beamforming requires negligible computation beyond the basic linear precoder, zero-forcing has polynomial time complexity with regard to the number of base station antennas and users, and its matrix inverse operations have internal data dependencies which prevent them from being fully parallelized. Additionally, zero-forcing has a central data dependency: i.e., it requires CSI from each base station antenna at a central location to compute the beamforming weights, then these weights must be sent back to each of the radios. When the base station has a large number of radios serving many users across a large bandwidth, this simple data transportation results in significant overhead, thereby decreasing the amount of usable coherence time. Thus, the performance of zero-forcing is dependent on the base station's matrix inverse and data transport performance, as well as channel bandwidth, as further described below.

Matrix Inversion. Matrix inversions have internal data dependencies which prevent full parallelization of the algorithm. As the number of simultaneously served users increases, the resulting inverse latency increase cannot be compensated for with additional hardware.

Matrix inversion is an $O(MK^2)$ operation, and thus the incurred latency scales cubically with the number of concurrently served users (since $M \ge K$). Each of the component operations is a CORDIC rotation or division, which is orders of magnitude more time and resource intensive than simple multiplications and additions (matrix multiplication is also $O(MK^2)$ but far less complex and can be fully parallelized).

Additionally, the inversion must be performed for each coherence bandwidth interval across the entire wide band. For example, a system similar to LTE with a 40 MHz bandwidth and a coherence interval of 210 kHz requires 191 of these inverses.

Examples of realtime performance for such a system are dependent on the type of hardware employed. We consider two realistic inversion engines. On the lower, cheaper end, we consider a high performance desktop CPU (Intel i7, 4 core, using MKL/SSE) and benchmark the matrix inversion performance. Given that each inverse can be computed in parallel, this system can perform 4 inverses at a time; thus, such a system can perform 191 15×15 matrix inversions in approximately 2500 µs. The best case method of performing a matrix inverse is to use dedicated inversion hardware such as an FPGA or ASIC. This method is far more expensive to implement, but would be appropriate for use in a next generation base station. We consider the FPGA complex matrix inversion specified in [1] and compute the expected inverse latency. For this ideal system, 191 15×15 inversions can be computed in approximately 260 µs, almost an order of magnitude less than the CPU method. Note that due to the non-parallelizable nature of the inverse algorithm, this overhead is not easily addressed by Moore's law, as additional cores cannot reduce the latency of an inverse, which grows with the number of users being served.
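The inversion counts above follow from simple arithmetic, sketched here. The per-inverse latency is back-computed from the quoted 2500 µs total under the assumption of perfect 4-way parallelism; it is an inferred figure for illustration, not a number reported in the paper, and the function names are hypothetical.

```python
import math

def num_inverses(bandwidth_hz, coherence_bw_hz):
    """One inverse is needed per coherence-bandwidth interval across the band."""
    return math.ceil(bandwidth_hz / coherence_bw_hz)

def sequential_rounds(n_inverses, parallel_units):
    """Independent inverses can be spread over cores; each inverse stays serial."""
    return math.ceil(n_inverses / parallel_units)

n = num_inverses(40e6, 210e3)                 # LTE-like: 40 MHz band, 210 kHz coherence
rounds = sequential_rounds(n, parallel_units=4)  # 4-core desktop CPU

# Assumption: the ~2500 us total is split evenly over the sequential rounds.
implied_per_inverse_s = 2500e-6 / rounds

print(n, rounds, round(implied_per_inverse_s * 1e6, 1))  # 191 48 52.1
```

Adding cores reduces `rounds` but never the latency of a single inverse, which is the sense in which this overhead does not shrink with more parallel hardware.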

Data Transport Performance. Current data transport hardware, such as Ethernet or InfiniBand, ranges in throughput from 1 Gbps to over 40 Gbps. Along with inversion latency, data transport latency significantly detracts from the performance of zero-forcing transmissions due to the inherent, centralized data dependency.

This requires each channel vector to be transported from the radio, through a switch, to the central controller. Once the inverse is computed, the beamforming weights must be sent back to the radios. Thus this process requires two data transmissions (CSI forward and weights backward), each of which includes the hop latency of traveling through the switch, as well as propagation delay. The propagation delay exceeds 5 µs per kilometer, given the reduced speed of light in fiber optic cables. In general, the amount of data in both directions is symmetric, as there is both a CSI estimate and a beamweight required for each antenna on each coherence bandwidth.

Gigabit Ethernet (GbE) can transport data at a rate of 1 Gbps to 40 Gbps and has an incurred hop latency of approximately 20 µs [6]. Common Public Radio Interface (CPRI), which has similar performance to Ethernet, is typically used for data transport in cellular systems; however, it is specialized for sending continuous synchronized I/Q samples, and would have to be altered to support this application. For the round trip transportation of 191 15×15 matrices (with 32 bit complex values), a 10 GbE system incurs a latency of at least 355 µs. InfiniBand is a faster, more expensive transportation system intended for supercomputing clusters that is capable of 40 Gbps throughput with only 1 µs hop latencies [4]. For the round trip transportation of 191 15×15 matrices, this system incurs a latency of approximately 70 µs.
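The two round-trip figures quoted above can be approximately reproduced with the sketch below. It assumes that each direction crosses two hops (radio to switch, then switch to controller), that propagation delay is negligible, and that the payload is exactly the 191 matrices of 15×15 values at 32 bits each. The hop count is our reading, chosen because it matches the 355 µs figure, and the function name is hypothetical.

```python
def round_trip_transport_latency(n_matrices, rows, cols, bits_per_value,
                                 throughput_bps, hop_latency_s,
                                 hops_per_direction=2):
    """CSI forward plus weights backward, with symmetric payloads.

    hops_per_direction=2 is an assumption (radio -> switch -> controller).
    """
    payload_bits = n_matrices * rows * cols * bits_per_value
    one_way_s = payload_bits / throughput_bps + hops_per_direction * hop_latency_s
    return 2 * one_way_s

ten_gbe = round_trip_transport_latency(191, 15, 15, 32, 10e9, 20e-6)
infiniband = round_trip_transport_latency(191, 15, 15, 32, 40e9, 1e-6)

print(f"10 GbE: {ten_gbe * 1e6:.0f} us, InfiniBand: {infiniband * 1e6:.0f} us")
# 10 GbE: 355 us, InfiniBand: 73 us
```

On 10 GbE the serialization time dominates, while on InfiniBand both terms shrink, so the remaining zero-forcing overhead shifts toward the inversion itself.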

Notably, the data being sent to each user must also be distributed to all of the radios; however, this is a common requirement for all precoding techniques, would likely use a separate data link, and is much less sensitive to latency.

Channel Bandwidth. Practical communication systems use wide channel bandwidths in order to increase capacity. Unfortunately, as mentioned above, the frequency response of this channel is not flat, thus CSI estimation and precoding computation have to be repeated at regular intervals across the band. Thus, the number of inverses and amount of data transport required scale linearly with the bandwidth. In current LTE standards the largest channel bandwidth is 40 MHz (20 MHz downlink and 20 MHz uplink, in FDD), whereas the next generation of WiFi, 802.11ac, goes up to 160 MHz bandwidths (two bonded 80 MHz bands).

PERFORMANCE MODEL

Using the factors discussed in the previous section, we now present the model which dictates the real-world performance of these linear precoding techniques. These factors exhibit complex interactions in real-world systems; we use our model to capture these interactions and analyze their impact on practical performance. The subscripts ZF and C denote zero-forcing and conjugate, respectively.

Variable	Description	Unit
$C_t$	Coherence time	s
$C_b$	Coherence bandwidth	Hz
$\gamma$	Spectral efficiency per user	bps/Hz/u
$K$	# users	u
$M$	# base station antennas
$S$	Data transport throughput	bps
$L$	Data transport hop latency	s
$T_{-1}$	Time to perform an inverse	s
$N_b$	# bits per CSI	bits
$B$	Bandwidth	Hz
$\eta$	% of time transmitting data	%
$E$	Channel est. overhead	s
$P$	Total processing time	s
$\Gamma$	Achieved aggregate capacity	bps/Hz

Table 1: Parameters. Upper set are model inputs categorized by environment and design. Lower set are model variables.

Model Derivation

The goal of this model is to find the real-world achieved capacity of a linear precoding system when given the channel coherence, number of base station antennas, number of users, hardware capability, precoder spectral efficiency, and bandwidth. At a high level, the system capacity, $\Gamma$, can be shown in terms of $\gamma$, which is determined by the environmental factors, and $\eta$, which is a result of the design factors:

$$\Gamma = \eta \cdot \gamma \cdot K \qquad (1)$$

This equation describes simultaneous data transmission to $K$ users at a rate of $\gamma$ bps/Hz each; however, due to the overhead of channel estimation ($E$) and processing ($P$), we can actually only transmit $\eta$ percent of each coherence time ($C_t$), where:

$$\eta = \frac{C_t - E - P}{C_t} \qquad (2)$$

For each user, it takes $1/C_b$ time to collect accurate channel information for the whole spectrum (since each spectrum block can be measured in parallel), thus:

$$E = \frac{K}{C_b} \qquad (3)$$

Since conjugate does not require central processing, it has no processing overhead, so $P_C = 0$. However, due to the centralized processing requirements of zero-forcing, it must spend a large amount of time in data transport and computing inverses, and thus has a substantial additional overhead:

$$P_{ZF} = 2 \cdot \left( \frac{M \cdot K \cdot \frac{B}{C_b} \cdot N_b}{S} + L \right) + \frac{B}{C_b} \cdot T_{-1} \qquad (4)$$

The first part of the equation accounts for the time it takes to send the $B/C_b$ channel vectors, each with $K$ entries that have $N_b$ bits, from the $M$ antennas to the central processor over a connection with a speed of $S$ and hop latency of $L$ (which includes propagation delay due to cable length). This is doubled, since the central processor then has to send the beamweights back to each of the $M$ radios. If the size of the beamweights and CSI differ, due to the use of codebooks, compression, or quantization, the forward and reverse links can be trivially separated to account for this asymmetry. The second component accounts for the amount of time it takes to perform the $K \times K$ inverses for each of the $B/C_b$ coherence bandwidths.

4.3 Complete Model

Combining all of the factors we see that the modeled throughput for conjugate is:

$$\Gamma_C = \frac{C_t - \frac{K}{C_b}}{C_t} \cdot \gamma_C \cdot K \qquad (5)$$

And for zero-forcing is:

$$\Gamma_{ZF} = \frac{C_t - \frac{K}{C_b} - \left[ 2 \cdot \left( \frac{M \cdot K \cdot \frac{B}{C_b} \cdot N_b}{S} + L \right) + \frac{B}{C_b} \cdot T_{-1} \right]}{C_t} \cdot \gamma_{ZF} \cdot K \qquad (6)$$

SIMULATION

Leveraging our model we analyze the performance of practical many-antenna linear precoding under realistic constraints. We focus on scenarios where the performance of conjugate and zero-forcing cross, as they highlight the conditions when it is important to consider the tradeoffs between the two precoding techniques.
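The performance model of Section 4 can be sketched in a few lines of Python: capacity is the fraction of each coherence time left after channel estimation and processing, multiplied by the per-user spectral efficiency and the number of users. The class and function names are illustrative. The per-inverse time and the spectral efficiency of 1.0 bps/Hz/user are placeholder values, not measurements from the paper, and clamping the transmit fraction at zero is our addition for the case where overhead exceeds the coherence time.

```python
from dataclasses import dataclass

@dataclass
class ModelParams:
    Ct: float     # coherence time (s)
    Cb: float     # coherence bandwidth (Hz)
    K: int        # number of users
    M: int        # number of base station antennas
    S: float      # data transport throughput (bps)
    L: float      # data transport hop latency (s)
    T_inv: float  # time to perform one inverse (s); placeholder value below
    Nb: int       # bits per CSI value
    B: float      # channel bandwidth (Hz)

def estimation_overhead(p):
    """Channel estimation overhead: one pilot of duration 1/Cb per user."""
    return p.K / p.Cb

def zf_processing_overhead(p):
    """Zero-forcing overhead: round-trip transport plus one inverse per band."""
    bands = p.B / p.Cb
    transport = 2 * (p.M * p.K * bands * p.Nb / p.S + p.L)
    return transport + bands * p.T_inv

def capacity(p, gamma, processing=0.0):
    """Aggregate capacity (bps/Hz); processing=0 models conjugate."""
    usable = p.Ct - estimation_overhead(p) - processing
    fraction = max(0.0, usable / p.Ct)  # clamp is our addition
    return fraction * gamma * p.K

# Hypothetical low-end configuration: 1 Gbps Ethernet, 20 us hops, 30 ms coherence.
low_end = ModelParams(Ct=30e-3, Cb=210e3, K=15, M=64, S=1e9, L=20e-6,
                      T_inv=13e-6, Nb=32, B=40e6)

cap_conj = capacity(low_end, gamma=1.0)
cap_zf = capacity(low_end, gamma=1.0,
                  processing=zf_processing_overhead(low_end))
```

With equal spectral efficiency the zero-forcing figure is always the lower one; in practice zero-forcing has the higher per-user efficiency, so the interesting question is whether that advantage outweighs its overhead at a given coherence time.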

Simulation Methodology

Using the performance model described in Section 4, we input a range of realistic parameter values and analyze their impact on performance. As defined in Table 1, there are 11 input parameters to the model; in order to reduce the dimensionality in the presented results, we hold $C_b$, $M$, $N_b$, and $B$ constant, as they yield the least interesting impacts on performance. For all experiments we base the coherence bandwidth, $C_b$, and channel width, $B$, on LTE, which defines $C_b$ = 210 kHz and $B$ = 40 MHz (20 MHz uplink and 20 MHz downlink). Our platform supports up to 64 base station antennas, so $M$ = 64. We choose the number of bits in channel estimates and beamweights to be 32 (16 real and 16 imaginary), as this offers low quantization error, and is the width used by our implementation.

We then vary the remaining 7 parameters as follows: We look at channel coherence times, $C_t$, that range from 500 µs to 100 ms, which are reasonable for real-world mobility, and in line with the LTE parameters. Using the many-antenna base station implementation described in [8] we collect the real-world spectral efficiency, $\gamma$, achieved by conjugate and zero-forcing precoding as the number of users, $K$, varies from 1 to 15. In order to assess the impact of hardware capability, $S$, $D$, $L$, and $T_{-1}$, on capacity, we devise four base stations which range from low-end hardware using Ethernet to high-end custom FPGA designs using InfiniBand; the specifications are provided in Figure 1 [6, 4]. We assume that processing is local, and thus propagation delay is negligible.

[Figure 1 plot. Axes: Coherence Time (s) vs. Achieved Capacity (bps/Hz).]

Configuration	Type	S	L	Inv. Type
Super	InfiniBand	40 Gbps	1 µs	FPGA
Cluster	4x10GbE	40 Gbps	20 µs	8xIntel i7
High	2x10GbE	20 Gbps	20 µs	4xIntel i7
Mid	10GbE	10 Gbps	20 µs	2xIntel i7
Low	GbE	1 Gbps	20 µs	Intel i7

Figure 1: Zero-forcing and conjugate performance comparison for different hardware configurations in a M=64, K=15 system.

Results

The main factors which affect the performance tradeoffs between conjugate and zero-forcing are coherence time, hardware capability, and number of users. We design simulations which analyze each of these factors, and clearly show their impact on the tradeoff between conjugate and zero-forcing.

5.2.1 Coherence Time and Hardware Capability

We first look at the achieved capacity of conjugate and zero-forcing with regard to coherence time. Figure 1 shows that while serving 15 users simultaneously, conjugate beamforming outperforms zero-forcing at coherence times up to 38 ms in the low-end base station. We clearly see that as the coherence time drops, the overhead of zero-forcing dominates its capacity.

However, we can also see in Figure 1 that, given the specialized super high performance central processor and switch, we can reduce this tradeoff point to below 1.5 ms. Even using very high-end servers, it is still very difficult to reduce the tradeoff point to below 5 ms.

5.2.2 Number of Users

Finally, we note that as the number of users grows, the performance of zero-forcing quickly degrades under the constraint of low coherence times, as the overhead from data transport and processing dominates its capacity. Figure 2 demonstrates a scenario where conjugate begins to outperform zero-forcing with more users; with 4-6 users their performance is equivalent, but as the number of users grows to 15, zero-forcing achieves only 65% of the capacity of conjugate. This also demonstrates the criticality of choosing the optimal number of users to serve, as the capacity of zero-forcing peaks at 11 users under these constraints. We use the low-end hardware to demonstrate these effects; however, higher-end hardware will also show this behavior as the number of users increases. Our models show that $\eta \cdot K$, the product of the fraction of time spent transmitting data and the number of users (an indicator of peak capacity), under the same 30 ms coherence and 64 base station antenna scenario, is maximal at 49 users, 73 users, 83 users, and 101 users, for the mid, high, cluster, and super hardware configurations, respectively.

[Figure 2 plot. Axes: Number of Users vs. Achieved Capacity (bps/Hz).]

Figure 2: Zero-forcing and conjugate performance comparison versus number of terminals at a fixed coherence time of 30 ms with low-end hardware.
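The tradeoff points discussed in Section 5.2.1 can be related back to the model by equating the conjugate and zero-forcing capacities. The short derivation below is not stated in the paper; it follows from the complete model under the assumption that the zero-forcing processing overhead is fixed for a given hardware configuration and number of users. Here $\Gamma$ is the aggregate capacity, $\gamma_C$ and $\gamma_{ZF}$ are the per-user spectral efficiencies of conjugate and zero-forcing, $E = K/C_b$ is the channel estimation overhead, $P_{ZF}$ is the zero-forcing processing overhead, and $C_t^{*}$ is our notation for the crossover coherence time.

```latex
\begin{aligned}
\Gamma_C &= \Gamma_{ZF} \\
\frac{C_t - E}{C_t}\,\gamma_C\,K &= \frac{C_t - E - P_{ZF}}{C_t}\,\gamma_{ZF}\,K \\
(C_t - E)\,\gamma_C &= (C_t - E - P_{ZF})\,\gamma_{ZF} \\
C_t^{*} &= E + P_{ZF}\,\frac{\gamma_{ZF}}{\gamma_{ZF} - \gamma_C},
\qquad \gamma_{ZF} > \gamma_C
\end{aligned}
```

Under these assumptions conjugate is preferable whenever $C_t < C_t^{*}$. The threshold grows with $P_{ZF}$ (slower transport or inversion hardware) and as $\gamma_C$ approaches $\gamma_{ZF}$, which is consistent with the trends across hardware configurations and user counts described above.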

5.3 Implications

These results indicate that our model can play two important roles in the development of many-antenna base stations: (i) guiding base station design and (ii) enabling adaptive precoding. We find that conjugate beamforming will be better suited for high frequency bands where coherence is lower and antenna arrays have much smaller form factors, whereas zero-forcing will be more appropriate at lower frequencies with fewer antennas. The actual tradeoff frequencies between these regimes will be a function of user mobility and hardware implementation, and in the tradeoff region adaptive precoding will be useful.

Base station design. Using our model, base station architects can appropriately provision their design to meet real-world performance requirements. By measuring the environmental factors, they can determine the design constraints they need to meet in order to achieve their performance goals. This can help them avoid costly mistakes, such as investing in a zero-forcing system for an environment with very short coherence time.

Adaptive Precoding. The optimal precoding technique varies according to factors which change in realtime, such as the number of users or channel coherence. Thus, for deployments that encompass the tradeoff points highlighted by our results, it will be advantageous to dynamically switch between conjugate and zero-forcing through adaptive precoding. Since users exhibit widely varying mobility, their coherence time may drop below the threshold where zero-forcing is optimal, and thus the system should dynamically switch to conjugate. Notably, users can be scheduled in groups based on mobility, and thus the precoding can not only be adaptive across time and frequency, but user grouping as well.
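One way adaptive precoding could be driven by the model is sketched below: for each user group, evaluate the modeled capacity of both precoders at the group's current coherence time and pick the larger. This is a minimal illustration rather than the authors' implementation. The function names, the zero-forcing overhead of about 14.2 ms, and the spectral efficiencies of 0.45 and 1.0 bps/Hz/user are hypothetical placeholder values.

```python
def modeled_capacity(ct, overhead, gamma, k):
    """Capacity = usable fraction of the coherence time * per-user rate * users."""
    fraction = max(0.0, (ct - overhead) / ct)  # clamp is our addition
    return fraction * gamma * k

def select_precoder(ct, k, cb, p_zf, gamma_c, gamma_zf):
    """Return (precoder name, modeled capacity) for one user group.

    ct: coherence time (s), k: users in the group, cb: coherence bandwidth (Hz),
    p_zf: zero-forcing processing overhead (s), gamma_*: per-user spectral
    efficiency (bps/Hz/user) for each precoder.
    """
    e = k / cb  # channel estimation overhead, shared by both precoders
    cap_c = modeled_capacity(ct, e, gamma_c, k)
    cap_zf = modeled_capacity(ct, e + p_zf, gamma_zf, k)
    if cap_c >= cap_zf:
        return "conjugate", cap_c
    return "zero-forcing", cap_zf

# Hypothetical groups: fast-moving users (10 ms) and near-static users (100 ms).
fast = select_precoder(10e-3, 15, 210e3, 14.2e-3, gamma_c=0.45, gamma_zf=1.0)
slow = select_precoder(100e-3, 15, 210e3, 14.2e-3, gamma_c=0.45, gamma_zf=1.0)
print(fast[0], slow[0])  # conjugate zero-forcing
```

Grouping users by mobility, as suggested above, amounts to calling such a selector once per group with that group's coherence time instead of once for the whole cell.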

DISCUSSION AND FUTURE WORK

It is typically very difficult to capture the behavior and performance of complex real-world systems using an analytical model. Our approach addresses this issue by separating the erratic and complex behavior of the environment from the deterministic overhead imposed by the hardware design. This enables system architects to identify and address critical high-level design factors which affect performance from a hardware design perspective, then leverage empirical measurements of the environmental factors from the target topology to estimate real-world performance.

Clearly every system design has much more complex internal interactions, such as multiple levels of hardware, software, and data interconnects, which determine the actual overhead of the high-level factors. These design details can easily be incorporated into the model. As we develop our own realtime adaptive precoding system we are iteratively refining this abstract model to incorporate concrete implementation details specific to our design. Additionally, as we collect more experimental data from various propagation environments, with more simultaneous users, we will further hone the accuracy and applicability of the model.

We also note that the simulation results presented are a very conservative estimate of the real-world tradeoff points; the parameters chosen are reasonable estimates intended to demonstrate the behavior and trends of the model. Many of the common overheads, such as cyclic prefix, synchronization, and control, are omitted from the analysis, and have essentially the same effect as reducing the coherence time. Furthermore, many of the overhead estimates, e.g., data transport, computation, and CSI collection, represent idealized lower bounds rather than values expected in a full implementation. However, these values are design and environment specific, and should be determined on a per-system basis, then incorporated into the model accordingly.
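The remark that common overheads act like a reduction in coherence time can be checked numerically. The sketch below is not the model from this work. It assumes a toy relation in which a precoder's net rate is its raw rate multiplied by (1 - overhead / coherence time), and it solves for the coherence time at which the two precoders break even. The function `crossover_us` and all numbers are hypothetical.

```python
def crossover_us(rate_zf: float, oh_zf_us: float,
                 rate_conj: float, oh_conj_us: float) -> float:
    """Coherence time T (us) at which both precoders give the same net rate.

    Solves rate_zf * (1 - oh_zf / T) == rate_conj * (1 - oh_conj / T) for T,
    under the assumed toy relation described in the text.
    """
    if rate_zf <= rate_conj:
        raise ValueError("sketch assumes zero-forcing has the higher raw rate")
    return (rate_zf * oh_zf_us - rate_conj * oh_conj_us) / (rate_zf - rate_conj)


# Illustrative values only, not taken from the paper.
base = crossover_us(30.0, 400.0, 20.0, 100.0)

# Add the same overhead (e.g. cyclic prefix, synchronization, control) to
# both precoders and recompute the break-even coherence time.
common_us = 150.0
shifted = crossover_us(30.0, 400.0 + common_us, 20.0, 100.0 + common_us)

if __name__ == "__main__":
    print(base, shifted, shifted - base)
```

Under this assumed relation, adding a common overhead to both precoders moves the break-even point up by exactly that overhead, which is the same as every user having a correspondingly shorter coherence time. This is consistent with the statement that omitting such overheads makes the presented tradeoff points conservative.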
RELATED WORK

While there is a plethora of theoretical work on many-antenna base stations, due to the recent nature of this area, to the best of our knowledge, only one work explores the tradeoffs between linear precoding techniques. In [9], Yang et al. analyze the radiated power and computational requirements of conjugate and zero-forcing linear precoders. However, when determining the performance of the precoders, the authors do not account for the time it takes to perform these additional computations, nor do they consider other practical implementation issues, such as the data transport overhead or the non-parallelizable nature of inverses. Their simulations assume a channel coherence time of 933 µs, which, as we have shown, can cause serious performance degradation in zero-forcing. While this work is very insightful from a theoretical perspective, particularly with regard to energy and spectral efficiency, it neglects the practical implementation challenges facing many-antenna precoding, which drastically affect real-world performance.
CONCLUDING REMARKS

Many-antenna base stations show enormous potential in multiplying the spectral capacity of wireless systems. However, it is imperative to discover and understand the real-world factors which affect their performance in order to design systems which achieve their potential capacity gain. We have analyzed and described the critical system factors which discrepantly affect the performance of the two predominant linear precoders envisioned for many-antenna beamforming. Contrary to some existing theoretical analysis, our results indicate that conjugate beamforming likely outperforms zero-forcing in many realistic scenarios. Our robust model can not only be used to help guide system design and provisioning, but also indicates that base stations can greatly benefit from adaptive precoding, enabling them to dynamically switch to the optimal precoding technique as the users and environment vary.

ACKNOWLEDGEMENTS

This work was funded in part by NSF grants CRI 0751173, MRI 0923479, NetSE 101283, MRI 1126478 and CNS 1218700. Clayton Shepard was supported by an NDSEG fellowship.

We thank Ashutosh Sabharwal, Edward Knightly, Chris Hunter, and Patrick Murphy for their input and support.

REFERENCES

[1] Altera. Floating-Point Megafunctions User Guide, Nov. 2011. Available at: www.altera.com/literature/ug/ug_altfp_mfug.pdf.
[2] E. Aryafar, N. Anand, T. Salonidis, and E. Knightly. Design and experimental evaluation of multi-user beamforming in Wireless LANs. In Proc. ACM MobiCom, 2010.
[3] F. Fernandes, A. Ashikhmin, and T.L. Marzetta. Inter-cell interference in noncooperative TDD large scale antenna systems. IEEE Journal on Selected Areas in Communications, 2013.
[4] InfiniBand. Available at: www.infinibandta.org.
[5] T.L. Marzetta. Noncooperative cellular wireless with unlimited numbers of base station antennas. IEEE Trans. on Wireless Communications, 2010.
[6] Netgear. PROSAFE 52-Port Gigabit Stackable Switch. Available at: www.netgear.com/business/products/switches/stackable-smart-switches/GS752TXS.aspx#two.
[7] H. Ngo. Performance Bounds for Very Large Multiuser MIMO Systems. PhD thesis, Linköping University, The Institute of Technology, 2012.
[8] C. Shepard, H. Yu, N. Anand, E. Li, T. Marzetta, R. Yang, and L. Zhong. Argos: Practical many-antenna base stations. In Proc. ACM MobiCom, 2012.
[9] H. Yang and T.L. Marzetta. Performance of conjugate and zero-forcing beamforming in large-scale antenna systems. IEEE Journal on Selected Areas in Communications, 2013.
