Low Power Correlators for Multi-Standard OFDM Synchronisation


S. Lallu

Dept. of Electronics and Communication Muslim Association College of Engineering Trivandrum, India

Mrs. Kavitha Radhakrishnan

Assistant Professor

Dept. of Electronics and Communication Muslim Association College of Engineering Trivandrum, India

Abstract: Orthogonal Frequency Division Multiplexing (OFDM) is a widely used baseband modulation scheme in wireless communication systems. Being a synchronous system, OFDM depends heavily on receiver clock/timing synchronisation for its reception quality; this synchronisation is achieved by estimating the receiver's sampling clock and center frequency offsets from the received frame. OFDM data frames are prepended with special cyclic prefixes (CP) and preambles, from which the receiver estimates the synchronisation parameters using standard correlation techniques. For emerging multi-standard radio applications such as vehicle-to-vehicle communication, flexible platforms like FPGAs are ideal processing paradigms because of their computational capabilities and run-time adaptability. However, implementing correlators directly with the built-in DSP modules of modern FPGAs is not ideal for mobile wireless systems because of the resulting high dynamic power consumption. In this paper, we present the architecture of an adaptive multiplier-less correlator that can be dynamically tuned for precision and/or power consumption at run-time. Computational flexibility is also integrated into the design, making it well suited to multi-standard OFDM systems that deploy different preamble standards.

Keywords: Field programmable gate arrays; multiplier-less correlators; orthogonal frequency division multiplexing.

  1. INTRODUCTION

    Orthogonal frequency division multiplexing (OFDM) is a highly popular modulation scheme due to its ability to combat noise and multipath fading while providing high spectral efficiency. This has resulted in widespread application of OFDM in both wired and wireless environments, including high-throughput communication systems. Beyond standard applications, OFDM is also a key enabler for emerging wireless technologies such as vehicle-to-vehicle (V2V) communication and cognitive radios, where dynamic spectrum access and ad-hoc communication are among the challenges addressed by OFDM techniques.

    However, being a synchronous system, OFDM is sensitive to the frequency and timing offsets introduced by the communication channel, and it requires precise synchronisation mechanisms to maintain acceptable performance. A frequency offset at the receiver causes inter-carrier (inter-subcarrier) interference, resulting in loss of information, while asynchronous sampling (timing offset) results in the loss of data frames.

    To counter this, OFDM systems employ multiple techniques based on special patterns, called preambles, that precede OFDM data frames. The properties of the preamble are exploited using standard correlation techniques (autocorrelation or cross-correlation) to estimate the synchronisation parameters [1][2][3]. With autocorrelation, the estimate is computed from the received samples alone, by correlating them with a delayed copy of themselves; with cross-correlation, the received samples are correlated with the known preamble. While autocorrelation-based techniques are best suited for implementation on processor-based computational platforms, their performance is inferior to that of cross-correlation-based synchronisation schemes [4], especially in noisy environments. Cross-correlation synchronisers are therefore preferred in dynamic wireless systems, despite their much higher (roughly 5x) hardware requirement [4].
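The difference between the two correlation styles can be illustrated with a minimal NumPy sketch. The function names, the 16-sample QPSK symbol and the frame layout are hypothetical choices for illustration, not the 802.16 preamble. The autocorrelation metric uses only the received samples and their repetition period, while the cross-correlation metric needs the known preamble.

```python
import numpy as np

def autocorrelation_metric(r, lag, window):
    """Delay-and-correlate metric P[d] = sum_m r[d+m+lag] * conj(r[d+m]).

    Needs only the received samples: it exploits the fact that the preamble
    repeats every `lag` samples, without knowing the preamble values.
    """
    n_out = len(r) - lag - window + 1
    return np.array([np.sum(r[d + lag:d + lag + window] * np.conj(r[d:d + window]))
                     for d in range(n_out)])

def cross_correlation_metric(r, preamble):
    """Correlate the received samples with the known preamble:
    C[d] = sum_k r[d+k] * conj(preamble[k])."""
    L = len(preamble)
    return np.array([np.sum(r[d:d + L] * np.conj(preamble))
                     for d in range(len(r) - L + 1)])

# Hypothetical frame: 40 idle samples, a 16-sample QPSK symbol repeated twice, 40 idle samples
rng = np.random.default_rng(0)
short = (rng.choice([-1, 1], 16) + 1j * rng.choice([-1, 1], 16)) / np.sqrt(2)
preamble = np.tile(short, 2)
frame = np.concatenate([np.zeros(40, complex), preamble, np.zeros(40, complex)])

P = autocorrelation_metric(frame, lag=16, window=16)
C = cross_correlation_metric(frame, preamble)
print(int(np.argmax(np.abs(P))), int(np.argmax(np.abs(C))))  # both peak at the preamble start
```

In this noise-free case both metrics locate the preamble. The cross-correlator, however, integrates over the whole known preamble, with 32 products here against 16 for the autocorrelator. This is one reason it degrades more gracefully in noise, at the cost of storing the preamble and performing many more multiplications.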

    On standard processor platforms, correlation can be implemented by modelling tapped-delay lines using multiply-accumulate (MAC) operations. For novel applications like V2V, where adaptability in spectrum access is also key, reconfigurable hardware provides an ideal platform, offering multiple levels of flexibility and high computational capacity. On modern FPGAs, which feature embedded processing blocks, the pipelined DSP blocks are the natural choice for implementing the tapped-delay line structure. However, for high bit-rate OFDM systems, the synchronisation parameters need to be estimated within the duration of the preamble, which requires a highly parallel implementation. This in turn requires multiple DSP blocks to operate in tandem, resulting in large switching activity and thus high dynamic power consumption [5]. Furthermore, the large preamble structures common in many wireless systems require a large number of DSP blocks to be available on the FPGA. This demands the use of more power-hungry, high-performance FPGA families, further increasing the (static) power consumption and the cost.

    Multiplier-less correlators provide an efficient solution, in which these extensive multiply operations are replaced by shift-add operations using the coefficient decimation technique [6]. The key idea is to express the correlator coefficients as sums of powers of 2, so that multiplications become shifts and additions in the binary representation, without sacrificing synchronisation accuracy provided the quantisation level is chosen appropriately. However, a statically designed multiplier-less correlator requires the precision of the decimation to be fixed at design time, whereas in many scenarios precision could be traded off against dynamic power consumption. In this paper, we propose an architecture that extends the static multiplier-less correlator to a dynamic structure that can switch to the required level of precision at run-time, based on the current channel conditions. This allows our design to adapt dynamically to a lower-power state under good signal reception (using lower precision), while migrating to higher-precision modes as the signal quality deteriorates. In addition, the tap length (number of taps) can be chosen dynamically, enabling the same structure to be reused for multiple preamble standards and thus creating a truly dynamic multi-standard correlator structure.

  2. RELATED WORK

    Many research publications describe methods to improve the performance of OFDM systems through more accurate receiver synchronisation. Most practical schemes are based on autocorrelation, primarily because of its low computational complexity. Synchronisation based on CP and preamble symbols was proposed in [2,3,7,8]. These schemes rely on the OFDM frame structure, which begins with predefined (standard-specific) CP and preamble symbols, as shown in Fig. 1 for an IEEE 802.16 frame. In this structure, synchronisation is achieved in two steps, similar to the proposal in [9]. First, the short symbols of 64 samples may be used to estimate the coarse symbol timing offset (STO) and the fractional carrier frequency offset (CFO). Then the long sequence may be used to refine the estimates by computing the fine STO and the coarse CFO. In [9], the authors propose using autocorrelation to estimate the coarse STO and fine CFO, while the fine STO and coarse CFO are estimated using cross-correlation, achieving high accuracy at reasonable computational cost.
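The fractional-CFO step from the repeated short symbols can be sketched as follows. This is the standard delay-and-correlate phase estimator, shown under our own assumptions: a 64-sample symbol repeated four times, no noise, a CFO expressed in cycles per sample, and a hypothetical helper name. It is not the exact estimator of [9].

```python
import numpy as np

def estimate_fractional_cfo(r, start, period, n_periods):
    """Estimate the CFO in cycles/sample from a preamble that repeats every `period` samples.

    If r[n] = s[n] * exp(j*2*pi*f*n) and s repeats with `period`, then
    r[n+period] * conj(r[n]) = |s[n]|**2 * exp(j*2*pi*f*period), so the phase of the
    summed products gives f. Unambiguous only for |f| < 1 / (2 * period).
    """
    seg = r[start:start + period * n_periods]
    p = np.sum(seg[period:] * np.conj(seg[:-period]))
    return np.angle(p) / (2 * np.pi * period)

# Hypothetical check: a 64-sample short symbol repeated 4 times, with a known offset
rng = np.random.default_rng(3)
short = (rng.choice([-1, 1], 64) + 1j * rng.choice([-1, 1], 64)) / np.sqrt(2)
n = np.arange(4 * 64)
f_true = 0.002  # cycles per sample, inside the +/- 1/128 unambiguous range
rx = np.tile(short, 4) * np.exp(2j * np.pi * f_true * n)
f_hat = estimate_fractional_cfo(rx, start=0, period=64, n_periods=4)
```

The limited unambiguous range (here |f| < 1/128) is why only the fractional part is taken from the short symbols. The remaining, larger offset is resolved in the second step.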

    Implementation of FPGA-based correlators was first explored in [10], where the authors implemented an autocorrelation-based synchroniser for OFDM systems. In [11], the authors present a two-stage autocorrelation-based synchronisation scheme on FPGAs and show that the hardware cost can be substantially reduced. In [4], the authors compare cross-correlation and autocorrelation-based synchroniser implementations on FPGAs in terms of performance and overheads. They show that their improved cross-correlator achieves significant accuracy gains over the autocorrelation method, but consumes over 5x more resources. This is primarily because of the large number of multiply-accumulate blocks that had to operate in tandem in the cross-correlator design to meet the performance goals. Multiplier-less correlators were initially proposed in [6], using the coefficient decimation technique that is widely applied in digital filters. In [5], the authors compare multiplier-less and multiplier-based correlators for an 802.16d system. Their work uses a standard-specific implementation of the multiplier-less correlator, in which the preamble is hardcoded at design time. Their results show that a multiplier-less correlator can achieve precision similar to that of a multiplier-based cross-correlator while consuming a fraction of the power.

    Our work extends the correlator structure in [5] to a generic correlator that is independent of the underlying standard, allowing it to be reused across different application scenarios. Moreover, we propose to reduce the power consumption further by adapting the precision levels dynamically, based on the system conditions. The result is an optimized multiplier-less correlator structure that can be applied to multiple OFDM standards or other applications.

    Fig. 1 Preamble structure for IEEE 802.16

    Fig. 2 Transposed direct-form representation of a cross-correlator

  3. ARCHITECTURE OF CORRELATOR

    As shown in Fig. 1, the preamble of an OFDM signal consists of consecutive 64- or 128-sample sequences, or a combination of both, as in the 802.16 standard shown here [12]. For our experiment, we use both the short (64-sample) and long (128-sample) symbols to compute the cross-correlation with the known preamble. The inputs to the correlators are the real and imaginary 16-bit samples in a signed fixed-point format with 1 integer (sign) bit and 15 fractional bits (Q1.15 format). The correlator output over 64/128 samples is a complex number represented in a 21-bit format (Q6.15).
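The two fixed-point formats can be illustrated with a small sketch. The helper names are hypothetical, and it assumes round-to-nearest with saturation and the convention that the sign bit is counted among the integer bits, so Q1.15 spans [-1, 1) and Q6.15 spans [-32, 32). It also shows why the output needs extra integer bits: a sum of 64 sub-unity products can easily exceed 1.

```python
Q15 = 1 << 15  # scale factor for 15 fractional bits

def to_q1_15(x):
    """Convert a real value to a signed 16-bit Q1.15 integer (round to nearest, saturate)."""
    return max(-Q15, min(Q15 - 1, round(x * Q15)))

def to_float(q, frac_bits=15):
    """Interpret a fixed-point integer with `frac_bits` fractional bits as a real number."""
    return q / (1 << frac_bits)

def saturate_q6_15(acc):
    """Clamp an accumulator with 15 fractional bits to the signed 21-bit Q6.15 range."""
    return max(-(1 << 20), min((1 << 20) - 1, acc))

# 64 products of 0.5 * 0.5, accumulated the way a 64-tap correlator would
acc = 0
for _ in range(64):
    prod = to_q1_15(0.5) * to_q1_15(0.5)  # Q1.15 x Q1.15 -> 30 fractional bits
    acc += prod >> 15                     # drop back to 15 fractional bits
acc = saturate_q6_15(acc)
print(to_float(acc))  # 16.0
```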

    1. Mathematical operation

      Correlation can be represented as a series of multiply-accumulate operations across different delay taps. This mirrors the structure of a tapped FIR filter, as shown in the transposed direct-form representation in Fig. 2. Here, the coefficients Pr[n] correspond to the complex conjugate of the nth sample of the preamble, which can be precomputed for any standard, and Ri represents the ith input sample. The output of the correlation operation can thus be represented mathematically as

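      $$X_{\mathrm{corr}} = \mathrm{Pr}[63]\,R_i + z^{-1}\Big(\mathrm{Pr}[62]\,R_i + z^{-1}\big(\mathrm{Pr}[61]\,R_i + \cdots + z^{-1}\,\mathrm{Pr}[0]\,R_i\big)\cdots\Big) \qquad (1)$$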
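Equation (1) can be simulated directly as a sketch of the transposed direct-form structure. This floating-point model uses a hypothetical function name and a random QPSK preamble in place of a real one. Every tap multiplies the same input sample, and the z^-1 registers hold partial sums between the adders. The result equals an ordinary convolution with the reversed coefficients and peaks when the last preamble sample arrives.

```python
import numpy as np

def transposed_correlator(samples, pr):
    """Tapped-delay-line correlator in transposed direct form, following Eq. (1).

    pr[L-1] weights the newest sample directly, while pr[0] reaches the output
    only after L-1 register (z^-1) stages. Every tap multiplies the SAME input
    sample; the delays sit between the adders.
    """
    h = np.asarray(pr, dtype=complex)[::-1]   # h[d] = coefficient whose product is delayed d cycles
    regs = np.zeros(len(h), dtype=complex)    # partial sums held in the pipeline registers
    out = np.empty(len(samples), dtype=complex)
    for i, x in enumerate(samples):
        # adder d adds its product to the partial sum registered by adder d+1 last cycle
        regs = h * x + np.append(regs[1:], 0)
        out[i] = regs[0]
    return out

rng = np.random.default_rng(1)
preamble = (rng.choice([-1, 1], 64) + 1j * rng.choice([-1, 1], 64)) / np.sqrt(2)
pr = np.conj(preamble)                        # Pr[n]: complex conjugate of the preamble
rx = np.concatenate([np.zeros(10, complex), preamble, np.zeros(10, complex)])
y = transposed_correlator(rx, pr)
```

Because each tap sees the same input sample, the products for all taps can come from a single shared source, which is what the multiplier-less design exploits.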

    2. Multiplier-less correlator

    The key idea of a multiplier-less correlator is to represent the coefficients of the FIR structure in powers of 2, which allows multiplication by the coefficients to be replaced by shift-add operations on the input sequence Ri. By properly choosing the quantisation level, this approximation technique can attain the same level of accuracy as multiplier-based correlators. In our design, multiple quantisation levels are provided for the preamble simultaneously, with quantisation steps of 1.0, 0.5, 0.25 and 0.125. This allows our design to choose the accuracy required for the current channel conditions (such as SNR), and thus to optimize the power consumption at run-time.

    Fig. 3 Structure of the proposed dynamic correlator
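A minimal sketch of the two ingredients follows, assuming round-to-nearest quantisation and integer (Q1.15) samples. First, a coefficient is rounded to a power-of-two step. Then the multiplication is carried out with shifts and adds only, writing the coefficient as an integer number of 0.125 steps. The function names are hypothetical.

```python
import math

def quantise(c, step):
    """Round coefficient c in [-1, 1] to the nearest multiple of `step` (a power of two)."""
    return round(c / step) * step

def shift_add_multiply(x, q, finest=0.125):
    """Multiply integer sample x by quantised coefficient q using only shifts and adds.

    q is written as m * finest with m an integer (|m| <= 8 for finest = 0.125);
    each set bit b of |m| contributes x << b, and the final shift divides by 8.
    """
    m = round(q / finest)
    frac_bits = int(round(-math.log2(finest)))  # 3 for a 0.125 step
    acc, mag, bit = 0, abs(m), 0
    while mag:
        if mag & 1:
            acc += x << bit                     # shifted copy of the input, no multiplier
        mag >>= 1
        bit += 1
    acc >>= frac_bits                           # arithmetic shift (floors for negatives)
    return -acc if m < 0 else acc

for step in (1.0, 0.5, 0.25, 0.125):
    print(step, quantise(0.7, step))            # 0.7 -> 1.0, 0.5, 0.75, 0.75
```

For example, 0.625 = 0.5 + 0.125, so multiplying a Q1.15 sample by 0.625 costs one addition of two shifted copies. The quantisation error of each level is bounded by half its step.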

    The architecture of our proposed multiplier-less correlator is shown in Fig. 3. It is based on the transposed direct-form FIR filter representation in Fig. 2, with the multiply-add units replaced by a common shift-add block, adders, and delay lines built using registers. The quantised preamble values are stored in a register stack at the highest precision level (0.125 quantisation step). The register stack allows these values to be loaded at run-time, so the structure can be adapted to different wireless preamble formats. The shift-add block produces 8 outputs each for the real and imaginary parts, each representing a shift-added version of the input sample at a different precision. This approach enables a single shift-add block to be reused across all taps, rather than requiring 64/128 separate shift-add blocks, one per tap. The quantisation selection input to the shift-add block allows the precision to be altered dynamically by selectively enabling/disabling paths, thus reducing the dynamic power consumption (by reducing toggling).
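One plausible reading of the quantisation-selection input is sketched below. It assumes (our assumption, not stated above) that the 8 outputs of the shared block are the multiples k·x/8 for k = 1..8. A coarser level can then only ever select the multiples that lie on its grid, so the other paths can be gated. The names `STEPS`, `active_outputs` and `shift_add_bank` are hypothetical.

```python
STEPS = {"Q1": 1.0, "Q2": 0.5, "Q3": 0.25, "Q4": 0.125}

def active_outputs(level, finest=0.125, n_outputs=8):
    """Multiples k (output k = k * finest * x) that a `level` coefficient can select.
    All other outputs of the shared block can be gated off at that level."""
    stride = round(STEPS[level] / finest)
    return [k for k in range(1, n_outputs + 1) if k % stride == 0]

def shift_add_bank(x, level):
    """Shared shift-add block: k*x/8 built from shifted copies of x for enabled k.
    Disabled paths are held at 0, modelling gated (non-toggling) logic."""
    enabled = set(active_outputs(level))
    return {k: (sum(x << b for b in range(4) if (k >> b) & 1) >> 3) if k in enabled else 0
            for k in range(1, 9)}

for level in STEPS:
    print(level, active_outputs(level))
```

Under this reading, Q1 keeps one path active, Q2 two, Q3 four and Q4 all eight. This is consistent with the dynamic power rising with precision in the results below.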

    The required output for each tap is then selected by the multiplexers at the input of each adder unit. The multiplexers are controlled by the preamble symbol corresponding to each delay line, so the correctly scaled version of the input symbol (representing the multiplication operation) is accumulated at each tap to produce the correlator response. The entire structure is parameterised and can generate delay lines up to 128 taps long; this maximum length has to be chosen at design time. The run-time adaptation (within the maximum specified at design time) is performed by another multiplexer unit, which selects one of the tap values as the correlated output based on a register setting that can be managed at run-time.
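The complete datapath can be modelled behaviourally as a sketch: one shared shift-add bank per input component, a sign-aware multiplexer per tap, transposed-form accumulation, and an output-select multiplexer. The function names are hypothetical. Two points are assumptions: the bank holds m·x/8 for m = 0..8, and a shorter preamble is loaded into the taps farthest from the output, so that taking an intermediate register yields the shorter correlator.

```python
import numpy as np

def shift_add_bank(x):
    """Shared shift-add block (assumed form): entry m is m*x/8 for m = 0..8,
    formed only from shifted copies of x; computed once per input sample."""
    return [sum(x << b for b in range(4) if (m >> b) & 1) >> 3 for m in range(9)]

def tap_mux(bank, m):
    """Per-tap multiplexer: pick |m|*x/8 from the bank and apply the coefficient sign."""
    v = bank[abs(m)]
    return -v if m < 0 else v

def multiplierless_correlator(samples, h_re, h_im, out_tap=0):
    """Transposed-form complex correlator built from one shared shift-add block.

    samples: (re, im) integer pairs, e.g. Q1.15 values.
    h_re, h_im: per-tap coefficients in units of 1/8 (from the register stack),
        indexed by delay d, with d = 0 nearest the output.
    out_tap: register used as the output (models the output-select multiplexer).
        A preamble of length L is assumed to sit in taps out_tap .. out_tap+L-1.
    """
    n = len(h_re)
    regs_re, regs_im = [0] * n, [0] * n
    out = []
    for a, b in samples:
        bank_a, bank_b = shift_add_bank(a), shift_add_bank(b)   # shared by all taps
        new_re, new_im = [0] * n, [0] * n
        for d in range(n):
            # (a + jb)(c + js) = (ac - bs) + j(as + bc); each product is a bank lookup
            p_re = tap_mux(bank_a, h_re[d]) - tap_mux(bank_b, h_im[d])
            p_im = tap_mux(bank_a, h_im[d]) + tap_mux(bank_b, h_re[d])
            carry_re = regs_re[d + 1] if d + 1 < n else 0
            carry_im = regs_im[d + 1] if d + 1 < n else 0
            new_re[d], new_im[d] = p_re + carry_re, p_im + carry_im
        regs_re, regs_im = new_re, new_im
        out.append(complex(regs_re[out_tap], regs_im[out_tap]))
    return np.array(out)

# Hypothetical 8-tap build running a 4-tap preamble loaded into taps 4..7
rng = np.random.default_rng(2)
x = [(8 * int(rng.integers(-100, 100)), 8 * int(rng.integers(-100, 100))) for _ in range(20)]
h_re = [int(v) for v in rng.integers(-8, 9, size=8)]
h_im = [int(v) for v in rng.integers(-8, 9, size=8)]
y = multiplierless_correlator(x, h_re, h_im, out_tap=4)
```

The samples are multiples of 8, so every shift-add product is exact and the output matches an exact complex convolution. With arbitrary Q1.15 inputs, the `>> 3` truncation introduces small rounding differences.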

    The flexibility added to our proposed architecture could, however, increase the power consumption slightly compared to static multiplier-less correlators with fixed quantisation. For highly power-constrained designs where adaptability is not a concern, the parameters could be frozen at design time and replaced by hardwired constants, creating a customised correlator tuned to the target application. In our experiments, we evaluate hardwired versions of the design and determine the accuracy of the different quantisation levels, compared to a standard multiplier-based correlator on the same FPGA device.

  4. RESULTS

    To make the case for multiplier-less correlators and to show that such architectures provide large savings in power consumption, we also implemented a multiplier-based correlator for synchronising the short and long preamble sequences of the 802.16 frame. Our target platforms are the low-power Xilinx Spartan-6 series of FPGAs and the higher-performance Xilinx Virtex-6 series. The multiplier-based correlator is described in Verilog HDL and implemented directly using the vendor design tools (Xilinx ISE), with DSP mapping forced through settings in the compilation flow. The multiplier-less correlator is also described in Verilog HDL, by modelling its structure. To enable comparison across quantisation levels, the dynamic controls are hardwired to represent correlators of specific precision.

    Table I shows the resource utilisation of the designs on the two target platforms (Virtex-6 LX240T and Spartan-6 LX45 devices) for the 64-sample case. The multiplier-based correlator (denoted mult-based) can be mapped entirely into the DSP blocks of the FPGA, requiring no other resources to implement the functionality. However, this uses 256 DSP blocks, more than are available even on the largest Spartan-6 FPGA, which forces the use of the higher-power Virtex-6 device. On the other hand, the multiplier-less correlators (denoted mult-less) with the different precision levels (Q1: quantisation step 1, Q2: 0.5, Q3: 0.25, Q4: 0.125) easily fit into a small Spartan-6 FPGA, ensuring that the static power consumption (the device-dependent power component) is also lower. Furthermore, the highest-precision (Q4) design on the low-power Spartan-6 device has a maximum operating frequency of 128 MHz, higher than that of the multiplier-based design on the more powerful Virtex-6 device (115 MHz). Extracting the best performance from DSP-based designs requires careful low-level design, which could further improve the performance of the multiplier-based correlators. The 128-sample case uses double the DSP blocks for the multiplier-based correlators, and nearly double the number of look-up tables (LUTs) and flip-flops (FFs) for the multiplier-less correlators.

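    TABLE I. RESOURCE UTILISATION OF MULTIPLIER-BASED AND MULTIPLIER-LESS CORRELATORS

| Design | Virtex-6 LX240T DSPs | Virtex-6 LX240T FFs | Virtex-6 LX240T LUTs | Spartan-6 LX45 DSPs | Spartan-6 LX45 FFs | Spartan-6 LX45 LUTs |
|---|---|---|---|---|---|---|
| Mult-based | 256 | | | Cannot fit | Cannot fit | Cannot fit |
| Mult-less Q1 | | 2632 (1%) | 2040 (1%) | | 2635 (4%) | 2044 (7%) |
| Mult-less Q2 | | 2736 (1%) | 3436 (2%) | | 2770 (5%) | 3443 (12%) |
| Mult-less Q3 | | 2727 (1%) | 4149 (2%) | | 2727 (4%) | 4156 (15%) |
| Mult-less Q4 | | 2727 (1%) | 5228 (3%) | | 2748 (5%) | 5229 (19%) |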
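The 256-DSP figure is consistent with a simple count, sketched below under our own assumptions: a fully parallel design with no time-multiplexing of DSP slices, one slice per real multiplication, and the four-multiplier form of a complex product. The helper name is hypothetical.

```python
def dsp_blocks_needed(taps, real_mults_per_tap=4):
    """DSP slices for a fully parallel complex correlator, assuming one DSP slice per
    real multiplication and no time-multiplexing of DSPs across taps."""
    return taps * real_mults_per_tap

for taps in (64, 128):
    print(taps, dsp_blocks_needed(taps))
```

By the same count, the 128-sample case needs twice as many slices, matching the doubling reported above.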

    To evaluate the power consumption of the individual designs, we use the Xilinx power analysis tool (XPower Analyzer) with toggle rates from post-place-and-route simulation. The results are tabulated in Table II; the system is set to operate at 50 MHz. The results show that the multiplier-based correlator consumes almost 4x the dynamic power of the Q4 multiplier-less design (highest precision). Meanwhile, the static power consumption of the powerful Virtex-6 device results in a large total power consumption for multiplier-based designs, making them unsuitable for battery-powered computing systems. On the other hand, the dynamic power consumption on the Spartan-6 device is slightly higher than that of the corresponding Virtex-6 implementation, because of the slightly higher resource utilisation on the Spartan-6 device. However, the total power consumption is much lower, thanks to the large savings in static power, making the Spartan-6 device an ideal choice for power-constrained systems.
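The comparison can be checked with the Table II figures. Each total is the device's static power plus the design's dynamic power. The dictionary layout and helper name are illustrative only.

```python
# Table II values in mW; static power is a device property shared by every design
STATIC = {"Virtex-6": 3400, "Spartan-6": 40}
DYNAMIC = {  # design: {device: dynamic power}; mult-based does not fit the Spartan-6
    "mult-based": {"Virtex-6": 820},
    "Q1": {"Virtex-6": 130, "Spartan-6": 150},
    "Q2": {"Virtex-6": 168, "Spartan-6": 205},
    "Q3": {"Virtex-6": 188, "Spartan-6": 250},
    "Q4": {"Virtex-6": 210, "Spartan-6": 304},
}

def total_mw(design, device):
    return STATIC[device] + DYNAMIC[design][device]

dynamic_ratio = DYNAMIC["mult-based"]["Virtex-6"] / DYNAMIC["Q4"]["Virtex-6"]
total_ratio = total_mw("mult-based", "Virtex-6") / total_mw("Q4", "Spartan-6")
print(f"{dynamic_ratio:.1f}x dynamic, {total_ratio:.1f}x total")
```

By this arithmetic, the dynamic-power gap is about 3.9x, while the total-power gap between the multiplier-based design on the Virtex-6 and the Q4 design on the Spartan-6 is roughly 12x. This is because the static term dominates on the larger device.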

    Finally, we compared the precision of the multiplier-less designs with that of the multiplier-based correlator using the 64-sample and 128-sample preamble configurations. To model the effect of the channel, we simulated an AWGN channel in MATLAB and used its output as the input to the correlators, at different noise levels. We observed that in all cases where the receiver SNR was above 16 dB, even the lowest-precision multiplier-less correlator (Q1) produced correlation results close to those of the multiplier-based design. For SNRs below 8 dB, however, the higher-precision models still followed the multiplier-based correlator outputs closely, making a case for highly precise, lower-power correlators based on multiplier-less designs. This allows our design to adapt its precision based on SNR estimates, further reducing power consumption.
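The experiments used MATLAB. An equivalent AWGN model and a simple run-time precision policy can be sketched in NumPy as follows. The function names and the threshold table are hypothetical. Only the 16 dB boundary for Q1 reflects the observations above; the other boundaries are placeholders that would need calibration against measured correlator error.

```python
import numpy as np

def awgn(x, snr_db, rng):
    """Add complex white Gaussian noise so that mean signal power / noise power = snr_db."""
    p_signal = np.mean(np.abs(x) ** 2)
    p_noise = p_signal / 10 ** (snr_db / 10)
    noise = np.sqrt(p_noise / 2) * (rng.standard_normal(len(x)) + 1j * rng.standard_normal(len(x)))
    return x + noise

# Hypothetical policy: coarsest level whose SNR threshold is met. Only the 16 dB point for Q1
# reflects the observations above; the 12 dB and 8 dB boundaries are placeholders to calibrate.
THRESHOLDS = ((16.0, "Q1"), (12.0, "Q2"), (8.0, "Q3"))

def select_precision(snr_db):
    for limit, level in THRESHOLDS:
        if snr_db >= limit:
            return level
    return "Q4"

rng = np.random.default_rng(0)
tx = np.exp(2j * np.pi * 0.01 * np.arange(100_000))   # unit-power test signal
rx = awgn(tx, 10.0, rng)
measured = 10 * np.log10(np.mean(np.abs(tx) ** 2) / np.mean(np.abs(rx - tx) ** 2))
```

In a receiver, the SNR estimate driving `select_precision` would itself come from the synchronisation stage. The selected level would then be written to the quantisation-selection register.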

    TABLE II. POWER ESTIMATES FOR THE DIFFERENT DESIGNS

| Design | Virtex-6 LX240T Static | Virtex-6 LX240T Dynamic | Virtex-6 LX240T Total | Spartan-6 LX45 Static | Spartan-6 LX45 Dynamic | Spartan-6 LX45 Total |
|---|---|---|---|---|---|---|
| Mult-based | 3.4 W | 820 mW | 4.22 W | 40 mW | Cannot fit | Cannot fit |
| Mult-less Q1 | 3.4 W | 130 mW | 3.53 W | 40 mW | 150 mW | 190 mW |
| Mult-less Q2 | 3.4 W | 168 mW | 3.57 W | 40 mW | 205 mW | 245 mW |
| Mult-less Q3 | 3.4 W | 188 mW | 3.59 W | 40 mW | 250 mW | 290 mW |
| Mult-less Q4 | 3.4 W | 210 mW | 3.61 W | 40 mW | 304 mW | 344 mW |

  5. CONCLUSION

OFDM is a highly popular modulation scheme that requires precise synchronisation at the receiver to achieve high bit-rates. Multiplier-based cross-correlators provide the highest synchronisation accuracy but are impractical for power-constrained designs. We have presented a dynamic multiplier-less correlator structure that provides high levels of precision along with run-time adaptability based on the receiver SNR, allowing the system to achieve optimal performance for the prevailing channel conditions. This structure can be dynamically adapted to different preamble styles and lengths, making it well suited to next-generation OFDM systems.

ACKNOWLEDGMENT

I express my sincere gratitude to the teaching staff of the Department of Electronics and Communication Engineering, MACE, for their wholehearted cooperation and encouragement. I am indebted to my friends, family and parents for their prayers, and thank them for their understanding, support and encouragement.

REFERENCES

  1. L. Hanzo and T. Keller, OFDM and MC-CDMA: A Primer. New York: Wiley, 2006.

  2. J.-J. van de Beek, M. Sandell, M. Isaksson, and P. O. Borjesson, "Low-complex frame synchronization in OFDM systems," in Proceedings of the International Conference on Universal Personal Communications, Nov. 1995, pp. 982-986.

  3. N. Lashkarian and S. Kiaei, "Class of cyclic-based estimators for frequency-offset estimation of OFDM systems," IEEE Transactions on Communications, vol. 48, no. 12, pp. 2139-2149, Dec. 2000.

  4. A. Fort, J.-W. Weijers, V. Derudder, W. Eberle, and A. Bourdoux, "A performance and complexity comparison of auto-correlation and cross-correlation for OFDM burst synchronization," in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, vol. 2, 2003, pp. 341-344.

  5. T. H. Pham, S. A. Fahmy, and I. V. McLoughlin, "Low-power correlation for IEEE 802.16 OFDM synchronization on FPGA," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 21, no. 8, pp. 1549-1553, 2013.

  6. K.-W. Yip, Y.-C. Wu, and T.-S. Ng, "Design of multiplierless correlators for timing synchronization in IEEE 802.11a wireless LANs," IEEE Transactions on Consumer Electronics, vol. 49, no. 1, pp. 107-114, Feb. 2003.

  7. T. Fusco and M. Tanda, "ML-based symbol timing and frequency offset estimation for OFDM systems with noncircular transmissions," IEEE Transactions on Signal Processing, vol. 54, no. 9, pp. 3527-3541, Sep. 2006.

  8. T. Schmidl and D. Cox, "Robust frequency and timing synchronization for OFDM," IEEE Transactions on Communications, vol. 45, no. 12, pp. 1613-1621, Dec. 1997.

  9. T.-H. Kim and I.-C. Park, "Low-power and high-accurate synchronization for IEEE 802.16d systems," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 16, no. 12, pp. 1620-1630, Dec. 2008.

  10. C. Dick and F. Harris, "FPGA implementation of an OFDM PHY," in Conference Record of the Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, Nov. 2003.

  11. K. Wang, J. Singh, and M. Faulkner, "FPGA implementation of an OFDM-WLAN synchronizer," in Proceedings of the International Workshop on Electronic Design, Test and Applications, Jan. 2004, pp. 89-94.
