FPGA implementation of MB-OFDM with parallel architecture



M.E. (Applied Electronics), Arunai Engineering College, Tiruvannamalai



Multi-band orthogonal frequency-division multiplexing (MB-OFDM) is one of the ultra-wideband radio standards. It provides high-speed connectivity in a wireless personal area network and must process a large amount of computation in a short time to support high data rates. To satisfy this performance requirement while reducing power consumption, a multi-way parallel architecture based on a bi-orthogonal encoder is proposed. This project introduces several novel optimization techniques for a resource-efficient implementation of a baseband modem with a highly parallel (8-way) architecture, such as new processing structures for the (de)interleaver and the packet synchronizer. The designed OFDM system is integrated with a MIMO architecture for high-data-rate applications, and it can detect and correct random as well as burst errors.

Index Terms: Baseband modem, multi-band orthogonal frequency-division multiplexing (MB-OFDM), parallel architecture, resource optimization, ultra wideband (UWB), multiple-input multiple-output (MIMO).


Orthogonal frequency-division multiplexing (OFDM) is a multi-carrier digital modulation technique that has been recognized as an excellent method for high-speed bi-directional wireless data communication. OFDM squeezes multiple modulated carriers tightly together, reducing the required bandwidth while keeping the modulated signals orthogonal so that they do not interfere with each other. OFDM is similar to FDM but much more spectrally efficient, because it spaces the sub-channels much closer together, to the point that they overlap. This is possible because the subcarrier frequencies are chosen to be orthogonal, i.e., perpendicular in a mathematical sense, so the spectrum of each sub-channel can overlap its neighbors without interfering with them. The figure below illustrates the effect: the required bandwidth is greatly reduced by removing the guard bands present in FDM and allowing the signals to overlap.

Fig: Spectrum overlaps in OFDM

Multi-band orthogonal frequency-division multiplexing (MB-OFDM) is one of the ultra-wideband (UWB) radio standards. It provides high-speed connectivity in a wireless personal area network (PAN), with specified data rates from 53.3 to 480 Mbps. Because of these high data rates, the MB-OFDM standard requires a large amount of computation in a very short time: the modem has to compute one symbol of 165 complex numbers every 312.5 ns. Even though this performance requirement leads to large hardware complexity, a low-power design with a small chip size is essential for applying the technology to portable handheld devices. The operating frequency of a circuit is also one of the dominant factors in its power consumption.

In MB-OFDM, the standard specification defines a sampling frequency of 528 MHz. Such a high frequency is problematic as a system clock: it consumes too much power and is hard to implement because of timing constraints. Parallel architectures have therefore been proposed to reduce power consumption and relax timing constraints. An n-way parallel architecture meets the throughput constraint at an n-times lower clock speed, although it may increase the hardware resources by a factor of n.

Despite the increased hardware resources, it is possible to reduce power consumption and relax timing constraints, for two reasons. First, an n-way parallel architecture tolerates n-times longer gate delays, so the parallel hardware can operate at a reduced supply voltage and consequently consume less power. Supply voltage scaling, however, is beyond this paper's scope; our work focuses on high-level resource optimization. Second, a resource-efficient design, which is the focus of this paper, can avoid the linear, i.e., n-times, increase in resources.
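As a quick arithmetic check of the timing figures quoted above, the Python sketch below derives the 312.5 ns symbol period from 165 samples at 528 MHz, and the clock an n-way parallel data path needs to keep up with that sample rate. It deliberately ignores pipeline overhead and clock-domain details; the function names are hypothetical.

```python
from fractions import Fraction

SAMPLE_RATE_HZ = 528_000_000   # MB-OFDM sampling frequency
SAMPLES_PER_SYMBOL = 165       # complex samples per OFDM symbol

def symbol_period_ns() -> Fraction:
    """Exact duration of one OFDM symbol in nanoseconds."""
    return Fraction(SAMPLES_PER_SYMBOL * 10**9, SAMPLE_RATE_HZ)

def clock_mhz_for_ways(n: int) -> Fraction:
    """Clock (MHz) at which an n-way data path sustains the sample rate."""
    return Fraction(SAMPLE_RATE_HZ, n * 10**6)

def cycles_per_symbol(n: int) -> Fraction:
    """Average clock cycles available per symbol with n-way parallelism."""
    return Fraction(SAMPLES_PER_SYMBOL, n)

if __name__ == "__main__":
    print(float(symbol_period_ns()))  # 312.5
    for n in (1, 4, 8):
        print(f"{n}-way: {float(clock_mhz_for_ways(n))} MHz, "
              f"{float(cycles_per_symbol(n))} cycles/symbol")
```

For the 8-way case this gives a 66 MHz system clock, the same figure as the slower clock domain of the DAC/ADC drivers.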

Hardware resources can also be shared among independent parallel data paths. For example, a packet synchronizer with the cross-correlation scheme requires only a single set of shift registers holding one OFDM symbol. In a carrier frequency offset (CFO) compensation unit, four parallel data paths can share the output of a single coefficient generator at the cost of a negligible performance loss. This paper, however, targets resource-efficient design that does not degrade the overall system performance at all.

The contribution of this paper is a set of resource-efficient (gate-count-reducing) implementation techniques for a highly parallel, low-power MB-OFDM baseband modem. We used an 8-way parallel architecture so that the clock frequency could be eight times lower to save power, and we demonstrated our proposal on a field-programmable gate array (FPGA)-based prototype of the bi-orthogonal convolutional encoding (BOCE) technique. This paper is the first presentation of an 8-way parallel architecture in MB-OFDM baseband modem design optimized by new processing structures and algorithm reconstruction. While several 4-way parallel architectures have already been introduced, we believe that more highly parallel systems are desirable to meet the strong demand for long battery life in mobile devices. The previous literature presented only one resource optimization technique, and it sacrifices overall system performance, although the degradation is negligible.

The proposed system includes the following features:

    • Encoding based on a bi-orthogonal encoder.

    • A multi-user transmission scheme.

    • Higher speed than the existing system.

    • A design based on a MIMO architecture.

    • Detection and correction of both random and burst errors.

The MB-OFDM PHY baseband modem supports both transmission (TX) and reception (RX) according to the MB-OFDM protocol in the standard. The modem is composed of various components that process the incoming data and deliver their results to the following component in a streaming fashion.


    1. OFDM Transmitter Section

      Fig: OFDM Transmitter section

    2. OFDM Receiver Section

      Fig: OFDM Receiver section

      The proposed bi-orthogonal interleaved OFDM system is a multi-user, multi-carrier technique of the kind recognized as an excellent method for high-speed bi-directional wireless mobile communication. In a conventional interleaved OFDM system, a convolutional encoder is used as the channel encoder, but it leads to bandwidth inefficiency and reduces the throughput of transmission and reception. This system is designed for bandwidth optimization, and it supports multi-user transmission and reception in an interleaved OFDM system.
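      The paper does not spell out its bi-orthogonal code, so the following is only a minimal sketch of a textbook (8,4) bi-orthogonal block code, the kind that would map the 4-bit inputs mentioned in the conclusion onto 8-bit codewords. Three data bits select a row of an order-8 Sylvester Hadamard matrix and the fourth selects the row or its complement; that bit assignment is an assumption. With minimum distance 4, the code corrects any single bit error per codeword, and correlation decoding is equivalent to minimum-Hamming-distance decoding for hard decisions.

```python
from typing import List

def hadamard(order: int) -> List[List[int]]:
    """Sylvester construction with +1/-1 entries; order must be a power of two."""
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h

H8 = hadamard(8)

def bo_encode(nibble: int) -> List[int]:
    """4 data bits -> 8-bit codeword: the low 3 bits pick a Hadamard row,
    bit 3 selects the row or its complement (assumed assignment)."""
    row, invert = nibble & 0b111, (nibble >> 3) & 1
    bits = [0 if v > 0 else 1 for v in H8[row]]   # +1 -> 0, -1 -> 1
    return [b ^ invert for b in bits]

def bo_decode(bits: List[int]) -> int:
    """Correlate with every row; the largest |correlation| gives the row and
    its sign gives the complement bit."""
    bipolar = [1 - 2 * b for b in bits]            # 0 -> +1, 1 -> -1
    corr = [sum(r * x for r, x in zip(row, bipolar)) for row in H8]
    row = max(range(8), key=lambda i: abs(corr[i]))
    return row | ((1 if corr[row] < 0 else 0) << 3)
```

      A single flipped bit lowers the correct row's correlation from 8 to 6 while every other row stays at magnitude 2, which is why one error per codeword is always corrected.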

      Radio frequency (RF) signals in the UWB bands (3432, 3960, and 4488 MHz) are up/down-converted from/to baseband analog signals by the RF/analog circuits.

      The analog signals are converted from/to digital signals by the DAC and ADC at a sampling frequency of 528 MHz. The DAC and ADC drivers, which are the interface logic for the converters, are essentially parallel-to-serial and serial-to-parallel data converters between the 66 MHz and 528 MHz clock domains.
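      The drivers' data movement can be modeled in a few lines: the serial-to-parallel side packs eight consecutive 528 MHz samples into one word per 66 MHz cycle, and the parallel-to-serial side unpacks them again. This Python sketch is behavioral only (no clock-domain crossing, FIFOs, or handshaking), and the lane ordering is an assumption.

```python
from typing import Iterable, Iterator, List, Sequence

WAYS = 8  # 528 MHz sample clock / 66 MHz system clock

def serial_to_parallel(samples: Iterable[complex]) -> Iterator[List[complex]]:
    """ADC side: pack every 8 consecutive 528 MHz samples into one 66 MHz word.
    Lane 0 holds the oldest sample (assumed ordering)."""
    word: List[complex] = []
    for s in samples:
        word.append(s)
        if len(word) == WAYS:
            yield word
            word = []
    # A trailing partial word is simply not emitted in this sketch.

def parallel_to_serial(words: Iterable[Sequence[complex]]) -> Iterator[complex]:
    """DAC side: unpack each 8-sample word, one sample per 528 MHz cycle."""
    for word in words:
        yield from word
```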

      The puncturer omits some of the coded bits to support different code rates with one bi-orthogonal convolutional encoder. The depuncturer inserts dummy bits in place of the omitted bits.
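      The puncturing idea can be sketched as follows: a periodic keep/drop pattern is applied to the mother-code output, and the depuncturer re-inserts a neutral soft value (0, i.e., an erasure) wherever a bit was dropped. The patterns below are hypothetical examples that yield rates 1/2 and 3/4 from a rate-1/3 mother code; they are not the patterns defined in the standard.

```python
from fractions import Fraction
from typing import List, Sequence

# Hypothetical patterns (1 = keep). A period of 6 coded bits covers 2 input
# bits of a rate-1/3 code; a period of 9 covers 3 input bits.
RATE_1_2 = [1, 1, 0, 1, 0, 1]            # keep 4 of 6 -> rate 1/2
RATE_3_4 = [1, 1, 0, 0, 0, 1, 0, 1, 0]   # keep 4 of 9 -> rate 3/4

def puncture(coded: Sequence[int], pattern: Sequence[int]) -> List[int]:
    """Keep coded[i] only where pattern[i % len(pattern)] == 1."""
    return [c for i, c in enumerate(coded) if pattern[i % len(pattern)]]

def depuncture(received: Sequence[float], pattern: Sequence[int], n_coded: int,
               erasure: float = 0.0) -> List[float]:
    """Re-insert a dummy (erasure) value at every omitted position so the
    decoder sees the full mother-code length again."""
    it = iter(received)
    return [next(it) if pattern[i % len(pattern)] else erasure
            for i in range(n_coded)]

def code_rate(pattern: Sequence[int], mother_rate: Fraction = Fraction(1, 3)) -> Fraction:
    """Effective rate after puncturing: mother rate * (bits per period / bits kept)."""
    return mother_rate * len(pattern) / sum(pattern)
```

      Using 0 as the dummy value means the Viterbi branch metrics get no evidence either way for the dropped bits, which is the usual reason for depuncturing with erasures rather than with hard 0s.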

      Quadrature amplitude modulation (QAM) is a modulation scheme that conveys data by changing (modulating) the amplitudes of two carrier waves. These two waves, usually sinusoids, are 90° out of phase with each other and are therefore called quadrature carriers. Like all modulation schemes, QAM conveys data by changing some aspect of a carrier signal in response to a data signal; in QAM, the amplitudes of the two quadrature carriers are changed (modulated or keyed) to represent the data.

      Phase modulation (analog PM) and phase-shift keying (digital PSK) can be regarded as special cases of QAM in which the amplitude of the modulating signal is constant and only the phase varies. This view can be extended to frequency modulation (FM) and frequency-shift keying (FSK), since these can be regarded as special cases of phase modulation.

      As with many digital modulation schemes, the constellation diagram is a useful representation. In QAM, the constellation points are usually arranged in a square grid with equal vertical and horizontal spacing, although other configurations are possible (e.g., cross-QAM). Since the data in digital telecommunications are usually binary, the number of points in the grid is usually a power of 2 (2, 4, 8, ...). Since QAM is usually square, some of these are rare; the most common forms are 16-QAM, 64-QAM, 128-QAM, and 256-QAM.

      By moving to a higher-order constellation, it is possible to transmit more bits per symbol. However, if the mean energy of the constellation is to remain the same (for a fair comparison), the points must be closer together and are thus more susceptible to noise and other corruption. This results in a higher bit error rate, so higher-order QAM can deliver more data, less reliably, than lower-order QAM at constant mean constellation energy.

      If data rates beyond those offered by 8-PSK are required, it is more usual to move to QAM, since it achieves a greater distance between adjacent points in the I-Q plane by distributing the points more evenly.
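      The distance argument in the two paragraphs above can be checked numerically. With every constellation normalized to unit mean energy, the sketch below computes the minimum distance of square M-QAM, 2/sqrt(2(M-1)/3), and of M-PSK, 2 sin(π/M). For example, 16-QAM keeps its points about 1.6 times farther apart than 16-PSK, while each step up in QAM order shrinks the distance. The function names are illustrative.

```python
import math

def qam_dmin(m: int) -> float:
    """Minimum distance of square M-QAM scaled to unit mean energy.
    Unscaled levels are +-1, +-3, ..., +-(sqrt(M)-1): spacing 2 and
    mean energy 2(M-1)/3."""
    side = math.isqrt(m)
    if side * side != m:
        raise ValueError("only square constellations are handled here")
    return 2 / math.sqrt(2 * (m - 1) / 3)

def psk_dmin(m: int) -> float:
    """Minimum distance of M-PSK with all points on the unit circle."""
    return 2 * math.sin(math.pi / m)

if __name__ == "__main__":
    print("  8-PSK", round(psk_dmin(8), 3))
    for m in (4, 16, 64, 256):
        print(f"{m:>3}-QAM {qam_dmin(m):.3f}   {m:>3}-PSK {psk_dmin(m):.3f}")
```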

      The complicating factor is that the points no longer all have the same amplitude, so the demodulator must correctly detect both phase and amplitude rather than just phase. 64-QAM and 256-QAM are often used in digital cable television and cable modem applications. In the US, 64-QAM and 256-QAM are the mandated modulation schemes for digital cable, as standardized by the SCTE in ANSI/SCTE 07 2000; they are often referred to as QAM-64 and QAM-256. In the UK, 16-QAM and 64-QAM are used for digital terrestrial television (Freeview and Top Up TV).

      For the resource-efficient implementation, our highly parallel baseband modem uses the following novel optimization techniques. 1) In a (de)interleaver based on inter-cell networking, an efficient asymmetric cell structure reduces resource usage compared with a symmetric structure by lowering the multiplexing cost of the networking. 2) In a packet synchronizer, a small amount of pre-computation shared among multiple data paths lets each data path eliminate about half of its additions, without input-multiplexing costs large enough to offset that benefit, thus reducing resource usage compared with a conventional parallel implementation. 3) In a carrier frequency offset compensator that involves inter-tracking compensation, algorithm reconstruction lets a single set of complex multipliers be shared for both offset tracking and compensation without increasing processing latency or buffer memory, thereby reducing resource usage compared with a conventional inter-tracking compensation implementation.


      Interleaving is a form of time diversity that mitigates the effects of error bursts over the radio fading channels.

      1.3.1. ERRORS

      • Random errors: the bit errors are independent of each other. Random errors can be corrected by a repetition coder.

      • Burst errors: the bit errors occur sequentially in time, in groups. Burst errors can be handled by interleaving, which spreads them out so that the decoder sees them as isolated errors.
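      The contrast between the two error types shows up clearly with a rate-1/3 repetition code and majority voting, as in the minimal Python sketch below (function names hypothetical): one error per 3-bit group is corrected, but a burst that puts two errors in the same group is not.

```python
from typing import List, Sequence

def repeat_encode(bits: Sequence[int], r: int = 3) -> List[int]:
    """Send every bit r times."""
    return [b for b in bits for _ in range(r)]

def repeat_decode(coded: Sequence[int], r: int = 3) -> List[int]:
    """Majority vote over each group of r copies; corrects up to (r-1)//2
    errors per group, which suits isolated (random) errors."""
    return [int(sum(coded[i:i + r]) * 2 > r) for i in range(0, len(coded), r)]

if __name__ == "__main__":
    data = [1, 0, 1, 1]
    tx = repeat_encode(data)
    scattered = tx[:]
    scattered[1] ^= 1          # one error in group 0
    scattered[5] ^= 1          # one error in group 1
    burst = tx[:]
    burst[3] ^= 1              # two adjacent errors, both in group 1
    burst[4] ^= 1
    print(repeat_decode(scattered) == data, repeat_decode(burst) == data)
```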

      2.3.2 Bit interleaving

      1. Bit (binary digit) interleaving keeps track of the number and sequence of the bits from each transmission so that they can be quickly and efficiently reassembled into their original order upon receipt.

      2. Interleaving is mainly used in digital data transmission to protect the transmission against burst errors.
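      A row-column block interleaver is the simplest way to see how interleaving breaks up bursts. The sketch below is a generic interleaver, not the MB-OFDM one: a burst of four consecutive channel errors lands on positions spaced one row length apart after deinterleaving, where a random-error-correcting code has a much better chance of correcting them.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def interleave(data: Sequence[T], rows: int, cols: int) -> List[T]:
    """Row-column block interleaver: write row by row, read column by column."""
    if len(data) != rows * cols:
        raise ValueError("block size must equal rows * cols")
    return [data[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(data: Sequence[T], rows: int, cols: int) -> List[T]:
    """Inverse permutation: the same operation with rows and cols swapped."""
    return interleave(data, cols, rows)

if __name__ == "__main__":
    rows, cols = 4, 6
    tx = interleave(list(range(rows * cols)), rows, cols)
    rx = tx[:]
    for i in range(8, 12):          # burst hits 4 consecutive channel bits
        rx[i] = -1
    out = deinterleave(rx, rows, cols)
    print([i for i, v in enumerate(out) if v == -1])
```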




      Fig: without interleaving


      Fig: with interleaving

      Conventional interleaver systems perform three sub-processes step by step: symbol interleaving, tone interleaving, and cyclic shift. Their implementation requires dedicated memories for the bit permutation in each step. Consequently, this approach costs much chip area for the storage between sub-processes and tends to have long latency for the series of sub-processes. To resolve this problem, a novel interleaving method based on the mixed radix system (MRS) was developed. Applying MRS to the interleaving processes yields a powerful interleaver architecture that performs all three sub-processes concurrently. Its structure is a 2-D array of simple cells, each consisting of two flip-flops with multiplexing logic. The design also allows the same architecture to be used for both the interleaver and the deinterleaver, and it supports a fully modular design for multiple data rates. The array is compact: 40.3% smaller than the memories required by the conventional interleaver. Our demapper makes a 3-bit soft decision from the received subcarrier(s) to recover each originally mapped bit. Therefore, we have three (de)interleaver paths in our system: one (1/3) is for both interleaving and deinterleaving, while the others (2/3) are for deinterleaving only.
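      Performing the three sub-processes at once rests on a simple fact: a cascade of bit permutations is itself a single permutation, so its index map can be precomputed and applied in one pass, and its inverse gives the deinterleaver. The sketch below illustrates only that fact. It does not reproduce the MRS derivation, and the stage definitions and parameters (S, B, T, SHIFT) are hypothetical, not the ECMA-368 values.

```python
from typing import List, Sequence

Perm = List[int]  # perm[i] = index of the input element that lands at output i

def block_perm(rows: int, cols: int) -> Perm:
    """Write row by row, read column by column."""
    return [r * cols + c for c in range(cols) for r in range(rows)]

def permute(perm: Perm, data: Sequence) -> list:
    return [data[p] for p in perm]

def compose(*perms: Perm) -> Perm:
    """One index map equivalent to applying the given perms in order."""
    idx = list(range(len(perms[0])))
    for p in perms:
        idx = [idx[j] for j in p]
    return idx

def inverse(perm: Perm) -> Perm:
    """Deinterleaving map that undoes `perm`."""
    inv = [0] * len(perm)
    for out_pos, src in enumerate(perm):
        inv[src] = out_pos
    return inv

# Hypothetical parameters: S symbols of B bits, a tone-level block with T
# rows, and a per-symbol cyclic shift step. NOT the standard's values.
S, B, T, SHIFT = 3, 12, 3, 5

symbol_stage = block_perm(B, S)   # spread consecutive bits over the S symbols
tone_stage = [s * B + p for s in range(S) for p in block_perm(T, B // T)]
shift_stage = [s * B + (b + s * SHIFT) % B for s in range(S) for b in range(B)]

single_pass = compose(symbol_stage, tone_stage, shift_stage)
```

      Hardware that realizes `single_pass` directly needs no storage between stages, which is the property the MRS-based array exploits.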

      The proposed interleaver architecture can be optimized further by combining it with the constellation mapping processes. MB-OFDM defines two constellation mapping schemes: QPSK and DCM. The QPSK mode spreads data over several subcarriers, and DCM requires data reordering. The spreading and reordering processes involve a non-trivial amount of buffer storage as well as latency. Conventionally, they are done as separate phases: interleaving first, then spreading or reordering. We instead unify the spreading and the (inverse) reordering with the (de)interleaving process. With the proposed interleaver architecture, the spreading can be performed before the interleaving by fully utilizing the array cells of the interleaver, and the DCM (inverse) reordering pattern can be folded into the (de)interleaving so that reordering happens in parallel with interleaving. This removes the additional buffer storage and latency for spreading and (inverse) reordering. Since inverse reordering of DCM-demapped bit streams is more storage-demanding than spreading, the storage reduction is 300 bits per group of 100 soft-decision bits: 100 soft-decision bits × 3 bits per soft-decision bit.

      However, to meet the throughput demands of the units that consume its output, the interleaver must be implemented with a highly unfolded inter-cell network: 10 and 20 times unfolding for the QPSK and DCM modulations, respectively. In a serial implementation, only a few cells need wide input multiplexers to change inter-cell connections according to the interleaving parameters determined by the data rate. With the highly unfolded inter-cell network, in contrast, such cells dominate. Because of the 10/20 times unfolding, the number of input ports of each cell-input multiplexer in the unified (de)interleaver increases from 2 to around 4. Basically, the cell has a symmetric structure. With this structure, the datum in one flip-flop (e.g., FF0) is moved to a horizontally connected cell for storing inputs to be interleaved (input phase).

      After a block of inputs is stored, the stored data are in turn moved in the vertical direction (output phase). At the same time, the other flip-flop (e.g., FF1) takes over storing the next incoming input block in the horizontal direction. Deinterleaving is done simply by moving data in the opposite directions.



        Fig: Cell structures for the MRS-based (de)interleaver. (a) Symmetric structure. (b) Asymmetric structure.

        To reduce the multiplexing costs, we propose an asymmetric cell structure. Instead of changing the inter-cell data-moving direction between the input and output phases, one flip-flop moves its datum to the other flip-flop (an intra-cell move) once, at the beginning of the output phase. Because the inter-cell data-moving direction is now fixed, this eliminates the cell-output multiplexers. In fact, the inter-cell connections are the same for all interleaving parameters in the input phase of interleaving and the output phase of deinterleaving. Therefore, one flip-flop (FF0) needs just a two-input multiplexer: one input for the inter-cell move and another for the intra-cell move. In addition, the other flip-flop (FF1) of deinterleaving-only cells needs no multiplexing for the intra-cell move. Two-thirds of the cells (type B in Table I) benefit from this, because two of the three paths required by the 3-bit soft decision are deinterleaving-only.
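        To make the asymmetric-cell idea concrete, here is a behavioral Python model (not RTL, and not the paper's design) in which FF0 is only ever written in the input direction and FF1 is only ever read in the output direction, with a single FF0-to-FF1 copy per block. A plain row-write/column-read pattern stands in for the real data-rate-dependent MRS connections, which are not modeled; the class and function names are hypothetical.

```python
from typing import List, Sequence

class AsymmetricCellArray:
    """rows x cols cells, each holding two flip-flops. FF0 is only filled in
    the input (row) direction and FF1 is only emptied in the output (column)
    direction; an intra-cell copy FF0 -> FF1 hands each block over."""

    def __init__(self, rows: int, cols: int) -> None:
        self.rows, self.cols = rows, cols
        self.ff0 = [[0] * cols for _ in range(rows)]
        self.ff1 = [[0] * cols for _ in range(rows)]

    def input_phase(self, block: Sequence[int]) -> None:
        """Store a block in FF0, row by row (fixed horizontal direction)."""
        for i, v in enumerate(block):
            self.ff0[i // self.cols][i % self.cols] = v

    def intra_cell_move(self) -> None:
        """Every cell copies FF0 into FF1 at once."""
        self.ff1 = [row[:] for row in self.ff0]

    def output_phase(self) -> List[int]:
        """Read FF1 column by column (fixed vertical direction)."""
        return [self.ff1[r][c] for c in range(self.cols) for r in range(self.rows)]

def interleave_stream(blocks: Sequence[Sequence[int]], rows: int, cols: int) -> List[int]:
    """In hardware, the output phase of block k-1 and the input phase of
    block k share the same cycles; this model just calls them in sequence."""
    arr = AsymmetricCellArray(rows, cols)
    out: List[int] = []
    for k, block in enumerate(blocks):
        if k > 0:
            out.extend(arr.output_phase())   # drain block k-1 from FF1
        arr.input_phase(block)               # fill FF0 with block k
        arr.intra_cell_move()
    if blocks:
        out.extend(arr.output_phase())       # drain the last block
    return out
```

        Because neither plane ever reverses direction, the model needs no notion of a direction-select signal, which mirrors the multiplexer saving the asymmetric cell is meant to provide.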

There are two classes of packet synchronization methods, distinguished by correlation scheme: auto-correlation and cross-correlation. The auto-correlation scheme correlates received signals separated by a certain time distance, while the cross-correlation scheme correlates a received signal with a known preamble pattern. The auto-correlation scheme has lower hardware complexity than the cross-correlation scheme. Unfortunately, it is not suitable for packet synchronization in MB-OFDM systems: adjacent OFDM symbols, including the preamble, are transmitted in different frequency bands because of frequency hopping, and their bands can be identified only after packet synchronization. In addition, correlating with a clean reference sequence instead of noisy received samples can give better performance, especially at low signal-to-noise ratio (SNR). Consequently, the cross-correlation scheme has been preferred in MB-OFDM systems. To alleviate its high implementation cost, a 1-bit (sign) reference sequence has been used; this reference was reported to incur a loss of just 0.778 dB in the cross-correlation results compared with full precision. We adopted this method and optimized it further to implement a more resource-efficient packet synchronizer. Our synchronizer detects a preamble whenever the cross-correlation result with a known preamble sequence (a reference of 128 real numbers) exceeds a certain level of received signal power.

        Fig: Correlator designs for a packet synchronizer. (a) Wide-input multiplexer based design. (b) Shared pre-adder-based design.
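A minimal Python sketch of the detection rule described above, assuming a normalized form of the threshold test: the squared magnitude of the correlation with a 1-bit (sign) reference is compared with a fraction of the received energy in the same window. The 128-value preamble in the demo is random data, not the standard's sequence, and the threshold of 0.25 is an arbitrary illustrative choice.

```python
import cmath
import random
from typing import List, Sequence

def sign_reference(preamble: Sequence[float]) -> List[int]:
    """1-bit reference: keep only the sign of each real preamble value."""
    return [1 if v >= 0 else -1 for v in preamble]

def detect_preamble(rx: Sequence[complex], ref: Sequence[int],
                    threshold: float) -> List[int]:
    """Offsets k where |sum ref[i]*rx[k+i]|^2 / (N * window energy) exceeds
    `threshold`. With a +-1 reference the correlation needs only additions
    and subtractions, and by Cauchy-Schwarz the metric lies in [0, 1]."""
    n = len(ref)
    hits = []
    for k in range(len(rx) - n + 1):
        window = rx[k:k + n]
        corr = sum(s * x for s, x in zip(ref, window))
        energy = sum(abs(x) ** 2 for x in window)
        if energy > 0 and abs(corr) ** 2 > threshold * n * energy:
            hits.append(k)
    return hits

def demo(offset: int = 200, length: int = 600, noise: float = 0.3,
         seed: int = 1) -> List[int]:
    """Synthetic test: a random real 128-sample preamble embedded at
    `offset` in complex Gaussian noise, with an arbitrary carrier phase."""
    rng = random.Random(seed)
    preamble = [rng.gauss(0, 1) for _ in range(128)]
    rx = [complex(rng.gauss(0, noise), rng.gauss(0, noise)) for _ in range(length)]
    phase = cmath.exp(0.7j)
    for i, p in enumerate(preamble):
        rx[offset + i] += p * phase
    return detect_preamble(rx, sign_reference(preamble), threshold=0.25)
```

Taking the magnitude of the correlation makes the test insensitive to the unknown carrier phase, and normalizing by window energy keeps the threshold independent of the received signal level.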

        Fig: Operation sequence of the CFO compensation algorithm

        The numbers in rectangular boxes are indices of synchronization symbols and the circled numbers are sequence of the operation steps. The dotted arrow lines indicate that the compensations are incrementally refined by the previous phase tracking results.

        To support more preambles, the wide-input multiplexer based design must extend the number of input ports. In contrast, thanks to its regular structure, the correlator with shared pre-adders can support any preamble sequence just by changing the selection signals of its multiplexers. It is therefore easily reconfigurable and extensible; e.g., the correlator can be shared with other protocol processing. Prior work, however, introduces an optimization method dedicated to the preamble sequences defined in the MB-OFDM standard: because MB-OFDM preamble sequences are generated by a hierarchical rule, a correlator can also be implemented in a hierarchical structure, which is less complex than a flat structure.
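        One plausible reading of the shared pre-adder idea, sketched in Python under that assumption: with a +-1 reference, each pair of taps contributes either +-(x[m] + x[m+1]) or +-(x[m] - x[m+1]). If every pair sum and difference is computed once and shared by all parallel lanes, each lane needs one add/subtract per pair of taps instead of one per tap, roughly halving its additions. The code checks that this gives exactly the same result as direct correlation; it is not the paper's RTL, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

def direct_correlation(x: Sequence[float], ref: Sequence[int], k: int) -> float:
    """Baseline lane: one add/subtract per reference tap."""
    return sum(s * x[k + i] for i, s in enumerate(ref))

def shared_preadds(x: Sequence[float]) -> List[Tuple[float, float]]:
    """Computed once for all lanes: (x[m] + x[m+1], x[m] - x[m+1]) for every m."""
    return [(x[m] + x[m + 1], x[m] - x[m + 1]) for m in range(len(x) - 1)]

def lane_correlation(pre: Sequence[Tuple[float, float]], ref: Sequence[int],
                     k: int) -> float:
    """Tap pair (s0, s1) contributes s0*x0 + s1*x1 = s0*(x0 + s0*s1*x1):
    the shared sum if the signs agree, the shared difference otherwise.
    The multiply by s0 is only an add-or-subtract choice in hardware."""
    if len(ref) % 2:
        raise ValueError("reference length must be even")
    acc = 0.0
    for i in range(0, len(ref), 2):
        s0, s1 = ref[i], ref[i + 1]
        pair_sum, pair_diff = pre[k + i]
        acc += s0 * (pair_sum if s0 == s1 else pair_diff)
    return acc

def max_lane_error(seed: int = 0, lanes: int = 8) -> float:
    """Largest mismatch between the two methods over `lanes` parallel offsets."""
    rng = random.Random(seed)
    ref = [rng.choice((-1, 1)) for _ in range(128)]
    x = [rng.uniform(-1, 1) for _ in range(128 + lanes)]
    pre = shared_preadds(x)   # shared by every lane
    return max(abs(direct_correlation(x, ref, k) - lane_correlation(pre, ref, k))
               for k in range(lanes))
```

        For a 128-tap reference each lane then accumulates 64 terms instead of 128, while the shared pre-adders cost two operations per input sample regardless of the number of lanes.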


The RF signal is transmitted on a carrier frequency of 3432, 3960, or 4488 MHz. At such high carrier frequencies, carrier frequency offset (CFO) compensation is crucial for receiver performance. We compensated for CFO by iteratively tracking the phase errors of four synchronization symbols in the time domain. Synchronization symbols 2, 8, and 20 are used for phase tracking, while synchronization symbol 0 serves as the reference from which their phase differences are estimated. Since the distances of these symbols from the reference increase, the compensation is done by multi-level tracking, from coarse to fine. In this way, we compensated for CFO with 1 ppm resolution against a 40 ppm offset.

Both phase tracking and compensation require complex multiplications: phase tracking multiplies input symbols by the conjugate of the reference symbol, and compensation multiplies input symbols by offset compensation coefficients. Because of our iterative incremental method, compensation must be processed before phase tracking, except for the first tracking; we refer to this as inter-tracking compensation. It is needed because our algorithm improves compensation accuracy by estimating the errors left by the previous, less accurate tracking. Several approaches eliminate inter-tracking compensation for low-complexity implementations. They use a coarse tracking result to estimate the integer part of the CFO and a fine tracking result to estimate the fractional part, which allows a simple combination of the coarse and fine results. However, the fractional part of the coarse tracking result is lost. To preserve all coarse and fine tracking results, we instead present an algorithm reconstruction approach that reduces the cost of inter-tracking compensation.
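The coarse-to-fine tracking with inter-tracking compensation can be sketched as follows, assuming a repeated synchronization symbol, a 165-sample symbol period, and 128 correlated samples per symbol. Each stage derotates symbol k by the phase the current estimate predicts, measures the residual phase against symbol 0, and refines the estimate. The growing lags (2, 8, 20 symbols) improve resolution, while the earlier stages keep the residual inside ±π. This illustrates the principle only; it is not the paper's reconstructed algorithm or its shared-multiplier datapath, and the data are synthetic.

```python
import cmath
import math
import random
from typing import Dict, List, Sequence

SYMBOL_PERIOD = 165   # samples from the start of one symbol to the next
N = 128               # samples correlated per synchronization symbol (assumed)

def phase_diff(sym_k: Sequence[complex], sym_0: Sequence[complex]) -> float:
    """Angle of the correlation between symbol k and reference symbol 0."""
    return cmath.phase(sum(a * b.conjugate() for a, b in zip(sym_k, sym_0)))

def track_cfo(symbols: Dict[int, List[complex]], schedule=(2, 8, 20)) -> float:
    """Coarse-to-fine CFO estimate in radians per sample."""
    ref = symbols[0]
    eps = 0.0
    for k in schedule:
        lag = k * SYMBOL_PERIOD
        # Inter-tracking compensation: remove the phase predicted by the
        # current estimate so only the residual error is measured. A constant
        # rotation suffices here because the per-sample ramp inside a symbol
        # is the same in symbol k and symbol 0.
        rot = cmath.exp(-1j * eps * lag)
        compensated = [v * rot for v in symbols[k]]
        eps += phase_diff(compensated, ref) / lag
    return eps

def make_symbols(eps: float, noise: float, seed: int = 0) -> Dict[int, List[complex]]:
    """Synthetic repeated synchronization symbol with CFO `eps` and AWGN."""
    rng = random.Random(seed)
    base = [cmath.exp(2j * math.pi * rng.random()) for _ in range(N)]
    symbols = {}
    for k in (0, 2, 8, 20):
        start = k * SYMBOL_PERIOD
        symbols[k] = [b * cmath.exp(1j * eps * (start + n))
                      + complex(rng.gauss(0, noise), rng.gauss(0, noise))
                      for n, b in enumerate(base)]
    return symbols
```

With a CFO of 0.005 rad/sample, symbol 20 alone has rotated by about 16.5 rad relative to symbol 0, so measuring it directly would wrap; the coarse stage at lag 2 (about 1.65 rad) is what makes the finer stages unambiguous.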

The fast Fourier transform (FFT) module is shared by TX and RX; it is a pipelined 128-point complex FFT with a throughput of 8 samples/cycle. The FFT module consists of four stages: the first three stages each employ two radix-4 butterfly units, and the last stage employs four radix-2 units. In front of each stage, a data reordering unit supplies 8 samples to the butterfly units every cycle; it was implemented by extending a reordering unit proposed for an FFT with a throughput of four samples/cycle.
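Since 128 = 4 · 4 · 4 · 2, the stage mix above is a mixed-radix factorization. The recursive Python sketch below evaluates such a factorization and checks it against a direct DFT. It is a functional model only and says nothing about the pipelined, 8-samples-per-cycle hardware or its reordering units.

```python
import cmath
import math
import random
from typing import List, Sequence

def mixed_radix_fft(x: Sequence[complex], radices: Sequence[int]) -> List[complex]:
    """Recursive decimation-in-time FFT; the product of `radices` must equal
    len(x). radices=(4, 4, 4, 2) mirrors the stage mix described for N = 128."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if not radices or n % radices[0]:
        raise ValueError("radices do not factor the transform length")
    r = radices[0]
    m = n // r
    # r sub-FFTs over the decimated sequences x[q], x[q+r], x[q+2r], ...
    subs = [mixed_radix_fft(x[q::r], radices[1:]) for q in range(r)]
    # Combine the r sub-results with twiddle factors W_n^(q*k)
    # (an r-point DFT evaluated for each output bin).
    return [sum(subs[q][k % m] * cmath.exp(-2j * math.pi * q * k / n)
                for q in range(r))
            for k in range(n)]

def dft(x: Sequence[complex]) -> List[complex]:
    """Direct O(N^2) DFT used as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * t * k / n) for t in range(n))
            for k in range(n)]

if __name__ == "__main__":
    rng = random.Random(0)
    x = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(128)]
    err = max(abs(a - b) for a, b in zip(mixed_radix_fft(x, (4, 4, 4, 2)), dft(x)))
    print(f"max |FFT - DFT| = {err:.2e}")
```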

The subcarrier (de)mappers map each complex number to its corresponding subcarrier and perform the reverse mapping. Before the demapper, the sampling frequency offset (SFO) compensator compensates for the sampling frequency offset relative to the transmitter, and a channel equalizer mitigates the signal distortion caused by each subcarrier channel.

The constellation mapper converts coded bits into complex numbers according to rules of quadrature phase-shift keying (QPSK) and dual-carrier modulation (DCM) which is a variant of 16-quadrature amplitude modulation (QAM).
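A small sketch of the two mappings, assuming unit-energy QPSK and the DCM rule as it is commonly described for MB-OFDM: two bipolar QPSK-like symbols a and b are combined by the matrix (1/sqrt(10))[[2, 1], [1, -2]], producing two 16-QAM points on two subcarriers that carry the same 4 bits. The bit ordering and subcarrier pairing are illustrative assumptions, not taken from this paper.

```python
import math
from typing import List, Sequence, Tuple

def qpsk(b0: int, b1: int) -> complex:
    """Unit-energy QPSK: bit 0 -> -1, bit 1 -> +1 on each axis (assumed)."""
    return complex(2 * b0 - 1, 2 * b1 - 1) / math.sqrt(2)

def dcm(bits: Sequence[int]) -> Tuple[complex, complex]:
    """Dual-carrier modulation of 4 bits onto two subcarriers using the
    assumed matrix (1/sqrt(10)) [[2, 1], [1, -2]]."""
    a = complex(2 * bits[0] - 1, 2 * bits[1] - 1)
    b = complex(2 * bits[2] - 1, 2 * bits[3] - 1)
    s = math.sqrt(10)
    return (2 * a + b) / s, (a - 2 * b) / s

def dcm_inverse(y0: complex, y1: complex) -> List[int]:
    """Noise-free inverse: [[2, 1], [1, -2]] squared is 5I, so
    [a, b] = (sqrt(10) / 5) [[2, 1], [1, -2]] [y0, y1]."""
    s = math.sqrt(10) / 5
    a = s * (2 * y0 + y1)
    b = s * (y0 - 2 * y1)
    return [int(a.real > 0), int(a.imag > 0), int(b.real > 0), int(b.imag > 0)]
```

Each output component is 2(±1) + (±1) or (±1) - 2(±1), i.e., one of ±1 or ±3 before scaling, so both subcarriers land on a 16-QAM grid; a receiver can combine the two copies for frequency diversity.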



The constellation demapper recovers coded bits from complex numbers as soft-decision bits. We used a 3-bit soft-decision form with eight decision levels.
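A possible 3-bit soft-decision quantizer for one real (I or Q) component, sketched under assumed parameters: a uniform 8-level quantizer over [-clip, clip) with saturation, where the top bit of the 3-bit index is the hard decision. The clip level and the level-to-bit convention are hypothetical; the paper only states that eight decision levels are used.

```python
import math

def soft3(value: float, clip: float = 1.0) -> int:
    """Map one received component to a 3-bit soft decision 0..7.
    [-clip, clip) is split into 8 equal cells and values beyond it saturate.
    0 = confident '0', 7 = confident '1' (bit '1' assumed on the positive
    axis); index >> 2 is the hard decision."""
    step = clip / 4
    level = math.floor(value / step) + 4
    return max(0, min(7, level))
```

Keeping magnitude information in the lower two bits lets the Viterbi decoder weight reliable and unreliable bits differently, which is the usual gain of soft over hard decisions.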

Our system uses a convolutional encoder with code rate 1/3 and constraint length 7. To decode the convolutional code at 8 bits per cycle, we implemented a four-stage radix-4 Viterbi decoder by extending the two-stage radix-4 decoder proposed in [10], with a trace-back length of 48.
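For reference, here is a compact Python sketch of a rate-1/3, constraint-length-7 convolutional encoder and a plain radix-2, hard-decision Viterbi decoder with full-length traceback. It is not the four-stage radix-4 decoder with trace-back length 48 described above. The generator polynomials 133, 165, and 171 (octal) are an assumption here, and the message is assumed to be flushed with K-1 zero tail bits.

```python
import random
from typing import List, Sequence, Tuple

K = 7                          # constraint length
G = (0o133, 0o165, 0o171)      # generator polynomials (octal), assumed here
N_STATES = 1 << (K - 1)

def _parity(v: int) -> int:
    return bin(v).count("1") & 1

def _step(state: int, bit: int) -> Tuple[int, List[int]]:
    """Shift `bit` into the register (newest bit as MSB); return the next
    state and the three coded bits."""
    reg = (bit << (K - 1)) | state
    return reg >> 1, [_parity(reg & g) for g in G]

def conv_encode(bits: Sequence[int]) -> List[int]:
    state, out = 0, []
    for b in bits:
        state, coded = _step(state, b)
        out.extend(coded)
    return out

def viterbi_decode(rx: Sequence[int]) -> List[int]:
    """Hard-decision Viterbi; traceback starts from state 0 because the
    encoder is assumed to have been flushed with K-1 zeros."""
    inf = float("inf")
    metric = [0.0] + [inf] * (N_STATES - 1)
    history = []                                # survivor predecessors per step
    for t in range(0, len(rx), 3):
        sym = rx[t:t + 3]
        new, prev = [inf] * N_STATES, [0] * N_STATES
        for s in range(N_STATES):
            if metric[s] == inf:
                continue
            for b in (0, 1):
                ns, coded = _step(s, b)
                m = metric[s] + sum(c != r for c, r in zip(coded, sym))
                if m < new[ns]:
                    new[ns], prev[ns] = m, s
        history.append(prev)
        metric = new
    state, bits = 0, []
    for prev in reversed(history):
        bits.append(state >> (K - 2))           # newest input bit = state MSB
        state = prev[state]
    return bits[::-1]

if __name__ == "__main__":
    rng = random.Random(7)
    msg = [rng.randint(0, 1) for _ in range(40)] + [0] * (K - 1)
    rx = conv_encode(msg)
    for pos in (10, 60, 110):                   # a few scattered channel errors
        rx[pos] ^= 1
    print(viterbi_decode(rx) == msg)
```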







    • OFDM reduces the amount of cross talk in signal transmissions.

    • Used for high-speed applications, because OFDM splits the radio signal into multiple smaller sub-signals that are transmitted simultaneously at different frequencies to the receiver.

    • Robust against narrow-band co-channel interference.

    • Efficient implementation using FFT.


    • Wireless Communication system

    • Cellular Mobile Communications.

    • Navigation Systems

    • Satellite uplink and downlink signal transmission

    • Wireless cellular mobile Phones

    • IPTV (Internet Protocol Television) and digital TV, where OFDM is used as the modulation technique.

    • Asymmetric Digital Subscriber Line (ADSL) systems.


    This paper focused on an OFDM system with four parallel sub-channels in the transmitter section. The designed transmitter section receives 4 bits as input and produces a multi-bit modulated output based on the bi-orthogonal encoder. The proposed system will also be integrated with a multiple-input multiple-output (MIMO) system to increase the transmission and reception rate and to reduce the error rate to an optimum level. The paper also aims to detect and correct random as well as burst errors in the received sequences. The MB-OFDM system was designed in Verilog HDL and synthesized using Xilinx software.


[1] Seok Joong Hwang, Youngsun Han, Seon Wook Kim, Jongsun Park, and Byung Gueon Min, "Resource Efficient Implementation of Low Power MB-OFDM PHY Baseband Modem With Highly Parallel Architecture," IEEE Trans., vol. 20, no. 7, July 2012.

[2] Youngsun Han, Peter Harliman, Seon Wook Kim, Jong-Kook Kim, and Chulwoo Kim, "A Novel Architecture for Block Interleaving Algorithm in MB-OFDM Using Mixed Radix System," IEEE Trans., vol. 18, no. 6, June 2010.

[3] Taewon Hwang, Chenyang Yang, Gang Wu, Shaoqian Li, and Geoffrey Ye Li, "OFDM and Its Wireless Applications: A Survey," IEEE Trans., vol. 58, no. 4, May 2009.

[4] Alfonso Troya, Koushik Maharatna, Milos Krstic, Eckhard Grass, Ulrich Jagdhold, and Rolf Kraemer, "Low-Power VLSI Implementation of the Inner Receiver for OFDM-Based WLAN Systems," IEEE Trans., vol. 55, no. 2, March 2008.

[5] Cheol-Ho Shin, Sangsung Choi, Hanho Lee, and Jeong-Ki Pack, "A Design and Performance of 4-Parallel MB-OFDM UWB Receiver," IEICE Trans. Commun., vol. E90-B, no. 3, March 2007.

[6] Chao Cheng and Keshab K. Parhi, "High-Throughput VLSI Architecture for FFT Computation," IEEE Trans., vol. 54, no. 10, October 2007.

[7] Jyh-Ting Lai, An-Yeu Wu, and Wen-Chiang Chen, "A Systematic Design Approach to the Band-Tracking Packet Detector in OFDM-Based Ultra wideband Systems," IEEE Trans., vol. 56, no. 6, November 2007.

[8] A. M. Tonello, "Space-time bit-interleaved coded modulation with an iterative decoding strategy," in Proc. IEEE VTC 2000-Fall, Boston, pp. 2428, Sept. 2000.

[9] X. Li and J. A. Ritcey, "Bit-interleaved coded modulation with iterative decoding," in Proc. International Conference on Communications (ICC), pp. 858–863, June 1999.

[10] S. ten Brink, J. Speidel, and R. H. Yan, "Iterative demapping for QPSK modulation," Electron. Lett., vol. 34, no. 15, pp. 1459–1460, July 1998.

[11] X. Li and J. A. Ritcey, "Bit-interleaved coded modulation with iterative decoding," IEEE Commun. Lett., vol. 1, no. 6, pp. 169–171, Nov. 1997.

[12] Cheol-Ho Shin, Sangsung Choi, Hanho Lee, and Jeong-Ki Pack, "A High-Speed Receiver Architecture for MB-OFDM UWB Communications."

[13] G. Caire, G. Taricco, and E. Biglieri, "Bit-interleaved coded modulation,"


