A Pipelined Implementation of OFDM Transmitter and Receiver on Reconfigurable Platforms

1.Ch.Suresh, 2.K.Ramarao, 3.N.Vijay Shanker

1.Department of ECE, SRTIST, Ramananda Nagar, Nalgonda(dt),A.P,India.
2.Department of ECE, SRTIST, Ramananda Nagar, Nalgonda(dt),A.P,India.
3.Department of EIE, SRTIST, Ramananda Nagar, Nalgonda(dt),A.P,India.

Abstract

New mobile technologies aim to deliver broadband over wireless channels, giving users high-bandwidth connectivity even inside a moving vehicle. Metropolitan broadband wireless networks require non-line-of-sight (NLOS) capability, and Orthogonal Frequency Division Multiplexing (OFDM) becomes essential to overcome the effects of multipath fading. OFDM has become very popular because it allows high-speed wireless communication. It can be considered either a modulation or a multiplexing technique, and it spans the physical and medium access layers.

A VHDL implementation allows the design to be targeted at either an FPGA or an ASIC, which suits the Software Defined Radio (SDR) design methodology. In this project the OFDM modulator and demodulator are implemented with fully digital techniques. VHDL is used for the RTL description, and FPGA synthesis tools are used for performance analysis of the proposed core. ModelSim Xilinx Edition is used for functional simulation and verification of results, and Xilinx ISE is used for synthesis. The Xilinx ChipScope tool is used to verify the results on a Spartan-3E FPGA.

1. Introduction

Currently, Global System for Mobile telecommunications (GSM) technology is being applied to fixed wireless phone systems in rural areas of Australia. However, GSM uses Time Division Multiple Access (TDMA), which has a high symbol rate, leading to problems with multipath causing intersymbol interference. Several techniques are under consideration for the next generation of digital phone systems, with the aim of improving cell capacity, multipath immunity, and flexibility. These include Code Division Multiple Access (CDMA) and Coded Orthogonal Frequency Division Multiplexing (COFDM). Both techniques could be applied to provide a fixed wireless system for rural areas. However, each technique has different properties, making it more suited to specific applications.

Orthogonal Frequency Division Multiplexing (OFDM) is a multicarrier transmission technique, which divides the available spectrum into many carriers, each one being modulated by a low rate data stream. OFDM is similar to FDMA in that the multiple user access is achieved by subdividing the available bandwidth into multiple channels, which are then allocated to users. However, OFDM uses the spectrum much more efficiently by spacing the channels much closer together. This is achieved by making all the carriers orthogonal to one another, preventing interference between the closely spaced carriers.

With the rapid growth of digital communication in recent years, the need for high-speed data transmission has increased. The mobile telecommunications industry faces the problem of providing technology that can support a variety of services, ranging from voice communication at a bit rate of a few kbps to wireless multimedia at bit rates up to 2 Mbps.

Since OFDM is carried out in the digital domain, there are several ways to implement the system. One of them is to use a Field-Programmable Gate Array (FPGA). This hardware is programmable, and the designer has full control over the actual design implementation without the need (and delay) for any physical IC fabrication facility. An FPGA combines the speed, power, and density attributes of an ASIC with the programmability of a general-purpose processor, which benefits the OFDM system. An FPGA can be reprogrammed for new functions by a base station to meet future needs, particularly when a new design is about to be fabricated into a chip. This makes it a strong choice for OFDM implementation, since it offers design flexibility as well as low-cost hardware compared with the alternatives.

2. Principles of OFDM

OFDM is a special case of multi-carrier transmission, in which a single data stream is transmitted over a number of lower-rate sub-carriers. In classical frequency division multiplexing the total band is divided into N non-overlapping frequency channels, whereas in OFDM the band is divided into a number of overlapping frequency channels with orthogonal frequencies. The consequence is a better use of the available spectrum. These orthogonal frequencies can be generated by the IFFT. A basic OFDM system consists of a PSK or QAM modulator/demodulator, a serial-to-parallel / parallel-to-serial converter, and an IFFT/FFT module. The iterative nature of the FFT and its computational order make OFDM ideal for a dedicated architecture outside or parallel to the main processor.

Coded Orthogonal Frequency Division Multiplexing (COFDM) is the same as OFDM except that forward error correction is applied to the signal before transmission. This is to overcome errors in the transmission due to lost carriers from frequency-selective fading, channel noise and other propagation effects. For this discussion the terms OFDM and COFDM are used interchangeably, as the main focus of this work is on OFDM, but it is assumed that any practical system will use forward error correction and would thus be COFDM. In FDMA each user is typically allocated a single channel, which is used to transmit all the user information. The bandwidth of each channel is typically 10 kHz-30 kHz for voice communications. However, the minimum required bandwidth for speech is only 3 kHz. The allocated bandwidth is made wider than the minimum amount required in order to prevent channels from interfering with one another. This extra bandwidth allows signals from neighbouring channels to be filtered out, and allows for any drift in the centre frequency of the transmitter or receiver. In a typical system up to 50% of the total spectrum is wasted due to the extra spacing between channels. This problem becomes worse as the channel bandwidth becomes narrower and the frequency band increases.

Most digital phone systems use vocoders to compress the digitised speech. This allows for an increased system capacity due to a reduction in the bandwidth required for each user. Current vocoders require a data rate somewhere between 4 and 13 kbps [13], depending on the quality of the sound and the type of vocoder used. Thus each user requires a minimum bandwidth of only 2-7 kHz when using QPSK modulation. However, simple FDMA does not handle such narrow bandwidths very efficiently. TDMA partly overcomes this problem by using wider-bandwidth channels, each of which is shared by several users. Multiple users access the same channel by transmitting their data in time slots. Thus, many low-data-rate users can be combined to transmit in a single channel whose bandwidth is wide enough for the spectrum to be used efficiently.

There are, however, two main problems with TDMA. First, there is an overhead associated with the changeover between users due to time slotting on the channel. A changeover time must be allocated to allow for any tolerance in the start time of each user, due to propagation delay variations and synchronization errors. This limits the number of users that can be sent efficiently in each channel. Second, the symbol rate of each channel is high (as the channel handles the information from multiple users), resulting in problems with multipath delay spread. OFDM overcomes most of the problems of both FDMA and TDMA. OFDM splits the available bandwidth into many narrow-band channels (typically 100-8000).

The carriers for each channel are made orthogonal to one another, allowing them to be spaced very close together, with none of the overhead seen in the FDMA example. Because of this there is no great need for users to be time multiplexed as in TDMA, so there is no overhead associated with switching between users. The orthogonality of the carriers means that each carrier has an integer number of cycles over a symbol period. As a result, the spectrum of each carrier has a null at the centre frequency of each of the other carriers in the system. This results in no interference between the carriers, allowing them to be spaced as closely as theoretically possible. This overcomes the problem of the overhead carrier spacing required in FDMA.
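The orthogonality condition can be checked numerically. The following is a minimal sketch, assuming 64 samples per symbol and complex exponential carriers (both chosen only for illustration; `carrier` and `correlate` are hypothetical helper names). Two carriers with an integer number of cycles per symbol have zero correlation, while a carrier with a non-integer number of cycles leaks into its neighbour.

```python
import cmath
import math

def carrier(k, n_samples):
    """Complex carrier with k cycles over one symbol of n_samples samples."""
    return [cmath.exp(2j * math.pi * k * n / n_samples) for n in range(n_samples)]

def correlate(a, b):
    """Normalised inner product of two sampled carriers over one symbol."""
    return sum(x * y.conjugate() for x, y in zip(a, b)) / len(a)

N = 64  # samples per symbol (assumed for illustration)
c3 = carrier(3, N)
c4 = carrier(4, N)
c_off = carrier(3.5, N)  # non-integer number of cycles: orthogonality is lost

same = abs(correlate(c3, c3))       # carrier against itself
adjacent = abs(correlate(c3, c4))   # neighbouring orthogonal carrier
leak = abs(correlate(c3, c_off))    # carrier offset by half a spacing

print(same, adjacent, leak)
```

The half-spacing case illustrates why frequency offsets between transmitter and receiver matter in OFDM: once the integer-cycle condition is violated, the carriers interfere.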

Each carrier in an OFDM signal has a very narrow bandwidth (e.g. 1 kHz), so the resulting symbol rate is low. This gives the signal a high tolerance to multipath delay spread, as the delay spread must be very long to cause significant inter-symbol interference (e.g. > 100 μs).

3. Implementation of OFDM Transceiver

3.1 OFDM transmitter consists of following units

3.1.1 Input Sampler

The input sampler groups the incoming serial bits in pairs. Each pair of bits forms one symbol, so the block samples the serial input and generates a 2-bit IQ output. The output of the input sampler is therefore a stream of 2-bit symbols.

3.1.2 Symbol Mapper

The input to the symbol mapper is the stream of symbols from the input sampler. These symbols are divided into two groups, the I and Q (in-phase and quadrature) symbols. The block maps the input I and Q bits to the real part and imaginary part of the constellation symbols. QPSK is generally used in the OFDM system, where it acts as the modulator in the OFDM transmitter. In QPSK four phases are possible, spaced 90 degrees apart, and each phase is represented by a group of two bits.
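The input sampler and symbol mapper can be modelled behaviourally as follows. This is a minimal sketch rather than the VHDL of the design; the function names and the bit-to-level convention (bit 0 maps to -1, bit 1 maps to +1, first bit on I) are assumptions made for illustration.

```python
def input_sampler(bits):
    """Group a serial bit stream into 2-bit symbols (first bit -> I, second -> Q)."""
    if len(bits) % 2:
        raise ValueError("expected an even number of bits")
    return [(bits[i], bits[i + 1]) for i in range(0, len(bits), 2)]

def qpsk_map(symbol):
    """Map one 2-bit symbol to a constellation point (assumed: 0 -> -1, 1 -> +1)."""
    i_bit, q_bit = symbol
    return complex(2 * i_bit - 1, 2 * q_bit - 1)

symbols = input_sampler([0, 0, 0, 1, 1, 1, 1, 0])
points = [qpsk_map(s) for s in symbols]
print(points)
```

The four resulting points lie in the four quadrants of the complex plane, 90 degrees apart in phase.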

3.1.3 Serial Input Parallel Output (SIPO)

This block converts the serial input to a parallel output for the OFDM transmitter, and its output drives the input of the IFFT. The SIPO converts the 2-bit symbols into 16-bit data, consisting of 8 bits of real data and 8 bits of imaginary data. This 16-bit data is applied to the input of the IFFT.

3.1.4 Inverse Fast Fourier Transform (IFFT)

The IFFT is one of the most important modules in the OFDM system. To compute the IFFT, first exchange the real and imaginary parts of the input, then perform the FFT. Next, exchange the real and imaginary parts of the result, and finally divide by N, where N is the number of points. Here an 8-point FFT is used, so the results are divided by 8. The output of the IFFT is applied to the digital-to-analog converter, which converts the digital data into the OFDM signal, and this signal is sent through the transmitting system.
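The swap-based IFFT described above can be sketched in a few lines. This is a floating-point behavioural model, not the fixed-point hardware; the recursive `fft` helper and the name `ifft_via_fft` are assumptions introduced for the example.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def swap(z):
    """Exchange the real and imaginary parts of a complex value."""
    return complex(z.imag, z.real)

def ifft_via_fft(x):
    """IFFT computed as swap -> FFT -> swap -> divide by N."""
    n = len(x)
    y = fft([swap(v) for v in x])
    return [swap(v) / n for v in y]

freq = [1, 1j, -1, -1j, 1 + 1j, 0, -1 - 1j, 0]  # 8 example frequency-domain values
time = ifft_via_fft(freq)
round_trip = fft(time)  # should recover freq
```

The method works because exchanging real and imaginary parts is equivalent to conjugating and multiplying by j, which turns the forward transform kernel into the inverse one.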

3.2 OFDM receiver consists of following units

3.2.1 Fast Fourier Transform (FFT)

The FFT is the main module in the receiver section. The received OFDM signal is applied to the analog-to-digital converter, and the resulting digital samples are converted from serial to parallel form and applied to the FFT section. This module uses an 8-point decimation-in-frequency approach.
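A decimation-in-frequency FFT can be sketched as below. This is a generic radix-2 DIF model in floating point, written under the assumption that the 8-point module follows the textbook butterfly structure; the function names are hypothetical and nothing here reflects the actual VHDL.

```python
import cmath
import math

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def fft_dif(x):
    """Radix-2 decimation-in-frequency FFT; natural-order input and output."""
    n = len(x)
    stages = n.bit_length() - 1
    a = [complex(v) for v in x]
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(-2j * math.pi * k / (2 * span))  # twiddle factor
                top, bot = a[start + k], a[start + k + span]
                a[start + k] = top + bot             # butterfly sum
                a[start + k + span] = (top - bot) * w  # butterfly difference, then twiddle
        span //= 2
    # DIF leaves the results in bit-reversed order; undo that here.
    return [a[bit_reverse(i, stages)] for i in range(n)]

spectrum = fft_dif([0, 1, 0, 0, 0, 0, 0, 0])  # unit impulse delayed by one sample
```

In the DIF form the twiddle multiplication follows the butterfly subtraction, and the outputs emerge in bit-reversed order, which is why a reordering step is needed at the end.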

3.2.2 Parallel Input and Serial Output (PISO)

Shift registers are commonly attached to the outputs of microprocessors when more output pins are required than are available. This allows several binary devices to be controlled using only two or three pins: the devices in question are attached to the parallel outputs of the shift register, and the desired state of all those devices is then sent out of the microprocessor over a single serial connection. Similarly, PISO configurations are commonly used to add more binary inputs to a microprocessor than are available: each binary input (e.g. a switch or button, or more complicated circuitry designed to output high when active) is attached to a parallel input of the shift register, and the data is then sent back serially to the microprocessor using fewer lines than originally required.
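The PISO behaviour can be modelled as a register that is loaded in parallel and then shifted out one bit per clock. The sketch below assumes an 8-bit width and MSB-first shifting, neither of which is stated in the text; the class name `Piso` is hypothetical.

```python
class Piso:
    """Parallel-in, serial-out shift register (assumed MSB shifted out first)."""

    def __init__(self, width):
        self.width = width
        self.mask = (1 << width) - 1
        self.reg = 0

    def load(self, word):
        """Parallel load of one word."""
        self.reg = word & self.mask

    def shift(self):
        """One clock: emit the current MSB and shift the register left."""
        bit = (self.reg >> (self.width - 1)) & 1
        self.reg = (self.reg << 1) & self.mask
        return bit

piso = Piso(8)
piso.load(0b10110010)
serial = [piso.shift() for _ in range(8)]
print(serial)
```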

3.2.3 Symbol Demapper

This block maps the real and imaginary parts of the serial output from the PISO to the corresponding IQ values of the constellation symbols; that is, it extracts the IQ values from the serial output of the PISO. To achieve maximum hardware efficiency, this constellation demapper is fully time-division multiplexed. Backpressure is applied to the upstream module so that data is only provided at a maximum rate of one complex sample for every six clock cycles. If data is acquired every six clock cycles, the output bus fully conveys the soft decisions when operating in 64QAM modulation mode. In the 16QAM and QPSK modes, the output bus is not fully used. The constellation demapper multiplexes the IQ data onto a single bus and passes the serial data stream on to the soft decision calculator, where the appropriate metrics for the modulation scheme are calculated. These metrics determine the confidence in the polarity of each constellation bit. The metrics are time multiplexed onto a single bus before they are passed on to a quantization module that reduces the bit width of the signal according to a quantization interval.

3.2.4 Out Bit Generator

This block takes the 2-bit IQ values from the symbol demapper and generates the output bits. It is implemented as follows: the 2-bit input (IQ data) is loaded into an internal shift register (sig_IQ), and the MSB of sig_IQ is then assigned to the output bit.

4. Wireless LAN IEEE 802.11a Standard

The IEEE 802.11a standard document specifies both the Medium Access Control (MAC) layer and the Physical (PHY) layer. The scope of this work is limited to the PHY part, and the following subsections discuss the main blocks forming the baseband transmitter separately. The 802.11a standard is based on the OFDM modulation technique, in which several sub-carriers (SCs), 52 in this standard, are used to spread the serial data stream over parallel low-rate streams. Table 1 shows that the 802.11a standard can achieve variable data rates, from 6 Mbps up to 54 Mbps. This variation is achieved by combining different configurations (coding rate and modulation scheme) to produce the desired data rate.

5. Standard Implementation Details

The methodology is based on the divide-and-conquer approach. Each block in the architecture was designed and tested separately, and later those blocks were assembled and extra modules were added to compose the complete system.

The design makes use of pipelining, which was mainly achieved by duplicating memory elements (registers and RAMs) so that the incoming stream of bits is buffered while the previous stream is being processed. The design environment is the Xilinx Integrated Software Environment (ISE), targeting Xilinx Spartan-3E FPGAs.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Sampling frequency</td>
<td>20 MHz</td>
</tr>
<tr>
<td>Number of Sub-carriers</td>
<td>52 (48 data and 4 pilots)</td>
</tr>
<tr>
<td>OFDM symbol period</td>
<td>4 μs (80 samples)</td>
</tr>
<tr>
<td>Cyclic Prefix period</td>
<td>0.8 μs (16 samples)</td>
</tr>
<tr>
<td>Coding</td>
<td>1/2, 2/3, and 3/4</td>
</tr>
<tr>
<td>Modulation scheme</td>
<td>BPSK, QPSK, 16 &amp; 64-QAM</td>
</tr>
<tr>
<td>Data rate</td>
<td>6, 9, 12, 18, 24, 36, 48 and 54 Mbps</td>
</tr>
<tr>
<td>FFT processor</td>
<td>64 point</td>
</tr>
</tbody>
</table>

Table 1. OFDM parameters of IEEE802.11a standard
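The data rates in Table 1 follow from the other parameters: each OFDM symbol carries 48 data sub-carriers, each sub-carrier carries a number of coded bits set by the modulation scheme, and the coding rate gives the fraction of those bits that are data. The sketch below works through this arithmetic. The pairing of modulation scheme and coding rate for each mode is taken from the 802.11a standard rather than from the table itself, and the names used are illustrative.

```python
from fractions import Fraction

N_DATA_SUBCARRIERS = 48  # from Table 1
SYMBOL_PERIOD_US = 4     # OFDM symbol period in microseconds, from Table 1

# (modulation, coded bits per sub-carrier, coding rate) per 802.11a mode
MODES = [
    ("BPSK", 1, Fraction(1, 2)),
    ("BPSK", 1, Fraction(3, 4)),
    ("QPSK", 2, Fraction(1, 2)),
    ("QPSK", 2, Fraction(3, 4)),
    ("16-QAM", 4, Fraction(1, 2)),
    ("16-QAM", 4, Fraction(3, 4)),
    ("64-QAM", 6, Fraction(2, 3)),
    ("64-QAM", 6, Fraction(3, 4)),
]

def data_rate_mbps(bits_per_subcarrier, coding_rate):
    """Data bits per OFDM symbol divided by the symbol period (bits/us = Mbps)."""
    data_bits_per_symbol = N_DATA_SUBCARRIERS * bits_per_subcarrier * coding_rate
    return data_bits_per_symbol / SYMBOL_PERIOD_US

rates = [int(data_rate_mbps(n, r)) for _, n, r in MODES]
print(rates)
```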

Next comes coding, which is divided into two sub-stages. The purpose of the coding stage is to give the receiver the capability to detect and correct errors through redundancy. As a first step, the data stream is encoded using a convolutional encoder, which uses a number of delay elements and a modulo-2 adder. The convolutional encoder is composed of a number of D-type flip-flops that represent the delay elements, while the modulo-2 adder is constructed using an XOR gate. To improve the data rate, the second coding sub-stage, puncturing, is used to reduce the number of redundant bits. The puncturing implementation is based on masking the bit stream with either a \(4/6\) or a \(3/4\) mask, for a \(3/4\) or \(2/3\) coding rate, respectively. Specifically, the \(3/4\) mask reads four bits at a time and masks the MSB, while the \(4/6\) mask reads a chunk of 6 bits and masks the bits numbered 3 and 4.
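The two coding sub-stages can be sketched as follows. The generator polynomials (133 and 171 octal, constraint length 7) are those of the 802.11a standard and are not stated in the text above, so they are an assumption here. The masks assume that bits are numbered from 0 in arrival order and that the masked bit of the four-bit group is the last one to arrive; function and table names are hypothetical.

```python
G0, G1 = 0o133, 0o171  # generator polynomials (assumed, per 802.11a), K = 7

def parity(value):
    """Modulo-2 sum (XOR) of all bits in value."""
    return bin(value).count("1") & 1

def conv_encode(bits):
    """Rate-1/2 convolutional encoder: two output bits per input bit."""
    state = 0  # six delay elements (D flip-flops)
    out = []
    for b in bits:
        reg = (b << 6) | state  # current bit plus the six previous bits
        out += [parity(reg & G0), parity(reg & G1)]
        state = reg >> 1        # shift the delay line by one position
    return out

# 1 = keep, 0 = mask; one pattern per coding rate (rate 1/2 is unpunctured)
KEEP = {"1/2": [1, 1], "2/3": [1, 1, 1, 0], "3/4": [1, 1, 1, 0, 0, 1]}

def puncture(coded, rate):
    """Drop the masked positions of the rate-1/2 coded stream."""
    mask = KEEP[rate]
    return [b for i, b in enumerate(coded) if mask[i % len(mask)]]

coded = conv_encode([1, 0, 0, 0, 0, 0, 0])    # impulse response of the encoder
punctured = puncture(conv_encode([0] * 12), "3/4")
```

With 12 input bits the encoder produces 24 coded bits, and the \(4/6\) mask keeps 16 of them, which gives the \(3/4\) coding rate (12 data bits in 16 transmitted bits).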

One widely used block interleaver design is based on a RAM in which the data is written in row order and then read in column order. However, during the implementation it was found that this technique consumes a large number of multiplexers and requires RAMs of different sizes according to the interleaver size. The design proposed in this work is instead based on look-up tables, implemented as small read-only memories (ROMs). To perform the IFFT, the interleaved bits are mapped into two components, the in-phase and quadrature (I and Q) components. Depending on the PHY mode and the data rate selected, the OFDM sub-carriers are modulated using BPSK, QPSK, 16-QAM or 64-QAM. The interleaved bits are grouped into chunks of \(N_{BPSC}\) (1, 2, 4 or 6) bits, respectively. The grouped bits are used to address the specific ROM and are mapped into the corresponding I/Q values as per the selected modulation scheme. The mapper is implemented with 4 ROMs, one for each modulation scheme. The ROMs contain the corresponding I and Q values, and the grouped bits are used to index the ROMs and obtain the corresponding I/Q pair. Moreover, the values stored in the ROMs are already normalized with the normalization factor \(k\) of each modulation scheme. This avoids introducing 4 multipliers to normalize the resulting I/Q pair [2]. The I and Q values use a fixed-point representation with a 16-bit width and a 12-bit fractional part. Figure 2 illustrates the internal architecture of the mapper.
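The ROM contents for the mapper can be generated offline, as in the sketch below. The normalization factors and the Gray-coded levels follow the 802.11a standard. The address bit ordering (leading bits select I, trailing bits select Q) and all names are assumptions for illustration, not the layout used in the design.

```python
import math

FRAC_BITS = 12  # 16-bit word with a 12-bit fractional part, as in the text

# Normalization factor k per modulation scheme, indexed by N_BPSC
K_MOD = {1: 1.0, 2: 1 / math.sqrt(2), 4: 1 / math.sqrt(10), 6: 1 / math.sqrt(42)}

def to_fixed(value):
    """Quantise to a 16-bit two's-complement word with 12 fractional bits."""
    return round(value * (1 << FRAC_BITS)) & 0xFFFF

def from_fixed(word):
    """Convert a 16-bit two's-complement word back to a float."""
    signed = word - 0x10000 if word & 0x8000 else word
    return signed / (1 << FRAC_BITS)

def gray_level(code, m):
    """Decode an m-bit Gray code to an odd level in -(2^m - 1) .. +(2^m - 1)."""
    binary = 0
    while code:
        binary ^= code
        code >>= 1
    return 2 * binary - ((1 << m) - 1)

def build_rom(n_bpsc):
    """Table of pre-normalized (I, Q) words, addressed by the grouped bits."""
    k = K_MOD[n_bpsc]
    m = max(n_bpsc // 2, 1)  # bits per axis
    rom = []
    for addr in range(1 << n_bpsc):
        if n_bpsc == 1:      # BPSK uses the I axis only
            i, q = gray_level(addr, 1), 0
        else:
            i = gray_level(addr >> m, m)
            q = gray_level(addr & ((1 << m) - 1), m)
        rom.append((to_fixed(i * k), to_fixed(q * k)))
    return rom

ROMS = {n: build_rom(n) for n in (1, 2, 4, 6)}  # one ROM per modulation scheme
```

Because \(k\) is folded into the stored words, a lookup returns a value that is already normalized, so no multiplier is needed at run time.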

The mapper produces only 48 data sub-carriers, whereas the IFFT stage requires 64 pairs of I and Q. The pilot sub-carriers (4 pilots) are added at specific locations to assist the receiver in performing channel estimation and frame detection. These pilots are placed in sub-carriers -21, -7, 7, and 21, and are based on BPSK modulation. This stage was implemented as a block that stores the I/Q pairs generated by the mapper and adds both the zero and pilot sub-carriers. To achieve that, two single-port distributed RAMs with a size of \((64 \times 16)\) bits were used to store the I/Q pairs in their corresponding locations, and to provide pipelining capability at this stage as well.
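The assembly of the 64-entry I/Q bank can be sketched as below. The sub-carrier numbering (-26 to 26, with 0 unused) and the pilot positions -21, -7, 7 and 21 with base values 1, 1, 1, -1 follow the 802.11a standard. The per-symbol pilot polarity scrambling is omitted, and the mapping of negative indices to IFFT bins by modulo 64 is an assumption about how the IFFT input is ordered.

```python
N_FFT = 64
PILOTS = {-21: 1, -7: 1, 7: 1, 21: -1}  # base BPSK pilot values; scrambling omitted

def data_indices():
    """Sub-carrier indices -26..26 that carry data (no DC, no pilots)."""
    return [k for k in range(-26, 27) if k != 0 and k not in PILOTS]

def build_iq_bank(data_symbols):
    """Place 48 data symbols, 4 pilots and zeros into the 64 IFFT input bins."""
    idx = data_indices()
    if len(data_symbols) != len(idx):
        raise ValueError("expected 48 data symbols")
    bank = [0j] * N_FFT  # DC and guard sub-carriers stay at zero
    for k, sym in zip(idx, data_symbols):
        bank[k % N_FFT] = sym          # negative indices wrap to bins 38..63
    for k, value in PILOTS.items():
        bank[k % N_FFT] = complex(value, 0)
    return bank

bank = build_iq_bank([1 + 1j] * 48)
used = sum(1 for v in bank if v != 0)
print(used)
```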

Next in the chain comes the IFFT block. This was the only main block not designed in VHDL by the author; instead, the Intellectual Property (IP) core available in the Xilinx development environment (ISE) was used [7]. The generated IP was configured to run in a pipelined streaming I/O mode, so that arriving data is processed continuously instead of working on all the samples of a symbol at once. This capability came from the pipelining provided by the previous and next stages, where each generated I/Q pair in the I/Q bank is fed to the IFFT processor. After the number of cycles required by the IFFT block, the generated real and imaginary pairs are forwarded to the cyclic prefix (CP) block. In the 802.11a standard, the last 16 samples of the IFFT output are replicated at the beginning to form the cyclic prefix, giving a complete OFDM symbol of 80 samples. These 16 samples correspond to a 0.8 μs period, which is considered the maximum delay in the multipath environment. This stage was implemented using 2 single-port distributed RAMs. The first has a size of \((16 \times 16)\) bits and stores the cyclic prefix samples. The other has a size of \((64 \times 16)\) bits and holds the OFDM symbol samples. Again, another copy of each RAM (CP and symbol RAMs) is added to provide pipelining.
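The cyclic prefix step amounts to copying the tail of the IFFT output to the front of the symbol. The following is a minimal behavioural sketch; the function name is hypothetical, and a list of integers stands in for the 64 complex IFFT output samples.

```python
def add_cyclic_prefix(symbol, cp_len=16):
    """Prepend the last cp_len samples of the IFFT output to the symbol."""
    if len(symbol) < cp_len:
        raise ValueError("symbol is shorter than the cyclic prefix")
    return symbol[-cp_len:] + symbol

ifft_out = list(range(64))           # stand-in for 64 IFFT output samples
tx_symbol = add_cyclic_prefix(ifft_out)

SAMPLE_RATE_MHZ = 20                 # sampling frequency from Table 1
symbol_period_us = len(tx_symbol) / SAMPLE_RATE_MHZ  # 80 samples at 20 MHz
print(len(tx_symbol), symbol_period_us)
```

At the 20 MHz sampling frequency, the 80 samples last 4 μs and the 16 prefix samples last 0.8 μs, which matches Table 1.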

6. Implementation Results

The work presented in this paper aimed to demonstrate that a wireless communication standard can be translated in a straightforward way into a pure VHDL implementation, and can therefore be implemented on a reconfigurable platform. The divide-and-conquer approach was used to design and test each entity alone and later combine them into the complete system. The work has accomplished the task of designing the digital baseband part of an OFDM transmitter that conforms to the IEEE 802.11a standard. Puncturing caused timing difficulties in the following stages, which prevented the design from working correctly at data rates other than 6, 12 and 24 Mbps. As stated in the implementation details section, the design is based on two main processes. The first is bit manipulation in the coding and interleaving stages, while the second is mapping, which uses both look-up tables and memory. Therefore, the resources most consumed by the design are the LUTs available on the FPGA and the memory elements. However, the most complex part, the IFFT block, which requires digital signal processing functions, was implemented using the available IP core.
The design was targeted at Xilinx FPGAs, and two devices were selected: the Xilinx Virtex-II Pro (XC2VP30-7ff896) and the Xilinx Spartan-3E (XC3S500E). The selection was based on the availability of the IFFT and DCM cores for both FPGAs, and was also intended to demonstrate and compare the mapping on both chips. Table 2 presents the resources required by the IEEE 802.11a standard transmitter after mapping it on the XC2VP30 chip. The table lists the number of resources utilized and the percentage of used to available resources on the selected FPGA.

### 7. Conclusion

This work presented the design and implementation of an OFDM system. The design uses pure VHDL, with the aid of available IP cores to implement the IFFT and clock synthesis functions. From the mapping results, the design can easily fit into a Xilinx Spartan-3E FPGA with an occupation percentage around 25%. However mapping results showed that smaller FPGAs.

In addition, the results were compared to previous work and showed improvements in certain resources. The presented design conforms only to the mandatory data rates (6, 12 and 24 Mbps). As future work, the timing offset introduced by puncturing is to be resolved so that the design can meet the rest of the proposed data rates.

<table>
<thead>
<tr>
<th>RESOURCES</th>
<th>UTILIZED</th>
<th>AVAILABLE</th>
<th>%</th>
</tr>
</thead>
<tbody>
<tr>
<td>Slices</td>
<td>3839</td>
<td>4656</td>
<td>82%</td>
</tr>
<tr>
<td>Slices flip-flops</td>
<td>241</td>
<td>9312</td>
<td>2%</td>
</tr>
<tr>
<td>Number of LUTs</td>
<td>7026</td>
<td>9312</td>
<td>75%</td>
</tr>
<tr>
<td>MULT 18*18</td>
<td>16</td>
<td>20</td>
<td>80%</td>
</tr>
<tr>
<td>Bonded IOBs</td>
<td>3</td>
<td>232</td>
<td>1%</td>
</tr>
<tr>
<td>GCLKs</td>
<td>2</td>
<td>24</td>
<td>8%</td>
</tr>
</tbody>
</table>

Table 2. Complete system Resource Utilization.

### 8. References
