Area Efficient And Low Power RAKE Receiver And Channel Estimator For DS-UWB In FPGA

DOI : 10.17577/IJERTV1IS6470


Dhanasree Jillella#1

Nagaraju Ravada*2


M.Tech. Student

Assistant Professor

Associate professor

#Dept of Electronics and Communication Engineering,

Sasi Institute Of Technology & Engineering, JNTU Kakinada, India.


In this paper, we propose and implement an efficient architecture for a chip-spaced Direct Sequence Ultra-Wideband (DS-UWB) RAKE receiver subsystem that consists of four parts: the Channel Estimator (CE), the hybrid Partial/Selective Selection Subsystem (HPSS), the RAKE Receiver (RR) and the RAKE Control (RC). The algorithm running in the HPS subsystem combines the benefits of the SRake and PRake methods in order to further reduce complexity. An energy-efficient carry select adder, in which the ripple carry adder (RCA) with carry-in 1 is replaced by a Binary to Excess-1 Converter (BEC), is used in the finger implementation of the subsystem. The whole DS-UWB RAKE system is described in VHDL and is fully synthesizable, targeting a platform that employs a Xilinx Virtex-4 FPGA. The design is highly parallel and modular, is optimized for high performance, and achieves a clock frequency of over 200 MHz in order to operate at the desired chip rate.

Keywords: Channel Estimator, CSLA, Finger, Field Programmable Gate Array (FPGA), RAKE.

1. Introduction

Two main UWB physical-layer approaches have been proposed: direct-sequence spread spectrum UWB (DS-UWB) systems, which we consider in this work, and multiband orthogonal frequency-division multiplexing UWB (MB-OFDM UWB) systems.

The main characteristic of a UWB system is its wide bandwidth (on the order of several GHz), which leads to highly frequency-selective channels and to received signals composed of a significant number of resolvable multipath components with different delays on the order of nanoseconds. A DS-SS UWB system with a RAKE receiver can exploit multipath diversity by constructively summing the desired signal energy that is dispersed over the various multipath components, which helps to mitigate fading and thus improves performance. However, the low energy of the individual paths, combined with their high resolvability, means that a RAKE receiver must employ a large number of multipath components in order to optimize the received SNR. Previous studies showed that a RAKE receiver operating in a typical modern office building requires about 50 RAKE fingers to capture a sufficient amount of the total energy of the received signal. This poses significant challenges in the design and implementation of a RAKE receiver that aims for a high performance gain in a low-complexity, power-efficient structure.

The most common methods proposed towards this aim are the Selective RAKE (SRake) and Partial RAKE (PRake) schemes. The first combines the strongest multipath components among all those available at the receiver input using the maximal ratio combining (MRC) scheme. Despite the reduction in the number of RAKE fingers, the selection procedure requires efficient channel estimation in order to keep track of the values of all multipath components at each time instant. The second method combines the first arriving multipath components using MRC and is a less complex solution, as it does not have to carry out any selection among the multipath components. SRake generally performs better than PRake, since the latter combines paths that may contribute little to the collected energy. However, when the stronger multipath components are located at the beginning of the channel impulse response, the performance gap decreases.

Several other, more complex types of RAKE architectures have been proposed for UWB systems. A comparison between PRake and SRake for pulse position modulation showed that the simpler PRake is almost as good as SRake with a small number of fingers in a Nakagami fading channel. Fractionally-spaced (FS) RAKE receivers for single-user DS-UWB systems employing Gaussian monocycles have also been studied; the FS RAKE receiver was shown to outperform chip-spaced and symbol-spaced RAKE receivers at the cost of higher complexity, since it can compensate better for channel distortion. Although combined RAKE-equalization techniques have been examined in order to alleviate intersymbol interference (ISI), it was shown that in a single-user UWB link the performance-limiting factor in the SNR range of interest is energy capture rather than ISI. Finally, new algorithms for finger assignment have been developed that use different selection criteria for assigning the RAKE fingers, in order to reduce the effect of pulse shaping and pulse position modulation on the RAKE receiver performance.


Area and power play a major role in the design of integrated circuits because of the increasing popularity of portable systems and the rapid growth of power density in VLSI circuits. Addition is a crucial arithmetic function and usually has a strong influence on the overall performance of digital systems.

The Carry Select Adder (CSLA) is one of the fastest adders and is used in many data-processing processors to perform fast arithmetic functions. From the structure of the CSLA, it is clear that there is scope for reducing its area and power consumption.

The CSLA is used in many computational systems to alleviate the problem of carry propagation delay by independently generating multiple carries and then selecting a carry to generate the sum. The basic idea of this work is to use a Binary to Excess-1 Converter (BEC) instead of the Ripple Carry Adder (RCA) with cin=1 in the regular CSLA to achieve lower area and power consumption.

In this paper, we propose and implement an area-efficient, low-power RAKE receiver and channel estimator for DS-UWB. The proposed algorithm running in the HPS subsystem combines the benefits of both the SRake and PRake methods in order to further reduce complexity, and a Binary to Excess-1 Converter is used in the adders to achieve lower area and power consumption.

The remainder of the paper is organized as follows. In Section 2 the transmission model of our system is analyzed. In Section 3 we describe the architecture of the proposed DS-UWB RAKE receiver. Section 4 describes the design optimization techniques that further decrease complexity and power while increasing performance. In Section 5 we present numerical results on the complexity and hardware utilization of our implementation. Finally, conclusions are given in Section 6.

  2. The Complete Transceiver Model

    In this section the system model of the DS-UWB system (Figure 1) is presented in order to better understand the challenges of the architecture of the proposed system.

    1. Transmitter

      The information bits to be transmitted are generated randomly by a binary source at a symbol rate of 1/Tb bits/sec, where Tb is the duration of a BPSK symbol. The binary sequence (at point A) is represented by a vector b. After BPSK modulation it becomes d (at point B), which is spread by a PN sequence pn composed of ±1s. After spreading, d is transformed into a vector c, which is generated at a chip rate of 1/Tc = Nc/Tb chips/sec (at point C), where Tc is the chip duration and Nc is the number of chips per symbol. The spread spectrum processing gain is Tb/Tc. After the pulse shaper filter we obtain the signal s(t) (at point D).

      Figure 1. DS-UWB Transmission Model
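The transmitter chain from point A to point C can be sketched in a few lines of Python. This is only an illustrative model, not the authors' VHDL: the bit-to-symbol mapping (0 to +1, 1 to -1) and the short PN sequence are assumptions chosen for the example.

```python
def bpsk_modulate(bits):
    """Map bits to BPSK symbols (assumed convention: 0 -> +1, 1 -> -1)."""
    return [1 - 2 * b for b in bits]


def spread(symbols, pn):
    """Multiply every symbol by the whole PN sequence, chip by chip."""
    return [d * c for d in symbols for c in pn]


# Hypothetical example values, not taken from the paper.
bits = [0, 1]
pn = [1, -1, 1]                 # PN sequence of +/-1 chips
symbols = bpsk_modulate(bits)   # vector d (point B)
chips = spread(symbols, pn)     # vector c (point C)
```

Each symbol produces `len(pn)` chips, which is why the chip rate is the symbol rate multiplied by the number of chips per symbol.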

    2. Multipath Channel

      The signal s(t) passes through a multipath channel. In this paper we use the high frequency channel model established by the IEEE 802.15.3a standardization group for evaluating the performance of different physical layers for high data rate UWB systems. This model is based on a modified Saleh-Valenzuela propagation model, which adopts a cluster-based approach for the multipaths arriving at the receiver. The cluster arrival times and the ray arrival times within each cluster are modelled as Poisson processes, whereas the coefficients are assumed to be log-normally distributed. The IEEE 802.15.3a group has suggested four sets of parameters to fit measurement data, corresponding to four channel models (CM) that represent different environmental scenarios: CM1 (0-4 m, line of sight (LOS)), CM2 (0-4 m, non-LOS (NLOS)), CM3 (4-10 m, NLOS) and CM4 (extreme NLOS).

    3. Receiver

      After passing through the channel, the multipath-affected received signal is r(t) (at point E). The received signal goes through a pulse-matched filter (at point F) and, after sampling at the chip rate 1/Tc (Tc being the chip duration), we get the discrete-time signal (at point G). This signal enters the RAKE receiver subsystem, where it is delayed by each finger at chip intervals and de-spread; the PS selected taps are then combined using the MRC method. Here PS is the number of RAKE fingers assigned to the resolvable multipath rays, which are selected by the HPS algorithm from the coefficients produced by the channel estimator. Throughout this analysis we assume perfect timing synchronization.

      Figure 2. Block diagram of the proposed system architecture

  3. The System Architecture

    The proposed overall architecture of the DS-UWB receiver subsystem is presented in Figure 2. It consists of four main parts: the RAKE Control (RC), the Channel Estimator (CE), the component that implements the selection algorithm (HPSS) and the RAKE Receiver (RR). The PN buffer component contains the PN sequence, which is fed into the CE and RR. The role of the RC is to synchronize the CE and RR subsystems and to determine the exact time of operation for each of them. The complete architecture is designed and implemented entirely in VHDL in the Xilinx ISE Design Suite 9.2 environment.

    The design targets a platform that hosts a Xilinx Virtex-4 SX (XC4VSX35) FPGA. For the signal representation we have chosen an accuracy of 8 bits, which is adequate for our application. In the following subsections the three main components of our system are described.

    1. Channel Estimator

      The CE subsystem produces estimates of the channel impulse response coefficients, which are fed into the HPS subsystem. Channel estimation is performed using a data-aided approach, in which we assume that each packet begins with known pilot bits. Each of these pilot bits is chosen to be the PN sequence, which has desirable characteristics (low cross-correlation, high auto-correlation values). The CE subsystem correlates the received pilot bits with the local PN sequence and calculates the estimates of the channel coefficients. This sub-optimal but low-complexity algorithm is known as the Sliding Window (SW) algorithm, and it is optimal in the maximum likelihood (ML) sense if the shifted versions of the signal are mutually orthogonal. The channel is assumed to remain constant for the duration of the data packet, and the estimated channel coefficients are used for the whole detection process of that packet. The block diagram of the CE architecture is given in Figure 3. It consists of 15 fingers, which compute a corresponding number of estimates.
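A minimal behavioural sketch of the sliding-window correlation is given below. It is not the authors' implementation: the Barker-13 code used as the pilot, the three-tap noiseless channel and the helper names are all assumptions made for illustration. Finger k correlates the received pilot, shifted by k chips, with the local PN sequence and normalises by the sequence length.

```python
def multipath(chips, taps):
    """Chip-spaced multipath channel: discrete convolution of chips with taps."""
    out = [0.0] * (len(chips) + len(taps) - 1)
    for j, h in enumerate(taps):
        for i, c in enumerate(chips):
            out[i + j] += h * c
    return out


def sliding_window_estimate(received, pn, num_fingers):
    """Finger k correlates the pilot delayed by k chips with the local PN."""
    n = len(pn)
    # Zero-pad so that the window of the last finger is fully defined.
    padded = list(received) + [0.0] * (num_fingers + n - len(received))
    estimates = []
    for k in range(num_fingers):
        acc = sum(padded[k + i] * pn[i] for i in range(n))
        estimates.append(acc / n)  # normalise by the PN length
    return estimates


# Assumed example: Barker-13 pilot, hypothetical 3-tap channel, no noise.
BARKER13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
TAPS = [1.0, 0.5, 0.25]
received = multipath(BARKER13, TAPS)
estimates = sliding_window_estimate(received, BARKER13, num_fingers=5)
```

Because the shifted copies of a real PN sequence are not perfectly orthogonal, each estimate carries a small error from the other taps, which is the sense in which the algorithm is sub-optimal.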

      Figure 3. Channel Estimator Subsystem

      The RTL schematic for the implementation of each CE finger is presented in Figure 4. The PN multiplier is implemented as a multiplexer that selects between the incoming signal and its two's complement. It is followed by an accumulator consisting of an adder and two registers, which are synchronized appropriately by the RC component. The adder that follows uses information from an accumulator of the input signal in order to normalize the final output, which is the exported estimate of a given channel coefficient.
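The multiplexer-based PN multiplier and the accumulator can be modelled as follows. This is a behavioural sketch under stated assumptions: samples are 8-bit two's-complement integers (matching the 8-bit signal representation mentioned earlier), the accumulator is assumed wide enough not to overflow, and the normalisation stage is left out. The function names are hypothetical.

```python
def twos_complement_8(x):
    """Negate an 8-bit two's-complement value given as an int in [-128, 127]."""
    raw = (~(x & 0xFF) + 1) & 0xFF          # invert, add one, keep 8 bits
    return raw - 256 if raw & 0x80 else raw  # reinterpret as signed


def pn_multiply(sample, pn_chip):
    """Mux: pass the sample for chip +1, its two's complement for chip -1."""
    return sample if pn_chip == 1 else twos_complement_8(sample)


def finger_accumulate(samples, pn):
    """Accumulate the PN-multiplied samples over one PN period."""
    acc = 0  # assumed wider than 8 bits, so no overflow handling here
    for s, c in zip(samples, pn):
        acc += pn_multiply(s, c)
    return acc
```

Multiplying by a ±1 chip never needs a real multiplier, which is why a multiplexer is sufficient. Note that, as in hardware, negating -128 in 8 bits wraps back to -128.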

      Figure 4. Channel Estimator Finger Implementation

    2. Rake Receiver

      The block diagram of the RAKE Receiver and the RTL schematic of its finger implementation are shown in Figures 5 and 6, respectively. Figure 5 shows the full RAKE case, where the selection algorithm is not implemented and all 15 RAKE fingers are used for the MRC scheme.

      Figure 5. RAKE Receiver Subsystem

      When the selection algorithm is employed, the RR takes the form of the block diagram shown in Figure 7, which is the proposed HPS implementation.

      Figure 6. RAKE Receiver Finger Implementation

      In this case, only 9 of the 15 fingers are implemented in hardware, combining the selected coefficient estimates with the corresponding signals from the Signal Buffer. The information about which signals from the Signal Buffer are chosen comes from the HPS subsystem in the form of indices that drive certain multiplexers. For example, in Figure 7, fingers 1, 2, 3, 5, 6 and 10 were not selected to participate in the MR combining scheme.

      Each RR finger consists of a PN multiplier, an accumulator and a coefficient multiplier. The first two are implemented in the same manner as described above for the CE finger. The final multiplier multiplies the output of the accumulator with the corresponding coefficient in order to implement the MRC scheme. The outputs of all RR fingers are summed to obtain the final estimated symbol.
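The finger structure described above (PN multiplier, accumulator, coefficient multiplier, final sum) can be sketched as below. This is an illustrative model only: it assumes real-valued samples and coefficients, a Barker-13 code as the PN sequence, a hypothetical three-tap noiseless channel, and perfect knowledge of the coefficients.

```python
def multipath(chips, taps):
    """Chip-spaced multipath channel: discrete convolution of chips with taps."""
    out = [0.0] * (len(chips) + len(taps) - 1)
    for j, h in enumerate(taps):
        for i, c in enumerate(chips):
            out[i + j] += h * c
    return out


def rake_finger(signal, pn, delay, coeff):
    """De-spread the signal at the given chip delay and weight by the coefficient."""
    despread = sum(signal[delay + i] * pn[i] for i in range(len(pn)))
    return coeff * despread


def mrc_combine(signal, pn, delays, coeffs):
    """Sum the weighted finger outputs (MRC for real-valued coefficients)."""
    return sum(rake_finger(signal, pn, d, c) for d, c in zip(delays, coeffs))


def detect_bpsk(decision_variable):
    """Hard decision on the combined output."""
    return 1 if decision_variable >= 0 else -1


# Assumed example values, not taken from the paper.
PN = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
TAPS = [1.0, 0.5, 0.25]
DELAYS = [0, 1, 2]

rx_minus = multipath([-c for c in PN], TAPS)   # transmitted symbol -1
rx_plus = multipath(list(PN), TAPS)            # transmitted symbol +1
decision_minus = detect_bpsk(mrc_combine(rx_minus, PN, DELAYS, TAPS))
decision_plus = detect_bpsk(mrc_combine(rx_plus, PN, DELAYS, TAPS))
```

Weighting each finger by its own coefficient means strong paths contribute more to the decision variable than weak ones, which is the point of MRC.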

    3. Hybrid Partial/Selective Subsystem

      The HPS subsystem takes the CE coefficient estimates and selects the strongest of them. The proposed HPS algorithm minimizes complexity by reducing the number of channel coefficient estimates that participate in the selection process. It exploits the fact that the first multipath components will almost certainly be strong enough to be selected by the sorting algorithm, while the multipath components at the tail of the channel impulse response are so weak that the probability of their being selected is very low. This assumption is justified because the channel model has an exponentially decaying power delay profile (PDP). For that reason, we partially accept a number of the channel coefficient estimates that correspond to the first arriving multipath components of the channel and partially abort the estimates that correspond to the latest arriving multipath components. An example of this procedure is shown in Figure 7, where four channel estimates were partially accepted and three were partially aborted. Consequently, eight of the estimates exported by the CE participate in the selection process, among which five are finally selected (selectively accepted) and three are aborted (selectively aborted). In this way, the complexity of the selection subsystem can be reduced significantly, as will be shown numerically in the next section.

      Figure 7. Hybrid Partial/Selective RAKE Receiver Subsystem

      The implementation of the proposed selection subsystem (HPSS) is based on a modified version of the bubble sort algorithm written in VHDL. This algorithm suits our application well because we do not need a full sort of the input coefficients, only a certain number of the strongest ones. Thus, in our example, the main loop of the algorithm runs only five times instead of eight, leading to a very low complexity implementation. The synthesis tool translates the algorithm into a set of comparators and multiplexers. The RTL schematic is not shown here because of its visual complexity.

      The HPS subsystem exports the indices of the nine partially and selectively accepted estimates that are used by the RR subsystem, which is now implemented by employing only nine fingers instead of fifteen.
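The selection procedure can be sketched in software as follows. This is a hedged illustration of the idea, not the authors' VHDL: the function name, the parameter names, the use of the coefficient magnitude as the strength measure, and the example estimates are assumptions. Indices are zero-based here, whereas the figures number fingers from 1.

```python
def hps_select(estimates, n_accept, n_abort, n_select):
    """Return the indices of the estimates kept by the hybrid scheme.

    The first n_accept estimates are accepted without sorting, the last
    n_abort are dropped without sorting, and a partial bubble sort picks
    the n_select strongest of the remaining candidates.
    """
    total = len(estimates)
    accepted = list(range(n_accept))
    candidates = list(range(n_accept, total - n_abort))
    # Each pass bubbles the strongest remaining candidate to the end,
    # so only n_select passes are needed instead of a full sort.
    for done in range(n_select):
        for i in range(len(candidates) - 1 - done):
            left, right = candidates[i], candidates[i + 1]
            if abs(estimates[left]) > abs(estimates[right]):
                candidates[i], candidates[i + 1] = right, left
    selected = candidates[len(candidates) - n_select:]
    return sorted(accepted + selected)


# Hypothetical estimates for 15 fingers with a roughly decaying profile.
EST = [0.9, -0.8, 0.7, 0.6, 0.1, 0.5, -0.45, 0.05,
       0.4, 0.02, 0.35, 0.3, 0.2, 0.15, 0.1]
kept = hps_select(EST, n_accept=4, n_abort=3, n_select=5)
```

With four estimates accepted, three aborted and five chosen from the remaining eight, nine indices are exported, matching the nine-finger RR described above.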

  4. Design Optimization

    The Carry Select Adder (CSLA) is used in both the CE and RR fingers to alleviate the problem of carry propagation delay by generating multiple carries independently and then selecting a carry to generate the sum. However, the CSLA is not area efficient, because it uses multiple pairs of Ripple Carry Adders (RCA) to generate the partial sum and carry for carry inputs cin=0 and cin=1, and the final sum and carry are then selected by multiplexers.

    The primary idea of this work is to use a Binary to Excess-1 Converter (BEC) instead of the RCA with cin=1 in the regular CSLA to achieve lower device utilization and power consumption. The main advantage of the BEC logic is that it needs fewer logic gates than the n-bit Full Adder (FA) structure. The details of the BEC logic are discussed below.

    The square-root (SQRT) CSLA has been chosen for comparison as it has a more balanced delay and requires lower power and area. The delay and area evaluation methodology of the regular and modified SQRT CSLA is presented in the following.


    Figure 8. Delay and Area Evaluation of an XOR Gate

    The AND, OR, and Inverter (AOI) implementation of an XOR gate is shown in Figure 8. The gates between the dotted lines perform their operations in parallel, and the number shown on each gate indicates the delay contributed by that gate. The delay and area evaluation methodology considers all gates to be made up of AND, OR, and Inverter gates, each having a delay of 1 unit and an area of 1 unit. We then add up the number of gates in the longest path of a logic block, which gives its maximum delay. The area is evaluated by counting the total number of AOI gates required for each logic block. Based on this approach, the CSLA adder blocks of 2:1 mux, Half Adder (HA), and FA are evaluated and listed in Table 1.
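The counting method can be illustrated on the XOR gate itself. The netlist below is the usual sum-of-products form, y = (a AND NOT b) OR (NOT a AND b); it is assumed, not confirmed, to match the circuit drawn in Figure 8. Each gate counts as one unit of area and one unit of delay.

```python
# Each entry maps an output net to (gate type, input nets).
# Nets that never appear as keys ("a", "b") are primary inputs.
XOR_NETLIST = {
    "na": ("NOT", ["a"]),
    "nb": ("NOT", ["b"]),
    "t1": ("AND", ["a", "nb"]),
    "t2": ("AND", ["na", "b"]),
    "y": ("OR", ["t1", "t2"]),
}


def area(netlist):
    """Area = total number of AOI gates."""
    return len(netlist)


def delay(netlist, net):
    """Delay = number of gates on the longest path from any input to net."""
    if net not in netlist:
        return 0  # primary input
    _, inputs = netlist[net]
    return 1 + max(delay(netlist, i) for i in inputs)


xor_area = area(XOR_NETLIST)         # 5 gates
xor_delay = delay(XOR_NETLIST, "y")  # NOT -> AND -> OR = 3 gate levels
```

The two inverters operate in parallel, as do the two AND gates, so only three gate levels lie on the longest path even though five gates are used.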

    Table 1. Delay and Area Count of the Basic Blocks of CSLA

    4.1 Binary to Excess 1 Converter

    As stated above, the main idea of this work is to use a BEC instead of the RCA with cin=1 in order to reduce the area and power consumption of the regular CSLA. To replace an n-bit RCA, an (n+1)-bit BEC is required. The structure and the function table of a 4-b BEC are shown in Figure 9 and Table 2, respectively. Figure 10 illustrates how the basic function of the CSLA is obtained by using the 4-bit BEC together with the mux. One input of the 8:4 mux receives (B3, B2, B1, B0) and the other input receives the BEC output. The two possible partial results are thus produced in parallel, and the mux selects either the BEC output or the direct inputs according to the control signal Cin. The importance of the BEC logic stems from the large silicon area reduction it brings when CSLAs with a large number of bits are designed. The Boolean expressions of the 4-bit BEC are listed below (the functional symbols are ~ for NOT, & for AND and ^ for XOR):

    X0 = ~B0

    X1 = B1 ^ B0

    X2 = B2 ^ (B0 & B1)

    XN = BN ^ (B0 & B1 & B2 & ... & BN-1)
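A bit-level sketch of the BEC and the selecting mux is shown below. Since excess-1 conversion adds one to its input, the least significant output bit is the inverse of B0 and every higher bit is XORed with the AND of all lower input bits. The helper names and the LSB-first bit ordering are assumptions made for the example.

```python
def bec(bits):
    """Binary to Excess-1 Converter on an LSB-first list of bits."""
    out = []
    lower_and = 1  # AND of all lower input bits; 1 for the LSB, so X0 = ~B0
    for b in bits:
        out.append(b ^ lower_and)
        lower_and &= b
    return out


def mux(direct, incremented, cin):
    """Select the BEC output when cin = 1, the direct input otherwise."""
    return incremented if cin else direct


def to_bits(value, width):
    """Convert a non-negative integer to an LSB-first bit list."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Convert an LSB-first bit list back to an integer."""
    return sum(b << i for i, b in enumerate(bits))
```

For a 4-bit input, `from_bits(mux(b, bec(b), cin))` equals the input plus `cin` modulo 16, which is the partial result a second RCA with cin=1 would otherwise have to compute.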

    Figure 9. 4-b BEC

    Table 2. Function Table of the 4-b BEC

  5. Performance Evaluation Of Modified CSLA

    The structure of the proposed CSLA, which uses a BEC in place of the RCA with Cin=1 to optimize area and power, is again split into five groups. Figure 10 shows the delay and area estimation for groups 2 and 3. The steps leading to the evaluation are given here.

    1. Group 2 has one 2-b RCA, consisting of 1 FA and 1 HA, for Cin=0. Instead of another 2-b RCA with Cin=1, a 3-b BEC is used, which adds one to the output of the 2-b RCA.

      Based on the delay values of Table 1, the arrival time of the selection input c1 [t=7] of the 6:3 mux is earlier than s3 [t=9] and c3 [t=10] and later than s2 [t=4]. Thus, sum3 depends on s3 and the mux, and the final c3 (output from the mux) depends on the partial c3 (input to the mux) and the mux. The sum2 output depends on c1 and the mux.

      Figure 10(a). Group 2, 4-Bit BEC with the Mux

    2. For the remaining groups the arrival time of mux selection input is always greater than the arrival time of data inputs from the BECs. Thus, the delay of the remaining groups depends on the arrival time of mux selection input and the mux delay.

    3. The area count of group 2 is determined as follows:

      Gate count = 43 (FA + HA + Mux + BEC)

      FA = 13 (1*13)

      HA = 6 (1*6)

      AND = 1

      NOT = 1

      XOR = 10 (2*5)

      Mux = 12 (3*4)

    4. Similarly, the estimated maximum delay and area of the other groups of the modified SQRT CSLA are evaluated.

    Figure 10(b). Group 3, 4-Bit BEC with the Mux
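The group 2 area count can be reproduced with a short calculation. The per-block areas are the ones implied by the breakdown in step 3 (FA = 13, HA = 6, 2:1 mux = 4, XOR = 5, AND = 1, NOT = 1); the dictionary and function names are hypothetical.

```python
# Area of each basic block in AOI gate units, as used in step 3.
AREA = {"FA": 13, "HA": 6, "MUX": 4, "XOR": 5, "AND": 1, "NOT": 1}

# Group 2 of the modified CSLA: a 2-b RCA (1 FA + 1 HA), a 6:3 mux
# (three 2:1 muxes) and a 3-b BEC (1 NOT, 2 XOR, 1 AND).
GROUP2 = {"FA": 1, "HA": 1, "MUX": 3, "XOR": 2, "AND": 1, "NOT": 1}


def gate_count(blocks):
    """Total area = sum over block types of (count * area per block)."""
    return sum(AREA[name] * count for name, count in blocks.items())


group2_area = gate_count(GROUP2)  # 13 + 6 + 12 + 10 + 1 + 1 = 43
```

The same function can be applied to the other groups once their block counts are known.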

  6. Conclusion

    A simple approach is proposed in this paper to reduce the area and power of the DS-UWB RAKE receiver architecture. The reduced number of gates offers a clear advantage in both area and total power. The comparison shows that the area and power of the modified architecture are reduced by 17.4% and 15.4%, respectively. The power-delay product and the area-delay product of the proposed design also decrease for the RAKE receiver and channel estimator.

  7. Acknowledgement

The authors would like to thank Sasi Institute of Technology & Engineering, Ashwan Kumar Koralla of cedronics, and the reviewers.


  1. G. R. Aiello and G. D. Rogerson, "Ultra-wideband wireless systems," IEEE Microwave Mag., vol. 4, pp. 36-47, Feb. 2003.

  2. R. Fisher, R. Kohno, M. McLaughlin, and M. Welborn, "DS-UWB physical layer submission to IEEE 802.15 Task Group 3a (Doc. Number P802.15-04/0137r4)," IEEE P802.15, Jan. 2005.

  3. P. Runkle, J. McCorkle, T. Miller, and M. Welborn, "DS-CDMA: the modulation technology of choice for UWB communications," in IEEE Conf. on Ultra Wideband Systems & Tech., pp. 364-368, Nov. 2003.

  4. M. Z. Win and R. A. Scholtz, "On the energy capture of ultrawide bandwidth signals in dense multipath environments," IEEE Communications Letters, vol. 2, pp. 245-247, Sept. 1998.

  5. D. Cassioli, M. Z. Win, F. Vatalaro, and A. Molisch, "Performance of low-complexity rake reception in a realistic UWB channel," IEEE Intern. Conf. on Communications, vol. 2, pp. 763-767, Aug. 2002.

  6. B. Mielczarek, M. O. Wessman, and A. Svensson, "Performance of coherent UWB rake receivers with channel estimators," IEEE Vehicular Technology Conf., vol. 3, pp. 1880-1884, Oct. 2003.

  7. M. Eslami and X. Dong, "Rake-MMSE-equalizer performance for UWB," IEEE Communications Letters, vol. 9, issue 6, pp. 502-504, June 2005.

  8. A. Parihar, L. Lampe, R. Schober, and C. Leung, "Equalization for DS-UWB systems - part I: BPSK modulation," IEEE Trans. on Communications, vol. 55, pp. 1164-1173, June 2007.

  9. A. Rajeswaran, V. S. Somayazulu, and J. R. Foerster, "Rake performance for a pulse based UWB system in a realistic UWB indoor channel," IEEE Intern. Conf. on Communications, vol. 4, pp. 2879-2883, May 2003.

  10. W. Li, J. Zhong and T. A. Gulliver, "A low complexity rake receiver for ultra-wideband systems," IEEE Vehicular Technology Conf., vol. 3, pp. 1393-1396, Sept. 2005.

  11. D. D. Wentzloff, R. Blazquez, F. S. Lee, B. P. Ginsburg, J. Powell, and A. P. Chandrakasan, "System design considerations for ultra-wideband communication," IEEE Communications Magazine, vol. 43, pp. 114-121, Aug. 2005.

  12. B. Ramkumar, H. M. Kittur, and P. M. Kannan, "ASIC implementation of modified faster carry save adder," Eur. J. Sci. Res., vol. 42, no. 1, pp. 53-58, 2010.

  13. T. Y. Chang and M. J. Hsiao, "Carry-select adder using single ripple carry adder," Electron. Lett., vol. 34, no. 22, pp. 2101-2103, Oct. 1998.

  14. Y. Kim and L.-S. Kim, "64-bit carry-select adder with reduced area," Electron. Lett., vol. 37, no. 10, pp. 614-615, May 2001.

  15. J. M. Rabaey, Digital Integrated Circuits: A Design Perspective. Upper Saddle River, NJ: Prentice-Hall, 2001.

  16. Y. He, C. H. Chang, and J. Gu, "An area efficient 64-bit square root carry-select adder for low power applications," in Proc. IEEE Int. Symp. Circuits Syst., 2005, vol. 4, pp. 4082-4085.

  17. Christos Thomos, Charalampos Papadopoulos and Grigorios Kalivas, "Design and Implementation of a Low-Complexity RAKE Receiver and Channel Estimator for DS-UWB," IEEE, 2010.
