 Open Access
 Authors : Dhanasree Jillella, Nagaraju Ravada, N.V.G.Prasad
 Paper ID : IJERTV1IS6470
 Volume & Issue : Volume 01, Issue 06 (August 2012)
 Published (First Online): 30-08-2012
 ISSN (Online) : 2278-0181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
Area Efficient and Low Power RAKE Receiver and Channel Estimator for DS-UWB in FPGA
Dhanasree Jillella#1 
Nagaraju Ravada*2 
N.V.G.Prasad#3 
M.Tech. Student 
Assistant Professor 
Associate professor 
#Dept of Electronics and Communication Engineering,
Sasi Institute Of Technology & Engineering, JNTU Kakinada, India.
Abstract
In this paper, we propose and implement an efficient architecture for a chip-spaced Direct Sequence Ultra-Wideband (DS-UWB) RAKE receiver subsystem that consists of four parts: the Channel Estimator (CE), the hybrid Partial/Selective Selection Subsystem (HPSS), the RAKE Receiver (RR) and the RAKE Control (RC). The proposed algorithm running in the HPS subsystem combines the benefits of both the SRake and PRake methods in order to further reduce its complexity. In the finger implementation of the subsystem, an energy-efficient carry select adder is used in which the Ripple Carry Adder (RCA) with cin = 1 is replaced by a Binary to Excess-1 Converter (BEC). The whole DS-UWB RAKE system is described in VHDL and is fully synthesizable, targeting a platform that employs a Xilinx Virtex-4 FPGA. Our design is highly parallel and modular, is optimized for high performance, and achieves a clock frequency of over 200 MHz in order to operate at the desired chip rate.
Keywords: Channel Estimator, CSLA, Finger, Field Programmable Gate Array (FPGA), RAKE.
1. Introduction
Two main types of UWB system exist: direct-sequence spread-spectrum UWB (DS-UWB) systems, which we consider in this work, and multiband orthogonal frequency-division multiplexing UWB (MB-OFDM UWB) systems.
The main characteristic of a UWB system is its wide bandwidth (on the order of several GHz), which leads to highly frequency-selective channels and to received signals composed of a significant number of resolvable multipath components with different delays on the order of nanoseconds. A DS-SS UWB system with a RAKE receiver can exploit multipath diversity by constructively summing the desired signal energy that is dispersed over the various multipath components, which helps to mitigate fading and thus improves performance. However, the low energy of the individual paths, combined with their high resolvability, means that a RAKE receiver must employ a large number of multipath components in order to optimize the received SNR. Previous studies showed that a RAKE receiver operating in a typical modern office building requires about 50 RAKE fingers to capture a sufficient amount of the total energy of the received signal. This poses significant challenges in the design and implementation of a RAKE receiver that aims to achieve a high performance gain in a low-complexity and power-efficient structure.
The most common methods proposed towards this aim are the Selective RAKE (SRake) and Partial RAKE (PRake) schemes. The first combines the strongest multipath components among all those available at the receiver input using the maximal ratio combining (MRC) scheme. Although it reduces the number of RAKE fingers, the selection procedure requires efficient channel estimation in order to keep track of the values of all multipath components at each time instant. The second method combines the first arriving multipath components using MRC. It is a less complex solution, as it does not have to carry out any selection among the multipath components. SRake clearly performs better than PRake, since the latter combines paths that may not contribute to increasing the collected energy. However, when the stronger multipath components are located at the beginning of the channel impulse response, the performance gap decreases.
So far, several other, more complex types of RAKE architecture have been proposed for UWB systems. A comparison between PRake and SRake for pulse position modulation showed that the simpler one, PRake, is almost as good as SRake with a small number of fingers in a Nakagami fading channel. Fractionally-spaced (FS) RAKE receivers for single-user DS-UWB systems employing Gaussian monocycles have also been studied. It was shown that the FS RAKE receiver outperforms chip-spaced and symbol-spaced RAKE receivers at the cost of higher complexity, since it can compensate better for channel distortion. Although combined RAKE and equalization techniques have been examined in order to alleviate intersymbol interference (ISI), it was shown that in a single-user UWB link the performance-limiting factor in the SNR range of interest is energy capture rather than ISI. In addition, new algorithms for finger assignment have been developed which use different selection criteria for assigning the RAKE fingers, in order to reduce the effect of pulse shaping and pulse position modulation on the RAKE receiver performance.
CSLA
Area and power play a major role in the design of integrated circuits because of the growing popularity of portable systems and the rapid growth of power density in VLSI circuits. Addition is a crucial arithmetic function and usually has a strong influence on the overall performance of digital systems.
The Carry Select Adder (CSLA) is one of the fastest adders and is used in many data-processing processors to perform fast arithmetic functions. From the structure of the CSLA, it is clear that there is scope for reducing its area and power consumption.
The CSLA is used in many computational systems to alleviate the problem of carry propagation delay by independently generating multiple carries and then selecting a carry to generate the sum. The basic idea of this work is to use a Binary to Excess-1 Converter (BEC) instead of the RCA with cin = 1 in the regular CSLA to achieve lower area and power consumption.
In this paper, we propose and implement an area-efficient, low-power RAKE receiver and channel estimator for DS-UWB. The proposed algorithm running in the HPS subsystem combines the benefits of both the SRake and PRake methods in order to further reduce its complexity, and a Binary to Excess-1 Converter is used to achieve lower area and power consumption.
The remainder of the paper is organized as follows. In Section 2 the transmission model of our system is analyzed. In Section 3 we describe the architecture of the proposed DS-UWB RAKE. Section 4 describes the design optimization techniques that further decrease complexity and power while increasing performance, and in Section 5 we present numerical results on the complexity and hardware utilization of our implementation. Finally, conclusions are given in Section 6.

2. The Complete Transceiver Model
In this section the system model of the DS-UWB system (Figure 1) is presented in order to better understand the design challenges of the proposed architecture.

Transmitter
The information bits to be transmitted are generated randomly by a binary source at a symbol rate of $1/T_b$ bits/sec, where $T_b$ is the duration of a BPSK symbol. The binary sequence (at point A) is represented by a vector b. After BPSK modulation it becomes d (at point B), and it is spread by a PN sequence pn composed of ±1s. After spreading, d is transformed into a vector c, which is generated at a chip rate of $1/T_c = N_c/T_b$ chips/sec (at point C), where $T_c$ is the chip duration. The spread-spectrum processing gain is $N_c = T_b/T_c$. After the pulse-shaping filter we get the signal s(t) (at point D).
Figure 1. DS-UWB Transmission Model
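The path from point A to point C can be illustrated with a minimal Python sketch. The PN sequence, the mapping of bit 0 to -1, and the function names are assumptions made for illustration only; the paper does not specify them, and the pulse-shaping filter is omitted.

```python
def bpsk_modulate(bits):
    """Map bits {0, 1} to BPSK symbols {-1, +1} (assumed mapping)."""
    return [1 if b else -1 for b in bits]

def spread(symbols, pn):
    """Spread each symbol over len(pn) chips using the +/-1 PN sequence."""
    return [d * p for d in symbols for p in pn]

def despread(chips, pn):
    """Correlate each block of len(pn) chips with the PN sequence."""
    n = len(pn)
    return [sum(c * p for c, p in zip(chips[i:i + n], pn))
            for i in range(0, len(chips), n)]

pn = [1, -1, 1, 1, -1, 1, -1, -1]   # hypothetical PN sequence, processing gain 8
b = [1, 0, 1, 1]                    # vector b at point A
d = bpsk_modulate(b)                # vector d at point B
c = spread(d, pn)                   # vector c at point C
recovered = despread(c, pn)         # ideal channel: each value is +/- len(pn)
```

With an ideal channel, the sign of each despread value returns the transmitted symbol, and its magnitude equals the processing gain.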

Multipath Channel
The signal s(t) passes through a multipath channel. In this paper we use the channel model that has been established by the IEEE 802.15.3a standardization group for evaluating the performance of different physical layers for high data rate UWB systems. This model is based on a modified Saleh-Valenzuela propagation model, which adopts a cluster-based approach for the multipath components arriving at the receiver. The cluster arrival times and the ray arrival times within each cluster are modelled as Poisson processes, whereas the coefficients are assumed to be lognormally distributed. The IEEE 802.15.3a group has suggested four sets of parameters to fit measurement data, giving four channel models (CM) that represent different environmental scenarios: CM1 (0-4 m, line-of-sight (LOS)), CM2 (0-4 m, non-LOS (NLOS)), CM3 (4-10 m, NLOS) and CM4 (extreme NLOS).

Receiver
After passing through the channel, the multipath-affected signal r(t) is received (at point E). The received signal goes through a pulse-matched filter (at point F) and, after sampling at the chip rate $1/T_c$, we get the discrete-time signal (at point G). This signal enters the RAKE receiver subsystem, where each finger delays it by a multiple of the chip interval and despreads it, and the selected PS taps are then combined using the MRC method. Here PS is the number of RAKE fingers assigned to the resolvable multipath rays, which are selected by the HPS algorithm from the coefficients produced by the channel estimator. Throughout this analysis we assume perfect timing synchronization.
Figure 2. Block diagram of the proposed system architecture


3. The System Architecture
The overall architecture of the proposed DS-UWB receiver subsystem is presented in Figure 2. It consists of four main parts: the RAKE Control (RC), the Channel Estimator (CE), the component that implements the selection algorithm (HPSS) and, finally, the RAKE Receiver (RR). The PN buffer component contains the PN sequence, which is fed into the CE and the RR. The role of the RC is to synchronize the CE and RR subsystems and to determine the exact time of operation for each of them. The complete architecture is designed and implemented entirely in VHDL, in the Xilinx ISE Design Suite 9.2 environment.
The design targets a platform that hosts a Xilinx Virtex-4 SX (XC4VSX35) FPGA. For the signal representation we have chosen an accuracy of 8 bits, which is adequate for our application. In the following subsections the three main components of our system are described.

Channel Estimator
The CE subsystem produces estimates of the channel impulse response coefficients, which are fed into the HPS subsystem. Channel estimation is performed using a data-aided approach, in which we assume that each packet begins with known pilot bits. Each of these pilot bits is chosen to be the PN sequence, which has the desirable characteristics (low cross-correlation, high autocorrelation values). The CE subsystem correlates the received pilot bits with the local PN sequence and calculates the estimates of the channel coefficients. This suboptimal but low-complexity algorithm is known as the Sliding Window (SW) algorithm, and it can be optimum (in the maximum likelihood, ML, sense) if the shifted versions of the signal are mutually orthogonal. The channel is assumed to remain constant for the duration of the data packet, and the estimated channel coefficients are used for the whole detection process of the data packet. The block diagram of the CE architecture is given in Figure 3. It consists of 15 fingers, which compute a corresponding number of estimates.
Figure 3. Channel Estimator Subsystem
The RTL schematic for the implementation of each CE finger is presented in Figure 4. The PN multiplier is implemented by a multiplexer which selects between the incoming signal and its two's complement. This is followed by an accumulator consisting of an adder and two registers, which are synchronized appropriately by the RC component. The adder that follows uses information that comes from an accumulator of the input signal in order to normalize the final output, which is the exported estimate of a certain channel coefficient.
Figure 4. Channel Estimator Finger Implementation
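The sliding-window correlation performed by the 15 CE fingers can be sketched in Python as follows. This is only an illustrative model under stated assumptions: the length-15 PN sequence (an m-sequence), the noise-free periodic pilot, the example channel taps and the normalization by the sequence length are all hypothetical and do not reproduce the normalization circuit of the paper.

```python
# Hypothetical length-15 m-sequence mapped to +/-1 (assumption, not from the paper).
PN = [1, 1, 1, -1, 1, 1, -1, -1, 1, -1, 1, -1, -1, -1, -1]

def received_pilot(pn, h):
    """Noise-free pilot after the multipath channel h, assuming the PN
    pilot is repeated periodically."""
    n = len(pn)
    return [sum(h[k] * pn[(i - k) % n] for k in range(len(h)))
            for i in range(n)]

def sliding_window_estimate(r, pn, num_fingers):
    """Finger k correlates the received pilot, delayed by k chips, with the
    local PN sequence and normalizes by the sequence length."""
    n = len(pn)
    return [sum(r[(i + k) % n] * pn[i] for i in range(n)) / n
            for k in range(num_fingers)]

h_true = [0.8, 0.4, 0.2]                       # hypothetical channel taps
r = received_pilot(PN, h_true)
h_est = sliding_window_estimate(r, PN, 15)     # one estimate per CE finger
```

Because the shifted versions of this PN sequence are only approximately orthogonal, each estimate carries a small bias from the other paths, which is consistent with the suboptimal nature of the SW algorithm noted above.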

Rake Receiver
The block diagram of the RAKE receiver and the RTL schematic of its finger implementation are shown in Figures 5 and 6. Figure 5 shows the full RAKE case, where the selection algorithm is not implemented and all 15 RAKE fingers are used for the MRC scheme.
Figure 5. RAKE Receiver Subsystem
When the selection algorithm is employed, the RR takes the form of the block diagram shown in Figure 7, which is the proposed HPS implementation.
Figure 6. RAKE Receiver Finger Implementation
In this case, only 9 of the 15 fingers are implemented in hardware, combining the selected coefficient estimates and the corresponding signals from the Signal Buffer. The information on which signals from the Signal Buffer are chosen comes from the HPS subsystem in the form of indices that drive certain multiplexers. For example, in Figure 7, fingers 1, 2, 3, 5, 6 and 10 were not selected to participate in the MR combining scheme.
Each RR finger consists of a PN multiplier, an accumulator and a coefficient multiplier. The first two are implemented in the same manner as described above for the CE finger. The final multiplier multiplies the output of the accumulator with the corresponding coefficient in order to implement the MRC scheme. The outputs of all RR fingers are summed to obtain the final estimated symbol.
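The finger structure described above (PN multiplier, accumulator, coefficient multiplier, final summation) can be modelled with a short Python sketch. The PN sequence, the channel taps and the use of the true taps as coefficient estimates are assumptions for illustration; a real receiver would use the CE outputs and fixed-point arithmetic.

```python
# Hypothetical +/-1 PN sequence and channel taps (assumptions for illustration).
PN = [1, 1, 1, -1, 1, 1, -1, -1, 1, -1, 1, -1, -1, -1, -1]
H = [0.8, 0.4, 0.2]

def channel(chips, h):
    """Linear convolution of the chip stream with the channel taps."""
    out = [0.0] * (len(chips) + len(h) - 1)
    for n, c in enumerate(chips):
        for k, hk in enumerate(h):
            out[n + k] += c * hk
    return out

def rake_finger(r, pn, delay, coeff):
    """PN multiplier and accumulator, followed by the coefficient multiplier."""
    acc = sum(r[delay + i] * pn[i] for i in range(len(pn)))
    return acc * coeff

def rake_combine(r, pn, coeffs, selected):
    """Sum the outputs of the selected fingers (maximal ratio combining)."""
    return sum(rake_finger(r, pn, k, coeffs[k]) for k in selected)

def detect(symbol, selected):
    """Send one BPSK symbol through the channel and decide on its sign."""
    r = channel([symbol * p for p in PN], H)
    return 1 if rake_combine(r, PN, H, selected) > 0 else -1
```

Calling `detect(-1, [0, 1, 2])` combines all three paths, while `detect(-1, [0])` uses a single finger; in this noise-free example both recover the transmitted symbol.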

Hybrid Partial/Selective Subsystem
The HPS subsystem takes the CE coefficient estimates and selects the strongest of them. The proposed HPS algorithm minimizes complexity by reducing the number of channel coefficient estimates that participate in the selection process. It exploits the fact that the first multipath components will almost certainly be strong enough to be selected by the sorting algorithm, while the multipath components at the tail of the channel impulse response are so weak that the probability of their being selected is very low. This assumption can be adopted because the channel model has an exponentially decaying power delay profile (PDP). For that reason, we partially accept a number of the channel coefficient estimates that correspond to the first arriving multipath components of the channel and partially abort the estimates that correspond to the latest arriving multipath components. An example of this procedure is shown in Figure 7, where four channel estimates were partially accepted and three were partially aborted. Consequently, eight of the CE exported estimates participate in the selection process, among which five are finally selected (selectively accepted) and three are aborted (selectively aborted). In this way, the complexity of the selection subsystem can be reduced significantly, as will be shown numerically in the next section.
Figure 7. Hybrid Partial/Selective RAKE Receiver Subsystem
The implementation of the proposed selection subsystem (HPSS) is based on a modified version of the bubble sort algorithm written in VHDL. This algorithm suits our application well because we do not need a full sort of the input coefficients, but only a certain number of the strongest coefficients. Thus, in our example, the main loop of the algorithm runs only five times instead of eight, leading to a very low complexity implementation. The synthesis tool translates the algorithm into a set of comparators and multiplexers. The RTL schematic is not shown here because of its visual complexity.
The HPS subsystem exports the indices of the nine partially and selectively accepted estimates that are used by the RR subsystem, which is now implemented by employing only nine fingers instead of fifteen.
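The selection procedure can be sketched in Python as a partial bubble sort. This is a software illustration under assumptions, not the VHDL of the paper: the function name, the parameter names and the example estimate values are hypothetical, and the comparison on magnitudes is an assumed selection criterion.

```python
def hps_select(estimates, n_accept, n_abort, n_select):
    """Return the indices of the fingers kept by a hybrid partial/selective rule.

    The first n_accept estimates are partially accepted, the last n_abort are
    partially aborted, and the n_select strongest of the remaining candidates
    are found with n_select passes of bubble sort.
    """
    total = len(estimates)
    accepted = list(range(n_accept))
    candidates = list(range(n_accept, total - n_abort))
    n_select = min(n_select, len(candidates))
    # Each pass moves the strongest remaining candidate to the end of the list,
    # so only n_select passes are needed instead of a full sort.
    for p in range(n_select):
        for i in range(len(candidates) - 1 - p):
            if abs(estimates[candidates[i]]) > abs(estimates[candidates[i + 1]]):
                candidates[i], candidates[i + 1] = candidates[i + 1], candidates[i]
    selected = candidates[len(candidates) - n_select:]
    return sorted(accepted + selected)

# Hypothetical estimates for 15 CE fingers.
est = [0.9, 0.8, 0.7, 0.6, 0.1, 0.5, 0.05, -0.4, 0.3, 0.02,
       0.35, 0.45, 0.01, 0.01, 0.01]
indices = hps_select(est, n_accept=4, n_abort=3, n_select=5)
```

With four partially accepted, three partially aborted and five selectively accepted estimates, the sketch returns nine indices, matching the nine RR fingers of the example.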


4. Design Optimization
The Carry Select Adder (CSLA) is used in both the CE and RR fingers to alleviate the issue of propagation delay by generating multiple carries independently and then selecting a carry to generate the sum. Because the CSLA uses multiple pairs of Ripple Carry Adders (RCA) to generate the partial sum and carry for carry inputs cin = 0 and cin = 1, with the final sum and carry then selected by multiplexers, it is considered to be area inefficient.
The primary idea of this work is to use a Binary to Excess-1 Converter (BEC) instead of the RCA with cin = 1 in the regular CSLA to achieve low device utilization and power consumption. The main advantage of the BEC logic comes from its lower gate count compared with the n-bit Full Adder (FA) structure. The details of the BEC logic are discussed below.
The CSLA has been chosen for comparison because it has a more balanced delay and requires lower power and area. The delay and area evaluation methodology for the regular and modified SQRT CSLA is described below.
Figure 8. Delay and Area Evaluation of an XOR Gate
The AND, OR, and Inverter (AOI) implementation of an XOR gate is shown in Figure 8. The gates between the dotted lines perform their operations in parallel, and the number shown for each gate indicates the delay contributed by that gate. The delay and area evaluation methodology considers all gates to be made up of AND, OR, and Inverter gates, each having a delay of 1 unit and an area of 1 unit. We then add up the number of gates in the longest path of a logic block, which gives the maximum delay. The area is evaluated by counting the total number of AOI gates required for each logic block. Based on this approach, the CSLA adder blocks of 2:1 mux, Half Adder (HA), and FA are evaluated and listed in Table 1.
Table 1. Delay and Area Count of the Basic Blocks of CSLA
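The counting methodology can be illustrated on the XOR gate itself. The sketch below assumes the usual AOI decomposition of XOR into two inverters, two AND gates and one OR gate; the variable names are hypothetical, and the unit delay and unit area per gate follow the methodology stated above.

```python
def xor_aoi(a, b):
    """XOR built only from AND, OR and inverter gates."""
    not_a, not_b = 1 - a, 1 - b          # level 1: two inverters in parallel
    t1, t2 = a & not_b, not_a & b        # level 2: two AND gates in parallel
    return t1 | t2                       # level 3: one OR gate

XOR_AREA = 2 + 2 + 1    # inverters + AND gates + OR gate, 1 unit each
XOR_DELAY = 3           # one gate per level on the longest path
```

The resulting area of 5 units agrees with the XOR area used in the group 2 gate count later in this section.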
4.1 Binary to Excess-1 Converter
As stated above, the main idea of this work is to use a BEC instead of the RCA with cin = 1 in order to reduce the area and power consumption of the regular CSLA. To replace the n-bit RCA, an (n+1)-bit BEC is required. The structure and the function table of a 4-bit BEC are shown in Figure 9 and Table 2, respectively. Figure 10 illustrates how the basic function of the CSLA is obtained by using the 4-bit BEC together with the mux. One input of the 8:4 mux receives (B3, B2, B1, B0) and the other input receives the BEC output. This produces the two possible partial results in parallel, and the mux selects either the BEC output or the direct inputs according to the control signal Cin. The importance of the BEC logic stems from the large silicon area reduction it offers when CSLAs with a large number of bits are designed. The Boolean expressions of the BEC are listed below (the functional symbols are ~ NOT, & AND, ^ XOR):
X0 = ~B0
X1 = B1 ^ B0
X2 = B2 ^ (B0 & B1)
XN = BN ^ (B0 & B1 & B2 & ... & BN-1)
Figure 9. 4-bit BEC
Table 2. Function Table of the 4-bit BEC
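The Boolean expressions of the 4-bit BEC can be checked against its intended function, which is adding one to the 4-bit input. The Python sketch below is a bit-level illustration; the function name `bec4` is hypothetical.

```python
def bec4(b):
    """4-bit Binary to Excess-1 Converter built from the Boolean expressions."""
    b0, b1, b2, b3 = (b >> 0) & 1, (b >> 1) & 1, (b >> 2) & 1, (b >> 3) & 1
    x0 = b0 ^ 1                  # X0 = ~B0
    x1 = b1 ^ b0                 # X1 = B1 ^ B0
    x2 = b2 ^ (b0 & b1)          # X2 = B2 ^ (B0 & B1)
    x3 = b3 ^ (b0 & b1 & b2)     # X3 = B3 ^ (B0 & B1 & B2)
    return x0 | (x1 << 1) | (x2 << 2) | (x3 << 3)

# Function table: input B[3:0] and output X[3:0] for all 16 inputs.
function_table = [(format(b, "04b"), format(bec4(b), "04b")) for b in range(16)]
```

Each output equals the input plus one, with the input 1111 wrapping around to 0000.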

Performance Evaluation Of Modified CSLA
The structure of the proposed CSLA, which uses a BEC in place of the RCA with Cin = 1 to optimize area and power, is again split into five groups, and the delay and area of each group are estimated. The steps leading to the evaluation are given here.

Group 2 has one 2-bit RCA, which consists of 1 FA and 1 HA, for Cin = 0. Instead of another 2-bit RCA with Cin = 1, a 3-bit BEC is used, which adds one to the output of the 2-bit RCA.
Based on the delay values of Table 1, the arrival time of the selection input c1 [time (t) = 7] of the 6:3 mux is earlier than s3 [t = 9] and c3 [t = 10] and later than s2 [t = 4]. Thus, sum3 depends on s3 and the mux delay, the final c3 (output from the mux) depends on the partial c3 (input to the mux) and the mux delay, and sum2 depends on c1 and the mux delay.
Figure 10(a). Group 2: 4-bit BEC with the mux

For the remaining groups the arrival time of mux selection input is always greater than the arrival time of data inputs from the BECs. Thus, the delay of the remaining groups depends on the arrival time of mux selection input and the mux delay.
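The behaviour of one group of the modified CSLA (an n-bit RCA with Cin = 0, an (n+1)-bit BEC and a mux) can be modelled at the bit level in Python. The function names are hypothetical and the sketch checks only functional equivalence with ordinary addition, not delay or area.

```python
def rca(a, b, n, cin=0):
    """Bit-level ripple-carry addition of two n-bit values: (sum, carry_out)."""
    s, c = 0, cin
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ c) << i
        c = (ai & bi) | (c & (ai ^ bi))
    return s, c

def bec(value, width):
    """Binary to Excess-1: X0 = ~B0, Xi = Bi ^ (B0 & ... & Bi-1)."""
    out, lower_all_ones = 0, 1
    for i in range(width):
        bi = (value >> i) & 1
        out |= (bi ^ lower_all_ones) << i
        lower_all_ones &= bi
    return out

def csla_group(a, b, n, cin):
    """One CSLA group: RCA for Cin = 0, BEC for Cin = 1, mux selects by cin."""
    s0, c0 = rca(a, b, n, 0)
    result_cin0 = (c0 << n) | s0             # carry and sum for Cin = 0
    result_cin1 = bec(result_cin0, n + 1)    # same value plus one
    chosen = result_cin1 if cin else result_cin0
    return chosen & ((1 << n) - 1), chosen >> n
```

For group 2, `csla_group(a, b, 2, cin)` uses a 2-bit RCA and a 3-bit BEC, as described in the text.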

The area count of group 2 is determined as follows:
Gate count = 43 (FA + HA + Mux + BEC)
FA = 13 (1 × 13)
HA = 6 (1 × 6)
AND = 1
NOT = 1
XOR = 10 (2 × 5)
Mux = 12 (3 × 4)
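The group 2 total can be reproduced from the per-block areas and instance counts listed above. The short Python sketch below only restates that arithmetic; the dictionary names are hypothetical.

```python
# Area in AOI gate units per block, as used in the group 2 count above.
AREA = {"FA": 13, "HA": 6, "MUX": 4, "XOR": 5, "AND": 1, "NOT": 1}

# Number of instances of each block in group 2.
GROUP2 = {"FA": 1, "HA": 1, "MUX": 3, "XOR": 2, "AND": 1, "NOT": 1}

group2_area = sum(AREA[block] * count for block, count in GROUP2.items())
```

The sum 13 + 6 + 12 + 10 + 1 + 1 gives the stated gate count of 43.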

Similarly, the estimated maximum delay and area of the other groups of the modified SQRT CSLA are evaluated.
Figure 10(b). Group 3: 4-bit BEC with the mux


Conclusion
A simple approach is proposed in this paper to reduce the area and power of the DS-UWB architecture. The reduced number of gates offers a great advantage in reducing both the area and the total power. The comparison shows that the area and power of the modified architecture are reduced by 17.4% and 15.4%, respectively. The power-delay product and the area-delay product of the proposed design also decrease for the RAKE receiver and channel estimator.

Acknowledgement
The authors would like to thank Sasi Institute of Technology & Engineering, Ashwan Kumar Koralla of cedronics, and the reviewers.
References

G. R. Aiello and G. D. Rogerson, "Ultra-wideband wireless systems," IEEE Microwave Mag., vol. 4, pp. 36-47, Feb. 2003.

R. Fisher, R. Kohno, M. McLaughlin, and M. Welborn, "DS-UWB physical layer submission to IEEE 802.15 Task Group 3a (Doc. Number P802.15-04/0137r4)," IEEE P802.15, Jan. 2005.

P. Runkle, J. McCorkle, T. Miller, and M. Welborn, "DS-CDMA: the modulation technology of choice for UWB communications," in IEEE Conf. on Ultra Wideband Systems & Tech., pp. 364-368, Nov. 2003.

M. Z. Win and R. A. Scholtz, "On the energy capture of ultra-wide bandwidth signals in dense multipath environments," IEEE Communications Letters, vol. 2, pp. 245-247, Sept. 1998.

D. Cassioli, M. Z. Win, F. Vatalaro, and A. Molisch, "Performance of low-complexity rake reception in a realistic UWB channel," IEEE Intern. Conf. on Communications, vol. 2, pp. 763-767, Aug. 2002.

B. Mielczarek, M. O. Wessman, and A. Svensson, "Performance of coherent UWB rake receivers with channel estimators," IEEE Vehicular Technology Conf., vol. 3, pp. 1880-1884, Oct. 2003.

M. Eslami and X. Dong, "Rake-MMSE-equalizer performance for UWB," IEEE Communications Letters, vol. 9, no. 6, pp. 502-504, June 2005.

A. Parihar, L. Lampe, R. Schober, and C. Leung, "Equalization for DS-UWB systems, part I: BPSK modulation," IEEE Trans. on Communications, vol. 55, pp. 1164-1173, June 2007.

A. Rajeswaran, V. S. Somayazulu, and J. R. Foerster, "Rake performance for a pulse based UWB system in a realistic UWB indoor channel," IEEE Intern. Conf. on Communications, vol. 4, pp. 2879-2883, May 2003.

W. Li, J. Zhong, and T. A. Gulliver, "A low complexity rake receiver for ultra-wideband systems," IEEE Vehicular Technology Conf., vol. 3, pp. 1393-1396, Sept. 2005.

D. D. Wentzloff, R. Blazquez, F. S. Lee, B. P. Ginsburg, J. Powell, and A. P. Chandrakasan, "System design considerations for ultra-wideband communication," IEEE Communications Magazine, vol. 43, pp. 114-121, Aug. 2005.

B. Ramkumar, H. M. Kittur, and P. M. Kannan, "ASIC implementation of modified faster carry save adder," Eur. J. Sci. Res., vol. 42, no. 1, pp. 53-58, 2010.

T. Y. Chang and M. J. Hsiao, "Carry-select adder using single ripple carry adder," Electron. Lett., vol. 34, no. 22, pp. 2101-2103, Oct. 1998.

Y. Kim and L. S. Kim, "64-bit carry-select adder with reduced area," Electron. Lett., vol. 37, no. 10, pp. 614-615, May 2001.

J. M. Rabaey, Digital Integrated Circuits: A Design Perspective. Upper Saddle River, NJ: Prentice-Hall, 2001.

Y. He, C. H. Chang, and J. Gu, "An area efficient 64-bit square root carry-select adder for low-power applications," in Proc. IEEE Int. Symp. Circuits Syst., 2005, vol. 4, pp. 4082-4085.

C. Thomos, C. Papadopoulos, and G. Kalivas, "Design and Implementation of a Low Complexity RAKE Receiver and Channel Estimator for DS-UWB," IEEE, 2010.