 Open Access
 Total Downloads : 241
 Authors : K. Jebin Roy, R. Ramya
 Paper ID : IJERTV3IS20906
 Volume & Issue : Volume 03, Issue 02 (February 2014)
 Published (First Online): 01032014
 ISSN (Online) : 22780181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
Adaptive FIR Filter based on Distributed Arithmetic and LMS Algorithm for LowArea and LowPower
K. Jebin Roy
M.E. VLSI Design
Anand Institute of Higher Technology Affiliated to Anna University, Chennai 603103, India
R. Ramya M.Tech. VLSI Design, Sathyabama University, Accredited by NAAC, Chennai 600119, India
Abstract In this manuscript, we proposed a novel pipelined architecture for lowpower and lowarea adaptive FIR filter based on distributed arithmetic (DA) and LMS algorithm. DA is bitserial computational process and uses parallel lookup table (LUTs) apprise and equivalent implementation of filtering and weightupdate operations to appliance high throughput filter rates irrespective of the filter length. The full adder based conditional signed carry save accumulation for DAbased inner product computation is replaced and design by using 10 transistor full adder based carry save accumulation of shift accumulation, with the intention of the proposed design, which can reduce the area complexity and power consumption. The leastmeansquare (LMS) algorithm adaptation is functioned to update the weight and abate the mean square error between the assessed and chosen output. The weight increment block based adder/subtractor cells is swapped by carry save adder in order to reduce area difficulty. It comprises of multiplexors, smaller LUT, and practically half the number of transistors compared to the present DAbased design.
Index Terms: Adaptive Filter, Distributed Arithmetic (DA), Finite Impulse Response (FIR), Least Mean Square (LMS) Algorithm, Lookup table (LUT).

INTRODUCTION
Adaptive filters find extensive use in many signal processing applications such as channel equalization, echo cancellation, noise cancellation [1]. The finite impulse response (FIR) filters whose weights are updated by the famous WidrowHoff least mean square (LMS) algorithm is the most popularly used adaptive filter not only due to its simplicity but also due to its satisfactory convergence performance [5]. The direct form configuration on the onward path of the FIR filter results in a long critical path due to an inner product computation to obtain a filter output. Consequently, it is required to reduce the critical path of the structure if the input signal has high sampling rate. By reducing the critical path of the structure, thereby, the critical path could not exceed the sampling period.
Distributed arithmetic (DA) is so named because it performed arithmetic operation. DA is bit serial computation in nature and it eliminates the need for hardware multipliers
and is capable of implementing large order filters with very high throughput. A lot of study has been done to implement the DA based adaptive FIR filter for area efficient design, the multiplierless distributed arithmetic (DA) based technique has achieved plenteous popularity for its high throughput, but it results are increased in costeffective, area and time efficient computing structures [8]. DA based hardware efficient adaptive FIR filter inner product has been suggested by Allred et al. [2] using two separate lookup tables (LUTs) Filtering lookup table and Auxiliary lookup table for filtering and weight updating module. Later, Guo and DeBrunner [3], [4] have improved the design structure in [2] by using only one lookup table instead of two LUTs for both filter and weight updating module. On the other hand, the design process in [2], [3], [4] and [8] require more cycles for lookup table (LUT) update for each new sample, hence it do not support high sampling rate. Meher and Park have improved the design with low adaptation delay for high speed DA based adaptive filter [6]. In a recent paper, Meher and Park proposed a new DA based adaptive filter architecture for low power, low area and high throughput with very low adaptation delay [7].
This brief proposes an adaptive FIR filter using distributed arithmetic for area efficient design. High Throughput is achieved by using a parallel lookup table update and equivalent implementation of filtering and weightupdating operations. The conditional signed carry saved accumulation for DAbased inner product computation is designed by using 10 transistor full adder based carry saved accumulation of shift accumulation. The use of the proposed design helps to reduce the area complexity and power consumption.
In the next section, a brief study of the least mean square (LMS) adaptive algorithm, followed by the description of the proposed DA based technique filter in Section 3. The structure of the proposed adaptive filter and description of the proposed DA based adaptive FIR filter in Section 4. Results and Conclusions are given in Section 5 and 6.

REVIEW OF LMS ADAPTIVE ALGORITHM
The LMS algorithm computes a filter output and an error value that is equal to the difference between the current
filter output and the desired response for every clock cycle. In every training cycle, the estimated error is then used to
and the inner product given by (7) can be computed as
= 1 2 . 0 , = 1 .
update the filter weights. The weights of LMS adaptive filter
1
=0
(8)
during the nth iteration is updated according to the following equations [6]:
+ 1 = + Âµ . (1a)
Where
= (1b)
= . (1c)
The input vector and the weight vector at the nth training iteration are respected given by
= , 1 , , + 1 (2)
= 0 , 1 , , 1 (2)
is the desired response, and is the filter output of the iteration. denotes the error value generated during the iteration, which is used to update the weights, Âµ is the convergence factor, and N is the filter length.
In the case of filter designs, the feedback error
becomes available after certain number of cycles, called the
adaptation delay. The pipelined architectures therefore use the delayed error for updating the current weight instead of the most recent error, where is the adaptation delay. The weight update equation of such delayed LMS adaptive filter is given by
+ 1 = + Âµ . .
(3a)

PROPOSED DABASED APPROACH FOR INNER PRODUCT COMPUTATION
In each cycle, the LMS adaptive filter needs to perform an innerproduct computation which contributes to the most of the critical path. Let the inner product computation of (1c) be given by
Meanwhile any element of the Npoint bit sequence
0 1 can either be 1 or 0, the partial sum for = 0, 1,, 1 can have 2 possible values. If the entire 2 possible values sum are precomputed and stored in a LUT, the partial sum can be read out from the LUT using the bit sequence as address bits for computing the inner product.
Figure 1: DAbased implementation of four point inner product
Figure 2: Carry save implementation of shift accumulation
The inner product of (8) can therefore be calculated in cycles of carry save implementation of shift accumulation, followed by LUTread operations corresponding to number of bit slices for 0 1, as shown in Fig. 1. Since the carry save implementation of shift accumulation in Fig. 2 required more area and power consumption.
=0
= 1 .
(4)
Where and for 0 1 form the N point vectors corresponding to the current weights and most recent 1 input respectively. Let us assume be the bit width of the weigt, every component of the vector weight may be expressed in 2s complement representation
=1
= 0 + 1 . 2
(5)
Where denotes the bit of . Substituting (5), we can write (4) in an expanded form
= 1 . 0 + 1 . 1 . 2
=0
=0
=1
(6)
Figure 3: 10T 1Bit Full Adder
To convert the sumofproduct form of (4) into a distributed form, the order of summations over the indices and in
(6) can be interchanged to have
= 1 . 0 + 1 2 . 1 .
The carry save implementation of shift accumulation based full adder is design by using 10 transistor one bitfull adder
[9] as shown in Fig. 4. The bit slices of vector are fed one=0
=0
=0
(7)
after the next in the LSB to the MSB order to the carry save
accumulator. Finally, the sum and carry output of the carry save accumulator is obtained after clock cycle are required to be added by a final adder. The content of the LUT location can be expressed as
=0
= 1 .
(9)
where is the + 1 th bit of the – bit binary representation of integer for 0 2 1 can be precomputed and stored in RAM based LUT of 2 words. However, instead of storing 2 words in LUT, we store (2 1) words in a DA table of (2 1) registers.
Figure 5: Distributed arithmetic table
Figure 6: Proposed structure of DAbased LMS adaptive filter length
= 4
DA table for N=4 is shown in Fig. 5. DA table contains only 15 registers to store the precalculated sums of input words. In DA table, seven new values of are computed by seven adders in parallel.

PROPOSED STRUCTURE OF ADAPTIVE FIR FILTER
Figure 7: Proposed structure of DAbased LMS adaptive filter of length N=16
A straightforward DAbased implementation of inner product requires LUT of very large size. For that reason, the computation of the inner products of large orders needs to be decomposed [4] into small adaptive filtering blocks as shown in Fig. 6 and large order adaptive filters shown in Fig. 7.
The structure of DAbased adaptive filter of length N=4 comprises of a fourpoint innerproduct block and a weight increment block along with additional circuits for the
computation of error value and control word for the barrel shifters. The fourpoint innerproduct block [shown in Fig. 1] contains a DA table consisting of an array of 15 registers as shown in Fig. 5 which stores the partial inner products for 0 < 15 and a 16:1 multiplexor to select the content of one of those registers from the DA table. Bit slices of weights A = 3 2 1 0 for 0 1 are
fed to the MUX as control in LSB to MSB order, and the output of the MUX is fed to the carry save accumulator using 10T full adder as shown in Fig. 4. After bit cycles, the carry save accumulator shift accumulates all the partial inner products and generates a sum and carry output word of size (+2) bit each. The carry and sum words are shifted added with an input carry 1 to generate filter output which is subsequently subtracted from the desired output to obtain the error .
Figure 8: Structure of the weightincrement block for = 4
if r6=1 then t= 000; else if r5 = 1 then t= 001; else if r4 = 1 then t= 010; else if r3 = 1 then t= 011; else if r2 = 1 then t= 100; else if r1 = 1 then t= 101; else if r0 = 1 then t= 110;
Figure 9: Logic used for generation of control word for the barrel shifter for
= 8
As in the case in [4], all the bits of the error except the most significant bit (MSB) one are ignored (8th Bit). The remaining bits are magnitude of the error, the magnitude of the computed error is decoded to generate the control word for the barrel shifter. The logic used for the generation of control word for the barrel shifter is shown in Fig. 9. The number of shifts in that case is increased by locations accordingly to reduce the hardware complexity. The weight increment unit as shown in Fig. 8 for = 4 comprises of 4 barrel shifters and four carry save adder cells. The barrel shifter shifts the different input values for = 0, 1, 2 1 by appropriate number of locations. The barrel shifter yields the desired increments are fed to the carry save adder with the sign bit from the error value. The sign bit of the error is used as the control for the 2:1 MUX to select any one of the sum or carry output from the Carry save adder. The output of the MUX is fed to the Byte parallel to Bitserial converter to convert 8 bit data into 1 bit data. The output waveform of DAbased adaptive FIR filter (N=16) as shown in Fig. 10.
Figure 10: DAbased LMS adaptive FIR filter of length N=4

RESULTS
Thus the existing and proposed designs in [7] and [8] are implemented in Xilinx 14.1 using verilog code. Along with area and power of corresponding design are measured using Tanner 15.1 EDA in 250nm CMOS technology.
Table 1: Implementation Results Using Xilinx 14.1 and Tanner 15.1
Designs
Filter
Length
Area
(sq.Âµs)
Power
(mW)
Existing
N = 16
18264
9.41
Proposed
N = 16
14520
6.40

CONCLUSION
In this script, an adaptive FIR filter using distributed arithmetic (DA) for area efficient design is implemented. High throughput is drastically enriched by parallel (LUTs) update and equivalent implementation of filtering and weightupdate operations. The proposed carry save accumulation using 10 transistor full adder schemes of signed partial inner products for the computation of the filter output and also modified in weight increment block. By this way it utilizes low area, low power consumption and the throughput of the filter rates increases irrespective of the filter length.
REFERENCES

B. Widrow and S. D. Stearns, Adaptive signal processing. Prentice Hall, Englewood Cliffs, NJ, 1985.

D. J. Allred, H. Yoo, V. Krishnan, W. Huang, and D. V. Anderson,
LMS adaptive filters using distributed arithmetic for high throughput, IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no.7,pp. 13271337, Jul. 2005.

R. Guo and L. S. DeBrunner, Two highperformance adaptive filter implementation schemes using distributed arithmetic, IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 58, no. 9, pp. 600604, Sep. 2011.

R. Guo and L. S. DeBrunner, A novel adaptive filter implementation scheme using distributed arithmetic, in Proc. Asilomar Conf. Signals, Syst., Comput., , pp. 160164, Nov. 2011.

S. Haykin and B. Widrow LeastMeanSquare Adaptive Filters Hoboken, NJ, USA: Wiley, 2003.

P. K. Meher and S. Y. Park, Highthroughput pipelined realization of adaptive FIR filter based on distributed arithmetic, in VLSI Symp. Tech. Dig., Oct. 2011, pp. 428433.

Sang Yoon Park, Member, IEEE, and Pramod Kumar Meher, Senior Member, IEEE LowPower, HighThroughput, and LowArea Adaptive FIR Filter Based on Distributed Arithmetic IEEE Transactions On Circuits And SystemsIi: Express Briefs, Vol. 60, No. 6 June 2013.

S. A. White Applications of the distributed arithmetic to digital signal processing: A tutorial review, IEEE ASSP Mag., vol. 6, no. 3, pp. 419, Jul. 1989.

Hanan A.Mahmoud and Magdy.A. Bayoumi, A 10Transistor Low Power High Speed Full Adder Cell.