 Open Access
 Total Downloads : 297
 Authors : Sathyabhama. B, Sahaana Harishankar, Preetha. R
 Paper ID : IJERTV3IS031874
 Volume & Issue : Volume 03, Issue 03 (March 2014)
 Published (First Online): 31032014
 ISSN (Online) : 22780181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
Optimized Architecture for an Adaptive FIR Filter using Distributed Arithmetic
Sathyabhama. B1, Sahaana harishankar2, Preetha3
1 Assistant Professor, Department of Electronics and Communication, Panimalar engineering college, Chennai, India 2 Sahaana harishankar, Department of Electronics and Communication, Panimalar engineering college, Chennai, India 3 Preetha, Department of Electronics and Communication, Panimalar engineering college, Chennai, India
Abstract the architecture is based on distributed arithmetic in which the partial products of filter coefficients are pre computed. These coefficients are stored in lookup tables (LUTs) and the weighted coefficient are updated using Offset Binary Coding (OBC).The filtering is done by shift and accumulate operations on these partial products. Thus, it results in lesser area comparatively and this technique can be applied to all FIR based adaptive filter application. Results are obtained using modelsim and altera quartus tool.
Index Terms adaptive filter, Distributed Arithmetic (DA), finite impulse response (FIR), least Mean Square (LMS), lookup table (LUT), Offset Binary Coding (OBC).

INTRODUCTION
Adaptive filters are widely used in many Signal Processing, applications like system identification and modeling, equalization, interference and echo cancelation. However, in many applications such as echo cancelation and system identification, coefficient adaption is needed. Such filters are generally made of Finite Impulse Response (FIR) filters whose coefficients are updated as per a minimization criterion. The output of a FIR filter is the weighted sum of present and past input samples and , hence they can be realized using multiply and accumulate(MAC) units .If N is the number of filter taps, a single MAC unit would take N clock cycles to produce one sample of output sequence. Although, multiple MAC units can be utilized in order to increase the speed of the system the system cost increases as the multipliers consume more area.
Distributed Arithmetic (DA) is one of the efficient techniques to realize the higher order filters and it can achieve the high throughputs without using Hardware Multipliers. Distributed Arithmetic is essentially a bit serial operation that produces the sum of products in a fixed number of clock cycles regardless of the number of products to be summed up. The basic idea behind DA is that, the precomputed partial products are stored in look up tables (LUT), accessing and
instant n is given by
N 1
y[n]= w[i]x[n i]
i0
shiftaccumulating. The right ones will generate the output. Furthermore, the Multiplier less architecture of DA makes it the most effective one to realize the higher order filters. A fixed coefficient filter can be easily realized using DA by storing the partial products of filter coefficients in the LUT. However the DA treatment to adaptive filters will face certain difficulties not encountered in the fixedcoefficient case namely, the two principal operations filtering and weight updating are manually coupled and so the partial products of the filter coefficients stored in the LUTs are recalculated before the filtering operation. However, few attempts have been made to realize adaptive filters using DA by approximations to standard adaptation algorithms which will degrade the performance.

FIR FILTER DESIGN
A. DA (DISTRIBUTED ARITHMETIC) based FIR Filter
Memory based structures are wellsuited for many digital signal processing (DSP) algorithms, which involve multiplication with a fixed set of coefficients. For this Distributed Arithmetic architecture is used in our FIR filter design along with the technique of Offset Binary Coding(OBC). DA is an efficient technique for calculation of sum of products or vector dot product or inner product or multiply and accumulate (MAC). MAC operation is very common in all Digital Signal Processing Algorithms. This technique consists of Accumulators, Look Up Tables and Shift registers. According to this technique, all of the multipliers are totally removed and are replaced by array multipliers. The LUTs can be subdivided into many LUTs so that the size of the higher order filters can be reduced. In DA, the cumulative partial products are precomputed and stored in a Look Up Table (LUT) that is addressed by multiplier bits. The output y[n] of an N tap fir filter with the present input sample x[n] at any
Fig.1. DA based implementation of four tap fir filter
Where w[i] (i=0,1,,N1) denotes the weights of the filter
.By representing each of the input samples x[ni] in the twos complement form,we got
B1
The output signal is compared to a second signal d (n), called the desired response signal, by subtracting the two samples at time n. This difference signal, given by
e(n) = d(n) y(n) ,
is known as the error signal. The error signal is fed into a procedure which alters or adapts the parameters of the filter from time n to time (n + 1) in a welldefined manner. This process of adaptation is represented by the oblique arrow that pierces the adaptive filter block in the figure. As the time index n is incremented, it is hoped that the output of the adaptive filter becomes a better and better match to the desired response signal through this adaptation process, such that the magnitude of e(n) decreases over time. In this context, what is meant by better is specified by the form of the adaptive algorithm used to adjust the parameters of the adaptive filter.
In the adaptive filtering task, adaptation refers to the method by which the parameters of the system are changed from time
X [ni] = x
ni
bi,B1
bi,B1 j j 1
2 j
(2)
index n to time index (n + 1). The number and types of parameters within this system depend on the computational
By substituting (2) into (1) and rearranging, we will get
B1
structure chosen for the system.
Where
Y[n]= c
j 0
N 1
B1 j
2 j
A. DABased FIR Filter With the OBC technique
The ROM size in the before mentioned block can be further reduced using the OBC technique [5] as follows. By rewriting
cB1 j. wi bi,B1 j
i0
N 1
cB1 wi bi,B1
i0
(j 0 )
(2) as x
ni
(1/ 2)[x
ni
(x
ni )]
Thus for the given set of wi (i=0, 1…, N1).the terms cB1 j
Would take only one out of
2 N possible combinatonal, which
can be precomputed and stored in LUT.The DA implementation of a four tap FIR filter is shown in fig.1. The incoming bits of input samples are stored in the registers in the order that, at any instant the bits of the most recent input sample are stored in the top most register while the bits of the oldest sample are stored in the bottommost register. The least significant bits (LSBs) from each of the registers from the address lines to the LUT containing the partial products. The partial products are then shifted and accumulated for B" number of clock cycles to produce one sample of the output.

ADAPTIVE FILTERING TECHNIQUE
The diagram below shows a block diagram in which a sample from a digital input signal x(n) is fed into a device, called an adaptive filter, that computes a corresponding output signal sample y(n) at time n. For the moment, the structure of the adaptive filter is not important,
Fig.3. DA based implementation of four tap fir filter using the OBC scheme
Choose
(5)
Bysubstituting (4) and (5) into (1) and rearranging we got
except for the fact that it contains adjustable parameters whose values affect how y(n) is computed.
.y[n] B1 (N 1 1w d
i
2
j 0 i0
Define
N 1 1
i,B1 j
)2 j – ( 1 N 1 w ) 2 ( B1)
i
2 i0
F i g . 2 . T h e g e n e r a l a d a p t i v e f i l t e r .
p j wi di, j ,
2
i0
0 j B 1
2
1 N 1
P initial wi
i0
(6)
We get this
B1
y[n]= p
j 0
B1 j
2 j p
initial
2( B1)
(7)
Now, for the given set of
wi (i=0,1,,N1), the terms PB1J
would take one out of 2 n combinations, half of which would
be the mirror image of the other half[4].Hence ,a 2 N 1 sized ROM can be used, the address of which can be obtained through the EXOR operation of all the LSBs with the LSB of the newest sample. Considering an adaptive filter that processes an input signal x(n) and generates an output signal y(n) as
Where
Y (n) = wT x
wT = [ w0 (n), w1(n),…wN 1(n)]
(8)
is the
is the coefficient
vector and
xT [x(n), x(n 1),…, x(n N 1)
Fig.4. Primary LUT
input sample vector, where N is the number of filter coefficients. If the least mean square (LMS) algorithm is
chosen, then each of the weights updated using
wk (k=0,1,,N1) can be
wk (n 1) wk (n) e(n)x(n k)
(9)
Where e(n) = d(n) y(n) is the error signal and is the approximate step size.

PROPOSED DA BLOCK FOR THE ADAPTIVE FILTER
The block schematic of the proposed DA architecture for efficient implementation of the filtering and weight update operations consists of a register bank to store the incoming input samples, a primary LUT (PLUT) that stores the combinations of weights that is responsible for the DA filtering operation in every iteration and also a secondary LUT(SLUT) that will store the combination of input samples. It also consists of a register R0 that along with S – LUT aids the weight adaptation process, a shift – and accumulate block of the DA filtering operation, and a combinational logic block that takes care of the weight adaptation process.
Fig.5. Secondary LUT
While the PLUT can store the OBC combinations of the filter weights, the SLUT stores the OBC combination of input samples except for the term containing the most recent input
sample which is stored in register
R0 .The entry of PLUT at
time n addressed by a can be expressed as
P (n) 1 w
(n) N 1 1w
(n)(1)qN 1 k ( a ) 1
k
a 2 0
k 1 2
1 aT w,
= 2 a=0, 1, 2
N
2 1
updated using the sum and the difference with the term
Â½[x(n)]. Mathematically, the new entries Si (n+1) of SLUT
(10)
can be obtained from the old entries Si (n) with the index entry i [0,2N 1 1] as
S (n+1) = (1) i1 R 1/ 2{S
(n) S
(n) } (16)
i 0 2[ i ]
2
2[ i ]
2
If the vector w is replaced by the vector x, which is represented as x T [x(n), x(n 1),…x(n N 1)] ,then a new LUT T(n) is formed with its entry at address location a given as ,
T(a)
(n) 1 x(n) N 1 x(n k)(1)qN 1k ( a ) 1,
2
k 1
N
2
Fig.6.contents of sLUT
a=0, 1,.., 2
1 (12)
Where
R0 (n) is the entry of register R0
at time n. When the
T(a) (n) R0 (n) s(a) (n)
Where R0 denotes the contents of the register
(13)
R0 at time n
new input sample x(n+1) has arrived ,the right – shifted version of it that is term1/2[x(n+1)] is stored in the register R0 , which is useful for weight adaptation at time n+1.The S LUT update scheme from time n to n=1 for a 4Tap filter is
and s(a) (n) is the entry of the SLUT at time n addressed by a
shown in fig.5, where the positive and negative terms of the
input samples are represented by 1 and 0,respectively.It
R0 (n)=
1 x(n)
2
N 1 1
( a )
(14)
can be observed that, at time n+1,there are enough contents in the SLUT but their locations are not in proper order for the weight adaptation. By close observation, it can be seen that the
S (a)
(n)
k 1
x(n k)(1)qN 1k 1
2
(15)
contents of SLUT are placed in the bit circularly rightshifted addressed locations. For example, if we consider the data (at
time n+1) at address location 1(001) given as Â½{x(n)+x(n1)
In Fig4, the contents of PLUT and SLUT at time instant n
for a four tap filter are shown. It can be observed that S LUT stores the OBC combinations (lower half) of input samples except for the sample x(n).In the SLUT update scheme averaging the SLUT entry with its next consecutive entry would generate a term that is independent of the oldest input sample. The content of register Ro is then subtracted and added with the result. For example, when N=4,the average of the first and second entries of SLUT would give
12[x(n 1) x(n 2)], Which is independent of the
third term x[n3] .Now the subtraction and addition of the term of [x{n}] with the result will generate the terms of [x{n}x(n1)x(n2)] and Â½{x(n)x(n1)x(n2)},respectively which are then stored in the same consecutive locations of S LUT. Similarly, the third and fourth locations of SLUT are updated by subtracting and adding the term1/2[x (n)] to their average and storing the difference and sum in exactly the same locations. In a similar way, all the entries of SLUT can be
x(n2)}which is supposed to be in address location 6(110) nothing but the circularly rightshifted version of the address bits of location 5(101).
If 1 and 0 are the positive and negative terms in the OBC combinations of input samples, at time n, the contents of S LUT would look like the binary sequence of the input samples, as shown in fig.6.At time n+1,the contents would be a circularly rightshifted version of that binary sequence . Similarly at time n+2, the contents would be a circularly right shifted version of the binary sequence at time n+1 and so on. Hence, for accessing the entries of SLUT in each iteration, instead of physically moving the contents, the address bits to the SLUT are circularly leftshifted. At time n+3, the sequence once again would be a normal binary sequence as shown in fig. 6.Accessing time can be reduced by maintaining two similar SLUTs namely, ODDLUT and EVENLUT as shown in fig.7, where the even location entries and odd location entries are stored respectively. Pinitial Can be stored in a register and can be updated in every iteration easily since
the term Pinitial is nothing but the additive inverse of the entry of PLUT at its last address location. The algorithm which explains the overall operation of the proposed filter is shown in fig.8
Fig. 7. Two LUT storing odd and even location
IV SIMULATION RESULTS
The following simulation results are obtained using altera modelsim and synthesized using quartus.
Fig. 8. Fir filter using array multiplier
Fig. 9. Fir filter using Distributed Arithmetic
Fig. 10. Adaptive fir filter using Distributed Arithmetic
simulated results.
SUMMARY RESULTS (TABLE 1)
DESCRIPTION
ADAPTIVE FIR FILTER
AREA
Logic elements – 1322
POWER
327.81mw
TIMING
4.327ns
SUMMARY RESULTS (TABLE 2)
Logic elements
– 288
DESCRI PTION
FIR FILTER WITH ARRAY MULTIPLIER
FIR FILTER WITH DISTRIBUTED
ARITHMETIC
AREA
Logic elements – 19
POWER
324.40mw
320.7mw
TIMING
15.923ns
4.824 ns

CONCLUSION
In brief, we have presented an FIR adaptive filter implementation based on DA. In order to reduce the memory requirement this has been implemented . Unlike the technique used in [11], hence the proposed technique uses an SLUT used to adapt weight process as in [10]. The technique used to store complex combination of input samples and the prev
ious input sample is smartly eliminated in order to update S LUT. Thus, the proposed structure will utilize very less chip area and can operate at higher throughput when compared to other existing blocks.
REFERENCES

D. J. Allred, H. Yoo, V. Krishnan, W. Huang, and D. V. Anderson, LMS adaptive filters using distributed arithmetic for high throughput, IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 7, pp. 13271337, Jul. 2005.

R. Guo and L. S. DeBrunner, Two highperformance adaptive filter implementation schemes using distributed arithmetic, IEEE Trans. Circuit Syst. II, Exp. Briefs, vol. 58, no. 9, pp. 600604, Sep. 2011.

R. Guo and L. S. DeBrunner, A novel adaptive filter implementation scheme using distributed arithmetic, in Conf. Rec. 45th ASILOMAR, Nov. 2011, pp. 160164.

C. F. N. Cowan and J. Mavor, New digital adaptivefilter implementation using distributedarithmetic techniques, Proc. Inst. Elect. Eng., vol. 128, no. 4, pt. F, pp. 225230, Feb. 1981.

D. J. Allred, H. Yoo, V. Krishnan, W. Huang, and D. V. Anderson, LMS adaptive filters using distributed arithmetic for high throughput, IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 7, pp. 13271337, Jul. 2005.

D. J. Allred, H. Yoo, V. Krishnan, W. Huang, and D. V. Anderson, A novel high performance distributed arithmetic adaptive filter implementation on an FPGA, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2004, vol. 5, pp. V161V164.

.A. White, Applications of distributed arithmetic to digital signal processing: A tutorial review, IEEE ASSP Mag., vol. 6, no. 3, pp. 419, Jul. 1989.

D. J. Allred, H. Yoo, V. Krishnan, W. Huang, and D. V. Anderson, An FPGA implementation for a high throughput adaptive filter using distributed arithmetic, in Proc. 12th Annu. IEEE Symp. Field Programmable Custom Comput. Mach., 2004, pp. 324325.

A. Croisier, D. J. Esteban, M. E. Levilion, and V. Rizo, Digital Filter for PCM Encoded Signals, U.S. Patent 3 777 130, Apr. 1973.

A. Peled and B. Lie, A new hardware realization of digital filters, IEEE Trans. Acoustics, Sound, Signal Process., vol. ASSP22, no. 4, pp. 456462, Dec. 1974.

Pocket Guide to DSP Processors, Berkeley Design Technology Inc. (2003, Dec.). [Online]. Available: http://www.bdti.com

Stratix Device Family Data Sheet, Altera Corporation. (2003).[Online].Available:http://www.altera.com/literature/lit index.html

J. P. Uyemura, Introduction to VLSI Circuits and Systems. New York:
Wiley, 2001

M. Surya Prakash and R. Shaik, High performance architecture for LMS based adaptive filter using distributed arithmetic, in Proc. ICICA, Mar. 2012, vol. 24, pp. 1822.