 Authors : E. Suruthi, D. Sowmiya
 Paper ID : IJERTCONV3IS16113
 Volume & Issue : TITCON – 2015 (Volume 3 – Issue 16)
 Published (First Online): 30-07-2018
 ISSN (Online) : 2278-0181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
Memory Optimization in Adaptive FIR Filter using APC-OMS and CSE Method
E. Suruthi, Scholar, Dept. of Electronics and Communication Engineering, Sri Muthukumaran Institute of Technology, Chennai, Tamil Nadu, India
Prof. D. Sowmiya, Associate Professor, Dept. of Electronics and Communication Engineering, Sri Muthukumaran Institute of Technology, Chennai, Tamil Nadu, India
Abstract: This paper presents an efficient architecture for implementing an adaptive FIR filter with optimized memory. We describe the design and implementation of the adaptive FIR filter using the APC-OMS and CSE methods. The APC-OMS technique reduces the LUT size to one-fourth of the conventional LUT, and the error is eliminated at the output. The CSE method reduces the number of shifting operations in the multipliers. As a result, the area and power are reduced and the speed is increased. The overall design is implemented on an Altera Cyclone III FPGA kit.
Index Terms: Adaptive FIR Filter, APC-OMS, CSE.

INTRODUCTION

Adaptive FIR filters have a wide range of communication and DSP applications, such as adaptive equalization, system identification and image restoration. The direct-form LMS adaptive filter involves a long critical path due to the inner-product computation needed to obtain the filter output. When the critical path exceeds the desired sample period, it must be reduced by pipelined implementation. The conventional LMS algorithm does not support pipelined implementation because of its recursive behavior, so it is modified to a form called the delayed LMS (DLMS) algorithm, which allows pipelined implementation of the filter. A lot of work has been done to implement the DLMS algorithm in systolic architectures to increase the maximum usable frequency, but these architectures involve an adaptation delay. One systolic architecture uses relatively large processing elements (PEs) to achieve a lower adaptation delay with a critical path of one MAC operation.
Memory-based computing systems are more regular than multiply-accumulate structures and are well suited to many digital signal processing (DSP) algorithms that involve multiplication with a fixed set of coefficients. In an adaptive FIR filter, however, the coefficients are not fixed. The existing work on the fixed-point LMS adaptive filter does not discuss the APC-OMS approach; that fixed-point implementation uses a novel partial product generator (PPG), which can increase the area, delay and power. The techniques used here are the odd-multiple-storage (OMS) scheme and antisymmetric product coding (APC). In this paper, we propose a combined APC-OMS technique that reduces the LUT to one-fourth of the conventional LUT size. This approach reduces the area, delay and power compared with the canonical signed digit (CSD) multiplier, but a large number of multiplications is still involved. The number of multiplications can be reduced by the CSE method with the shift-and-add algorithm, so that the area and power are reduced and the speed is increased. We discuss the synthesis of the proposed architecture and compare it with the existing architectures.

REVIEW OF FIXED POINT IMPLEMENTATION
The existing system implements a delayed least mean square (DLMS) adaptive filter. To achieve a lower adaptation delay and an area-delay-power-efficient implementation, it uses a novel partial product generator (PPG). The fixed-point LMS adaptive filter implementation reduces the number of pipeline delays along with the area, sampling period and energy consumption. The design is more efficient in terms of the power-delay product (PDP) and the energy-delay product (EDP).
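As a rough illustration of the delayed update that makes pipelining possible, the sketch below identifies an unknown FIR system with the textbook DLMS rule w(n+1) = w(n) + mu * e(n - m) * x(n - m). It is a floating-point software model under stated assumptions, not the fixed-point hardware discussed here: the function name `dlms_identify`, the step size, the delay m = 3 and the test system are all hypothetical choices made for the example.

```python
import random
from collections import deque

def dlms_identify(true_w, n_samples=3000, mu=0.05, delay=3, seed=0):
    """Estimate the taps of an FIR system with the delayed LMS update.

    The weights are corrected with an error/input pair that is `delay`
    samples old; delay=0 reduces to the conventional LMS algorithm.
    """
    rng = random.Random(seed)
    n_taps = len(true_w)
    w = [0.0] * n_taps                 # adaptive weights
    x_vec = [0.0] * n_taps             # tap-delay line, newest sample first
    pending = deque()                  # (error, input vector) pairs in flight
    for _ in range(n_samples):
        x_vec = [rng.uniform(-1.0, 1.0)] + x_vec[:-1]
        d = sum(t * x for t, x in zip(true_w, x_vec))   # desired signal
        y = sum(wi * x for wi, x in zip(w, x_vec))      # filter output
        pending.append((d - y, x_vec))
        if len(pending) > delay:
            e_old, x_old = pending.popleft()            # pair from n - delay
            w = [wi + mu * e_old * xi for wi, xi in zip(w, x_old)]
    return w

if __name__ == "__main__":
    print(dlms_identify([0.5, -0.3, 0.2, 0.1]))
```

The queue plays the role of the pipeline registers: the error computed at time n only reaches the weight update `delay` samples later, which is the adaptation delay the text refers to.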
Fig.1. Existing system error computation block
This structure minimizes the adaptation delay in the error-computation block, which is followed by the weight-update block. The structure of the error-computation unit of an N-tap DLMS adaptive filter is shown in Fig. 1. It consists of N 2-bit partial product generators (PPGs) corresponding to N multipliers and a cluster of L/2 binary adder trees, followed by a single shift-add tree. Each PPG consists of L/2 2-to-3 decoders and the same number of AND/OR cells (AOCs). Each 2-to-3 decoder takes a 2-bit digit (u1 u0) as input and produces three outputs. Each AOC consists of three AND cells and two OR cells. The design provides nearly 20% saving in the area-delay product (ADP) and 9% saving in the EDP. It can also be operated with a clock slower than the maximum usable frequency and at a lower operating voltage to reduce the power consumption.
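The structure described above can be modelled behaviourally as follows. This is a minimal sketch that assumes unsigned weights and inputs, and assumes the three decoder outputs select w, 2w and 3w for digit values 1, 2 and 3; the text does not spell out that assignment, and the function names are hypothetical.

```python
def decode_2to3(u1, u0):
    """2-to-3 decoder: one-hot select lines for digit values 1, 2 and 3."""
    b0 = u0 & (1 - u1)        # digit (u1 u0) == 01
    b1 = (1 - u0) & u1        # digit (u1 u0) == 10
    b2 = u0 & u1              # digit (u1 u0) == 11
    return b0, b1, b2

def aoc(w, b0, b1, b2):
    """AND/OR cell: three AND gates on w, 2w, 3w and two ORs to merge them."""
    return (w * b0) | ((2 * w) * b1) | ((3 * w) * b2)

def inner_product(weights, inputs, L=8):
    """N-tap inner product built from 2-bit PPGs, adder trees and a shift-add tree."""
    assert L % 2 == 0
    digit_sums = []
    for k in range(L // 2):                      # one adder tree per digit position
        tree_sum = 0
        for w, x in zip(weights, inputs):        # one PPG per tap
            u0 = (x >> (2 * k)) & 1
            u1 = (x >> (2 * k + 1)) & 1
            tree_sum += aoc(w, *decode_2to3(u1, u0))
        digit_sums.append(tree_sum)
    # shift-add tree: weight digit position k by 4**k
    return sum(s << (2 * k) for k, s in enumerate(digit_sums))

if __name__ == "__main__":
    print(inner_product([3, 5, 7], [10, 200, 33]))
```

Because only one select line is active at a time, the OR in `aoc` simply passes the selected multiple through, which is why no adder is needed inside the PPG.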

PROPOSED ARCHITECTURE
The block diagram of the full adder cell and its building blocks is shown in Fig. 2. In addition, various circuits have been proposed for each module.
Fig. 2. Proposed block diagram
In this block diagram, the APC-OMS block plays the major role. The input signal is applied to the filter, which removes the unwanted signal, and the filter output is sent to the comparator. The comparator compares the desired signal with this filter output, and the resulting error is sent to the APC-OMS block. The output of the APC-OMS block is fed back to the filter, and the filtered signal is sent out. The frequency response realized in the time domain is of more interest for FIR filter realization (both hardware and software).
A. Adder Tree
In the ripple carry adder (RCA) method, two inputs are added from LSB to MSB, and each carry is added to the next more significant bits, which increases the propagation delay. In the parallel adder method, both the sum and the carry are generated in the same time cycle using XOR and AND gates. The result for a carry of zero (0) is held in the parallel adder (PA), and the result for a carry of one (1) is held in the BEC (Binary Excess Code) block. A multiplexer is used for the multiplication operation. The adder tree is used to reduce the computation time, so power is saved.
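The two adder styles can be compared with a small bit-level model. The sketch below assumes the usual carry-select reading of this arrangement: each block computes its sum for a carry-in of 0, an excess-1 stage derives the result for a carry-in of 1, and a multiplexer picks one of the two once the real carry is known. The block size, word width and function names are hypothetical.

```python
def ripple_carry_add(a_bits, b_bits, carry_in=0):
    """Add two LSB-first bit lists; the carry ripples from LSB to MSB."""
    out, carry = [], carry_in
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return out, carry

def excess_one(bits, carry):
    """Excess-1 stage: the same result as if the block had carry-in 1."""
    out, c = [], 1
    for bit in bits:
        out.append(bit ^ c)
        c &= bit
    return out, carry | c

def carry_select_add(a, b, width=8, block=4):
    """Add two unsigned integers block by block with carry selection."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result, carry = [], 0
    for start in range(0, width, block):
        s0, c0 = ripple_carry_add(a_bits[start:start + block],
                                  b_bits[start:start + block], 0)
        s1, c1 = excess_one(s0, c0)
        bits, carry = (s1, c1) if carry else (s0, c0)   # multiplexer
        result.extend(bits)
    return sum(bit << i for i, bit in enumerate(result)) + (carry << width)

if __name__ == "__main__":
    print(carry_select_add(200, 100))
```

In hardware both candidates for a block are ready before the incoming carry arrives, so only the multiplexer delay is added per block instead of a full ripple.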
Fig.3. Adder block diagram
B. Shift/Add Tree
The shift/add tree replaces the multiplier used in the filter convolution. The convolution operation requires multiplications. If two operands are multiplied directly, the number of operations increases, whereas the shift/add tree performs the operation quickly.
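A minimal software sketch of the shift-and-add idea, assuming non-negative integer operands, is given below; the function name is hypothetical and the loop is a serial stand-in for what a hardware tree would do in parallel.

```python
def shift_add_multiply(multiplicand, multiplier):
    """Multiply two non-negative integers using only shifts and additions."""
    product, shift = 0, 0
    while multiplier:
        if multiplier & 1:                       # this multiplier bit is set
            product += multiplicand << shift     # add the shifted multiplicand
        multiplier >>= 1
        shift += 1
    return product

if __name__ == "__main__":
    # 12 = 1100b, so 9 * 12 = (9 << 2) + (9 << 3) = 36 + 72
    print(shift_add_multiply(9, 12))             # prints 108
```

Only the set bits of the multiplier cost an addition, which is why reducing the number of non-zero terms directly reduces the adder count.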

THE APC-OMS AND CSE METHOD
APC-OMS METHOD
In the APC-OMS technique, the LUT size is reduced to one-fourth of the conventional LUT. The APC-OMS block diagram is shown in Fig. 4. In a conventional lookup-table (LUT)-based multiplier, A is a fixed coefficient and X is an input word to be multiplied with A. Assuming X to be a positive binary number of word length L, there can be $2^L$ possible values of X and, accordingly, $2^L$ possible values of the product $C = A \cdot X$. Therefore, for memory-based multiplication, an LUT of $2^L$ words, consisting of precomputed product values corresponding to all possible values of X, is conventionally used.
Fig. 4. APC-OMS block diagram
A. APC for LUT optimization
For simplicity of presentation, we assume both X and A to be positive integers. The product words for different values of X for L = 5 are shown in Table I. It may be observed in this table that the input word X on the first column of each row is the two's complement of that on the third column of the same row. In addition, the sum of the product values corresponding to these two input values on the same row is 32A. Let the product values on the second and fourth columns of a row be u and v, respectively. Since one can write $u = (u + v)/2 - (v - u)/2$ and $v = (u + v)/2 + (v - u)/2$, for $(u + v) = 32A$ we have
$$u = 16A - \frac{v - u}{2}, \qquad v = 16A + \frac{v - u}{2}.$$
The product values on the second and fourth columns of Table I therefore have a negative mirror symmetry. This behaviour of the product words can be used to reduce the LUT size: instead of storing u and v, only $(v - u)/2$ is stored for a pair of inputs on a given row.
The 4-bit LUT addresses and corresponding coded words are listed on the fifth and sixth columns of the table, respectively. Since the representation of the product is derived from the antisymmetric behavior of the products, we can name it the antisymmetric product code (APC). The 4-bit address $X' = (x'_3 x'_2 x'_1 x'_0)$ of the APC word is given by
$$X' = \begin{cases} X_L, & \text{if } x_4 = 1 \\ X'_L, & \text{if } x_4 = 0 \end{cases}$$
where $X_L = (x_3 x_2 x_1 x_0)$ is the four less significant bits of X, and $X'_L$ is the two's complement of $X_L$. The desired product can be obtained by adding the stored value $(v - u)/2$ to, or subtracting it from, the fixed value 16A when $x_4$ is 1 or 0, respectively.
Fig. 5. Proposed APC-OMS combined LUT
Product word = 16A + (sign value) × (APC word), where sign value = 1 for x4 = 1 and sign value = −1 for x4 = 0. The product value for X = (10000) corresponds to an APC value of zero, which can be derived by resetting the LUT output instead of storing it in the LUT.
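The APC rule can be checked with a short behavioural model. The sketch below assumes L = 5 by default, unsigned X, and treats the input X = 0 as a special case handled outside the LUT, since this passage does not cover it; the function names are hypothetical.

```python
def build_apc_lut(A, L=5):
    """Store one APC word (v - u)/2 per input pair.

    For u = (16 - k)A and v = (16 + k)A the stored word is k*A.
    Address 0 is not stored: its word is zero and is produced by a reset.
    """
    half = 1 << (L - 1)                     # 16 for L = 5
    return {k: k * A for k in range(1, half)}

def apc_multiply(A, X, lut, L=5):
    """Return A*X as 16A plus or minus the APC word."""
    half = 1 << (L - 1)
    if X == 0:
        return 0                            # assumption: zero input bypasses the LUT
    x_msb = X >> (L - 1)                    # x4 for L = 5
    x_low = X & (half - 1)                  # X_L, the four less significant bits
    # address: X_L if x4 = 1, otherwise the two's complement of X_L
    addr = x_low if x_msb else (-x_low) & (half - 1)
    apc_word = lut.get(addr, 0)             # address 0 -> LUT output reset to zero
    sign = 1 if x_msb else -1
    return half * A + sign * apc_word

if __name__ == "__main__":
    lut = build_apc_lut(7)
    print(len(lut), [apc_multiply(7, x, lut) for x in (1, 15, 16, 31)])
```

For L = 5 the dictionary holds 15 words, against 32 in the conventional LUT, and every product from 1A to 31A is still recovered exactly.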
Fig. 6. LUT-based multiplier for L = 5 using the APC technique
TABLE I APC WORDS FOR DIFFERENT INPUT VALUES
B. Modified OMS for LUT optimization
For the multiplication of any binary word X of size L with a fixed coefficient A, instead of storing all the $2^L$ possible values of $C = A \cdot X$, only $2^L/2$ words corresponding to the odd multiples of A may be stored in the LUT, while all the even multiples of A can be derived by left-shift operations on one of those odd multiples.
In Table II, eight memory locations store the eight odd multiples $A \times (2i + 1)$ as $P_i$, for i = 0, 1, 2, . . . , 7. The even multiples 2A, 4A and 8A are derived by left-shift operations on A. Similarly, 6A and 12A are derived by left shifting 3A, while 10A and 14A are derived by left shifting 5A and 7A, respectively. A barrel shifter producing a maximum of three left shifts can be used to derive all the even multiples of A.
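The odd-multiple storage can be sketched in a few lines. This model assumes a 4-bit value in the range 1 to 15 and uses hypothetical function names; the `while` loop stands in for the barrel shifter control.

```python
def build_oms_lut(A):
    """Eight words: P_i = A * (2*i + 1) for i = 0..7."""
    return [A * (2 * i + 1) for i in range(8)]

def oms_lookup(value, lut):
    """Return A * value for 1 <= value <= 15 from the odd-multiple LUT."""
    assert 1 <= value <= 15
    shifts = 0
    while value % 2 == 0:          # strip factors of two to reach an odd multiple
        value >>= 1
        shifts += 1
    # an odd value 2*i + 1 lives at index i; at most three left shifts are needed
    return lut[value >> 1] << shifts

if __name__ == "__main__":
    lut = build_oms_lut(3)
    print([oms_lookup(v, lut) for v in (6, 12, 10, 14)])
```

For example, 12A is read as P_1 = 3A shifted left by two positions, matching the derivation of 6A and 12A from 3A described above.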
TABLE II OMS-BASED DESIGN OF THE LUT OF APC WORDS
It may be seen from Tables II and III that the 5-bit input word X can be mapped into a 4-bit LUT address $(d_3 d_2 d_1 d_0)$ by a simple set of mapping relations $d_i = x''_{i+1}$, for i = 0, 1, 2, and $d_3 = \overline{x''_0}$, where $X'' = (x''_3 x''_2 x''_1 x''_0)$ is generated by shifting out all the leading zeros of $X'$ by an arithmetic right shift followed by address mapping, i.e.,
$$X'' = \begin{cases} Y_L, & \text{if } x_4 = 1 \\ Y'_L, & \text{if } x_4 = 0 \end{cases}$$
where $Y_L$ and $Y'_L$ are derived by circularly shifting out all the leading zeros of $X_L$ and $X'_L$, respectively.
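Putting the two steps together, the sketch below maps a 5-bit input to a LUT address and rebuilds the product from eight stored odd multiples. It rests on several assumptions that should be read as such: the zeros shifted out are the low-order zeros of the APC address, d3 is taken as the complement of the lowest bit of the shifted word so that the odd multiples occupy addresses 0000 to 0111, and the inputs X = 0 and X = 16 are handled outside the eight-word table. All function names are hypothetical.

```python
def shift_out_zeros(value):
    """Right-shift until the word is odd; return the word and the shift count."""
    shifts = 0
    while value and value % 2 == 0:
        value >>= 1
        shifts += 1
    return value, shifts

def lut_address(x_pp):
    """d_i = bit (i+1) of the shifted word for i = 0..2, d3 = NOT bit 0 (assumed)."""
    d0 = (x_pp >> 1) & 1
    d1 = (x_pp >> 2) & 1
    d2 = (x_pp >> 3) & 1
    d3 = 1 - (x_pp & 1)
    return (d3 << 3) | (d2 << 2) | (d1 << 1) | d0

def apc_oms_multiply(A, X):
    """Return A*X for a 5-bit X using eight stored odd multiples of A."""
    lut = [A * (2 * i + 1) for i in range(8)]     # P_0 .. P_7
    if X == 0:
        return 0                                  # assumption: handled separately
    x4, x_low = X >> 4, X & 0xF
    x_prime = x_low if x4 else (-x_low) & 0xF     # APC address
    if x_prime == 0:
        apc_word = 0                              # X = 16: LUT output is reset
    else:
        x_pp, shifts = shift_out_zeros(x_prime)
        apc_word = lut[lut_address(x_pp)] << shifts
    return 16 * A + (apc_word if x4 else -apc_word)

if __name__ == "__main__":
    print([apc_oms_multiply(5, x) for x in range(32)])
```

With these assumptions the table shrinks from 32 words to 8 for L = 5, at the cost of the address-generation logic, a barrel shifter and one add/subtract stage.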
TABLE III REDUCED APC-OMS ADDRESS
CSE METHOD
In the adaptive FIR filter, the memory is reduced at the architecture level using the Common Subexpression Elimination (CSE) method. When a portion of an expression (a subexpression) occurs more than once, it can be calculated once and the result reused. The CSE method reduces the number of shifting and adding operations and so increases the speed. Multipliers are replaced by the shift-and-add method, and the shift-and-add algorithm reduces the number of multiplications applied to the APC-OMS output. The area and power are therefore reduced compared with the existing method.
The output y(n) is formed from the input signal x(n) and the coefficients b(k): the two are multiplied, and the delayed result of the next iteration is added for the coefficient calculation.
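To make the saving concrete, the sketch below multiplies one input sample by two coefficients, first directly and then with a shared subexpression. The coefficient values 5 and 45 are hypothetical, chosen because 45 = 101101b contains the bit pattern 101b (that is, 5) twice; each function also returns the number of additions it uses.

```python
def tap_products_direct(x):
    """Compute 5x and 45x independently with shift-and-add."""
    p5 = (x << 2) + x                              # 5 = 101b: 1 addition
    p45 = (x << 5) + (x << 3) + (x << 2) + x       # 45 = 101101b: 3 additions
    return (p5, p45), 4

def tap_products_cse(x):
    """Compute 5x once and reuse it: 45x = 5x + (5x << 3)."""
    t = (x << 2) + x                               # common subexpression 5x
    p45 = (t << 3) + t                             # 1 further addition
    return (t, p45), 2

if __name__ == "__main__":
    print(tap_products_direct(3), tap_products_cse(3))
```

Both versions give the same products, but the shared form needs two additions instead of four, which is the kind of adder saving the CSE method targets.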
Example:
A: 0000 0000
B: 0000 1001, Q: 1100
Fig. 7. Coefficient calculation
The binary output value is the same as the shift-and-add output, and the number of shifting and adding operations is reduced in the shift-and-add method. So the area and power are reduced.
Fig. 8. Area report
Fig. 9. Power report
Fig. 10. Snapshots of the output waveform

SIMULATION RESULTS
Simulations have been performed using the ModelSim 6.4a simulation tool. Fig. 10 shows the input and output waveforms for the proposed APC-OMS technique, Fig. 9 shows the power analyzer summary, and Fig. 8 shows the area report. The proposed APC-OMS design reproduces the input signal with good accuracy. The proposed design reduces the LUT size to one-fourth of the conventional LUT size, so the proposed circuit has a lower area overhead than the conventional circuits. The same process is applied to the CSE method, and the area is reduced. The results also show that the proposed circuit consumes very little power.

CONCLUSION AND FUTURE WORK
By analyzing the architecture and performance of the APC-OMS and CSE FIR filter, this paper presents a new architecture for the DALUT. The proposed architecture applies the main concepts of the basic APC-OMS and CSE methods to implement the MAC unit, and at the same time has many advantages over the basic architecture. The results obtained show that the proposed architecture reduces both the computation time and the area used. The overall design is implemented on an Altera Cyclone III FPGA kit, and the error output is displayed on the system.
