 Open Access
 Total Downloads : 273
 Authors : Biplab Roy
 Paper ID : IJERTV5IS030637
 Volume & Issue : Volume 05, Issue 03 (March 2016)
 DOI : http://dx.doi.org/10.17577/IJERTV5IS030637
 Published (First Online): 23032016
 ISSN (Online) : 22780181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
Area Efficient Reloadable FIR Filter based on NEw Distributed Arithmetic (NEDA)
Biplab Roy
Department of Electronics and Communication Engineering Neotia Institute of Technology, Management and Science South 24 pgns, West Bengal, India
Abstract This paper describes the design and implementation of highly efficient circuit for the implementation of FIR filter in terms of power and area keeping the speed at per with the fully parallel DA. It is a multiplierless FIR filter which is designed based on new distributed arithmetic algorithm (NEDA). This NEDA based technique consists of Multiplexers, shifters and accumulator. Moreover in this architecture the filter coefficients can be any time modified at once, which is certainly an advantage over simple DA where the LUT contents need to recomputed before the memory is rewritten. Analysis on the performance are done using XilinxSysgen and Mathworks Simulink. The proposed architecture provides an efficient implementation compared to other existing structures for FIR Filter.
Keywords DA, NEDA, FIR filter, XILINXSysgen.

INTRODUCTION
FIR filters are extensively used in various communication systems mainly where phase linearity is important and have been the primary signal processing unit. In fact any system which changes the frequency content of the incoming signal one way or other, generally is called the "Filter" and so its performance improvement in terms of speed, power and area have always been and will remain demanding forever. In signal processing, the implementation method of filters and many others like various transforms have been primarily based on multipliers [1], which is the main component of the MAC unit, the very important building block which is defined as:
So people have been searching for alternate implementations of MACs in FPGAs and subsequently Distributed Algorithm evolved as a very efficient solution for LUTbased FPGAs. Inspired by the potential of the Xilinx FPGA lookup table architecture, the DA algorithm was resurrected in the early 90s and shown [3] to produce very efficient filter designs and it is based on an efficient partition of the function in partial terms using 2s complement binary representation of data. Many DSP applications use functions like convolution, correlation and filtering where Inner product computations are important. This inner product computation is done using the Distributed Arithmetic principles. The partial terms can be precomputed and stored in LUTs. Yoo et al. [4,5,6] observed that the requirement of memory/LUT capacity increases exponentially with the order of the filter, given that DA implementations need 2K words, K being the number of taps of the filter. This approach has another problem in changing the filter coefficients for changing its functional characteristic as then the LUT contents need to be recomputed before the memory is rewritten. The problem of exponential increment of LUT size can be partially solved by breaking up the single LUT into many and partial addressing. But this approach has neither been very efficient nor it can solve the need of recomputation of LUT contents for changing the filter characteristic.
In this work, We have tried to address both the problems by utilizing the NEDA algorithm [7] and designing a reloadable architecture where filter coefficients can be any time changed
[] = =1 [](1)
without any extra effort. The results of the implementation
experiment are analyzed in terms of parameters such as area
Where y[n] = response of network at time n, xk[n] = kth input variable at time n and Ak = weighting factor of kth input variable that is constant for all n, and so it remains time invariant.
Direct multiplier based implementation will be highly expensive in terms of area and power though due to the advent of lowpower, lowcost FPGAs, which have embedded customized DSP blocks within it and increasing use of FPGAs in advanced systems, this problem have been partially addressed [2]. FPGAs are gaining popularity because of its highly parallel execution which increases the speed many fold.
and speed. The brief description of the distributed arithmetic is presented in Section 2. The implementation of the FIR filter using NEDA algorithm is discussed in Section 3. The Section
4 presents the implementation results. The last section concludes the work and presents the future work.

DISTRIBUTED ARITHMETIC
Distributed Arithmetic, along with Modulo Arithmetic, are computation algorithms that perform multiplication with look up table based schemes. Indeed, DA specifically targets the sum of products (sometimes referred to as the vector dot product) computation that covers many of the important DSP filtering and frequency transforming functions. Ironically, many DSP designers have never heard of this algorithm. Inspired by the potential of the Xilinx FPGA lookup table architecture, the DA algorithm was resurrected in the early 90s and shown to produce very efficient filter designs [1].
The multiplyintensive nature of equation 1 can be appreciated by observing that a single output response requires the accumulation of K product terms. In DA the task of summing product terms is replaced by table lookup procedures that are easily implemented in the Xilinx configurable logic block (CLB) lookup table architecture. We start by defining the number format of the variable to be 2s complement, fractional – a standard practice for fixed point microprocessors in order to bound number growth under multiplication. The constant factors, Ak, need not be so restricted, nor are they required to match the data word length, as is the case for the microprocessor. The constants may have a mixed integer and fractional format; they need not be defined at this time. The variable, xk, may be written in the fractional format as shown in equation 2:

NEW DISTRIBUTED ARITHMETIC
DA computes the inner product of two multidimensional vectors. Thus, increase in the number of dimensions increases the memory requirement to store all the obtained products. This is due to the reason that, increase in number of dimensions increases the number of obtained partial products. The elimination of increased memory requirement is possible only if one or both of the inputs has a fixed set of coefficients. This method is commonly known as NEw Distributed Arithmetic (NEDA) [7]. Thus, using NEDA, distribution of arithmetic is done on the coefficient values instead of doing on the inputs. This results in memoryless DA architecture of the implemented systems. Conventional NEDA based architectures are bitserial in nature. Depending on the application and requirement, they can be designed as digit serial or bitparallel architectures. Thus, NEDA is classified
=1
=1
= 0 + 1 2
(2)
under the family of shiftadd algorithms. This can be simply explained considering the following MAC operation:
where xkb is a binary variable and can assume only values of 0 and 1. A sign bit of value 1 is indicated by xk0. The final result is obtained by first substituting equation 2 into equation 1 and is:
= 0
=
=
This can be written as:
(5)
=
+
1 2
(3)
1
=1
0
=1
=1
2
= (12 ) ( . ) (6)
and then explicitly expressing all the product terms under the summation symbols with
= [101 + 202 + + 0]20
.
Considering both ci ad xi in twos complement form as:
+[111 + 212 + + 1]21
= 2 + 1 2
(7)
i
i
=
+[1(1)
1
+ 2(1)
2
+ + (1)
]2(1)
(4)
where ci=0 or 1, k=N, N+1,M1, C M is the sign bit and CiN is the least significant bit. Substituting equation 7 into equation 6 one can get the following matrix:
Each term within the brackets of equation 4 denotes a binary AND operation involving a bit of the input variable and all the bits of the constant. The plus signs denote arithmetic sum operations. The exponential factors denote the scaled contributions of the bracketed pairs to the total sum. One can now construct a lookup table that can be addressed by the same scaled bit of all the input variables and can access the sum of the terms within each pair of brackets. One such implementation is shown in the fig 1.
Fig. 1. Example DA Architecture
1
1
1
0 0 2
= [20 21 27] ( ) ( ) (8)
1
1
7 7
where it is considered that the coefficients are of 8bit in fix_8_7 format and there are k no of coefficients. Equation 8 can be implemented as some initial additions of xks as the terms of the second bracket are all either 1 or 0. And finally these temporary addition results can be passed through a shifteradder to get the final result.

FIR FILTERS DESIGNED
We have designed FIR filters based on both DA and NEDA. Both are fully parallel implementation to gain the maximum speed. Fig 2 shows our circuit for DA based 7tap FIR LPF. Here we have fragmented addressing of the LUTs to minimize the memory buildup. First the inputs are registered, then its similar bits of the input samples are sliced out parallely by the bitbasher circuit, LUTs are addressed to get the partial sums parallel and finally shift accumulated to get the final result again parallel.
Fig. 2. 8bit 7tap FIR LPF DA implementation
Fig 3 shows the same 7tap FIR filter but based on NEDA.

RESULTS
Here also first the inputs are registered, then muxed out if the similar bits of the coefficients are logic 1, otherwise digit 0 is passed. The partial sums were calculated and shift accumulated to get the final result again parallely. The filter coefficients were calculated from the FDAtool with 10kHz sampling frequency and 2kHz cutoff and quantized into sfix_8_7 format.
Both the circuits were tested by applying a mixture of two sinusoids of 100Hz and 4kHz to test its lowpass behavior. Both behaved exactly same and the output is shown in the fig

Following the same procedure we also designed 14tap 8 bit FIR LPFs for both DA and NEDA for the purpose of comparing the resources taken when implemented in Spartan 3an FPGA which has been calculated by the Resource Estimator tool after they were mapped into the FPGA, as shown in the table 1.
Fig. 3 8bit 7tap FIR LPF NEDA implementation
Fig. 4. Simulation results(upper) mixture of two sinusoids 100Hz and 4kHz applied, (lower) the output which is of 100Hz showing low pass behavior. TABLE 1. Resource utilisation summery for Spartan 700an FPGA.
Resources
8bit DA FIR
8bit NEDA FIR
7tap
14tap
7tap
14tap
Slices
324
3024
507
1020
Multipliers
0
0
0
0


CONCLUSION
From the resource table (table 1) it seems that DA is area efficient for lower tap order but our NEDA based implementation becomes efficient as the number of taps are increased. Additionally our NEDA based implementation has got the advantage that the filter coefficients can be any time loaded into the circuit as can be seen from the figures.
ACKNOWLEDGMENT
Author acknowledges Neotia Institute of Technology, Management and Sciences, WB, India, for all of its financial support and encouragement.
REFERENCES

Keshab K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley, 1999.

Uwe MeyerBaese.Digital signal processing with FPGA[M]. Beijing:
Tsinghua University Press, 2006

Mintzer, L. FIR filters with the Xilinx FPGA FPGA 92 ACM/SIGDA Workshop on FPGAs pp. 129134

M.A. Soderstrand, L.G. Johnson, H. Arichanthiran, M. Hoque, and R.Elangovan, Reducing Hardware Requirement in FIR Filter Design,in Proceedings IEEE International Conference on Acoustics, Speech,and Signal Processing 2000, Vol. 6, pp. 3275 3278

H. Yoo, and D. Anderson, HardwareEfficient Distributed Arithmetic Architecture for HighOrder Digital Filters, in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005, Vol. 5, pp. 125 128

T.Vigneswarn and P.Subbarami ReddyDesign of Digital FIR Filter Based on DDA algorithm Journal of Applied Science ,2007

Wendi Pan, Ahmed Shams, and Magdy A. Bayoumi, NEDA: A NEw Distributed Arithmetic Architecture and its Application to One Dimensional Discrete Cosine Transform, Proc. IEEE Workshop on Signal Processing Syst., Oct. 1999, pp. 159 168.G. O. Young, Synthetic structure of industrial plastics (Book style with paper title and editor), in Plastics, 2nd ed. vol. 3, J. Peters, Ed. New York: McGrawHill, 1964, pp. 1564.