 Open Access
 Authors : Miss. S. Anitha
 Paper ID : IJERTV2IS100775
 Volume & Issue : Volume 02, Issue 10 (October 2013)
Published (First Online): 23-10-2013
ISSN (Online) : 2278-0181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
An Efficient Digital FIR Filter Designs Based on Parallel Faithfully Rounded Truncated MCM/A
Miss S. Anitha, P.G. Scholar, V.S.B. Engineering College.
Abstract: Finite impulse response (FIR) filter designs are presented using the concept of faithfully rounded truncated multipliers. System performance is generally determined by the performance of the multiplier. We jointly consider the optimization of bit width and hardware resources without sacrificing the frequency response and output signal precision. Nonuniform coefficient quantization with proper filter order is proposed to minimize the total area cost. Multiple constant multiplication/accumulation in a direct FIR structure is implemented using an improved version of truncated multipliers. The proposed design also uses Vedic multiplication to improve the performance of the multiplication process.
Index Terms: Digital signal processing (DSP), faithful rounding, finite impulse response (FIR) filter, truncated multipliers, VLSI design.

INTRODUCTION
A finite impulse response (FIR) digital filter is one of the fundamental components in many digital signal processing (DSP) and communication systems. It is also widely used in many portable applications with limited area and power budgets. A general FIR filter of order M can be expressed as
$$y[n] = \sum_{i=0}^{M} a_i \, x[n-i].$$
In the case of linear phase, the coefficients are either symmetric or antisymmetric, with $a_i = a_{M-i}$ or $a_i = -a_{M-i}$.
There are two basic FIR structures, direct form and transposed form, as shown in Fig. 1 for a linear-phase even-order FIR filter. In the direct form in Fig. 1(a), the multiple constant multiplication (MCM)/accumulation (MCMA) module performs the concurrent multiplications of individual delayed signals and respective filter coefficients, followed by accumulation of all the products. Thus, the operands of the multipliers in MCMA are the delayed input signals $x[n-i]$ and the coefficients $a_i$. In the transposed form in Fig. 1(b), the operands of the multipliers in the MCM module are the current input signal $x[n]$ and the coefficients. The results of the individual constant multiplications go through structure adders (SAs) and delay elements.
Fig. 1. Structures of linear-phase even-order FIR filters: (a) direct form and (b) transposed form.
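The difference between the two structures can be illustrated with a small behavioural sketch. The function names and the integer test coefficients below are assumptions made for illustration only; they are not taken from the paper, and the code models arithmetic behaviour, not hardware.

```python
def fir_direct(coeffs, samples):
    """Direct form: a delay line holds past inputs; each output is one
    multiply-accumulate pass over the delayed inputs (the MCMA step)."""
    delay_line = [0] * len(coeffs)
    out = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]          # shift in the newest input
        out.append(sum(a * d for a, d in zip(coeffs, delay_line)))
    return out


def fir_transposed(coeffs, samples):
    """Transposed form: the current input is multiplied by every coefficient
    (the MCM step); structure adders and delay elements accumulate the products."""
    m = len(coeffs)
    regs = [0] * (m - 1)                            # delay elements between structure adders
    out = []
    for x in samples:
        products = [a * x for a in coeffs]
        out.append(products[0] + (regs[0] if regs else 0))
        # each register stores its product plus the previous value of the next register
        regs = [products[k + 1] + (regs[k + 1] if k + 1 < m - 1 else 0)
                for k in range(m - 1)]
    return out


# Hypothetical symmetric (linear-phase) coefficients of an order-4 filter.
example_coeffs = [1, 2, 3, 2, 1]
example_input = [1, 0, 0, 0, 0, 2, -1, 4]
direct_out = fir_direct(example_coeffs, example_input)
transposed_out = fir_transposed(example_coeffs, example_input)
```

Both functions produce the same output sequence; they differ only in where the delays sit, which is what drives the register-width difference discussed in the text.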
In the past decades, there have been many papers on the designs and implementations of low-cost or high-speed FIR filters [1]–[13], [15]–[19]. In order to avoid costly multipliers, most prior hardware implementations of digital FIR filters can be divided into two categories: multiplierless based and memory based. Multiplierless-based designs realize MCM with shift-and-add operations and share the common suboperations using canonical signed digit (CSD) recoding and common subexpression elimination (CSE) to minimize the adder cost of MCM [1]–[10]. In [18] and [19], more area savings are achieved by jointly considering the optimization of coefficient quantization and CSE. Most multiplierless MCM-based FIR filter designs use the transposed structure to allow for cross-coefficient sharing and tend to be faster, particularly when the filter order is large. However, the area of delay elements is larger compared with that of the direct form due to the range expansion of the constant multiplications and the subsequent additions in the SAs. In [17], Blad and Gustafsson presented high-throughput (TP) FIR filter designs by pipelining the carry-save adder trees in the constant multiplications using integer linear programming to minimize the area cost of full adders (FAs), half adders (HAs), and registers (algorithmic and pipelined registers).
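To make the shift-and-add idea concrete, the following is a minimal sketch of CSD recoding of a single non-negative integer constant and of a multiplication that uses only shifts, additions, and subtractions. The function names are hypothetical, and the sketch does not attempt CSE across several coefficients.

```python
def to_csd(value):
    """Return the CSD digits (each -1, 0, or 1) of a non-negative integer, LSB first."""
    digits = []
    while value != 0:
        if value & 1:
            digit = 2 - (value & 3)   # +1 if value % 4 == 1, -1 if value % 4 == 3
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def multiply_shift_add(x, digits):
    """Multiply x by the constant encoded in `digits` using shifts and adds/subtracts only."""
    acc = 0
    for shift, digit in enumerate(digits):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
    return acc


# The constant 7 is 111 in binary (three nonzero digits) but 8 - 1 in CSD (two).
csd_of_seven = to_csd(7)
product = multiply_shift_add(13, csd_of_seven)
```

Fewer nonzero digits means fewer adders per constant, which is the cost that the multiplierless designs cited above try to minimize.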
Fig. 2. Three stages in digital FIR filter design and implementation
Memory-based FIR designs consist of two types of approaches: lookup table (LUT) methods and distributed arithmetic (DA) methods [11]–[13]. The LUT-based design stores in ROMs odd multiples of the input signal to realize the constant multiplications in MCM [11]. The DA-based approaches recursively accumulate the bit-level partial results for the inner product computation in FIR filtering [12], [13].
An important design issue of FIR filter implementation is the optimization of the bit widths for filter coefficients, which has a direct impact on the area cost of arithmetic units and registers. Moreover, since the bit widths grow after multiplications, many DSP applications do not need full-precision outputs. Instead, it is desirable to generate faithfully rounded outputs where the total error introduced in quantization and rounding is no more than one unit of the last place (ulp), defined as the weighting of the least significant bit (LSB) of the outputs. In this brief, we present low-cost implementations of FIR filters based on the direct structure in Fig. 1(a) with faithfully rounded truncated multipliers.
The MCMA module is realized by accumulating all the partial products (PPs), where unnecessary PP bits (PPBs) are removed without affecting the final precision of the outputs. The bit widths of all the filter coefficients are minimized using nonuniform quantization with unequal word lengths in order to reduce the hardware cost while still satisfying the specification of the frequency response.
This brief is organized as follows. Section II discusses the nonuniform quantization and optimization of filter coefficients. Section III describes the PP generation and compression in the faithfully rounded MCMA module. Section IV compares the experimental results.

COEFFICIENT QUANTIZATION AND OPTIMIZATION
A generic flow of FIR filter design and implementation can be divided into three stages: finding the filter order and coefficients, coefficient quantization, and hardware optimization, as shown in Fig. 2. In the first stage, the filter order and the corresponding coefficients of infinite precision are determined to satisfy the specification of the frequency response. Then, the coefficients are quantized to finite bit accuracy. Finally, various optimization approaches such as CSE are used to minimize the area cost of hardware implementations. Most prior FIR filter implementations focus on the hardware optimization stage.
After FIR filter operations, the output signals have a larger bit width due to bit width expansion after multiplications. In many practical situations, only some of the bits of the full-precision outputs are needed. For example, assuming that the input signals of the FIR filter have 12 bits and the filter coefficients are quantized to 10 bits, the bit width of the resultant FIR filter output signals is at least 22 bits, but we might need only the 12 most significant bits for subsequent processing.
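The 12-bit by 10-bit example can be checked with a few lines of integer arithmetic. This is only a sketch for a single product: the operand values are arbitrary, the helper names are hypothetical, and plain floor truncation is used here in place of the faithful rounding scheme developed later in the brief.

```python
IN_BITS, COEF_BITS, OUT_BITS = 12, 10, 12


def product_bits(in_bits, coef_bits):
    """Bit width of a full-precision two's complement product."""
    return in_bits + coef_bits


def keep_msbs(value, full_bits, out_bits):
    """Drop the low-order bits of a full-precision value (floor truncation)."""
    return value >> (full_bits - out_bits)


FULL_BITS = product_bits(IN_BITS, COEF_BITS)        # 22 bits for one product
ULP = 1 << (FULL_BITS - OUT_BITS)                   # weight of the output LSB

x, a = 1234, -301                                   # arbitrary 12-bit and 10-bit signed values
full = x * a
kept = keep_msbs(full, FULL_BITS, OUT_BITS)
truncation_error = kept * ULP - full                # lies in (-ULP, 0]
```

Accumulating several such products needs additional guard bits, which is why the text says "at least" 22 bits.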
ALGORITHM IN MATLAB:
In this brief, we adopt the direct FIR structure with MCMA because the area cost of the flip-flops in the delay elements is smaller compared with that of the transposed form. Furthermore, we jointly consider the three design stages in Fig. 2 in order to achieve a more efficient hardware design with faithfully rounded output signals.
Unlike conventional uniform quantization of filter coefficients with equal bit width, the nonuniform quantization technique with possibly different bit widths is adopted in this brief. Fig. 3 shows the pseudocode of the proposed quantization scheme.
Initially, the subroutine Parks_McClellan() is used to find the filter order M for the given frequency response. Step 1 of uniform quantization starts with calling the MATLAB built-in function remez() to find the coefficients for the FIR filter of order M. Then, we quantize the coefficients with enough bits and generate the set of uniformly quantized coefficients $a_i$ with equal bit width B. The subroutine freq_resp_satisfied() checks if the frequency response is still satisfied after quantization.
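The uniform quantization step can be sketched in Python as follows. Several things here are assumptions and not the author's method: a Hamming-windowed sinc design stands in for Parks_McClellan() and remez(), the specification values are invented for illustration, and every helper name except freq_resp_satisfied() is hypothetical.

```python
import cmath
import math


def windowed_sinc_lowpass(num_taps, cutoff):
    """Stand-in designer (NOT Parks-McClellan): Hamming-windowed sinc low-pass.
    `cutoff` is normalized so that 1.0 is the Nyquist frequency."""
    mid = (num_taps - 1) / 2
    taps = []
    for k in range(num_taps):
        t = k - mid
        ideal = cutoff if t == 0 else math.sin(math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [h / gain for h in taps]                 # unity gain at DC


def quantize(coeffs, frac_bits):
    """Round every coefficient to the same number of fractional bits."""
    scale = 1 << frac_bits
    return [round(c * scale) / scale for c in coeffs]


def magnitude(coeffs, freq):
    """|H| at a frequency normalized so that 1.0 is Nyquist."""
    w = math.pi * freq
    return abs(sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(coeffs)))


def freq_resp_satisfied(coeffs, f_pass, f_stop, dev_pass, dev_stop, grid=200):
    """Check pass-band and stop-band deviations on a uniform frequency grid."""
    for n in range(grid + 1):
        f = n / grid
        mag = magnitude(coeffs, f)
        if f <= f_pass and abs(mag - 1.0) > dev_pass:
            return False
        if f >= f_stop and mag > dev_stop:
            return False
    return True


def smallest_uniform_width(coeffs, spec, max_bits=16):
    """Return the smallest equal bit width B whose quantized coefficients meet the spec."""
    for frac_bits in range(4, max_bits + 1):
        if freq_resp_satisfied(quantize(coeffs, frac_bits), *spec):
            return frac_bits
    raise ValueError("specification not met within max_bits")


# Illustrative spec: (f_pass, f_stop, pass-band deviation, stop-band deviation).
spec = (0.15, 0.45, 0.01, 0.01)
ideal_coeffs = windowed_sinc_lowpass(31, 0.3)
uniform_bits = smallest_uniform_width(ideal_coeffs, spec)
uniform_coeffs = quantize(ideal_coeffs, uniform_bits)
```

The same check function can then be reused whenever a coefficient word length is changed, since every candidate set of quantized coefficients must still meet the frequency-response specification.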
After coefficient quantization, we perform recoding to minimize the number of nonzero digits. In this brief, we consider CSD recoding with the digit set {0, 1, −1} and radix-4 modified Booth recoding with the digit set {0, 1, −1, 2, −2} and select the one that results in the smaller area cost. While most FIR filter designs use the minimum filter order, we observe that it is possible to reduce the total area by slightly increasing the filter order. Therefore, the total area of the FIR filter is estimated by the subroutine area_cost_estimate() using the approach in [20]. Indeed, the total number of PPBs in the MCMA is directly proportional to the number of FA cells required in the PPB compression because an FA reduces one PPB.
After Step 1 of uniform quantization and filter order optimization, the nonuniform quantization in Step 2 gradually reduces the bit width of each coefficient until the frequency response is no longer satisfied.
Finally, we fine-tune the nonuniformly quantized coefficients by adding or subtracting the weighting of the LSB of each coefficient and check if further bit width reduction is possible. Using the algorithm in Fig. 3, we can find the filter order M and the nonuniformly quantized coefficients that lead to minimized area cost in the FIR filter implementation.
Fig. 3. Proposed algorithm of coefficient quantization and fine-tuning.

PP TRUNCATION AND COMPRESSION
The FIR filter design in this brief adopts the direct form in Fig. 1(a), where the MCMA module sums up all the products $a_i \times x[n-i]$. Instead of accumulating individual multiplication for each product, it is more efficient to collect all the PPs into a single PPB matrix with carry-save addition to reduce the height of the matrix to two, followed by a final carry propagation adder. Fig. 4 illustrates the difference between individual multiplications and combined multiplication for $A \times B + C \times D$.
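The combined accumulation for A × B + C × D can be modelled at the bit level as below. This is a simplified sketch under stated assumptions: operands are unsigned, each PP row is held as an integer, and rows are compressed three at a time in arbitrary order, which does not reflect any particular compressor tree in the paper. The function names are hypothetical.

```python
def partial_product_rows(a, b, width):
    """PP rows of an unsigned a*b: row j is (a AND b_j) shifted left by j."""
    return [(a << j) if (b >> j) & 1 else 0 for j in range(width)]


def carry_save_reduce(rows):
    """Apply 3:2 carry-save compression until only two rows remain."""
    rows = list(rows)
    while len(rows) > 2:
        x, y, z = rows.pop(), rows.pop(), rows.pop()
        sum_row = x ^ y ^ z                              # bitwise sum without carries
        carry_row = ((x & y) | (x & z) | (y & z)) << 1   # carries move one column left
        rows += [sum_row, carry_row]
    return rows


def combined_mac(a, b, c, d, width=8):
    """Compute a*b + c*d from one merged PP matrix and a single final adder."""
    matrix = partial_product_rows(a, b, width) + partial_product_rows(c, d, width)
    reduced = carry_save_reduce(matrix)
    return sum(reduced)                                  # final carry-propagate addition


mac_result = combined_mac(13, 11, 7, 9)
```

Only one carry-propagate addition is performed regardless of how many products are merged, which is the saving that the single PPB matrix is meant to provide.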
In order to avoid the sign extension bits, we complement the sign bit of each PP row and add a bias constant using the property $\bar{s} = 1 - s$, where s is the sign bit of a PP row. All the bias constants are collected into the last row in the PPB matrix.
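The property works because a sign bit s with weight $-2^{k}$ contributes $-s \cdot 2^{k} = \bar{s} \cdot 2^{k} - 2^{k}$, so the row can be treated as unsigned once the sign bit is complemented and a constant is subtracted. A minimal sketch, assuming all rows have the same width and are not shifted against each other (a simplification of a real PPB matrix; the function names are hypothetical):

```python
def to_signed(word, bits):
    """Interpret an unsigned bit pattern as a two's complement number."""
    return word - (1 << bits) if word >> (bits - 1) else word


def sum_rows_without_sign_extension(words, bits):
    """Add two's complement rows using complemented sign bits plus one bias constant."""
    sign_weight = 1 << (bits - 1)
    total = 0
    bias = 0
    for word in words:
        total += word ^ sign_weight      # complement the sign bit, treat the row as unsigned
        bias -= sign_weight              # each row contributes -2^(bits-1) to the bias
    return total + bias                  # the collected bias is added once at the end


rows = [0b10110101, 0b01100110, 0b11111111, 0b00000000]
bias_sum = sum_rows_without_sign_extension(rows, 8)
reference_sum = sum(to_signed(w, 8) for w in rows)
```

No row needs to be widened with copies of its sign bit, and the per-row constants fold into a single value that can be placed in one row of the matrix.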
The complements of PPBs are denoted by white circles with overbars. In the faithfully rounded FIR filter implementation, it is required that the total error introduced during the arithmetic operations is no larger than one ulp. We modify a recent truncated multiplier design in [14] so that more PPBs can be deleted, leading to smaller area cost. Fig. 6 compares the two approaches. In [14], the removal of unnecessary PPBs is composed of three processes: deletion, truncation, and rounding.
Two rows of PPBs are set undeletable because they will be removed at the subsequent truncation and rounding.
TRUNCATION & ROUNDING:
TABLE I
SPECIFICATIONS OF THE THREE FIR FILTERS UNDER CONSIDERATION
In this brief, we propose an improved version of the faithfully rounded truncated multiplier design, as shown in Fig. 6(b). Here, only a single row of PPBs is made undeletable (for the subsequent rounding), and the PPB elimination consists of only deletion and rounding. The error ranges of deletion and rounding in the improved version are as follows:
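$$-\mathrm{ulp} \le E'_D \le 0, \qquad -\tfrac{1}{2}\,\mathrm{ulp} \le E_D = E'_D + \tfrac{1}{2}\,\mathrm{ulp} \le \tfrac{1}{2}\,\mathrm{ulp}$$
$$-\mathrm{ulp} < E'_R \le 0, \qquad -\tfrac{1}{2}\,\mathrm{ulp} < E_R = E'_R + \tfrac{1}{2}\,\mathrm{ulp} \le \tfrac{1}{2}\,\mathrm{ulp}$$
$$-\mathrm{ulp} < E = (E_D + E_R) \le \mathrm{ulp}.$$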
Since the range of the deletion error in the improved version is twice as large as that in [14], more PPBs can be deleted, leading to smaller area in the subsequent PPB compression.
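The deletion-plus-rounding idea can be tried out on a small unsigned multiplier. The sketch below is an illustration under several assumptions and is not the author's circuit: operands are unsigned n-bit values, the result keeps the n most significant bits (so one ulp is $2^n$), PPBs are deleted greedily from the least significant columns until their total weight would exceed one ulp, and the column just below the ulp is never deleted. The function names are hypothetical.

```python
def deletable_ppbs(n):
    """Greedily choose PPB positions (i, j) from the least significant columns
    whose total weight does not exceed one ulp (2**n)."""
    budget = 1 << n
    deleted = set()
    for col in range(n - 1):               # column n-1 is left for the rounding step
        weight = 1 << col
        for i in range(col + 1):
            if weight > budget:
                return deleted
            deleted.add((i, col - i))
            budget -= weight
    return deleted


def truncated_multiply(a, b, n):
    """Return the n-bit truncated product of unsigned n-bit a and b."""
    ulp = 1 << n
    deleted = deletable_ppbs(n)
    kept = 0
    for i in range(n):
        for j in range(n):
            if (i, j) not in deleted:
                kept += (((a >> i) & 1) & ((b >> j) & 1)) << (i + j)
    # one half-ulp bias offsets the deletion error, another offsets the rounding error
    biased = kept + ulp // 2 + ulp // 2
    return biased >> n                     # rounding: drop all bits below the ulp


N = 6
worst_low = min((truncated_multiply(a, b, N) << N) - a * b
                for a in range(1 << N) for b in range(1 << N))
worst_high = max((truncated_multiply(a, b, N) << N) - a * b
                 for a in range(1 << N) for b in range(1 << N))
```

For this 6-bit case the exhaustive search confirms that the total error stays within one ulp in magnitude while 10 of the 36 PPBs are never generated.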

EXPERIMENTAL RESULTS AND COMPARISONS
We implemented three FIR filters with the specifications given in Table I [15], [16]. M is the original filter order, while $M_{opt}$ is the filter order with optimized total area using the method in Fig. 2. B denotes the number of fractional bits for uniformly quantized coefficients with filter order $M_{opt}$, EWL is the effective word length without counting the leading sign bits, $f_{pass}$ and $f_{stop}$ are the passband and stopband edge frequencies normalized to one, and $A_{pass}$ and $A_{stop}$ denote the corresponding peak-to-peak ripples.
SIMULATION RESULTS:
TRUNCATION SCHEME1:
CONVENTIONAL MULTIPLIER:
TABLE:
MULTIPLIERS   AREA~   POWER~
Scheme1       1678    55pw
Scheme2       1306    42pw
TRUNCATION SIGNED MULTIPLIER:
TRUNCATION2 UNSIGNED MULTIPLIER:

TAP FIR FILTER:
LINEAR PHASE FIR:
Proposed work:
Although most prior designs are based on the transposed form, we observe that the direct FIR structure with faithfully rounded MCMAT leads to the smallest area cost and power consumption.
Using Vedic multiplication also reduces the hardware computation.
CONCLUSION:
This brief has presented low-cost FIR filter designs by jointly considering the optimization of coefficient bit width and hardware resources in implementations. The multiplier is the key component of many high-performance systems; using the truncated parallel multiplier reduces area, complexity, and power.
REFERENCES:
H.-J. Ko and S.-F. Hsiao, "Design and application of faithfully rounded and truncated multipliers with combined deletion, reduction, truncation and rounding," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 58, no. 5, pp. 304–308, May 2011.

F. Xu, C. H. Chang, and C. C. Jong, "Contention resolution algorithms for common subexpression elimination in digital filter design," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 52, no. 10, pp. 695–700, Oct. 2005.

I.-C. Park and H.-J. Kang, "Digital filter synthesis based on an algorithm to generate all minimal signed digit representations," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 21, no. 12, pp. 1525–1529, Dec. 2002.

C.-Y. Yao, H.-H. Chen, T.-F. Lin, C.-J. J. Chien, and X.-T. Hsu, "A novel common-subexpression-elimination method for synthesizing fixed-point FIR filters," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 11, pp. 2215–2221, Sep. 2004.

O. Gustafsson, "Lower bounds for constant multiplication problems," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 11, pp. 974–978, Nov. 2007.

A. Blad and O. Gustafsson, "Integer linear programming-based bit-level optimization for high-speed FIR filter architecture," Circuits Syst. Signal Process., vol. 29, no. 1, pp. 81–101, Feb. 2010.

F. Xu, C. H. Chang, and C. C. Jong, "Design of low-complexity FIR filters based on signed-powers-of-two coefficients with reusable common subexpressions," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 26, no. 10, pp. 1898–1907, Oct. 2007.

Y. J. Yu and Y. C. Lim, "Design of linear phase FIR filters in subexpression space using mixed integer linear programming," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, no. 10, pp. 2330–2338, Oct. 2007.

K. C. Bickerstaff, M. Schulte, and E. E. Swartzlander, Jr., "Reduced area multipliers," in Proc. Int. Conf. Appl.-Specific Array Processors, 1993, pp. 478–489.

P. K. Meher, "New approach to look-up-table design and memory-based realization of FIR digital filter," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 3, pp. 592–603, Mar. 2010.