 Open Access
 Authors : Mr. Jitesh R. Shinde, Prof. (Dr.) S. S. Salankar
 Paper ID : IJERTV3IS21230
 Volume & Issue : Volume 03, Issue 02 (February 2014)
Published (First Online): 08-03-2014
ISSN (Online) : 2278-0181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
Optimal Multiobjective Approach for VLSI Implementation of Digital FIR Filters
Mr. Jitesh R. Shinde
Research Scholar & IEEE member, Electronics Engg. dept., Priyadarshini College of Engg.,
Nagpur, India
Prof. (Dr.) S. S. Salankar
Electronics & Communication Engg. dept., G.H.Raisoni College of Engg.
Nagpur, India
Abstract: Filters are the heart of any Digital Signal Processing (DSP) based system, and multipliers and adders are the basic components of Finite Impulse Response (FIR) filters. Hence, the performance of a VLSI implementation of a DSP system is generally determined by the performance of its multipliers and adders. The multiplier is generally the slowest element in the system, and it is usually also the most area-consuming one. Optimizing both the speed and the area of the multiplier is therefore a major design issue, because improving speed mostly results in larger area.
In this paper, an optimal multiobjective approach for the VLSI implementation of digital FIR filters is suggested, wherein the three main design constraints, viz. area, speed and power, are optimized simultaneously without affecting the functionality of the design.
Keywords: FIR filter, direct form FIR filter, transpose form FIR filter, MAC, Multiplier, MCM, SCM.

INTRODUCTION
Finite impulse response (FIR) filters are of great importance in digital signal processing (DSP) systems, since their linear-phase and feed-forward characteristics make them very useful for building stable, high-performance filters.
In signal processing, an FIR filter is a filter whose impulse response (or response to any finite-length input) is of finite duration, because it settles to zero in finite time. This is in contrast to infinite impulse response (IIR) filters, which may have internal feedback and may continue to respond indefinitely (usually decaying) [1].
Physically, a discrete system such as an FIR filter is realized either as digital hardware or as software running on digital hardware. Processing a discrete-time signal in digital hardware involves mathematical operations such as addition, multiplication and delay.
The time-domain representation of an Nth-order FIR system is

$$ y[n] = b_0 x[n] + b_1 x[n-1] + \cdots + b_N x[n-N] = \sum_{i=0}^{N} b_i \, x[n-i] \qquad (1.1) $$

where x[n] is the input signal, y[n] is the output signal and b_i are the filter coefficients.
Fig. 1.1: A discrete-time FIR filter of order N. The top part is an N-stage delay line with N + 1 taps.
The impulse response h[n] of a digital FIR filter can be calculated by setting x[n] = δ[n] in the above relation, where δ[n] is the Kronecker delta impulse. The impulse response of an FIR filter then becomes the set of coefficients b_n, as follows:

$$ h[n] = \sum_{i=0}^{N} b_i \, \delta[n-i] = b_n, \qquad n = 0, \ldots, N \qquad (1.2) $$

The Z-transform of the impulse response yields the transfer function of the FIR filter, i.e.

$$ H(z) = \mathcal{Z}\{h[n]\} = \sum_{n=0}^{N} b_n z^{-n} \qquad (1.3) $$

FIR filters are clearly bounded-input bounded-output (BIBO) stable, since the output is a sum of a finite number of finite multiples of the input values, so it can be no greater than $\sum_{i=0}^{N} |b_i|$ times the largest value appearing in the input [2].

WHY FIR FILTERS
An FIR filter has a number of useful properties which sometimes make it preferable to an infinite impulse response (IIR) filter. FIR filters:

Require no feedback. This means that rounding errors are not compounded by summed iterations; the same relative error occurs in each calculation. This also makes implementation simpler.

Are inherently stable. Because no feedback is required, all the poles are located at the origin and thus lie within the unit circle (the required condition for stability in a discrete, linear time-invariant system).

Can easily be designed to be linear phase by making the coefficient sequence symmetric. Linear phase, or phase change proportional to frequency, corresponds to equal delay at all frequencies. This property is sometimes desired for phase-sensitive applications, for example data communications, crossover filters, and mastering.
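Relations (1.1) to (1.3), the BIBO bound and the linear-phase property can be checked numerically. The Python sketch below is illustrative only: the function names `fir_direct` and `freq_response` and the symmetric coefficient set `[1, 3, 5, 3, 1]` are hypothetical and are not the filter designed in this paper. It assumes the filter starts at rest (input samples before n = 0 are zero).

```python
import cmath

def fir_direct(b, x):
    """Direct-form FIR filter, Eq. (1.1): y[n] = sum_{i=0}^{N} b[i] * x[n-i].
    Input samples before n = 0 are taken as zero (filter starts at rest)."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i, bi in enumerate(b):
            if n - i >= 0:
                acc += bi * x[n - i]
        y.append(acc)
    return y

def freq_response(b, w):
    """Eq. (1.3) evaluated on the unit circle z = e^{jw}: H = sum_n b[n] e^{-jwn}."""
    return sum(bn * cmath.exp(-1j * w * n) for n, bn in enumerate(b))

b = [1, 3, 5, 3, 1]              # hypothetical symmetric coefficients, order N = 4
N = len(b) - 1

# Eq. (1.2): a Kronecker delta input returns the coefficients themselves.
delta = [1] + [0] * N
h = fir_direct(b, delta)
print(h)                         # [1.0, 3.0, 5.0, 3.0, 1.0]

# BIBO bound: |y[n]| <= (sum_i |b_i|) * max|x|.
x = [1, -1] * 4
y = fir_direct(b, x)
bound = sum(abs(c) for c in b) * max(abs(v) for v in x)
print(max(abs(v) for v in y), bound)   # 3.0 13

# Linear phase: for symmetric b, H(e^{jw}) * e^{jwN/2} is purely real.
for w in (0.3, 1.1, 2.5):
    z = freq_response(b, w) * cmath.exp(1j * w * N / 2)
    print(abs(z.imag) < 1e-12)   # True
```

Removing the linear phase term e^{-jwN/2} leaves a real-valued amplitude, which is why a symmetric coefficient sequence gives the same delay (N/2 samples) at all frequencies.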


BASIC BUILDING BLOCKS OF FIR FILTERS

MAC Unit
From Fig. 1.1, it can be seen that the critical operations are mainly multiplications and accumulations. Hence, for real-time signal processing, a high-speed, high-throughput Multiplier-Accumulator (MAC) is key to achieving a high-performance digital signal processing system.
A conventional MAC unit consists of a multiplier and an accumulator that holds the sum of the previous consecutive products. The function of the MAC unit is given by the following equation:

$$ F = \sum_{i} A_i B_i \qquad (3.1) $$

Fig. 3.1: Basic structure of MAC
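A behavioural sketch of Eq. (3.1) is given below. The accumulator width and the unsigned operands are assumptions for illustration (the paper later converts its coefficients to non-negative integers); the class `MAC` and the function `mac_sum` are hypothetical names.

```python
class MAC:
    """Behavioural model of a multiplier-accumulator: acc <- acc + a * b.
    acc_bits is a hypothetical register width; the sum wraps modulo
    2**acc_bits, like an unsigned hardware accumulator."""

    def __init__(self, acc_bits=32):
        self.mask = (1 << acc_bits) - 1
        self.acc = 0

    def reset(self):
        self.acc = 0

    def step(self, a, b):
        # One multiply and one accumulate per clock cycle.
        self.acc = (self.acc + a * b) & self.mask
        return self.acc

def mac_sum(A, B, acc_bits=32):
    """Eq. (3.1): F = sum_i A_i * B_i, one product accumulated per step."""
    mac = MAC(acc_bits)
    for a, b in zip(A, B):
        mac.step(a, b)
    return mac.acc

print(mac_sum([2, 3, 4], [5, 6, 7]))                  # 56
print(mac_sum([200, 200], [200, 200], acc_bits=16))   # 14464 (80000 wrapped mod 2**16)
```

The second call shows why the accumulator must be wide enough for the longest sum: an undersized register silently wraps.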

ADDER UNIT
In electronics, an adder is a digital circuit that performs the addition of numbers. In modern computers, adders reside in the arithmetic logic unit (ALU), where other operations are also performed. Depending on the area, delay and power consumption requirements, several adder implementations have been proposed. Ripple Carry Adders (RCAs) have the most compact design (O(n) area) among all types of adders but are the slowest (O(n) time). Carry Select Adders (CSLAs), with O(√n) time and O(2n) area, lie between RCAs and Carry Look-ahead Adders (CLAs), which have O(log n) time and O(n log n) area, thus providing a compromise between the area-efficient RCAs and the shortest-delay CLAs.
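The structural difference between a ripple carry adder and a carry select adder can be sketched with a bit-level Python model. Python executes the stages sequentially, so the model demonstrates function, not timing; in hardware it is the n-stage carry chain (RCA) and the duplicated blocks plus multiplexers (CSLA) that set delay and area. The 4-bit block size and the function names are assumptions.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry-out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_carry_add(x, y, n, cin=0):
    """n chained full adders: O(n) cells, and the carry may ripple
    through all n stages (O(n) delay in hardware)."""
    result, c = 0, cin
    for i in range(n):
        s, c = full_adder((x >> i) & 1, (y >> i) & 1, c)
        result |= s << i
    return result, c

def carry_select_add(x, y, n, block=4):
    """Carry select adder: each block is computed twice (carry-in 0 and 1)
    in parallel; the incoming carry only selects one result per block,
    trading roughly double adder area for a shorter carry path."""
    result, carry = 0, 0
    for pos in range(0, n, block):
        w = min(block, n - pos)
        xb = (x >> pos) & ((1 << w) - 1)
        yb = (y >> pos) & ((1 << w) - 1)
        s0, c0 = ripple_carry_add(xb, yb, w, 0)
        s1, c1 = ripple_carry_add(xb, yb, w, 1)
        s, carry = (s1, c1) if carry else (s0, c0)   # the multiplexer
        result |= s << pos
    return result, carry

print(ripple_carry_add(200, 100, 8))   # (44, 1): 300 = 256 + 44
print(carry_select_add(200, 100, 8))   # (44, 1)
```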


DESIGN ISSUES
The main goal of a DSP processor design is to enhance the speed of the MAC unit while limiting the power consumption and the number of gates (or area).
There are three sources of power dissipation in CMOS circuits: switching power Psw, short-circuit power Psc, and leakage power Pleakage. Psw is often the most significant source; therefore, efforts to reduce the power consumed in FIR system realizations focus on reducing Psw. Since multiplications are the most complex task in FIR computations, much research has been carried out on reducing the complexity of, or totally eliminating, the multiplications needed to compute the product terms in Eq. (1.1) [10].
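For reference, the commonly used first-order model of CMOS switching power shows why reducing the amount of multiplier and adder hardware helps. The symbols below (activity factor α, switched load capacitance C_L, supply voltage V_DD, clock frequency f_clk) are standard textbook notation, not quantities reported in this paper.

```latex
P_{total} = P_{sw} + P_{sc} + P_{leakage}, \qquad
P_{sw} = \alpha \, C_L \, V_{DD}^{2} \, f_{clk}
```

At a fixed supply voltage (1.1 V for all designs in Table 7.3.1) and a fixed clock, P_sw can only be lowered by reducing the product αC_L, i.e. by removing or simplifying switching logic, which is the motivation for replacing generic multipliers with shared shift-and-add networks.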
FIR filters involve a large number of multiplications, which are usually implemented in floating point arithmetic (IEEE 754 double-precision binary floating-point format: binary64).
The floating point number system can accommodate a large range of numbers, so higher accuracy in processing can be achieved in floating point arithmetic. However, the hardware implementation of floating point arithmetic is costlier, and processing is slower because the mantissa and the exponent must be computed separately. In this arithmetic, truncation and rounding errors occur for both multiplication and addition, whereas in fixed point arithmetic such errors occur only for multiplication. Addition in fixed point arithmetic can lead to overflow, whereas overflow is a rare phenomenon in floating point arithmetic due to its larger dynamic range. Therefore, floating point arithmetic is preferred for non-real-time applications on general-purpose systems (computers), in which cost and speed are not significant, while fixed point arithmetic is preferred for real-time hardware due to the reduced cost of the hardware and the high processing speed [12].
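The contrast drawn above between fixed point addition and multiplication can be illustrated with a small model. The sketch assumes a hypothetical signed 8-bit Q1.7 format (7 fractional bits); the function names are illustrative. It shows that fixed point addition is exact but can overflow, while multiplication needs rounding.

```python
def to_fixed(value, frac_bits):
    """Quantize a real number to an integer with frac_bits fractional bits."""
    return round(value * (1 << frac_bits))

def fixed_add(a, b, total_bits):
    """Fixed point addition is exact, but the sum can leave the signed
    total_bits range; it then wraps around (two's complement overflow)."""
    lo = -(1 << (total_bits - 1))
    s = a + b
    overflow = not (lo <= s < -lo)
    wrapped = (s - lo) % (1 << total_bits) + lo
    return wrapped, overflow

def fixed_mul(a, b, frac_bits):
    """The exact product carries 2*frac_bits fractional bits; rounding it back
    to frac_bits (round half up) is where the quantization error appears."""
    p = a * b
    return (p + (1 << (frac_bits - 1))) >> frac_bits

F = 7                                   # hypothetical Q1.7 format, 8 bits total
a, b = to_fixed(0.6, F), to_fixed(0.7, F)
print(a, b)                             # 77 90
print(fixed_mul(a, b, F))               # 54, i.e. 0.421875 instead of 77*90/2**14 = 0.4229736328125
print(fixed_add(a, b, 8))               # (-89, True): 0.6 + 0.7 overflows the Q1.7 range
```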

MULTIOBJECTIVE PROBLEM FORMULATION
Multiobjective optimization involves minimizing or maximizing multiple objective functions subject to a set of constraints. Example problems include analyzing design trade-offs, selecting optimal product or process designs, or any other application where an optimal solution with trade-offs between two or more conflicting objectives is needed.
In the VLSI implementation of digital FIR filters, the design constraints that influence filter performance are area, power and delay. The main hurdle is that a design can be area-efficient, power-efficient or speed-efficient, but usually not all three simultaneously: optimizing one parameter affects the others.
The objective of this research work is therefore to develop a step-by-step, optimal multiobjective approach for the VLSI implementation of digital filters, wherein all constraints, viz. area, power and time, are optimized simultaneously.
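As a generic illustration of what optimizing several conflicting objectives means, the sketch below applies Pareto dominance to candidate designs scored on area, power and delay (all to be minimized). This is a standard multiobjective notion, not a procedure taken from this paper, and the design points A to D are hypothetical numbers.

```python
from typing import NamedTuple

class Design(NamedTuple):
    name: str
    area: float   # smaller is better
    power: float  # smaller is better
    delay: float  # smaller is better

def dominates(p, q):
    """p dominates q if it is no worse in every objective and better in at least one."""
    pv, qv = (p.area, p.power, p.delay), (q.area, q.power, q.delay)
    return all(a <= b for a, b in zip(pv, qv)) and any(a < b for a, b in zip(pv, qv))

def pareto_front(designs):
    """Keep only the designs that no other candidate dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs if o is not d)]

# Hypothetical design points (arbitrary units), for illustration only.
candidates = [
    Design("A", area=100, power=5.0, delay=20),
    Design("B", area=120, power=9.0, delay=10),   # faster, but larger and hungrier
    Design("C", area=90, power=4.0, delay=25),
    Design("D", area=130, power=9.5, delay=12),   # dominated by B
]
print([d.name for d in pareto_front(candidates)])  # ['A', 'B', 'C']
```

A, B and C each trade one objective against another; only a design that improves all three at once, as this paper aims for, would dominate the whole front.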

DESIGN APPROACH

Obtain the transposed form of the FIR filter of Fig. 1.1.
Fig. 5.1: FIR filter, transposed form
The advantages of the transposed form are:

It is computationally equivalent to the direct form.

It can be obtained by reversing the order of the final additions, followed by retiming. As a result, all multiplications share one input.

The direct-form structure has the disadvantage that each adder has to wait for the previous adder to finish before it can compute its result. For high-speed hardware such as FPGAs/ASICs, this long chain of adders in the critical path limits how fast the filter can be clocked. A solution is to use the transposed direct-form structure instead. With this structure, the delays between the adders can be used for pipelining, so that all multiplications and additions can be performed in parallel. This allows real-time handling of data at very high sampling frequencies and also provides a way to optimize the speed of the system.
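To see that the transposed form computes the same output as the direct form of Eq. (1.1) while every multiplier sees the same input sample, the following sketch simulates both at the behavioural level. The list `s` stands for the delay registers between the adders; the function names and coefficient set are illustrative assumptions.

```python
def fir_direct(b, x):
    """Reference direct form, Eq. (1.1), starting from rest."""
    return [sum(b[i] * x[n - i] for i in range(len(b)) if n >= i)
            for n in range(len(x))]

def fir_transposed(b, x):
    """Transposed direct form: all taps multiply the same input sample x[n];
    partial sums move one register per clock towards the output."""
    N = len(b) - 1
    s = [0] * N                          # registers between the adders
    y = []
    for xn in x:
        p = [bi * xn for bi in b]        # every multiplier shares input xn
        y.append(p[0] + (s[0] if N else 0))
        for i in range(N - 1):           # s[i+1] still holds last cycle's value
            s[i] = p[i + 1] + s[i + 1]
        if N:
            s[N - 1] = p[N]
    return y

b = [3, 1, 4, 1, 5]                      # hypothetical integer coefficients
x = [2, 7, 1, 8, 2, 8, 1, 8]
print(fir_transposed(b, x) == fir_direct(b, x))   # True
```

In the transposed model each output needs only one addition after the registers, which is why the critical path no longer grows with the filter order.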



MULTIPLICATION
Multiplication in digital FIR designs often involves multiplication by constant coefficients, as shown in Fig. 1.1. The shift-and-add loop of a traditional multiplier can be replaced with a set of hard-wired shifts whose outputs are then added in one quick step, while still performing the same binary multiplication shown in Eq. (2.1):

$$ k = \sum_{i=0}^{n} 2^{i} k_i \qquad (2.1) $$

where k_i ∈ {0, 1} is the i-th bit of the constant k, so that k·x is the sum of copies of x shifted left by i for every nonzero bit k_i.
This optimization is sometimes referred to as multiplierless design, although the shift-and-add structure created does still implement a multiplier. Single constant multiplication (SCM) is also a term used to describe such optimized constant multipliers. In hardware, the multiplication operation is considered expensive, as it occupies significant area. Hence, constant multiplications are generally realized using only addition, subtraction, and shift operations [5].
The procedure for obtaining the shift-and-add structure of an SCM (Fig. 5.2) is first to convert the constant multiplicand into binary form. For example, the constant (43)10 is converted to (101011)2. Then, to multiply x by (43)10, x is shifted by a set amount for each 1 digit in the binary encoding. The amount of shift is determined by the weight of that bit position. For (43)10, the MSB of the binary encoding is a 1, so x must be shifted left by five, because the MSB has a weight of 32. The final step is to add all of the shifted values to compute the product, i.e. 43x = (x << 5) + (x << 3) + (x << 1) + x.
The number of two-input additions necessary to perform the constant multiplication is the number of nonzero digits in the binary representation minus one. The example coefficient (43)10 = (101011)2 requires three adders to form the product because it has four nonzero digits. While this optimization for constant multiplications is useful, it is not optimal.
Fig. 5.2: Example of SCM approach based multiplier design
The multiplier block of the digital FIR filter in its transposed form (Fig. 5.1), where the multiplication of the filter coefficients with the filter input is realized, has a significant impact on the complexity and performance of the design, because a large number of constant multiplications are required. This is generally known as the multiple constant multiplications (MCM) operation. The goal is the minimization of nonzero terms within the discrete coefficients, as each nonzero term corresponds to an additional adder in the hardware implementation. Depending on the target hardware, it may be possible to implement a linear-phase FIR filter using fewer multipliers than the minimum-phase filter by taking advantage of the symmetry, even if the filter length of the linear-phase filter is larger [3, 4]. For the bit-parallel design of the MCM operation, the MCM problem is defined as finding the fewest number of addition and subtraction operations that realize the MCM, since shifts can be implemented using only wires in hardware.
Many efficient algorithms [6, 7] have been introduced for the MCM problem. In spite of the various methods they use and the different search spaces they explore, the main idea has always been to maximize the sharing of common partial products among the constant multiplications. As an example, consider the constant multiplications 29x and 43x. Observe from Fig. 5.3(a) and (b) that sharing the partial products 3x and 5x reduces the number of operations from 6 to 4. The same partial-product-sharing approach has been used in our transposed form structure [11]. Thus, when using MCM instead of SCM, additional savings can be achieved by reusing fundamentals between the constants.
Fig. 5.3: Shift-add implementations of 29x and 43x (a) without partial product sharing; (b) with partial product sharing.
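The SCM shift-and-add procedure and the MCM partial-product sharing described above can be sketched in Python. The function names are hypothetical, and `mcm_29_43` hard-codes one decomposition consistent with the text, using the shared terms 3x and 5x (29x = 5x + 8·3x, 43x = 3x + 8·5x); it is not a general MCM algorithm such as those in [6, 7], and the exact wiring of Fig. 5.3(b) may differ.

```python
def scm_terms(k):
    """Binary shift-add decomposition of a positive constant k (Eq. 2.1):
    returns the left-shift amounts i for which bit k_i is 1."""
    return [i for i in range(k.bit_length()) if (k >> i) & 1]

def scm_multiply(x, k):
    """k*x using only shifts and additions; also returns the adder count."""
    shifts = scm_terms(k)
    product = 0
    for i in shifts:
        product += x << i        # a shift is only wiring in hardware
    adders = len(shifts) - 1     # nonzero digits minus one
    return product, adders

print(scm_terms(43))             # [0, 1, 3, 5]  ->  43 = 32 + 8 + 2 + 1
print(scm_multiply(7, 43))       # (301, 3)

def mcm_29_43(x):
    """Shared-partial-product realization of 29x and 43x:
    4 additions instead of 3 + 3 = 6 for two independent SCMs."""
    x3 = x + (x << 1)            # 3x  (1 addition)
    x5 = x + (x << 2)            # 5x  (1 addition)
    y29 = x5 + (x3 << 3)         # 5x + 24x = 29x  (1 addition)
    y43 = x3 + (x5 << 3)         # 3x + 40x = 43x  (1 addition)
    return y29, y43

print(mcm_29_43(7))              # (203, 301)
```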

COMPARISON AND SIMULATION RESULT
In our work, we designed three LTI filters, viz. filter 1 (direct form), filter 2 (transposed form) and filter 3 (optimized direct form), and compared their performance with respect to area, dynamic power dissipation and propagation delay.
First, a simple direct form FIR filter structure was implemented in MATLAB using the FDA (Filter Design & Analysis) tool with the following specifications:

Design Method : FIR equiripple

Response type : Low pass

Filter order: 17
The magnitude responses of the direct form FIR filter using floating point arithmetic and fixed point arithmetic were found to be the same (Fig. 6.2).
Next, using these filter coefficients, the direct form, transposed form and optimized direct form FIR filter structures were implemented using Active HDL, and their performance with respect to area, timing and dynamic power consumption was analysed using the Xilinx tool at RTL level and the Cadence SOC Encounter tool at layout level.
The detailed settings made in the FDA tool are shown in Fig. 6.1 below:
Fig. 6.1: FIR equiripple filter specification on FDA tool
From the FDA tool, the filter coefficients for the direct form FIR filter structure were obtained. However, these coefficients included negative values and were in floating point format. In order to optimize the resources used (i.e. gates and hence area) at RTL (Register Transfer Level), processing performance, system cost and ease of use, and since the dynamic range of the output is known, the floating point coefficients were converted into fixed point coefficients by multiplying them by 1000 and rounding the result. After that, the negative coefficients were converted into positive coefficients by taking their absolute values.
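The coefficient conversion described above (scale by 1000, round, take the absolute value) is sketched below. The coefficient values are hypothetical placeholders, since the 18 coefficients obtained from the FDA tool are not listed in the paper; `quantize_coefficients` is an illustrative name.

```python
def quantize_coefficients(coeffs, scale=1000):
    """Conversion described in the text: multiply each floating point
    coefficient by `scale` (1000 in the paper), round to the nearest integer,
    then take the absolute value so that every coefficient is non-negative."""
    fixed = [round(c * scale) for c in coeffs]
    return [abs(v) for v in fixed]

# Hypothetical symmetric floating point coefficients (NOT the paper's values).
h_float = [-0.0213, 0.0457, 0.1318, 0.2104, 0.2104, 0.1318, 0.0457, -0.0213]
h_fixed = quantize_coefficients(h_float)
print(h_fixed)   # [21, 46, 132, 210, 210, 132, 46, 21]
```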
Fig. 6.2: Magnitude response of FIR filter (a) using floating point arithmetic; (b) using fixed point arithmetic
Fig. 6.3: Direct form FIR filter using Active HDL
The multiplier unit of the MAC in the direct form FIR filter is implemented using a generic multiplier. The MCM block of the MAC unit in the transposed form is implemented using the structure shown in Fig. 5.3(b). As per the concept of the MCM approach derived from the SCM approach, the number of adders required to implement the MAC unit of the transposed structure will be high, thereby affecting the area and power consumption of the transposed form FIR filter structure.
To overcome the problems faced in implementing the MAC unit of the transposed form digital filter structure, we suggest a slight modification of the transposed form FIR filter structure, and the direct form FIR filter was redesigned with the same approach. In the optimized transposed form structure, the concept of coefficient reuse is used to further reduce the resources used in comparison to the structure shown in Fig. 5.3(b) and the direct form FIR filter structure. In other words, it was observed that, of the 18 filter coefficients obtained from the FDA tool of MATLAB, five coefficients were repeated two or three times. So we designed only five multiplier units based on the structure shown in Fig. 5.3(b). The final direct form structure of the FIR filter and the transposed direct form structure are shown in Fig. 6.3 and Fig. 6.4 respectively. The optimized direct form structure is shown in Fig. 6.5.
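The text does not give the exact datapath of the optimized structure, so the sketch below only illustrates the coefficient-reuse idea under an assumption: all taps are fed the same input sample (as in the transposed form of Fig. 5.1), so taps with equal coefficient values can share one multiplier output. The coefficient list and function names are hypothetical, and the result is checked against a plain direct-form reference.

```python
def fir_direct(b, x):
    """Reference direct-form output, Eq. (1.1)."""
    return [sum(b[i] * x[n - i] for i in range(len(b)) if n >= i)
            for n in range(len(x))]

def shared_multipliers(coeffs):
    """One constant multiplier per distinct coefficient value; tap_mult[i] is
    the index of the multiplier that tap i reuses."""
    distinct = sorted(set(coeffs))
    index = {c: j for j, c in enumerate(distinct)}
    return distinct, [index[c] for c in coeffs]

def fir_shared(coeffs, x):
    """Broadcast-input (transposed) FIR in which taps with equal
    coefficients reuse a single multiplier output."""
    distinct, tap_mult = shared_multipliers(coeffs)
    N = len(coeffs) - 1
    s = [0] * N                                  # partial-sum registers
    y = []
    for xn in x:
        shared = [c * xn for c in distinct]      # len(distinct) multiplications
        p = [shared[j] for j in tap_mult]        # fan-out to the taps
        y.append(p[0] + (s[0] if N else 0))
        for i in range(N - 1):
            s[i] = p[i + 1] + s[i + 1]
        if N:
            s[N - 1] = p[N]
    return y

h = [21, 46, 132, 210, 210, 132, 46, 21]         # hypothetical quantized taps
distinct, tap_mult = shared_multipliers(h)
print(len(h), "taps,", len(distinct), "multipliers")   # 8 taps, 4 multipliers
x = [3, 0, 5, 1, 7, 2, 4, 6, 0, 1]
print(fir_shared(h, x) == fir_direct(h, x))      # True
```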
Fig. 6.4: Transposed direct form FIR filter using Active HDL
Fig. 6.5: Optimized direct form FIR filter using Active HDL
On comparing the direct form and transposed form FIR structures, it was observed that in the transposed form the latch (register) size increases at each stage, and hence the adder size also increases. This may increase the area overhead and hence the power consumption of the design, and may also add to the latency of the circuit. The same was observed after the implementation of the transposed structure.
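One way to see why the registers grow along the transposed chain: the register closest to the output holds a partial sum of many products, whereas the direct-form delay line only stores input samples of fixed width. The sketch below computes the minimum width of each unsigned partial sum; the 16-bit product width is a hypothetical choice, not a value taken from the designs above.

```python
def partial_sum_widths(product_bits, taps):
    """Bits needed by the k-th partial sum of k unsigned products, each at most
    2**product_bits - 1. The width grows by about ceil(log2(k)) bits."""
    max_product = (1 << product_bits) - 1
    return [(k * max_product).bit_length() for k in range(1, taps + 1)]

print(partial_sum_widths(16, 8))   # [16, 17, 18, 18, 19, 19, 19, 19]
```

With 18 taps the last partial sum needs 21 bits under the same assumption, so the registers and adders near the output are noticeably wider than those near the input.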
The compilation reports of the direct form (filter 1), transposed form (filter 2) and optimized direct form (MCM with partial product sharing) (filter 3) FIR filters are given below:
Table 7.3.1: Compilation Summary of Filter 1, Filter 2 & Filter 3

Parameter | Filter 1 (direct form) | Filter 2 (transposed form) | Filter 3 (optimized direct form)
Technology | gscl45nm | gscl45nm | gscl45nm
Global operating voltage | 1.1 V | 1.1 V | 1.1 V
Combinational area (sq. nm) | 27431.992899 | 28421.277155 | 14868.831567
Non-combinational area (sq. nm) | 1404.145630 | 5616.582520 | 5620.336920
Total cell area (sq. nm) | 28836.138528 | 34037.859675 | 20489.168487
Cell internal power | 3.5082 mW | 7.7709 mW | 1.4805 mW
Net switching power | 2.3820 mW | 5.0423 mW | 864.7971 µW
Total dynamic power | 5.8902 mW | 12.8132 mW | 2.3453 mW
Cell leakage power | 172.0155 µW | 225.0586 µW | 152.1185 µW
Worst-case propagation delay (RTL, Xilinx report) | 21.101 ns | 11.129 ns | 6.619 ns
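The relative reductions of filter 3 (optimized direct form) with respect to filter 1 (direct form), i.e. the last and first data columns of Table 7.3.1, follow directly as (before - after) / before. The short check below uses only values transcribed from the table; the dictionary keys are illustrative names.

```python
# Values transcribed from Table 7.3.1 (first column: direct form, last column: optimized).
direct = {"area": 28836.138528, "dynamic_power_mW": 5.8902, "delay_ns": 21.101}
optimized = {"area": 20489.168487, "dynamic_power_mW": 2.3453, "delay_ns": 6.619}

def reduction_percent(before, after):
    """Relative reduction in percent."""
    return 100.0 * (before - after) / before

for key in direct:
    print(f"{key}: {reduction_percent(direct[key], optimized[key]):.3f} %")
# area: 28.946 %
# dynamic_power_mW: 60.183 %
# delay_ns: 68.632 %
```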


CONCLUSION
In this paper, the implementation of low-power, speed-efficient and area-efficient FIR filters has been considered using the filter coefficient reuse concept and an MCM technique based on partial product sharing, in which multiplication operations are replaced by shift-and-add operations. The experimental results show that, in the optimized direct form FIR filter structure, area is reduced by 28.946%, dynamic power consumption by 60.183% and worst-case propagation delay by 68.632% in comparison with the direct form FIR filter structure, and that the optimized structure also outperforms the transposed form in all three metrics (Table 7.3.1). This indicates that the proposed modification of the direct form FIR filter structure, the use of the MCM technique based on partial product sharing, and the use of coefficient sharing lead to an area-efficient, low-power and high-speed digital FIR filter structure for DSP systems.
Future research includes improving the performance of the FIR system by implementing, if possible, the adder unit using fast adders, and a full characterization of each design option at layout level.
REFERENCES

[1] L. R. Rabiner and B. Gold, Theory and Application of Digital Signal Processing, Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1975, ISBN 0139141014.

[2] A. E. Cetin, O. N. Gerek, and Y. Yardimci, "Equiripple FIR filter design by the FFT algorithm," IEEE Signal Processing Magazine, pp. 60-64, March 1997.

[3] K. Johansson, O. Gustafsson, and L. Wanhammar, "Multiple Constant Multiplication for Digit-Serial Implementation of Low Power FIR Filters," WSEAS Transactions on Circuits and Systems, vol. 5, no. 7, pp. 1001-1008, 2006.

[4] Y. Voronenko and M. Püschel, "Multiplierless Multiple Constant Multiplication," ACM Transactions on Algorithms, vol. 3, no. 2, 2007.

[5] H. Nguyen and A. Chatterjee, "Number-Splitting with Shift-and-Add Decomposition for Power and Hardware Optimization in Linear DSP Synthesis," IEEE Transactions on VLSI Systems, vol. 8, no. 4, pp. 419-424, 2000.

[6] L. Aksoy, C. Lazzari, E. Costa, P. Flores, and J. Monteiro, "Efficient Shift-Adds Design of Digit-Serial Multiple Constant Multiplications," in Proc. Great Lakes Symposium on VLSI, 2011, pp. 61-66.

[7] A. Dempster and M. Macleod, "Use of Minimum-Adder Multiplier Blocks in FIR Digital Filters," IEEE Transactions on Circuits and Systems II, vol. 42, no. 9, pp. 569-577, 1995.

[8] A. Shahein, Q. Zhang, N. Lotze, and Y. Manoli, "A Novel Hybrid Monotonic Local Search Algorithm for FIR Filter Coefficients Optimization," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 59, no. 3, March 2012.

[9] S. Smith, The Scientist and Engineer's Guide to Digital Signal Processing, Second Edition, Chapter 28, pp. 514-519, California Technical Publishing, San Diego, California.

[10] N. Sankarayya, K. Roy, and D. Bhattacharya, "Optimizing Computations in Transposed Direct Form Realization of Floating-Point LTI FIR Systems," in Digest of Technical Papers, 1997 IEEE/ACM International Conference on Computer-Aided Design, 1997.

[11] L. Aksoy, C. Lazzari, E. Costa, P. Flores, and J. Monteiro, "Optimization of Area in Digit-Serial Multiple Constant Multiplications at Gate-Level," in Proc. 2011 IEEE International Symposium on Circuits and Systems (ISCAS), Rio de Janeiro, 2011.

[12] A. Nagoor Kani, Digital Signal Processing, Second Edition, Chapter 8, pp. 8.1-8.16, Tata McGraw-Hill.