Low Power FIR Filter Design using Truncated Multiplier on DSP Application

DOI : 10.17577/IJERTV3IS090681


S. Manikandan T. Karthik

Assistant Professor, ETE Department, Karpagam College of Engineering, Coimbatore; Assistant Professor, ECE Department, Karpagam College of Engineering, Coimbatore.

Abstract: In this paper, a novel low-power and area-efficient finite impulse response (FIR) filter design is presented using the concept of rounded truncated multipliers. The bit widths and hardware resources are optimized without sacrificing the frequency response or the output signal precision. Double-precision floating-point representation is proposed to reduce the size of the adders while reducing the precision. The direct FIR structure is implemented using an improved version of truncated multipliers in multiple constant multiplication/accumulation (MCMA). Compared with previous FIR filter design approaches, the proposed designs achieve lower power and the best area.

Keywords: Finite Impulse Response (FIR) filter, Digital Signal Processing (DSP), Floating Point Representation, Radix-4 Modified Booth Algorithm, Truncated Multiplier.


    Finite Impulse Response (FIR) filters are widely used as a basic component in Digital Signal Processing (DSP) and communication systems, as well as in image processing applications. They are also used in many portable applications with a limited area and power budget.

    FIR filters are non-recursive, since they do not use feedback, and are therefore inherently stable. If the coefficients of an FIR filter are symmetric, the filter has linear phase: it delays signals equally at all frequencies, which is important in many applications. Moreover, overflow is straightforward to avoid in an FIR filter.
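    To make the linear-phase property concrete, the following Python sketch evaluates the frequency response H(e^{jω}) = Σ a_i e^{-jωi} of a filter with symmetric taps and removes the pure delay term e^{-jωM/2}, where M is the filter order. For symmetric taps the remainder is real, so the phase is -ωM/2 (up to sign flips of π), i.e. a constant delay of M/2 samples at every frequency. The tap values and function names are hypothetical illustrations, not taken from the design in this paper.

```python
import cmath
import math

def freq_response(a, w):
    """H(e^{jw}) = sum_i a[i] * exp(-j*w*i) for FIR taps a[0..M]."""
    return sum(ai * cmath.exp(-1j * w * i) for i, ai in enumerate(a))

def zero_phase(a, w):
    """Multiply H(e^{jw}) by exp(+j*w*M/2) to strip a pure delay of M/2 samples."""
    M = len(a) - 1
    return freq_response(a, w) * cmath.exp(1j * w * M / 2)

# Hypothetical symmetric taps (a[i] == a[M - i]) of order M = 4.
taps = [0.1, 0.2, 0.4, 0.2, 0.1]

for k in range(1, 8):
    w = k * math.pi / 8
    A = zero_phase(taps, w)
    # The imaginary part vanishes up to rounding, so the phase is linear in w.
    print(f"w = {w:.3f}: real part {A.real:+.6f}, imaginary part {abs(A.imag):.1e}")
```

    A non-symmetric filter such as [1, 0, 0] leaves a nonzero imaginary part after the same compensation, because its delay term is not e^{-jωM/2}.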

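    In general, an FIR filter of order M can be expressed as

    $$y[n] = \sum_{i=0}^{M} a_i \, x[n-i]$$

    where x[n] is the input signal, y[n] is the output signal and a_i are the filter coefficients.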

    In linear-phase FIR filters, the coefficients are either symmetric or antisymmetric, with $a_i = a_{M-i}$ or $a_i = -a_{M-i}$.
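    As an illustration of the direct form and of the saving that symmetric coefficients allow, the Python sketch below computes y[n] both directly and in the pre-added (folded) arrangement commonly used for linear-phase filters, where x[n-i] + x[n-M+i] is formed before a single multiplication by a_i. It assumes integer taps, zero initial state and an even order M; the function names and example values are hypothetical.

```python
def fir_direct(a, x):
    """Direct form: y[n] = sum_{i=0}^{M} a[i] * x[n - i], with x[n] = 0 for n < 0."""
    return [sum(a[i] * x[n - i] for i in range(len(a)) if n - i >= 0)
            for n in range(len(x))]

def fir_folded(a, x):
    """Same output for symmetric taps of even order M, using M/2 + 1 multiplications
    per sample: x[n - i] + x[n - M + i] is pre-added before multiplying by a[i]."""
    M = len(a) - 1
    if M % 2 or any(a[i] != a[M - i] for i in range(M + 1)):
        raise ValueError("expects symmetric taps of even order")

    def xs(k):
        return x[k] if k >= 0 else 0          # zero initial state

    y = []
    for n in range(len(x)):
        acc = a[M // 2] * xs(n - M // 2)      # centre tap
        for i in range(M // 2):
            acc += a[i] * (xs(n - i) + xs(n - M + i))
        y.append(acc)
    return y

taps = [1, 3, 5, 3, 1]        # hypothetical symmetric integer taps, M = 4
x = [1, 0, 2, -1, 4, 0, 0]
print(fir_direct(taps, x))    # [1, 3, 7, 8, 12, 13, 19]
print(fir_folded(taps, x))    # same values
```

    Folding roughly halves the number of multiplications, at the cost of one pre-adder per coefficient pair.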

    The two basic FIR structures are the direct form and the transposed form. Fig. 1 shows a linear-phase even-order FIR filter. In the direct form of Fig. 1(a), the multiple constant multiplication (MCM)/accumulation module performs the parallel multiplications of the individual delayed signals by their respective filter coefficients, followed by the accumulation of all the products. Hence, the operands of the multipliers in the MCMA are the delayed input signals x[n-i] and the coefficients a_i.

    Fig.1 Structure of linear phase even-order FIR filters: (a) Direct Form and (b) Transposed Form.

    In the transposed form, shown in Fig. 1(b), the operands of the multipliers in the MCM module are the present input signal x[n] and the coefficients. The outputs of the individual constant multiplications are passed to the structure adders (SAs) and delay elements. Many designs and implementations of low-cost or high-speed FIR filters have been published over the past decades [1]–[13], [15]–[19]. To avoid costly multipliers, which occupy considerable area and power, most prior hardware implementations of digital FIR filters can be classified into two categories: multiplierless-based and memory-based.
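    For comparison with the direct form, a minimal Python sketch of the transposed form follows, assuming integer arithmetic and zero initial state: every coefficient multiplies the current input x[n], and each partial sum passes through a structure adder and a delay register before reaching the output. A compact direct-form reference is included so the two can be checked against each other; both function names are hypothetical.

```python
def fir_direct(a, x):
    """Reference direct form: y[n] = sum_i a[i] * x[n - i]."""
    return [sum(a[i] * x[n - i] for i in range(len(a)) if n - i >= 0)
            for n in range(len(x))]

def fir_transposed(a, x):
    """Transposed form: s_M[n] = a[M]x[n], s_k[n] = a[k]x[n] + s_{k+1}[n-1],
    and y[n] = a[0]x[n] + s_1[n-1]."""
    M = len(a) - 1
    regs = [0] * M                      # regs[k] holds s_{k+1}[n-1] (delay element k)
    y = []
    for xn in x:
        products = [ai * xn for ai in a]                 # MCM block: x[n] times every a_i
        y.append(products[0] + (regs[0] if M > 0 else 0))
        for k in range(M):                               # structure adders, then delays
            nxt = regs[k + 1] if k + 1 < M else 0        # regs[k+1] is still the old value
            regs[k] = products[k + 1] + nxt
    return y

taps = [2, -1, 4, 3]          # hypothetical taps
x = [1, 5, -2, 0, 3, 7]
print(fir_transposed(taps, x) == fir_direct(taps, x))   # True
```

    The registers in this form store partial sums whose word length grows with the products, which is the delay-element cost the direct form avoids.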

    To reduce the area of the adders in the MCM, multiplierless designs realize the MCM with shift-and-add operations and share common sub-operations using canonical signed digit (CSD) recoding and common subexpression elimination (CSE) [1]–[10]. Building on this, further area savings are achieved by jointly optimizing coefficient quantization and CSE. When the filter order is large, most multiplierless MCM-based FIR filter designs use the transposed structure, which allows cross-coefficient sharing and tends to be faster. However, due to the range expansion of the constant multiplications and the subsequent additions in the SAs, the area of the delay elements is larger than in the direct form. High-throughput (TP) FIR filter designs are presented by Blad and Gustafsson in [17], who pipeline the carry-save adder trees in the constant multiplications and use integer linear programming to reduce the area cost of full adders (FAs), half adders (HAs), and registers (algorithmic and pipelined registers).
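    The CSD recoding mentioned above can be sketched in Python as follows. This is an illustration under stated assumptions, not the synthesis flow of [1]–[10]: it computes the canonical signed digit (non-adjacent form) of an integer constant, in which no two adjacent digits are nonzero, and multiplies by that constant using only shifts, additions and subtractions. Common subexpression elimination is not modeled, and the function names are hypothetical.

```python
def csd_digits(c):
    """CSD / non-adjacent form of integer c, least significant digit first,
    digits in {-1, 0, 1}, with no two adjacent nonzero digits."""
    digits = []
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)      # c mod 4 == 1 -> +1, c mod 4 == 3 -> -1
            c -= d               # c is now a multiple of 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits

def shift_add_multiply(x, c):
    """x * c using only shifts and additions/subtractions driven by the CSD of c."""
    acc = 0
    for k, d in enumerate(csd_digits(c)):
        if d == 1:
            acc += x << k
        elif d == -1:
            acc -= x << k
    return acc

print(csd_digits(7))               # [-1, 0, 0, 1], i.e. 7 = 8 - 1
print(shift_add_multiply(13, 7))   # 91
```

    For 7 the CSD form needs one subtraction instead of the two additions implied by the binary pattern 111, which is the kind of saving multiplierless MCM designs exploit.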

    Memory-based FIR filter designs are further classified into two methods: the lookup table (LUT) method and the distributed arithmetic (DA) method [11]–[13]. The LUT-based design stores odd multiples of the input signal in ROMs to realize the constant multiplications in the MCM [11]. The DA-based approaches recursively accumulate the bit-level partial results for the inner-product computation in FIR filtering [12], [13].
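    A minimal Python sketch of the DA idea, assuming B-bit two's-complement inputs and integer coefficients: a 2^K-entry LUT holds every possible sum of the K coefficients, each bit plane of the inputs addresses the LUT once, and the LUT outputs are shift-accumulated, with the sign-bit plane carrying negative weight. This version processes bit planes from MSB to LSB, an equivalent ordering to the LSB-first shift-right accumulation often used in hardware; the function name and example values are hypothetical.

```python
def da_inner_product(a, xs, B):
    """Distributed arithmetic: sum(a[k] * xs[k]) for B-bit two's-complement inputs xs."""
    K = len(a)
    # lut[m] = sum of a[k] for every k whose bit is set in address m
    lut = [sum(a[k] for k in range(K) if (m >> k) & 1) for m in range(1 << K)]

    def address(b):
        # Bit b of every input, gathered into one LUT address (bit k <- input k)
        return sum(((xs[k] >> b) & 1) << k for k in range(K))

    acc = -lut[address(B - 1)]              # sign-bit plane has weight -2^(B-1)
    for b in range(B - 2, -1, -1):          # remaining planes, MSB to LSB (Horner form)
        acc = 2 * acc + lut[address(b)]
    return acc

coeffs = [3, -5, 7, 2]                      # hypothetical filter coefficients
samples = [-8, 7, 3, -1]                    # hypothetical 4-bit input samples
print(da_inner_product(coeffs, samples, 4)) # -40, same as sum(c * s)
```

    No multiplier appears: each output costs B LUT reads and B shift-adds, which is why DA trades multipliers for memory.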

    An important design issue in FIR filter implementation is the optimization of the bit widths of the filter coefficients, which has a direct impact on the area cost of arithmetic units and registers. Moreover, since bit widths grow after multiplication, many DSP applications do not need full-precision outputs. Instead, it is attractive to produce faithfully rounded outputs, where the total error introduced by quantization and rounding does not exceed one unit of the last place (ulp), defined as the weight of the least significant bit (LSB) of the outputs. In this paper, we present a low-power and area-efficient implementation of an FIR filter based on the direct-form structure of Fig. 1(a) with truncated multipliers. The MCMA module is realized by accumulating all the partial products (PPs), where unnecessary PP bits (PPBs) are deleted without affecting the final precision of the outputs.

    Fig. 2 Digital FIR filter stages

    By using nonuniform quantization, the bit widths of the filter coefficients are minimized individually, with unequal word lengths, in order to decrease the hardware cost while still satisfying the frequency-response specification.
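    The Python sketch below illustrates the idea of unequal word lengths only. It assumes a simple per-coefficient error tolerance as a stand-in for the frequency-response specification used in the paper, and gives each coefficient the fewest fractional bits that meet that tolerance; the function name, the tolerance and the example coefficients are hypothetical.

```python
def quantize_nonuniform(coeffs, tol, max_frac_bits=24):
    """Give each coefficient its own fractional word length f: the smallest f such that
    rounding to a multiple of 2^-f keeps |error| <= tol. Returns (integer, f) pairs."""
    out = []
    for c in coeffs:
        for f in range(max_frac_bits + 1):
            q = round(c * (1 << f))                 # nearest integer multiple of 2^-f
            if abs(q / (1 << f) - c) <= tol:
                break                               # f fractional bits are enough
        out.append((q, f))                          # falls back to max_frac_bits if never met
    return out

coeffs = [0.5, 0.3, -0.125, 0.0625]                 # hypothetical coefficients
for (q, f), c in zip(quantize_nonuniform(coeffs, 2 ** -8), coeffs):
    print(f"{c:+.4f} -> {q}/2^{f}")
```

    With a tolerance of 2^-8, the four coefficients need 1, 6, 3 and 4 fractional bits, rather than a common word length set by the worst case.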

    This paper is organized as follows. Section II presents the quantization and optimization of the filter coefficients. Section III describes the PP truncation and compression in the MCMA module. Section IV describes the experimental results. Finally, the simulation results are shown in Section V, and the work is concluded in Section VI.


    A digital FIR filter design can be divided into three stages, shown in Fig. 2: finding the filter order and coefficients, coefficient quantization, and hardware optimization. To design an FIR filter, the first step is to find the filter coefficients, which can be calculated by different methods: the frequency sampling method, the window design method, and the Parks-McClellan method. In the first stage, the filter order and the coefficients are determined to satisfy the frequency response. Next, the coefficients are quantized to finite bit accuracy. Finally, different optimization approaches such as CSA are used to reduce the area of the hardware implementation. Earlier FIR filter implementations focus on the hardware optimization stage.
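    As an example of the window design method listed above (not the Parks-McClellan method used later in this paper), the following Python sketch designs a linear-phase low-pass filter by windowing an ideal sinc response with a Hamming window. The order, the cutoff and the function name are assumptions chosen for illustration.

```python
import math

def windowed_sinc_lowpass(M, fc):
    """Window-method low-pass design of even order M (M + 1 taps).
    fc is the cutoff as a fraction of the sampling rate (0 < fc < 0.5)."""
    taps = []
    for i in range(M + 1):
        k = i - M / 2                                   # centre the ideal response
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        hamming = 0.54 - 0.46 * math.cos(2 * math.pi * i / M)
        taps.append(ideal * hamming)
    s = sum(taps)
    return [t / s for t in taps]                        # normalise to unity gain at DC

h = windowed_sinc_lowpass(20, 0.2)                      # hypothetical order and cutoff
print(len(h), round(sum(h), 12))                        # 21 taps, DC gain 1.0
```

    The resulting taps are symmetric, so the filter has linear phase, and they are real-valued floating-point numbers that still have to pass through the quantization stage of Fig. 2.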

    After the FIR filter operation, the output signals occupy a larger bit width because of the bit-width expansion after multiplication. Here we use the direct FIR structure with MCMA, because the area of the flip-flops in the delay elements is smaller than in the transposed form. Additionally, we jointly consider the three design stages of Fig. 2 in order to achieve a more efficient hardware design with faithfully rounded output signals.

    The Parks-McClellan algorithm is used to find the filter order for a specific frequency response. The MATLAB built-in function remez is then used to find the coefficients of the FIR filter of order M. After coefficient quantization, we perform recoding to reduce the number of nonzero digits, using radix-4 modified Booth recoding with the digit set {0, 1, -1, 2, -2}. Most FIR filter designs use the minimum filter order; however, we observed that it is possible to reduce the total area by considerably increasing the filter order. In practice, the total number of PPBs in the MCMA is directly proportional to the number of FA cells needed in the PPB compression, since each FA reduces the number of PPBs by one. The frequency response of the filter is shown in Fig. 3, which plots the magnitude and phase of the low-pass filter. The corresponding coefficients are generated using the algorithm mentioned above, and these coefficients are used to determine the bit widths using floating-point arithmetic.
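    Radix-4 modified Booth recoding can be sketched in Python as below, assuming an n-bit two's-complement operand with n even. Each overlapping group of three bits (b_{2k+1}, b_{2k}, b_{2k-1}) is mapped to one digit d_k = -2b_{2k+1} + b_{2k} + b_{2k-1} from {0, ±1, ±2}, so an n-bit operand yields n/2 digits, and hence n/2 PP rows instead of n. The function name is hypothetical.

```python
def booth_radix4(x, n):
    """Radix-4 modified Booth recoding of an n-bit two's-complement integer x (n even).
    Returns digits d_k in {-2, -1, 0, 1, 2}, least significant first, with
    x == sum(d_k * 4**k)."""
    if n % 2:
        raise ValueError("n must be even")
    u = x & ((1 << n) - 1)                 # n-bit two's-complement pattern of x

    def bit(i):
        return (u >> i) & 1 if i >= 0 else 0   # implicit b_{-1} = 0

    return [-2 * bit(2 * k + 1) + bit(2 * k) + bit(2 * k - 1) for k in range(n // 2)]

print(booth_radix4(-1, 4))     # [-1, 0]
print(booth_radix4(100, 8))    # four digits, so four PP rows for an 8-bit operand
```

    Each digit selects 0, ±a or ±2a of the other operand, all obtainable by shifting and complementing, which keeps PP generation cheap.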

    Fig. 3 Frequency Response of FIR filter order


    The main benefit of a truncated multiplier is the reduction in area, delay and power consumption. In general, truncation limits the number of digits to the right of the radix point: an m × n truncated multiplier produces a result shorter than m + n bits. Removing partial product bits alone does not improve the delay, since the height of the PP matrix remains unchanged. In the direct-form FIR filter of Fig. 1(a), the MCMA module sums up all the partial products. Instead of accumulating each multiplication individually, it is more efficient to collect all the PPs into a single PPB matrix and use carry-save addition to reduce the height of the matrix to two.
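    The following Python sketch illustrates collecting every PPB of all the products into one matrix and compressing it with full adders until each column holds at most two bits, after which a single carry-propagate addition remains. It assumes unsigned n-bit operands and a simple column-by-column reduction order rather than an optimized tree, so it models the FA count and the arithmetic, not the delay. Because each FA takes three bits and returns two, it removes exactly one PPB. The function name is hypothetical.

```python
def mcma_sum(pairs, n):
    """Sum of a*b over all (a, b) in pairs, via one merged PPB matrix.
    Returns (sum, number of full adders used in the carry-save reduction)."""
    cols = {}                                   # column weight -> list of PPBs
    for a, b in pairs:
        for i in range(n):
            for j in range(n):
                cols.setdefault(i + j, []).append((a >> i) & (b >> j) & 1)
    if not cols:
        return 0, 0
    full_adders = 0
    w = 0
    while w <= max(cols):                       # carries may create new columns
        col = cols.setdefault(w, [])
        while len(col) > 2:                     # 3:2 compression until height <= 2
            x, y, z = col.pop(), col.pop(), col.pop()
            col.append(x ^ y ^ z)                                     # sum stays at weight w
            cols.setdefault(w + 1, []).append((x & y) | (x & z) | (y & z))  # carry to w + 1
            full_adders += 1
        w += 1
    row0 = sum(col[0] << w for w, col in cols.items() if len(col) > 0)
    row1 = sum(col[1] << w for w, col in cols.items() if len(col) > 1)
    return row0 + row1, full_adders             # final two-row addition

print(mcma_sum([(13, 11), (7, 5), (15, 15)], 4))   # sum 403 plus the FA count
```

    The FA count depends only on the shape of the matrix, not on the operand values, so every PPB that can be deleted beforehand saves hardware directly.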

    To avoid the sign-extension bits, we complement the sign bit of each PP row and add a bias constant, using the property $\bar{s} = 1 - s$, where s is the sign bit of a PP row, as shown in Fig. 4. In the PP matrix, all the bias constants are collected into the last row, and the white circles with overbars denote the complements of PPBs.
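    A minimal Python sketch of this sign-extension avoidance, assuming each PP row is a w-bit two's-complement value placed at some shift inside a W-bit accumulator: since a sign bit s at weight 2^p contributes -s·2^p = (1 - s)·2^p - 2^p, each row can carry its complemented sign bit instead of extension bits, and the -2^p terms of all rows merge into one data-independent bias constant. The function names and the (value, shift) row format are hypothetical.

```python
def sum_with_sign_extension(rows, W):
    """Reference: add every (value, shift) row after sign-extending it to W bits."""
    mask = (1 << W) - 1
    return sum((v << s) & mask for v, s in rows) & mask

def sum_with_bias(rows, w, W):
    """Same W-bit sum without sign extension: keep the w-1 low bits of each row,
    complement its sign bit (1 - s), and fold the -2^p terms into one bias row."""
    mask = (1 << W) - 1
    total, bias = 0, 0
    for v, s in rows:
        u = v & ((1 << w) - 1)                   # w-bit two's-complement pattern of the row
        sign = u >> (w - 1)
        low = u & ((1 << (w - 1)) - 1)           # non-sign bits, kept unchanged
        p = s + w - 1                            # column of this row's sign bit
        total += (low << s) + ((1 - sign) << p)  # complemented sign bit, no extension bits
        bias -= 1 << p                           # leftover -2^p term of this row
    return (total + bias) & mask, bias & mask    # (W-bit sum, bias constant as W bits)

rows = [(-16, 0), (15, 2), (-1, 4), (7, 6)]      # hypothetical 5-bit PP rows
print(sum_with_bias(rows, 5, 12)[0] == sum_with_sign_extension(rows, 12))   # True
```

    Since the bias depends only on the row positions, it is a constant that can be precomputed and placed in the last row of the PP matrix.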

    The total error introduced by the arithmetic operations in the FIR implementation is no larger than one unit of the last place (ulp). In the proposed truncated multiplier, more PPBs can be deleted, which leads to a smaller area.

    Fig. 4 Generation of Partial product bit

    Fig. 6 Design of Truncated Multiplier

    The proposed faithfully rounded truncated multiplier design is shown in Fig. 6. Here, a single row of PPBs is undeletable, and hence the PPB elimination consists only of deletion and rounding, whereas in the previous method [14] the removal of unnecessary PPBs is a three-step process: deletion, truncation and rounding. The error ranges of deletion and rounding in the proposed method are as follows:

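    $$-\mathrm{ulp} \le E_D \le 0 \;\Rightarrow\; -\tfrac{1}{2}\mathrm{ulp} \le E'_D = E_D + \tfrac{1}{2}\mathrm{ulp} \le \tfrac{1}{2}\mathrm{ulp}$$

    $$-\mathrm{ulp} < E_R \le 0 \;\Rightarrow\; -\tfrac{1}{2}\mathrm{ulp} < E'_R = E_R + \tfrac{1}{2}\mathrm{ulp} \le \tfrac{1}{2}\mathrm{ulp}$$

    $$-\mathrm{ulp} < E = E'_D + E'_R \le \mathrm{ulp}$$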

    Since the range of the deletion error in the improved version is twice as large as that in [14], more PPBs can be deleted, leading to a smaller area in the subsequent PPB compression.
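    The deletion-and-rounding scheme can be sketched in Python for an unsigned n × n multiplier whose output keeps only the weights at or above 2^k (so ulp = 2^k). The greedy rule used here, deleting the lowest-weight PPBs while their worst-case sum stays within a deletion budget, is an assumption for illustration and not necessarily the exact selection used in this paper or in [14]. With a budget of one ulp the deletion error satisfies -ulp ≤ E_D ≤ 0, and adding ½ulp for deletion plus ½ulp for rounding before truncation keeps the total error within -ulp < E ≤ ulp. The function name is hypothetical.

```python
def truncated_product(a, b, n, k, budget):
    """Truncated n x n unsigned multiplication with output ulp = 2^k (sketch).
    Deletes the lowest-weight PPBs while their worst-case sum stays <= budget,
    then adds ulp/2 + ulp/2 and drops all bits below 2^k.
    Returns (rounded product, number of deleted PPBs)."""
    ulp = 1 << k
    # All n*n PPB positions (weight, i, j), lowest weight first; data independent.
    positions = sorted((i + j, i, j) for i in range(n) for j in range(n))
    deleted, worst = 0, 0
    for w, _, _ in positions:
        if worst + (1 << w) > budget:         # deleting this bit could exceed the budget
            break
        worst += 1 << w
        deleted += 1
    kept = positions[deleted:]
    s = sum(((a >> i) & (b >> j) & 1) << w for w, i, j in kept)
    # +ulp/2 offsets the deletion error, +ulp/2 turns truncation into rounding.
    return ((s + ulp) >> k) << k, deleted

ulp = 1 << 6
r, d = truncated_product(45, 58, 6, 6, ulp)
print(r, 45 * 58, d)                          # error within (-ulp, ulp], d PPBs deleted
```

    For n = 6 and k = 6, a budget of one ulp deletes 10 of the 36 PPBs, whereas a budget of half an ulp deletes only 7, which mirrors the observation that a wider deletion-error range permits more deletions.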


    We implemented different multipliers for FIR filter design with the specification in Table I.

    Fig. 5 Methods of PPB truncation



    Table I: Comparison of Braun Array and Baugh Wooley multipliers (No. of Logic)














    The three multipliers used here are the Braun array, Baugh-Wooley and truncated multipliers. Compared with the Braun array multiplier, the Baugh-Wooley multiplier has fewer adders and delay elements. The truncated multiplier achieves a lower cost than both the Braun array and the Baugh-Wooley multipliers.
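    For reference, the signed multiplication that a Baugh-Wooley array performs can be sketched in Python as below, assuming n-bit two's-complement operands: the PPBs that combine exactly one sign bit with a non-sign bit are complemented, constant ones are added at columns n and 2n-1, and the result is read modulo 2^(2n). This is the common modified Baugh-Wooley form; the hardware array organization and the function name are not taken from this paper.

```python
def baugh_wooley(a, b, n):
    """n x n two's-complement product built only from positive (possibly
    complemented) PPBs plus two constant ones (modified Baugh-Wooley)."""
    mask = (1 << n) - 1
    ua, ub = a & mask, b & mask                 # n-bit two's-complement bit patterns

    def bit(v, i):
        return (v >> i) & 1

    acc = 0
    for i in range(n):
        for j in range(n):
            pp = bit(ua, i) & bit(ub, j)
            if (i == n - 1) != (j == n - 1):    # exactly one factor is a sign bit
                pp ^= 1                         # use the complemented PPB
            acc += pp << (i + j)
    acc += (1 << n) + (1 << (2 * n - 1))        # constant ones at columns n and 2n-1
    acc &= (1 << (2 * n)) - 1                   # keep 2n bits
    return acc - (1 << (2 * n)) if acc >> (2 * n - 1) else acc   # read as signed

print(baugh_wooley(-7, 5, 4))    # -35
```

    Because no PPB is negative, the whole matrix can be summed with ordinary full and half adders, the same property the bias-constant technique gives the MCMA matrix.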


    Figures 7 and 8 show the outputs simulated in ModelSim. These figures show that no truncation takes place in the outputs of these multipliers. In the proposed truncated multipliers, deletion and truncation occur simultaneously. Thus, the results show that truncated multipliers achieve low power and small area.

    Fig. 7 Braun Array Multiplier

    Fig. 8 Baugh Wooley Multiplier


In this paper, a low-cost FIR filter design is proposed using rounded truncated multipliers, which mainly reduces the power and area cost. Whereas most prior FIR filter designs are based on the transposed form, we observed in practice that the direct form reduces the delay elements. Compared with the other multipliers, the truncated multiplier shows a 30% reduction in power and area.


  1. M. M. Peiro, E. I. Boemo, and L. Wanhammar, "Design of high-speed multiplierless filters using a nonrecursive signed common subexpression algorithm," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 49, no. 3, pp. 196–203, Mar. 2002.

  2. C.-H. Chang, J. Chen, and A. P. Vinod, "Information theoretic approach to complexity reduction of FIR filter design," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 8, pp. 2310–2321, Sep. 2008.

  3. F. Xu, C. H. Chang, and C. C. Jong, "Contention resolution: A new approach to versatile subexpressions sharing in multiple constant multiplications," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 2, pp. 559–571, Mar. 2008.

  4. F. Xu, C. H. Chang, and C. C. Jong, "Contention resolution algorithms for common subexpression elimination in digital filter design," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 52, no. 10, pp. 695–700, Oct. 2005.

  5. I.-C. Park and H.-J. Kang, "Digital filter synthesis based on an algorithm to generate all minimal signed digit representations," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 21, no. 12, pp. 1525–1529, Dec. 2002.

  6. C.-Y. Yao, H.-H. Chen, T.-F. Lin, C.-J. J. Chien, and X.-T. Hsu, "A novel common-subexpression-elimination method for synthesizing fixed-point FIR filters," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 11, pp. 2215–2221, Sep. 2004.

  7. O. Gustafsson, "Lower bounds for constant multiplication problems," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 11, pp. 974–978, Nov. 2007.

  8. Y. Voronenko and M. Püschel, "Multiplierless multiple constant multiplication," ACM Trans. Algorithms, vol. 3, no. 2, pp. 1–38, May 2007.

  9. D. Shi and Y. J. Yu, "Design of linear phase FIR filters with high probability of achieving minimum number of adders," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no. 1, pp. 126–136, Jan. 2011.

  10. R. Huang, C.-H. H. Chang, M. Faust, N. Lotze, and Y. Manoli, "Sign-extension avoidance and word-length optimization by positive-offset representation for FIR filter design," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 58, no. 12, pp. 916–920, Oct. 2011.

  11. P. K. Meher, "New approach to look-up-table design and memory-based realization of FIR digital filter," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 3, pp. 592–603, Mar. 2010.

  12. P. K. Meher, S. Chandrasekaran, and A. Amira, "FPGA realization of FIR filters by efficient and flexible systolization using distributed arithmetic," IEEE Trans. Signal Process., vol. 56, no. 7, pp. 3009–3017, Jul. 2008.

  13. S. Hwang, G. Han, S. Kang, and J.-S. Kim, "New distributed arithmetic algorithm for low-power FIR filter implementation," IEEE Signal Process. Lett., vol. 11, no. 5, pp. 463–466, May 2004.

  14. H.-J. Ko and S.-F. Hsiao, "Design and application of faithfully rounded and truncated multipliers with combined deletion, reduction, truncation, and rounding," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 58, no. 5, pp. 304–308, May 2011.

  15. H. Samueli, "An improved search algorithm for the design of multiplierless FIR filters with powers-of-two coefficient," IEEE Trans. Circuits Syst., vol. 36, no. 7, pp. 1044–1047, Jul. 1989.

  16. Y. C. Lim and S. Parker, "Discrete coefficient FIR digital filter design based upon an LMS criteria," IEEE Trans. Circuits Syst., vol. 30, no. 10, pp. 723–739, Oct. 1983.

  17. A. Blad and O. Gustafsson, "Integer linear programming-based bit-level optimization for high-speed FIR filter architecture," Circuits Syst. Signal Process., vol. 29, no. 1, pp. 81–101, Feb. 2010.

  18. F. Xu, C. H. Chang, and C. C. Jong, "Design of low-complexity FIR filters based on signed-powers-of-two coefficients with reusable common subexpressions," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 26, no. 10, pp. 1898–1907, Oct. 2007.

  19. Y. J. Yu and Y. C. Lim, "Design of linear phase FIR filters in subexpression space using mixed integer linear programming," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, no. 10, pp. 2330–2338, Oct. 2007.
