 Authors : P. Kirithika, M. Devi, P. Nandhini
 Paper ID : IJERTV3IS11140
 Volume & Issue : Volume 03, Issue 01 (January 2014)
 Published (First Online): 01-02-2014
 ISSN (Online) : 2278-0181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
Design of Low-Power Truncated Multiplier for DSP Applications
P. Kirithika, M. Devi, P. Nandhini, Department of Electronics and Communication Engineering, Akshaya College of Engineering and Technology, Coimbatore.
Abstract
The FIR digital filter is one of the fundamental components in many digital signal processing and communication systems. In this work, a low-power finite impulse response (FIR) filter is designed using truncated multipliers, which reduce both power consumption and cost. The multiple constant multiplication/accumulation (MCMA) module in a direct-form FIR structure is implemented using the proposed truncated multiplier design. The MCMA module is realized by accumulating all the partial products (PPs), where unnecessary partial product bits (PPBs) are removed without affecting the final precision of the outputs. Comparisons with previous FIR design approaches show that the proposed design achieves the best area and power results. The number of operations used by the stages is reduced in the proposed truncated multiplier design. The simulation results indicate that power is reduced by about 15% with the truncated multiplier compared with the conventional multiplier.
Index terms: Digital signal processing (DSP), finite impulse response (FIR), truncated multipliers, multiple constant multiplication/accumulation (MCMA), VLSI design.

Introduction
Finite impulse response (FIR) digital filters are widely used as a basic tool in various digital signal processing and image processing applications. They are also used in many portable applications that require small area and low power consumption. A general FIR filter of order M can be denoted as
$$y[n] = \sum_{i=0}^{M-1} a_i \, x[n-i]$$
There are two basic FIR structures, direct form and transposed form, as shown in Fig. 1. In the direct form in Fig. 1(a), the multiple constant multiplication (MCM)/accumulation (MCMA) module performs the concurrent multiplications of individual delayed signals and respective filter coefficients, followed by accumulation of all the products obtained. Thus, the operands of the multipliers in MCMA are the delayed input signals $x[n-i]$ and the coefficients $a_i$.
Fig.1. Structures of FIR filters: (a) direct form and (b) transposed form.
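As a behavioural illustration of the direct form, the following minimal Python sketch multiplies each delayed sample by its coefficient and then accumulates the products. The function name `fir_direct`, the zero initial state of the delay line, and the use of plain Python integers are assumptions made for illustration; they are not taken from the hardware design.

```python
def fir_direct(coeffs, samples):
    """Direct-form FIR: y[n] = sum_i coeffs[i] * x[n - i], with x[k] = 0 for k < 0."""
    delay_line = [0] * len(coeffs)        # delay_line[i] holds x[n - i]
    outputs = []
    for x in samples:
        # Shift the new sample into the delay elements.
        delay_line = [x] + delay_line[:-1]
        # Concurrent multiplications of delayed signals and coefficients ...
        products = [a * d for a, d in zip(coeffs, delay_line)]
        # ... followed by accumulation of all the products (the MCMA step).
        outputs.append(sum(products))
    return outputs


if __name__ == "__main__":
    # An impulse at the input reproduces the coefficients at the output.
    print(fir_direct([1, 2, 3], [1, 0, 0, 0]))
```

Feeding a unit impulse returns the coefficient sequence, which is a quick sanity check that the delay line and the accumulation are wired as in Fig. 1(a).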
In this brief, we present low-power implementations of FIR filters based on the direct structure in Fig. 1(a) with truncated multipliers. The MCMA module is realized by accumulating all the partial products (PPs), where unnecessary PP bits (PPBs) are removed without affecting the final precision of the outputs.
The direct-form FIR filter consists of delay elements, adders, and a multiplier circuit. The proposed method develops a truncated multiplier, which replaces the conventional multiplier in this structure.
Multiplication of two numbers generates a product with twice the original bit width. It is usually desirable to truncate the product bits to the required precision to reduce the area cost, which leads to the design of truncated multipliers. In this brief, a new truncated multiplier design that achieves faithful results is presented. The proposed design jointly considers the tree reduction, truncation, and rounding of the PP bits during the design of fast parallel truncated multipliers, so that the final truncated product satisfies the precision requirement.
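The bit-width growth and the effect of keeping only the most significant bits can be checked numerically. The sketch below is a reference model only: it truncates the full product after it has been computed, whereas a truncated multiplier removes PP bits before they are summed. The helper names `keep_msbs` and `truncation_error`, and the choice of unsigned 8-bit operands, are assumptions for illustration.

```python
def keep_msbs(product, n, p):
    """Drop the 2n - p least significant bits of a 2n-bit product."""
    return product >> (2 * n - p)


def truncation_error(a, b, n, p):
    """Error of the kept MSBs, measured in ulps of the p-bit result."""
    shift = 2 * n - p
    exact = a * b
    approx = keep_msbs(exact, n, p) << shift
    return (exact - approx) / (1 << shift)


if __name__ == "__main__":
    n, p = 8, 8
    widest = (2**n - 1) * (2**n - 1)
    print("bits in largest product:", widest.bit_length())
    print("kept MSBs:", keep_msbs(widest, n, p))
    print("error in ulps:", truncation_error(2**n - 1, 2**n - 1, n, p))
```

In this reference model the error always lies in the range from 0 up to, but not including, 1 ulp, which is the precision target a truncated multiplier has to meet with far fewer PP bits.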

Reduction Of Parallel Tree Multiplier
A parallel tree multiplier design consists of three steps: PP generation, PP reduction, and final carry-propagate addition. PP generation produces the PP bits from the multiplicand and the multiplier. The goal of PP reduction is to compress the PPs into two rows, which are then summed by the final addition. The two best-known reduction methods are the Wallace tree [4] and Dadda tree [5] reductions. Wallace tree reduction compresses the PPs as early as possible, whereas Dadda reduction performs compression only when necessary, without increasing the number of carry-save addition (CSA) levels.
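The three steps can be mimicked in software. The sketch below is an assumption-laden illustration: it generates the AND-array PP bits of an unsigned n × n multiplication, applies full adders to every group of three bits in a column at each level (a simple carry-save schedule that is neither the exact Wallace nor the exact Dadda procedure, and uses no half adders), and finishes with an ordinary addition of the two remaining rows. All function names are hypothetical.

```python
def partial_product_columns(a, b, n):
    """columns[k] lists the PP bits of weight 2**k for an unsigned n x n multiply."""
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    return cols


def full_adder(x, y, z):
    """3:2 counter: returns (sum bit, carry bit)."""
    return x ^ y ^ z, (x & y) | (x & z) | (y & z)


def reduce_to_two_rows(cols):
    """Apply carry-save levels until no column holds more than two bits."""
    while max(len(c) for c in cols) > 2:
        nxt = [[] for _ in range(len(cols) + 1)]
        for k, col in enumerate(cols):
            grouped = len(col) - len(col) % 3
            for t in range(0, grouped, 3):
                s, c = full_adder(col[t], col[t + 1], col[t + 2])
                nxt[k].append(s)        # sum stays in the same column
                nxt[k + 1].append(c)    # carry moves to the next column
            nxt[k].extend(col[grouped:])  # leftover bits pass through
        cols = nxt
    return cols


def final_addition(cols):
    """Carry-propagate addition of the two remaining rows."""
    row0 = sum(col[0] << k for k, col in enumerate(cols) if len(col) > 0)
    row1 = sum(col[1] << k for k, col in enumerate(cols) if len(col) > 1)
    return row0 + row1


def tree_multiply(a, b, n=8):
    return final_addition(reduce_to_two_rows(partial_product_columns(a, b, n)))


if __name__ == "__main__":
    print(tree_multiply(200, 123), 200 * 123)
```

Because every full adder preserves the weighted sum of its inputs, the value of the bit matrix is unchanged from level to level, which is why the final two-row addition yields the exact product.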
TABLE I
Number of FAs and HAs in one column for the reduction of h Bits
Fig. 2. Tree reduction of 8 × 8 multiplication by (a) Scheme 1 and (b) Scheme 2 in Table I. (a) Scheme 1 (38 FAs, 8 HAs). (b) Scheme 2 (35 FAs, 7 HAs).
To allow more flexible column-by-column reduction in the proposed truncated multiplier design in Section III, two reduction schemes are presented that aim to minimize the use of half adders (HAs) in each column, because the full adder (FA) cell has a higher compression rate than the HA cell. Table I shows the number of FAs, $n_{FA}$, and HAs, $n_{HA}$, required to compress a column of h bits to one bit (Scheme 1) or two bits (Scheme 2) using FAs (3:2 counters) and HAs (2:2 counters). In this brief, we adopt a hybrid of Scheme 1 and Scheme 2 reductions for the truncated multiplier design in Section III in order to minimize the area cost.
Fig. 2(a) and (b) show the reduction procedures of Scheme 1 and Scheme 2 applied to each column of PP bits, starting from the least significant column. The column heights h, including the carry bits from less significant columns, are shown in the top row, where the columns that need HAs are highlighted by square boxes. Note that Scheme 1 in Table I is only used to determine whether an HA is needed and how many FAs are required in the per-column reduction, so that the maximum number of CSA reduction levels is not exceeded. The number of bits after the reduction is not necessarily one.
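The per-column counts can be derived from counter arithmetic alone: within one column, an FA replaces three bits by one sum bit (a net removal of two), and an HA replaces two bits by one (a net removal of one). The sketch below computes the counts on that basis. It is a derivation under this assumption, not a copy of the entries of Table I, and the function name `column_counters` is hypothetical. Carries sent to the next column are not counted here; they simply add to that column's height h.

```python
def column_counters(h, target):
    """Return (n_fa, n_ha) to compress h bits of one column down to `target` bits.

    target = 1 corresponds to Scheme 1, target = 2 to Scheme 2.
    Each FA removes two bits from the column and each HA removes one,
    so FAs are used as often as possible and at most one HA is needed.
    """
    if h <= target:
        return 0, 0
    excess = h - target
    return excess // 2, excess % 2


if __name__ == "__main__":
    for h in range(1, 9):
        print(h, column_counters(h, 1), column_counters(h, 2))
```

Under this model, Scheme 1 needs an HA only for even h and Scheme 2 only for odd h (for h of at least 3), so choosing the scheme per column allows the HA to be avoided in most columns.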


Proposed Truncated Multiplier Design
A. PP Truncation and Compression
The objective of the truncated multiplier design is to compute the P MSBs of the product with a maximum truncation error of no more than 1 ulp, where 1 ulp = $2^{-P}$.
The FIR filter design in this brief adopts the direct form in Fig. 1(a), where the MCMA module sums up all the products $a_i \times x[n-i]$. Instead of accumulating the individual multiplication results for each product, it is more efficient to collect all the PPs into a single PPB matrix and use carry-save addition to reduce the height of the matrix to two, followed by a final carry-propagate adder.
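The idea of a single PPB matrix can be illustrated by pooling the PP bits of all taps column by column. The sketch below assumes unsigned n-bit coefficients and samples and uses hypothetical names (`pp_matrix`, `matrix_value`); it only shows that the pooled matrix carries the same value as the sum of the individual products, not how the hardware compresses it.

```python
def pp_matrix(coeffs, window, n):
    """Collect the PP bits of every product coeffs[i] * window[i] into one matrix.

    columns[k] lists all bits of weight 2**k, regardless of which tap produced
    them. Unsigned n-bit operands are assumed.
    """
    columns = [[] for _ in range(2 * n)]
    for a, x in zip(coeffs, window):
        for i in range(n):
            for j in range(n):
                columns[i + j].append(((a >> i) & 1) & ((x >> j) & 1))
    return columns


def matrix_value(columns):
    """Weighted sum of all bits in the matrix."""
    return sum(sum(col) << k for k, col in enumerate(columns))


if __name__ == "__main__":
    coeffs, window = [3, 5, 7], [10, 20, 30]
    cols = pp_matrix(coeffs, window, 8)
    print(matrix_value(cols), sum(a * x for a, x in zip(coeffs, window)))
    print("column heights:", [len(c) for c in cols])
```

The column heights grow with the number of taps, which is why a single compression tree over the pooled matrix, followed by one carry-propagate adder, can replace a separate adder for each product.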
In order to avoid the sign extension bits, we complement the sign bit of each PP row and add a bias constant, using the property $\bar{s} = 1 - s$, where s is the sign bit of a PP row, as shown in Fig. 3. All the bias constants are collected into the last row of the PPB matrix. The complements of PPBs are denoted by white circles with overbars.
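The identity behind this step is that a sign bit s of weight $-2^{w-1}$ satisfies $-s \cdot 2^{w-1} = \bar{s} \cdot 2^{w-1} - 2^{w-1}$, so a row can be treated as unsigned once its sign bit is complemented and a constant is added. The sketch below checks this for a single w-bit two's complement row; the function names and the row width are assumptions for illustration.

```python
def signed_value(bits, width):
    """Interpret `bits` (an unsigned int holding `width` bits) as two's complement."""
    sign = (bits >> (width - 1)) & 1
    lower = bits & ((1 << (width - 1)) - 1)
    return lower - (sign << (width - 1))


def complemented_form(bits, width):
    """Return (row, bias): row has its sign bit complemented, bias is the constant to add."""
    row = bits ^ (1 << (width - 1))   # s -> 1 - s; row is now read as unsigned
    bias = -(1 << (width - 1))        # constant that restores the original value
    return row, bias


if __name__ == "__main__":
    width = 4
    for bits in (0b0101, 0b1011):
        row, bias = complemented_form(bits, width)
        print(signed_value(bits, width), row + bias)
```

Since each bias is a constant that does not depend on the data, the biases of all rows can be summed ahead of time and placed in a single row of the matrix.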
In the proposed truncated multiplier design in FIR filter implementation, it is required that the total error introduced during the arithmetic operations is no larger than one ulp. Fig.4 compares the two approaches. In [2], the removal of unnecessary PPBs is composed of three processes: deletion, truncation, and rounding. Two rows of PPBs are set undeletable because they will be removed at the subsequent truncation and rounding [1].
Fig.3. Generation of PPBs considering sign extension and negation.
Fig.4. Truncated multiplier designs using (a) the approach in [2] and (b) proposed design.
Fig. 4(a) shows an example of the approach in [2], where the gray circles, crossed green circles, and crossed red circles represent the deleted bits, truncated bits, and rounded bits, respectively. The proposed truncated multiplier design is shown in Fig. 4(b). Only a single row of PPBs is made undeletable (for the subsequent rounding), and the PPB elimination consists of only deletion and rounding. The error ranges of deletion and rounding in the proposed design are as follows:
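The expressions themselves are not reproduced in this copy. One error budget that is consistent with the 1 ulp bound stated earlier and with only two error sources is given below as an assumption, not as the authors' exact formula; the symbols $E_D$ (deletion error), $E_R$ (rounding error), and $E$ (total error) are introduced here for illustration.

```latex
-\tfrac{1}{2}\,\mathrm{ulp} \le E_D \le \tfrac{1}{2}\,\mathrm{ulp}, \qquad
-\tfrac{1}{2}\,\mathrm{ulp} < E_R \le \tfrac{1}{2}\,\mathrm{ulp}, \qquad
-\mathrm{ulp} < E = E_D + E_R \le \mathrm{ulp}
```

Under this split, each of the two sources may use up to half of the total 1 ulp budget.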
Since the range of the deletion error in the proposed design is twice as large as that in [2], more PPBs can be deleted, leading to a smaller area in the subsequent PPB compression.
Fig. 5 shows the overall FIR filter architecture using multiple constant multipliers/accumulators with truncation, which removes unnecessary PPBs. The white circles in the L-shaped block represent the undeletable PPBs. The deleted PPBs are represented by gray circles. After PP compression, the rounding of the resultant bits is denoted by crossed circles. The last row of the PPB matrix represents all the offset and bias constants required, including the sign bit modifications.
Fig.5. Overall FIR filter architecture using multiple constant multipliers/accumulators with truncation.
B. Extension to Booth Multipliers
Although the proposed truncated multiplier designs are demonstrated for unsigned multiplication, the same approach can be extended to signed multipliers or Booth multipliers as well.

Experimental Results
The simulations were run in ModelSim SE 6.3f, and the power analysis was performed using Xilinx ISE 8.1i.
In this section, the proposed truncated multiplier design is shown to perform better than previous approaches. Most prior FIR filter designs are based on the transposed structure because their major goal is to minimize the cost of the adders in the MCM, which takes less than 20% of the total area. However, the structural adders (SAs) are not optimized, and the area of the D flip-flops (DFFs) in the transposed form is larger because of the range expansion of the results after MCM.
Although the area costs of the proposed designs are significantly reduced, the critical path delay is increased because all the operations in the MCMA are executed within one clock cycle. It is possible to reduce the delay by adding pipeline registers in the PP compression, as suggested in [3], where the major goal is to minimize the number of FAs and HAs. In this brief, we focus on low-power FIR filter designs with moderate speed performance for mobile applications, where area and power are important design considerations. In addition, unlike other methods, the proposed method does not increase the height of the PP matrix, which leads to a smaller delay.
Fig. 6 shows the simulated result of the proposed truncated multiplier design. The inputs are applied to the truncated multiplier, and the output is produced at the final target precision. Fig. 7 shows the simulation result of the FIR filter architecture using the truncated multiplier design. The whole process is executed within one clock cycle.
Fig.6. Simulated result of truncated multiplier using proposed design
Fig.7. Simulated result of FIR filter using proposed design
The power consumption of the FIR filter implemented with the proposed truncated multiplier design is lower than that of the previous approaches. Fig. 8 illustrates the power required for the FIR filter using the proposed design.
Fig.8. Power Analysis of FIR filter using proposed design.

Conclusion
This brief has presented low-power FIR filter designs using the proposed truncated multiplier design. Although most prior designs are based on the transposed form, we observe that the direct FIR structure with the proposed truncated multiplier design leads to the smallest power consumption.
References

[1] Shen-Fu Hsiao, Jun-Hong Zhang Jian, and Ming-Chih Chen, "Low-cost FIR filter designs based on faithfully rounded truncated multiple constant multiplication/accumulation," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 60, no. 5, pp. 287-291, May 2013.
[2] H.-J. Ko and S.-F. Hsiao, "Design and application of faithfully rounded and truncated multipliers with combined deletion, reduction, truncation, and rounding," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 58, no. 5, pp. 304-308, May 2011.
[3] A. Blad and O. Gustafsson, "Integer linear programming-based bit-level optimization for high-speed FIR filter architecture," Circuits Syst. Signal Process., vol. 29, no. 1, pp. 81-101, Feb. 2010.
[4] C. S. Wallace, "A suggestion for a fast multiplier," IEEE Trans. Electron. Comput., vol. EC-13, no. 1, pp. 14-17, Feb. 1964.
[5] L. Dadda, "Some schemes for parallel multipliers," Alta Frequenza, vol. 34, pp. 349-356, 1965.
[6] C.-Y. Yao, H.-H. Chen, T.-F. Lin, C.-J. J. Chien, and X.-T. Hsu, "A novel common-subexpression elimination method for synthesizing fixed-point FIR filters," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 11, pp. 2215-2221, Sep. 2004.
[7] O. Gustafsson, "Lower bounds for constant multiplication problems," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 11, pp. 974-978, Nov. 2007.
[8] Y. Voronenko and M. Puschel, "Multiplierless multiple constant multiplication," ACM Trans. Algorithms, vol. 3, no. 2, pp. 1-38, May 2007.
[9] D. Shi and Y. J. Yu, "Design of linear phase FIR filters with high probability of achieving minimum number of adders," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no. 1, pp. 126-136, Jan. 2011.
[10] R. Huang, C.-H. H. Chang, M. Faust, N. Lotze, and Y. Manoli, "Sign-extension avoidance and word-length optimization by positive offset representation for FIR filter design," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 58, no. 12, pp. 916-920, Oct. 2011.
[11] P. K. Meher, "New approach to look-up-table design and memory-based realization of FIR digital filter," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 3, pp. 592-603, Mar. 2010.
[12] P. K. Meher, S. Chandrasekaran, and A. Amira, "FPGA realization of FIR filters by efficient and flexible systolization using distributed arithmetic," IEEE Trans. Signal Process., vol. 56, no. 7, pp. 3009-3017, Jul. 2008.
[13] S. Hwang, G. Han, S. Kang, and J.-S. Kim, "New distributed arithmetic algorithm for low-power FIR filter implementation," IEEE Signal Process. Lett., vol. 11, no. 5, pp. 463-466, May 2004.
[14] F. Xu, C. H. Chang, and C. C. Jong, "Contention resolution: A new approach to versatile subexpressions sharing in multiple constant multiplications," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 2, pp. 559-571, Mar. 2008.
[15] F. Xu, C. H. Chang, and C. C. Jong, "Design of low-complexity FIR filters based on signed-powers-of-two coefficients with reusable common subexpressions," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 26, no. 10, pp. 1898-1907, Oct. 2007.