Approximate Multiplier for Low Power Applications


Vigneshwar A

Electronics and Communication Engineering Sri Venkateswara College of Engineering Chennai, India

Dr. Sathish Kumar G A

Electronics and Communication Engineering Sri Venkateswara College of Engineering Chennai, India

Abstract: Imprecise computing is best suited for error-resilient applications, such as signal processing and multimedia. It provides meaningful and faster results with lower power consumption, which is particularly attractive for arithmetic circuits. In this paper, a new design is proposed that exploits the partitioning of partial products, using recursive multiplication with compressor-based approximate multipliers. Three multiplier designs are proposed using approximate 4:2 compressors. Extensive simulation results show that the proposed designs achieve significant accuracy improvement together with power and area reduction compared to previous approximate multiplier designs. The first two multiplier designs use the first approximate compressor. The proposed multiplier design (1) is simulated and synthesized in Xilinx software; it has lower accuracy, power and area than existing multipliers. The proposed multiplier design (2) is also simulated and synthesized in Xilinx software; it achieves high accuracy at the cost of more area and power than design (1). The proposed multiplier design (3) uses the second approximate compressor and is likewise simulated and synthesized using Xilinx software. It uses less area and power than design (1) and design (2).

Keywords: Approximate computing, compressor, multiplier


    Many scientific and engineering problems are computed using accurate, precise and deterministic algorithms. However, in many applications involving signal/image processing and multimedia, exact and accurate computations are not always necessary, because these applications are error tolerant and produce results that are good enough for human perception [1]. In these error-resilient applications, a reduction in circuit complexity, and thus in area, power and delay, is very important for the operation of a circuit. Hence, approximate computing can be used in error-tolerant applications by reducing accuracy while still providing meaningful results faster and/or with lower power consumption [2]. Addition and multiplication are often used in these applications. For addition, full adders have been analyzed in detail and a number of approximate designs have been proposed [1]. In [3], several new metrics are proposed and a comparison is made among some of the adder designs. The error distance (ED) is defined as the arithmetic distance between an erroneous output and the correct output for a given input. The mean error distance (MED) and normalized error distance (NED) are then proposed. Recently, approximate multipliers have also gained significance because of their importance in arithmetic operations [4-10]; several approximate 4:2 compressors have been proposed for the reduction of the partial products of a Dadda tree. In this paper, the approximate compressors of [10] are utilized to design 8×8 bit multipliers by a novel partition of the partial products. The newly designed approximate multipliers are more accurate than the ones proposed in [10] and require approximately the same power and delay; it is shown that the improvement in accuracy is significant, albeit at a slight increase in area. This paper is organized as follows. Section II reviews approximate multipliers and the compressors used in the proposed designs. Section III presents the proposed multipliers. Section IV provides the simulation results for the multipliers and compares the proposed design with [10]. Section V presents an image processing application using the approximate multipliers and Section VI gives the conclusion.
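    The error metrics named in the introduction can be illustrated with a short Python sketch. It computes ED, MED and NED exhaustively for a small multiplier. The function names, the toy multiplier trunc_mul, and the choice to normalize MED by the largest exact product, (2^n - 1)^2, are assumptions made for illustration; they are not taken from [3] or from the proposed designs.

```python
from itertools import product

def error_distance(approx: int, exact: int) -> int:
    """ED: arithmetic distance between an erroneous and the correct output."""
    return abs(approx - exact)

def error_metrics(approx_mul, n_bits: int = 8):
    """Return (MED, NED) of approx_mul over all n_bits x n_bits input pairs."""
    limit = 1 << n_bits
    total = 0
    for a, b in product(range(limit), repeat=2):
        total += error_distance(approx_mul(a, b), a * b)
    med = total / (limit * limit)
    # Assumed normalization constant: the largest exact product.
    max_output = (limit - 1) ** 2
    return med, med / max_output

def trunc_mul(a: int, b: int) -> int:
    """Hypothetical approximate multiplier: clears the LSB of each operand."""
    return (a & ~1) * (b & ~1)

med, ned = error_metrics(trunc_mul, n_bits=4)
print(f"MED = {med}, NED = {ned:.4f}")
```

    Any 8 x 8 multiplier model can be passed to error_metrics in the same way, which makes it straightforward to compare candidate designs under one metric.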


    1. Approximate multipliers

      An error tolerant multiplier (ETM) uses accuracy as a design parameter and divides the operands into two parts, a multiplication part and a non-multiplication part, depending on the required accuracy [4]. It performs the multiplication only in the first part, thus saving power and delay at the cost of accuracy. A novel 2×2 bit underdesigned multiplier (UDM) is proposed and used to build a larger multiplier [5]. [6] presents a 6×6 bit broken array multiplier (BAM), which is faster than an accurate array multiplier. [7] proposes a 4×4 imprecise counter-based multiplier (ICM) that uses a 4:2 inaccurate counter to reduce the partial product stages of a Wallace tree multiplier. It leads to a power-efficient design, which can then be used to implement larger multipliers. Four different modes of an approximate Wallace tree multiplier (AWTM) are presented in [8]. This design uses a carry-in prediction method, resulting in hardware reduction and thus less power, area and delay compared to the accurate Wallace tree multiplier. AWTM also uses the simple recursive multiplication technique that is used in this paper and explained in Section II.C. [9] proposes a fast and power-efficient multiplier based on an approximate adder that can process data in parallel by cutting the carry propagation chain. Two new approximate 4:2 compressors and four approximate multipliers are proposed in [10]. Similar compressors are used in the partial product reduction stage of the multipliers proposed in this paper. Most approximate multipliers aim for a tradeoff among accuracy, power, delay and area.

    2. Recursive multiplication

      The technique used in this paper for designing 8 x 8 multipliers using 4 x 4 multipliers is known as recursive multiplication. Suppose there are two numbers A and B of 2a bits each. Each number can be broken into two halves, i.e. the most significant a bits and the least significant a bits. Ah denotes the upper a bits of A and Al denotes the lower a bits of A; similarly, Bh and Bl denote the upper and lower a bits of B respectively. Then, instead of performing a 2a x 2a multiplication, four a x a multiplications are performed (AhBh, AhBl, AlBh, AlBl) and added to get the final output, as shown in Fig. 1.

      Fig. 1. Format of the inputs for recursive multiplication
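      The decomposition described above can be sketched in Python as follows. The sketch relies on the identity A·B = AhBh·2^(2a) + (AhBl + AlBh)·2^a + AlBl; the function names and the mul parameter (a stand-in for whichever a x a multiplier block is plugged in) are assumptions for illustration.

```python
def split(x: int, a: int):
    """Split a 2a-bit value into its upper and lower a-bit halves."""
    mask = (1 << a) - 1
    return x >> a, x & mask

def recursive_mul(A: int, B: int, a: int = 4, mul=lambda x, y: x * y) -> int:
    """Multiply two 2a-bit operands using four a x a products."""
    Ah, Al = split(A, a)
    Bh, Bl = split(B, a)
    high = mul(Ah, Bh) << (2 * a)                 # weight 2^(2a)
    middle = (mul(Ah, Bl) + mul(Al, Bh)) << a     # weight 2^a
    low = mul(Al, Bl)                             # weight 2^0
    return high + middle + low

# With exact a x a blocks the result matches ordinary multiplication.
print(recursive_mul(0xAB, 0xCD) == 0xAB * 0xCD)
```

      Passing an approximate 4 x 4 model as mul replaces all four blocks at once, so the error of each block is scaled by the weight of its position.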

    3. Accurate compressor

      Compressors are used to reduce the number of partial product stages. The basic structure of an accurate 4:2 compressor chain utilized in the partial product reduction is shown in Fig. 2. A 4:2 compressor produces a sum for the same order of the next stage, and a carry for one order higher in the next stage. In addition, a carry out (Cout) is generated and becomes the carry in (Cin) of the next higher-order compressor. An accurate 4:2 compressor is implemented using two full adder circuits, as shown in Fig. 3. There are many other designs for implementing the accurate compressor. [11] describes a design for an accurate 4:2 compressor using three XOR-XNOR gates, one XOR gate and two 2:1 multiplexers. The logic equations for the three outputs of the compressor are as follows:

      Fig. 2. Adjacent compressors within a chain in the partial product reduction stage

      Fig. 3. An accurate compressor by using two full adders
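      As a behavioural sketch of the accurate compressor, the Python below models the two-full-adder structure and checks it exhaustively. The second function is a commonly used XOR/multiplexer formulation; it is included as an assumption and is not claimed to reproduce the exact equations of [11]. The input names x1 to x4 are likewise assumed.

```python
from itertools import product

def full_adder(x: int, y: int, z: int):
    """Return (sum, carry) of three input bits."""
    return x ^ y ^ z, (x & y) | (x & z) | (y & z)

def compressor_4_2(x1, x2, x3, x4, cin):
    """Accurate 4:2 compressor built from two cascaded full adders."""
    s1, cout = full_adder(x1, x2, x3)
    total, carry = full_adder(s1, x4, cin)
    return total, carry, cout

def compressor_4_2_mux(x1, x2, x3, x4, cin):
    """Assumed XOR/multiplexer form with the same truth table."""
    x12 = x1 ^ x2
    x1234 = x12 ^ x3 ^ x4
    total = x1234 ^ cin
    cout = x3 if x12 else x1        # 2:1 multiplexer selected by x1 XOR x2
    carry = cin if x1234 else x4    # 2:1 multiplexer selected by the 4-input XOR
    return total, carry, cout

def check_all() -> bool:
    """Verify both models against the arithmetic definition for all 32 inputs."""
    for bits in product((0, 1), repeat=5):
        s, carry, cout = compressor_4_2(*bits)
        if sum(bits) != s + 2 * (carry + cout):
            return False
        if compressor_4_2_mux(*bits) != (s, carry, cout):
            return False
    return True

print(check_all())
```

      The check expresses the defining property of an accurate compressor: the five input bits always equal sum + 2·(carry + Cout). An approximate compressor is one that gives up this equality for some input combinations.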

    4. Approximate compressors utilized

    The two designs of inaccurate compressors proposed in [10] have been used in this paper when designing the multipliers. Both designs are based on modifying the truth table of the accurate compressor to reduce the hardware. In design 1, the carry signal is directly connected to the Cin signal, and the columns of the sum and Cout signals are modified to reduce the hardware, hence reducing the delay. The logic functions for design 1 are given as:

    In design 2, Cout is completely removed; hence there is no need for the fifth input (Cin) either. This design therefore further simplifies the circuit and gives better results in terms of accuracy. The logic function of design 2 is given as:

    The circuit diagrams for the two inaccurate designs are shown in Fig. 4.

    Fig. 4. Two approximate compressors

    TABLE 1


    TABLE 2



    In this section, the proposed multiplier designs are presented. Since the technique of recursive multiplication is used, 4 x 4 multipliers are required for the implementation of the 8 x 8 product. Hence, the 4 x 4 multiplier designs are presented too. The method of recursive multiplication is shown using a partial product tree to illustrate its difference from a conventional design.

    1. 4 x 4 bit designs

      Three 4 x 4 bit multipliers have been implemented and further used in the 8 x 8 bit multiplication. All three designs are implemented using the Dadda tree technique by making use of different 4:2 compressors in the reduction stage. Using the compressors, the 4 x 4 bit product requires one reduction stage, making the product calculation faster.

      For the first design, Mul44_1, the design 1 compressor shown in Fig. 4 (a) is used in the partial product reduction stage. The Dadda tree implementation of Mul44_1 is shown in Fig. 5 (a); only two compressors are required in the partial product reduction stage.

      Similarly, for the second design, Mul44_2, the design 2 compressor shown in Fig. 4 (b) is used in the reduction stage. Because design 2 does not have a carry to the next stage, the design differs slightly from Mul44_1. The Dadda tree implementation of Mul44_2 is shown in Fig. 5 (b). Only one compressor is required in the reduction stage, which significantly simplifies the design.

      For the accurate 4×4 multiplier, Mul44_acc, the Dadda tree implementation is the same as Mul44_1, because the design 1 compressor and the accurate compressor use the same types of circuits. Hence, only accurate compressors need to be used in place of the design 1 compressors. In this multiplier, two accurate compressors are required in the reduction stage.

      Fig. 5. Use of compressors for partial product reduction
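      To make the shape of the 4 x 4 partial product tree concrete, the following Python sketch groups the AND-gate partial product bits by column weight. It only models partial product generation, not the compressor placement of Fig. 5; the function name and the example operands are assumptions for illustration.

```python
def partial_product_columns(a: int, b: int, n: int = 4):
    """Group the n*n AND-gate partial product bits by column weight."""
    columns = [[] for _ in range(2 * n - 1)]
    for i in range(n):          # bit index of a
        for j in range(n):      # bit index of b
            columns[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    return columns

cols = partial_product_columns(13, 11)
heights = [len(c) for c in cols]
# Summing every column with its weight recovers the exact product.
value = sum(sum(c) << w for w, c in enumerate(cols))
print(heights, value)
```

      For n = 4 the column heights are 1, 2, 3, 4, 3, 2, 1, so no column holds more than four bits, which is consistent with the single reduction stage described above.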

    2. 8 x 8 bit designs

      4 x 4 multipliers are used in the implementation of the 8 x 8 multipliers. The partial product tree of the 8 x 8 multiplication is broken down into four products of 4 x 4 modules using the technique of recursive multiplication, as shown in Fig. 6. The advantage of breaking up the product is that the smaller multiplication blocks are performed in parallel and are thus faster. Their results then merely need to be added, according to Fig. 1, to obtain the final product.

      The proposed multiplier Mul88_1 uses Mul44_1 for the computation of all four partial products. For high-accuracy designs, Mul44_acc can be used for the three more significant products, while an approximate compressor design can be used for the least significant product. The proposed multiplier Mul88_2 uses Mul44_1 for the least significant product and Mul44_acc for the other three products. The proposed multiplier Mul88_3 uses Mul44_2 for all four partial products. Its accuracy is higher than that of Mul88_1.

      Fig. 6. 8 x 8 bit multiplication broken down into four parts of 4 by 4 bit multiplications (using recursive multiplication)
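      The effect of where an approximate block is placed can be sketched in Python. The block placeholder_approx_4x4 below is a hypothetical stand-in that simply clears the least significant bit of the exact product; it is not a model of Mul44_1 or Mul44_2, whose logic functions are not reproduced here. Only the arrangement of blocks follows the text.

```python
def exact_4x4(x: int, y: int) -> int:
    """Stand-in for the accurate 4 x 4 block (Mul44_acc)."""
    return x * y

def placeholder_approx_4x4(x: int, y: int) -> int:
    """Hypothetical approximate block: exact product with its LSB cleared."""
    return (x * y) & ~1

def mul88(A: int, B: int, hh, hl, lh, ll) -> int:
    """Combine four 4 x 4 block multipliers into an 8 x 8 product."""
    Ah, Al = A >> 4, A & 0xF
    Bh, Bl = B >> 4, B & 0xF
    return (hh(Ah, Bh) << 8) + ((hl(Ah, Bl) + lh(Al, Bh)) << 4) + ll(Al, Bl)

def max_error(blocks) -> int:
    """Largest error distance over all 8-bit operand pairs."""
    return max(abs(mul88(A, B, *blocks) - A * B)
               for A in range(256) for B in range(256))

# Same approximate block everywhere (arrangement of Mul88_1 and Mul88_3).
all_approx = (placeholder_approx_4x4,) * 4
# Exact blocks for three products, approximate block only for AlBl (Mul88_2).
mixed = (exact_4x4,) * 3 + (placeholder_approx_4x4,)

print(max_error(all_approx), max_error(mixed))
```

      With this placeholder, an error in the AhBh block is weighted by 256 and an error in the AlBl block by 1, which illustrates why confining the approximation to the least significant product favours accuracy at the cost of more exact hardware.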


    In this section, the designs of the proposed multipliers as explained in Section III are evaluated. The proposed multipliers are simulated and synthesized using Xilinx software.

    TABLE 3

    Table 3 shows that the area and power consumption of the proposed multipliers are substantially lower than those of the existing multipliers.

    TABLE 4

    Table 4 shows that the proposed multiplier using approximate compressor 2 consumes less area and power than the proposed multipliers using approximate compressor 1.


In this paper, three approximate multipliers were proposed. The proposed approximate multiplier design (1) uses the first approximate compressor in all four partial products. It requires less area and power than the existing multipliers, but its accuracy is lower than that of design (2). The proposed approximate multiplier design (2) uses accurate compressors in the three most significant partial products and the first approximate compressor in the least significant product. It requires less power than the existing multipliers, and its accuracy is higher. Depending on the application requirements, one can select design (1) or design (2); design (2) is well suited to high-accuracy applications. The proposed multiplier design (3) uses the second approximate compressor in all four partial products. Its accuracy is higher than that of design (1).


    1. J. Liang, J. Han, and F. Lombardi, "New metrics for the reliability of approximate and probabilistic adders," IEEE Trans. Computers, vol. 63, no. 9, pp. 1760-1771, Sep. 2013.

    2. V. Gupta, D. Mohapatra, S. P. Park, A. Raghunathan, and K. Roy, "IMPACT: IMPrecise adders for low-power approximate computing," in Proc. Int. Symp. Low Power Electron. Design, Aug. 2011, pp. 409-414.

    3. S. Cheemalavagu, P. Korkmaz, K. V. Palem, B. E. S. Akgul, and L. N. Chakrapani, "A probabilistic CMOS switch and its realization by exploiting noise," presented at the IFIP Int. Conf. Very Large Scale Integ., Perth, Australia, Oct. 2005.

    4. H. R. Mahdiani, A. Ahmadi, S. M. Fakhraie, and C. Lucas, "Bio-inspired imprecise computational blocks for efficient VLSI implementation of soft-computing applications," IEEE Trans. Circuits Syst. I: Reg. Papers, vol. 57, no. 4, pp. 850-862, Apr. 2010.

    5. M. J. Schulte and E. E. Swartzlander Jr., "Truncated multiplication with correction constant," in Proc. Workshop VLSI Signal Process. VI, 1993, pp. 388-396.

    6. E. J. King and E. E. Swartzlander Jr., "Data dependent truncated scheme for parallel multiplication," in Proc. 31st Asilomar Conf. Signals, Circuits Syst., 1998, pp. 1178-1182.

    7. P. Kulkarni, P. Gupta, and M. D. Ercegovac, "Trading accuracy for power in a multiplier architecture," J. Low Power Electron., vol. 7, no. 4, pp. 490-501, 2011.

    8. C. Chang, J. Gu, and M. Zhang, "Ultra low-voltage low-power CMOS 4-2 and 5-2 compressors for fast arithmetic circuits," IEEE Trans. Circuits Syst., vol. 51, no. 10, pp. 1985-1997, Oct. 2004.

    9. D. Radhakrishnan and A. P. Preethy, "Low-power CMOS pass logic 4-2 compressor for high-speed multiplication," in Proc. IEEE 43rd Midwest Symp. Circuits Syst., 2000, vol. 3, pp. 1296-1298.

    10. Z. Wang, G. A. Jullien, and W. C. Miller, "A new design technique for column compression multipliers," IEEE Trans. Comput., vol. 44, no. 8, pp. 962-970, Aug. 1995.

    11. J. Gu and C. H. Chang, "Ultra low-voltage, low-power 4-2 compressor for high speed multiplications," in Proc. 36th IEEE Int. Symp. Circuits Syst., Bangkok, Thailand, May 2003, pp. v-321-v-324.

    12. M. Margala and N. G. Durdle, "Low-power low-voltage 4-2 compressors for VLSI applications," in Proc. IEEE Alessandro Volta Memorial Workshop Low-Power Design, 1999, pp. 84-90.

    13. B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs, 2nd ed. London, U.K.: Oxford Univ. Press, 2010.
