Low Power and Area Efficient Multiplier Design using ANT Architecture

DOI : 10.17577/IJERTCONV5IS13038


A. Priya, Assistant Professor, Department of ECE,

K.Ramakrishnan College of Technology, Trichy, India.

S. Geerthana, Assistant Professor, Department of ECE,

K.Ramakrishnan College of Technology, Trichy, India.

Abstract: In this paper, we propose a reliable low-power multiplier design that adopts the algorithmic noise tolerant (ANT) architecture with a fixed-width multiplier to build the reduced-precision replica (RPR) redundancy block. The proposed ANT architecture can meet the demands of high precision, low power consumption, and area efficiency. We design the fixed-width RPR with an error compensation circuit derived by analyzing probability and statistics. By using the partial product terms of the input correction vector and the minor input correction vector to lower the truncation errors, the hardware complexity of the error compensation circuit can be simplified. In a 12 × 12 bit ANT multiplier, the circuit area of our fixed-width RPR is lowered by 44.55% and the power consumption of our ANT design is reduced by 23% compared with the state-of-the-art ANT design.

Index Terms: Algorithmic noise tolerant (ANT), fixed-width multiplier, reduced-precision replica (RPR), voltage overscaling (VOS), column bypassing multiplier.

I. INTRODUCTION

The rapid growth of portable and wireless computing systems in recent years has driven the need for ultralow-power systems. To minimize power dissipation, supply voltage scaling is widely used as an effective low-power technique [1]. However, in deep-submicrometer process technologies, noise interference makes it difficult to design reliable and efficient microelectronic systems; hence, design techniques that enhance noise tolerance have been developed [2]–[12].

An effective low-power technique known as voltage overscaling (VOS) was proposed to lower the supply voltage without sacrificing throughput [4]. However, VOS leads to severe degradation in signal-to-noise ratio (SNR). To overcome this, a novel algorithmic noise tolerant (ANT) technique [2] combined a VOS main block with a reduced-precision replica (RPR), which combats soft errors effectively while achieving significant energy savings. Several variants of the ANT design are presented in [5]–[9], and the ANT design concept is further extended to the system level in [10]. However, the RPR blocks in the ANT designs of [5]–[7] are customized and therefore not easily adopted or reproduced. The RPR blocks in the ANT designs of [8] and [9] operate very fast, but their hardware complexity is too high. As a result, the RPR design in the ANT design of [2] is still the most popular because of its simplicity. However, adopting the RPR in [2] still incurs extra area overhead and power consumption. In this paper, we propose a simple approach that uses a fixed-width RPR to replace the full-width RPR block in [2]. Using the fixed-width RPR, the computation error can be corrected with lower power consumption and lower area overhead. We use probability, statistics, and partial product weight analysis to find an approximate compensation vector for a more precise RPR design. To avoid increasing the critical path delay, we require that the compensation circuit in the RPR not be located on the critical path. As a result, we can realize the ANT design with smaller circuit area, lower power consumption, and lower critical supply voltage.

  1. ANT ARCHITECTURE DESIGNS

    The ANT technique [2] includes both a main digital signal processor (MDSP) and an error correction (EC) block, as shown in Fig. 1. To meet the ultralow-power demand, VOS is used in the MDSP. However, under VOS, once the critical path delay $T_{cp}$ of the system becomes greater than the sampling period $T_{samp}$, soft errors occur, which leads to severe degradation in signal precision. In the ANT technique [2], a replica of the MDSP with reduced-precision operands and shorter computation delay is used as the EC block. Under VOS, the MDSP output $y_a[n]$ contains a number of input-dependent soft errors; however, the RPR output $y_r[n]$ is still correct since the critical path delay of the replica is smaller than $T_{samp}$ [4]. Therefore, $y_r[n]$ is used to detect errors in the MDSP output $y_a[n]$. Error detection is accomplished by comparing the difference $|y_a[n] - y_r[n]|$ against a threshold $Th$. Once the difference between $y_a[n]$ and $y_r[n]$ is larger than $Th$, the output $y[n]$ is $y_r[n]$ instead of $y_a[n]$. As a result, $y[n]$ can be expressed as

    $$y[n] = \begin{cases} y_a[n], & \text{if } |y_a[n] - y_r[n]| \le Th \\ y_r[n], & \text{if } |y_a[n] - y_r[n]| > Th \end{cases}$$

    $Th$ is determined by

    $$Th = \max |y_o[n] - y_r[n]|$$

    where $y_o[n]$ is the error-free output signal. In this way, the power consumption can be greatly lowered while the SNR is maintained without severe degradation [2].
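The selection rule above can be illustrated with a short behavioral sketch. The Python below is only an illustration under stated assumptions: the function names `ant_select` and `estimate_threshold` are hypothetical, the sample values are made up, and the threshold is estimated from the supplied sample pairs rather than from an exhaustive sweep of the inputs.

```python
def ant_select(y_a, y_r, th):
    """Keep the MDSP output y_a unless it differs from the replica y_r by more than th."""
    return y_a if abs(y_a - y_r) <= th else y_r


def estimate_threshold(y_o_samples, y_r_samples):
    """Th = max |y_o[n] - y_r[n]| over the supplied error-free/replica output pairs."""
    return max(abs(y_o - y_r) for y_o, y_r in zip(y_o_samples, y_r_samples))


# Made-up numbers: error-free outputs and the matching replica outputs.
y_o = [1000, 2040, 515]
y_r = [992, 2048, 512]
th = estimate_threshold(y_o, y_r)  # largest replica deviation in the samples

# Assumption for illustration: a soft error flips a high-order bit of y_a.
corrupted = y_o[0] + (1 << 12)

kept = ant_select(y_o[0], y_r[0], th)         # deviation within Th, y_a is kept
replaced = ant_select(corrupted, y_r[0], th)  # deviation above Th, y_r is used
```

With these samples the error-free output passes through unchanged, while the corrupted value is replaced by the replica output.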

    Fig.1. ANT Architecture [2].

  2. PROPOSED ANT MULTIPLIER DESIGN USING FIXED-WIDTH RPR

    In this paper, we propose a fixed-width RPR to replace the full-width RPR block in the ANT design [2], as shown in Fig. 2. The fixed-width RPR not only provides higher computation precision, lower power consumption, and lower area overhead in the RPR itself, but also yields higher SNR, better area efficiency, lower operating supply voltage, and lower power consumption in the overall ANT architecture. We demonstrate our fixed-width RPR-based ANT design in an ANT multiplier.

    Fixed-width designs are usually applied in DSP applications to avoid unbounded growth of bit width. Cutting off the n least significant bits (LSBs) of the output is a popular way to construct a fixed-width DSP with n-bit input and n-bit output. The hardware complexity and power consumption of a fixed-width DSP are usually about half of those of the full-length one. However, truncating the LSB part results in a rounding error, which needs to be compensated precisely. Many studies [13]–[22] have been presented to reduce the truncation error with a constant correction value [13]–[15] or with a variable correction value [16]–[22]. The circuit for constant correction can be simpler than that for variable correction; however, the variable correction approaches are usually more precise.
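To make the truncation error concrete, the following Python sketch models an uncompensated fixed-width multiplier. It rests on assumptions not stated in the text: operands are unsigned, every partial product of weight below 2^n is simply omitted, and no correction value is added. The function names are hypothetical.

```python
def fixed_width_product(x, y, n):
    """Sum only the partial products x_i*y_j*2^(i+j) with i + j >= n, then drop the n LSBs."""
    kept = 0
    for i in range(n):
        for j in range(n):
            if i + j >= n and (x >> i) & 1 and (y >> j) & 1:
                kept += 1 << (i + j)
    return kept >> n


def truncation_error(x, y, n):
    """Difference between the full 2n-bit product and the rescaled fixed-width result."""
    return x * y - (fixed_width_product(x, y, n) << n)


# 4-bit example: 15 * 15 = 225, but only the high-weight partial products are kept.
print(fixed_width_product(15, 15, 4), truncation_error(15, 15, 4))
```

The error printed here is the total weight of the omitted partial products, which is the quantity that a constant or variable correction value tries to approximate.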

    The compensation methods in [16]–[22] compensate the truncation error between the full-length multiplier and the fixed-width multiplier. However, in the fixed-width RPR of an ANT multiplier, the error we need to correct is the overall truncation error of the MDSP block. Unlike [16]–[22], our compensation method compensates the truncation error between the full-length MDSP multiplier and the fixed-width RPR multiplier. Many fixed-width multiplier designs have been applied to full-width multipliers, but no fixed-width RPR design has yet been applied to ANT multiplier designs.

    To achieve more precise error compensation, we compensate the truncation error with a variable correction value. We construct the error compensation circuit mainly from the partial product terms with the largest weight in the least significant segment. The error compensation algorithm uses probability, statistics, and linear regression analysis to find the approximate compensation value [16]. To save hardware, the compensation vector formed by the partial product terms with the largest weight in the least significant segment is injected directly into the fixed-width RPR, which does not need extra compensation logic gates [17]. To further lower the compensation error, we also consider the impact of the truncated products with the second-largest weight on the error compensation. We propose an error compensation circuit that uses a simple minor input correction vector to compensate the remaining error. To avoid increasing the critical path delay, we place the compensation circuit on a noncritical path of the fixed-width RPR. Compared with the full-width RPR design in [15], the proposed fixed-width RPR multiplier not only achieves higher SNR but also has lower circuit area and lower power consumption.

    Fig.2. Proposed ANT Architecture with fixed-width RPR.

    1. PROPOSED PRECISE ERROR COMPENSATION VECTOR FOR FIXED-WIDTH RPR DESIGN

      In the ANT design, the function of the RPR is to correct the errors occurring in the output of the MDSP and to maintain the SNR of the whole system while the supply voltage is lowered. When a fixed-width RPR is used to realize the ANT architecture, we not only lower the circuit area and power consumption but also accelerate the computation compared with the conventional full-length RPR. However, we need to compensate the large truncation error caused by cutting off many hardware elements in the LSB part of the MDSP. In this work, the MDSP is designed using a column bypassing multiplier.

      The column bypassing structure consists of a full adder and a multiplexer. In a multiplier, column bypassing means turning off some columns of the multiplier array whenever the corresponding multiplicand bits are zero. During operation, the additions in a column are disabled if the corresponding multiplicand bit is 0, which saves power. The saving therefore depends entirely on the number of zeros in the multiplicand. To implement this technique, the full adder used in general multiplication has to be modified as shown in Fig. 3.

      Fig. 3. The modified FA cell for the column bypass multiplier.
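The effect of column bypassing can be sketched at the behavioral level. The Python below is not a gate-level model of the modified FA cell in Fig. 3; it only assumes that all additions belonging to a multiplicand bit equal to 0 are skipped, and it counts how many columns that affects. The function name is hypothetical.

```python
def column_bypass_multiply(multiplicand, multiplier, n):
    """Return (product, bypassed_columns) for unsigned n-bit operands."""
    product = 0
    bypassed = 0
    for j in range(n):
        if (multiplicand >> j) & 1 == 0:
            # Column j contributes nothing, so its adders would be disabled.
            bypassed += 1
            continue
        product += multiplier << j
    return product, bypassed


# 0b1001 has two zero bits, so two of the four columns are bypassed.
print(column_bypass_multiply(0b1001, 0b0111, 4))
```

The product is unchanged by the bypassing; only the number of active columns, and hence the switching activity, depends on the zeros in the multiplicand.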

      Fig. 4. The 12 × 12 bit ANT multiplier implemented with the six-bit fixed-width replica redundancy block.

      The error compensation function can be written as $f(EC) = f(ICV) + f(MICV)$, where $f(EC)$ is the error compensation function, $f(ICV)$ is the error compensation contributed by the input correction vector ICV($\beta$), and $f(MICV)$ is the error compensation contributed by the minor input correction vector MICV($\alpha$). The errors generated in the fixed-width RPR are dominated by the bit products of ICV since they have the largest weight. In [8], it is reported that a low-cost EC circuit can be designed easily if a simple relationship between $f(EC)$ and $\beta$ is found, where $\beta$ is the summation of all partial products of ICV. By statistically analyzing the truncated difference between the MDSP and the fixed-width RPR under a uniform input distribution, we can find the relationship between $f(EC)$ and $\beta$. As shown in Fig. 4, the statistical results show that the average truncation error in the fixed-width RPR multiplier lies approximately between $\beta$ and $\beta + 1$. More precisely, when $\beta = 0$, the average truncation error is close to $\beta + 1$; when $\beta > 0$, the average truncation error is very close to $\beta$. If we select $\beta$ as the compensation vector, it can be injected directly into the fixed-width RPR as compensation, which does not need extra compensation logic gates [17]. Analyzing the compensation precision obtained by selecting $\beta$ as the compensation vector, we find that the absolute average error for $\beta = 0$ is much larger than that in the other cases.

      Moreover, the absolute average error for $\beta = 0$ is larger than $0.5 \times 2^{3n/2}$, while the absolute average error in the other cases is smaller than $0.5 \times 2^{3n/2}$. Therefore, we can apply multiple input error compensation vectors to further enhance the error compensation precision. For the $\beta > 0$ case, we still select $\beta$ as the compensation vector. For the $\beta = 0$ case, we select $\beta + 1$ combined with MICV as the compensation vector. Before injecting the compensation vector directly into the fixed-width RPR, we also check the weight of the partial product terms in ICV that have the same partial product summation value but different locations.

      For the $\beta = 0$ case, we further analyze the error profile of the ICV and the MICV. In the ICV, all the truncation errors are positive when $\beta = 0$. This implies that if we adopt multiple compensation vectors for the cases in which the average compensation error is larger than $0.5 \times 2^{3n/2}$, we can lower the compensation error effectively without generating additional compensation error. The multiple compensation vectors are constructed by combining ICV($\beta$) with MICV($\alpha$). The weight of MICV($\alpha$) is only half that of ICV($\beta$). The summation of all partial products of MICV($\alpha$), denoted $\beta_l$, has four possible values, 0, 1, 2, and 3, when n = 12 and $\beta = 0$. The statistical results show that the average truncation error contributed by the MICV in the case of $\beta = 0$ is approximately proportional to $\beta_l$. Moreover, the absolute average truncation error for $\beta_l = 0$ is smaller than $0.5 \times 2^{3n/2}$, while the absolute average truncation error for $\beta_l > 0$ is larger than $0.5 \times 2^{3n/2}$.

      When the absolute average truncation error is smaller than $0.5 \times 2^{3n/2}$, that is, when $\beta_l = 0$, selecting $\beta$ as the compensation vector is suitable. However, when the absolute average truncation error is larger than $0.5 \times 2^{3n/2}$, selecting $\beta$ as the compensation vector is not suitable because the error is insufficiently compensated. Therefore, we adopt ICV together with MICV to amend this insufficient compensation when $\beta = 0$ and $\beta_l > 0$.

      If $\beta = 0$ is accompanied by $\beta_l > 0$, we inject one more carry-in compensation vector at the weight of $2^{3n/2}$. In this way, we can effectively remove the cases in which the absolute error exceeds $0.5 \times 2^{3n/2}$. Adopting ICV together with MICV lowers the compensation error effectively compared both with a fixed-width RPR that applies only the compensation vector $\beta$ and with the full-width RPR.

    2. Proposed Precise Error Compensation Vector for Fixed-Width RPR Design

      To realize the fixed-width RPR, we construct one directly injected vector ICV($\beta$) to meet the basic statistical distribution and one minor compensation vector MICV($\alpha$) to amend the cases of insufficient error compensation. The compensation vector ICV($\beta$) is realized by directly injecting the partial product terms $X_{n-1}Y_{n/2}$, $X_{n-2}Y_{(n/2)+1}$, $X_{n-3}Y_{(n/2)+2}$, ..., $X_{(n/2)+2}Y_{n-2}$. These directly injected compensation terms are labeled $C_1$, $C_2$, $C_3$, ..., $C_{(n/2)-1}$. The other compensation vector, used to amend the insufficient compensation case, is constructed with one conditionally controlled OR gate. One input of the OR gate is $X_{n/2}Y_{n-1}$, which realizes the function of the compensation vector $\beta$. The other input is controlled by the judgment formula that checks whether $\beta = 0$ and $\beta_l \neq 0$ both hold. The term $C_{m1}$ judges whether $\beta = 0$; it is realized by one NOR gate whose inputs are $X_{n-1}Y_{n/2}$, $X_{n-2}Y_{(n/2)+1}$, $X_{n-3}Y_{(n/2)+2}$, ..., $X_{(n/2)+2}Y_{n-2}$. The term $C_{m2}$ judges whether $\beta_l \neq 0$; it is realized by one OR gate whose inputs are $X_{n-2}Y_{n/2}$, $X_{n-3}Y_{(n/2)+1}$, $X_{n-4}Y_{(n/2)+2}$, ..., $X_{(n/2)+1}Y_{n-2}$. If both judgments are true, a compensation term $C_m$ is generated via a two-input AND gate. Then, $C_m$ is fed together with $X_{n/2}Y_{n-1}$ into a two-input OR gate to correct the insufficient error compensation. Accordingly, in the case of $\beta = 0$ and $\beta_l \neq 0$, one additional carry-in signal $C_{n/2}$ is injected into the compensation vector to change the compensation value to $\beta + 1$ instead of $\beta$. Moreover, the carry-in signal $C_{n/2}$ is injected at the bottom of the error compensation vector, which is the location farthest from the critical path.

      Fig. 5. Proposed high-accuracy fixed-width RPR multiplier with compensation constructed from multiple truncation EC vectors that combine ICV with MICV.
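The gate-level description of the compensation circuit can be mirrored in a small behavioral sketch. The Python below is an illustration only, with hypothetical function names. It assumes unsigned operands, an even n, and that every ICV term $X_iY_j$ satisfies i + j = 3n/2 − 1 while every MICV term satisfies i + j = 3n/2 − 2; this matches the first terms listed in the text, but the last listed terms do not fit that pattern, so the index ranges are an assumption.

```python
def bit(value, index):
    """Return bit `index` of an unsigned integer."""
    return (value >> index) & 1


def icv_terms(x, y, n):
    """Assumed ICV partial products X_i*Y_j with i + j = 3n/2 - 1, from X_(n-1)Y_(n/2) down."""
    top = 3 * n // 2 - 1
    return [bit(x, i) & bit(y, top - i) for i in range(n - 1, n // 2 - 1, -1)]


def micv_terms(x, y, n):
    """Assumed MICV partial products X_i*Y_j with i + j = 3n/2 - 2, from X_(n-2)Y_(n/2) down."""
    top = 3 * n // 2 - 2
    return [bit(x, i) & bit(y, top - i) for i in range(n - 2, n // 2 - 1, -1)]


def compensation_value(x, y, n=12):
    """Number of carry-ins injected: beta, or beta + 1 when beta = 0 and beta_l != 0."""
    icv = icv_terms(x, y, n)
    micv = micv_terms(x, y, n)
    direct = icv[:-1]               # C_1 .. C_(n/2 - 1), injected directly
    cm1 = int(not any(direct))      # NOR gate over the directly injected terms
    cm2 = int(any(micv))            # OR gate over the MICV terms
    cm = cm1 & cm2                  # two-input AND gate
    c_last = icv[-1] | cm           # OR gate fed by X_(n/2)Y_(n-1) and C_m
    return sum(direct) + c_last


# beta = 0 but beta_l = 1 (only X_10 and Y_6 are set): one carry-in is injected.
print(compensation_value(1 << 10, 1 << 6))
```

In this model the extra carry-in shares the OR gate with $X_{n/2}Y_{n-1}$, so it never adds more than one unit beyond the directly injected terms.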

  3. PERFORMANCE COMPARISONS

    To evaluate and compare the performance of the proposed fixed-width RPR-based ANT design and the previous full-width RPR-based ANT design, we implemented both ANT designs in a 12-bit by 12-bit multiplier. The main performance indexes are the precision of the RPR blocks, the silicon area of the RPR blocks, the critical computation delay of the RPR blocks, the error probability of the RPR blocks under VOS, and the lowest reliable operating supply voltage under VOS. Quantitative analysis of the experimental data shows that the proposed design more effectively restrains the soft noise interference that results from increased computation delay under VOS when the circuit operates at a very low supply voltage. Moreover, hardware overhead and power consumption are also lowered in the proposed fixed-width RPR-based ANT design.

  4. CONCLUSION

The ANT architecture reduces the power consumption. The column bypassing design implemented in the main block reduces the area to a great extent compared with the normal array multiplier design. Thus, using the ANT architecture together with the column bypassing method for the multiplier design reduces both power and area. The performance of the system is also improved.

REFERENCES

    1. (2009). The International Technology Roadmap for Semiconductors [Online]. Available: http://public.itrs.net/

    2. B. Shim, S. Sridhara, and N. R. Shanbhag, "Reliable low-power digital signal processing via reduced precision redundancy," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 5, pp. 497–510, May 2004.

    3. B. Shim and N. R. Shanbhag, "Energy-efficient soft-error tolerant digital signal processing," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 14, no. 4, pp. 336–348, Apr. 2006.

    4. R. Hegde and N. R. Shanbhag, "Energy-efficient signal processing via algorithmic noise-tolerance," in Proc. IEEE Int. Symp. Low Power Electron. Des., Aug. 1999, pp. 30–35.

    5. V. Gupta, D. Mohapatra, A. Raghunathan, and K. Roy, "Low-power digital signal processing using approximate adders," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 32, no. 1, pp. 124–137, Jan. 2013.

    6. Y. Liu, T. Zhang, and K. K. Parhi, "Computation error analysis in digital signal processing systems with overscaled supply voltage," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 4, pp. 517–526, Apr. 2010.

    7. J. N. Chen, J. H. Hu, and S. Y. Li, "Low power digital signal processing scheme via stochastic logic protection," in Proc. IEEE Int. Symp. Circuits Syst., May 2012, pp. 3077–3080.

    8. J. N. Chen and J. H. Hu, "Energy-efficient digital signal processing via voltage-overscaling-based residue number system," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 7, pp. 1322–1332, Jul. 2013.

    9. P. N. Whatmough, S. Das, D. M. Bull, and I. Darwazeh, "Circuit-level timing error tolerance for low-power DSP filters and transforms," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 6, pp. 12–18, Feb. 2012.

    10. G. Karakonstantis, D. Mohapatra, and K. Roy, "Logic and memory design based on unequal error protection for voltage-scalable, robust and adaptive DSP systems," J. Signal Process. Syst., vol. 68, no. 3, pp. 415–431, 2012.

    11. Y. Pu, J. P. de Gyvez, H. Corporaal, and Y. Ha, "An ultra low energy/frame multi-standard JPEG co-processor in 65-nm CMOS with sub/near threshold power supply," IEEE J. Solid State Circuits, vol. 45, no. 3, pp. 668–680, Mar. 2010.

    12. H. Fuketa, K. Hirairi, T. Yasufuku, M. Takamiya, M. Nomura, H. Shinohara, et al., "12.7-times energy efficiency increase of 16-bit integer unit by power supply voltage (VDD) scaling from 1.2V to 310mV enabled by contention-less flip-flops (CLFF) and separated VDD between flip-flops and combinational logics," in Proc. ISLPED, Fukuoka, Japan, Aug. 2011, pp. 163–168.

    13. Y. C. Lim, "Single-precision multiplier with reduced circuit complexity for signal processing applications," IEEE Trans. Comput., vol. 41, no. 10, pp. 1333–1336, Oct. 1992.

    14. M. J. Schulte and E. E. Swartzlander, "Truncated multiplication with correction constant," in Proc. Workshop VLSI Signal Process., vol. 6, 1993, pp. 388–396.

    15. S. S. Kidambi, F. El-Guibaly, and A. Antoniou, "Area-efficient multipliers for digital signal processing applications," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 43, no. 2, pp. 90–95, Feb. 1996.

    16. J. M. Jou, S. R. Kuang, and R. D. Chen, "Design of low-error fixed-width multipliers for DSP applications," IEEE Trans. Circuits Syst., vol. 46, no. 6, pp. 836–842, Jun. 1999.

    17. S. J. Jou and H. H. Wang, "Fixed-width multiplier for DSP application," in Proc. IEEE Int. Symp. Comput. Des., Sep. 2000, pp. 318–322.

    18. F. Curticapean and J. Niittylahti, "A hardware efficient direct digital frequency synthesizer," in Proc. 8th IEEE Int. Conf. Electron., Circuits, Syst., vol. 1, Sep. 2001, pp. 51–54.

    19. A. G. M. Strollo, N. Petra, and D. D. Caro, "Dual-tree error compensation for high performance fixed-width multipliers," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 52, no. 8, pp. 501–507, Aug. 2005.

    20. S. R. Kuang and J. P. Wang, "Low-error configurable truncated multipliers for multiply-accumulate applications," Electron. Lett., vol. 42, no. 16, pp. 904–905, Aug. 2006.

    21. N. Petra, D. D. Caro, V. Garofalo, N. Napoli, and A. G. M. Strollo, "Truncated binary multipliers with variable correction and minimum mean square error," IEEE Trans. Circuits Syst., vol. 57, no. 6, pp. 1312–1325, Jun. 2010.
