 Open Access
 Total Downloads : 1562
 Authors : R. Sabitha, K. Sharmila, M. Sindhuja, T. Suganya
 Paper ID : IJERTV2IS4394
 Volume & Issue : Volume 02, Issue 04 (April 2013)
 Published (First Online): 13042013
 ISSN (Online) : 22780181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
Design of LowPower Approximate Adders for Signal Processing Applications
R. Sabitha1, K. Sharmila2, M. Sindhuja3,T. Suganya4
VSB Engineering College/Dept of ECE, Karur,Tamilnadu,India.
AbstractLow power is the imperative requirement for portable multimedia devices employing several signal processing algorithms and architectures. Earlier research exploits error resiliency primarily through voltage over scaling, using algorithmic and architectural techniques to mitigate the resulting errors. In our paper, we propose logic complexity reduction at the transistor level as the alternative approach to take advantage of the relaxation of numerical accuracy. We examined this concept by proposing several imprecise or approximate full adder cell with reduced complexity at the transistor level, and used them to design approximate multibit adders. In addition to an inherent reduction in switched capacitance, our tier result in significantly shorter critical paths, enabling voltage scaling. We implement our concept in the application of Noise Cancellation algorithms (LMS algorithm). Simulation outputs indicate up to 69% power savings using the proposed approximate adders, when compared to previous implementations using accurate adders.
Index TermsApproximate computing, less power, mirror adder.

INTRODUCTION
Human beings have restricted perceptual abilities when interpreting a noise. This allows the results of these algorithms to be numerically approximate rather than accurate. This relaxation on numerical exactness gives some freedom to carry out imprecise or approximate computation. We can use this freedom to come up with low power designs at various levels of design abstraction, namely, logic, architecture, and algorithm.
It is shown in [1] that an embedded deduced instruction set computing processor consumes 70% of the energy in supplying data and instructions, and 6% of the energy while performing arithmetic only. In our paper, we consider applicationspecific integrated circuit implementations of errorresilient applications like Noise Cancellation algorithms (LMS least mean square algorithm).
A powerefficient multiplier architecture was proposed in [1] that uses a 2 Ã— 2 inaccurate multiplier block resulting from Karnaugh map simplification. Our paper considers logic complexity reduction using Karnaugh maps. Other works that aims on logic complexity reduction at the gate level are [4]. Various approaches use complexity reduction at the algorithm level to meet realtime energy constraints [5], [6].Previous works on logic complexity reduction have aimed on algorithm, logic, and gate levels. We used logic complexity reduction at the transistor level. We apply this to addition at the bit level by reducing the mirror adder (MA) circuit. We develop imprecise but reduced arithmetic units, which gives an extra layer of power savings over conventional lowpower design techniques. This is attributed to the decreased logic complexity of the proposed approximate arithmetic units. Complexity reduction brings power reduction in two different ways. First, an inherent reduction in internal node capacitance and leakage results
from having lesser hardware. Second, complexity reduction frequently gives shorter critical paths, facilitating voltage reduction with no timing errors. Our focus is to target low power design using simplified and approximate logic implementations.
An exampled version of our work appeared in [3]. We extend our paper in [3] by giving two more simplified versions of the MA. We also introduced a methodology that can be used to harness maximum power savings using approximate adders, subject to a specific quality constraint. Our contributions in this paper are summarized as follows.

To simplify the logic complexity of a conventional MA cell by reducing the number of transistors and switched capacitances. Keeping this aim in mind, we propose five various simplified versions of the MA, ensuring minimum errors in the full adder (FA) truth table.

To maintain a reasonable result, we use approximate FA cells only in the least significant bits (LSBs). We particularly aimed on adder structures that use FA cells as their basic building blocks. We prefered carry save adders (CSA) and ripple carry adders (RCA).

VOS was a very popular technique to get large improvements in power consumption. However, VOS will bring delay failures in the most significant bits (MSBs).This might lead to huge errors in corresponding outputs and severely mess up the output quality of the application. We use approximate FA cells particularly in the LSBs, while the MSBs use accurate FA cells.

We proposed designs for Noise Cancellation algorithms (LMS algorithm) using the proposed approximate arithmetic units and evaluate the approximate architectures in terms of output quality and power dissipation.

Finally, we demonstrate the optimization methodology with the help of Adaptive Noise Canceller System (LMS Algorithm).
The remainder of our paper is organized as follows. In Section II, we discuss various approximate FA cells. In Section III we propose 9T Full Adder Design. In Section IV, We derive mathematical models for mean error and power consumption of an approximate RCA. In Section V, we use the approximate FA cells to design architectures for Noise Cancellation (LMS) algorithm and highlight the potential benefits. Finally, conclusions are drawn in Section VI.


APPROXIMATE ADDERS
In this section, we discuss various methodologies for designing approximate adders. We use RCAs and CSAs throughout our simultaneous discussions in all sections.

Approximation Strategies for the Mirror Adder
In this section, we describe stepbystep procedures for coming up with various approximate MA cells with
fewer transistors. Cancellation of some series connected transistors will facilitate faster charging/discharging of node capacitances. Moreover, complexity reduction by removal of
transistors also leads in reducing the C term (switched capacitance) in the dynamic power expression
Fig. 1. Conventional MA.
Fig. 4. MA approximation 3.
.
Fig. 5. MA approximation 4.
Pdynamic = CV 2 f , where is a switching activity or
Fig. 2. MA approximation 1.
Fig. 3. MA approximation 2.
DD
average number of switching transitions per unit time and C is the load capacitance being charged/discharged. This directly results in less power dissipation. Area reduction is also produced by this process. Now, let us focus the conventional MA implementation followed by the proposed approximations.

Conventional MA: Fig.1 shows the transistorlevel schematic of a conventional MA , which is a famous way of implementing an FA. It contains a total of 24 transistors. Since this implementation is not based on complementary CMOS logic, it gives a good opportunity to design an approximate version with removal of selected transistors.

Approximation 1: In order to get approximate MA with lesser transistors, we start to remove transistors from the conventional schematic one by one. However, we should not do this in an arbitrary fashion. We have to make sure that
TABLE I
Truth Table for Conventional FA and Approximations 14
Inputs
Accurate Outputs
Approximate Outputs
A
B
C
in
Sum
C
out
Sum1
C
out1
Sum2
C
out2
Sum3
C
out3
Sum4
C
out4
0
0
0
0
0
0
0
1 Ã—
0
1 Ã—
0
0
0
0
0
1
1
0
1
0
1
0
1
0
1
0
0
1
0
1
0
0 Ã—
1 Ã—
1
0
0 Ã—
1 Ã—
0 Ã—
0
0
1
1
0
1
0
1
0
1
0
1
1 Ã—
0 Ã—
1
0
0
1
0
0 Ã—
0
1
0
1
0
0 Ã—
1 Ã—
1
0
1
0
1
0
1
0
1
0
1
0
1
1
1
0
0
1
0
1
0
1
0
1
0
1
1
1
1
1
1
1
1
0 Ã—
1
0 Ã—
1
1
1
Fig. 6. Layouts of conventional and approximate MA cells.
any input combination of A, B and Cin will not result in short circuits or open circuits in the simplified schematic. Another
main criterion is that the resulting simplification should introduce minimal errors in the FA truth table
out
out

Approximation 2: The truth table of an FA indicates that Sum= C 1 for six out of eight cases, except for the input combinations A = 0, B = 0, Cin = 0 and A = 1, B = 1, Cin = 1.
Now, in the conventional MA, is calculated in the first stage. Thus, an simple way to get a simplified schematic is to
set Sum= . However, we introduce a buffer stage after
(see Fig. 3) to produce the same functionality.

Approximation 3: Further simplification can be obtained by integrating approximations 1 and 2. Note that this produces one error in Cout and three errors in Sum, as shown in Table I.

Approximation 4: A deep observation of the FA truth table shows that Cout = A for six out of eight cases. Obviously, Cout = B for six out of eight cases. Since A and B are interchangeable, we consider Cout = A. Thus, we propose approximation 4 where we just use an inverter with
input A to calculate and Sum is calculated similar to approximation 1.

Approximation 5: If we need to make Sum independent of
Cin, we have two choices, Sum= A and Sum= B. Thus, we
had two alternatives for approximation 5, namely, Sum= A, Cout = A and Sum= B, Cout = A,.If we focus choice 1, we find that both Sum and Cout match with accurate outputs in only two out of eight cases. In choice 2, Sum and Cout match with perfect outputs in four out of eight cases. Therefore, to reduce errors both in Sum and Cout, we go for choice 2 as approximation 5. Our important thrust here is to ensure that for a particular input combination (A, B and Cin), ensuring correctness in Sum also makes Cout correct. Layouts of conventional MA and various approximations in IBM 90nm technology are shown in Fig. 6. Layout area for the conventional MA and various approximations are compared in Table III. Approximation 5 uses particularly buffers. The layout area of a buffer is 6.77 m2.


Voltage Scaling Using Approximate Adders
TABLE II Choosing Approximation 5
Choice 1
Choice 2
Sum= A
Cout = A
Sum= B
Cout = A
0
0
0
0
0 Ã—
0
0 Ã—
0
0 Ã—
0
1
0
0
0 Ã—
1 Ã—
0 Ã—
1
1 Ã—
0 Ã—
1 Ã—
1 Ã—
1
0
1
1 Ã—
1
1 Ã—
1
1
1
1
1


PROPOSED 9T FULL ADDER DESIGN

9T Full Adder Design
The schematic of 9T full adder cell is shown in Figure 3 and its truth table is given in Table 1. The principle of current circuit is differed from traditional circuits. The full adder operation can be given as follows. Given the three 1 bit inputs A, B, and Cin, it is desired to compute the two 1 bit outputs Sum and Cout, given by
Sum = A B Cin.
Cout = A Â· B + Cin (A B).
For generating the Sum output in the proposed design, the truth table has been segmented into two parts, one for input A = 0 and another for A = 1 rather than implementing the conventional Sum module. From the truth table shown in Table 1 it is clear that when A = 0, Sum can be produced by XORing inputs B and Cin. Similarly, when A = 1, Sum focusing the XNORing between inputs B and Cin. Therefore, the operation of Sum module depends on implementing XOR operation and XNOR operation between inputs B and Cin which is
TABLE III
Truth table of 1bit full adder.
Fig. 8. Proposed 9T full adder cell.
Fig. 9. Proposed 9T full adder cell
A
B
Cin
Sum
Cout
0
0
0
0
0
0
0
1
1
0
0
1
0
1
0
0
1
1
0
1
1
0
0
1
0
Fig. 10. Proposed 9T full adder cell
1
0
1
0
1
1
1
0
0
1
PSub indicates subthreshold power consumption and is given
1
1
1
1
1
by
Indicated below. The logic for Cout output is shown As following,
When A = 0,
Sum = B Cin.
When A = 1,
Sum = B Cin.
Total power consumption in CMOS logic circuits will be expressed as sum of three components are shown:
PSub = VDD Ã— ISub,
Where
ISub = KÃ— ( ) ,
Ã— 1 exp( )
The simplified value of V GS in sub threshold mode
decreases current exponentiall and thus reduces sub threshold power
P Total = P Switching + P Sub + P Short ,
where Pswitching indicates the average switching power consumption and is given by ,
P Switching = TC loadV 2 DDf .
TABLE IV
Figure
8
Figure
9
Proposed 9T (Figure 10)
N1 1
N2 4
N3 3
N1 1
N2 3
N3 3
N1 2
N2 2
N3 3
Total = 8
Total = 7
Total = 7
In a nutshell, the proposed 9T full adder proves itself to be a better option for lesspower compact systems. All the substrate terminals in Figures 8, 9 and 10 are connected to
their respective source terminals to nullify the substrate bias effect.
B.Simulations and Comparison
All schematic simulations are done on Tanner EDA tool version 13.0 using 45 nm technology and input voltage varies from 0.2 V to 0.3 V in steps of 0.02 V. In order to justify that proposed design is consuming less power and have better performance, simulations are carried out for powerdelay product at rising frequency, input voltage and temperature.
Fig. 11. Powerdelay product with increasing input voltage.
Fig. 12. Powerdelay product with increasing operating frequency at 0.3 V input voltage and supply voltage.
Figures 11, 12, and 13 shows that the proposed 9T full adder cell proves its superiority in terms of powerdelay product at different input voltages and frequencies and temperature sustainability over existing 9T adders.
The current equation of a pnjunction is given as
I=0 1
Fig. 13. Powerdelay product with varying temperature at 0.3 V input voltage and supply voltage.
This shows that with increasing in temperature, current increases exponentially and produces high power consumption.


MEAN ERROR AND VARIANCE IN APPROXIMATE ADDITION
In this section, we show mean error and mean variance in approximate addition.
Fig. 14. Mean error in approximate addition.
Fig. 15. Error variance in approximate addition.
We also related the mean error for the proposed approximations with truncation, another wellknown approximation technique. From the graph, it is clear that the proposed approximations are better than truncation in terms of mean error. Approximation 1 is the best (having an mean error of zero), where Approximation 4 is the poor.

Modelling Power Consumption of Approximate Adders
Now we calculate simple mathematical models for estimating the power consumption of an approximate RCA. Let and be the gate capacitance of a lower size nMOS and pMOS transistor, respectively. Obviously, let and
be the drain diffusion capacitances respectively. If the
pMOS transistor has three times the width of the nMOS transistor, then 3 and 3 . Let us also consider that . In a multilevel adder tree, the Sum bits of intermediate results become the input bits A and B for the subsequent adder level. The output capacitance at every Sum node is + . The schematic of the conventional MA in Fig. 1 is used to calculate the input capacitances at nodes A,B and . Thus, the total capacitance at node A shall be written as (Cdn + Cdp ) + 4(Cgn + Cgp ) 20Cgn .
TABLE V
Capacitances for Different Approximations
Approximation Technique
Node A
Node B
Node Cin
Approximation 1
12
15
13
Approximation 2
12
12
8
Approximation 3
8
11
8
Approximation 4
12
8
9
Approximation 5
4
8
0
Obviously, thetotal capacitance at node B is ( + )+ 4( + ) 20 , while the capacitance at node is ( + ) + 3( + ) 16 . Continuing this way, the total capacitances at nodes A,B and for all approximations can be calculated by their transistorlevel schematics. Table V gives these values (normalized with respect to ). Note that [1], [2]. . . [y 1] are not computed in approximation 5 (if y bits are approximate). Thus, the capacitance at node for approximation 5 is zero. Therefore, we will use normalized capacitances for all our subsequent discussions.
Since VDD 1/delay, the scaled voltage is given by
= (1(yp/ ))
VDD is the nominal voltage (perfect case), Tc = 1/fc is the clock period, and fc is the operating frequency. The term p can be estimated by simulating an Nb approximate RCA for different values of y.
Fig. 16. Comparison with partial products approach.
DDapp
DDapp
Hence, the same value of p shall be used for determining the scaled voltage as a function of y in an adder tree. Using the above equations, a first order estimate Papp for the power consumption of an approximate RCA shall be written as
The errorpower tradeoff for the proposed approximate multipliers were compared with the data in [1].In Fig. 16 we plot percentage power reduction versus mean error. We find that approximation 4 performs better than the partial product approach for higher mean errors. Also, the partial product approach reaches the saturation point quicker than approximations 4 and 5.


LEAST MEAN SQUARE ALGORITHM

Adaptive filter algorithms
An adaptive filter is a timevariant filter whose coefficients are adjusted to optimize a cost function or to satisfy some predetermined optimization criterion. Two main Characteristics of adaptive filters are they can automatically adapt (selfoptimize) in the face of changing environments and changing system requirements. They can be trained to perform particular filtering and decisionmaking tasks according to some updating equations (training rules). It can subsequently operate in, changing environments (e.g. signal detection in wireless channel) . nonstationary signal/noise conditions (e.g. LPC of a speech signal). timevarying parameter estimation (e.g. position tracking of a moving source)
It simply changes the cost function (n) = E [e2 (n)] by its instantaneous coarse estimate. The error estimation e(n) is
e (n) = d(n) w(n) X(n) Coefficient updating equation is
w (n+1) = w(n) + x(n) e(n),
Where is an appropriate step size will be chosen as 0 < <
0.2 for the convergence of the algorithm. The larger step sizes make the coefficients to fluctuate and eventually become unstable.

Noise canceller
The noise cancellers are used to cancel intense background noise. This configuration can be applied in mobile phones and radio communications, where in some situations these devices are used in highnoise environments. An adaptive noise cancellation system is shown in fig.6.
Adaptive filter
y(n)
Adaptive filter
y(n)
d(n)+r(n)
Papp = (1/2)CswV
2fc , which is a function of y.
r'(n)

Comparison With Approximate Multipliers
Approximate multipliers based on Karnaugh map simpli fication had proposed in [1]. There the errors are introduced through partial products. We constructed an 8 Ã— 8 multiplier using the approximate FA cells. The accumulation of partial
F ig.17. Adaptive Noise Canceller ystem
e(n)
Adaptive Algorithm
products (accurate) was done by using a carrysave tree followed by an RCA (approximate). The power consumption was computed by running exhaustive nanosim simulations for both accurate and approximate 8 Ã— 8 multipliers. The mean errors for the respective case were obtained using MATLAB.
The canceller employs a directional microphone to calculate and estimate the instantaneous amplitude of ambient noise r(n), and another microphone used to take the speech signal which is contaminated with noise d(n) + r(n). The ambient noise is processed by the adaptive filter to make it same as
the noise contaminating the speech signal, and then is subtracted to eliminate the noise in the desired signal. In order for effective noise cancellation the ambient noise should be highly correlated with the noise components in the speech signal, if there is not an access to the instantaneous value of the contaminating signal, the noise does not be cancelled out, but it should be reduced using the statistics of the signal and the noise process.
1
0.5
0
0.5
1
1
0.8
0.6
0 0.5 1 1.5 2 2.5
4
x 10

Audio Error Input
0.4
Speech Signals: A speech signal contains of three classes of sounds. There are voice, fricative and plosive sounds. Voiced sounds are generated by excitation of the vocal tract with
0.2
0
0.2
0.4
0.6
0.8
1
0 0.5 1 1.5 2 2.5
4
x 10
quasiperiodic pulses of airflow. Fricative sounds are formed by constrict in the vocal tract and passing air via it, causing turbulence that results in a noiselike sound
1
0.8
0.6
0.4
0.2
0
0.2
0.4
0.6
0.8
1

Desired Signal
0 0.5 1 1.5 2 2.5
4
x 10
1
0.8
0.6
0.4
0.2
0
0.2

Output for LMS Algorithm
F ig.18. Speech Signal
speech signal can be assumed as a linear composite of the above three classes of sound, each of the sounds are stationary and remains fairly constant over intervals of the order of 40 ms. By using matlab real time voice signal is recorded for 3 seconds and it will generate 24000 samples in the voice signal.
The adaptive white Gaussian noise (AWGN) is mixed
0.4
0.6
0.8
1
0 0.5 1 1.5 2 2.5
4
x 10

Error Signal for LMS Algorithm

Fig. 20.Output Waveform for Noise Cancellation


CONCLUSION
with the recorded signal to generate the error input, along with that desired signal is generated. This error input signal and desired signal are stored in text file saved as errorsignal.txt and desiredsignal.txt. These text files will be processed by VHDL code in Xilinx 13.1 and simulated using Modelsim. During simulation it produces another two text file named as error.txt and output.txt. At last these test files are read by matlab and generate the ouput signal and error signal.
Fig. 19. Experimental Setup for Adaptive Noise Cancellation
The real time voice signal is fed to the matlab code and it produces 24000 samples. These samples are synthesized for different adaptive algorithm such as LMS, SLMS, SSLMS and RLS for different Âµ values. The resulting outputs are shown in the figure 20.
In this paper, we proposed various imprecise or approximate adders that can be effectively utilized to trade off power and quality for errorresilient systems. Our approach focused to simplify the complexity of a conventional MA cell by reducing the number of transistors and also the load capacitances. Note that our approach differed from earlier approaches where errors were introduced due to VOS. A decrease in the number of series connected transistors helped in decreasing the effective switched capacitance and achieving voltage scaling. Using these models, we discussed how to apply these approximations to get maximum power savings subject to a given quality constraint. This procedure has been illustrated for Adaptive Filter in Noise Cancellation algorithms (LMS algorithm). We believe that the proposed approximate adders can be used on top of previously existing lowpower techniques like SDC and ANT to extract multifold benefits with a very minimal loss in output quality.
APPENDIX
DERIVATION OF MEAN ERROR FOR AN APPROXIMATE RCA
As mentioned in Section IVA, the expression for mean error is given by
(y) = E()
=
=
=0
2 E(e[x])
Clearly, e 1, 0, 1 . Thus, E(e[x]) forall
x 0, 1, , can be calculated as following:
E(e[x]) = P(e[x] = 1) – P(e[x] = 1)
= P( [] = 1 ` P Sum [] = 0 )
– P( [] = 0 P Sum [] = 1 )
= (1 – ) – (1 )
= –
2
2
= – 1 , x 0, 1, , 1
E(e[y]) = P( [] = 1 P( [] = 0
P( [] = 0 P( [] = 1
= –
+1
+1
= – 1 1
2 2
Expressions for and were derived in Section IVA. Thus, the mean error for the proposed approximations and truncation can be calculated as follows.

Approximation 1
(y) = – 1 1 2 1 1 1 + (1 1 + 1 )2
REFERENCES

P. Kulkarni, P. Gupta, and M. Ercegovac, Trading accuracy for power with an underdesigned multiplier architecture, in Proc. 24th IEEE Int. Conf. VLSI Design, Jan. 2011, pp. 346351.

V. Gupta, D. Mohapatra, S. P. Park, A. Raghunathan, and K. Roy, IMPACT: Imprecise adders for lowpower approximate computing, in Proc. IEEE/ACM Int. Symp. LowPower Electron. Design, Aug. 2011, pp. 409414.

D. Shin and S. K. Gupta, Approximate logic synthesis for error tolerant applications, in Proc. Design, Automat. Test Eur., 2010, pp. 957960.

H. R. Mahdiani, A. Ahmadi, S. M. Fakhraie, and C. Lucas, Bio inspired imprecise computational blocks for efficient VLSI implementation of softcomputing applications, IEEE Trans. Circuits Syst. Part I, vol. 57, no. 4, pp. 850862, Apr. 2010.

Y. V. Ivanov and C. J. Bleakley, Realtime h.264 video encoding in software with fast mode decision and dynamic complexity control, ACM Trans. Multimedia Comput. Commun. Applicat., vol. 6, pp. 5:15:21, Feb. 2010.

M. Shafique, L. Bauer, and J. Henkel, enBudget: A runtime adaptive predictive energybudgeting scheme for energyaware motion estimation in H.264/MPEG4 AVC video encoder, in Proc. Design, Automat. Test Eur., Mar. 2010, pp. 17251730.

E. Lyons, V. Ganti, R. Goldman, V. Melikyan, and H. Mahmoodi,
6 =0
1
3 =0 2
2
6 3.221
2 1
2+1
1
Fullcustom design project for digital VLSI and IC design courses using synopsys generic 90nm CMOS library, in Proc. IEEE Int.
(y) = –
6
(2 1)
3
(1 2 ) + (
6
3.2
21 + 2 )
Conf. Microelectron. Syst. Edu., Jul. 2009, pp. 4548.

J. Choi, N. Banerjee, and K. Roy, Variationaware lowpower synthesis methodology for xedpoint FIR lters, IEEE Trans.
= 0.


Approximation 2
Comput.Aided Des. Integr. Circuits Syst., vol. 28, no. 1, pp. 87 97, Jan. 2009.

G. Karakonstantis, D. Mohapatra, and K. Roy, System level DSP synthesis using voltage overscaling, unequal error protection and adaptive quality tuning, in Proc. IEEE Workshop Signal
(y) = 1 1 Ã— 2 + 0
Processing Systems,Oct. 2009, pp. 133138.
=0 2+2
= .
4


Approximation 3
(y) = – 1 1 2 + 1 1 1
+ (1 1 + 1
)2

W. Dally, J. Balfour, D. BlacShaffer, J. Chen, R. Harting, V. Parikh,J. Park, and D. Shefeld, Efcient embedded omputing, Computer,vol. 41, no. 7, pp. 2732, Jul. 2008.
6 =0
3 =0 2+1
6 3.221
2+1
(y) = – 1 (2 1) + 1 (1 2 ) + (2 1
+ 1 )
6 3
= 1 2 .


Approximation 4
(y) = – 1 + 1 1 2 + 1 Ã— 1
6 3.221 2
Ã— 2
2 8 =1
2 2+1
= 121 .
4

Approximation 5
+1
+1
(y) = 0 + 1
2
Ã— 2
= 1 .
2

Approximation 6
y = 1 1 2 + (1 1
)2
2 =0
2 2+1
= 1 2