 Open Access
 Authors : Hari Krishna Raja V S, Christina Jesintha R, Harish I
 Paper ID : IJERTV4IS030016
 Volume & Issue : Volume 04, Issue 03 (March 2015)
 DOI : http://dx.doi.org/10.17577/IJERTV4IS030016
 Published (First Online): 10-03-2015
 ISSN (Online) : 2278-0181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
Implementation and Impact of LNS MAC Units in Digital Filter Application
Hari Krishna Raja V. S.*, Christina Jesintha R.* and Harish I.*
*Department of Electronics and Communication Engineering, Sri Shakthi Institute of Engineering and Technology, Coimbatore, Tamil Nadu, India – 641062
Abstract: The logarithmic number system (LNS) is an efficient way to represent data in VLSI processors because its round-off error behavior resembles that of floating-point arithmetic. LNS reduces power dissipation in signal-processing-related applications such as hearing-aid devices, video processing, and error control. This paper presents techniques for low-power addition/subtraction in the LNS and quantifies their impact on digital filter VLSI implementation. Addition and subtraction are difficult to perform in LNS because complex look-up tables (LUTs) are needed. The impact of partitioning the look-up tables required for LNS addition/subtraction on complexity, performance, and power dissipation is quantified. The LNS base and the LNS word length are the two design parameters exploited to minimize complexity. A round-off noise model is used to demonstrate the impact of base and word length on the SNR of the output of FIR filters. In addition, techniques for low-power implementation of LNS multiply-accumulate (MAC) units are investigated. The proposed techniques can be extended to co-transformation-based circuits that employ interpolators. The results are demonstrated by evaluating the power dissipation, complexity, and performance of several FIR filter configurations comprising one, two, or four MAC units. Simulation of placed and routed VLSI LNS-based digital filters using Xilinx ISE reveals that significant power dissipation savings are possible by using optimized LNS circuits at no performance penalty, when compared to linear fixed-point two's-complement equivalents.
Keywords: Computer arithmetic, MAC, LNS, LUT

INTRODUCTION
Data representation is an important parameter in the design of low-power processors since it affects both the switching activity and the hardware complexity [4]. The logarithmic number system (LNS) has been investigated as an efficient way to represent data in special-purpose VLSI processors, since it allows for simple arithmetic circuits under certain conditions. In particular, LNS exploits the properties of the logarithm to reduce the basic arithmetic operations of multiplication, division, roots, and powers to binary addition, subtraction, and right and left shifts, respectively. In addition to simplifying several operations, LNS provides efficient data representation because its round-off error behavior resembles that of floating-point arithmetic. In fact, LNS-based systems have been proposed that exhibit characteristics similar to the 32-bit single-precision floating-point representation [1]. The operations of addition and subtraction are rather awkward to perform in LNS, as complex look-up tables (LUTs) or other approximation circuitry are needed. While simple LUT-based techniques suffice for short word lengths, more elaborate approximation techniques are required for longer word lengths.
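The reduction of operations described above can be illustrated with a minimal Python sketch. It assumes base 2 and a fixed-point logarithm with a hypothetical 12 fractional bits; the names FRAC_BITS, to_log, and from_log are illustrative and do not come from the paper.

```python
import math

FRAC_BITS = 12  # hypothetical number of fractional bits of the fixed-point logarithm


def to_log(value):
    """Encode a positive real as a base-2 logarithm stored as a scaled integer."""
    return round(math.log2(value) * (1 << FRAC_BITS))


def from_log(code):
    """Decode a scaled-integer base-2 logarithm back to a real value."""
    return 2.0 ** (code / (1 << FRAC_BITS))


x, y = to_log(6.0), to_log(1.5)

product = from_log(x + y)    # multiplication becomes binary addition
quotient = from_log(x - y)   # division becomes binary subtraction
root = from_log(x >> 1)      # square root becomes a right shift by one
square = from_log(x << 1)    # squaring becomes a left shift by one
```

The decoded results differ slightly from the exact values because the logarithm is quantized to FRAC_BITS fractional bits.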

Existing System
Several authors have proposed solutions to reduce the complexity of awkward LNS operations. Mahalingam et al. [11] improve Mitchell's algorithm in terms of the accuracy of the logarithmic operations, while Johansson et al. [10] use a method based on sums of bit products to implement the basic logarithmic functions. Arnold et al. [2] suggest the use of co-transformations for the reduction of the LUT. Very recently, Ismail et al. [9] presented a co-transformation procedure and an improved interpolation method that reduce the size of the LUTs to an extent that allows their easy synthesis in logic. Arnold et al. [3] propose complex LNS as a generalization of LNS, which represents complex values in log-polar form.
For several practical applications, the benefits of LNS are found to be more important than its inherent disadvantages. In particular, several authors have shown that LNS reduces power dissipation in signal-processing-related applications, ranging from hearing-aid devices and sub-band coding to video processing and error control. Moreover, logarithmic techniques have been employed in turbo code decoding for wireless communication applications. In particular, logarithmic representation has been proved to be suited to the implementation of the symbol-by-symbol logarithmic maximum a posteriori algorithm used for iterative decoding. Peng et al. have adopted LNS for the implementation of an FFT-based log-sum-product decoding algorithm used in the decoding of non-binary low-density parity-check codes. In particular, the impact of the selection of the base b of the logarithm has been investigated as a means to explore trade-offs between precision and dynamic range given a particular word length. Paliouras et al. address the low-power LNS properties from a representational viewpoint and do not focus on power dissipation estimation data obtained by circuit simulations.

Proposed System
The proposed study focuses on the use of partitioning as a technique to limit the exponential growth of the size of LUTs with the word length. The technique is simple and leads to fast circuits. Initially, the optimal selection of LNS design parameters is sought, including word length and base, assuming a simple partitioned LUT architecture. Subsequently, in the second stage of the proposed framework, the design techniques and the derived architectures are presented, targeting LNS MAC units in 90 nm technology. To illustrate the use of the proposed framework, the area-time-power design space of a low-pass finite impulse response (FIR) filter is explored for several configurations of MAC units. A similar study has recently been performed by Galal and Horowitz in the different context of floating-point arithmetic [7]. Departing from direct LUT organization, a variety of LNS architectures for addition, subtraction, and multi-operand operations have been proposed in the literature, employing interpolation and linear or polynomial approximation to reduce the memory requirements, particularly for larger word lengths such as 32 bits [8]. These ideas have been combined with mathematical decompositions and transformations of the basic operations, exploiting the particular characteristics of the functions to further simplify approximation [9]. Beyond the representational properties of LNS that have an impact on switching activity, LNS arithmetic units have structural characteristics that can be exploited to reduce the power dissipated. In particular, they comprise

mutually exclusive subunits, which can be used selectively, and

imbalanced delay paths.
Therefore, simple low-power design techniques are found to suit an LNS adder/subtractor organization very well. The impact is quantified for the case of look-up-based architectures, but other LNS architectures may also benefit in terms of reduced power dissipation. Extension to other architectures is demonstrated using an interpolation-based subtractor as an illustrative example. Partitioned LUT circuits provide high speed; more sophisticated techniques can be used to reduce size. In summary, the contributions of this paper are as follows:

a low-power design framework for LNS systems,

the quantification of the power dissipation reduction and performance improvement made possible by using LNS, compared to equivalent binary implementations,

the design space exploration using the number of LUTs for addition/subtraction as a parameter, for the case of using combinational logic for LUT implementation, and

the extension of SNR models in LNS to the case of $b \neq 2$.


LNS BASICS
The basic idea in LNS is to use logarithms to represent data. Since the logarithm of a negative number is not real, signed numbers are represented in LNS by storing the sign information as a separate bit $s_x$, which is used in combination with the logarithm of the magnitude of the number. Furthermore, since the logarithm of zero is not finite, an additional single-bit flag $z_x$ is used to denote that a number is zero. In summary, $X$ denotes the original number, $x$ denotes the logarithm of the absolute value of $X$, and $X_{\mathrm{LNS}}$ is a triplet containing the zero bit, the sign bit, and $x$. Formally, in LNS a number $X$ is represented as the triplet
$$X_{\mathrm{LNS}} = (z_x, s_x, x), \tag{1}$$
where $z_x$ is asserted when $X$ is zero, $s_x$ is the sign of $X$, and $x = \log_b |X|$ if $X$ is not zero, with $b$ being the base of the logarithm, also called the base of the representation. The choice of $b$ plays a crucial role in the representational capabilities of the triplet in (1), as well as in the computational complexity of the processing and of the forward and inverse conversion circuitry. Due to the basic properties of the logarithm, the multiplication of $X_{\mathrm{LNS}}$ and $Y_{\mathrm{LNS}}$ is reduced to the computation of the triplet $Z_{\mathrm{LNS}}$,
$$Z_{\mathrm{LNS}} = (z_z, s_z, z), \tag{2}$$
where $z_z = z_x \lor z_y$, $s_z = s_x \oplus s_y$, and $z = x + y$. Similarly, division reduces to binary subtraction. The derivation of the logarithm $a$ of the sum $A$ of two triplets is more involved, as it relies on the computation of
$$a = \max\{x, y\} + \log_b\left(1 + b^{-|x-y|}\right) \tag{3}$$
$$\phantom{a} = \max\{x, y\} + \phi_a(d), \tag{4}$$
where $\phi_a(d) = \log_b(1 + b^{-d})$ and
$$d = |x - y|. \tag{5}$$
Similarly, the derivation of the difference of two numbers requires the computation of
$$c = \max\{x, y\} + \log_b\left(1 - b^{-|x-y|}\right) \tag{6}$$
$$\phantom{c} = \max\{x, y\} + \phi_s(d), \tag{7}$$
where $\phi_s(d) = \log_b(1 - b^{-d})$.
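The triplet representation and the operations in (1) to (7) can be sketched in Python as follows. This is a behavioural sketch under stated assumptions, not the paper's circuit: it assumes base b = 2 and keeps the logarithm as a floating-point number instead of a fixed-point word, and it evaluates the logarithmic terms directly instead of reading them from LUTs. The names BASE, LNS, encode, decode, multiply, and add are illustrative.

```python
import math
from typing import NamedTuple

BASE = 2.0  # assumed base b of the representation


class LNS(NamedTuple):
    zero: bool   # zero flag z_x
    sign: int    # sign bit s_x (0 for positive, 1 for negative)
    log: float   # x = log_b |X|, meaningful only when zero is False


def encode(value):
    if value == 0:
        return LNS(True, 0, 0.0)
    return LNS(False, int(value < 0), math.log(abs(value), BASE))


def decode(n):
    return 0.0 if n.zero else (-1.0) ** n.sign * BASE ** n.log


def multiply(p, q):
    # Eq. (2): OR of the zero flags, XOR of the signs, sum of the logarithms.
    return LNS(p.zero or q.zero, p.sign ^ q.sign, p.log + q.log)


def add(p, q):
    if p.zero:
        return q
    if q.zero:
        return p
    d = abs(p.log - q.log)                      # Eq. (5)
    larger = p if p.log >= q.log else q         # supplies max{x, y} and the result sign
    if p.sign == q.sign:                        # same signs: Eqs. (3)-(4)
        return LNS(False, larger.sign, larger.log + math.log(1.0 + BASE ** -d, BASE))
    if d == 0:                                  # equal magnitudes, opposite signs
        return LNS(True, 0, 0.0)
    # opposite signs: Eqs. (6)-(7)
    return LNS(False, larger.sign, larger.log + math.log(1.0 - BASE ** -d, BASE))
```

For example, `decode(add(encode(3.0), encode(-5.0)))` evaluates the subtraction branch and returns a value close to -2, up to floating-point rounding.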
Assume that a two's-complement (TC) word is used to represent the logarithm $x$, composed of a $k$-bit integral part and an $l$-bit fractional part.
The range $D_{\mathrm{LNS}}$ spanned by $x$ is given by
(8)
a linear TC representation of i integral bits and f fractional bits. In general, LNS offers a superior range over the linear TC representation. This is achieved using comparable word lengths, by departing from the strategy of equispaced representable values and thus resorting to a scheme that resembles floating-point arithmetic. The basic organization of an LNS adder/subtractor is shown in Fig. 1. Two parallel subtractions are implemented, followed by a multiplexer, which computes $d$ according to the rule
$$s_1 = x - y, \tag{9}$$
$$s_2 = y - x, \tag{10}$$
$$d = \begin{cases} s_1, & s_1 \geq 0, \\ s_2, & \text{otherwise.} \end{cases} \tag{11}$$
Fig. 1. The organization of an LNS adder/subtractor
The choice exploits the sign of either (9) or (10) as the select signal for the multiplexer. The same signal is used to select the maximum of x and y, required for the computation of (4) and (7). The complexity of LNS circuitry arises from the fact that the values of the addition and subtraction functions in (4) and (7) must be computed by the LNS addition/subtraction hardware for all required values of d. There are two main approaches to evaluating these functions: the hardware implementation of an approximation algorithm, or the offline precomputation and storage of all required values in an LUT. The former approach is generally adopted for high-precision applications, while the latter is generally preferable for smaller word lengths, i.e., in relatively low-precision applications where the size of the required LUTs is moderate. Both approaches have been extensively studied in the context of elementary function approximation. Let $x$ denote the base-$b$ logarithm of $X$ and $x_2$ denote the base-2 logarithm of $X$. Since $x = \log_b X = x_2 \log_b 2$, the conversion between a base-$b$ LNS and a base-2 LNS requires scaling by a constant factor. Several authors have studied hardware implementations of converters to/from base-2 LNS. In this paper, although conversion is neglected, the conclusions about power consumption are valid for the complete application. To clarify this, assume an FIR filter of order $N$, requiring about $N$ MAC operations for each input conversion and each output conversion. If $E_{in}$ is the average energy for one input conversion, $E_{out}$ is the average energy for one output conversion, and $E_y$ is the average energy for one MAC, the total energy (after initialization) for the FIR filter to produce each result is $E_{FIR} = E_{in} + E_{out} + N E_y$. For sufficiently large values of $N$, the share of energy consumed in the multiply-add units approaches 100 percent of the total, since $\lim_{N \to \infty} N E_y / E_{FIR} = 1.0$.
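The base conversion and the energy argument above can be checked numerically with a short Python sketch. The function names and the energy values used in the usage note are hypothetical and serve only as an illustration; they are not measurements from the paper.

```python
import math


def rescale_log(x2, base):
    """Convert a base-2 logarithm x2 to a base-b logarithm: x = x2 * log_b(2)."""
    return x2 * math.log(2.0, base)


def mac_energy_fraction(n_taps, e_in, e_out, e_mac):
    """Share of E_FIR = E_in + E_out + N * E_y that is spent in the N MAC operations."""
    e_fir = e_in + e_out + n_taps * e_mac
    return n_taps * e_mac / e_fir
```

With assumed energies of 5 units per conversion and 1 unit per MAC, `mac_energy_fraction(10, 5.0, 5.0, 1.0)` gives 0.5, while the same call with 1000 taps gives a share above 0.99, which is the trend the limit expresses.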

LOWPOWER DESIGN OF LNS CIRCUITS
In this section, low-power LNS architectures for addition and subtraction are presented. The memory structure is organized as a collection of LUTs and is the most complex part of the LNS adder/subtractor. Several designs were investigated, distinguished by two choices: first, whether latches or D flip-flops (DFFs) are used to freeze the addresses of inactive sub-LUTs, and second, whether the active sub-LUT is selected based on the most significant bits (MSBs) or on the least significant bits (LSBs) of d in (11). In the proposed design framework, power dissipation reduction is sought by partitioning the particular LUTs into smaller LUTs, called sub-LUTs, only one of which is active per operation. This organization is shown in Fig. 1. To guarantee that no dynamic power is dissipated in the inactive sub-LUTs, the corresponding sub-LUT addresses are latched and remain constant throughout a particular operation. Complexity reduction in LNS processors by partitioning of the LUTs has been successfully applied. Here, we focus on combinational logic implementation of LUTs, instead of memory-based implementation. The organization of the LNS adder/subtractor comprises N sub-LUTs per operation, as shown in Fig. 1. The upper sub-LUT system corresponds to the function in (4) required for LNS addition, i.e., addition of operands having the same sign, while the lower sub-LUT system is used for LNS subtraction, i.e., addition of operands of different signs.
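The partitioning idea can be modelled behaviourally as in the Python sketch below. It assumes base 2, a table of samples of the addition function in (4), sub-LUTs of equal power-of-two size, and MSB-based selection; the class and function names are hypothetical. The counters only record which sub-LUT sees a new address, as a rough stand-in for switching activity, and do not estimate power.

```python
import math


def build_add_table(entries, frac_bits):
    """Samples of log2(1 + 2**-d) for d = i / 2**frac_bits, rounded to frac_bits bits."""
    scale = 1 << frac_bits
    return [round(math.log2(1.0 + 2.0 ** (-i / scale)) * scale) for i in range(entries)]


class PartitionedLUT:
    """One table split into equally sized sub-LUTs selected by the address MSBs."""

    def __init__(self, table, n_sub):
        assert len(table) % n_sub == 0
        self.size = len(table) // n_sub
        self.subluts = [table[i * self.size:(i + 1) * self.size] for i in range(n_sub)]
        self.latched = [0] * n_sub   # address currently held at each sub-LUT input
        self.updates = [0] * n_sub   # number of times each input latch was opened

    def read(self, address):
        # For a power-of-two sub-LUT size, the quotient is the MSB field of the
        # address and the remainder is the LSB field.
        select, offset = divmod(address, self.size)
        self.latched[select] = offset    # only the selected sub-LUT sees a new address
        self.updates[select] += 1
        return self.subluts[select][offset]
```

After `lut = PartitionedLUT(build_add_table(64, 4), 4)` and reads at addresses 0 and 40, only the first and third sub-LUTs have been updated; the inputs of the other two keep their previous value.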

Organization And Complexity Of LUT Subsystem In LNS Adder/Subtractor
Assume that b denotes the logarithmic base and l is the number of fractional bits employed in the representation of the logarithms. Differences among the values stored in LUT2 are limited to their less significant part, so only the less significant part needs to be stored for each value. Hence, fewer bits per entry need to be stored in LUT2 than in LUT1. Sub-LUTs that correspond to the upper parts of the interval need to store data words of reduced length, since the stored values share a common most significant part. The possibility of determining the active sub-LUT using the LSBs of d is of interest, as the LSBs are available early in the computation of d, thus allowing the fast generation of selection signals. However, a partitioning scheme based on LSBs does not facilitate memory compression, since consecutive function samples are stored in different sub-LUTs.
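The effect of the partitioning scheme on storage can be illustrated with the following Python sketch. It assumes base 2, a 256-entry table of the addition function with 6 fractional bits, and four sub-LUTs; all names and sizes are hypothetical. The storage metric counts, for each sub-LUT, only the low-order bits that are not common to all of its entries, which is one simple way to model the compression described above.

```python
import math


def build_add_table(entries, frac_bits):
    """Samples of log2(1 + 2**-d) for d = i / 2**frac_bits, rounded to frac_bits bits."""
    scale = 1 << frac_bits
    return [round(math.log2(1.0 + 2.0 ** (-i / scale)) * scale) for i in range(entries)]


def stored_bits(entries):
    """Bits per entry once the most significant part shared by all entries is factored out."""
    differing = 0
    for value in entries:
        differing |= value ^ entries[0]
    return differing.bit_length()


def storage(subluts):
    """Total number of stored bits over all sub-LUTs."""
    return sum(len(sub) * stored_bits(sub) for sub in subluts)


def msb_partition(table, n_sub):
    """Consecutive samples stay together: sub-LUT chosen by the address MSBs."""
    size = len(table) // n_sub
    return [table[i * size:(i + 1) * size] for i in range(n_sub)]


def lsb_partition(table, n_sub):
    """Consecutive samples are interleaved: sub-LUT chosen by the address LSBs."""
    return [table[i::n_sub] for i in range(n_sub)]


table = build_add_table(256, 6)
msb_bits = storage(msb_partition(table, 4))
lsb_bits = storage(lsb_partition(table, 4))
```

For this table, `msb_bits` is smaller than `lsb_bits`, because each LSB-selected sub-LUT spans the whole value range and shares almost no most significant part.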

Implementation Of LNS Adder/Subtractor
Fig. 2. Four-latch organization using the LSB
Fig. 2 depicts an architecture using one-bit LSB selection for LUT partitioning. Latches are connected to the inputs of the sub-LUTs. The sign s and d0 are used to generate a signal that enables the latches at the input of the sub-LUT that must be activated for a particular computation. The LSB should reach the latches fast enough, considering the additional delay of computing x - y, to avoid violating the timing constraints defined by the standard-cell library.
Fig. 3 depicts a DFF-based architecture. A latch-based gated clock is used for the DFFs, since additional signals are used to enable the corresponding flip-flops. This offers a significant advantage: gated clocks achieve further power savings, and setup and hold time violations are easier to resolve because glitches are avoided.
Since using the MSB for LUT selection is not efficient in a latch-based design, due to the additional hardware needed to introduce the required delay on the fast paths of the circuit, a solution based on DFFs is preferable.
Fig. 3. DFF organization using the MSB


PROPOSED LNS MAC ARCHITECTURES
Fig. 4. Organization of the single-MAC architecture.
The basic structure of the single-MAC unit is shown in Fig. 4. Symbols * and + denote a multiplier and an adder, respectively, while D denotes a delay unit, implemented as a register. The LNS equivalent of the single-MAC architecture is depicted in Fig. 5, where the binary multiplier has been replaced by an adder, and the binary adder is mapped to an LNS adder/subtractor. The LNS adder/subtractor is augmented with saturation circuitry and exploits a zero flag to avoid unnecessary activation of LUT partitions and further reduce power dissipation.
In the implementation of Fig. 5, the paths to the inputs of the final adder are evidently not balanced, which leads to excessive switching activity at the adder following the memory structure.
Fig. 5. LNS MAC unit.
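The mapping from a linear MAC to an LNS MAC can be sketched behaviourally in Python as follows. The sketch assumes base 2, a floating-point logarithm, direct evaluation of the logarithmic terms in place of LUTs, and a hypothetical upper saturation bound LOG_MAX; the names encode, decode, lns_mac, and fir_output are illustrative and do not describe the synthesized circuit.

```python
import math

B = 2.0          # assumed base of the representation
LOG_MAX = 15.0   # hypothetical saturation bound on the logarithm


def encode(value):
    """Return the triplet (zero flag, sign bit, log_b |value|)."""
    if value == 0:
        return (True, 0, 0.0)
    return (False, int(value < 0), math.log(abs(value), B))


def decode(n):
    zero, sign, log = n
    return 0.0 if zero else (-1.0) ** sign * B ** log


def lns_mac(acc, coeff, sample):
    """Compute acc + coeff * sample with all operands in LNS."""
    if coeff[0] or sample[0]:
        return acc                      # zero flag set: product is zero, adder not used
    p_sign = coeff[1] ^ sample[1]       # sign of the product
    p_log = coeff[2] + sample[2]        # the multiplier is replaced by an adder
    if acc[0]:
        return (False, p_sign, min(p_log, LOG_MAX))
    d = abs(acc[2] - p_log)
    big_sign, big_log = (acc[1], acc[2]) if acc[2] >= p_log else (p_sign, p_log)
    if acc[1] == p_sign:
        log = big_log + math.log(1.0 + B ** -d, B)   # LNS addition
    elif d == 0:
        return (True, 0, 0.0)                        # exact cancellation
    else:
        log = big_log + math.log(1.0 - B ** -d, B)   # LNS subtraction
    return (False, big_sign, min(log, LOG_MAX))      # saturate the result


def fir_output(coeffs, window):
    """One FIR output sample computed with a single LNS MAC unit."""
    acc = encode(0.0)
    for c, x in zip(coeffs, window):
        acc = lns_mac(acc, encode(c), encode(x))
    return decode(acc)
```

For instance, `fir_output([0.25, 0.5, 0.25], [1.0, -2.0, 4.0])` accumulates 0.25 - 1 + 1 and returns a value close to 0.25, up to floating-point rounding.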

RESULT ANALYSIS
Fig. 6. Simulation of LNS MAC
The result analysis compares the area, power, and timing of the single-MAC architecture and the LNS MAC architecture. According to the synthesized results, the LNS MAC architecture achieves better performance. The simulation result for the LNS MAC units, obtained using Xilinx ISE, is shown in Fig. 6.

CONCLUSION
The adoption of LNS can lead to very efficient circuits for digital filtering applications when the logarithmic base and the word length are appropriately selected, outperforming circuits based on TC arithmetic in a contemporary 90 nm technology. An LNS-based system using the proposed adder/subtractor offers substantial power dissipation savings at no performance penalty. Partitioning of the LUTs is employed to create parts of the circuit that can be independently activated, thus reducing power dissipation. Power has been reduced by latching the inputs to the LUTs. Furthermore, the gated clock technique has been used to further reduce the power consumed at the latched inputs due to the clock signal. It has been shown that the number of sub-LUTs is an important design parameter that can be employed for exploration of the area, time, and power design space. Furthermore, the application of retiming is particularly useful in avoiding unnecessary switching activity due to unbalanced delay paths in LNS arithmetic circuits. It has been shown that, by properly defining the word length, base, circuit architecture, and LUT organization, LNS-based MACs can outperform the corresponding TC ones in both power and delay for specific practical word lengths.
ACKNOWLEDGMENT
The authors would like to thank the reviewers for their comments, which helped improve the presentation of this work.
REFERENCES
[1]. Arnold M. G., Bailey T. A., Cowles J. R., and Winkel M. D., (1992) Applying Features of the IEEE 754 to Sign/Logarithm Arithmetic, IEEE Trans. Computers, vol. 41, pp. 1040-1050.
[2]. Arnold M. G., Bailey T. A., Cowles J. R., and Winkel M. D., (1998) Arithmetic Co-Transformations in the Real and Complex Logarithmic Number Systems, IEEE Trans. Computers, vol. 47, no. 7, pp. 777-786.
[3]. Arnold M. and Collange S., (2011) A Real/Complex Logarithmic Number System ALU, IEEE Trans. Computers, vol. 60, no. 2, pp. 202-213.
[4]. Chen K. H. and Chiueh T. D., (2006) A Low-Power Digit-Based Reconfigurable FIR Filter, IEEE Trans. Circuits and Systems II: Express Briefs, vol. 53, no. 8, pp. 617-621.
[5]. Coleman J., Softley C., Kadlec J., Matousek R., Tichy M., Pohl Z., Hermanek A., and Benschop N., (2008) The European Logarithmic Microprocessor, IEEE Trans. Computers, vol. 57, no. 4, pp. 532-546.
[6]. Collange S., Detrey J., and de Dinechin F., (2006) Floating Point or LNS: Choosing the Right Arithmetic on an Application Basis, Proc. Ninth Euromicro Conf. Digital System Design (DSD '06), pp. 197-203.
[7]. Galal S. and Horowitz M., (2011) Energy-Efficient Floating-Point Unit Design, IEEE Trans. Computers, vol. 60, no. 7, pp. 913-922.
[8]. Henkel H., (1989) Improved Addition for the Logarithmic Number System, IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 37, no. 2, pp. 301-303.
[9]. Ismail R. C. and Coleman J. N., (2011) ROM-less LNS, Proc. IEEE Symp. Computer Arithmetic, pp. 43-51.
[10]. Johansson K. V., Gustafsson O. S., and Wanhammar L., (2008) Implementation of Elementary Functions for Logarithmic Number Systems, IET Computers and Digital Techniques, 2008.
[11]. Mahalingam V. and Ranganathan N., (2006) Improving Accuracy in Mitchell's Logarithmic Multiplication using Operand Decomposition, IEEE Trans. Computers, vol. 55, no. 12, pp. 1523-1535.