Low Latency and Power Efficient Approximate Multipliers using Compressors

S. Suguna¹
¹Assistant Professor,
Department of ECE, SVS College of Engineering,
Coimbatore, Tamilnadu, India

R. Kiruthika²
²PG Scholar,
Department of ECE, SVS College of Engineering,
Coimbatore, Tamilnadu, India

Abstract:- Approximate computing has been considered as a way to improve the accuracy-performance trade-off in error-tolerant applications. For many of these applications, multiplication is a key arithmetic operation. Given that approximate compressors are a key element in the design of power-efficient approximate multipliers, we first propose an initial approximate 4:2 compressor that introduces a rather large error at the output. In terms of the mean relative error distance (MRED), the most accurate of the proposed 16 × 16 unsigned designs has a 44% smaller power-delay product (PDP) than other designs with comparable accuracy. The radix-4 signed Booth multiplier constructed using the proposed compressor achieves a 52% reduction in the PDP-MRED product compared to other approximate Booth multipliers with comparable accuracy. The proposed multipliers outperform other approximate designs in image sharpening and Joint Photographic Experts Group (JPEG) applications by achieving higher quality outputs with lower power consumption.

Keywords: Approximate computing, low power, low area

1. INTRODUCTION

In applications such as multimedia signal processing and data mining, which can tolerate error, exact computing units are not always necessary and can be replaced with approximate counterparts. Research on approximate computing for error-tolerant applications is on the rise, and adders and multipliers form the key components in these applications. Approximate full adders have been proposed at the transistor level and utilized in digital signal processing applications; these full adders are used in the accumulation of partial products in multipliers. To reduce the hardware complexity of multipliers, truncation is widely employed in fixed-width multiplier designs, and a constant or variable correction term is then added to compensate for the quantization error introduced by the truncated part.

Approximation techniques in multipliers focus on the accumulation of partial products, which is crucial in terms of power consumption. In the broken array multiplier, the least significant bits of the inputs are truncated while forming the partial products to reduce hardware complexity. That multiplier saves few adder circuits in partial product accumulation. In other work, two designs of approximate 4:2 compressors are presented and used in the partial product reduction tree of four variants of an 8 × 8 Dadda multiplier. The major drawback of those compressors is that they give a nonzero output for zero-valued inputs, which largely affects the mean relative error (MRE), as discussed later. The approximate design proposed in this brief overcomes this drawback, which leads to better precision.
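The effect of input truncation on accuracy can be illustrated with a small software model. The following is a minimal sketch, not the design of any cited work: it assumes a simplified scheme in which the k least significant bits of each operand are zeroed before an exact multiplication, and it computes the mean relative error over all operand pairs, skipping pairs whose exact product is zero (an assumed convention, since the relative error is undefined there). The function names truncated_multiply and mean_relative_error are hypothetical.

```python
def truncated_multiply(a, b, k):
    """Approximate a * b by zeroing the k least significant bits of each operand.

    Simplified, assumed truncation scheme used only for illustration.
    """
    mask = ~((1 << k) - 1)  # all ones except the k lowest bits
    return (a & mask) * (b & mask)


def mean_relative_error(bits, k):
    """Mean of |exact - approx| / exact over all unsigned operand pairs.

    Pairs with a zero exact product are skipped (assumed convention).
    """
    total = 0.0
    count = 0
    for a in range(1, 1 << bits):
        for b in range(1, 1 << bits):
            exact = a * b
            approx = truncated_multiply(a, b, k)
            total += abs(exact - approx) / exact
            count += 1
    return total / count


if __name__ == "__main__":
    # Sweep the truncation depth for 8-bit unsigned operands.
    for k in range(4):
        print(f"k = {k}: MRE = {mean_relative_error(8, k):.4f}")
```

With k = 0 the model reduces to an exact multiplier. Increasing k removes partial products at the cost of a growing relative error, which is the trade-off that a correction term is meant to soften.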

2. SOFTWARE SECTION

Xilinx ISE (Integrated Software Environment) is a software tool produced by Xilinx for the synthesis and analysis of HDL designs. It enables the developer to synthesize ("compile") designs, perform timing analysis, examine RTL diagrams, simulate a design's reaction to different stimuli, and configure the target device with the programmer. The Web Edition is a free version of Xilinx ISE that provides synthesis and programming for a limited number of Xilinx devices; in particular, devices with a large number of I/O pins and large gate matrices are disabled.

The proposed approximation is utilized in two variants of 16-bit multipliers. Synthesis results reveal that the two proposed multipliers achieve power savings of 72% and 38%, respectively, compared to an exact multiplier. They also have better precision than existing approximate multipliers: the mean relative error figures are as low as 7.6% and 0.02% for the proposed approximate multipliers, which is better than in previous works.

The performance of the proposed multipliers is evaluated with an image processing application, where one of the proposed models achieves the highest peak signal-to-noise ratio.

2.1 MODEL SIM PE
The ModelSim PE simulator offers VHDL, Verilog, or mixed-language simulation. Coupled with the most popular HDL debugging capabilities in the industry, ModelSim PE is known for delivering high performance, ease of use, and outstanding product support. Model Technology’s award-winning Single Kernel Simulation (SKS) technology enables transparent mixing of VHDL and Verilog in one design. ModelSim’s architecture allows platform-independent compilation with the performance of native compiled code. An easy-to-use graphical user interface enables you to quickly identify and debug problems, aided by dynamically updated windows. For example, selecting a design region in the Structure window automatically updates the Source, Signals, Process, and Variables windows. These cross-linked ModelSim windows create a powerful, easy-to-use debug environment. Once a problem is found, you can edit, recompile, and re-simulate without leaving the simulator. ModelSim PE fully supports the VHDL and Verilog language standards and provides partial support for VHDL 2008. You can simulate behavioral, RTL, and gate-level code separately or simultaneously. ModelSim PE also supports all ASIC and FPGA libraries, ensuring accurate timing simulations.

2.2 INTELLIGENT GUI
An intelligently engineered GUI makes efficient use of desktop real estate. The intuitive arrangement of interactive graphical elements (windows, toolbars, menus, etc.) makes it easy to view and access the many powerful capabilities of ModelSim. The result is a feature rich GUI that is easy to use and quickly mastered. ModelSim redefined openness in simulation by incorporating the Tcl user interface into its HDL simulator. Tcl is a simple but powerful scripting language for controlling and extending applications.

2.3 VERILOG 2001 / SYSTEM VERILOG
ModelSim PE now fully supports the IEEE 1364-2001 Verilog standard, along with SystemVerilog design language features. SystemVerilog originated as an Accellera standard and provides new constructs for modeling at higher levels of abstraction.

2.4 MEMORY WINDOW
The Memory window allows flexible viewing and changing of memory locations. VHDL and Verilog memories are automatically extracted in the GUI, allowing powerful search, fill, load, and save functionality. The Memory window also allows memories to be pre-loaded, which saves the time-consuming step of running initialization sections of a simulation just to load memories. All functions are available via the command line, so they can be used in scripts.

2.5 WAVEFORM FILE MANAGER (wlfman)
This utility allows the manipulation of existing wlf files so you can reduce the amount of information to display. You can view a portion of the original waveform file and modify time scales to compare RTL versus gates.

2.6 PLATFORM AND STANDARDS SUPPORT
ModelSim PE supports both VHDL and Verilog and accelerates VITAL functions, procedures and timing checks. ModelSim PE runs on Windows XP, Vista and 7.

2.7 MODEL SIM PE FEATURES
- Partial VHDL 2008 support
- Transaction wlf logging
- Windows 7 support
- SecureIP support
- SystemC option
- RTL and Gate-Level Simulation
- Integrated Debug
- Verilog, VHDL and SystemVerilog Design
- Mixed-HDL Simulation option
- Code Coverage option
- Enhanced debug option
- Windows 32-bit

2.8 MODEL SIM PE BENEFITS
- Cost-effective HDL simulation solution
- Intuitive GUI for efficient interactive debug
- Integrated project management
- Easy to use with outstanding technical support
- Sign-off support for popular ASIC libraries
- Award-winning technical support

3. COMPRESSOR
The 4:2 compressor design reduces the complexity of the Wallace tree structure for multiplication [1]. A 4:2 compressor has four primary inputs and two primary outputs, the sum and the carry, in addition to a carry-in (cin) and a carry-out (cout) [5,6,8,9,17,20]. When compressors are chained, the cin of each compressor is driven by the cout of the previous compressor, and its own cout becomes the cin of the next compressor.
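The behavior described above can be checked with a short bit-level model. This is a minimal sketch of an exact 4:2 compressor built from two cascaded full adders, a common textbook construction and not the approximate compressor proposed in this work. The function names compressor_4_2 and compress_row are hypothetical, and the row model assumes unsigned operands of a fixed width.

```python
from itertools import product


def compressor_4_2(x1, x2, x3, x4, cin):
    """Exact 4:2 compressor: x1 + x2 + x3 + x4 + cin == s + 2 * (carry + cout)."""
    # First full adder compresses x1, x2, x3; its carry leaves as cout.
    s1 = x1 ^ x2 ^ x3
    cout = (x1 & x2) | (x2 & x3) | (x1 & x3)
    # Second full adder compresses the intermediate sum, x4, and cin.
    s = s1 ^ x4 ^ cin
    carry = (s1 & x4) | (x4 & cin) | (s1 & cin)
    return s, carry, cout


def compress_row(a, b, c, d, width):
    """Reduce four width-bit operands to a sum word and a carry word.

    The cout of each bit position feeds the cin of the next position.
    Returns (sum_word, carry_word, final_cout); final_cout has weight 2**width.
    """
    sum_word = 0
    carry_word = 0
    cin = 0
    for i in range(width):
        bits = [(v >> i) & 1 for v in (a, b, c, d)]
        s, carry, cout = compressor_4_2(*bits, cin)
        sum_word |= s << i
        carry_word |= carry << (i + 1)  # carry has double the weight of s
        cin = cout
    return sum_word, carry_word, cin


if __name__ == "__main__":
    # Exhaustively confirm the arithmetic identity of a single compressor.
    for bits in product((0, 1), repeat=5):
        s, carry, cout = compressor_4_2(*bits)
        assert sum(bits) == s + 2 * (carry + cout)
    print("4:2 compressor identity holds for all 32 input combinations")
```

Because cout depends only on x1, x2, and x3, and not on cin, the horizontal chain does not ripple through more than one compressor, which is one reason this structure is attractive in partial product reduction trees.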

4. RESULT AND DISCUSSION

4.1 16 × 16 MULTIPLIER OUTPUT

4.2 COMPRESSOR OUTPUT

5. CONCLUSION

In this brief, a new binary counter based on a novel symmetric bit-stacking approach is proposed. We showed that this counting method can be used to implement 6:3 and 7:3 counters, which can be used in any binary multiplier circuit to add the partial products. We demonstrated that 6:3 counters implemented with this bit-stacking technique achieve higher speed than other higher-order counter designs while reducing power consumption. This is due to the absence of XOR gates and multiplexers on the critical path. The 64-bit and 128-bit counter-based Wallace tree multipliers built using the proposed 6:3 counters outperform both the standard Wallace tree implementation and multipliers built using existing 7:3 counters.

6. ACKNOWLEDGEMENT

I would like to thank KANNAN, SABARIGIRIRAJ for their support, services, and fruitful discussions.

7. REFERENCES