- Open Access
- Authors : Manasa C , Ashwini V
- Paper ID : IJERTV8IS090022
- Volume & Issue : Volume 08, Issue 09 (September 2019)
- Published (First Online): 11-09-2019
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
Low Power Implementation and Optimization of a Multiplier with a Desired Throughput
Electronics and Communication department, BMS College of Engineering,
Assistant Professor, Department of ECE BMS College of Engineering Bengaluru, India
Abstract: CMOS technology scaling has had adverse effects on static power consumption. With scaling below 100 nm, the CMOS transistor no longer behaves as an ideal switch that consumes power only when its state changes. All kinds of CMOS circuits are affected by static (leakage) power, so reducing power is critical. A multiplier is chosen as the design under study because addition, its core operation, is the most frequently used operation in general-purpose systems and application-specific processors. In digital systems, power consumption can be evaluated with power analysis tools, and several techniques can then be applied to reduce it. Supply voltage scaling is one of these techniques, and it allows power to be compared at different supply voltages. Since power consumption is a fundamental issue, this work presents techniques to reduce power in the design of a multiplier using tools such as ModelSim, NanoSim and Design Compiler. The work optimizes a 16X16 multiplier for low power while maintaining a throughput of 100 MS/s. The summary and conclusion tabulate the results, showing a relative comparison and the trade-offs in the system.
Keywords: ModelSim, NanoSim, Design Compiler.
This paper presents techniques for achieving low power using tools such as NanoSim, Design Compiler and ModelSim. The overall task is to optimize a 16X16 multiplier for low power while maintaining a throughput of 100 MS/s. Several CMOS systems provide for off-chip, board-level testing of power dissipation effects, including tristate on-chip buses, for various signal transitions. Low power design must address leakage and static power dissipation, which can be tackled by several power reduction methodologies. Power consumption and power dissipation are critical issues in current digital systems because heavy usage demands efficient and accurate operation. Depending on how it is designed, a multiplier can exhibit glitches in its data signals, which can be prevented by restructuring the multiplier network.
In this work, a multiplier design is chosen. A 16X16 multiplier consists of 3 blocks:
PPGEN- Partial product generator.
CLA- Carry look-ahead adder.
CSA_TREE- Carry save adder tree.
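The three blocks can be illustrated with a small behavioural model. The sketch below is not the paper's VHDL; the function names, the unsigned operands and the simple three-rows-at-a-time reduction order are assumptions made only to show how the blocks fit together.

```python
from __future__ import annotations

WIDTH = 16
MASK = (1 << (2 * WIDTH)) - 1  # a 16x16 product fits in 32 bits


def ppgen(a: int, b: int) -> list[int]:
    """Partial product generator: one shifted copy of a for each set bit of b."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(WIDTH)]


def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 carry-save adder: x + y + z == sum + carry (modulo 2**32)."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s & MASK, c & MASK


def csa_tree(rows: list[int]) -> tuple[int, int]:
    """Reduce the rows to two, three at a time, one level per loop pass."""
    while len(rows) > 2:
        nxt: list[int] = []
        for i in range(0, len(rows) - 2, 3):
            nxt.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[len(rows) - len(rows) % 3:])  # leftover rows pass through
        rows = nxt
    return rows[0], rows[1]


def cla(s: int, c: int) -> int:
    """Final carry-propagate addition (a carry look-ahead adder in hardware)."""
    return (s + c) & MASK


def multiply(a: int, b: int) -> int:
    """Unsigned 16x16 multiply built from the three blocks."""
    return cla(*csa_tree(ppgen(a, b)))
```

With this reduction order the 16 partial products shrink as 16, 11, 8, 6, 4, 3, 2 rows, i.e. six carry-save levels before the final adder. The real netlist may group the rows differently.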
The entire design is described in VHDL and its circuitry is tested. Each block of the multiplier is tested in VHDL using ModelSim. Power estimation is then carried out using Design Compiler, which also compiles the design and generates a SPICE file. Design Compiler can also report the total area, the power consumed by each block, net power, detailed power, timing, slack, etc. The SPICE file is saved as a .sp file and used for the NanoSim simulations. The critical path of the design at a desired voltage can be calculated using NanoSim. Using the SPICE netlist, the supply voltage can be scaled and the resulting reduction in power checked.
A procedure of six tasks is followed. Each task applies its own technique, giving a gradual reduction in power, and a suitable method is adopted at the end. The project flow is shown in Fig. 1.
Fig. 1. : Project Flow
Task-1: The circuit is tested with its 3 blocks.
Task-2: A pipeline register is added. This reduces the power.
Task-3: A 2nd pipeline register is added to reduce the critical path delay.
Task-4: A 3rd pipeline register is added.
Task-5: An interleaving architecture is used, which reduces power considerably more than pipelining.
Task-6: Pipelining and interleaving are combined, using pipelined multipliers in parallel, which is faster and reduces power further.
Studying how the supply voltage can be varied to reduce power, together with the corresponding techniques, makes it possible to find a supply voltage that does not hamper the functioning of the system. The pipelining of the 16X16 multiplier is shown in Fig. 2, where the colored dotted lines mark the pipeline cuts for the tasks described above.
Fig. 2. : 16X16 Multiplier Pipelining
The method used in each task is explained in detail in the sections below. The VHDL-tested design is compiled using Design Compiler, which is used to calculate all the required details of the design, first for its initial 3 blocks without any power reducing techniques. Area, timing and slack are then computed after each technique is introduced in the consecutive tasks.
A. Task 1
In this task the base version of the 16×16 multiplier is tested. The completed multiplier consists of 3 blocks: a partial product generator (ppgen), a carry-save adder tree (csa_tree) and a carry-lookahead adder (cla).
From functional testing using ModelSim, a throughput of 100 MS/s was ensured, and using Design Compiler the critical path tcp was found to be 7.57 ns. From the same report the slack was determined to be 2.43 ns. The total area for this design was reported to be 0.48 mm². To estimate the power used by the multiplier, a test using NanoSim was performed. The setup used a specific SPICE model for the included blocks and a vector of input values, and ran for 1000 ns. This test resulted in an average power consumption of about 79.6 mW. From the critical path and slack it is clear that the internal blocks could be slower without affecting the functionality. To achieve a lower power consumption, voltage scaling was used. The specification for the multiplier requires a timing margin of 10%. With a sample period of 10 ns at 100 MS/s, the critical path delay can therefore be at most 9 ns. This results in the following propagation time scaling factor:
9 / 7.57 = 1.189
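The arithmetic behind the scaling factor can be written out as a short sketch. The function names are illustrative only. The inputs are the throughput, the timing margin and the critical path delays reported for the single-multiplier designs.

```python
def timing_budget_ns(sample_rate_msps: float, margin: float) -> float:
    """Longest critical path allowed once the timing margin is reserved."""
    period_ns = 1e3 / sample_rate_msps      # 100 MS/s gives a 10 ns period
    return period_ns * (1.0 - margin)


def delay_scaling_factor(t_cp_ns: float, budget_ns: float) -> float:
    """How much slower the logic may become before the budget is used up."""
    return budget_ns / t_cp_ns


budget = timing_budget_ns(100.0, 0.10)
# Critical paths reported for the reference and the pipelined multipliers.
for t_cp in (7.57, 5.15, 3.77, 3.0):
    print(f"t_cp = {t_cp:5.2f} ns, factor = {delay_scaling_factor(t_cp, budget):.3f}")
```

The supply voltage that corresponds to a given factor is read from the delay data of the SPICE models, which this sketch does not model.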
Fig. 3 shows the functionally tested block in Design Compiler. At this step, the clock can be set to obtain the required timing margin.
Fig. 3. : Multiplier design block
Design Compiler can generate a detailed power report for different clock values. The power report generated for a clock period of 10 ns is shown in Fig. 4.
Fig. 4. : Detailed power report
From the data for the specified SPICE models, the required supply voltage was determined to be about 2.75 V. Performing the same average power estimation simulation as before resulted in a new value of 48.9 mW, a relative reduction in power consumption of 39%. This task is represented in Fig. 5.
Fig. 5. : Reference Multiplier
B. Task 2
To improve the performance of the multiplier, a 32-bit pipelining register was added in the csa_tree. To minimize the critical path, the register was added between the 3rd and 4th level, ensuring equal delay on both sides. The pipelining cut is shown in red in Fig. 2.
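The choice of cut position can be illustrated with a small search over candidate positions. This is only a sketch: the per-level delays are hypothetical (the paper does not list them), six equal levels are assumed, and register setup and clock-to-output overhead are ignored.

```python
from __future__ import annotations


def best_single_cut(level_delays: list[float]) -> tuple[int, float]:
    """Return (cut, critical_path) for one register placed after level `cut`
    (1-based), chosen so that the slower of the two stages is as fast as possible."""
    if len(level_delays) < 2:
        raise ValueError("need at least two levels to place a register between")
    best_cut, best_worst = 0, float("inf")
    for cut in range(1, len(level_delays)):
        stage_a = sum(level_delays[:cut])   # logic before the register
        stage_b = sum(level_delays[cut:])   # logic after the register
        worst = max(stage_a, stage_b)       # the slower stage sets the clock
        if worst < best_worst:
            best_cut, best_worst = cut, worst
    return best_cut, best_worst


# Hypothetical: six carry-save levels of one delay unit each.
cut, t_cp = best_single_cut([1.0] * 6)
print(f"register after level {cut}, critical path {t_cp} units")
```

Under these assumptions the balanced position is between the 3rd and 4th level. In the real design the delays of ppgen and cla also count towards the two stages.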
The same tests as described in task 1 were performed on the modified multiplier. The critical path, slack and area were found to be 5.15 ns, 4.84 ns and 0.52 mm² respectively. The power consumption was estimated to be 70.2 mW.
The scaling factor was calculated to be 1.75, which resulted in a supply voltage Vdd of 2.02 V. After voltage scaling the resulting power consumption was 21.7 mW, a relative reduction in power consumption of 69%. This task is represented in Fig. 6.
Fig. 6. : Pipelined Multiplier
C. Task 3
To further improve the critical path delay, another pipelining stage was introduced. The placement of the cuts follows the reasoning in task 2 and is shown in yellow in Fig. 2.
The same tests resulted in a critical path of 3.77 ns, a slack of 6.22 ns and a total area of 0.59 mm². The average power consumption was estimated to be 79.8 mW. The scaling factor was calculated to be 2.39, which resulted in a supply voltage of 1.71 V after voltage scaling. This reduced the average power to 16.7 mW, a relative reduction in power consumption of 79%. This task is represented in Fig. 7.
Fig. 7. : Multiplier with two pipeline registers
D. Task 4
For this task another pipelining stage was introduced in the csa_tree, making three in total. The final pipelining stage is shown in blue in Fig. 2.
The test results were as follows: a critical path of 3 ns, a slack of 7 ns and a total area of 0.68 mm². The estimated average power was 89.2 mW. The scaling factor was calculated to be 3.0, resulting in a new supply voltage of 1.48 V. After voltage scaling the estimated average power consumption was 13.8 mW, a relative reduction in power consumption of 85%. This task is represented in Fig. 8.
Fig. 8. : Multiplier with three pipeline registers
E. Task 5
In this task an interleaving approach is used for a better reduction in power: two multipliers of the kind used in task 1 are placed in parallel and clocked at half the original rate, i.e., 50 MHz, while the overall throughput remains 100 MS/s.
The same tests as in the previous tasks were performed, but with the SPICE file for 50 MHz. These tests resulted in a critical path of 7.46 ns, a slack of 12.54 ns and a total area of approximately 0.91 mm². The average power consumption was estimated to be about 91.2 mW. Using the equation in task 1, with the allowed delay doubled to 18 ns for the 20 ns clock period, the scaling factor was calculated to be 2.4, which resulted in a supply voltage of about 1.66 V after voltage scaling. This reduced the average power to about 16.3 mW, a relative reduction in power consumption of 82%. This task is represented in Fig. 9.
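A behavioural sketch of two-way interleaving is given below. It is not the paper's implementation; the function names and the even/odd distribution of samples are assumptions. It shows that the merged output keeps the input order, and that each lane gets twice the sample period as its timing budget.

```python
from __future__ import annotations


def interleaved_multiply(samples: list[tuple[int, int]]) -> list[int]:
    """Send even-indexed samples to lane 0 and odd-indexed samples to lane 1,
    then merge the two lane outputs back into the original order."""
    lanes: tuple[list[int], list[int]] = ([], [])
    for n, (a, b) in enumerate(samples):
        lanes[n % 2].append(a * b)          # each lane sees every second sample
    return [lanes[n % 2][n // 2] for n in range(len(samples))]


def lane_budget_ns(sample_rate_msps: float = 100.0, ways: int = 2,
                   margin: float = 0.10) -> float:
    """Timing budget of one lane: `ways` sample periods minus the margin."""
    lane_period_ns = ways * 1e3 / sample_rate_msps   # 20 ns at 50 MHz
    return lane_period_ns * (1.0 - margin)


print(interleaved_multiply([(1, 2), (3, 4), (5, 6)]))
print(lane_budget_ns() / 7.46)   # scaling factor for the 7.46 ns critical path
```

The doubled budget is what allows a lower supply voltage than in task 1, at the cost of roughly twice the area.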
Table-1: Power Summary (columns: Vdd scaled (V), Power old (mW), Power new* (mW), Relative power reduction (%), Power reduction comp. to original (%))
Fig. 9. : Two parallel multipliers
F. Task 6
In this task pipelining and interleaving are combined: two pipelined multipliers are used in parallel. The same setup as in the previous task was used for testing and resulted in a critical path of 5.12 ns, a slack of 14.88 ns and a total area of approximately 1.06 mm². The average power consumption was estimated to be about 88.85 mW.
Using the equation in task 1, with the allowed delay doubled to 18 ns for the 20 ns clock period, the scaling factor was calculated to be 3.52, which resulted in a supply voltage of about 1.38 V after voltage scaling. This reduced the average power to about 9.5 mW, a relative reduction in power consumption of 89%. This task is represented in Fig. 10.
Fig. 10. : Two parallel pipelined multipliers
In conclusion, increasing the number of pipelining stages results in larger power savings after voltage scaling. On the other hand, the interleaving architecture also yields a relatively better reduction in power consumption. The downside of using these topologies, especially interleaving, is the increase in effective area. Depending on the specification of the desired system, either of the tested techniques could be a viable solution for reducing power consumption. The final results for all tested cases are presented in Table 1.
The relative power reduction is the reduction in power due to voltage scaling for each task separately. The power reduction compared to the original unmodified multiplier, used in task 1, is presented in the rightmost column.
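The two reduction figures can be recomputed from the power values quoted in the task descriptions. The sketch below assumes that "original" means the 79.6 mW measured for the task 1 multiplier before voltage scaling. The variable and function names are illustrative.

```python
# (task, power before voltage scaling in mW, power after voltage scaling in mW),
# taken from the task descriptions in the text.
RESULTS = [
    ("Task 1", 79.6, 48.9),
    ("Task 2", 70.2, 21.7),
    ("Task 3", 79.8, 16.7),
    ("Task 4", 89.2, 13.8),
    ("Task 5", 91.2, 16.3),
    ("Task 6", 88.85, 9.5),
]
ORIGINAL_MW = 79.6  # assumption: unscaled task 1 multiplier is the reference


def reduction_pct(old_mw: float, new_mw: float) -> float:
    """Percentage reduction going from old_mw to new_mw."""
    return 100.0 * (old_mw - new_mw) / old_mw


def summary():
    """Per task: (name, relative reduction %, reduction vs. original %), rounded."""
    return [
        (task, round(reduction_pct(old, new)), round(reduction_pct(ORIGINAL_MW, new)))
        for task, old, new in RESULTS
    ]


for task, relative, vs_original in summary():
    print(f"{task}: relative {relative}%, compared to original {vs_original}%")
```

Keeping the two percentages separate matters because the unscaled power differs between tasks, so the same scaled power can give different relative figures.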
The computed values show a considerable power reduction. The interleaving approach, however, takes a large area, which is the trade-off.
I am fortunate to have the benefit of the guidance of Mrs. Ashwini V, Assistant Professor, E&C Department, BMSCE for her valuable pointers and support in the course of this work.
Mircea R. Stan and Wayne P. Burleson, Low-Power Encodings for Global Communication in CMOS VLSI, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 5, no. 4, December 1997.
Christer Svensson and J. Jacob Wikner, Power consumption of analog circuits, Springer Science + Business Media, LLC, published online 11 June 2010.
M. Pedram and A. Abdollahi, Low-power RT-level synthesis techniques: a tutorial, IEEE Proc.-Comput. Digit. Tech., vol. 152, no. 3, May 2005.
Anand Raghunathan, Sujit Dey, and Niraj K. Jha, Register Transfer Level Power Optimization with Emphasis on Glitch Analysis and Reduction, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 18, no. 8, August 1999.
James T. Kao and Anantha P. Chandrakasan, Dual-Threshold Voltage Techniques for Low-Power Digital Circuits, IEEE Journal of Solid-State Circuits, vol. 35, no. 7, July 2000.
Ashok Vittal and Malgorzata Marek-Sadowska, Low-Power Buffered Clock Tree Design, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 16, no. 9, September 1997.
Andrea Calimera, Alberto Macii, Enrico Macii, and Massimo Poncino, Design Techniques and Architectures for Low-Leakage SRAMs, IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 59, no. 9, September 2012.
Dhon-Gue Lee, Loai G. Salem, and Patrick P. Mercier, Narrowband Transmitters, 1527-3342/15 ©2015 IEEE, April 2015.
Ms. Manasa C is an M.Tech student in VLSI Design & Embedded Systems in the Department of Electronics and Communication, BMS College of Engineering, Bengaluru. She obtained her Bachelor's degree in Telecommunication Engineering from BMS Institute of Technology and Management, Bengaluru, which is affiliated to Visvesvaraya Technological University (VTU).
Mrs. Ashwini V, Assistant Professor, Department of Electronics and Communication, BMS College of Engineering, Bengaluru. She has obtained her Bachelors degree in Electronics and communication engineering from NIE, Mysore and holds an M.tech (Electronics) from BMS college of engineering.