Power Optimized Multi-Bit Flip-Flops Using Gated Driver Tree

DOI : 10.17577/IJERTV2IS50799


K. Anita Reddy¹, R. Jaya Lakshmi², S. Madhava Rao³

¹,²,³ Dept. of Electronics & Communication Engineering, Malineni Lakshmaiah Engineering College, Singarayakonda, Prakasam District, Andhra Pradesh.

Abstract

In this paper, we review the multi-bit flip-flop concept and introduce the benefits of using multi-bit flip-flops in a design. The underlying idea of the multi-bit flip-flop method is to reduce the total number of inverters by sharing inverters among the flip-flops. We propose to use double-edge-triggered (DET) flip-flops instead of traditional D flip-flops (DFFs) in the ring counter to halve the operating clock frequency. A novel approach that uses C-elements instead of RS flip-flops in the control logic for generating the clock-gating signals is adopted to avoid increasing the loading of the global clock signal. This technique greatly decreases the loading on the clock distribution network of the ring counter and thus the overall power consumption. The same technique is applied to the input driver and output driver of the memory part of the delay buffer. Both simulation and experimental results indicate that multi-bit flip-flops with a gated driver tree are an effective and efficient method for low-power designs.

  1. INTRODUCTION.

    Portable multimedia and communication devices have experienced explosive growth recently. Longer battery life is one of the crucial factors in the widespread success of these products. As such, low-power circuit design for multimedia and wireless communication applications has become very important. In many such products, multi-bit flip-flops and delay buffers (line buffers, delay lines) make up a significant portion of the circuits [1]-[3]. Such serial access memory is needed for temporary storage of signals that are being processed, e.g., delaying one line of video signals, delaying signals within a fast Fourier transform (FFT) architecture [4], and delaying signals in a delay correlator [2]. Currently, most circuits adopt static random access memory (SRAM) plus some control/addressing logic to implement delay buffers. For shorter delay buffers, a shift register can be used instead. The former approach is convenient since SRAM compilers are readily available and are optimized to generate memory modules with low power consumption, high operating speed, and a compact cell size. The latter approach is also convenient since a shift register can be easily synthesized, though it may consume much power due to unnecessary data movement.

    Moreover, when smaller flip-flops are replaced by larger multi-bit flip-flops to reduce power consumption, device variations in the corresponding circuit can also be effectively reduced.

    Fig.1. Maximum loading number of a minimum-sized inverter of different technologies.

    As CMOS technology progresses, the driving capability of an inverter-based clock buffer increases significantly. The driving capability of a clock buffer can be evaluated by the number of minimum-sized inverters that it can drive within a given rise or fall time. Fig. 1 shows the maximum number of minimum-sized inverters that can be driven by a clock buffer in different processes. Because of this increased driving capability, several flip-flops can share a common clock buffer to avoid unnecessary power waste. However, the locations of some flip-flops change after this replacement, and thus the wire lengths of the nets connecting pins to a flip-flop also change. To avoid violating the timing constraints, we require that the wire lengths of the nets connecting pins to a flip-flop not exceed specified values after this process. In addition, to guarantee that a new flip-flop can be placed within the desired region, we also need to consider the area capacity of the region.
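The two placement conditions described above (bounded wire lengths and sufficient region capacity) can be illustrated with a minimal sketch. It assumes that wire length is estimated by Manhattan distance, which the text does not state, and the function and parameter names (`placement_is_legal`, `region_free_area`, and so on) are hypothetical.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) points (assumed wire-length estimate)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def placement_is_legal(ff_location, pin_locations, max_wire_lengths,
                       region_free_area, ff_area):
    """Check a candidate location for a merged flip-flop.

    The location is accepted only if the flip-flop fits in the free area of
    the region and every connected pin stays within its wire-length limit.
    """
    fits_in_region = ff_area <= region_free_area
    wires_ok = all(
        manhattan(ff_location, pin) <= limit
        for pin, limit in zip(pin_locations, max_wire_lengths)
    )
    return fits_in_region and wires_ok


# Hypothetical example: two pins with individual wire-length limits.
ok = placement_is_legal((0, 0), [(3, 4), (1, 1)], [7, 2], 10.0, 4.0)
too_long = placement_is_legal((0, 0), [(3, 4), (1, 1)], [6, 2], 10.0, 4.0)
print(ok, too_long)
```

A real flow would derive the limits from timing slack and would track the remaining capacity of each region as flip-flops are placed; this sketch only shows the shape of the two checks.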

  2. MULTI-BIT FLIP-FLOP CONCEPT.

    In this section, we introduce the multi-bit flip-flop concept. Before that, we review the single-bit flip-flop. Figure 2 shows an example of a single-bit flip-flop. A single-bit flip-flop has two latches (a master latch and a slave latch). The latches need the Clk signal and its complement (written Clk' here) to operate, as Figure 2 shows.

    Fig 2: Single-Bit Flip-Flop

    To obtain a better Clk-to-Q delay, Clk is regenerated from its complement Clk', so there are two inverters in the clock path. Figure 3 shows an example of merging two 1-bit flip-flops into one 2-bit flip-flop. Each 1-bit flip-flop contains two inverters, a master latch, and a slave latch.

    Due to manufacturing rules, the inverters in flip-flops tend to be oversized. As process technology advances into smaller geometry nodes such as 65 nm and beyond, a minimum-sized clock driver can drive more than one flip-flop. Merging single-bit flip-flops into one multi-bit flip-flop avoids duplicate inverters and lowers the total clock dynamic power consumption. The total area occupied by flip-flops can be reduced as well.

    Fig 3: An example of merging two 1-bit flip-flops into one 2-bit flip-flop.
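The inverter saving from merging can be made concrete with a small counting sketch. It assumes, following the description above, that every flip-flop cell carries two clock-path inverters regardless of how many bits it stores; real library cells may size the shared inverters differently, and the function name `clock_inverters` is hypothetical.

```python
def clock_inverters(num_bits, bits_per_cell, inverters_per_cell=2):
    """Count clock-path inverters needed to store num_bits.

    Assumes each cell has inverters_per_cell clock inverters that are shared
    by all bits stored in that cell.
    """
    if num_bits < 0 or bits_per_cell <= 0:
        raise ValueError("num_bits must be >= 0 and bits_per_cell must be > 0")
    cells = -(-num_bits // bits_per_cell)  # ceiling division
    return cells * inverters_per_cell


# Storing 8 bits with 1-bit, 2-bit, and 4-bit cells.
print(clock_inverters(8, 1))  # 16
print(clock_inverters(8, 2))  # 8
print(clock_inverters(8, 4))  # 4
```

Under this assumption, moving from 1-bit to 2-bit cells halves the number of clock inverters, which is the source of the clock power and area savings discussed here.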

    By using multi-bit flip-flops in an ASIC design, users gain the following benefits:

    • Lower clock power consumption in sequential banked components.

    • Smaller area and delay, due to shared transistors and optimized transistor-level layout.

    • Reduced clock skew in sequential gates.

      Fig 4: A dual-bit flip-flop cell.

      Figure 4 shows an example of a dual-bit flip-flop cell. It has two data input pins, two data output pins, one clock pin, and one reset pin. Using a dual-bit flip-flop gives lower power consumption than single-bit flip-flops at almost no additional cost. Figure 5 shows the truth table of the dual-bit flip-flop cell. On a positive edge of CK, the value of D1 passes to Q1 and the value of D2 passes to Q2; otherwise, Q1 and Q2 keep their original values.

      Fig 5: The truth table of dual-bit flip-flop cell.
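The behavior of the dual-bit cell can be sketched as a small behavioral model: on a positive edge of CK, D1 is captured into Q1 and D2 into Q2, and otherwise both outputs hold. The reset behavior is an assumption (active-high, asynchronous, clearing both outputs), since the polarity and timing of the reset pin are not given in the text; the class name `DualBitFlipFlop` is hypothetical.

```python
class DualBitFlipFlop:
    """Behavioral model of a dual-bit flip-flop with one shared clock."""

    def __init__(self):
        self.q1 = 0
        self.q2 = 0
        self._prev_ck = 0  # last clock level seen, used to detect edges

    def step(self, ck, d1, d2, reset=0):
        """Apply one set of input levels and return (Q1, Q2)."""
        if reset:
            # Assumed active-high asynchronous reset.
            self.q1 = self.q2 = 0
        elif self._prev_ck == 0 and ck == 1:
            # Positive edge: both bits capture their data inputs together.
            self.q1, self.q2 = d1, d2
        # Otherwise both outputs keep their previous values.
        self._prev_ck = ck
        return self.q1, self.q2


ff = DualBitFlipFlop()
print(ff.step(0, 1, 1))  # (0, 0): no edge yet
print(ff.step(1, 1, 0))  # (1, 0): positive edge captures D1, D2
print(ff.step(1, 0, 1))  # (1, 0): clock stays high, outputs hold
```

Both bits share the single clock-edge detection, which mirrors the shared clock inverters of the physical cell.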

  3. MULTI-BIT FLIP-FLOP METHODOLOGY.

    In this section, we describe how to use Design Compiler and Faraday's multi-bit flip-flop cells to implement an ASIC design.

    3.1 The criteria of using multi-bit flip-flop.

    Multi-bit flip-flop cells decrease power consumption because the inverters inside the cell are shared among its bits. At the same time, they can minimize clock skew.

    To obtain these benefits, the ASIC design must meet the following requirements. The single-bit flip-flops to be replaced with a multi-bit flip-flop must have the same clock condition and the same set/reset condition. When the variable hdlin_infer_multibit is set to default_all, Design Compiler uses multi-bit flip-flops to replace bus-type single-bit flip-flops. For non-bus cases, you must use create_multibit to identify the multi-bit flip-flop candidates.
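The merging criterion (same clock condition and same set/reset condition) can be illustrated with a short grouping sketch. This is not how Design Compiler works internally; it only shows the rule on a hypothetical list of flip-flops, and the names `FlipFlop` and `group_merge_candidates` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class FlipFlop(NamedTuple):
    name: str
    clock: str  # clock condition of the flip-flop
    reset: str  # set/reset condition of the flip-flop


def group_merge_candidates(flops, max_bits=2):
    """Group flip-flops that share clock and reset into cells of max_bits."""
    groups = defaultdict(list)
    for ff in flops:
        groups[(ff.clock, ff.reset)].append(ff.name)

    merged = []
    for names in groups.values():
        # Split each compatible group into chunks of at most max_bits.
        for i in range(0, len(names), max_bits):
            merged.append(names[i:i + max_bits])
    return merged


flops = [
    FlipFlop("a", "clk1", "rst"),
    FlipFlop("b", "clk1", "rst"),
    FlipFlop("c", "clk2", "rst"),
    FlipFlop("d", "clk1", "rst"),
]
print(group_merge_candidates(flops))  # [['a', 'b'], ['d'], ['c']]
```

Flip-flops "a" and "b" can share a dual-bit cell, while "c" stays separate because its clock differs and "d" is left over as a single bit.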

  4. MEMORY ORGANIZATION BETWEEN EACH NODE.

    In the proposed memory organization, several power reduction techniques are adopted. Mainly, these circuit techniques are designed with a view to decreasing the loading on high fan-out nets, e.g., clock and read/write ports.

      4.1 RING COUNTER.

        The proposed ring counter replaces the RS flip-flop with a C-element and uses tree-structured clock drivers with gating to greatly reduce the loading on active clock drivers. Additionally, DET flip-flops are used to halve the clock rate and thus also reduce the power consumption of the clock signal. The proposed ring counter with hierarchical clock gating and its control logic is shown in Fig. 4.1. Each block contains one C-element to control the delivery of the local clock signal CLK to the DET flip-flops, and only the CKE signals along the path from the global clock source to the local clock signal are active. The gate signal (CKE) can also be derived from the outputs of the DET flip-flops in the ring counter. The C-element is an essential element in asynchronous circuits for handshaking.

        Fig 4.1: Ring Counter with clock gated by C-elements.
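The two building blocks named in this subsection can be sketched behaviorally. The C-element model follows the standard Muller C-element definition (the output changes only when both inputs agree and holds otherwise), and the DET flip-flop captures data on every clock transition. How these are wired into the ring counter is not reproduced here; the class names `CElement` and `DETFlipFlop` are hypothetical.

```python
class CElement:
    """Muller C-element: output follows the inputs only when they agree."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        if a == b:
            self.out = a
        # If the inputs differ, the previous output is held.
        return self.out


class DETFlipFlop:
    """Double-edge-triggered flip-flop: captures D on rising and falling edges."""

    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def step(self, clk, d):
        if clk != self._prev_clk:
            self.q = d
        self._prev_clk = clk
        return self.q


# One clock transition per data sample is enough for a DET flip-flop,
# so the clock toggles at half the rate a single-edge design would need.
det = DETFlipFlop()
data = [1, 0, 1, 1]
clock = [1, 0, 1, 0]
captured = [det.step(clk, d) for clk, d in zip(clock, data)]
print(captured)  # [1, 0, 1, 1]
```

A single-edge flip-flop would need a full clock period (two transitions) per sample, which is why using DET flip-flops halves the required clock frequency.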

      4.2 GATED DRIVER TREE.

    To save area, the memory module of a delay buffer is often in the form of an SRAM array with input/output data bus as in [6]. Special read/write circuitry, such as a sense amplifier, is needed for fast and low-power operations. However, of all the memory cells, only two words will be activated: one is written by the input data and the other is read to the output. Driving the input signal all the way to all memory cells seems to be a waste of power.

    The same can be said for the read circuitry of the output port. Following the gated-clock tree technique described earlier, we apply the same idea to the input driving/output sensing circuitry in the memory module of the delay buffer. The memory words are also grouped into blocks. Each memory block is associated with one DET flip-flop block in the proposed ring counter. One DET flip-flop output addresses a corresponding memory word for read-out and, at the same time, addresses the word that was read one clock cycle earlier for write-in.

    Fig 4.2 : Gated Driver Tree
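The saving from gating can be illustrated by counting how many drivers must switch in a tree that fans out to the memory words. The sketch assumes a regular tree with a fixed fanout and a fixed number of driver levels, which the text does not specify; the names `driver_path` and `active_drivers` are hypothetical.

```python
def driver_path(leaf, fanout, depth):
    """Return the (level, index) drivers on the path from the root to one leaf."""
    path = set()
    index = leaf
    for level in range(depth - 1, -1, -1):
        index //= fanout
        path.add((level, index))
    return path


def active_drivers(active_leaves, fanout, depth, gated=True):
    """Count drivers that switch when the given leaf words are accessed."""
    if not gated:
        # Without gating, every driver in the tree switches.
        return sum(fanout ** level for level in range(depth))
    active = set()
    for leaf in active_leaves:
        active |= driver_path(leaf, fanout, depth)
    return len(active)


# 64 words, fanout 4, three driver levels (1 + 4 + 16 = 21 drivers).
# Word 10 is read while word 9, read one cycle earlier, is written.
print(active_drivers([10, 9], 4, 3, gated=False))  # 21
print(active_drivers([10, 9], 4, 3))               # 3
print(active_drivers([16, 15], 4, 3))              # 5
```

Because only two adjacent words are active at a time, their paths through the tree mostly overlap, so the number of switching drivers stays close to the tree depth instead of growing with the number of words.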

  5. RESULTS.

    Fig 5.1: Simulation Results

    Fig 5.2: Power Analysis Report

  6. CONCLUSION.

    In this paper, we presented multi-bit flip-flops in combination with a gated driver tree to reduce power consumption. The ring counter with its clock gated by C-elements can effectively eliminate excessive data transitions without increasing the loading on the global clock signal. The gated driver tree technique used for the clock distribution network can eliminate the power wasted on drivers that need not be activated. The simulation results and power analysis report indicate that multi-bit flip-flops in combination with a gated driver tree are an effective and efficient method for low-power designs.

  7. REFERENCES.

[1] Ya-Ting Shyu, Jai-Ming Lin, Chun-Po Huang, Cheng-Wu Lin, Ying-Zu Lin, and Soon-Jyh Chang, "Effective and Efficient Approach for Power Reduction by Using Multi-Bit Flip-Flops," IEEE Trans. Very Large Scale Integr. (VLSI) Syst.

[2] W. Eberle et al., "80-Mb/s QPSK and 72-Mb/s 64-QAM flexible and scalable digital OFDM transceiver ASICs for wireless local area networks in the 5-GHz band," IEEE J. Solid-State Circuits, vol. 36, no. 11, pp. 1829-1838, Nov. 2001.

[3] M. L. Liou, P. H. Lin, C. J. Jan, S. C. Lin, and T. D. Chiueh, "Design of an OFDM baseband receiver with space diversity," IEE Proc. Commun., vol. 153, no. 6, pp. 894-900, Dec. 2006.

[4] M. R. Stan and W. P. Burleson, "Bus-invert coding for low-power I/O," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 3, no. 1, pp. 49-58, Mar. 1995.

[5] J. F. Tabor, "Noise reduction using low weight and constant weight coding techniques," M.Sc. thesis, Artif. Intell. Lab., MIT, Cambridge, MA, 1990.

[6] Y. Benezeth, P. Jodoin, B. Emile, H. Laurent, and C. Rosenberger, "Review and evaluation of commonly-implemented background subtraction algorithms," in IEEE International Conference on Pattern Recognition (ICPR), pp. 1-4, Dec. 2008.

[8] M. R. Stan and W. P. Burleson, "Coding a terminated bus for low power," in Proc. 5th GLSVLSI, 1995, pp. 70-73.