Area Efficient and Low Power Shift Register Using Delay Circuits and Latch

DOI : 10.17577/IJERTV12IS040219


Mrs. K. Deenu
Assistant Professor, Electronics and Communication Engineering, P.A. College of Engineering and Technology, Pollachi

R. Vaishnavi, N. Sowntharya, M. Keerthana
UG Scholars, Electronics and Communication Engineering, P.A. College of Engineering and Technology, Pollachi


Abstract: This paper proposes a low-power and area-efficient shift register using pulsed latches. Area and power consumption are reduced by replacing flip-flops with pulsed latches. The method solves the timing problem between pulsed latches by using multiple non-overlap delayed pulsed clock signals instead of the conventional single pulsed clock signal. The shift register uses only a small number of pulsed clock signals by grouping the latches into several sub shift registers and using additional temporary storage latches. A 256-bit shift register using pulsed latches was designed in a CMOS process. It consumes 1.2 mW at a 100 MHz clock frequency. Compared with the conventional shift register using flip-flops, the proposed shift register saves area and power.

Keywords: Area efficient, flip flop, latch, pulsed clock, shift register


Low power consumption and small area are among the primary goals of VLSI design. The shift register is a basic building block of VLSI circuits and appears in a wide variety of applications. Its architecture is simple: an M-bit shift register can be built from M data flip-flops connected in series. To minimise area and power, a shift register should use the smallest flip-flop available. A flip-flop is a data storage element whose operation is governed by the clock.

A multistage flip-flop design has high clock switching activity, and its latency increases with respect to the clock frequency. This affects both the speed and the energy efficiency of the circuit.

Latches and flip-flops are the fundamental elements for storing information. Both can be classified into static and dynamic design styles. A single latch or flip-flop stores one bit of information. The main difference between them is that a latch's output follows its input continuously for as long as the enable signal is asserted. In other words, while a latch is enabled, its content changes immediately when its input changes. A flip-flop, on the other hand, changes its content only at the rising or falling edge of the enable signal, which is usually the clock. If the input changes after the rising or falling edge of the clock, the flip-flop content is unaffected.
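The difference between level-sensitive and edge-triggered storage can be illustrated with a small behavioural model. This is only a sketch: the class names `DLatch` and `DFlipFlop` and the sample waveforms are illustrative assumptions, and time is reduced to discrete steps with no gate delays.

```python
class DLatch:
    """Level-sensitive latch: output follows the input while enable is high."""

    def __init__(self):
        self.q = 0

    def step(self, d, enable):
        if enable:
            self.q = d
        return self.q


class DFlipFlop:
    """Positive-edge-triggered flip-flop: samples d only on a 0 -> 1 clock edge."""

    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def step(self, d, clk):
        if clk and not self._prev_clk:
            self.q = d
        self._prev_clk = clk
        return self.q


# Hypothetical waveforms: the data input toggles while the clock is high.
clk = [0, 1, 1, 1, 0, 0]
d = [1, 1, 0, 1, 0, 0]

latch, ff = DLatch(), DFlipFlop()
latch_q = [latch.step(di, ci) for di, ci in zip(d, clk)]
ff_q = [ff.step(di, ci) for di, ci in zip(d, clk)]

print(latch_q)  # [0, 1, 0, 1, 1, 1]: the latch tracks d while clk is high
print(ff_q)     # [0, 1, 1, 1, 1, 1]: the flip-flop keeps the value seen at the edge
```

The latch output changes every time the input changes during the high phase, while the flip-flop ignores everything after the rising edge.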

Pulsed latches are latches driven by a pulsed clock waveform. This technique lowers the power consumption of the clock network while still allowing timing analysis and optimisation of a latch-based design. A latch captures data while it is transparent, a window defined by the width of the clock pulse. When a latch is driven by a pulsed clock, it is synchronised with the clock much like an edge-triggered flip-flop, because the rising and falling edges of the pulse are almost identical in timing. With this approach, the setup time of a pulsed latch is expressed with respect to the rising edge of the pulse clock, and the hold time with respect to the falling edge. Pulsed latches and edge-triggered flip-flops can therefore be represented with similar timing models.


A shift register is a basic building block of a VLSI circuit. Shift registers are commonly used in many applications, such as digital filters, communication receivers, and image processing ICs. Recently, as the size of image data has grown with the demand for high-quality images, the word length of the shift register has increased in order to process large image data in image processing ICs. An image-extraction and vector-generation VLSI chip uses a 4K-bit shift register. A 10-bit, 208-channel output LCD column driver IC uses a 2K-bit shift register. A 16-megapixel CMOS image sensor uses a 45K-bit shift register.

As the word length of the shift register increases, its area and power consumption become important design considerations. A shift register can use the smallest flip-flop available to reduce area and power. Pulsed latches have recently replaced flip-flops in many applications because a pulsed latch is much smaller than a flip-flop. However, a pulsed latch cannot be used directly in a shift register because of the timing problem between pulsed latches.

Figure 1: (a) Master-slave flip-flop (b) Pulse generation circuit


This paper proposes a low-power and area-efficient shift register using pulsed latches. The shift register solves the timing problem by using multiple non-overlap delayed pulsed clock signals instead of the conventional single pulsed clock signal. It uses only a small number of pulsed clock signals by grouping the latches into several sub shift registers and using additional temporary storage latches.

Shift registers can have parallel or serial inputs and outputs. They are often configured as "serial-in, parallel-out" (SIPO) or "parallel-in, serial-out" (PISO). There are also types with both serial and parallel inputs, and types with both serial and parallel outputs. "Bidirectional" shift registers allow shifting in both directions, left to right or right to left. The serial input and last output of a shift register can also be connected to make a circular shift register.

In earlier studies, energy consumption was measured with only a small set of data patterns while the clock switched every cycle. In real designs, however, clock and data activity vary widely between timing element (TE) instances. For example, low-power microprocessors make extensive use of clock gating, so the energy of many TEs is dominated by transitions on the data input rather than on the clock. Other TEs, in contrast, see very little data input activity but are still clocked every cycle.

Like counters, shift registers are a form of sequential logic. Unlike combinational logic, sequential logic depends not only on the present inputs but also on the previous history. In other words, sequential logic remembers past events.

Pulsed latch structures use an edge-triggered pulse generator to create a brief transparency window. Compared with master-slave flip-flops, pulsed latches need only one latch stage per clock cycle and allow time borrowing across cycle boundaries. Their main drawbacks are the energy consumed by the local clock pulse generators and a greater susceptibility to timing hazards.

The master-slave flip-flop using two latches in Fig. 1(a) can be replaced by a pulsed latch consisting of a latch and a pulsed clock signal, as in Fig. 1(b). All pulsed latches share the pulse generation circuit for the pulsed clock signal. As a result, the area and power consumption of the pulsed latch are almost half those of the master-slave flip-flop, which makes the pulsed latch an attractive choice for small area and low power. However, the pulsed latch cannot be used directly in shift registers because of the timing problem shown in Fig. 2. The shift register in Fig. 2 consists of several latches and a pulsed clock signal (CLK_pulse).

Figure 2: Shift Register with latches

The operation waveforms in Fig. 2(b) show the timing problem in the shift register. The output signal of the first latch (Q1) changes correctly because the input signal of the first latch (IN) is constant during the clock pulse width (T_PULSE). But the output signal of the second latch (Q2) is uncertain because its input signal (Q1) changes during the clock pulse width.
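The timing problem can be made concrete with a deliberately simplified behavioural model. The sketch below is not a circuit simulation: it assumes that every latch has the same unit delay, that the shared pulse lasts a whole number of such delays, and that all latches are transparent for the full pulse. The function name `pulse_single_clock` is hypothetical.

```python
def pulse_single_clock(latches, data_in, pulse_steps):
    """Apply one shared clock pulse lasting `pulse_steps` unit latch delays.

    While the pulse is high every latch is transparent, so in each
    sub-step every latch copies whatever its predecessor currently holds.
    """
    state = list(latches)
    for _ in range(pulse_steps):
        # All latches copy their inputs at the same time.
        state = [data_in] + state[:-1]
    return state


# A pulse as short as one latch delay behaves like an ideal shift register.
print(pulse_single_clock([0, 0, 0, 0], 1, pulse_steps=1))  # [1, 0, 0, 0]

# A wider pulse lets the input race through several latches in one cycle.
print(pulse_single_clock([0, 0, 0, 0], 1, pulse_steps=3))  # [1, 1, 1, 0]
```

In real hardware the pulse width relative to the latch delay is not an exact integer, which is why the text describes Q2 as uncertain rather than simply wrong.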


One solution to the timing problem is to add delay circuits between the latches, as shown in Fig. 3(a). The output signal of each latch is delayed and reaches the next latch after the clock pulse. As shown in Fig. 3(b), the output signals of the first and second latches (Q1 and Q2) change during the clock pulse width, but the input signals of the second and third latches (D2 and D3) take on those values only after the clock pulse. As a result, all latches have constant input signals during the clock pulse, and there is no timing problem between the latches. However, the delay circuits add significant area and power overhead.

Figure 3: Shift register with latches and delay circuits

Another solution is to use multiple non-overlap delayed pulsed clock signals, as shown in Fig. 4(a). The delayed pulsed clock signals are produced by passing a pulsed clock signal through delay circuits. Each latch uses a pulsed clock signal that is delayed relative to the one used by its next latch, so each latch updates its data only after its next latch has updated. As a result, each latch has a constant input during its own clock pulse, and there is no timing problem between latches. However, this technique also requires many delay circuits.

Figure 4: Shift register with latches and delayed pulsed clocks
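The effect of the delayed pulsed clocks can be sketched behaviourally by updating the latches one at a time, last latch first. This is a minimal model under the assumption that the pulses never overlap, so only one latch is transparent at any moment. The function name `shift_with_delayed_pulses` is illustrative.

```python
def shift_with_delayed_pulses(latches, data_in):
    """One clock cycle with non-overlap delayed pulsed clocks.

    The pulses reach the latches in reverse order (last latch first), so
    every latch reads its predecessor before that predecessor is updated.
    """
    state = list(latches)
    for i in reversed(range(len(state))):
        state[i] = state[i - 1] if i > 0 else data_in
    return state


print(shift_with_delayed_pulses([1, 0, 1, 1], 0))  # [0, 1, 0, 1]
```

Each bit moves exactly one position per cycle, at the cost of one delayed clock signal per latch.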

To reduce the number of delayed pulsed clock signals, the proposed shift register is divided into sub shift registers. A 4-bit sub shift register consists of five latches and performs shift operations with five non-overlap delayed pulsed clock signals, CLK_pulse[1:4] and CLK_pulse[T]. In the 4-bit sub shift register #1, four latches (Q1 through Q4) store four bits of data, and the last latch (T1) stores one bit of temporary data that will be passed to the first latch (Q5) of the 4-bit sub shift register #2. The delayed pulsed clock generator in Fig. 6 produces the five non-overlap delayed pulsed clock signals. The pulsed clock signals are issued in the reverse order of the five latches, so the temporary latch T1 is updated first.

First, the pulsed clock signal CLK_pulse[T] updates latch T1 with the data from Q4. Then the pulsed clock signals CLK_pulse[1:4] update the four data latches one after another, from Q4 back to Q1. Latches Q2-Q4 receive data from their preceding latches Q1-Q3, while the first latch Q1 receives data from the input of the shift register (IN). The other sub shift registers operate in the same way as sub shift register #1, except that the first latch of each receives its data from the temporary storage latch of the preceding sub shift register.

Figure 5: Proposed shift register a) Schematic b) Waveforms
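The update order of the proposed scheme can be checked with a short behavioural sketch. It assumes ideal non-overlapping pulses shared by all sub shift registers and models each latch as a single bit. The function name `proposed_shift_cycle` and the 8-bit example data are hypothetical, not taken from the paper.

```python
def proposed_shift_cycle(q, t, data_in, k=4):
    """One clock cycle of a shift register built from k-bit sub shift registers.

    q: data latches (length is a multiple of k)
    t: temporary storage latches, one per sub shift register
    """
    q, t = list(q), list(t)
    n_sub = len(q) // k

    # CLK_pulse[T]: every temporary latch copies the last data latch of its group.
    for j in range(n_sub):
        t[j] = q[j * k + k - 1]

    # CLK_pulse[k] down to CLK_pulse[1]: update from the last latch to the first.
    for pos in range(k - 1, -1, -1):
        for j in range(n_sub):
            idx = j * k + pos
            if pos > 0:
                q[idx] = q[idx - 1]   # Q2..Qk read their predecessor
            elif j == 0:
                q[idx] = data_in      # first latch of the register reads IN
            else:
                q[idx] = t[j - 1]     # first latch of a group reads the previous T
    return q, t


q_next, t_next = proposed_shift_cycle([1, 0, 1, 1, 0, 0, 1, 0], [0, 0], data_in=1)
print(q_next)  # [1, 1, 0, 1, 1, 0, 0, 1]
print(t_next)  # [1, 0]
```

The data latches end up holding the same values as an ordinary one-position shift, while only k + 1 clock signals are needed regardless of the total register length.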


A pipelined structure is used to increase system speed, and a distributed arithmetic structure is used to improve resource utilisation. The divided-LUT technique further reduces the amount of memory required.

The bit-serial distributed arithmetic (DA) algorithm computes the inner product of two vectors in a fixed number of cycles. The original DA architecture stores all possible binary combinations of the coefficients w[k] of (1) in a memory or lookup table. For large values of L, the memory needed to store the precomputed terms grows exponentially and becomes impractically large. The memory size can be reduced by dividing the single large memory (2^L words) into m smaller memories of 2^k words each. It can be reduced further, to 2^(L-1) and 2^(L-2) words, by using offset binary coding and exploiting the resulting symmetries in the memory contents. Because memory size still depends exponentially on filter length, these variants also run into problems when storing coefficient combinations for very large values of L.
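The memory saving from partitioning can be worked through numerically. The sketch below assumes that the L coefficients are split into m equal groups, so that L = m * k; the text does not state this relation explicitly. The function name `da_lut_words` is illustrative.

```python
def da_lut_words(L, m=1):
    """Total LUT words when an L-coefficient DA memory is split into m equal parts.

    Assumes L = m * k, so each partition addresses k coefficients
    and holds 2**k words.
    """
    if L % m != 0:
        raise ValueError("this sketch only handles equal partitions (L divisible by m)")
    k = L // m
    return m * 2 ** k


print(da_lut_words(16))       # 65536 words for a single memory
print(da_lut_words(16, m=4))  # 64 words in total for four 16-word memories
```

Under this assumption a partitioned design also needs an extra adder stage to combine the m partial sums, which is the usual price for the smaller memory.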

Figure 6: Serial distributed arithmetic FIR filter

DA-based computations are bit-serial in nature, which leads to the serial distributed arithmetic (SDA) FIR filter. The main advantage of the distributed arithmetic approach is its efficiency of mechanisation.

Distributed arithmetic is based on the observation that a look-up table (LUT) can store the MAC values, which are then retrieved according to the input data as needed. The LUT thus replaces the MAC units and conserves hardware resources. The distributed arithmetic approach builds a multiplier-free FIR filter from add, shift, and LUT hardware. The LUT stores the possible sums of the coefficient combinations, and the add and shift hardware operates on the LUT contents to implement the filter function. Distributed arithmetic is one of the best-known ways to implement FIR filters; with it, digital filters are implemented using registers, memory, and a scaling accumulator.

Figure 7: LUT-based DA implementation filter
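The LUT, shift, and accumulate steps described above can be sketched in software. This is a minimal illustration, not the implementation used in the paper: it assumes unsigned integer input samples (a hardware design would normally handle two's-complement sign bits), and the names `build_lut` and `da_inner_product` as well as the example coefficients are hypothetical.

```python
def build_lut(weights):
    """LUT entry `addr` holds the sum of the weights whose address bit is 1."""
    n = len(weights)
    return [
        sum(w for k, w in enumerate(weights) if (addr >> k) & 1)
        for addr in range(1 << n)
    ]


def da_inner_product(weights, samples, bits):
    """Bit-serial DA: one LUT read and one shift-accumulate per input bit.

    Assumes every sample is an unsigned integer below 2**bits.
    """
    lut = build_lut(weights)
    acc = 0
    for b in range(bits):
        # Form the LUT address from bit b of every input sample.
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> b) & 1) << k
        # Scale the looked-up partial sum by 2**b and accumulate.
        acc += lut[addr] << b
    return acc


weights = [3, -1, 4, 2]
samples = [5, 7, 2, 9]
print(da_inner_product(weights, samples, bits=4))  # 34
```

The result matches the direct sum 3*5 - 1*7 + 4*2 + 2*9 = 34 without performing any multiplication inside the loop.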


Figure 1: Output

16-bit shift register with latch

Figure 2: Output 1

Output 2

Proposed shift register with latch and delay circuits

Figure 3: Output 3

Transistor count

Timing summary

LUT-based DA implementation filter

Performance comparison


This paper proposed a low-power and area-efficient shift register using pulsed latches. The shift register reduces area and power consumption by replacing flip-flops with pulsed latches. The timing problem between pulsed latches is solved by using multiple non-overlap delayed pulsed clock signals instead of a single pulsed clock signal. Only a small number of pulsed clock signals is needed because the latches are grouped into several sub shift registers with additional temporary storage latches. Compared with the conventional shift register using flip-flops, the proposed shift register saves 37% area and 44% power.

