Construction of a Match-Line based CAM Cell with Pipelining

DOI : 10.17577/IJERTCONV3IS19199

Supriya. V

M. Tech, VLSI and Embedded Systems, ECE

T. John Institute of Technology Bangalore, India

Mrs. Shashikala. N

Assistant Professor, ECE

T. John Institute of Technology Bangalore, India

    Abstract: Content addressable memory (CAM) provides a faster search function, completed in a single clock cycle, than other hardware- and software-based search systems. CAM is used in many applications, such as packet forwarding in network routers, data compression and data acceleration. Because of the parallel comparison on the match-lines (MLs) and the switching of the search lines, CAM consumes considerable power. Thus, high-speed and low-power ML sense amplifiers are highly sought after in CAM designs. We introduce a parity bit that reduces the sensing delay and enhances the robustness of the design against process variations. A feedback loop is employed to automatically turn off the power supply to the comparison elements and hence reduce the average power consumption. The proposed design can work at a supply voltage.

    Index Terms: CMOS, content addressable memory (CAM), match-line, search line, match-line pipelining.

    1. INTRODUCTION

      The main role of a content addressable memory (CAM) is to compare search data against stored data and return the address of the matching data. In a CAM, data are accessed by their contents rather than by their physical locations, and all words in the CAM are compared concurrently.

      CAM has three operation modes: READ, WRITE and COMPARE. COMPARE is the main operation; a CAM rarely reads or writes. Fig. 1(a) shows a simplified block diagram of a CAM with a search data register and an output encoder. The compare operation starts by loading an n-bit input search word into the search data register. The n pairs of complementary search lines (SL and ~SL) broadcast the search word, which is compared directly with every bit of the stored words by the comparison circuits. During the precharge stage, both SL and ~SL are at VDD and the MLs are at ground. During the evaluation stage, the complementary search data are broadcast onto the SLs and ~SLs. When a mismatch occurs in any CAM cell, transistors P3 and P4 are turned on, charging the ML up to a higher voltage level. If there is no mismatch in any CAM cell, no charge-up path is formed and the voltage on the ML remains unchanged.
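The COMPARE operation described above can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not a circuit simulation: each stored word is a bit string, a row's ML is treated as "charged up" whenever at least one cell mismatches (the P-type NOR behavior in the text), and the rows are checked in a loop even though the hardware evaluates them concurrently. The `encode` function is a hypothetical priority encoder standing in for the output encoder of Fig. 1.

```python
from typing import List


def compare(stored: List[str], search: str) -> List[int]:
    """Return the addresses (row indices) whose stored word equals the search word.

    Behavioral model: each row's match line (ML) starts at ground after precharge,
    and every mismatching cell opens a pull-up path that charges the ML.
    A row whose ML stays low is reported as a match.
    """
    matches = []
    for address, word in enumerate(stored):  # hardware does this for all rows at once
        if len(word) != len(search):
            raise ValueError("stored and search words must have the same width")
        # Number of cells that would form a charge-up path on this ML.
        pull_up_paths = sum(s != w for s, w in zip(search, word))
        if pull_up_paths == 0:  # no path: ML stays at ground, i.e. a match
            matches.append(address)
    return matches


def encode(matches: List[int]) -> int:
    """Hypothetical priority encoder: lowest matching address, or -1 if none."""
    return min(matches) if matches else -1


table = ["1010", "0110", "1010", "1111"]
print(compare(table, "1010"))          # [0, 2]
print(encode(compare(table, "1010")))  # 0
print(compare(table, "0000"))          # []
```

Returning every matching address and then encoding it separately mirrors the split between the ML array and the output encoder in the block diagram.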

      CAMs are faster than other hardware- and software-based search systems, offering a high-speed search function in a single clock cycle. They are therefore used in a variety of applications requiring high search speeds, and CAM is an attractive solution for search applications such as image coding and Huffman coding. Commercially, CAMs are used in high-throughput applications such as network routers, data compressors and packet forwarding.

      Fig. 1. Block diagram of conventional CAM

      Because of the parallel search operation, the critical challenges in designing low-power, high-speed, high-capacity CAMs [1] are: 1) high power consumption due to the high switching activity of the SLs and MLs, and 2) a huge peak current at the beginning of the search operation, caused by the concurrent evaluation of all MLs, which may cause a serious IR drop on the power grid and thus affect the operational reliability of the chip. The high speed of a CAM comes at the cost of silicon area and power consumption, so these two design parameters need to be reduced. Many efforts have been made to reduce both the peak and the total dynamic power consumption [2]-[8]; to reduce the peak and average power consumption, selective-precharge and pipelined architectures have been introduced.

    2. MATCHLINE SENSING SCHEMES

      1. Selective-Precharge Scheme

        We examine three schemes that allocate power to match lines nonuniformly. Selective precharge performs a match operation on the first few bits of a word before activating the search on the remaining bits. For example, in a 144-bit word, selective precharge initially searches only the first 3 bits and then searches the remaining 141 bits only for the words that matched in the first 3 bits. Assuming a uniform random data distribution, the first 3-bit search should allow only 1/2^3 = 1/8 of the words to survive to the second stage, which saves match-line power. Two sources of overhead limit the power saving. First, to maintain speed, the initial match implementation may draw more power per bit than the search operation on the remaining bits. Second, an application may have a data distribution that is not uniform.
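The 1/2^3 survival figure can be checked with a short calculation and a Monte Carlo run. The sketch below assumes independent, uniformly random stored and search bits, as in the text, and counts "ML work" simply as the number of bit comparisons performed, ignoring both overheads listed above. Which bits count as the "first" 3 bits is arbitrary here (the low bits are used); the function names are illustrative.

```python
import random


def expected_survivors(prefix_bits: int) -> float:
    """Fraction of words expected to survive the initial prefix comparison,
    assuming independent, uniformly random stored and search bits."""
    return 1 / 2 ** prefix_bits


def simulate_survivors(num_words: int, width: int, prefix_bits: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same fraction for random words."""
    rng = random.Random(seed)
    search = rng.getrandbits(width)
    prefix_mask = (1 << prefix_bits) - 1  # low bits stand in for the "first" bits
    survivors = 0
    for _ in range(num_words):
        word = rng.getrandbits(width)
        # Stage 1: only the prefix cells are compared.
        if ((word ^ search) & prefix_mask) == 0:
            survivors += 1  # only these words go on to search the remaining bits
    return survivors / num_words


def ml_work_fraction(width: int, prefix_bits: int) -> float:
    """Ideal fraction of bit comparisons relative to a full search
    (prefix on every word, remaining bits only on the survivors)."""
    remaining = width - prefix_bits
    return (prefix_bits + expected_survivors(prefix_bits) * remaining) / width


print(expected_survivors(3))               # 0.125
print(round(ml_work_fraction(144, 3), 3))  # 0.143
print(simulate_survivors(20_000, 144, 3))  # close to 0.125
```

Under these idealized assumptions roughly 14% of the full-search comparisons remain; real savings are smaller because of the overheads described in the text.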

        Fig. 2. Simplified schematic of selective precharge.

        Fig. 2 is a simplified schematic of selective precharge [9]. It uses the first bit for the initial search and the remaining (n-1) bits for the remaining search. The first cell on the match line is a NAND cell, while the other cells are NOR cells. If there is no match in the first cell, the precharge transistor is disconnected from the match line, thus saving power. Selective precharge is the most common method used to save power on match lines, since it is simple to implement and can reduce power by a large amount in many CAM applications.

      2. Pipelining Schemes

      In selective precharge, the match line is divided into two segments. An implementation may instead divide the match line into any number of segments, where a match in a given segment triggers a search operation in the next segment, but a miss terminates the match operation for that word. A design that uses multiple match-line segments in a pipelined fashion is called the pipelined match-lines scheme. Fig. 3a shows a simplified schematic of a conventional NOR match-line structure in which all the cells are connected in parallel. Fig. 3b shows the same set of cells, but with the match line broken into four segments that are evaluated serially. If any stage misses, the subsequent stages are shut off, resulting in power saving. The disadvantages of this scheme are the increased latency and the area overhead due to the pipeline stages. Pipelining also enables the use of hierarchical search lines, which saves further power.
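The segment-by-segment evaluation can be sketched as follows. This is a behavioral model only, assuming bit-string words and contiguous, roughly equal segments (four by default, as in Fig. 3b); it reports how many bit cells were evaluated before the first miss shut off the later segments. The helper names are hypothetical.

```python
from typing import List, Tuple


def split_segments(width: int, num_segments: int) -> List[range]:
    """Split bit positions 0..width-1 into contiguous segments (the last may be shorter)."""
    size = -(-width // num_segments)  # ceiling division
    return [range(i, min(i + size, width)) for i in range(0, width, size)]


def pipelined_search(stored: str, search: str, num_segments: int = 4) -> Tuple[bool, int]:
    """Evaluate ML segments serially and stop at the first segment that misses.

    Returns (match, bits_evaluated). A conventional NOR ML always evaluates
    every bit, i.e. bits_evaluated would equal len(search).
    """
    bits_evaluated = 0
    for segment in split_segments(len(search), num_segments):
        bits_evaluated += len(segment)
        if any(stored[i] != search[i] for i in segment):
            return False, bits_evaluated  # later segments stay shut off
    return True, bits_evaluated


print(pipelined_search("11110000", "11111111"))  # (False, 6)
print(pipelined_search("00000000", "11111111"))  # (False, 2)
print(pipelined_search("11111111", "11111111"))  # (True, 8)
```

The model shows the power benefit (fewer bits evaluated on early misses) but not the cost: in hardware each segment adds a pipeline stage, so a match takes longer to resolve.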

      Fig. 3. Pipelined match lines reduce power by shutting down after a miss in a stage

    3. HIERARCHICAL SEARCH LINES

      Hierarchical search lines are built on top of pipelined match lines and exploit the fact that only a few match lines survive the first segment of the pipelined match lines. Fig. 4 shows a simplified hierarchical scheme. The hierarchical search-line scheme divides the search lines into two levels: global search lines (GSLs) and local search lines (LSLs). The match lines are pipelined into two segments, and the search lines are divided into four LSLs per GSL. Each LSL feeds only a single match line. The GSLs are active every cycle, but the LSLs are activated only when necessary, that is, when at least one of the match lines fed by the LSL is active. In many cases, an LSL will not have any active match lines in a given cycle; hence there is no need to activate that LSL, which saves power.
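The LSL activation rule above reduces to a simple set test. The sketch below assumes the set of match lines that survived the first pipeline segment is already known, and describes the GSL-to-LSL wiring as a dictionary from LSL index to the MLs it feeds; the two-MLs-per-LSL grouping in the example is hypothetical and only illustrates the general case.

```python
from typing import Dict, List, Set


def lsls_to_activate(surviving_mls: Set[int], lsl_to_mls: Dict[int, List[int]]) -> Set[int]:
    """Return the LSLs that must be driven for the second ML segment.

    The GSL is driven every cycle; an LSL is driven only if at least one
    of the match lines it feeds survived the first segment.
    """
    return {
        lsl
        for lsl, mls in lsl_to_mls.items()
        if any(ml in surviving_mls for ml in mls)  # at least one active ML
    }


# One GSL with four LSLs, each feeding a single ML, as in the text.
one_ml_per_lsl = {0: [0], 1: [1], 2: [2], 3: [3]}
print(lsls_to_activate({2}, one_ml_per_lsl))    # {2}
print(lsls_to_activate(set(), one_ml_per_lsl))  # set()

# Hypothetical grouping in which each LSL feeds two MLs.
two_mls_per_lsl = {0: [0, 1], 1: [2, 3], 2: [4, 5], 3: [6, 7]}
print(lsls_to_activate({1, 6}, two_mls_per_lsl))  # {0, 3}
```

Every LSL left out of the returned set is one whose search-line capacitance is not switched in that cycle, which is where the saving comes from.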

      Fig. 4. Hierarchical search line structures

    4. SEARCH SPEED BOOST USING A PARITY BIT

      1. Pre-computation CAM Design:

        Pre-computation CAM uses additional bits to filter out some mismatched CAM words before the actual comparison. These extra bits are derived from the data bits and are used in the first comparison stage. For example, in Fig. 5a the number of 1s in each stored word is counted and kept in the counting-bits segment. When a search operation starts, the number of 1s in the search word is counted and stored in the segment on the left of Fig. 5a. This extra information is compared first, and only the words that have the same number of 1s are turned on in the second stage for further comparison. This reduces the power required for data comparison.
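The two-stage filtering of a pre-computation CAM can be sketched behaviorally. In this illustration, which assumes bit-string words, the counting bits are recomputed on every call; in hardware they would be computed once when a word is written. The function returns both the matching addresses and how many words were enabled for the second, full comparison.

```python
from typing import List, Tuple


def precompute_search(stored: List[str], search: str) -> Tuple[List[int], int]:
    """Two-stage pre-computation search.

    Stage 1 compares only the count of 1s (the counting-bits segment).
    Stage 2 compares the full data only for words that passed stage 1.
    Returns (matching addresses, number of words enabled in stage 2).
    """
    ones_in_search = search.count("1")
    # In hardware these counts are stored alongside each word at write time.
    counting_bits = [word.count("1") for word in stored]
    stage2 = [addr for addr, ones in enumerate(counting_bits) if ones == ones_in_search]
    matches = [addr for addr in stage2 if stored[addr] == search]
    return matches, len(stage2)


words = ["1100", "1010", "1111", "0001"]
print(precompute_search(words, "1010"))  # ([1], 2)
```

Only two of the four words reach the second stage in this example; the data comparison power scales with that count rather than with the full table size.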

        The pre-computation design and all other existing designs share a similar property: the ML sense amplifier has to distinguish between a 1-mismatch ML and a matched ML. This makes CAM design challenging, since the driving strength of the single turned-on path gets weaker with each process generation while leakage gets stronger. This problem is referred to as the Ion/Ioff problem. Thus, a new auxiliary bit is introduced that boosts the sensing speed of the ML and, at the same time, improves the Ion/Ioff ratio of the CAM by a factor of two.

      2. Parity bit based CAM:

      The parity-bit-based CAM design, shown in Fig. 5b, consists of the original data segment and an extra bit derived from the actual data bits. This parity bit indicates whether the number of 1s in the word is odd or even.

      Fig. 5. Conceptual view of (a) conventional pre-computation CAM and (b) proposed parity-bit based CAM.

      Fig. 6. 1-mismatch ML waveforms of the original and the proposed architecture with parity bit during the search operation

      The obtained parity bit is stored with the corresponding word (ML). During the search operation, there is only a single stage, as in a conventional CAM. The parity bit does not improve power performance, but it reduces the sensing delay and boosts the driving strength in the 1-mismatch case.

      In the case of a match, the data segments and the parity bits of the search word and the stored word are the same, so the word returns a match. When one mismatch occurs in the data segment, the numbers of 1s in the search word and in the stored word differ by exactly 1, so the corresponding parity bits are different. Therefore there are two mismatches, one from the data bits and one from the parity bit. If there are two mismatches in the data segment, the parity bits are the same, and the word again has two mismatches in total. Cases with more mismatches can be ignored, since they are not critical. Therefore, the sense amplifier only has to distinguish between the 2-mismatch case and the match case. This design improves the search speed and the Ion/Ioff ratio. Fig. 6 shows the waveforms of the original and the proposed architecture during the search operation.
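The mismatch-counting argument can be verified exhaustively for a small word width. The sketch below appends a parity bit that is '1' for an odd number of 1s (choosing even or odd parity does not change the result) and compares the weakest non-matching case with and without the parity cell. In general, the parity cells differ exactly when the number of data mismatches d is odd, so the total is d + (d mod 2), which is never 1.

```python
from itertools import product


def with_parity(word: str) -> str:
    """Append a parity bit that is '1' when the word holds an odd number of 1s."""
    return word + str(word.count("1") % 2)


def data_mismatches(stored: str, search: str) -> int:
    """Number of mismatching cells, i.e. pull-up paths on the ML."""
    return sum(a != b for a, b in zip(stored, search))


def total_mismatches(stored: str, search: str) -> int:
    """Mismatching cells on the ML, the parity cell included."""
    return data_mismatches(with_parity(stored), with_parity(search))


def weakest_mismatch(width: int, parity: bool) -> int:
    """Smallest non-zero number of pull-up paths over all word pairs."""
    words = ["".join(bits) for bits in product("01", repeat=width)]
    count = total_mismatches if parity else data_mismatches
    return min(count(a, b) for a in words for b in words if a != b)


print(total_mismatches("1010", "1011"))   # 2: one data bit + the parity bit
print(total_mismatches("1010", "0110"))   # 2: two data bits, parity unchanged
print(weakest_mismatch(4, parity=False))  # 1
print(weakest_mismatch(4, parity=True))   # 2
```

To first order, assuming every mismatched cell contributes the same on-current, raising the weakest case from one to two pull-up paths doubles the current available to separate it from a matched ML, which is the factor-of-two Ion/Ioff improvement claimed in the text.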

    5. GATED POWER ML SENSE AMPLIFIER DESIGN

  1. Operational Principle:

    The CAM architecture with the gated-power technique is depicted in Fig. 7. It is organized into words (rows) and bits (columns) and uses a P-type NOR CAM cell and ML structure. Transistors M1-M4 act as the comparison unit, and the cross-coupled inverters act as the SRAM storage. These are powered by two separate metal rails, namely VDDML and VDD. VDDML is controlled by a power transistor Px, and a feedback loop can automatically turn off the ML current to save power. Leakage current is one of the sources of power dissipation in low-power VLSI design, and the charging and discharging of the match line make this leakage more significant. The purpose of the two power rails is to isolate the SRAM cell from any power disturbance during the COMPARE cycle.

    The gated-power transistor Px shown in Fig. 7 is controlled by a feedback loop, denoted as the power control, which automatically turns off Px once the voltage on the ML reaches a threshold. At the beginning of each cycle, the ML is initialized by a global control signal EN: EN is set low and the power transistor Px is turned off, which initializes the ML to ground and node C1 to VDD. After that, EN goes high and initiates the COMPARE phase. If mismatches occur in the CAM cells, the ML is charged up. Whatever the number of mismatches, the mismatched cells of a row share the limited current supplied by transistor Px. When the ML voltage reaches the threshold voltage of M8, the voltage at node C1 is pulled down. After a short delay, the NAND2 gate toggles and the power transistor Px is turned off again. The ML is therefore not charged all the way to VDD, but is limited to a voltage slightly above the threshold voltage of M8.
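The feedback sequence above can be approximated with a first-order timing model. This is a hypothetical sketch, not the authors' circuit: it assumes constant-current charging of a lumped ML capacitance, a fixed per-cell pull-up current capped by a maximum Px current, and a fixed loop delay from the M8 threshold crossing to Px turning off. All parameter values (`c_ml`, `i_cell`, `i_px_max`, `vth_m8`, `loop_delay`, `vdd`) are illustrative assumptions and are not taken from the paper.

```python
from typing import Optional, Tuple


def gated_ml_charge(mismatches: int, *, c_ml: float = 50e-15, i_cell: float = 20e-6,
                    i_px_max: float = 400e-6, vth_m8: float = 0.4,
                    loop_delay: float = 20e-12, vdd: float = 1.0
                    ) -> Tuple[Optional[float], float]:
    """Return (time at which Px turns off, final ML voltage) after EN goes high.

    EN low:  Px off, ML = 0 V and C1 = VDD.
    EN high: Px on; the mismatched cells share the current Px can supply,
             so the ML charging current saturates at i_px_max.
    ML reaches Vth(M8): C1 is pulled down; after loop_delay the NAND2
             toggles and Px turns off, freezing the ML slightly above Vth(M8).
    All default values are hypothetical.
    """
    if mismatches == 0:
        return None, 0.0  # no pull-up path: the ML stays at ground (match)
    current = min(mismatches * i_cell, i_px_max)  # Px is the bottleneck
    t_cross = vth_m8 * c_ml / current             # constant-current charging to Vth(M8)
    v_final = min(vdd, vth_m8 + current * loop_delay / c_ml)
    return t_cross + loop_delay, v_final


for n in (0, 1, 128):
    print(n, gated_ml_charge(n))
```

Even this crude model reproduces the qualitative behavior reported for Fig. 8: more mismatches turn Px off sooner, a single mismatch takes longest, and the ML stops well below VDD. A real PMOS pull-up current falls as the ML rises, so actual waveforms are not linear ramps.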

    The simulation result of the proposed power controller is shown in Fig. 8. The slopes of the ML, node C1 and node MLout depend on the number of mismatches. With more mismatches (128 in the simulation), the ML node and node C1 change faster. With fewer mismatches (1 in the simulation), the transition of node C1 slows down, resulting in a longer delay before transistor Px is turned off. The ML is charged to only around 0.5 V, which is below VDD, and hence power consumption is reduced.

    We therefore combine this sense amplifier with the parity-bit scheme, so that the new CAM architecture offers both low-power and high-speed operation.

  2. CAM cell layout:

The layout of the CAM cell in a 65-nm CMOS process is shown in Fig. 9. The new CAM cell has a topology and layout similar to those of the conventional design; the two cell layouts have different heights but the same length.

6. PERFORMANCE COMPARISON

The performance of the proposed design is evaluated using the conventional circuit and the designs in [5] and [6] as references. In [5], the power consumption is limited by the amount of charge injected into the ML at the beginning of the search. The design in [6] uses a similar concept with a positive feedback loop to boost the sensing speed. Both designs are power efficient.

Fig. 7. (a) Proposed CAM architecture. (b) Each CAM cell is powered by two rails, VDDML for the compare transistors and VDD for the SRAM transistors. (c) SRAM

Fig. 8. Waveforms of the proposed power controller

Fig. 9. Layout of proposed CAM cell

  1. Peak Current and IR Drop Attenuation:

    The power controller reduces the transient peak current. This is explained by the bottleneck effect of transistor Px.

    Fig. 10 shows the transient current as a function of the number of mismatches in a row of 128 CAM cells during the COMPARE cycle for the conventional and proposed designs. The peak current of the conventional design increases almost linearly, from 25 µA (1 mismatch) to 1.45 mA (64 mismatches) and finally 2.8 mA (128 mismatches). In the proposed design, the overall transient ML charge-up current also increases with the number of mismatches, but it reaches a limit set by the gated-power transistor Px.
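The contrast between near-linear growth and a Px-imposed ceiling can be sketched numerically. The two conventional data points below are the 64- and 128-mismatch values quoted in the text; the per-cell current and the Px current limit used for the gated curve are hypothetical assumptions chosen only for illustration.

```python
# Conventional peak ML current quoted in the text for a 128-cell row.
REPORTED_CONVENTIONAL = {64: 1.45e-3, 128: 2.8e-3}  # mismatches -> amperes


def per_mismatch_current(points: dict) -> dict:
    """Peak current divided by the number of mismatches (A per mismatch)."""
    return {n: i / n for n, i in points.items()}


def conventional_peak(n: int, i_cell: float) -> float:
    """Every mismatched cell adds its own pull-up current."""
    return n * i_cell


def gated_peak(n: int, i_cell: float, i_px_max: float) -> float:
    """The shared current is capped by what Px can supply (hypothetical cap)."""
    return min(n * i_cell, i_px_max)


slope = per_mismatch_current(REPORTED_CONVENTIONAL)
print({n: round(i * 1e6, 1) for n, i in slope.items()})  # {64: 22.7, 128: 21.9}

I_CELL = 22e-6     # hypothetical per-cell current (A), close to the slope above
I_PX_MAX = 400e-6  # hypothetical Px current limit (A)
for n in (1, 16, 64, 128):
    print(n, conventional_peak(n, I_CELL), gated_peak(n, I_CELL, I_PX_MAX))
```

The per-mismatch current stays within a few percent between 64 and 128 mismatches, which is what "almost linearly" means here; the gated curve follows the same line until it hits the Px ceiling, which is the bottleneck effect that limits the IR drop.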

  2. Dynamic power consumption:

    The proposed design consumes less power because the gated-power transistor is turned off once the sense amplifier has produced its output, which reduces the voltage swing on the ML bus. Another factor that reduces the average power consumption is that the new design does not need to precharge the SL buses, because the EN signal turns off transistor Px in each row.
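The benefit of the reduced ML swing follows from a standard charge argument: charging a capacitance C from 0 V to V_ML draws a charge Q = C·V_ML from a rail at VDD, so the energy taken from the supply is C·V_ML·VDD, and the saving relative to a full swing is V_ML/VDD. The sketch below uses the roughly 0.5 V ML level mentioned for Fig. 8; the ML capacitance and the 1.0 V supply are hypothetical values, since the text does not state VDD.

```python
def ml_charge_energy(c_ml: float, v_ml: float, vdd: float) -> float:
    """Energy drawn from the VDD rail to charge c_ml from 0 V to v_ml.

    The charge delivered is Q = c_ml * v_ml, all of it taken from the rail at vdd.
    """
    return c_ml * v_ml * vdd


C_ML = 50e-15  # hypothetical ML capacitance (F)
VDD = 1.0      # hypothetical supply voltage (V); not stated in the text
full_swing = ml_charge_energy(C_ML, VDD, VDD)
gated_swing = ml_charge_energy(C_ML, 0.5, VDD)  # ML stops near 0.5 V
print(gated_swing / full_swing)  # 0.5
```

The ratio is independent of the capacitance, so the per-row ML energy scales directly with how far above the M8 threshold the feedback loop lets the ML rise.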

  3. Temperature Variation Analysis:

    Fig. 11 shows the temperature variation analysis of the four designs. The design in [5] works throughout the whole temperature range but shows larger speed fluctuations. The design in [6] is the most vulnerable and works only in a narrow temperature range. The proposed and conventional designs are much more stable, with less sensing-delay variation.

  4. Process Variation Analysis:

Process variation is a critical issue in nanoscale CMOS technology. The feedback loop that turns off the gated-power transistor Px operates digitally and is therefore almost insensitive to process variation. As in the conventional design, there are two scenarios in which the proposed design may sense the result wrongly: 1) the sense amplifier is enabled too early, so the 1-mismatch ML has not yet been pulled up above the threshold value and does not trigger the output inverter; 2) the enable signal is delayed too long, so the matched ML is pulled up by leakage current, indicating a false miss. The designs in [5] and [6] are sensitive to process variations and show more errors; both stop working at a 0.9 V supply, whereas the proposed and conventional designs show no sensing errors.

At lower supply voltages, the conventional design continues to work correctly, while the proposed design shows some errors. This is because both designs operate at the same frequency and the proposed design has a smaller pull-up current due to the gated-power transistor Px, so errors occasionally occur. Extending the clock period for [5] and [6] does not reduce their error counts, because these designs are based on a feedback-loop structure and their decisions are made at the beginning of the sensing cycle.

Fig. 10. Simulated transient current (a) conventional (b) proposed designs.
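The two failure scenarios describe a timing window for the sense-amplifier enable. The hypothetical Monte Carlo sketch below models it with Gaussian spreads on two times: when a 1-mismatch ML crosses the threshold (`t_rise_1mm`) and when leakage would pull a matched ML over it (`t_leak_match`). All means and standard deviations, in arbitrary nanosecond-like units, are assumptions for illustration only. Enabling too early produces false matches and enabling too late produces false misses; a weaker pull-up current, as in the gated design at low supply, would shift `t_rise_1mm` later and narrow the safe window.

```python
import random


def sense_outcome(is_match: bool, t_enable: float, t_rise_1mm: float, t_leak_match: float) -> str:
    """Classify one sensing event against the two failure scenarios in the text."""
    if is_match:
        # Scenario 2: enable too late, leakage has already pulled the matched ML up.
        return "false_miss" if t_enable >= t_leak_match else "ok"
    # Scenario 1: enable too early, the 1-mismatch ML is still below threshold.
    return "false_match" if t_enable < t_rise_1mm else "ok"


def error_rate(t_enable: float, trials: int = 10_000, seed: int = 1,
               mu_rise: float = 1.0, sigma_rise: float = 0.1,
               mu_leak: float = 5.0, sigma_leak: float = 0.5) -> float:
    """Fraction of wrong decisions; half the trials are matches, half 1-mismatch rows.
    All timing parameters are hypothetical."""
    rng = random.Random(seed)
    errors = 0
    for k in range(trials):
        is_match = k % 2 == 0
        t_rise = rng.gauss(mu_rise, sigma_rise)
        t_leak = rng.gauss(mu_leak, sigma_leak)
        if sense_outcome(is_match, t_enable, t_rise, t_leak) != "ok":
            errors += 1
    return errors / trials


for t in (0.5, 2.5, 9.0):
    print(t, error_rate(t))  # too early, inside the window, too late
```

Treating the decision as a comparison of the enable time against two random crossing times keeps the sketch independent of any particular sense-amplifier circuit.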

7. CONCLUSION

Considering reference [9] together with the base paper [10], [9] is efficient in its search-line scheme and [10] is efficient in speed. By combining the two approaches, even higher efficiency can be obtained.

ACKNOWLEDGMENT

We take this opportunity to express our deepest gratitude and appreciation to all those who have helped us directly or indirectly towards the successful completion of this paper.

REFERENCES

  1. K. Pagiamtzis and A. Sheikholeslami, "Content-addressable memory (CAM) circuits and architectures: A tutorial and survey," IEEE J. Solid-State Circuits, vol. 41, no. 3, pp. 712-727, Mar. 2006.

  2. A. T. Do, S. S. Chen, Z. H. Kong, and K. S. Yeo, "A low-power CAM with efficient power and delay trade-off," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), 2011, pp. 2573-2576.

  3. I. Arsovski and A. Sheikholeslami, "A mismatch-dependent power allocation technique for match-line sensing in content-addressable memories," IEEE J. Solid-State Circuits, vol. 38, no. 11, pp. 1958-1966, Nov. 2003.

  4. N. Mohan and M. Sachdev, "Low-leakage storage cells for ternary content addressable memories," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 5, pp. 604-612, May 2009.

  5. O. Tyshchenko and A. Sheikholeslami, "Match sensing using match line stability in content addressable memories (CAM)," IEEE J. Solid-State Circuits, vol. 43, no. 9, pp. 1972-1981, Sep. 2008.

  6. N. Mohan, W. Fung, D. Wright, and M. Sachdev, "A low-power ternary CAM with positive feedback match-line sense amplifiers," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 56, no. 3, pp. 566-573, Mar. 2009.

  7. S. Baeg, "Low-power ternary content-addressable memory design using a segmented match line," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 6, pp. 1485-1494, Jul. 2008.

  8. K. Pagiamtzis and A. Sheikholeslami, "A low-power CAM using pipelined hierarchical search scheme," IEEE J. Solid-State Circuits, vol. 39, no. 9, pp. 1512-1519, Sep. 2004.

  9. C. A. Zukowski and S.-Y. Wang, "Use of selective precharge for low-power content-addressable memories," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), vol. 3, 1997, pp. 1788-1791.

  10. A.-T. Do, S. Chen, Z.-H. Kong, and K. S. Yeo, "A high speed low power CAM with a parity bit and power-gated ML sensing."
