Development of Architecture for Data Matching using Systematic Error-Correcting Codes


Mr. A. Sivasankar, M.E.
Assistant Professor, Dept. of ECE
Regional Office, Anna University, Madurai, TN, India

R. Sekar
PG Scholar (VLSI design), Dept. of ECE
Regional Office, Anna University, Madurai, TN, India

R. Gowri
Final Year M.E. (VLSI design), Dept. of ECE
Regional Office, Anna University, Madurai, TN, India

Abstract: In computing systems, incoming data often needs to be compared with stored data to locate a matching entry, e.g., in cache tag array lookup and translation look-aside buffer (TLB) matching. If the stored data is protected with error-correcting codes (ECC) for reliability, the possible errors normally have to be decoded and corrected before the comparison with the incoming data. We propose a method to improve the compare latency for information encoded with ECC. Based on the fact that the code word of an ECC is usually represented in a systematic form consisting of the raw data and the parity information generated by encoding, the proposed architecture parallelizes the comparison of the data with the encoding that generates the parity information. The results show that a 30% gate count reduction and a 12% latency reduction are achieved.

Keywords: Data Comparison, Error Correcting Codes (ECC), Systematic Codes, Encoder.

I INTRODUCTION

Data comparison logic has many applications in computing systems. To locate a piece of information in a cache, the address of the information in memory is compared with all cache tags in the same set that may contain that address. Another place that uses a data comparison circuit is the translation look-aside buffer (TLB) unit, which is used to speed up virtual-to-physical address translation. Error-correcting codes (ECCs) are widely used in modern microprocessors to enhance the reliability and data integrity of their memory structures. When ECCs protect the data, however, the complicated decoding procedure, which must precede the data comparison, lengthens the critical path and adds complexity overhead. The data comparison circuit is usually in the critical path of a pipeline stage. When the memory array is protected by ECC, the added latency of the ECC logic makes this path even more critical.

The most recent solution to the matching problem is the direct compare method, which encodes the incoming data and then compares it with the retrieved data that has already been encoded. This method eliminates the complex decoding from the critical path. It relies on a saturating adder (SA) to count the differing bits. However, because the SA always forces its output not to be greater than the number of detectable errors by more than one, it increases the complexity of the entire circuit. In this brief, we renovate the SA-based direct compare architecture to reduce the latency and hardware complexity by resolving the aforementioned drawbacks. We consider systematic codes in designing the proposed system and propose a low-complexity processing element that computes the Hamming distance faster. As a result, the latency and the hardware complexity are decreased considerably, even compared with the SA-based architecture.

Power consumption is one of the major concerns in modern embedded computing systems. On-chip caches account for a sizeable fraction of the total power consumption of microprocessors, so reducing their size is becoming fundamental to developing systems that are both low-power and high-performance. Although large caches can improve performance, they also increase the power consumption. The operating frequency and the transistor size are other important factors. Cache power consumption is mainly due to two components: dynamic power (switching, i.e., charging and discharging of capacitances) and static power (leakage currents). Static power is growing in importance in newer CMOS technologies (e.g., 0.65 µm technology) and is surpassing dynamic power. Recently, many studies have described new techniques for reducing both static and dynamic power consumption. One proposal reuses the charge that would otherwise be lost when unused cells are discharged, and presents two ideas. The first uses the residual charge from a cell put into drowsy mode to charge the neighboring cell. The second, less complex idea uses the residual charge to drive the drowsy bit of the nearest cell through a suitable network and circuitry.

  1. LITERATURE SURVEY

    In [1], the incoming data is compared directly with the tag information of the cache memory: either the incoming data is encoded and compared with the stored tag, or the stored tag is decoded and compared with the incoming data. After the Hamming code is applied, the number of 1s in the comparison result is counted to decide whether the data match. This process takes considerable time for data matching, and the saturating adder technique it uses increases the hardware complexity of the architecture.

    The SPARC64 processor implements an error detection mechanism for execution units and data path logic circuits, in addition to on-chip arrays, to detect data corruption. Intermittent errors detected in execution units and data paths are recovered via instruction retry [2]. A soft barrier clocking scheme allows amortization of the clock skew and jitter over multiple cycles and helps to achieve a high clock frequency. The use of static circuits contributes to a short development time. This approach is mostly adopted for RISC processors.

    In [3], sleep transistors are used in the SRAM memory array and peripherals to reduce the cache leakage by more than 2X. Dynamic cache line disable with a history buffer protects the cache from latent defects and infant mortality failures. Long-channel devices are used wherever possible to further reduce the leakage power. Aggressive clock gating, fine-grained sleep resolution, and wake-up counters are implemented to minimize the dynamic power. Error-correcting codes and Intel Cache Safe Technology are implemented to ensure the reliability of the cache. Latency is the main problem during operations in the cache memory.

    In [4], a BCH encoder/decoder achieves a decoding throughput of 6.4 Gb/s for solid-state drives (SSDs). An SSD built with flash memory channels is usually connected to the host through an advanced high-speed serial interface such as SATA III, which has a transfer rate of 6 Gb/s. Error-correcting codes are necessary to overcome the high error rate. Because high-throughput strong BCH decoders are hard to achieve, multiple BCH codes are typically employed in a high-performance SSD controller, leading to a significant increase in hardware complexity.

    The microprocessor in [5] runs at 5.2 GHz with an area of 512 mm² and enables high-frequency operation for the zEnterprise system, supported in part by new tools. The design simultaneously adds new RAS (reliability, availability, and serviceability) features; the combination of industry-leading RAS and unprecedented chip frequency allows the zEnterprise system to reach new levels of performance and stability for the most critical workloads. However, this system has hardware complexity and power consumption problems.

  2. EXISTING METHOD

    The existing approaches fall into two methods: (a) the encode-and-compare method and (b) the decode-and-compare method. In the encode-and-compare method, the retrieved word is fed directly to the Hamming distance unit, while the incoming data is encoded first and then fed to the same unit. The Hamming distance determines whether the two words match, mismatch, or contain a fault. This method takes more time to detect an error. To overcome this drawback, the other method is used. In the decode-and-compare architecture, the incoming data is fed directly to the comparator, while the retrieved data is first passed through the decoder; after decoding, the comparator checks whether the bits match or mismatch. This method increases the complexity and is the more complicated of the two to implement.

    Fig. 1. (a) Decode-and-compare architecture and (b) encode-and-compare architecture.
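The two existing methods can be contrasted with a small software model. The following is a minimal sketch, assuming a systematic Hamming (7,4) code that corrects a single error; the code choice, the parity equations, and all function names are illustrative assumptions and are not taken from the architectures in Fig. 1.

```python
def encode(data):
    """Systematic Hamming (7,4): 4 data bits followed by 3 parity bits (assumed code)."""
    d0, d1, d2, d3 = data
    return [d0, d1, d2, d3, d0 ^ d1 ^ d3, d0 ^ d2 ^ d3, d1 ^ d2 ^ d3]


# Syndrome -> index of the single erroneous bit (columns of the parity-check matrix).
SYNDROME_TO_POS = {
    (1, 1, 0): 0, (1, 0, 1): 1, (0, 1, 1): 2, (1, 1, 1): 3,
    (1, 0, 0): 4, (0, 1, 0): 5, (0, 0, 1): 6,
}


def decode(word):
    """Correct at most one bit error and return the 4 data bits."""
    word = list(word)
    expected = encode(word[:4])[4:]
    syndrome = tuple(e ^ p for e, p in zip(expected, word[4:]))
    if syndrome in SYNDROME_TO_POS:
        word[SYNDROME_TO_POS[syndrome]] ^= 1
    return word[:4]


def decode_and_compare(incoming, stored):
    """Decode the stored word first, then compare the data bits."""
    return decode(stored) == list(incoming)


def encode_and_compare(incoming, stored, t_max=1):
    """Encode the incoming data, then match if the Hamming distance is at most t_max."""
    distance = sum(a ^ b for a, b in zip(encode(incoming), stored))
    return distance <= t_max


tag = [1, 0, 1, 1]
stored = encode(tag)
stored[2] ^= 1  # inject a single-bit error into the stored word
print(decode_and_compare(tag, stored), encode_and_compare(tag, stored))  # True True
```

In this model both methods reach the same decision, but only the first one places a decoder in front of the comparison, which is the step the direct compare approach removes from the critical path.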

  3. PROPOSED ARCHITECTURE

    We propose a new architecture that reduces the latency and complexity of the data comparison process by exploiting the characteristics of systematic codes. In addition, a new processing element is presented to further reduce the latency and area complexity.

    Fig. 2. Timing diagram of the tag match in (a) the direct compare method and (b) the proposed architecture.

      Fig. 3. Systematic representation of an ECC code word.

      Fig. 4. Proposed architecture optimized for systematic code words.

      A. Design of Systematic Codes

      In the SA-based architecture [1], the comparison of two code words is performed after the incoming tag is encoded. The critical path therefore consists of the encoding followed by the n-bit comparison, as shown in Fig. 2(a). ECC code words, however, are usually systematic, with the data and parity bits completely separated, as shown in Fig. 3. The data part of the encoded word is exactly the same as the incoming tag field and is therefore immediately available for comparison, while the parity part becomes available only after the encoding is completed. Consequently, the comparison of the k-bit tags can be started before the remaining (n − k)-bit comparison of the parity bits. In the proposed architecture, the encoding that generates the parity bits from the incoming tag is performed in parallel with the tag comparison, as shown in Fig. 2(b), reducing the overall latency.
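The benefit of the systematic form can be illustrated in software. The sketch below is a minimal model, assuming an extended Hamming (8,4) code in systematic form; the parity equations and the helper names (`parity_bits`, `distance`, `split_distance`) are hypothetical. It only illustrates that the data-part distance needs no encoding, so in hardware it could be computed while the encoder is still producing the parity bits.

```python
def parity_bits(data):
    """Parity part of an assumed systematic extended Hamming (8,4) code."""
    d0, d1, d2, d3 = data
    return [d0 ^ d1 ^ d3, d0 ^ d2 ^ d3, d1 ^ d2 ^ d3, d0 ^ d1 ^ d2]


def distance(a, b):
    """Hamming distance between two equal-length bit lists."""
    return sum(x ^ y for x, y in zip(a, b))


def split_distance(incoming_tag, stored_word):
    """Total distance as data-part distance plus parity-part distance."""
    k = len(incoming_tag)
    # Data part: uses the raw incoming tag, so it does not wait for the encoder.
    d_data = distance(incoming_tag, stored_word[:k])
    # Parity part: available only after the incoming tag has been encoded.
    d_parity = distance(parity_bits(incoming_tag), stored_word[k:])
    return d_data + d_parity


tag = [0, 1, 1, 0]
stored = tag + parity_bits(tag)
stored[5] ^= 1  # single-bit error in the stored parity field
print(split_distance(tag, stored))  # 1
```

Because the two partial distances are simply added, the result is identical to comparing the full n-bit code words after encoding; only the order in which the bits become available changes.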

      B. Computing the Hamming Weight

      The architecture contains multiple butterfly-formed weight accumulators (BWAs), proposed to improve the latency and complexity of the Hamming distance computation. The function of a BWA is to count the number of 1s among its inputs. It consists of multiple stages of half adders, each associated with a weight, connected in a butterfly form to accumulate the carry and sum bits. The weight of an output is tied to the paths reaching its half adder, and the output D is calculated as

      D = 8I + 4 (J + K + M) + 2 (L + N + O) + P

      As Fig. 6 shows, each half adder has two inputs and is associated with a weight. OR gates are used to reduce the number of half adders and to avoid overlap between the sum and carry bits. The architecture generates the bitwise difference for either the parity or the data bits, and the following processing elements count the number of 1s.

      Fig. 5. Hamming weight architecture.
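The weighted sum D = 8I + 4 (J + K + M) + 2 (L + N + O) + P can be reproduced with three stages of half adders over eight input bits. The following is a minimal sketch under stated assumptions: the pairing of wires is one possible arrangement chosen for illustration, not necessarily the wiring of Fig. 5, and the OR-gate substitution described in the text is not modeled. With this arrangement the three stages leave one output of weight 8, three of weight 4, three of weight 2, and one of weight 1, which is the grouping seen in the equation.

```python
def half_adder(a, b):
    """Return (carry, sum) for two input bits."""
    return a & b, a ^ b


def butterfly_weights(bits):
    """Apply three half-adder stages to 8 bits; return {weight: [output bits]}."""
    assert len(bits) == 8
    level = {1: list(bits)}
    for _ in range(3):
        nxt = {}
        for weight, group in level.items():
            # Pair neighbouring bits of equal weight; a + b = 2 * carry + sum.
            for a, b in zip(group[0::2], group[1::2]):
                carry, total = half_adder(a, b)
                nxt.setdefault(2 * weight, []).append(carry)
                nxt.setdefault(weight, []).append(total)
        level = nxt
    return level


def hamming_weight(bits):
    """Weighted sum of the outputs, i.e. the number of 1s among the inputs."""
    return sum(w * sum(group) for w, group in butterfly_weights(bits).items())


print(hamming_weight([1, 0, 1, 1, 0, 0, 1, 0]))  # 4
```

Each half adder preserves the weighted total of its two inputs, so the final weighted sum equals the number of 1s without any full carry-propagating adder in the path.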

      C. Architecture for Computing the Hamming Distance

      The proposed system contains multiple butterfly-formed weight accumulators (BWAs), which improve the latency and complexity of the Hamming distance computation. The basic function of a BWA is to count the number of 1s among its input bits. In the BWA, an OR gate is used in place of a half adder where possible. The OR gate combines its two inputs, and its output drives the next stage of half adders. The combination of half adders and OR gates, each associated with a weight, resembles a butterfly, which is why the unit is called a butterfly-formed weight accumulator.

      Fig. 6. (a) General structure and (b) new structure revised for the matching of ECC-protected data.

      Hamming distance and Hamming weight are important concepts in coding. Knowledge of the Hamming distance is used to determine the capability of a code to detect and correct errors. We compute the Hamming distance d between the two code words and classify the cases according to the range of d.

      DECISION UNIT

      Table 1. Truth table for the decision unit.

      Let tmax denote the maximum number of correctable errors and rmax the maximum number of detectable errors. For two code words X and Y, the cases are summarized as follows.

      1. If d = 0, X matches Y exactly.

      2. If 0 < d ≤ tmax, X will match Y provided at most tmax errors in Y are corrected.

      3. If tmax < d ≤ rmax, Y has detectable but uncorrectable errors.

      4. If rmax < d, X does not match Y.
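The four cases above map directly onto a small decision function. This is a minimal sketch: the function name `classify`, the returned labels, and the single-error-correcting, double-error-detecting setting (tmax = 1, rmax = 2) are assumptions chosen for illustration, not the contents of Table 1.

```python
def classify(d, t_max, r_max):
    """Map a Hamming distance d to one of the four cases of the decision unit."""
    if d == 0:
        return "match"
    if d <= t_max:
        return "match with correctable errors"
    if d <= r_max:
        return "fault: detectable but uncorrectable errors"
    return "mismatch"


# Assumed setting: t_max = 1 correctable error, r_max = 2 detectable errors.
for d in range(4):
    print(d, classify(d, t_max=1, r_max=2))
```

Only the range in which d falls matters, so the distance never needs to be resolved beyond rmax + 1.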

      The complexity and the latency of combinational circuits depend heavily on the algorithm employed. In addition, as complexity and latency usually trade off against each other, it is hard to derive an analytical and fully deterministic equation that relates the number of gates to the latency, either for the proposed architecture or for the conventional SA-based architecture.

  4. SYNTHESIS AND COMPARISON

    The simulation result of the proposed architecture shows the incoming data being matched with the retrieved code word, with the parity bits also compared.

    Fig. 6. Data matching simulation output.

    Latency is the time delay between the processes, measured using the Xilinx 12.1 software. Area complexity is reduced by choosing appropriate gates for the BWA unit, which reduces the gate count of the architecture.

    Table 2 shows the timing and area reports observed during the process. The results show that a 30% gate count reduction and a 12% latency reduction are achieved.

    Method            Code word (n, k)   Delay     Area
    Existing system   (8, 4)             17.4 ns   79%
    Proposed system   (8, 4)             11.6 ns   70%

    Table 2. Latency and area comparison.

  5. CONCLUSION

    In the proposed architecture, the incoming data is considered to match the stored data if a certain number of erroneous bits can be corrected. To reduce the latency, the comparison of the data is parallelized with the encoding process that generates the parity information. The parallel operation is enabled by the systematic code, which has separate fields for the data and the parity. The latency and complexity are thus minimized using an efficient processing architecture. Because the proposed architecture is effective in minimizing the latency and complexity, it can be regarded as a promising solution for the comparison of ECC-protected data. The proposed method is applicable to diverse applications that need such a comparison, such as tag matching in a cache memory.

  6. REFERENCES

  1. W. Wu, D. Somasekhar, and S.-L. Lu (2012), Direct compare of information coded with error correcting codes, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 11, pp. 2147–2151.

  2. H. Ando, Y. Yoshida, A. Inoue, I. Sugiyama, T. Asakawa, K. Morita, T. Muta, T. Motokurumada, S. Okada, H. Yamashita, and Y. Satsukawa (2003), A 1.3 GHz fifth generation SPARC64 microprocessor, in IEEE ISSCC Dig. Tech. Papers, pp. 246–247.

  3. J. Chang, M. Huang, J. Shoemaker, J. Benoit, S.-L. Chen, W. Chen, S. Chiu, R. Ganesan, G. Leong, V. Lukka, S. Rusu, and D. Srivastava (2007), The 65-nm 16-MB shared on-die L3 cache for the dual-core Intel Xeon processor 7100 series, IEEE J. Solid-State Circuits, vol. 42, no. 4, pp. 846–852.

  4. Youngjoo Lee, Hoyoung Yoo, Injae Yoo, and In-Cheol Park (2012), 6.4Gb/s Multi-Threaded BCH Encoder and Decoder for Multi-Channel SSD Controllers, pp. 426–427.

  5. J. D. Warnock, Y.-H. Chan, S. M. Carey, H. Wen, P. J. Meaney, G. Gerwig, H. H. Smith, Y. H. Chan, J. Davis, P. Bunce, A. Pelella, D. Rodko, P. Patel, T. Strach, D. Malone, F. Malgioglio, J. Neves, D. L. Rude, and W. V. Huott (2012), Circuit and physical design implementation of the microprocessor chip for the zEnterprise system, IEEE J. Solid-State Circuits, vol. 47, no. 1, pp. 151–163.

  6. Youngjoo Lee, Hoyoung Yoo, Injae Yoo, and In-Cheol Park (2013), High-Throughput and Low-Complexity BCH Decoding Architecture for Solid-State Drives.

  7. Pedro Reviriego, Juan A. Maestro, and Mark F. Flanagan (2013), Error Detection in Majority Logic Decoding of Euclidean Geometry Low Density Parity Check (EG-LDPC) Codes, vol. 21, no. 1.

  8. Mike Sablatash (1990), An error-correcting coding scheme for teletext and other transparent data broadcasting, vol. 36, no. 1.

  9. Hongchao Zhou, Moshe Schwartz, Anxiao (Andrew) Jiang, and Jehoshua Bruck (2013), Systematic Error-Correcting Codes for Rank Modulation, vol. 25.

  10. Khaled A. S. Abdel-Ghaffar and Hendrik C. Ferreira (1998), Systematic Encoding of the Varshamov-Tenengolts Codes and the Constantin-Rao Codes, vol. 44, no. 1.
