Crosstalk Effects on Global Interconnects in Multi core Processors

DOI : 10.17577/IJERTV1IS5116

Kalpana A. B., Assistant Professor, Department of Electronics and Communication, Bangalore Institute of Technology, Bangalore, India

P. V. Hunagund, Professor, Department of Applied Electronics, Gulbarga University, Gulbarga, India

Abstract

One of the most harmful effects of noise on circuit operation is the degradation of signal integrity, which causes uncertainty in the signal delay. Uncertainty in the propagation delay of a signal can cause a catastrophic violation of the timing constraints within a system. In this paper, signal degradation on capacitively coupled interconnect lines is observed for different interconnect lengths, with simulations carried out at the 32 nm and 45 nm technology nodes.

  1. Introduction

    Due to continuous advances in technology scaling, modern integrated circuits consist of billions of transistors [1]. Traditionally, the operating speed of an integrated circuit was assumed to be proportional to the speed of a logic gate. The interconnects between the gates were treated as ideal conductors that propagated signals instantaneously and had little effect on circuit operation. Such approximations are no longer adequate, since the physical dimensions of interconnects have been greatly reduced while operating speeds have increased. For example, in a modern 32 nm technology [2] the width and thickness of local wires are measured in only tens of nanometers, while the clock frequency is in the range of several GHz. Due to this scaling, the performance of interconnects is increasingly affected by their electrical parasitics, i.e. resistance, capacitance and inductance. These parasitics may result in long propagation delays for a signal travelling on interconnects, or in signals distorted by noise. Transmitting a signal also requires charging or discharging the wire capacitances, which consumes energy. The energy dissipated in the interconnect structure is projected to grow dramatically due to higher frequencies and an increasing number of metal layers [3]. For example, in [4] over 50% of the dynamic power consumption of a microprocessor was determined to be consumed by interconnects.

    In addition to transmitting data signals, on-chip wires are also used to distribute the operating voltage and the clock signal. The wires need to provide a constant operating voltage across the chip despite increasing switching speeds and device counts. The design of digital systems is further complicated by the fact that both wires and devices suffer from process variations, i.e. their manufactured properties differ from the ideal designed values. Overall, because of these growing delay, signal integrity and energy issues in interconnects, the focus has shifted from devices to wires, or from computation to communication. This has created a need for novel design tools and models that can be used to analyze and optimize on-chip interconnects.

    Figure 1. Delay for local (Metal 1) and global wiring versus feature size [3].

  2. On-Chip Global Communication

    The interconnects in an integrated circuit can be loosely divided into local, intermediate and global interconnects depending on their length, size and metal layer. An integrated circuit today often contains several large intellectual property (IP) blocks, such as memory, processing elements and interfaces. These IP blocks need to communicate with each other over long distances, and they are linked by wide global interconnects that span at least one block and at most the length of the chip edge. While these global interconnects are routed in the top metal layers, the lower metal layers are used by narrow local interconnects that connect neighboring gates.

    The aforementioned scaling issues do not affect all interconnect types equally, as illustrated in Fig. 1. Unlike gate delays, which are reduced as gate dimensions become smaller, the delay of a fixed-length wire increases when its cross-sectional dimensions are scaled [5]. For local wires this delay increase is alleviated by the fact that their length is also reduced with scaling, since they connect nearby gates whose sizes diminish with scaling. The length of global wires, however, does not scale with technology, since they may need to run across the chip. This has resulted in a growing delay gap between gates and global interconnects. Despite efforts such as increased aspect ratios, low-resistivity wire materials like copper, and low-k (low-permittivity) dielectrics, global signaling often remains a major bottleneck in modern integrated circuits. To provide high bandwidth, on-chip communication links are normally constructed of multiple wires. Common communication architectures include point-to-point links, buses and networks-on-chip (NoC) [6, 7, 8].
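    The scaling argument above can be illustrated with a first-order model. The following is a minimal sketch, not the authors' method: it assumes bulk copper resistivity, a capacitance per unit length that stays constant with scaling, and the common 0.38·R·C estimate for a distributed RC line. The reference wire dimensions and the scaling factor are hypothetical.

```python
# Illustrative first-order scaling model; all numbers are assumed, not taken from the paper.
RHO_CU = 1.7e-8   # bulk copper resistivity in ohm*m (ignores size effects and barriers)
C_PER_M = 2e-10   # assumed wire capacitance per metre, held constant with scaling


def wire_delay(length_m, width_m, thickness_m):
    """Distributed RC delay estimate, 0.38 * R * C, for a uniform wire."""
    r_per_m = RHO_CU / (width_m * thickness_m)
    return 0.38 * (r_per_m * length_m) * (C_PER_M * length_m)


def scaled_delays(s):
    """Delay ratios after shrinking the wire cross-section by a factor s (0 < s <= 1)."""
    w, t, length = 100e-9, 200e-9, 100e-6  # assumed reference wire
    base = wire_delay(length, w, t)
    fixed_length = wire_delay(length, s * w, s * t)       # global-style wire
    scaled_length = wire_delay(s * length, s * w, s * t)  # local-style wire
    return fixed_length / base, scaled_length / base


global_ratio, local_ratio = scaled_delays(0.7)
print(f"fixed-length wire delay x{global_ratio:.2f}, "
      f"scaled-length wire delay x{local_ratio:.2f}")
```

    Under these assumptions the delay of the fixed-length wire grows as 1/s², while the wire whose length shrinks with the technology keeps roughly the same delay.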

    Figure 2. An on-chip communication link consisting of multiple parallel wires.

    In practice, buses are often implemented using techniques such as bus splitting [9] to reduce the total wire load. NoC links, on the other hand, are typically modular, structured interconnects running between routers. In addition, because of delay and signal integrity issues, interconnects are commonly broken into segments with repeaters [10]. Therefore, at the physical level the communication often reduces to multiple wires running in parallel. In this paper, the focus is on long, multiple parallel wires that typically form part of a communication link, as depicted in Fig. 2. A common way to implement a long on-chip communication link is to use voltage-mode signaling with buffering. The delay of an RC interconnect increases quadratically with length, since both resistance and capacitance increase linearly with wire length. The basic principle behind buffering is to reduce this delay growth to linear by inserting repeaters along the wire. The total delay then equals the number of wire segments multiplied by the individual segment delay. In addition to reducing delay, buffering can be used to reduce noise. To achieve the desired objective, the repeaters need to be both spaced and sized appropriately.

    In addition to the common voltage-mode signaling, other signaling techniques for global on-chip communication have been proposed. These include encoded, current-mode and differential signaling. The objective is typically to improve signaling speed, power dissipation, signal integrity or a combination of these. Bus encoding uses additional bus wires together with encoding and decoding logic to alter the signals transmitted on a bus. The encoding is used to avoid certain bit patterns that would result in high noise, delay or power. In differential signaling, a signal is transmitted over a pair of wires, where the second wire carries the complement of the original signal. A differential signal acts as its own receiver reference and offers improved noise immunity by rejecting common-mode noise. The signal swing is also effectively doubled, thus increasing noise margins and improving speed as the rise and fall times at the receiver are reduced [11]. In voltage-mode signaling, the interconnects need to be fully charged to propagate a signal. This is avoided in current-mode signaling, where the interconnects are terminated with a resistor. Because of the resistive termination, a current flows that the receiver detects to determine the transmitted logic value. It has been shown that for high data rates current sensing can be very speed and power efficient in comparison to voltage sensing [12]. There are also emerging on-chip interconnect paradigms such as carbon nanotubes [13, 14], optical [15] and RF communications [16]. These interconnects, however, have several issues that need to be resolved before they can be used in on-chip communication.
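    The quadratic-versus-linear delay growth described above for repeater insertion can be checked numerically. The following is a minimal sketch, assuming an Elmore-style segment delay with the usual 0.69 and 0.38 coefficients; the per-millimetre wire parasitics and the repeater resistance and input capacitance are hypothetical values, not figures from this paper.

```python
R_PER_MM = 200.0    # assumed wire resistance, ohm/mm
C_PER_MM = 0.2e-12  # assumed wire capacitance, F/mm
R_DRV = 100.0       # assumed repeater output resistance, ohm
C_IN = 10e-15       # assumed repeater input capacitance, F


def segment_delay(length_mm):
    """50% delay estimate of one driver + wire segment + receiver load."""
    r_w = R_PER_MM * length_mm
    c_w = C_PER_MM * length_mm
    return 0.69 * R_DRV * (c_w + C_IN) + r_w * (0.38 * c_w + 0.69 * C_IN)


def link_delay(length_mm, segments=1):
    """Total delay = number of segments x delay of one segment."""
    return segments * segment_delay(length_mm / segments)


for length in (2, 4, 8):
    unbuffered = link_delay(length)
    buffered = link_delay(length, segments=length)  # one repeater stage per mm
    print(f"{length} mm: unbuffered {unbuffered * 1e12:.1f} ps, "
          f"buffered {buffered * 1e12:.1f} ps")
```

    With a fixed segment length, doubling the link length doubles the buffered delay, whereas the unbuffered delay grows much faster. Choosing the segment length and repeater size that minimize the total delay is a separate optimization that this sketch does not attempt.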

  3. Interconnect Delay

    Interconnect delay is a primary design criterion due to the close relationship to the speed of a circuit. Early interconnect design methodologies [19, 20] focused primarily on delay optimization. A typical data path in a synchronous digital circuit is shown in Fig. 3. In the case of zero clock skew, the minimum allowable clock period is [21]

    $T_{p,\min} = T_{C-Q} + T_{int} + T_{logic,\max} + T_{setup}$ (1)

    where $T_{C-Q}$ is the time required for the data to leave the initial register after the clock signal arrives, $T_{int}$ is the interconnect delay, $T_{logic,\max}$ is the maximum logic gate delay, and $T_{setup}$ is the required setup time of the receiving register. From (1), reducing $T_{int}$ decreases the clock period and therefore increases the overall clock frequency of the circuit (assuming the data path is a critical path).
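    As a worked example of equation (1), the short sketch below evaluates the minimum clock period for two interconnect delays. The timing values are hypothetical and chosen only to show how a smaller interconnect delay raises the maximum clock frequency.

```python
def min_clock_period(t_cq, t_int, t_logic_max, t_setup):
    """Equation (1): minimum clock period with zero clock skew (times in seconds)."""
    return t_cq + t_int + t_logic_max + t_setup


# Hypothetical timing budget in seconds; none of these values come from the paper.
budget = dict(t_cq=50e-12, t_logic_max=250e-12, t_setup=40e-12)

for t_int in (200e-12, 100e-12):
    period = min_clock_period(t_int=t_int, **budget)
    print(f"T_int = {t_int * 1e12:.0f} ps -> T_p_min = {period * 1e12:.0f} ps, "
          f"f_max = {1e-9 / period:.2f} GHz")
```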

    Figure 3. A data path in a synchronous digital system.

    In advanced microprocessors, multiple computational cores can be fabricated on the same die [5]. Communication among these cores and the on-chip memories generally requires multiple clock cycles. A computational core sometimes enters an idle state while waiting for the required data or control signals from other regions of the IC. The computational resources of these cores therefore cannot be utilized efficiently because of the large amount of multi-cycle communication. By reducing the interconnect delay, the speed of the system, i.e. the computational efficiency of the cores, can be improved at the architecture level.

  4. Results

    Figure 4 describes the circuit simulation setup for the interconnects in AWR software. Figure 5 represents the top model of the interconnect.

    [Figure 4 schematic. Recoverable labels: PORT_SQR sources P=1 and P=2 (Z=50 Ohm, AMP=1.8 V, TR=0.01 ns, TF=0.01 ns, TD=0 ns, WINDOW=DEFAULT, Offset=0 V, DCVal=0 V); SUBCKT ID=S1, NET="EM Structure 1"; PORT P=3 and PORT P=4 (Z=50 Ohm).]

    Figure 4. Experimental setup.

    Figure 5. Top model of interconnect lines.

    Figure 6 describes the output signal degradation as the interconnect length increases, for lengths of 1 mm, 2 mm, 3 mm, 4 mm, 5 mm and 6 mm at the 32 nm technology node.

    Figure 6. Output voltage waveforms for 32 nm technology.

    Figure 7 describes the output signal degradation as the interconnect length increases, for lengths of 1 mm, 2 mm, 3 mm, 4 mm, 5 mm and 6 mm at the 45 nm technology node.

    Figure 7. Output voltage waveforms for 45 nm technology.

  5. Conclusion

    Crosstalk is caused by electromagnetic (EM) coupling between multiple transmission lines running in parallel. It can cause noise pick-up on adjacent quiet signal lines, which may lead to false logic switching. Crosstalk also affects the timing on the active lines if multiple lines switch simultaneously. Depending on the switching direction on each line, the extra delay introduced may significantly widen or narrow the sampling window. The amount of crosstalk is related to the signal rise time, to the spacing between the lines, and to the length over which the lines run parallel to each other.
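    The dependence on switching direction and on coupling strength can be illustrated with a simple lumped model. The following is a minimal sketch, assuming one ground capacitance and one coupling capacitance per line; it is not the EM simulation used in this paper. The capacitance values are hypothetical, the swing reuses the 1.8 V source amplitude of the simulation setup, and the factors 0, 1 and 2 are the usual Miller approximation for a neighbour switching in the same direction, staying quiet, or switching in the opposite direction.

```python
def effective_capacitance(c_ground, c_couple, miller_factor):
    """Capacitance seen by a switching line; miller_factor is 0, 1 or 2."""
    return c_ground + miller_factor * c_couple


def floating_victim_noise(v_swing, c_ground, c_couple):
    """Capacitive-divider bound on the glitch coupled onto an undriven quiet line."""
    return v_swing * c_couple / (c_couple + c_ground)


C_GROUND = 60e-15  # assumed capacitance to ground per line, F
C_COUPLE = 40e-15  # assumed coupling capacitance to one neighbour, F
V_SWING = 1.8      # aggressor swing in volts

cases = {
    "neighbour switches in the same direction": 0,
    "neighbour is quiet": 1,
    "neighbour switches in the opposite direction": 2,
}
for label, m in cases.items():
    c_eff = effective_capacitance(C_GROUND, C_COUPLE, m)
    print(f"{label}: C_eff = {c_eff * 1e15:.0f} fF")

noise = floating_victim_noise(V_SWING, C_GROUND, C_COUPLE)
print(f"floating victim glitch bound: {noise:.2f} V")
```

    A larger effective capacitance means a longer delay on the active line, which is why opposite-direction switching is the worst case for timing. The divider expression is an upper bound: a victim line held by its driver sees a smaller, decaying glitch.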

  6. References

[1]. N. A. Kurd, S. Bhamidipati, C. Mozak, J. L. Miller, T. M. Wilson, M. Nemani, and M. Chowdhury. Westmere: A family of 32nm IA processors. In Digest of IEEE Int. Solid-State Circuits Conference, pages 96-97, 2010.

[2]. S. Natarajan et al. A 32nm logic technology featuring 2nd-generation high-k + metal-gate transistors, enhanced channel strain and 0.171 µm² SRAM cell size in a 291Mb array. In Proc. IEEE Int. Electron Devices Meeting, pages 1-3, 2008.

[3]. International Technology Roadmap for Semiconductors. Interconnect, 2005, 2007 and 2009 editions. Online, http://www.itrs.net.

[4]. N. Magen, A. Kolodny, U. Weiser, and N. Shamir. Interconnect-power dissipation in a microprocessor. In Proc. Int. Workshop on System-Level Interconnect Prediction, pages 7-13, 2004.

[5]. R. Ho, K. W. Mai, and M. A. Horowitz. The future of wires. Proceedings of the IEEE, 89(4):490-504, Apr. 2001.

[6]. S. Kumar, A. Jantsch, J.-P. Soininen, M. Forsell, M. Millberg, J. Öberg, K. Tiensyrjä, and A. Hemani. A network on chip architecture and design methodology. In Proc. IEEE Computer Society Annual Symp. on VLSI, pages 105-112, 2002.

[7]. T. Bjerregaard and S. Mahadevan. A survey of research and practices of network-on-chip. ACM Computing Surveys, 38(1):1-51, Mar. 2006.

[8]. S. Vangal et al. An 80-tile 1.28TFLOPS network-on-chip in 65nm CMOS. In Digest of IEEE Int. Solid-State Circuits Conf., pages 98-99, 2007.

[9]. C.-T. Hsieh and M. Pedram. Architectural energy optimization by bus splitting. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 21(4):408-414, Apr. 2002.

[10]. A. Narasimhan and R. Sridhar. Variability aware low-power delay optimal buffer insertion for global interconnects. IEEE Transactions on Circuits and Systems I, 57(12):3055-3063, Dec. 2010.

[11]. W. J. Dally and J. W. Poulton. Digital Systems Engineering. Cambridge, UK: Cambridge University Press, 1998.

[12]. A. Katoch, E. Seevinck, and H. Veendrick. Fast signal propagation for point to point on-chip long interconnects using current sensing. In Proc. 28th European Solid-State Circuits Conference, pages 195-198, Sept. 2002.

[13]. K. Banerjee, H. Li, and N. Srivastava. Current status and future perspectives of carbon nanotube interconnects. In Proc. IEEE Conf. on Nanotechnology, pages 432-436, 2008.

[14]. S. Pasricha, F. J. Kurdahi, and N. Dutt. Evaluating carbon nanotube global interconnects for chip multiprocessor applications. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 18(9):1376-1380, Sept. 2010.

[15]. M. Haurylau, G. Chen, H. Chen, J. Zhang, N. A. Nelson, D. H. Albonesi, E. G. Friedman, and P. M. Fauchet. On-chip optical interconnect roadmap: Challenges and critical directions. IEEE Journal of Selected Topics in Quantum Electronics, 12(6):1699-1705, Nov./Dec. 2006.

[16]. M. F. Chang, J. Cong, A. Kaplan, M. Naik, G. Reinman, E. Socher, and S.-W. Tam. CMP network-on-chip overlaid with multi-band RF-interconnect. In Proc. IEEE Int. Symp. on High Performance Computer Architecture, pages 191-202, 2008.

[17]. H. Veendrick, Deep Submicron CMOS ICs – From Basics to ASICs. Deventer, Netherlands: Kluwer, 1998.

[18]. Physical Synthesis. [Online]. Available: http://direct.xilinx.com/bvdocs/whitepapers/wp140.pdf

[19]. H. B. Bakoglu and J. D. Meindl, "Optimal Interconnection Circuits for VLSI," IEEE Transactions on Electron Devices, Vol. ED-32, No. 5, pp. 903-909, May 1985.

[20]. L. P. P. P. van Ginneken, "Buffer Placement in Distributed RC-tree Network for Minimal Elmore Delay," Proceedings of the IEEE International Symposium of Circuits and Systems, pp. 865-868, May 1990.

[21]. E. G. Friedman, "Clock Distribution Networks in Synchronous Digital Integrated Circuits," Proceedings of the IEEE, Vol. 89, No. 5, pp. 665-692, May 2001.

P. V. Hunagund received his M.Sc. and Ph.D. from the Department of Applied Electronics, Gulbarga University, Gulbarga, in 1982 and 1992 respectively. He is the Senior Professor of the Applied Electronics Department, Gulbarga University, Gulbarga, India. He has more than 50 research publications in reputed national and international journals, more than 155 research publications in international symposia and conferences, and more than 100 research publications in national symposia and conferences. He has presented many papers in India and abroad and has guided many Ph.D. and M.Phil. students. He has completed three major research projects funded by A.I.C.T.E. and D.S.T., New Delhi. At present he is the Coordinator of the Non-SAP project funded by UGC, New Delhi.

A. B. Kalpana received the B.E. degree from S.J.C.I.T, Bangalore University, in 1995 and the M.E. degree from U.V.C.E, Bangalore University, Bangalore, India, in 2001. She is pursuing a Ph.D. in the Department of Applied Electronics, Gulbarga University, Gulbarga, India, and is currently working as Assistant Professor in the Department of Electronics and Communication, Bangalore Institute of Technology, Bangalore, India. Her research interests include the analysis and design of VLSI circuits and power electronics.
