# Performance Analysis of an Eight-Port Interconnect Architecture for SoC

Karthikeyan I, Department of ECE, PSNA CET, Dindigul, India

Dr. Batri K, Department of ECE, PSNA CET, Dindigul, India

Abstract: The growth of System on Chip (SoC) designs has made Network on Chip (NoC) a popular communication infrastructure. Switches are the backbone of a NoC. They establish and manage, at run time, the communication between Processing Elements (PEs), which is why a switch is also called an interconnect module. Because switches directly determine the efficiency and performance of the overall NoC, an efficient interconnect design is essential. In this work, an interconnect architecture is designed that provides low channel setup latency and a compact, area-efficient structure. The design uses circuit switching to interconnect the PEs, which guarantees throughput during data transmission. The switch uses divided distributed round-robin arbitration to select fairly among incoming requests from the switch ports and the local ports, which reduces the channel setup latency.

Keywords: System on Chip (SoC), Network on Chip, switching modes, circuit switching.

# I. INTRODUCTION

The SoC era has driven the integration of many modules, each providing a specific system function, onto a single chip. As the number of modules grows, so does the need for efficient communication between them [1]. Traditionally, modules were interconnected by dedicated wires or by shared wires. The drawbacks of these approaches, high manufacturing cost for dedicated wires and high overall latency for shared wires, led to the concept of Networks on Chip (NoC) [2]. A NoC provides a network of interconnected modules through which the modules communicate.

The backbone of the NoC is its switches, each of which connects to its neighbouring switches and to one or more Processing Elements (PEs), or modules. The switches route data from source to destination; the data may be broken into packets or streamed over the communication channels.

In general, NoC performance and efficiency are determined by the topology used and by the structure and transmission scheme of the switch. This work focuses on a switch structure and transmission scheme that give the NoC low latency and high performance. Depending on the switching mode, switches fall into two main classes: packet switching (PS) and circuit switching (CS).

In circuit switching (CS), the entire channel path from the source node to the destination node is established in advance by a header that carries the source and destination addresses. The channels along the path are reserved for the whole transfer, and no payload is sent until every channel on the path has been reserved. Once the data transfer is complete, the reserved channels are released for the reservation and transfer of other packets. Representative CS designs include PNoC [5], Æthereal [4], SoCBUS [15], Octagon [8] and express circuit switching [9]. Because CS provides static communication channels between the source and destination nodes, it guarantees throughput. Its main drawback is high channel setup latency: establishing the channel takes considerable time.
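To make the reserve, transfer and release cycle concrete, the Python sketch below models a circuit as an all-or-nothing reservation of the links along a path. The class `CircuitNetwork`, its methods and the link names are hypothetical; the model ignores timing and simply refuses a setup when any link is busy, whereas a real switch may hold partially reserved channels while it waits.

```python
from typing import Dict, List


class CircuitNetwork:
    """Hypothetical model: each link belongs to at most one circuit at a time."""

    def __init__(self) -> None:
        self.owner: Dict[str, str] = {}  # reserved link -> owning circuit id

    def setup(self, circuit_id: str, path: List[str]) -> bool:
        """Reserve every link on the path, or none of them (all-or-nothing)."""
        if any(link in self.owner for link in path):
            return False  # a link is busy: setup fails and no payload is sent
        for link in path:
            self.owner[link] = circuit_id
        return True

    def release(self, circuit_id: str) -> None:
        """Free every link held by a circuit whose transfer has completed."""
        for link in [lk for lk, cid in self.owner.items() if cid == circuit_id]:
            del self.owner[link]


net = CircuitNetwork()
print(net.setup("A", ["s0-s1", "s1-s2"]))  # True: circuit A holds two links
print(net.setup("B", ["s1-s2", "s2-s5"]))  # False: s1-s2 is still held by A
net.release("A")
print(net.setup("B", ["s1-s2", "s2-s5"]))  # True: A's links were released
```

The all-or-nothing check mirrors the rule that payload is sent only after the whole path is reserved; it also shows why setup latency matters, since a blocked setup must be retried.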

In packet switching (PS), all flits of a packet are sent along with a header that establishes the connection between routers. PS can use buffering strategies that store the whole packet in each router before the connection to the next router is established. Major PS representatives include DBAR [11], SoCIN [16] and Xpipes [19], which between them use packet-switching techniques such as store-and-forward, wormhole and virtual cut-through. PS provides a Best Effort (BE) service, which guarantees delivery but not timing. Its main drawbacks are packet buffering and high channel latency: the time to transmit data from the source node to the destination node is both high and variable. PS performs poorly in heavily loaded networks and cannot provide guaranteed service (GS).
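The latency difference between store-and-forward and wormhole switching can be illustrated with the usual first-order, contention-free approximations. In the sketch below, H is the hop count, F the number of flits per packet and t_r the per-hop router delay in cycles, with one flit crossing a link per cycle. These are textbook approximations chosen for illustration, not measurements from this work.

```python
def store_and_forward_cycles(hops: int, flits: int, t_r: int) -> int:
    # Each router receives the whole packet before forwarding it,
    # so the packet serialization time is paid at every hop.
    return hops * (t_r + flits)


def wormhole_cycles(hops: int, flits: int, t_r: int) -> int:
    # The header is pipelined through the routers and the body flits
    # follow directly behind it, so serialization is paid only once.
    return hops * t_r + flits


for h in (1, 2, 4):
    print(h, store_and_forward_cycles(h, 8, 2), wormhole_cycles(h, 8, 2))
```

Under these assumptions, store-and-forward latency grows with the product of hop count and packet length, which is one reason buffered PS latency is high. Neither model includes contention, which is what makes PS latency variable under heavy load.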

In this work, we present a switch architecture based on circuit switching that provides guaranteed throughput with reduced channel setup latency. The main limitation of circuit switching, its setup latency, is addressed with round-robin arbitration, and the area of the overall network is reduced by clustering PEs. Clustering PEs that communicate frequently under the same switch also improves performance.

# II. RELATED WORKS

SCORES, a 1D circuit-switched architecture, was introduced to increase architectural customization and reduce the area overhead of NoC switches [7]. SCORES transfers data as streams, which supports efficient transfer of large amounts of data. That work implements a scalable and highly parametric streams-based communication architecture for inter-module communication in FPGA-based systems. SCORES transmits data between computing modules through dynamically established, non-shared streaming channels, which provide low latency and guaranteed throughput. Its main advantages are a low-area architecture that reduces communication bottlenecks and a high operating frequency. Its drawback is that the 1D architecture offers fewer available paths.

To reduce the complexity of switch wiring, an interconnect architecture for networking systems on chips, Octagon, was proposed [8]. It offers a lower-complexity alternative to a crossbar interconnection. Eight module nodes are connected by 12 bidirectional links in a pattern that lets every node reach every other node in at most 2 hops. This minimizes wiring cost, but the 2-hop connections increase the overall latency.
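The two Octagon figures quoted above, 12 links and a maximum of 2 hops, can be checked with a short breadth-first search. The sketch assumes the standard Octagon wiring, in which node i is linked to its ring neighbours (i ± 1) mod 8 and to the opposite node (i + 4) mod 8. The function names are illustrative only.

```python
from collections import deque
from typing import Dict, Set

N = 8  # Octagon nodes


def octagon_links() -> Set[frozenset]:
    """Assumed Octagon wiring: a ring of 8 nodes plus 4 cross links."""
    links = set()
    for i in range(N):
        links.add(frozenset((i, (i + 1) % N)))  # ring link
        links.add(frozenset((i, (i + 4) % N)))  # cross link (duplicates collapse)
    return links


def hops_from(src: int, adj: Dict[int, Set[int]]) -> Dict[int, int]:
    """Breadth-first search: minimum hop count from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


links = octagon_links()
adj: Dict[int, Set[int]] = {i: set() for i in range(N)}
for a, b in (tuple(link) for link in links):
    adj[a].add(b)
    adj[b].add(a)
diameter = max(max(hops_from(s, adj).values()) for s in range(N))
print(len(links), diameter)  # 12 2
```

For comparison, connecting all eight nodes directly so that every pair is 1 hop apart would need 8 × 7 / 2 = 28 links. Octagon trades those extra wires for the second hop.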

MACS [20] was proposed to reduce channel setup latency and the total number of switches. MACS is based on circuit switching and clusters PEs by connecting two PEs to a single switch. This enables fast circuit establishment and lets the two PEs on a switch exchange data directly, without traversing the network topology. MACS uses distributed round-robin arbitration, selecting requests in round-robin order, which reduces channel setup latency. To increase design flexibility and system customization, MACS provides numerous tunable architectural parameters. However, the design can be improved further to lower latency and reduce the number of switches.

In our work, the proposed switch, OSCS, provides an efficient NoC switch architecture that reduces the total number of switches by clustering four PEs on a single switch. This enables fast channel establishment between the local PEs without traversing the rest of the network. Because fewer switches are needed, the area of the overall network is also reduced. Channel setup latency is reduced by dividing the arbitration into two sectors, one that selects among switch-port requests and one that selects among local-PE requests, which greatly reduces contention within the switch. The interconnect switch is based on circuit switching and therefore provides guaranteed throughput. The switch architecture is implemented with Xilinx tools, and the results are presented in Sections IV and V.
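The effect of clustering on switch count is simple arithmetic. If k PEs share one switch, n PEs need ⌈n / k⌉ switches. The hypothetical helper below compares one PE per switch, the two-PE clustering of MACS and the four-PE clustering of the proposed design for the 36 PEs of the 3×3 mesh in Fig. 1 (9 switches × 4 local PEs).

```python
import math


def switches_needed(n_pes: int, pes_per_switch: int) -> int:
    """Hypothetical helper: switches required when PEs are clustered."""
    return math.ceil(n_pes / pes_per_switch)


# 36 PEs, as in the 3x3 mesh of Fig. 1 with four local PEs per switch.
for k in (1, 2, 4):
    print(f"{k} PE(s) per switch -> {switches_needed(36, k)} switches")
```

This counts switches only. Each clustered switch has more ports, so the net area saving also depends on how per-switch area grows with port count, which is what the synthesis results in Section V measure.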

# III. PROPOSED ARCHITECTURE

Fig. 1: 3×3 switches implemented in a mesh topology.

Fig. 1 shows a NoC with a 3×3 mesh topology in which each switch is connected to four local PEs. The interconnect switch has eight ports. Four connect to the neighbouring switches (up, down, left, right), and the remaining four connect to the local PEs. Each port has two control-logic blocks that route data signals into and out of the switch. The switch uses circuit switching, which provides guaranteed throughput, together with distributed arbitration.
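The port layout can be sketched as a small lookup model. The port names (`UP`, `DOWN`, `LEFT`, `RIGHT`, `PE0` to `PE3`), the (x, y) coordinates and the helper functions are hypothetical and are used only to show how the four switch ports of the eight-port switch map onto neighbours in the 3×3 mesh.

```python
from typing import Optional, Tuple

MESH = 3  # 3x3 mesh of switches, as in Fig. 1
# Switch-facing ports and the (dx, dy) step each leads to (y grows downward).
SWITCH_PORTS = {"UP": (0, -1), "DOWN": (0, 1), "LEFT": (-1, 0), "RIGHT": (1, 0)}
OPPOSITE = {"UP": "DOWN", "DOWN": "UP", "LEFT": "RIGHT", "RIGHT": "LEFT"}
LOCAL_PORTS = ("PE0", "PE1", "PE2", "PE3")  # hypothetical local-port names
ALL_PORTS = tuple(SWITCH_PORTS) + LOCAL_PORTS  # the eight ports of one switch


def neighbour(x: int, y: int, port: str) -> Optional[Tuple[int, int, str]]:
    """Output `port` of switch (x, y) -> (nx, ny, input port there), or None at an edge."""
    dx, dy = SWITCH_PORTS[port]
    nx, ny = x + dx, y + dy
    if 0 <= nx < MESH and 0 <= ny < MESH:
        return nx, ny, OPPOSITE[port]
    return None


def used_switch_ports(x: int, y: int) -> int:
    """Count the switch-facing ports of (x, y) that reach a neighbouring switch."""
    return sum(neighbour(x, y, p) is not None for p in SWITCH_PORTS)


total_used = sum(used_switch_ports(x, y) for x in range(MESH) for y in range(MESH))
print(len(ALL_PORTS), total_used, MESH * MESH * len(LOCAL_PORTS))  # 8 24 36
```

In this model, the 24 used switch ports correspond to the 12 bidirectional links of a 3×3 mesh, counted once at each end. The remaining 12 switch ports sit on the mesh boundary, and the mesh hosts 36 local PEs in total.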

## A. Switch Operation

Because circuit switching establishes the channel before data is transmitted, switch operation involves three states: channel establishment, waiting and release. Channel establishment connects an input lane to an output lane and allocates the channel resources needed to route a request from the source PE, forming a dedicated communication channel. Once the channel is established, the switch waits until the transaction completes. Until then, the path is locked for that producer-consumer pair. When the transaction completes, the path is released so that it can serve the next request.

Each port has two control-logic blocks, an input block controller (IBC) and an output block controller (OBC). Together they control all communication operations: serving requests, establishing channels and releasing channels.
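The three operating states can be summarised as a small state machine. In the sketch below, the states follow the text (establish, wait, release) and the `grant` and `deny` events reuse the control-signal names of the switch. The extra `IDLE` state, the `request`, `done` and `tick` events and the transition table itself are assumptions added for illustration.

```python
from enum import Enum, auto


class ChannelState(Enum):
    IDLE = auto()       # assumed resting state between transactions
    ESTABLISH = auto()  # connecting an input lane to an output lane
    WAIT = auto()       # path locked for one producer-consumer pair
    RELEASE = auto()    # freeing the path for the next request


# (current state, event) -> next state; events not listed are ignored.
TRANSITIONS = {
    (ChannelState.IDLE, "request"): ChannelState.ESTABLISH,
    (ChannelState.ESTABLISH, "grant"): ChannelState.WAIT,
    (ChannelState.ESTABLISH, "deny"): ChannelState.IDLE,
    (ChannelState.WAIT, "done"): ChannelState.RELEASE,
    (ChannelState.RELEASE, "tick"): ChannelState.IDLE,
}


def step(state: ChannelState, event: str) -> ChannelState:
    """Apply one event; an unlisted (state, event) pair leaves the state unchanged."""
    return TRANSITIONS.get((state, event), state)


state = ChannelState.IDLE
trace = []
# The second "request" arrives while the path is locked and is ignored.
for event in ("request", "grant", "request", "done", "tick"):
    state = step(state, event)
    trace.append(state)
print([s.name for s in trace])
```

Ignoring a new request while in `WAIT` is how the sketch represents a locked path. In the switch, a competing request would simply remain pending at the OBC until the channel is released.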



Fig. 2: Eight Port Switch Architecture

## B. Switch Architecture

The main component of the switch is the port, which contains two control-logic blocks, the Input Block Controller (IBC) and the Output Block Controller (OBC), as shown in Fig. 2. Both controllers carry control signals (request, grant, deny) and data signals (the data channel). The control signals perform channel establishment, while the data channel provides the bandwidth for inter-PE communication.

The IBC receives requests from its PE or neighbouring switch and forwards each request to the OBC of the destination port addressed by the request. Every IBC is internally connected to the OBCs of all the other ports.
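The request path from the IBCs to the OBCs can be modelled as one-hot request vectors. The sketch assumes that a request carries a destination port number from 0 to 7. The encoding and the function names are hypothetical, since the text does not specify the request format. Each IBC raises exactly one request line, toward the addressed OBC, and each OBC collects the request bits of the seven other ports.

```python
from typing import List

NUM_PORTS = 8


def ibc_request_lines(own_port: int, dest_port: int) -> List[int]:
    """One-hot request vector driven by one IBC, indexed by destination OBC."""
    if not 0 <= dest_port < NUM_PORTS:
        raise ValueError("destination must be a port number 0-7")
    if dest_port == own_port:
        raise ValueError("a port does not send requests to its own OBC")
    lines = [0] * NUM_PORTS
    lines[dest_port] = 1
    return lines


def obc_inputs(all_lines: List[List[int]], obc: int) -> List[int]:
    """Request bits seen by one OBC: one from each IBC of the other seven ports."""
    return [all_lines[p][obc] for p in range(NUM_PORTS) if p != obc]


# Ports 2 and 7 both request output port 5; all other IBCs are idle.
lines = [[0] * NUM_PORTS for _ in range(NUM_PORTS)]
lines[2] = ibc_request_lines(2, 5)
lines[7] = ibc_request_lines(7, 5)
print(obc_inputs(lines, 5))  # [0, 0, 1, 0, 0, 0, 1]
```

The seven-bit vector collected by the OBC is exactly what the OBC's arbiter has to choose from, which motivates the arbitration scheme described next.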

Each OBC implements round-robin arbitration, which selects a single request from the set of requests arriving from the IBCs of the other ports. Round-robin arbitration is the key to reducing channel setup latency. The arbitration is divided into two sectors: one selects among requests from the switch ports and the other selects among requests from the local PE ports.

This division is obtained by duplicating the arbiter. The OBC of a local port has one arbiter for the requests of the 3 other local ports and a second arbiter for the requests of the 4 switch ports. Similarly, the OBC of a switch port has one arbiter for the requests of the 4 local ports and another for the requests of the 3 other switch ports. Once a request is selected, the request register locks the corresponding data channel, which stays locked until the data transaction completes. The channel is then released to serve the next request.
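The two-sector arbitration can be sketched with two independent round-robin arbiters per OBC. The text does not say how the winners of the two sectors are combined. The sketch therefore assumes that the sectors simply alternate priority after every grant, which is a hypothetical choice that keeps either sector from starving the other. The class names are also illustrative.

```python
from typing import List, Optional, Tuple


class RoundRobin:
    """Round-robin arbiter: the search starts just after the last winner."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.last = n - 1  # so the first search starts at index 0

    def grant(self, req: List[bool]) -> Optional[int]:
        for offset in range(1, self.n + 1):
            i = (self.last + offset) % self.n
            if req[i]:
                self.last = i
                return i
        return None  # no request: the pointer does not move


class DividedArbiter:
    """One OBC: a local-port sector and a switch-port sector."""

    def __init__(self, n_local: int, n_switch: int) -> None:
        self.local = RoundRobin(n_local)
        self.switch = RoundRobin(n_switch)
        self.prefer_local = True  # ASSUMPTION: sectors alternate priority

    def grant(self, local_req: List[bool], switch_req: List[bool]) -> Optional[Tuple[str, int]]:
        order = [("local", self.local, local_req), ("switch", self.switch, switch_req)]
        if not self.prefer_local:
            order.reverse()
        for name, arbiter, req in order:
            winner = arbiter.grant(req)
            if winner is not None:
                self.prefer_local = name != "local"  # give the other sector priority next
                return name, winner
        return None


# OBC of a local port: 3 other local ports and 4 switch ports compete.
obc = DividedArbiter(n_local=3, n_switch=4)
all_local, all_switch = [True] * 3, [True] * 4
print([obc.grant(all_local, all_switch) for _ in range(4)])
```

The sketch shows only the order in which requests are selected. In the switch, a granted request also locks its data channel, so the OBC would not issue a new grant until that channel is released.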



Fig. 3: OBC structure.

## C. Round Robin

Each OBC receives request signals from the IBCs of all ports, both switch ports and local ports, except its own port. Keeping channel setup latency low therefore requires an efficient way to select among and hold the incoming requests, which distributed round-robin arbitration provides. The OBC implements a per-output-channel distributed round-robin arbiter that chooses a single request among those competing for that output channel. The arbiter has two components, a MUX and a COUNTER. To select an incoming request, the counter output drives the MUX select lines, and the MUX output drives the active-low ENABLE input of the COUNTER.
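One plausible cycle-level reading of the MUX and COUNTER arbiter is sketched below. It assumes active-high requests, a MUX output wired directly to the counter's active-low ENABLE and a counter that wraps modulo the number of request inputs. These are assumptions, not details given in the text. When the selected request is low, the counter is enabled and moves to the next input. When the selected request is high, the counter is disabled and holds, so that request keeps the grant.

```python
from typing import List


def simulate(requests_per_cycle: List[List[int]], start: int = 0) -> List[int]:
    """Return the counter value (the selected input) after each clock edge."""
    count = start
    history = []
    for req in requests_per_cycle:
        mux_out = req[count]  # MUX forwards the request selected by the counter
        enable_n = mux_out    # active-low ENABLE: 1 disables the counter (hold)
        if enable_n == 0:
            count = (count + 1) % len(req)  # no request here: scan to the next input
        history.append(count)
    return history


# An OBC sees 7 inputs. Inputs 3 and 5 request; input 3 finishes after 5 cycles.
busy = [0, 0, 0, 1, 0, 1, 0]
done3 = [0, 0, 0, 0, 0, 1, 0]
print(simulate([busy] * 5 + [done3] * 3))  # [1, 2, 3, 3, 3, 4, 5, 5]
```

In this model, the counter may need up to n − 1 clock edges to reach a lone request (for example, 6 edges for 7 inputs). This scan time is part of the channel setup latency. Because the counter resumes scanning from the next input after a request drops, every waiting input is visited in turn, which gives the round-robin fairness described above.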

# IV. EXPERIMENTAL RESULTS

The experiment uses the eight-port switch architecture described above, implemented in VHDL, with circuit switching used to transmit the data. A request is first applied as input to the architecture, and the round-robin arbitration selects the circuit path.

The proposed architecture reduces connection establishment time by improving the switch controller, which lowers the channel setup latency. Data transmission is shown in Fig. 4. The design was synthesized and simulated using Xilinx tools.



Fig. 4: Data transmission and its output.

# V. PERFORMANCE ESTIMATION

The synthesis report for the design gives its area utilization and latency. The number of switches required is also reduced by using the octagon structure as the switch architecture, so the overall area utilization is reduced.

TABLE I. COMPARISON BETWEEN EXISTING AND PROPOSED WORK

| Parameter                        | Octagon [8] | Proposed 8-port interconnect |
|----------------------------------|-------------|------------------------------|
| Maximum distance (hops required) | 2           | 1                            |
| Number of slices utilized        | ~130        | 96                           |

For 8-bit data transmission, the latency of this design is 1.190 ns. Compared with MACS [20], whose operating frequency is 148.1 MHz, the proposed structure reaches 187.98 MHz. For an 8-bit signal, the area requirement of about 125 slices is reduced to 89 slices in the proposed system.
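The relative improvements implied by these reported figures follow from simple arithmetic. The script below derives only the frequency gain, the corresponding clock periods and the slice saving from the numbers stated above. It adds no new measurements.

```python
f_macs_mhz, f_prop_mhz = 148.1, 187.98  # reported operating frequencies
slices_macs, slices_prop = 125, 89      # reported slice counts (8-bit)

freq_gain = (f_prop_mhz - f_macs_mhz) / f_macs_mhz * 100
period_macs_ns = 1e3 / f_macs_mhz  # period = 1 / f; with f in MHz, 1e3 / f is in ns
period_prop_ns = 1e3 / f_prop_mhz
slice_saving = (slices_macs - slices_prop) / slices_macs * 100

print(f"frequency gain: {freq_gain:.1f} %")                               # 26.9 %
print(f"clock period:   {period_macs_ns:.2f} ns -> {period_prop_ns:.2f} ns")  # 6.75 ns -> 5.32 ns
print(f"slice saving:   {slice_saving:.1f} %")                            # 28.8 %
```

In other words, the proposed switch runs about 27% faster than MACS and uses about 29% fewer slices for the 8-bit case.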

# VI. CONCLUSION

In this paper, an eight-port circuit-switched switch has been designed and analyzed. The switch reduces the overall channel setup latency and improves area efficiency. Clustering frequently communicating PEs under the same switch provides high performance, and dividing the switch-port and local-port arbitration with distributed round robin yields low channel setup latency. Clustering also reduces the number of switches in the NoC, which in turn reduces the overall area.

# REFERENCES

- [1] M. Ali, M. Welzl, and M. Zwicknagl, "Networks on chips: Scalable interconnects for future systems on chips," 2008.
- [2] J. Henkel, W. Wolf, and S. Chakradhar, "On-chip networks: A scalable, communication-centric embedded system design paradigm," 2004.
- [3] N. Chin-Ee and N. Soin, "Qualitative and quantitative evaluation of a proposed circuit switched network-on-chip," Jun. 2010.
- [4] K. Goossens, J. Dielissen, and A. Radulescu, "Æthereal network on chip: Concepts, architectures, and implementations," May 2005.
- [5] C. Hilton and B. Nelson, "PNoC: A flexible circuit-switched NoC for FPGA-based systems," 2006.
- [6] W. H. Hu, S. E. Lee, and N. Bagherzadeh, "DMesh: A diagonally-linked mesh network-on-chip architecture," in Int. Workshop NoC Archit., MICRO-41, 2008.
- [7] A. Jara-Berrocal and A. Gordon-Ross, "SCORES: A scalable and parametric streams-based communication architecture for modular reconfigurable systems," 2009.
- [8] F. Karim, A. Nguyen, and S. Dey, "An interconnect architecture for networking systems on chips," Sep. 2002.
- [9] J. Lin and X. Lin, "Express circuit switching: Improving the performance of bufferless networks-on-chip," Nov. 2010.
- [10] A. K. Lusala and J.-D. Legat, "A SDM-TDM based circuit-switched router for on-chip networks," Jun. 2011.
- [11] S. Ma, N. E. Jerger, and Z. Wang, "DBAR: An efficient routing algorithm to support multiple concurrent applications in networks-on-chip," 2011.
- [12] P. N. Chopkar and M. A. Gaikwad, "Review of XY routing algorithm for 2D torus topology of NoC architecture," International Journal of Computer Applications (0975-8887), 2013.
- [13] E. Stafford, J. L. Bosque, C. Martínez, F. Vallejo, R. Beivide, and C. Camarero, "A first approach to king topologies for on-chip networks," in Euro-Par: Parallel Processing. Berlin, Heidelberg: Springer-Verlag, 2010, pp. 428-439.
- [14] D. Wiklund and D. Liu, "Design of a system-on-chip switched network and its design support," 2002.
- [15] D. Wiklund and D. Liu, "SoCBUS: Switched network on chip for hard real time," 2003.
- [16] C. A. Zeferino and A. A. Susin, "SoCIN: A parametric and scalable network-on-chip," 2003.
- [17] L. R. Zheng and H. Tenhunen, "A circuit-switched network architecture for network-on-chip," 2004.
- [18] K. Jain, S. K. Singh, A. Majumder, and A. J. Mondal, "Problems encountered in various arbitration techniques used in NoC router: A survey," 2015.
- [19] L. Benini and D. Bertozzi, "Xpipes: A network-on-chip architecture for gigascale systems-on-chip," Sep. 2005.
- [20] R. Kumar and A. Gordon-Ross, "MACS: A highly customizable low latency communication architecture," Jan. 2015.