Design of Efficient Virtual Channel Router for Network-On-Chip

Omprakash Ghorse; Nitin Meena; Shweta Singh

doi:10.17577/IJERTV2IS120746

Volume 02, Issue 12 (December 2013)


Open Access
Authors : Omprakash Ghorse, Nitin Meena, Shweta Singh
Paper ID : IJERTV2IS120746
Volume & Issue : Volume 02, Issue 12 (December 2013)
Published (First Online): 20-12-2013
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Omprakash Ghorse

M. Tech Scholar, Electronics and Communication Department, IES College of Technology, Bhopal

Nitin Meena

Asst. Prof. Electronics and Communication Department, IES College of Technology, Bhopal

Shweta Singh

Associate Prof. Electronics and Communication Department, IES College of Technology, Bhopal

Abstract

As technology scales, a System-on-Chip (SoC) comprises more and more heterogeneous intellectual property (IP) cores such as processors, DSPs, and memory blocks. Providing high-performance, flexible, and scalable interconnections in such SoCs is becoming a major design challenge. A new chip design paradigm called Network-on-Chip (NoC) offers a promising architectural choice for future systems on chips. NoC overcomes the main constraints of traditional bus-based SoCs by using point-to-point links and packet switching. The design and characteristics of the router directly affect overall NoC performance. In a virtual channel router, a head flit must first reserve an output virtual channel for its packet before it can request passage through the crossbar and leave for the next hop. There is hence a dependency between virtual channel allocation and switch allocation. The efficient virtual channel router removes this dependency by speculation, performing the two operations in parallel: the router assumes that a flit will succeed in virtual channel allocation and lets it request crossbar passage at the same time. Because of this parallelism the router can operate at a higher frequency, which improves NoC performance. In this paper we study network parameters and compare the performance of the efficient virtual channel router with that of other existing routers for network on chip.

Keywords– network on chip, virtual channel, speculation.

Introduction

According to Moore's Law, the capacity and complexity of chips have grown significantly in recent decades. The function of a board-level system of the last decade can now be integrated into a single chip using system-on-chip (SoC) design. With the advent of multiprocessors, there arises the need for communication among processes running in parallel on multiple processors. The performance of the interconnection network is critical, as it significantly affects the performance of the overall system. As a replacement for traditional hierarchical bus systems and point-to-point connections, the on-chip network infrastructure called Network-on-Chip (NoC) provides a unified interface through which new IP (intellectual property) blocks can easily be plugged into a system. A modern MPSoC (multiprocessor SoC) is a communication-centric system built on an on-chip network communication fabric. A NoC comprises three main components: links, which are the communication medium between routers and between a router and an IP block; the NI (network interface), which is responsible for packetization and depacketization of data traffic; and routers, which determine the path of each packet from source to destination according to the routing algorithm. Routers carry communication between processing elements (PEs) in packets, and each packet is further divided into flits (flow control digits). When a flit arrives at a router, several complex operations are performed: routing computation to find the route to the destination, virtual channel allocation to determine the output channel, switch allocation to allocate a time slot on the crossbar switch, and switch traversal to transfer the flit through the crossbar. These operations increase communication latency.

The rest of this paper is organized as follows. Section 2 discusses the virtual channel router, Section 3 presents the proposed router architecture, Section 4 reports the simulation results, and Section 5 concludes.
Background
1. Virtual Channel
  
  When a physical channel is divided into multiple logical channels, these logical channels are called virtual channels. The goal of virtual channels is to reduce congestion when different flows compete for the same path in the network.
2. Channel Multiplexing
  
  A physical channel can be multiplexed so that different flows in the same direction share it, which improves network-on-chip performance. Time division multiplexing (TDM) shares a physical channel in time, dividing it into logical channels. Time is usually partitioned into equally sized periods called time slots. During a time slot, the available bandwidth is dedicated exclusively to one logical channel. A packet may need several time slots to be transmitted through a logical channel, and these time slots may be interleaved with time slots used by other packets flowing in other logical channels. TDM reduces overall network congestion, but a separate buffer is required for each virtual channel, and the time-slot allocation of the virtual channels must be stored.
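The time-slot sharing described above can be sketched in a few lines of Python. This is only an illustrative model, not the paper's hardware: the function name `tdm_schedule`, the channel names, and the rule that slot i belongs to channel i modulo the number of channels are all assumptions made for the example.

```python
from collections import deque

def tdm_schedule(virtual_channels, num_slots):
    """Share one physical channel among logical channels, one flit per time slot.

    virtual_channels: dict mapping a (hypothetical) channel name to its queued flits.
    Returns the list of (slot, channel, flit) transfers that took place.
    """
    # One separate buffer per virtual channel, as the text requires.
    queues = {name: deque(flits) for name, flits in virtual_channels.items()}
    order = list(queues)  # assumed fixed ownership: slot i belongs to order[i % n]
    if not order:
        return []
    transfers = []
    for slot in range(num_slots):
        owner = order[slot % len(order)]
        if queues[owner]:  # a slot is simply unused if its owner has nothing to send
            transfers.append((slot, owner, queues[owner].popleft()))
    return transfers

for slot, channel, flit in tdm_schedule({"vc0": ["A0", "A1", "A2"], "vc1": ["B0"]}, 6):
    print(slot, channel, flit)
```

In this run the three flits of the packet on `vc0` occupy slots 0, 2 and 4, interleaved with the single flit on `vc1` in slot 1, which mirrors the interleaving of time slots between packets in different logical channels.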
  
  Two terms define the system performance: network throughput and latency.

  Network Throughput: the rate at which the network can successfully accept and deliver injected packets.

  Latency: the delay experienced by a message as it traverses from source to destination, measured from the instant the first bit is injected into the network at the source until the last bit of the message is received at the destination.
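As a worked illustration of these two definitions, the sketch below computes average latency and throughput from a list of injection and reception times. The function name `network_stats`, the use of clock cycles as the time unit, and the sample numbers are assumptions for the example, not measurements from this paper.

```python
def network_stats(packets, total_cycles):
    """Return (average latency in cycles, throughput in packets per cycle).

    packets: list of (inject_cycle, receive_cycle) pairs, where inject_cycle is
    when the first bit enters the network and receive_cycle is when the last
    bit reaches the destination.
    """
    latencies = [receive - inject for inject, receive in packets]
    average_latency = sum(latencies) / len(latencies)
    throughput = len(packets) / total_cycles  # delivered packets per cycle
    return average_latency, throughput

# Three hypothetical packets observed over a 24-cycle window.
average_latency, throughput = network_stats([(0, 12), (4, 20), (10, 22)], 24)
print(average_latency, throughput)
```

Here the individual latencies are 12, 16 and 12 cycles, and three packets delivered in 24 cycles give a throughput of 0.125 packets per cycle.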
  
  To understand the operation of the efficient virtual channel router using speculation, we first explain the architecture and operation of the basic virtual channel router.

  The virtual channel router shown in Figure 1 consists of five input ports and five output ports, connected through an intermediate crossbar switch. The topology used is a mesh, in which each input and output port is associated with a specific direction: East (E), West (W), North (N), South (S), and Local (L). The local input and output ports are connected to the network interface, which is connected to the processing element (PE).
  
  Fig.1 architecture of virtual channel router
  
  In a typical router based on virtual-channel flow control, flits travel through a four-stage pipeline: RC (routing computation), VA (virtual-channel allocation), SA (switch allocation), and ST (switch traversal). When a head flit reaches the head of its virtual-channel buffer queue and enters the RC stage, the RC module decodes it and produces a request for a specific output direction. This request is then sent to the VA module to obtain a free virtual channel at the downstream router. Several packets may contend for the same virtual channel at the downstream router; the packets that lose remain stored in the input buffer. Note that RC and VA are performed only for the head flit. The remaining flits of a packet, i.e. the body flits and the tail flit, simply follow the route acquired by the head flit and require no further computation in the RC and VA stages. Once the output virtual channel has been decided in the VA stage, the SA module assigns physical channels to the flits inside the router. A flit that has been assigned a physical channel traverses the crossbar switch to the input buffer of the downstream router during the ST stage, and the same procedure repeats until the packet reaches its destination.
  
  In this architecture each stage depends on the stage before it.

  Fig. 2 Pipeline stage of virtual channel router architecture (stage labels: Routing Computation, VC Allocation, Switch Allocation, Crossbar)
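The four-stage pipeline can be summarized with a small timing model. This is a sketch under stated assumptions rather than the router's actual behaviour: it assumes no contention, exactly one cycle per stage, and one flit entering the pipeline per cycle behind the head flit. The names `BASELINE_STAGES` and `switch_traversal_cycles` are hypothetical.

```python
# Stages visited by each flit type in the baseline (non-speculative) router.
# Only the head flit performs RC and VA; body and tail flits reuse its route.
BASELINE_STAGES = {
    "head": ["RC", "VA", "SA", "ST"],
    "body": ["SA", "ST"],
    "tail": ["SA", "ST"],
}

def switch_traversal_cycles(num_flits):
    """Cycle (counted from 0) in which each flit of one packet performs ST."""
    head_st = len(BASELINE_STAGES["head"]) - 1  # head: RC=0, VA=1, SA=2, ST=3
    # Each following flit trails the previous one by one cycle.
    return [head_st + i for i in range(num_flits)]

print(switch_traversal_cycles(4))  # [3, 4, 5, 6]
```

Under these assumptions the head flit crosses the switch in its fourth cycle, and every stage removed from the head flit's path shifts the whole packet one cycle earlier.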
Proposed Router Design
1. Speculation
  
  Speculation means that virtual-channel allocation and switch allocation are performed in parallel. Packets that are awaiting VC allocation are permitted to make speculative requests for switch allocation. This enables flits to be received and forwarded on the desired output in a single cycle. It removes the dependency between virtual channel allocation and switch allocation, so the pipeline becomes shorter and network performance increases.
  
  Fig.4 pipeline stage of speculative virtual channel router architecture (stage labels: Routing, VC, Switch, Crossbar)
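The effect of speculation on the head flit can be illustrated by counting cycles with and without it. The model below is a simplified sketch, not the paper's design: it assumes one cycle per stage, describes allocator outcomes as lists of booleans (one entry per attempt), and assumes that a flit which wins the switch speculatively but fails VC allocation simply retries in the next cycle. Both function names are hypothetical.

```python
def baseline_cycles(va_results, sa_results):
    """RC, then VA until granted, then SA until granted, then ST."""
    va_tries = va_results.index(True) + 1
    sa_tries = sa_results.index(True) + 1
    return 1 + va_tries + sa_tries + 1

def speculative_cycles(va_results, sa_results):
    """RC, then VA and SA requested in the same cycle, then ST."""
    cycles = 1  # RC
    have_vc = False
    for va_ok, sa_ok in zip(va_results, sa_results):
        cycles += 1                # one combined VA/SA cycle
        have_vc = have_vc or va_ok
        if have_vc and sa_ok:      # a switch grant is usable only once a VC is held
            return cycles + 1      # ST
    raise ValueError("allocation never succeeded in the given trace")

# Both allocations succeed at the first attempt.
print(baseline_cycles([True], [True]), speculative_cycles([True], [True]))
# VC allocation fails once; the speculative switch grant of that cycle is wasted.
print(baseline_cycles([False, True], [True, True]),
      speculative_cycles([False, True], [True, True]))
```

With immediate success the head flit needs 4 cycles in the baseline and 3 with speculation; when VC allocation fails once, the counts are 5 and 4 in this model.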
2. Crossbar switch
  
  [Figure: Crossbar (5X5) with Input port1 to Input port5, signals Data1 to Data5, Dest1 to Dest5 and Req1 to Req5, Arbiter 1 to Arbiter 5, and outputs Dout1 to Dout5]

  The crossbar switch connects input paths to output paths, enabling the routing of flits through the network. We use a 5×5 crossbar with five input ports and five output ports. An arbiter is an important part of the crossbar switch; it selects one of the inputs according to the applied control logic. At each crossbar input we examine three quantities: data, destination address, and request. The data is the information to be sent out, and the destination is the address of the output port to which the data is forwarded. When a 16-bit message appears at a crossbar input, its three least significant bits define the output port through which the message will be forwarded. If the last three bits of the message are 001, the message is routed to the first port of the router; if they are 010, the message is routed through the second port, and so on. The last quantity is the request: when it is high, the data at that input port is to be forwarded. The arbiter used here has fixed priority.
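The destination decoding described above can be sketched as a small Python function. The mapping of 001 and 010 to ports 1 and 2 comes from the text; extending it to 011, 100 and 101 for ports 3 to 5 is an assumption based on "and so on", and the function name `output_port` is hypothetical.

```python
def output_port(message):
    """Decode the output port from the three least significant bits of a 16-bit message."""
    if not 0 <= message <= 0xFFFF:
        raise ValueError("message must fit in 16 bits")
    port = message & 0b111  # keep only the three least significant bits
    if not 1 <= port <= 5:  # assumed: codes 000, 110 and 111 are unused
        raise ValueError(f"no output port for code {port:03b}")
    return port

print(output_port(0xABC1))  # low bits 001
print(output_port(0x0002))  # low bits 010
print(output_port(0x1235))  # low bits 101
```

The three calls select ports 1, 2 and 5 respectively; the upper thirteen bits do not influence the port choice in this sketch.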
3. Fixed priority arbiter
  
  Our arbiter scheme uses a fixed priority arbiter. Each input port has its own fixed priority level, and the arbiter grants the active request signal with the highest priority. For instance, if request (1) has the highest priority among N requests and request (1) is active, it is granted regardless of the other request signals. If request (1) is not active, the request signal with the next highest priority is granted. In other words, a lower-priority request is served only if the higher-priority requests have not appeared or have already been served. We designed the fixed priority arbiter as a finite state machine. The grant signals are activated under the following conditions.

  Fig.3 architecture of efficient virtual channel router using speculation

  Grant1 = 1 when State = G1; Grant2 = 1 when State = G2; Grant3 = 1 when State = G3; Grant4 = 1 when State = G4; Grant5 = 1 when State = G5.
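The fixed priority rule can be illustrated with a purely combinational Python sketch. It is a simplification and not the authors' finite state machine: it ignores state holding and clocking and only shows which request wins in a given cycle. The function name `fixed_priority_grant` is hypothetical, and index 0 of the list stands for Req1.

```python
def fixed_priority_grant(requests):
    """Return one grant flag per request; at most one flag is True.

    requests[0] is Req1 (highest priority), requests[-1] is the lowest.
    An all-False result corresponds to the Idle condition.
    """
    grants = [False] * len(requests)
    for index, active in enumerate(requests):
        if active:
            grants[index] = True  # first active request in priority order wins
            break
    return grants

print(fixed_priority_grant([False, True, False, True, True]))     # Grant2 only
print(fixed_priority_grant([False, False, False, False, True]))   # Grant5 only
print(fixed_priority_grant([False, False, False, False, False]))  # no grant (Idle)
```

Req5 is granted only when Req1 to Req4 are all inactive, which matches the lowest-priority condition shown in the state diagram labels.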
  
  [FSM diagram: states including Idle, G1, G2 and G5; transition labels include "Idle & Req", "State=G1 & Req1=1", "State=G2 & Req2=1", "State=G5 & Req5=1" and "Req5=1 Req4=0 Req3=0 Req2=0 Req1=0"]

  Fig.5 FSM for fixed priority arbiter
Simulation result and discussion
1. Simulation for input port
  
  We implemented the input port design using the Xilinx ISE 9.1 design suite for the device xc3s200-5ft256; the simulation result is shown below. The simulation shows that each data item requires three clock cycles to travel across the input port. This indicates the latency of the network.
  
  Fig.6 simulation for input port
2. Simulation for fixed priority arbiter
  
  We implemented the fixed priority arbiter design using the Xilinx ISE 9.1 design suite for the device xc3s200-5ft256; the simulation result is shown below. As can be seen, the highest priority is given to req1 and the lowest to req5.
  
  Fig.7 simulation for fixed priority arbiter
3. Simulation result for proposed router
  
  We implemented the proposed router design using the Xilinx ISE 9.1 design suite for the device xc3s200-5ft256; the simulation result is shown below. As can be seen, the output port is defined by the three least significant bits of the message.
  
  Fig.8 simulation for router

Conclusion

This paper proposed a method to improve the performance of NoC routers by significantly reducing their clock cycle. Simulation results show that the critical path is reduced significantly, without compromising router efficiency, by performing VC allocation and switch allocation in parallel. The router uses 1,074 flip-flops, which is a large number compared with other router architectures, but its operating frequency is higher, so network latency is reduced and performance increases.

Table 1: Device utilization Summary

Device Utilization Summary
Slice Logic Utilization	Used	Available	Utilization
Number of slice registers	1,074	44,800	2%
Number used as flip-flop	1,074	–	–
Number of occupied slices	404	11,200	3%
Number of flip-flop pairs used	1,362	–	–

Future Work: The design uses a large number of flip-flops, which increases area utilization. Our future plan is to find a better buffer architecture that reduces the number of flip-flops, and to improve the crossbar switch for faster arbitration.

