

# Power Analysis of Embedded Low Latency Network on Chip

Hemasundari. H.

PG Student

M.E. Applied Electronics, ECE Department  
Meenakshi College of Engineering  
Chennai, India

G. Akila

Assistant Professor

ECE Department

Meenakshi College of Engineering  
Chennai, India

**Abstract:** Network-on-chip (NOC) is a new paradigm in complex system-on-chip (SOC) design that provides efficient on-chip communication networks. Data is routed through the network as packets, and the routing is done mainly by routers, so the router architecture must be efficient, with low latency and high throughput. In this project we designed, implemented, and analyzed crossbar router architectures for network-on-chip communication on an FPGA. The routers have five ports: four are connected to neighboring routers in four different directions, and the fifth is connected to the processing element through a network interface. The proposed architecture contains a 4x4 crossbar switch, a switch allocator, path and channel request logic, a data RAM, and four I/O ports. Data is sent through the routers in a way that prevents congestion. The switch allocator and VC allocator allocate data in priority order, with the switch allocator granting transfers according to the path and channel requests. The XY algorithm, together with a scheduler, is used to deliver data to the correct destination.

**Keywords:** NOC, FPGA, switch allocator, VC allocator, ports.

## I. INTRODUCTION

Very-large-scale integration (VLSI) is the process of integrating hundreds of thousands of transistors on a single silicon semiconductor microchip. The field involves packing more and more logic devices into smaller and smaller areas. VHDL (VHSIC Hardware Description Language) is a hardware description language used in electronic design automation to describe digital and mixed-signal systems such as FPGAs and integrated circuits. VHDL can also be used as a general-purpose parallel programming language.

The disadvantages of VHDL are that modules must be declared by a prototype, that writing the keyword “downto” in every bit-vector definition is tedious, that missing a single signal in a sensitivity list can cause catastrophic differences between simulation and synthesis, and that each process must have a sensitivity list, which may sometimes be very long. Verilog, standardized as IEEE 1364, is a hardware description language (HDL) used to model electronic systems. It is most commonly used in the design and verification of digital circuits at the register-transfer level of abstraction. It is also used in the verification of analog and mixed-signal circuits.

The advantages of Verilog coding are that it supports verification through simulation, allows architectural trade-offs with a short turnaround, enables automatic synthesis, reduces the time for design capture, and makes designs easy to change.

Today's SoCs need a network-on-chip IP interconnect fabric to reduce wire routing congestion, to ease timing closure, to reach higher operating frequencies, and to allow IP blocks to be changed easily. Networks-on-chip are a critical technology that will enable the success of future systems-on-chip for embedded applications, and the technology is expected to dominate computing platforms in the near future. The paper is organized as follows: Section II reviews the existing approaches. Section III explains the proposed method. Section IV discusses the results. Finally, Section V provides the conclusion.

## II. EXISTING OVERVIEW

The input ports buffer input flits and send requests to the allocators. The routing computation module determines the output port based on the routing algorithm. After route computation, a free output VC (OVC) in the next router is assigned to the input VC (IVC) by sending a request to the VC allocator. If an OVC is successfully assigned, another allocation request is sent to the switch allocator. If the switch allocation request is granted, the crossbar is configured to send the desired flit to the output port. In order to send requests to the switch allocator, the available space in the next router's buffer must be known. In existing systems, the routers use routing algorithms such as the XY algorithm.

In "Design tradeoffs for hard and soft FPGA-based networks-on-chip" [1], M. S. Abdelfattah and V. Betz present the design of an NOC built from routers. In that design there is a chance of congestion, since it does not have an allocator. We remove control overheads (routing and arbitration logic) from the critical path in order to minimize cycle time and latency.

The work "Design of On-the-fly Virtual Channel Allocation for Low Cost High Performance On-Chip Router" proposes on-the-fly virtual channel (VC) allocation for low-cost, high-performance on-chip routers. By performing VC allocation based on the result of switch allocation, the dependency between VC allocation and switch traversal is removed, and these stages can be performed in parallel.

## III. PROPOSED METHOD

The proposed system uses a low latency router microarchitecture with a VC allocator and a switch allocator. Any input flit that passes through the switch can be successfully delivered at the output, because the path request is sent through the VC allocator. The switch allocator and VC allocator are designed to operate in parallel. A scheduler is used with the XY algorithm so that data is transferred correctly. To reduce communication latency while maintaining good throughput, a router needs to perform several stages, such as route computation, VC allocation, and switch allocation, in parallel.

In the proposed NOC router architecture, shown in Fig. 1, any request that has been granted service by the switch allocator is able to pass a flit to the output port successfully. An efficient masking technique is proposed to filter out all switch allocation requests that cannot pass flits to the output port, either because the assigned VC lacks free space or, for requests with no assigned VC, because the output port has no free VC. The proposed technique has minimal impact on the timing and area overhead of an NOC router. It is also fully parameterizable in terms of the number of VCs, buffer width, and flit width.



Fig. 1. Block diagram
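The masking rule described above can be illustrated with a short behavioural model. This is only a sketch in Python, not the authors' hardware implementation: the names `IvcRequest`, `mask_requests`, `free_slots`, and `free_ovcs` are hypothetical, and the downstream buffer state is assumed to be available as simple lookup tables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IvcRequest:
    ivc: int                     # id of the requesting input VC (hypothetical)
    out_port: int                # output port chosen by route computation
    assigned_ovc: Optional[int]  # output VC already assigned, or None


def mask_requests(requests, free_slots, free_ovcs):
    """Drop switch-allocation requests that could not deliver a flit.

    free_slots maps (out_port, ovc) to the free buffer slots downstream.
    free_ovcs maps out_port to the set of output VCs not yet assigned.
    Both tables are assumptions made for this sketch.
    """
    passed = []
    for req in requests:
        if req.assigned_ovc is not None:
            # Assigned VC: the request survives only if that VC has space.
            ok = free_slots.get((req.out_port, req.assigned_ovc), 0) > 0
        else:
            # No VC assigned yet: the output port must still have a free VC.
            ok = len(free_ovcs.get(req.out_port, ())) > 0
        if ok:
            passed.append(req)
    return passed


requests = [
    IvcRequest(ivc=0, out_port=1, assigned_ovc=2),     # OVC 2 has space
    IvcRequest(ivc=1, out_port=1, assigned_ovc=3),     # OVC 3 is full
    IvcRequest(ivc=2, out_port=2, assigned_ovc=None),  # port 2 has a free OVC
    IvcRequest(ivc=3, out_port=3, assigned_ovc=None),  # port 3 has no free OVC
]
free_slots = {(1, 2): 4, (1, 3): 0}
free_ovcs = {2: {0, 1}, 3: set()}

surviving = mask_requests(requests, free_slots, free_ovcs)
print([r.ivc for r in surviving])  # [0, 2]
```

Only the requests from IVC 0 and IVC 2 reach the switch allocator in this example, so any grant the allocator issues corresponds to a flit that can actually be forwarded.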

A crossbar switch (cross-point switch, matrix switch) is a collection of switches arranged in a matrix configuration. A crossbar switch has multiple input and output lines that form a crossed pattern of interconnecting lines between which a connection may be established by closing a switch located at each intersection, the elements of the matrix.
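As an illustration of this description, a crossbar can be modelled as a matrix of crosspoints that are either open or closed. The sketch below is a software model only; the `Crossbar` class and its methods are hypothetical names, and the rule that each output may be driven by at most one input is an assumption added here to keep the model free of conflicts.

```python
class Crossbar:
    """Behavioural model of an n_inputs x n_outputs crossbar switch."""

    def __init__(self, n_inputs, n_outputs):
        self.n_inputs = n_inputs
        self.n_outputs = n_outputs
        # closed[i][o] is True when the crosspoint joining input i
        # and output o is closed.
        self.closed = [[False] * n_outputs for _ in range(n_inputs)]

    def connect(self, i, o):
        # Assumption of this sketch: one driver per output line.
        if any(self.closed[k][o] for k in range(self.n_inputs)):
            raise ValueError(f"output {o} is already driven")
        self.closed[i][o] = True

    def disconnect(self, i, o):
        self.closed[i][o] = False

    def transfer(self, inputs):
        """Return the value seen on each output (None if unconnected)."""
        outputs = [None] * self.n_outputs
        for i in range(self.n_inputs):
            for o in range(self.n_outputs):
                if self.closed[i][o]:
                    outputs[o] = inputs[i]
        return outputs


xbar = Crossbar(4, 4)
xbar.connect(0, 2)  # input 0 drives output 2
xbar.connect(3, 1)  # input 3 drives output 1
result = xbar.transfer(["A", "B", "C", "D"])
print(result)  # [None, 'D', 'A', None]
```

In a router, the switch allocator decides which crosspoints to close on each cycle, which corresponds to the `connect` calls in this model.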

A virtual channel router (VCR) is a router that uses wormhole flow control with virtual channels. This router architecture has five input ports and five output ports: four are connected to neighboring routers and one serves the router's local core. Each input port has four virtual channels, which are demultiplexed and buffered in FIFOs. After the FIFOs, the virtual channels are multiplexed again onto a single channel that goes to the crossbar. Routing operations in the crossbar are controlled by an arbitration unit (AU), which also ensures that there are no conflicts between virtual channels and that arbitration is fair.

Each packet maintains state indicating the availability of buffer space at its assigned output VC. When flits are waiting to be sent and buffer space is available, an input VC requests access to the necessary output channel via the router's crossbar. On each cycle, the switch allocation logic matches these requests to output ports and generates the required crossbar control signals.

After masking, the IVC requests are sent to the switch allocator. Because the switch allocator has two levels of arbitration, arbiter delay is an important parameter in defining the NOC critical path. Hence, to minimize the arbitration delay, a fast arbiter is proposed. The VC allocation stage assigns an empty VC in the neighboring router connected to the output port. Since several header flits may send requests for the same VC, arbitration is required. The routing computation and the VC allocation require only the header flit; the body and tail flits follow their respective header flit.

If VC allocation is successful, the third stage sends a request to the switch allocator to allocate the output port. Separable input-first allocators have the advantage of lower communication delay, area overhead, and power consumption compared to other schemes. Hence, a separable input-first allocator has been chosen for our low latency NOC router. A separable input-first allocator consists of two levels of arbitration.
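The two levels of arbitration can be sketched as follows. This is a behavioural illustration under stated assumptions rather than the design implemented in this work: it uses simple round-robin arbiters, whereas the paper refers only to a fast arbiter, and the names `RoundRobinArbiter` and `allocate` are hypothetical. A first-level pointer is advanced on every local grant, even if the second level later rejects the request, which real designs may handle differently.

```python
class RoundRobinArbiter:
    """Grants one of n requesters, rotating priority after each grant."""

    def __init__(self, n):
        self.n = n
        self.last = n - 1  # index 0 has the highest priority initially

    def grant(self, requests):
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if requests[idx]:
                self.last = idx
                return idx
        return None


def allocate(requests, in_arbs, out_arbs, n_ports, n_vcs):
    """requests[p][v] is the output port wanted by VC v of input port p,
    or None. Returns {out_port: (in_port, vc)} for the granted requests."""
    # Level 1 (input first): each input port picks one requesting VC.
    winners = {}
    for p in range(n_ports):
        v = in_arbs[p].grant([requests[p][vc] is not None for vc in range(n_vcs)])
        if v is not None:
            winners[p] = v
    # Level 2: each output port picks one input port among level-1 winners.
    grants = {}
    for o in range(n_ports):
        wants = [p in winners and requests[p][winners[p]] == o
                 for p in range(n_ports)]
        p = out_arbs[o].grant(wants)
        if p is not None:
            grants[o] = (p, winners[p])
    return grants


N_PORTS, N_VCS = 4, 2
in_arbs = [RoundRobinArbiter(N_VCS) for _ in range(N_PORTS)]
out_arbs = [RoundRobinArbiter(N_PORTS) for _ in range(N_PORTS)]
requests = [
    [1, 2],        # port 0: VC0 wants output 1, VC1 wants output 2
    [None, 1],     # port 1: VC1 wants output 1
    [3, None],     # port 2: VC0 wants output 3
    [None, None],  # port 3: idle
]

first = allocate(requests, in_arbs, out_arbs, N_PORTS, N_VCS)
print(first)   # {1: (0, 0), 3: (2, 0)}
second = allocate(requests, in_arbs, out_arbs, N_PORTS, N_VCS)
print(second)  # {1: (1, 1), 2: (0, 1), 3: (2, 0)}
```

In the first cycle, input ports 0 and 1 both compete for output 1 and only port 0 wins. In the second cycle the rotated priorities let port 1 through, which shows how the arbiters keep the allocation fair over time.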

The routing algorithm determines the output port to which a packet must be sent to reach its destination. Deterministic routing performs well with uniform traffic, where congestion is distributed equally across all links in an NOC. However, NOC traffic is bursty by nature, which results in an imbalanced distribution of traffic across the links. Hence, deterministic routing performs poorly for such traffic. With adaptive routing, packets can be sent to multiple ports, so a port selection module is required to select the desired output port among them. In the case of a look-ahead deterministic routing algorithm, only a single output port is selected, and it can be used directly in our proposed design.
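The XY algorithm used in this work is a deterministic routing scheme of this kind: a packet first travels along the X dimension until its column matches the destination, and then along the Y dimension. A minimal sketch is given below. The coordinate convention (x grows to the east, y grows to the north) and the names `xy_route` and `trace_path` are assumptions made for illustration.

```python
def xy_route(cur, dst):
    """Return the output port for one hop of XY routing.

    cur and dst are (x, y) router coordinates in a 2D mesh.
    """
    cx, cy = cur
    dx, dy = dst
    if dx > cx:
        return "EAST"
    if dx < cx:
        return "WEST"
    # X is resolved; only now move in Y.
    if dy > cy:
        return "NORTH"
    if dy < cy:
        return "SOUTH"
    return "LOCAL"  # arrived: deliver to the local processing element


def trace_path(src, dst):
    """List the ports taken hop by hop, ending with LOCAL."""
    step = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}
    pos, hops = src, []
    while True:
        port = xy_route(pos, dst)
        hops.append(port)
        if port == "LOCAL":
            return hops
        pos = (pos[0] + step[port][0], pos[1] + step[port][1])


path = trace_path((0, 0), (2, 1))
print(path)  # ['EAST', 'EAST', 'NORTH', 'LOCAL']
```

Because the output port depends only on the current and destination coordinates, every packet between the same pair of routers follows the same path, which is why exactly one output port is selected per hop.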

## IV. RESULTS

In this work, data reaches its destination by way of the routers, which guide it to the required output ports. The switch has four input ports and four output ports. Inputs are applied in four directions (north, south, east, and west), and the outputs are obtained in the same way.



Fig. 2. Input request

In Fig. 2, the input channel request is issued through the router and awaits acknowledgement from the output side. The data is applied in four directions.



Fig. 3. Input acknowledgement

The input channel acknowledgement is shown in Fig. 3.



Fig. 4. Output channel request

The output channel request is shown in Fig. 4.



Fig. 5. Output acknowledgement

The output channel acknowledgement is shown in Fig. 5.

TABLE I. Device Utilization Summary (Estimated Values)

| Logic Utilization          | Used | Available | Utilization |
|----------------------------|------|-----------|-------------|
| Number of Slices           | 25   | 704       | 3%          |
| Number of Slice Flip Flops | 43   | 1408      | 3%          |
| Number of 4 input LUTs     | 11   | 1408      | 0%          |
| Number of bonded IOBs      | 111  | 108       | 102%        |
| Number of GCLKs            | 1    | 24        | 4%          |

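Table I shows the numbers of slices, flip-flops, LUTs, I/O blocks, and global clocks used by the design. Note that the number of bonded IOBs used (111) exceeds the 108 available, which is why that utilization is reported as 102%.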
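The utilization column of Table I can be reproduced from the Used and Available columns. The short check below is a sketch only: the percentages in the table are consistent with integer truncation, but that the synthesis tool actually truncates rather than rounds is an assumption, and the names `rows` and `utilization_percent` are hypothetical.

```python
# (used, available) pairs copied from Table I
rows = {
    "Slices": (25, 704),
    "Slice Flip Flops": (43, 1408),
    "4 input LUTs": (11, 1408),
    "Bonded IOBs": (111, 108),
    "GCLKs": (1, 24),
}


def utilization_percent(used, available):
    # Assumption: the reported percentage is truncated, not rounded.
    return used * 100 // available


report = {name: utilization_percent(u, a) for name, (u, a) in rows.items()}
over_budget = [name for name, (u, a) in rows.items() if u > a]

print(report)
# {'Slices': 3, 'Slice Flip Flops': 3, '4 input LUTs': 0, 'Bonded IOBs': 102, 'GCLKs': 4}
print(over_budget)  # ['Bonded IOBs']
```

Only the bonded IOBs exceed the device budget in this check, while the logic resources are used at a few percent at most.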

## V. CONCLUSION

A network-on-chip (NOC) is a new paradigm in complex system-on-chip (SOC) design that provides efficient on-chip communication networks. It allows scalable communication and decouples communication from computation. In this project we designed, implemented, and analyzed crossbar router architectures for network-on-chip communication on an FPGA. Compared to existing works, the proposed architecture is optimized in five main components: a 4x4 crossbar switch, a switch allocator, path and channel request logic, a data RAM, and four I/O ports.

## REFERENCES

- [1] M. S. Abdelfattah and V. Betz, "Design tradeoffs for hard and soft FPGA-based networks-on-chip," in Proc. Int. Conf. Field-Program. Technol. (FPT), Dec. 2012, pp. 95-103.
- [2] E. S. Chung, J. C. Hoe, and K. Mai, "CoRAM: An in-fabric memory architecture for FPGA-based computing," in Proc. Int. Symp. Field Program. Gate Arrays (FPGA), 2011, pp. 97-106.
- [3] M. S. Abdelfattah and V. Betz, "The case for embedded networks on chip on field-programmable gate arrays," IEEE Micro, vol. 34, no. 1, pp. 80-89, Jan./Feb. 2014.
- [4] B. Sethuraman, P. Bhattacharya, J. Khan, and R. Vemuri, "LiPaR: A light-weight parallel router for FPGA-based networks-on-chip," 2005, pp. 452-457.
- [5] M. K. Papamichael and J. C. Hoe, "CONNECT: Re-examining conventional wisdom for designing NoCs in the context of FPGAs," 2012, pp. 37-46.
- [6] Y. Huan and A. DeHon, "FPGA optimized packet-switched NoC using split and merge primitives," 2012, pp. 47-52.
- [7] R. Francis and S. Moore, "Exploring hard and soft networks-on-chip for FPGAs," in Proc. Int. Conf. Field-Program. Technol., Dec. 2008, pp. 261-264.
- [8] K. Goossens, M. Bennebroek, J. Y. Hur, and M. A. Wahlah, "Hardwired networks on chip in FPGAs to unify functional and configuration interconnects," 2008, pp. 45-54.