# Power Analysis of Low Power Virtex 6 FPGA Based Communication FloSwitch Design

Jagannadham V. V., Rajalakshmy Sivaramakrisnan

Flosolver unit, National Aerospace Laboratories, Bangalore, India

#### Abstract

Flosolver has designed and developed the Mk 8 parallel supercomputer with a computing power of 10 TFLOPS. The Mk8 uses 1024 processors as processing elements (PEs). A communication device called the FloSwitch is used for data transfer across the processing elements [1]. Communication speed, power utilization and flexibility of interconnection always leave scope for improvement. This report addresses power, with the aim of bringing down the total power of the FloSwitch design.

A theoretical analysis has been carried out to reduce FloSwitch power as a whole. The major design change is to replace the external dual port memory (DPM) with the internal memory of the FPGA (Block RAM). Power analysis has been done for the low power Virtex 6 FPGA and the Virtex 5 FPGA using the Xilinx Power Estimator (XPE), together with a power calculation for the entire board. FPGA power utilization has been analyzed in detail and the overall board power has been calculated. The comparative analysis shows a considerable power reduction for the new design.

Key words: FloSwitch, XPE, DPM, Block RAM, optical links

## **1** Introduction

Data-intensive computing is a class of applications that use a data-parallel approach to process large volumes of data, typically terabytes or petabytes in size. Highly computational applications devote most of their execution time to computation and typically work on small volumes of data, whereas applications that require large volumes of data and devote most of their processing time to I/O and data manipulation are known as high communication applications [2].

Earlier electronic designs focused mainly on the design concept and its feasibility; power was just one part of the design. Current design trends favor portable devices with high performance and low power. Designing low power boards while addressing the key issues that improve performance is extremely challenging and demanding.

A communication protocol is a formal description of digital message formats and the rules for exchanging those messages in or between computing systems. Protocols may include signaling, authentication, and error detection and correction capabilities. To reduce the size of digital designs, the industry trend over the last few years has been to move towards high speed serial protocols for data transmission. A digital serial signal uses fewer pins to transmit high-speed data by increasing the clock rate at which the signals are sent [3].

Communication networks are among the most demanding areas of upcoming technology development, and the field advances continually to meet market demand.

With rising integration levels, energy utilization has become one of the most important design parameters. As a result, effort has to go into achieving lower dissipation on all fronts of the design process. So far, very little systems research has gone into low power systems. Low power components and subsystems are important building blocks for portable systems, so it is important to concentrate on dedicated low-power hardware and software architectures.

A system wide architecture is beneficial because there are dependencies between subsystems, e.g. optimization of one subsystem may have consequences for the energy consumption of other modules.

# 2 Existing FloSwitch Design

The Xilinx Virtex-5 FPGA based FloSwitch design was developed for integrated local and global communication, using parallel interfaces for local communication and serial interfaces for global communication. The 1024 processor Mk8 supercomputer is integrated using 128 FloSwitches to run compute-intensive applications. Numbers of this size make power consumption a serious concern in large systems. In the existing Flosolver Mk8 system, the FloSwitch is built from a Virtex 5 FPGA (XC5VLX110T, FF1759), DPMs (IDT70V658S) and optical links [4].

Flosolver Mk8 is organized as 128 clusters. Each cluster is an 8-processor system consisting of 4 dual-processor server boards with PCI based add-on cards, which are connected to the FloSwitch through a 64 bit parallel bus for intra-cluster communication. The 128 clusters are linked via the 16 Flo-opti-links (optical links) of the FloSwitch for inter-cluster communication. The FloSwitch is thus the prime communication device across the Mk8 supercomputer.



Fig. 1 Existing FloSwitch design using Virtex 5 FPGA

# **3 Proposed FloSwitch Design**

The proposed FloSwitch design uses a low power Virtex 6 FPGA (XC6VLX550T, FF1759) with associated low power components. The external DPMs are replaced by the internal memory of the Virtex 6 FPGA (Block RAM) to reduce access time and power. Board dimensions are cut down appreciably by removing the DPMs and Samtec connectors from the FloSwitch design. Figure 2 shows the proposed FloSwitch design [4] [5].



Fig. 2 Proposed FloSwitch design using Virtex 6 FPGA

## 4 Key areas of power consumption

Power consumption is a design constraint that can, at first sight, be captured in a simple numerical model. A closer look at power dissipation shows that the subject is not that simple. Electric current is not constant during operation, and peak power is an important concern: a device can fail due to electromigration and voltage drops even if its average power consumption is low. The contributions to the average power consumption of a design are given below.

$$\mathbf{P}_{avg} = \mathbf{P}_d + \mathbf{P}_s + \mathbf{P}_l + \mathbf{P}_{st}$$

The key factors are dynamic, short-circuit, leakage and static power consumption. These factors of power consumption depend on the application and technology [6].

The main component of power consumption in CMOS is dynamic. Charging and discharging the output load causes a current  $i_d$  to flow, which results in the power dissipation  $P_d$ . The current depends on the capacitive output load  $C_{load}$  and the supply voltage V. A first order approximation of the dynamic power consumption of CMOS circuitry is given as

$$\mathbf{P}_d = K\mathbf{C}_{load} \, \mathbf{V}^2 f$$

K is the average number of rising transitions during one clock cycle and f is the clock frequency. For a given technology and timing constraints, the logic-zero and logic-one voltage levels must stay within their specified ranges.
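
As a rough illustration of the quadratic voltage dependence in this approximation, the following Python sketch evaluates $P_d = K C_{load} V^2 f$ at the two core voltages considered in this report (1.0 V and 0.9 V). The values chosen for K, the load capacitance and the clock frequency are placeholders for illustration only and are not taken from the FloSwitch design; only the ratio between the two results is meaningful.

```python
def dynamic_power(k, c_load, v, f):
    """First-order CMOS dynamic power: P_d = K * C_load * V^2 * f (watts)."""
    return k * c_load * v ** 2 * f

# Placeholder operating point (assumed values, not measured FloSwitch data).
K = 0.25          # average rising transitions per clock cycle (assumed)
C_LOAD = 10e-9    # total switched capacitance in farads (assumed)
F_CLK = 166e6     # clock frequency in hertz (assumed)

p_1v0 = dynamic_power(K, C_LOAD, 1.0, F_CLK)
p_0v9 = dynamic_power(K, C_LOAD, 0.9, F_CLK)

# With K, C_load and f held fixed, the ratio depends only on (0.9 / 1.0)^2.
ratio = p_0v9 / p_1v0
print(f"P_d at 0.9 V relative to 1.0 V: {ratio:.2f}")
```

Under these assumptions, lowering the core voltage from 1.0 V to 0.9 V would by itself reduce dynamic power by about 19%, all else being equal.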

CMOS circuits have both a pull-up and a pull-down network. When both networks are active for a short time during switching, a current  $i_{sc}$  flows from the supply to ground. This is known as the short-circuit current. The corresponding power ( $P_s$ ) is given as

$$\mathbf{P}_s = \mathbf{K} \,\frac{\beta}{12}\, (\mathbf{V} - 2\mathbf{V}_t)^3 f \tau$$

Here  $\beta$  is the MOS transistor gain factor,  $V_t$  is the threshold voltage and  $\tau$  is the rise/fall time of the gate inputs.

Leakage power ( $P_l$ ) is due to two currents: the current that flows through the reverse-biased diodes between the diffusion regions and the substrate, and the current  $I_{sub}$  that flows through transistors that are non-conducting. Static power ( $P_{st}$ ) refers to the current that flows from power to ground while the CMOS circuits are idle [7].

## **5** Scope of the work

The existing and proposed communication system designs have been analyzed. Critical component issues such as switching time, access time, clock distribution and power management have been taken into account. A theoretical power assessment has been done for the Virtex 5 and Virtex 6 using the Xilinx Power Estimator (XPE). The power consumption of the additional components has been calculated separately. Overall power utilization is shown in tables and a corresponding bar chart.

## **6** Power calculations

Power consumption and its related factors were considered before taking up the design. FPGAs have increased in logic capacity and performance while migrating to smaller process geometries and lower power consumption. Designers expect next generation systems to offer more features and higher performance with less power and smaller geometry. In this work a considerable power reduction has been achieved by using low power devices. The existing and proposed designs are compared component by component in terms of power consumption. Details are given below.

**FPGA power consumption:** resource utilization of existing and proposed Virtex FPGA and its overall power utilization in the design.

| Parameters            | Existing (Virtex 5) | Proposed (Virtex 6) |
|-----------------------|---------------------|---------------------|
| Supply Voltage VCCINT | 1V                  | 0.9V                |
| Total power consumed  | 12.217W             | 13.645W             |

Table 6.1: FPGA

## **Memory operation:** SDRAM

Table 6.2: SDRAM

| Parameters            | Existing (48LC8M16A2) | Proposed (W987D6HBGX6E) |
|-----------------------|-----------------------|-------------------------|
| Supply Voltage VCCINT | 3.3V                  | 1.8V                    |
| IO Supply Voltage     | 2.5V                  | 1.8V                    |
| Supply Current        | 160mA                 | 35mA                    |
| Frequency             | 166MHz                | 166MHz                  |
| Power Consumed        | 528mW                 | 63mW                    |

The proposed RAM consumes 465 mW less than the existing one, and its larger number of address lines supports more efficient memory operations.

**Flash memory:** The Byte-wide Peripheral Interface (BPI) improves transfer speed over the Serial Peripheral Interface (SPI): SPI is a serial interface, whereas BPI is a parallel interface.
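
The power entries in Table 6.2 follow from the product of supply voltage and supply current. The short Python sketch below reproduces them and the resulting difference; the helper name `supply_power_mw` is introduced here for illustration only, and the calculation assumes the listed supply current is drawn continuously at the listed supply voltage.

```python
def supply_power_mw(voltage_v, current_ma):
    """Power in milliwatts from supply voltage (V) and supply current (mA)."""
    return voltage_v * current_ma

# Values taken from Table 6.2.
existing_sdram_mw = supply_power_mw(3.3, 160)   # 48LC8M16A2
proposed_sdram_mw = supply_power_mw(1.8, 35)    # W987D6HBGX6E
saving_mw = existing_sdram_mw - proposed_sdram_mw

print(f"existing: {existing_sdram_mw:.1f} mW")
print(f"proposed: {proposed_sdram_mw:.1f} mW")
print(f"saving:   {saving_mw:.1f} mW")
```

The same voltage-times-current calculation gives the power entries for the flash memory, DPM, CPLD, SFP and clock oscillator tables that follow.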

| Parameters            | Existing-SPI (25P28V6P) | Proposed-BPI (JS28F256P30) |
|-----------------------|-------------------------|----------------------------|
| Supply Voltage VCCINT | 3.3V                    | 1.8V                       |
| IO Supply Voltage     | 2.5V                    | 1.8V                       |
| Supply Current        | 15mA                    | 30mA                       |
| Power Consumed        | 49.5mW                  | 54mW                       |

Table 6.3: Flash Memory

#### Dual port memory (IDT70V658S): Existing system device power

Table 6.4: Dual Port Memory

| Parameters            | Existing (IDT70V658S) |
|-----------------------|-----------------------|
| Supply Voltage VCCINT | 3.3V                  |
| IO Supply Voltage     | 3.3V                  |
| Supply Current        | 500mA                 |
| Power Consumed        | 1650mW                |

This is for a single DPM. For 8 DPMs, the power consumption is 13.2W. In the proposed system, external DPM is replaced by Block RAMs.
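
The 13.2 W figure follows directly from the per-device values in Table 6.4, assuming all eight DPMs draw the listed supply current at the same time:

$$\mathbf{P}_{DPM} = 3.3\,\mathrm{V} \times 0.5\,\mathrm{A} = 1.65\,\mathrm{W}, \qquad 8 \times 1.65\,\mathrm{W} = 13.2\,\mathrm{W}$$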

## **CPLD** for the existing design

Table 6.5: CPLD

| Parameters            | Existing (XC95144XL) |
|-----------------------|----------------------|
| Supply Voltage VCCINT | 3.3V                 |
| IO Supply Voltage     | 3.3V                 |
| Supply Current        | 45mA                 |
| Power Consumed        | 148.5mW              |

**SFP Transceivers:-** The power consumption given here is for one Small Form-Factor Pluggable (SFP) transceiver. Sixteen such SFPs are used in the design.

#### Table 6.6 SFP transceivers

| Parameters                 | Existing (FTLF-1324P2BTV) | Proposed (FTLF-8524P2BNL) |
|----------------------------|---------------------------|---------------------------|
| Voltage VCCINT             | 3.3V                      | 3.3V                      |
| Supply Current             | 300 mA                    | 240 mA                    |
| Data Rate                  | 4.25 Gbps                 | 4.25 Gbps                 |
| Power Consumed for one SFP | 990mW                     | 792mW                     |
| Total power for 16 SFPs    | 990x16 = 15840mW          | 792x16 = 12672mW          |
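
The totals for the 16 SFP transceivers can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming every module draws its listed supply current at 3.3 V; the function name `sfp_total_mw` is introduced here for illustration.

```python
SFP_COUNT = 16    # SFP transceivers per FloSwitch, as stated in the text
SUPPLY_V = 3.3    # supply voltage of both parts (V)

def sfp_total_mw(current_ma, count=SFP_COUNT, voltage_v=SUPPLY_V):
    """Total power in mW for `count` identical SFP transceivers."""
    return voltage_v * current_ma * count

existing_total_mw = sfp_total_mw(300)   # FTLF-1324P2BTV
proposed_total_mw = sfp_total_mw(240)   # FTLF-8524P2BNL

print(f"existing: {existing_total_mw:.0f} mW")
print(f"proposed: {proposed_total_mw:.0f} mW")
print(f"saving:   {existing_total_mw - proposed_total_mw:.0f} mW")
```

Across the 16 modules, the lower supply current of the proposed part accounts for a saving of roughly 3.2 W per FloSwitch.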

**Clocking circuit:** Clock oscillators (50 MHz clock)

Table 6.7: Clock Oscillator (50 MHz)

| Parameters              | Existing (ECS3953MBN) | Proposed (ECS3518MBN) |
|-------------------------|-----------------------|-----------------------|
| Supply Voltage VCCINT   | 3.3V                  | 1.8V                  |
| Voltage for Oscillation | 2.2V                  | 1.8V                  |
| Supply Current          | 35mA                  | 25mA                  |
| Frequency Range         | (1.8-125) MHz         | (1.8-125) MHz         |
| Power Consumed          | 115.5mW               | 45mW                  |

## Clock oscillators (156.25 MHz clock)

Table 6.8: Clock Oscillator (156.25 MHz)

| Parameters            | Existing (ECS3953MBN) |
|-----------------------|-----------------------|
| Supply Voltage VCCINT | 2.5V                  |
| Supply Current        | 30mA                  |
| Frequency Range       | (53.125-700) MHz      |
| Power Consumed        | 75mW                  |

For the existing Virtex 5 and the proposed Virtex 6 based FloSwitch designs, the power consumed by the major components is plotted in figure 1. Replacing the on-board DPMs with Virtex 6 Block RAM makes the Virtex 6 FPGA consume slightly more power than the Virtex 5 FPGA, but the on-board DPM power drops to zero in the Virtex 6 based design. The other major components used in the design and their total power consumption are also shown in figure 1. The overall power consumption of the proposed design is significantly lower.



Fig.1 Graph shows the comparative power consumption of Virtex 5 and Virtex 6 based FloSwitch design

# 7 Power analysis

The 40 nm process technology of the Virtex 6 FPGA achieves substantial power reductions over the previous-generation Virtex-5 devices (65 nm). Such a reduction in power consumption is a strong driver for further technology development. At 40 and 45 nm, transistor leakage current increases exponentially, so keeping static power low is a major challenge [8]. In addition, the demand for high performance continues to drive core clock rates up, which increases dynamic power. Despite these challenges, the theoretical analysis shows that the power utilization of resources in Virtex-6 FPGAs is lower than in Virtex 5 FPGAs.



Fig.2 (a), (b) shows the comparison of power consumption for typical and maximum voltages with specified temperature range of Virtex 5 and Virtex 6 FPGA

Virtex 6 logic and I/O resource utilization is 30% higher than in the Virtex 5 FPGA, yet the Virtex 6 consumes less power, as shown in figure 2. Leakage current in the Virtex 6 is slightly higher than in the Virtex 5 because Virtex 6 transistor leakage current depends more strongly on junction temperature, so the static power of the device is higher than that of the Virtex 5. Block RAM resource utilization in the Virtex 6 is 4 times higher than in the Virtex 5. The Virtex 6 Block RAM is used in the proposed design to replace the on-board DPMs of the existing Virtex 5 based FloSwitch. Block RAM power utilization is 1.75 W in the Virtex 6, compared with the on-board DPM power requirement of 13.2 W [9] [10].



Fig.3 (c), (d) represents the graph of on chip power vs. Vccint for Virtex 5 and Virtex 6 FPGA

Overall, the proposed Virtex 6 based design uses 40% more logic than the Virtex 5 based design. Even so, on-chip power at the Virtex 6 core voltage (0.9 V) is lower than that of the existing Virtex 5 design at its core voltage (1.0 V), as shown in figure 3.



Fig.4 (e) and (f) graph of on chip typical vs. maximum power for Virtex 5 and Virtex 6 FPGA with reference to junction temperature

As shown in figure 4, Virtex 5 and Virtex 6 power is stable with junction temperature at the typical voltage. Resource utilization in the Virtex 6 is 40% higher, but the power consumption is almost the same at the lower junction temperature  $(50^{\circ}C)$  and typical voltage [11] [12]. For the Virtex 5, the power difference between typical (1.0 V) and maximum (1.05 V) voltage is almost the same in all three temperature cases. For the Virtex 6, the power difference between typical (0.9 V) and maximum (0.93 V) voltage grows as junction temperature increases. A higher junction temperature increases the leakage current, so static and leakage power increase significantly because of the reduced transistor gate length (40 nm). In figure 5, the on-chip power of the Virtex 6 is lower than that of the Virtex 5 near a junction temperature of 45-50°C. As junction temperature rises, the power consumption of the Virtex 6 (gate length 40 nm) increases significantly compared with the Virtex 5 (gate length 65 nm), as shown in the graph. The difference is due to the different transistor gate lengths of the two devices [13] [14].



Fig.5 (g), (h) represents the graph of on chip power vs. junction temperature for Virtex 5 and Virtex 6 FPGA

The total power calculated for the Virtex 5 FPGA with on-board components and terminations is around 43.85 W. For the Virtex 6 FPGA based FloSwitch, with built-in memory and low power components and without the DPMs and their associated terminations, the total power consumption comes to around 26.88 W. This is a substantial power reduction for one FloSwitch: about 16.97 W overall.

If the proposed design is integrated into the present 1024-processor system, the total power reduction will be around 2172 W for 128 FloSwitches. The lighter load on the power supply unit reduces the need for additional heat sinks and fans, making the system more efficient.
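
The board-level and system-level savings quoted above can be reproduced with the following Python sketch. It uses only the two board totals and the FloSwitch count stated in this report, and assumes that all 128 FloSwitches see the same per-board saving.

```python
FLOSWITCH_COUNT = 128       # FloSwitches in the 1024-processor Mk8 system

existing_board_w = 43.85    # Virtex 5 based FloSwitch, total board power (W)
proposed_board_w = 26.88    # Virtex 6 based FloSwitch, total board power (W)

saving_per_board_w = existing_board_w - proposed_board_w
saving_system_w = saving_per_board_w * FLOSWITCH_COUNT

print(f"saving per FloSwitch: {saving_per_board_w:.2f} W")
print(f"saving for {FLOSWITCH_COUNT} FloSwitches: {saving_system_w:.0f} W")
```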

## 8 Conclusions

The analysis shows that overall power utilization is reduced by more than 35%. Bringing out a design with a power reduction of 16.97 W on one communication device (FloSwitch) by using low power components is a significant achievement. Using the built-in DPM core of the Virtex 6 FPGA reduces power consumption by 60% compared with the on-board DPMs used in the earlier Virtex 5 design. Using the built-in FPGA core as the DPMs of the proposed design also improves communication speed. The overall board dimensions are reduced considerably.

## References

- [1] Flosolver Team, NAL. "Preliminary performance analysis of Flosolver Mk-8". NAL PDFS 1017.
- [2] A.M. Middleton. "Data-Intensive Technologies for Cloud Computing". Handbook of Cloud Computing. Springer, 2010.
- [3] Flosolver Team, NAL. "Communication protocol development for the FloSwitch". NAL PDFS 0609.
- [4] Flosolver Team, NAL. "A study of FPGA modules on FPGA based FloSwitch". NAL PDFS 1009.
- [5] Jagannadham V V, Anand Raj D, Venkatesh and Rajalakshmy Sivaramakrishnan. "Hardware Design Document for Pentium M Based FloSwitch". NAL PDFS 0509.
- [6] Frank Poppen. "Low power design guide". OFFIS version 30.06.00, Dipl.-Inform., 2000.

- [7] Paul J.M. Havinga and Gerard J.M. Smit. "Design techniques for low-power systems". Journal of Systems Architecture, Vol. 46, No. 1, pages 1-21, 2000.
- [8] Arman Vassighi and Manoj Sachdev. "Thermal and Power Management of Integrated Circuits". Springer, 2006.
- [9] D. M. Brooks, P. W. Cook, P. Bose, S. E. Schuster, H. Jacobson, P. N. Kudva, A. Buyuktosunoglu, J. Wellman, V. Zyuban and M. Gupta. "Power-aware microarchitecture: design and modeling challenges for next-generation microprocessors". IEEE Micro, Vol. 20, No. 6, 2000.
- [10] E. J. Nowak. "Maintaining the benefits of CMOS scaling when scaling bogs down". IBM Journal of Research and Development, Vol. 48, No. 2/3, pages 26-44, 2002.
- [11] A. Keshavarzi, K. Roy and C.F. Hawkins. "Intrinsic leakage in deep submicron CMOS ICs: measurement based test solutions". IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 8, No. 6, pages 717-723, 2000.
- [12] Y. Taur and T. H. Ning. "Fundamentals of Modern VLSI Devices". Cambridge University Press, pages 120-128, 1998.
- [13] Y. Taur and T. H. Ning. "Fundamentals of Modern VLSI Devices". Cambridge University Press, pages 94-95, 1998.
- [14] S. Thompson, P. Packan and M. Bohr. "MOS scaling: transistor challenges for the 21st century". Intel Technology Journal, Q3, 1998.