Power Analysis of Low Power Virtex 6 FPGA based Communication FloSwitch Design

DOI : 10.17577/IJERTV2IS110002


Jagannadham V. V., Rajalakshmy Sivaramakrishnan

Flosolver unit, National Aerospace Laboratories, Bangalore, India

Abstract

Flosolver designed and developed the Mk 8 parallel supercomputer with a computing power of 10 TFLOPS. Mk8 uses 1024 processors as processing elements (PEs). A communication device called the FloSwitch is used for data transfer across the processing elements [1]. Communication speed, power utilization and flexibility of interconnection always have scope for improvement. This report addresses power, with the aim of bringing down the total power of the FloSwitch design.

A theoretical analysis has been carried out to reduce the power of the FloSwitch as a whole. The major design change is to replace the external dual port memory (DPM) with the internal memory of the FPGA (Block RAM). Power has been analyzed for the low power Virtex 6 FPGA and the Virtex 5 FPGA using the Xilinx Power Estimator (XPE), together with a power calculation for the entire board. FPGA power utilization has been analyzed in detail and overall board power has been calculated. The comparison shows a considerable power reduction for the new design.

Key words: FloSwitch, XPE, DPM, Block RAM, optical links

  1. Introduction

    Data-intensive computing is a class of parallel computing applications that process large volumes of data, typically terabytes or petabytes in size. Compute-intensive applications devote most of their execution time to computation and typically work on small volumes of data, whereas applications that require large volumes of data and devote most of their processing time to I/O and data manipulation are described as communication-intensive [2].

    Earlier electronic designs were driven mainly by the design concept and its feasibility; power was treated as just one part of the design. Current trends favour portable devices with high performance and low power. Designing low power boards that address the key issues while improving performance is extremely challenging and demanding.

    A communication protocol is a formal description of digital message formats and the rules for exchanging those messages in or between computing systems. Protocols may include signaling, authentication, and error detection and correction capabilities. To reduce the size of digital designs, the industry trend over the last few years has been to move towards high speed serial protocols for data transmission. A digital serial link uses fewer pins to transmit high-speed data by increasing the clock rate at which the signals are sent [3].

    Communication networks are among the most demanding areas of current technology development, and the field advances continually to meet market demand.

    With rising integration levels, energy utilization has become one of the most important design parameters. As a result, effort must go into achieving lower dissipation at every stage of the design process. Relatively little systems research effort has gone into low power systems. Because low power components and subsystems are the building blocks of portable systems, it is important to concentrate on dedicated low-power hardware and software architectures.

    A system-wide architecture is beneficial because there are dependencies between subsystems, e.g. optimization of one subsystem may have consequences for the energy consumption of other modules.

  2. Existing FloSwitch Design

    The Xilinx Virtex-5 FPGA based FloSwitch is designed for integrated local and global communication: parallel interfaces are used for local communication and serial interfaces for global communication. The 1024-processor Mk8 supercomputer is integrated using 128 FloSwitches to run compute-intensive applications. At this scale, power consumption becomes a serious concern. In the existing Flosolver Mk8 system, the FloSwitch is built from a Virtex 5 FPGA (XC5VLX110T, FF1759), DPMs (IDT70V658S) and optical links [4].

    Flosolver Mk8 is organized as 128 clusters. Each cluster is an 8-processor system consisting of 4 dual-processor server boards with PCI based add-on cards, which are connected to the FloSwitch through a 64 bit parallel bus for intra-cluster communication. The 128 clusters are linked via the 16 Flo-opti-links (optical links) of each FloSwitch for inter-cluster communication. The FloSwitch is thus the prime communication device across the Mk8 supercomputer.

    [Figure: block diagram. Blocks: Virtex 5 FPGA; 4 parallel connectors (64 bit) for intra-cluster communication; dual port memory (DPM) section; 16 serial links (optical) for inter-cluster communication; power regulators with different voltage levels; system clock; JTAG; CPLD; Sprom; SDRAM; SPI; test points and LED]

    Fig. 1 Existing FloSwitch design using Virtex 5 FPGA

  3. Proposed FloSwitch Design

    In the proposed FloSwitch design, a low power Virtex 6 FPGA (XC6VLX550T, FF1759) is used with associated low power components. The external DPMs are replaced by the internal memory of the Virtex 6 FPGA (Block RAM) to reduce access time and power. Board dimensions are cut down appreciably by removing the DPMs and Samtec connectors from the FloSwitch design. Figure 2 shows the proposed FloSwitch design [4] [5].

    [Figure: block diagram. Blocks: Virtex 6 FPGA (low power); 20 serial link channels (optical) for intra- and inter-cluster communication; power regulators for different voltage levels; system clock; Sprom; SDRAM; BPI; CPLD; JTAG; test points and LED]

    Fig. 2 Proposed FloSwitch design using Virtex 6 FPGA

  4. Key areas of power consumption

    Power consumption is a design constraint that can, at first sight, be captured by a simple numerical model. A closer look at power dissipation shows that the subject is not that simple. Electric current is not constant during operation, so peak power is an important concern: a device can fail due to electro-migration and voltage drops even if its average power consumption is low. The average power consumption of a design can be broken down as

    $P_{avg} = P_d + P_s + P_l + P_{st}$

    where the terms are the dynamic, short-circuit, leakage and static power consumption. The contribution of each term depends on the application and the technology [6].

    The main component of power utilization in CMOS is dynamic. The electric current $i_d$ that flows while the output load is charged and discharged causes the power dissipation $P_d$. The current depends on the capacitive output load $C_{load}$ and the supply voltage $V$. A first order approximation of the dynamic power consumption of CMOS circuitry is

    $P_d = K C_{load} V^2 f$

    where $K$ is the average number of rising transitions during one clock cycle and $f$ is the clock frequency. For a given technology and set of timing constraints, the logic zero and logic one levels must stay within their specified ranges.
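As a rough illustration of the first-order expression above, the following Python sketch evaluates the dynamic power at the two core voltages compared in this paper (1.0 V for Virtex 5 and 0.9 V for the low power Virtex 6). The values chosen for K, the load capacitance and the clock frequency are hypothetical placeholders, not measurements from the FloSwitch; only the dependence on the square of the supply voltage is taken from the formula.

```python
def dynamic_power(k: float, c_load: float, v: float, f: float) -> float:
    """First-order CMOS dynamic power, P_d = K * C_load * V^2 * f, in watts."""
    return k * c_load * v ** 2 * f


# Hypothetical operating point (illustrative values, not from the paper).
K = 0.25          # average number of rising transitions per clock cycle
C_LOAD = 10e-9    # total switched capacitance in farads
F_CLK = 156.25e6  # clock frequency in hertz

p_at_1v0 = dynamic_power(K, C_LOAD, 1.0, F_CLK)  # core voltage 1.0 V
p_at_0v9 = dynamic_power(K, C_LOAD, 0.9, F_CLK)  # core voltage 0.9 V

# Fractional saving that comes from the voltage change alone.
saving = 1 - p_at_0v9 / p_at_1v0
print(f"Dynamic power saving from 1.0 V to 0.9 V: {saving:.0%}")
```

With all other factors held equal, the quadratic dependence on the supply voltage alone gives a dynamic power reduction of about 19% when moving from 1.0 V to 0.9 V.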

    CMOS circuits have both a pull-up and a pull-down network. When both networks are active for a short time during switching, a current $i_{sc}$ flows from supply to ground. This is known as the short-circuit current. The corresponding power $P_s$ is given as

    $P_s = K \frac{\beta}{12} (V - V_t)^3 \tau f$

    where $\beta$ is the MOS transistor gain factor, $V_t$ is the threshold voltage and $\tau$ is the rise/fall time of the gate inputs.

    Leakage power ($P_l$) refers to the current that flows through the reverse biased diodes between the diffusion regions and the substrate, together with the subthreshold current $I_{sub}$ that flows through transistors that are non-conducting. Static power ($P_{st}$) refers to the current that flows from supply to ground while the CMOS circuit is idle [7].
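The following Python sketch evaluates the short-circuit expression above to show its cubic dependence on the supply voltage. The parameter names beta (gain factor) and tau (input rise/fall time) and all numeric values, including the threshold voltage of 0.3 V, are assumptions made for illustration only and are not taken from any device data sheet.

```python
def short_circuit_power(k: float, beta: float, v: float, v_t: float,
                        tau: float, f: float) -> float:
    """Short-circuit power, P_s = K * (beta / 12) * (V - V_t)^3 * tau * f."""
    return k * (beta / 12.0) * (v - v_t) ** 3 * tau * f


# Hypothetical device parameters (illustrative only).
K_SC = 0.25        # average number of rising transitions per clock cycle
BETA = 1e-3        # transistor gain factor in A/V^2
V_T = 0.3          # threshold voltage in volts
TAU = 100e-12      # input rise/fall time in seconds
F_SC = 156.25e6    # clock frequency in hertz

ps_at_1v0 = short_circuit_power(K_SC, BETA, 1.0, V_T, TAU, F_SC)
ps_at_0v9 = short_circuit_power(K_SC, BETA, 0.9, V_T, TAU, F_SC)

# Ratio of short-circuit power at 0.9 V to that at 1.0 V.
sc_ratio = ps_at_0v9 / ps_at_1v0
print(f"P_s(0.9 V) / P_s(1.0 V) = {sc_ratio:.2f}")
```

Under these assumed values the ratio depends only on the two supply voltages and the threshold voltage, so the absolute values of beta, tau and f do not affect it.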

  5. Scope of the work

    The existing and proposed communication system designs have been analyzed. Critical component issues such as switching time, access time, clock distribution and power management have been taken into account. A theoretical power assessment has been done for Virtex 5 and Virtex 6 using the Xilinx Power Estimator (XPE). The power consumption of the additional components has been calculated separately. Overall power utilization is shown in tables and a corresponding bar chart.

  6. Power calculations

    Power consumption and its related factors were considered before taking up the design. FPGAs have increased in logic capacity and performance while migrating to smaller process geometries and lower power consumption. Designers expect next generation systems to offer more features and higher performance with less power and smaller geometry. In this work a considerable power reduction has been achieved by using low power devices. The existing and proposed designs are compared component by component in terms of power consumption. Details are given below.

    FPGA power consumption: resource utilization of the existing and proposed Virtex FPGAs and their overall power utilization in the design.

    Table 6.1: FPGA

    | Parameters | Existing (Virtex 5) | Proposed (Virtex 6) |
    |---|---|---|
    | Supply voltage VCCINT | 1 V | 0.9 V |
    | Total power consumed | 12.217 W | 13.645 W |

    Memory operation — SDRAM

    Table 6.2: SDRAM

    | Parameters | Existing (48LC8M16A2) | Proposed (W987D6HBGX6E) |
    |---|---|---|
    | Supply voltage VCCINT | 3.3 V | 1.8 V |
    | IO supply voltage | 2.5 V | 1.8 V |
    | Supply current | 160 mA | 35 mA |
    | Frequency | 166 MHz | 166 MHz |
    | Power consumed | 528 mW | 63 mW |

    The proposed SDRAM consumes about 465 mW less than the existing one (63 mW against 528 mW), and its increased number of address lines helps make memory operations more efficient.
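The "Power consumed" rows in these component tables follow from multiplying the supply voltage by the supply current. A minimal Python sketch of that arithmetic for the SDRAM figures in Table 6.2 is given here; the helper name supply_power_mw is hypothetical.

```python
def supply_power_mw(voltage_v: float, current_ma: float) -> float:
    """Power in milliwatts from supply voltage (V) and supply current (mA)."""
    return voltage_v * current_ma


# Values from Table 6.2.
existing_sdram_mw = supply_power_mw(3.3, 160)  # existing SDRAM
proposed_sdram_mw = supply_power_mw(1.8, 35)   # proposed SDRAM

# Power saved by moving to the proposed SDRAM.
sdram_saving_mw = existing_sdram_mw - proposed_sdram_mw

print(f"Existing: {existing_sdram_mw:.1f} mW")
print(f"Proposed: {proposed_sdram_mw:.1f} mW")
print(f"Saving:   {sdram_saving_mw:.1f} mW")
```

The same product reproduces the flash, DPM, CPLD, SFP and oscillator figures in the tables that follow.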

    Flash memory: the Byte Peripheral Interface (BPI) improves transfer speed over the Serial Peripheral Interface (SPI), since SPI is a serial interface and BPI is a parallel interface.

    Table 6.3: Flash Memory

    | Parameters | Existing SPI (25P28V6P) | Proposed BPI (JS28F256P30) |
    |---|---|---|
    | Supply voltage VCCINT | 3.3 V | 1.8 V |
    | IO supply voltage | 2.5 V | 1.8 V |
    | Supply current | 15 mA | 30 mA |
    | Power consumed | 49.5 mW | 54 mW |

    Dual port memory (IDT70V658S): device power in the existing system

    Table 6.4: Dual Port Memory

    | Parameters | Existing (IDT70V658S) |
    |---|---|
    | Supply voltage VCCINT | 3.3 V |
    | IO supply voltage | 3.3 V |
    | Supply current | 500 mA |
    | Power consumed | 1650 mW |

    This is for a single DPM. For 8 DPMs, the power consumption is 13.2 W. In the proposed system, the external DPMs are replaced by Block RAMs.

    CPLD for the existing design

    Table 6.5: CPLD

    | Parameters | Existing (XC95144XL) |
    |---|---|
    | Supply voltage VCCINT | 3.3 V |
    | IO supply voltage | 3.3 V |
    | Supply current | 45 mA |
    | Power consumed | 148.5 mW |

    SFP transceivers: the power consumption given here is for one Small Form-Factor Pluggable (SFP) transceiver. 16 such SFPs are used in the design.

    Table 6.6: SFP transceivers

    | Parameters | Existing (FTLF-1324P2BTV) | Proposed (FTLF-8524P2BNL) |
    |---|---|---|
    | Voltage VCCINT | 3.3 V | 3.3 V |
    | Supply current | 300 mA | 240 mA |
    | Data rate | 4.25 Gbps | 4.25 Gbps |
    | Power consumed for one SFP | 990 mW | 792 mW |
    | Total power for 16 SFPs | 990 × 16 = 15840 mW | 792 × 16 = 12672 mW |

    Clocking circuit: clock oscillators (50 MHz clock)

    Table 6.7: Clock Oscillator (50 MHz)

    | Parameters | Existing (ECS3953MBN) | Proposed (ECS3518MBN) |
    |---|---|---|
    | Supply voltage VCCINT | 3.3 V | 1.8 V |
    | Voltage for oscillation | 2.2 V | 1.8 V |
    | Supply current | 35 mA | 25 mA |
    | Frequency range | (1.8-125) MHz | (1.8-125) MHz |
    | Power consumed | 115.5 mW | 45 mW |

    Clock oscillators (156.25 MHz clock)

    Table 6.8: Clock Oscillator (156.25 MHz)

    | Parameters | Existing (ECS3953MBN) |
    |---|---|
    | Supply voltage VCCINT | 2.5 V |
    | Supply current | 30 mA |
    | Frequency range | (53.125-700) MHz |
    | Power consumed | 75 mW |

    For the existing Virtex 5 and proposed Virtex 6 based FloSwitch designs, the power of the major components is plotted in figure 1. Replacing the on-board DPMs with Virtex 6 Block RAM resources makes the Virtex 6 FPGA consume a little more power than the Virtex 5. On-board DPM power is zero for the Virtex 6 based design. The total power of the other major components used in each design is also shown in figure 1. Overall power utilization in the proposed design is significantly reduced.

    [Figure: bar chart. Vertical axis: power consumption in watts (0 to 18). Horizontal axis: major devices (1 to 3). Series: Virtex 5 based design, Virtex 6 based design. Data labels: FPGA 13.645 and 12.27; DPM 13.2; other devices 16.686 and 12.982]

    Fig.1 Graph shows the comparative power consumption of Virtex 5 and Virtex 6 based FloSwitch design
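The "other devices" figure for the proposed design can be cross-checked against the component tables. The sketch below sums the per-component values from Tables 6.2, 6.3, 6.5, 6.6 and 6.7. The choice of components is an assumption: it supposes that the CPLD of Table 6.5 is retained in the proposed design (a CPLD block appears in Fig. 2), that 16 SFP transceivers are counted, and that the 156.25 MHz oscillator is not included.

```python
# Per-component power for the proposed design, in milliwatts.
# Which components belong in this sum is an assumption (see the text above).
PROPOSED_OTHER_DEVICES_MW = {
    "SDRAM (Table 6.2)": 63.0,
    "BPI flash (Table 6.3)": 54.0,
    "CPLD (Table 6.5, assumed retained)": 148.5,
    "16 SFP transceivers (Table 6.6)": 16 * 792.0,
    "50 MHz oscillator (Table 6.7)": 45.0,
}

# Total in watts.
proposed_other_w = sum(PROPOSED_OTHER_DEVICES_MW.values()) / 1000.0

for name, power_mw in PROPOSED_OTHER_DEVICES_MW.items():
    print(f"{name:38s} {power_mw:9.1f} mW")
print(f"{'Total':38s} {proposed_other_w:9.4f} W")
```

Under these assumptions the sum comes to 12.9825 W, in line with the 12.982 W label on the chart.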

  7. Power analysis

    The 40 nm process technology of the Virtex 6 FPGA achieves dramatic power reductions over the previous generation Virtex 5 devices (65 nm). Such a significant reduction in power consumption is a boost for major technology development. At 40 and 45 nm, transistor leakage current increases exponentially, so keeping static power low is a big challenge [8]. In addition, the demand for high performance continues to drive core clock rates up, which increases dynamic power. In spite of these challenges, the theoretical analysis shows that the power utilization of resources in Virtex 6 FPGAs is less than in Virtex 5 FPGAs.

    (a) Virtex 5 (b) Virtex 6

      Fig.2 (a), (b) shows the comparison of power consumption for typical and maximum voltages with specified temperature range of Virtex 5 and Virtex 6 FPGA

      Virtex 6 logic and I/O resource utilization is 30% more than in the Virtex 5 FPGA, but power consumption in Virtex 6 is less, as shown in figure 2. Leakage current in Virtex 6 is a little higher than in Virtex 5, because Virtex 6 transistor leakage current depends more strongly on junction temperature, so the static power of the device is higher than that of Virtex 5. Block RAM resource utilization in Virtex 6 is 4 times that of Virtex 5. Virtex 6 Block RAM is used in the proposed design to replace the on-board DPMs of the existing Virtex 5 based FloSwitch. Block RAM power utilization is 1.75 W in Virtex 6, compared with the on-board DPM power requirement of 13.2 W [9] [10].

      (c) Virtex 5 (d) Virtex 6

      Fig.3 (c), (d) represents the graph of on chip power vs. Vccint for Virtex 5 and Virtex 6 FPGA

      Overall, 40% more logic is used in the proposed Virtex 6 design than in the Virtex 5 based design. Even so, on-chip power at the Virtex 6 core voltage (0.9 V) is less than that of the existing design at the Virtex 5 core voltage (1.0 V), as shown in figure 3.

      (e) Virtex 5 (f) Virtex 6

      Fig.4 (e) and (f) graph of on chip typical vs. maximum power for Virtex 5 and Virtex 6 FPGA with reference to junction temperature

      As shown in figure 4, Virtex 5 and Virtex 6 power is stable with junction temperature at the typical voltage. Resource utilization in Virtex 6 is 40% higher, but power consumption is almost the same at the lower junction temperature (50 °C) and typical voltage [11] [12]. For Virtex 5, the power difference between typical (1.0 V) and maximum (1.05 V) voltage is almost the same in all three temperature cases. For Virtex 6, the power difference between typical (0.9 V) and maximum (0.93 V) voltage grows as junction temperature increases. Junction temperature increases the leakage current, so static and leakage power increase significantly due to the reduced transistor gate length (40 nm). In figure 5, the on-chip power of Virtex 6 is less than that of Virtex 5 near a junction temperature of 45-50 °C. As junction temperature increases, the power consumption of Virtex 6 (gate length 40 nm) rises significantly compared with Virtex 5 (gate length 65 nm), as shown in the graph. The difference is due to the change in transistor gate length between the two devices [13] [14].

      (g) Virtex 5 (h) Virtex 6

      Fig.5 (g), (h) represents the graph of on chip power vs. junction temperature for Virtex 5 and Virtex 6 FPGA

      The total power calculated for the Virtex 5 FPGA based FloSwitch, with on-board components and terminations, is around 43.85 W. For the Virtex 6 FPGA based FloSwitch, with in-built memory and low power components and without the DPMs and associated terminations, total power consumption comes to around 26.88 W. This is a reduction of about 16.97 W on one FloSwitch.

      If the proposed design is integrated into the present system of 1024 processors, the total power reduction will be around 2172 W for 128 FloSwitches. The reduced load on the power supply unit removes the need for additional heat sinks and fans, making the system more efficient.
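The per-board and system-wide savings quoted above follow directly from the two board totals. A minimal Python sketch of that arithmetic, using only the figures stated in this section (the variable names are illustrative):

```python
# Board totals stated in this section, in watts.
EXISTING_BOARD_W = 43.85   # Virtex 5 based FloSwitch
PROPOSED_BOARD_W = 26.88   # Virtex 6 based FloSwitch
NUM_FLOSWITCHES = 128      # FloSwitches in the 1024-processor Mk8 system

# Saving on a single FloSwitch and across the whole system.
saving_per_board_w = EXISTING_BOARD_W - PROPOSED_BOARD_W
system_saving_w = saving_per_board_w * NUM_FLOSWITCHES

print(f"{saving_per_board_w:.2f} W per FloSwitch")
print(f"{system_saving_w:.0f} W for {NUM_FLOSWITCHES} FloSwitches")
```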

  8. Conclusions

The analysis shows that overall power utilization is reduced by 35%. Bringing out a design with a power reduction of 16.97 W on one communication device (FloSwitch) by using low power components is a significant achievement. Using the in-built DPM core of the Virtex 6 FPGA reduces power consumption by 60% compared with the on-board DPMs used in the earlier Virtex 5 design. Using the in-built FPGA core as the DPMs of the proposed design also improves communication speed. The overall board dimensions are reduced considerably.

References

  1. Flosolver Team, NAL. Preliminary performance analysis of Flosolver Mk-8. Flosolver, PDFS 1017.

  2. A. M. Middleton. Data-Intensive Technologies for Cloud Computing. Handbook of Cloud Computing. Springer, 2010.

  3. Flosolver Team, NAL. Communication protocol development for the FloSwitch. NAL PDFS 0609.

  4. Flosolver Team, NAL. A study of FPGA modules on FPGA based FloSwitch. NAL PDFS 1009.

  5. Jagannadham V. V., Anand Raj D., Venkatesh and Rajalakshmy Sivaramakrishnan. Hardware Design Document for Pentium M Based FloSwitch. NAL PDFS 0509.

  6. Dipl.-Inform. Frank Poppen. Low power design guide. OFFIS, version 30.06.00, 2000.

  7. Paul J. M. Havinga and Gerard J. M. Smit. Design techniques for low-power systems. Journal of Systems Architecture, 46(1), pp. 1-21, 2000.

  8. Arman Vassighi and Manoj Sachdev. Thermal and Power Management of Integrated Circuits. Springer, 2006.

  9. D. M. Brooks, P. W. Cook, P. Bose, S. E. Schuster, H. Jacobson, P. N. Kudva, A. Buyuktosunoglu, J. Wellman, V. Zyuban and M. Gupta. "Power-aware microarchitecture: design and modeling challenges for next-generation microprocessors". IEEE Micro, Vol. 20, No. 6, 2000.

  10. E. J. Nowak. "Maintaining the benefits of CMOS scaling when scaling bogs down". IBM Journal of Research and Development, Vol. 48, No. 2/3, pages 26-44, 2002.

  11. A. Keshavarzi, K. Roy and C. F. Hawkins. "Intrinsic leakage in deep submicron CMOS ICs: measurement based test solutions". IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 8, No. 6, pages 717-723, 2000.

  12. Y. Taur and T. H. Ning. "Fundamentals of Modern VLSI Devices". Cambridge University Press, pages 120-128, 1998.

  13. Y. Taur and T. H. Ning. "Fundamentals of Modern VLSI Devices". Cambridge University Press, pages 94-95, 1998.

  14. S. Thompson, P. Packan and M. Bohr. "MOS scaling: transistor challenges for the 21st century". Intel Technology Journal, Q3, 1998.
