Cluster Computing Using FPGA vs. GPPs

DOI : 10.17577/IJERTCONV2IS09006

Ms. Sharon Bhatnagar

Dept. of Electronics and Communication. LNCT Bhopal, India

Ms. Mayuri Chawla

Dept. of Electronics and Telecommunication. JIT Nagpur, India

Mr. Nakul Nagpal

Dept. of Electronics and Telecommunication. JIT Nagpur, India

Mr. Sudhanshu Mohan Khare

Dept. of Electronics and Communication. LNCT Bhopal, India

Abstract: This paper discusses cluster computing. It gives a brief overview of cluster computing using general-purpose processors (GPPs) and then emphasizes cluster computing using field-programmable gate arrays (FPGAs). It also introduces the importance of reconfigurable computing, goes through a case study in which different classes of machines are analyzed, and ends with a look at the future of cluster computing using FPGAs.

Keywords: Cluster computing; GPP; FPGA; Reconfigurable computing; ASICs

  1. INTRODUCTION

    Cluster computing uses a cluster of computers that work together in synchronization to solve complex, time-consuming problems. The need for cluster computing arises from four factors. First, applications demand more computing cycles and memory: scientific and engineering computing, and general-purpose computing such as video, graphics, CAD, databases, transaction processing and gaming. Second, technology trends: the number of transistors on a chip is growing rapidly while clock rates rise only slowly. Third, architecture trends such as instruction-level parallelism (ILP) and task- or thread-level parallelism (TLP). Lastly, economics: the increased use of commodity off-the-shelf (COTS) components in high-performance parallel computing systems, instead of the costly custom components used in traditional supercomputers, leads to much lower parallel system cost [1]. The main goal is therefore to reduce cost and increase computational efficiency. In the current scenario, GPPs, which offer high-performance computing and flexibility, and FPGAs, which are computationally efficient and reconfigurable, are both used in cluster computing because they are cheap and easily available. This paper therefore focuses on cluster computing using GPPs and FPGAs.

  2. GPPS AND CLUSTER COMPUTING USING GPP

    General-purpose processors (GPPs) are scalar, single-processor computers. They are built around many different processors and run a wide variety of operating systems. These processors have always been used for general-purpose applications, but computing demand is increasing day by day, and to meet it GPPs have been developed further at the circuit, logic and architecture levels. The GPPs available on the market today come with multiple cores and operating frequencies in the range of 3.0 GHz, and their architectures are smart enough to perform dynamic scheduling, branch prediction, etc. Thus we can use these GPPs in a cluster to perform massive computations, replacing expensive custom processors. These processors offer a high level of flexibility. Some of the GPPs listed in [2] are:

    • Intel Pentiums & AMD Athlons

      • 3.0 GHz

      • Linux & MS Windows

    • AMD Opteron & Intel Itanium2

      • 1.8 GHz

      • Linux & MS Windows

    • IBM Power4

      • 1.7 GHz

      • AIX & Linux

    • Sun UltraSparcIII

      • 1.2 GHz

      • Solaris & Linux

    • SGI R16000

      • 700 MHz

      • IRIX

    • Motorola G5

      • 2.0 GHz

      • Mac OS X & Linux

    • Alpha 21264

      • 1.26 GHz

      • Tru64 Unix & Linux

        Thus the use of commodity off-the-shelf hardware components, together with free software, is beginning to play a major role in the field of supercomputing.

  3. FPGAS AND CLUSTER COMPUTING USING FPGA

    A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by the user after manufacturing, hence the name "field-programmable". The FPGA configuration is specified using a hardware description language (HDL), similar to that used for an application-specific integrated circuit (ASIC). FPGAs can be used to implement any logical function that an ASIC could perform. The ability to update the functionality after manufacturing, and the low non-recurring engineering costs relative to an ASIC design, offer advantages for many applications.

    FPGAs contain programmable logic components called "configurable logic blocks" and an interconnection network that routes signals between these blocks, like a one-chip programmable breadboard. Logic blocks consist of look-up tables (LUTs) and D flip-flops, as shown in Figure 1, and can be configured to perform complex combinational functions or simple logic gates such as AND and XOR. In most FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory [3].

    Figure 1 (from [5])
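    The logic block described above can be illustrated with a small software model. The following Python sketch is only a toy, written under the assumption that a block is one look-up table feeding one D flip-flop; the class name `LogicBlock` and its methods are hypothetical and do not correspond to any vendor tool or HDL construct.

```python
class LogicBlock:
    """Toy model of a configurable logic block: a k-input LUT feeding a D flip-flop."""

    def __init__(self, truth_table):
        # truth_table[i] is the LUT output when the inputs, read as a binary
        # number (first input = most significant bit), equal i.
        n = len(truth_table)
        if n < 2 or n & (n - 1):
            raise ValueError("truth table length must be a power of two")
        self.truth_table = list(truth_table)
        self.num_inputs = n.bit_length() - 1
        self.q = 0  # registered output of the D flip-flop

    def lut(self, *inputs):
        """Combinational output: look the inputs up in the truth table."""
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        index = 0
        for bit in inputs:
            index = (index << 1) | (bit & 1)
        return self.truth_table[index]

    def clock(self, *inputs):
        """Rising clock edge: the flip-flop captures the current LUT output."""
        self.q = self.lut(*inputs)
        return self.q


# The same block structure becomes a different gate by loading a different table.
and_gate = LogicBlock([0, 0, 0, 1])
xor_gate = LogicBlock([0, 1, 1, 0])
print([and_gate.lut(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 0, 0, 1]
print([xor_gate.lut(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```

    Changing the truth table changes the function without changing the block itself, which is the property that the configuration bitstream of a real FPGA exploits.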

    Reconfigurable computing is a computing paradigm combining some of the flexibility of software with the high performance of hardware by processing with very flexible high speed computing fabrics like FPGAs. The principal difference when compared to using ordinary microprocessors is the ability to make substantial changes to the datapath itself in addition to the control flow. On the other hand, the main difference with custom hardware (ASICs) is the possibility to adapt the hardware during runtime by "loading" a new circuit on the reconfigurable fabric [4].

    Why Reconfigurable computing?

    Reconfigurable computing improves performance (including predictability) and computational energy efficiency over software implementations (vs. processors: GPPs, ASPs), e.g. signal processing applications in configurable hardware. It also provides powerful, application-specific operations in hardware (ASIC-like). It improves product flexibility and lowers development cost and time compared to hardware (vs. ASICs), e.g. encryption, compression or network protocol handling in configurable hardware. Finally, it allows the same hardware to be used for different purposes at different points in the computation [5].

  4. CASE STUDY

    This section discusses an idea presented by Paul Chow of the Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, on biomolecular simulations on an FPGA cluster. The aim was to build an FPGA cluster and, to explore its computational capabilities, the group studied molecular dynamics and implemented it on the FPGA cluster to challenge existing machines.

    Molecular dynamics was chosen for this study because it simulates molecules at the atomic level, which is highly compute-intensive and inherently parallelizable.

    Molecular Dynamics calculations

    • Calculate interatomic forces

    • Calculate the net force

    • Integrate Newton's equations of motion

    Figure 2 (from [6])

    Figure 2 shows the computation for one atom; the total over all the force contributions is given by U, as shown in Figure 3.

    U =

    Figure 3 (from [6])

    For these molecular dynamics simulations, the computing solutions are grouped into three classes of machines: class 1, class 2 and class 3.
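    The three molecular dynamics steps (interatomic forces, net force, integration of Newton's equations of motion) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method used in the case study: the pair interaction is assumed to be a Lennard-Jones potential in reduced units, the integrator is assumed to be velocity Verlet, and all function names are hypothetical. The potential U of Figure 3 is not reproduced here.

```python
def lj_pair_force(ri, rj, epsilon=1.0, sigma=1.0):
    """Force on atom i due to atom j for an assumed Lennard-Jones potential."""
    d = [a - b for a, b in zip(ri, rj)]           # vector from j to i
    r2 = sum(c * c for c in d)
    s6 = (sigma * sigma / r2) ** 3                # (sigma / r)^6
    # F_i = 24 * eps * (2 * (sigma/r)^12 - (sigma/r)^6) / r^2 * (ri - rj)
    scale = 24.0 * epsilon * (2.0 * s6 * s6 - s6) / r2
    return [scale * c for c in d]


def net_forces(positions):
    """Steps 1 and 2: sum the pairwise forces into a net force per atom."""
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            f = lj_pair_force(positions[i], positions[j])
            for k in range(3):
                forces[i][k] += f[k]
                forces[j][k] -= f[k]              # Newton's third law
    return forces


def velocity_verlet_step(positions, velocities, masses, dt):
    """Step 3: advance positions and velocities in place by one time step."""
    forces = net_forces(positions)
    for i, m in enumerate(masses):
        for k in range(3):
            velocities[i][k] += 0.5 * dt * forces[i][k] / m   # half kick
            positions[i][k] += dt * velocities[i][k]          # drift
    new_forces = net_forces(positions)
    for i, m in enumerate(masses):
        for k in range(3):
            velocities[i][k] += 0.5 * dt * new_forces[i][k] / m  # half kick


# Three atoms at rest, advanced by ten small time steps.
pos = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.5, 0.0]]
vel = [[0.0, 0.0, 0.0] for _ in pos]
for _ in range(10):
    velocity_verlet_step(pos, vel, [1.0, 1.0, 1.0], dt=0.001)
print(pos)
```

    The double loop over atom pairs is the compute-intensive part, and each pair force is independent of the others, which is why the workload parallelizes well across many processing elements.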

    Class 1 Machines

        • Supercomputers or clusters of workstations

        • ~10-10^5 interconnected CPUs

          Figure 4 (from [6])

          Class 2 Machines

        • Hybrid network of CPU and FPGA hardware

        • FPGA acts as external co-processor to CPU

        • Programming model still evolving

          Figure 11 (from [6])

          Figure 5 (from [6])

          Class 3 Machines

        • Network of FPGA-based computing nodes

          Figure 6 (from [6])

          In this case the target machine is class 3, i.e. a network or cluster in which FPGAs act as the processing elements. The system specifications they defined are as follows.

          Details about class 3 Machine

        • Designed for applications that exhibit a high compute-to-communication ratio

        • Made possible by the integration of microprocessors and high-speed communication interfaces into modern FPGA packages

        • Design Features

          • Distributed memory model

          • Low-latency point-to-point interconnection network

          • Provides abstraction of uniform, extensible FPGA fabric to system designers

          • Constructed entirely using commodity FPGA components.

        • Layer 3 (collective operations): barrier synchronization, data gathering and message broadcasts.

        • Layer 4 (MPI interface): all the MPI functions implemented in TMD-MPI that are available to the application.
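    The collective operations named for Layer 3 (barrier synchronization, data gathering and message broadcast) can be illustrated without any MPI installation. The sketch below is an assumption-laden stand-in: Python threads play the role of computing nodes, `threading.Barrier` plays the role of the barrier, and shared lists play the role of message buffers. The function `run_nodes` is hypothetical and is not part of TMD-MPI or of any MPI library.

```python
import threading


def run_nodes(num_nodes, root=0):
    """Run num_nodes threads that gather partial results and receive a broadcast."""
    barrier = threading.Barrier(num_nodes)
    gathered = [None] * num_nodes    # gather buffer, read by the root
    broadcast_box = [None]           # single slot written by the root
    results = [None] * num_nodes

    def node(rank):
        partial = rank * rank                     # stand-in for local computation
        gathered[rank] = partial                  # gather: contribute own value
        barrier.wait()                            # barrier: all contributions are in
        if rank == root:
            broadcast_box[0] = sum(gathered)      # root combines the gathered data
        barrier.wait()                            # barrier: root's message is ready
        results[rank] = broadcast_box[0]          # broadcast: every node reads it

    threads = [threading.Thread(target=node, args=(r,)) for r in range(num_nodes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_nodes(4))  # [14, 14, 14, 14]
```

    In a real message-passing system the gather and broadcast would move data over the interconnect rather than through shared memory, but the ordering guarantees provided by the two barriers are the same idea.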


  5. FUTURE OF CLUSTER COMPUTING USING FPGAS

    In cluster computing the two main concerns are computational efficiency and communication latency. Using FPGAs, computational efficiency can be improved to a great extent, but communication latency remains an issue; high-speed interconnects solve the communication problem only up to a limit. To reduce the communication latency, the proposed solution is to build custom boards with multiple FPGAs, all interconnected with one another by dedicated buses. Since today's FPGAs are built with the capability to communicate with multiple I/O devices, all the FPGA devices can communicate in parallel, which reduces the communication latency. Clusters can then be built from these custom boards, and so on.

    Another issue with reconfigurable computing is that configuring an FPGA device costs a lot of time. This time needs to be reduced, but there is a limit to how far it can be reduced relative to one CPU cycle. A probable solution is to overlap the configuration time with computation: while some of the devices are working, the other, idle devices can be configured. To achieve this, the program must be partitioned properly and the resulting tasks assigned properly to the devices.
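    The benefit of overlapping configuration with computation can be estimated with a simple timing model. The sketch below rests on assumptions that are not in the original text: tasks run one after another, two devices alternate so that one is configured while the other computes, only one configuration is in progress at a time, and each task is a hypothetical (configuration time, run time) pair in arbitrary units.

```python
def sequential_time(tasks):
    """Total time when every task is configured and then run, with no overlap."""
    return sum(cfg + run for cfg, run in tasks)


def overlapped_time(tasks):
    """Total time when the next task is configured on an idle second device."""
    cfg_done = []   # time at which the configuration of task i finishes
    run_done = []   # time at which the computation of task i finishes
    for i, (cfg, run) in enumerate(tasks):
        prev_cfg = cfg_done[i - 1] if i >= 1 else 0
        # Task i reuses the device of task i-2, so that task must have finished.
        device_free = run_done[i - 2] if i >= 2 else 0
        cfg_end = max(prev_cfg, device_free) + cfg
        prev_run = run_done[i - 1] if i >= 1 else 0
        run_end = max(cfg_end, prev_run) + run
        cfg_done.append(cfg_end)
        run_done.append(run_end)
    return run_done[-1] if run_done else 0


tasks = [(2, 5), (2, 5), (2, 5)]   # (configuration time, run time) per task
print(sequential_time(tasks))      # 21
print(overlapped_time(tasks))      # 17
```

    In this example only the first configuration is exposed; the later ones are hidden behind computation. When configuration takes longer than computation, the model shows the opposite case, where the devices wait for configuration and the partitioning of the program matters most.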


  6. CONCLUSION

    In this paper we discussed cluster computing using GPPs and FPGAs. We saw that, to date, the emphasis in cluster computing has been on modifying the architecture of the GPPs used as computing nodes in the cluster so that they can compute different types of complex problems, and on building a good programming model to interface between these computing nodes. The evolution of reconfigurable computing has brought a new direction of study in which the hardware can be modified as much as we want at run time. This makes computation even more efficient and limits hardware and cost, since we do not need ASPs for every kind of application that we wish to run on the cluster; instead we have a cluster of FPGAs that can be configured beforehand or at run time, depending on the application's demands.



REFERENCES

  1. l



  4. Dr. Muhammad Shaaban, Advanced Computer Architecture, Lecture 9.

  5. Paul Chow, Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Biomolecular Simulations on an FPGA Cluster (+ the Future of Computing).
