Parallel Computer Architectural Schemes

DOI : 10.17577/IJERTV1IS9187

Mrs. P.M. Chawan, Bhagyashree Patle, Varshali Cholake, Sneha Pardeshi

Department of Computer Technology, Veermata Jijabai Technological Institute, Mumbai

Abstract

This paper describes classification schemes for computer architectures proposed by Flynn, Feng, Handler and Shore. Flynn's classification is based on the multiplicity of instruction streams and data streams in a computer system. Feng's classification is based mainly on serial versus parallel processing in the computer system. Handler's classification is based on the degree of parallelism and pipelining at different system levels. Shore's classification is based on the organization of the constituent elements of the system.

  1. INTRODUCTION

    Parallel processing has emerged as a key enabling technology in modern computers, driven by the ever-increasing demand for higher performance, lower costs, and sustained productivity in real-life applications. Concurrent events take place in today's high-performance computers because of the common practice of multiprogramming, multiprocessing, or multicomputing. Parallelism can take the form of lookahead, pipelining, vectorization, concurrency, simultaneity, data parallelism, partitioning, interleaving, overlapping, multiplicity, replication, time sharing, space sharing, multitasking, multiprogramming, multithreading, and distributed computing at different processing levels. Parallel computing is a form of computation in which many calculations are carried out simultaneously, on the principle that large problems can often be divided into smaller ones, which are then solved concurrently.

  2. CLASSIFICATION

    The four classifications defined by Flynn are based upon the number of concurrent instruction (or control) and data streams available in the architecture:

    1. Single Instruction, Single Data stream (SISD)

      A sequential computer which exploits no parallelism in either the instruction or data streams. A single control unit (CU) fetches a single instruction stream (IS) from memory. The CU then generates the control signals that direct a single processing element (PE) to operate on a single data stream (DS), i.e. one operation at a time.

      Examples of SISD architecture are traditional uniprocessor machines such as older PCs (currently manufactured PCs have multiple processors) and old mainframes.

    2. Single Instruction, Multiple Data streams (SIMD)

      A computer which exploits multiple data streams against a single instruction stream to perform operations which may be naturally parallelized. For example, an array processor or GPU.

    3. Multiple Instruction, Single Data stream (MISD)

      Multiple instructions operate on a single data stream. This is an uncommon architecture, generally used for fault tolerance: heterogeneous systems operate on the same data stream and must agree on the result. Examples include the Space Shuttle flight control computer.

    4. Multiple Instruction, Multiple Data streams (MIMD)

      Multiple autonomous processors simultaneously execute different instructions on different data. Distributed systems are generally recognized to be MIMD architectures, exploiting either a single shared memory space or a distributed memory space. A multi-core superscalar processor is an MIMD processor.
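    Flynn's four classes follow directly from two counts: the number of concurrent instruction streams and the number of concurrent data streams. A minimal Python sketch of that rule is given here; the names `FlynnClass` and `classify` and the stream counts in the usage lines are illustrative assumptions, not part of Flynn's paper.

```python
from enum import Enum


class FlynnClass(Enum):
    SISD = "Single Instruction, Single Data stream"
    SIMD = "Single Instruction, Multiple Data streams"
    MISD = "Multiple Instruction, Single Data stream"
    MIMD = "Multiple Instruction, Multiple Data streams"


def classify(instruction_streams: int, data_streams: int) -> FlynnClass:
    """Map stream counts to a Flynn class (hypothetical helper)."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a machine needs at least one stream of each kind")
    # Build the four-letter class name: S or M for each kind of stream.
    instr = "M" if instruction_streams > 1 else "S"
    data = "M" if data_streams > 1 else "S"
    return FlynnClass[f"{instr}I{data}D"]


# Made-up stream counts, for illustration only.
print(classify(1, 1).name)    # uniprocessor
print(classify(1, 64).name)   # array processor with 64 processing elements
print(classify(4, 4).name)    # four-core processor
```

    The sketch only counts streams; it says nothing about how memory is shared or how the processors communicate, which the classification itself also leaves open.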

      • DIAGRAM COMPARING CLASSIFICATION

        Fig. Block diagrams of the four classes:

        1. SISD Uniprocessor Architecture

        2. SIMD Architecture

        3. MISD Architecture (the Systolic Array)

        4. MIMD Architecture

        Captions: CU - Control Unit; PU - Processing Unit; MU - Memory Unit; IS - Instruction Stream; DS - Data Stream; PE - Processing Element; LM - Local Memory

    Feng's classification is based mainly on the degree of parallelism of a computer architecture. The maximum number of binary digits (bits) that can be processed per unit time is called the maximum parallelism degree P. If $P_i$ is the number of bits processed in the i-th processor cycle, the average parallelism degree is

    $$P_a = \frac{\sum_{i=1}^{T} P_i}{T}$$

    where T is the total number of processor cycles. The utilization of the computer system within T cycles is given by

    $$\mu = \frac{P_a}{P} = \frac{\sum_{i=1}^{T} P_i}{T \cdot P}$$

    When $P_i = P$ for every cycle, $\mu = 1$, which means that the utilization of the computer system is 100%. The utilization rate depends on the application program being executed.

    Fig. Feng's classification in terms of parallelism exhibited by word length and bit-slice length (horizontal axis: word length n; vertical axis: bit-slice length m)

    In the figure above, the horizontal axis shows the word length n and the vertical axis corresponds to the bit-slice length m. A bit slice is a string of bits, one from each of the words at the same vertical bit position. The maximum parallelism degree P(C) of a given computer system C is the product of the word length n and the bit-slice length m; that is,

    $$P(C) = n \cdot m$$

    The pair (n, m) corresponds to a point in the computer space shown by the coordinate system in the figure, and P(C) is equal to the area of the rectangle defined by the integers n and m.

    There are four types of processing methods that can be observed from the diagram:

      • Word serial and bit serial (WSBS)

      • Word parallel and bit serial (WPBS)

      • Word serial and bit parallel (WSBP)

      • Word parallel and bit parallel (WPBP)

    WSBS has been called bit-serial processing because one bit (n = m = 1) is processed at a time, which is slow; it was used only in first-generation computers. WPBS (n = 1, m > 1) has been called bis (bit-slice) processing because an m-bit slice is processed at a time. WSBP (n > 1, m = 1) has been called word-slice processing because one word of n bits is processed at a time; this mode is found in most existing computers. WPBP (n > 1, m > 1) is known as fully parallel processing, in which an array of n·m bits is processed at a time. This is the fastest mode of the four.
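    Feng's quantities are simple enough to compute directly. The following Python sketch evaluates the maximum parallelism degree, names the processing method for a pair (n, m), and computes the utilization over a run of cycles. The function names, the example machine, and the per-cycle bit counts are all made up for illustration.

```python
from typing import Sequence


def max_parallelism(n: int, m: int) -> int:
    """Maximum parallelism degree P(C) = n * m, in bits per processor cycle."""
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive integers")
    return n * m


def processing_mode(n: int, m: int) -> str:
    """Name the processing method for word length n and bit-slice length m."""
    max_parallelism(n, m)  # reuse the argument validation
    if n == 1 and m == 1:
        return "WSBS"
    if n == 1:
        return "WPBS"
    if m == 1:
        return "WSBP"
    return "WPBP"


def utilization(bits_per_cycle: Sequence[int], p_max: int) -> float:
    """Average bits processed per cycle divided by the maximum P."""
    average = sum(bits_per_cycle) / len(bits_per_cycle)
    return average / p_max


# Hypothetical machine: word length n = 16, bit-slice length m = 4.
n, m = 16, 4
p = max_parallelism(n, m)
trace = [64, 32, 64, 16]  # assumed bits processed in each of T = 4 cycles
print(processing_mode(n, m), p, utilization(trace, p))
```

    A trace in which every cycle processes p bits gives a utilization of exactly 1, matching the 100% case described above.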

    Handler proposed an elaborate notation for expressing the pipelining and parallelism of computers. He divided the computer into three levels:

      • Processor Control Unit (PCU)

      • Arithmetic Logic Unit (ALU)

      • Bit-Level Circuit (BLC)

    The PCU corresponds to a CPU, the ALU corresponds to a functional unit or to a PE in an array processor, and the BLC corresponds to the logic needed to perform operations in the ALU.

    He uses three pairs of integers to describe a computer:

    Computer = (k*k', d*d', w*w')

    where

      k = number of PCUs
      k' = number of PCUs that can be pipelined
      d = number of ALUs controlled by each PCU
      d' = number of ALUs that can be pipelined
      w = number of bits in an ALU or processing element
      w' = number of pipeline segments

    The following operators are used:

      • The * operator shows that units are pipelined.

      • The + operator shows that units are not pipelined.

      • The v operator shows that the computer hardware can work in one of several modes.

      • The ~ operator shows the range of a parameter.

    Consider the following machines and observe how Handler's notation distinguishes them on the basis of degree of parallelism and pipelining.

      1. CDC 6600: This machine has a single main processor supported by 10 I/O processors. One control unit controls one ALU with a 60-bit word length. The ALU has 10 functional units, which can be pipelined. The 10 I/O processors work in parallel with each other and with the CPU. Each I/O processor contains a 12-bit ALU.

        The description of the 10 I/O processors:

        CDC 6600 I/O = (10, 1, 12)

        The description of the main processor:

        CDC 6600 main = (1, 1*10, 60)

        In this model the main processor and the I/O processors are pipelined, so the * operator is used to combine them:

        CDC 6600 = (I/O processors) * (main processor) = (10, 1, 12) * (1, 1*10, 60)

      2. Cray-1: This is a 64-bit single-processor computer. Its ALU has 12 functional units, 8 of which can be pipelined. The functional units have between 1 and 14 pipeline segments.

        Cray-1 = (1, 12*8, 64*(1~14))

      3. Illiac-IV: This array processor was fabricated by Burroughs. It is made up of 64 processing elements connected in a mesh, each with a 64-bit ALU, and it has two DEC PDP-10s as the front end. The Illiac-IV accepts data from only one PDP-10 at a time.

        PDP-10 Illiac-IV = (2, 1, 36) * (1, 64, 64)

        The machine can also work in half-word mode, in which it has 128 processors of 32 bits each instead of the normal 64 processors of 64 bits each:

        PDP-10 Illiac-IV/2 = (2, 1, 36) * (1, 128, 32)

        Combining this with the above, we get

        PDP-10 Illiac-IV = (2, 1, 36) * [(1, 64, 64) v (1, 128, 32)]

    Shore's classification: In this classification computers are classified on the basis of the organization of their constituent elements. Shore proposed six machines, which are recognized and distinguished by numerical designators.

    Machine 1:

    Fig. Machine 1: Control Unit, Processing Unit, Memory (word slice)

    This machine is based on the von Neumann architecture, with the following units:

      1. Control unit

      2. Processing unit

      3. Instruction memory and data memory

    A single data memory (DM) reads a word and delivers all of its bits for parallel processing. The PU contains functional units that may or may not be pipelined. As a result, this class contains both scalar computers and pipelined vector computers.

    Example: IBM 360/91, Cray-1

    Machine 2:

    Fig. Machine 2: Control Unit, Processing Unit, Memory (bit slice)

    This machine is the same as machine 1, but here the DM fetches a bit slice of the words in memory and the PU performs the operation in parallel on the words. If the memory is regarded as a two-dimensional array of bits with one word stored per row, machine 2 reads the bits vertically and processes the same bit position of every word.
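    The difference between the word-slice access of machine 1 and the bit-slice access of machine 2 can be sketched in Python by treating memory as a two-dimensional array of bits with one word per row. The function names and the memory contents are illustrative assumptions; real machines of these classes do this in hardware.

```python
from typing import List

Memory = List[List[int]]  # one word per row, one bit per column


def word_slice(memory: Memory, row: int) -> List[int]:
    """Machine 1 style: read all bits of one word (a horizontal slice)."""
    return list(memory[row])


def bit_slice(memory: Memory, position: int) -> List[int]:
    """Machine 2 style: read the same bit position from every word (a vertical slice)."""
    return [word[position] for word in memory]


# Made-up memory holding three 4-bit words.
memory = [
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
]

print(word_slice(memory, 1))  # all four bits of the second word
print(bit_slice(memory, 2))   # third bit of each of the three words
```

    A word slice has as many bits as the word length, while a bit slice has as many bits as there are words in memory.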

    Example: ICL DAP, MPP

    Machine 3:

    Fig. Machine 3: Control Unit, Processing Unit (horizontal), Processing Unit (vertical), Memory

    This machine is a combination of machine 1 and machine 2. In this machine both vertical and horizontal reading and processing are possible; hence it contains both a horizontal and a vertical processing unit.

    Machine 4:

    Fig. Machine 4: one CU with several PU and MU pairs

    This machine is obtained by replicating the PU and DM of machine 1. Each PU and DM pair is called a processing element (PE). Instructions are issued to the PEs by the single control unit. There is no communication between PEs, which limits the applicability of the machine.

    Example: PEPE

    Machine 5:

    Fig. Machine 5: one CU with several PU and MU pairs

    This machine is similar to machine 4, but here there is communication between PEs.

    Example: ILLIAC IV

    Machine 6:

    Fig. Machine 6: CU, PU and Memory

    In this machine the PU and DM are combined into a single unit, called an associative processor.

    Returning to Flynn's classification: as of 2006, all of the top 10 and most of the TOP500 supercomputers were based on a MIMD architecture. Some further divide the MIMD category into the following categories:

    • Single Program, Multiple Data (SPMD)

      SPMD refers to multiple autonomous processors simultaneously executing the same program (but at independent points, rather than in the lockstep that SIMD imposes) on different data. It is also referred to as "Single Process, Multiple Data"; the use of this terminology for SPMD is erroneous and should be avoided, as SPMD is a parallel execution model and assumes multiple cooperating processes executing a program. SPMD is the most common style of parallel programming. The SPMD model and the term were proposed by Frederica Darema. Gregory F. Pfister was a manager of the RP3 project, and Darema was part of the RP3 team.

    • Multiple Program, Multiple Data (MPMD)

      MPMD refers to multiple autonomous processors simultaneously running at least two independent programs. Typically such systems pick one node to be the "host" (the "explicit host/node programming model") or "manager" (the "Manager/Worker" strategy), which runs one program that farms out data to all the other nodes, which all run a second program. Those other nodes then return their results directly to the manager. An example of this is the Sony PlayStation 3 game console, with its SPU/PPU processor architecture.

  3. CONCLUSION

    The computational models discussed above are designed to simulate sets of processes observed in the natural world, in order to gain an understanding of those processes and to predict the outcome of a natural process given a specific set of input parameters.
