Different Arbitration Techniques for On-Chip (AMBA) Shared Bus Multi-Processor SoC

Anurag Shrivastava; Dr. Amit Kant Pandit

doi:10.17577/IJERTCONV2IS10006

NCETECE - 2014 (Volume 2 - Issue 10)


Open Access
Authors : Anurag Shrivastava, Dr. Amit Kant Pandit
Paper ID : IJERTCONV2IS10006
Volume & Issue : NCETECE – 2014 (Volume 2 – Issue 10)
Published (First Online): 30-07-2018
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Anurag Shrivastava, Research Scholar (shrivastavaanurag@rediffmail.com)

Dr. Amit Kant Pandit, Prof., SMVDU (amitkantpandit@gmail.com)

Abstract: On-chip communication architectures play an important role in determining the overall performance of System-on-Chip (SoC) designs. Communication architectures should be flexible so as to offer high performance over a wide range of traffic characteristics. In state-of-the-art multi-processor systems-on-chip (MPSoC), the interconnect of processing elements has a major impact on the system's overall average-case and worst-case performance.

In shared SoC bus systems, arbiters are usually adopted to resolve bus contention using various arbitration algorithms. In shared-memory MPSoCs, buses are still the prevalent means of on-chip communication for small to medium-size chip multiprocessors (CMPs). Still, bus arbitration schemes employed in current architectures either deliver good average-case performance (maximize bus utilization) or enable tight bounding of worst-case execution time. This paper presents a shared bus arbitration approach allowing high bus utilization while guaranteeing a fixed bandwidth per time frame to each master. Thus it provides high performance to both real-time and any-time applications, or even a mixture of both.

Keywords: On-Chip Bus, Arbiter, MPSoC

Introduction

The performance of a multi-core shared-bus embedded controller depends on how effectively its shared resources can be utilized. The common bus in a System-on-Chip is one such shared resource: it is shared by the multiple master cores and also acts as a channel between a master core and the slave cores (peripherals) or memories. The arbiter is the authority that governs use of the shared resource (the shared bus), so performance also depends on the arbitration technique. The arbitration mechanism is used to ensure that only one master has access to the bus at any one time. The arbiter performs this function by observing a number of different requests to use the bus. A master may request the bus from the arbiter during any cycle. The arbiter samples the requests on the rising edge of the clock and then uses a predefined algorithm to decide which master will be the next to gain access to the bus [1].

On-chip communication architecture plays an important role in determining the overall performance of a System-on-Chip (SoC) design. In the resource-sharing mechanism of an SoC, the communication architecture should be flexible enough to offer high performance over a wide range of data traffic. The rapidly developing electronics industry has entered the era of multimillion-gate chips. This design technology promises new levels of integration on a single chip, called System-on-Chip design, but also presents significant challenges to the chip designer. SoC is a technology that integrates heterogeneous system components such as microprocessors, memory, logic and DSPs into a single chip [2]. Currently, on-chip interconnection networks are mostly implemented using buses. The performance of an SoC depends more on efficient communication among processors and on the balanced distribution of computation among them than on the raw speed of the processors. The communication architecture therefore plays a vital role in SoC design and its performance [3].

The topology consists of a combination of shared buses and dedicated channels to which various SoC components are connected. The SoC components include (1) components that initiate transactions, called masters, and (2) components that respond to transactions initiated by masters, called slaves, which include memories and peripheral devices. Since a shared bus is used, SoC bus architectures must be designed to manage access to the bus; this management is implemented in the bus arbiter. Arbitration is a mechanism that decides the owner of a shared resource, the bus in this case. The bus arbitration mechanism ensures that only one master has access to the bus at one time [4], and the bus arbiter performs this function. Centralized arbitration is used in this research. Independent request and grant signals are used for each master, as in Fig. 1. The multiprocessor uses a priority-based policy for I/O transactions and a fairness-based policy among processors. To optimize performance, the bus should be designed to minimize the time required for request handling, arbitration and addressing, so that most bus cycles are used for useful data transfer operations. A bus transaction consists of a request signal followed by a response signal, which is indicated by a transition signal. The protocol may limit the maximum number of bus cycles for which a master can use the bus, through a maximum transfer size, or it may split transactions when slave devices are slow to respond to requests from a master. Arbitration competition and bus transactions take place concurrently on a parallel bus with separate lines. Thus the communication architecture has a significant role in the performance of an SoC design [1].

Fig:1 Shared Bus Topology

Centralized arbitration currently dominates in embedded systems. The arbiter is a functional module that accepts bus requests from the requestor modules and grants control of the shared bus to one requestor at a time. It is an important functional module in a multiprocessor design since it decides the communication between master and slave, and it should be carefully designed in high-performance systems. The communication architecture topology consists of a network of shared and dedicated communication channels to which various SoC components are connected [2, 3]. These include (i) masters, which initiate a data transaction (e.g., CPUs, DSPs, DMA controllers), and (ii) slaves, components that merely respond to transactions initiated by a master (e.g., on-chip memories); see Fig. 2. When the topology consists of multiple channels, bridges are used to interconnect the necessary channels. Since buses are often shared by several SoC masters, bus architectures require protocols to manage access to the bus, which are implemented in (centralized or distributed) bus arbiters. Currently used communication architecture protocols include round-robin, priority-based and time-division multiplexing. In addition to arbitration, the communication protocol handles other communication functions, such as limiting the maximum number of bus cycles by setting a maximum transfer length.

Fig:2 Shared bus Architecture
Static Fixed Priority Algorithm

Static fixed priority is a common scheduling mechanism on most common buses [3, 4]. In a static fixed-priority scheduling policy, each master is assigned a fixed priority value. When several masters request the bus simultaneously, the master with the highest priority is granted. This is achieved by employing a centralized arbiter (Fig. 1). The advantages of this arbitration are its simple implementation and small area cost. However, the static priority based architecture does not provide a means for controlling the fraction of communication bandwidth assigned to a component: if masters with high priority request frequently, the masters with low priority are starved.
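The fixed-priority decision and the starvation it can cause may be sketched in a few lines of Python. This is only an illustrative model, not the authors' VHDL design; the function name `fixed_priority_grant` and the convention that a lower index means a higher priority are assumptions made for the example.

```python
from typing import Optional, Sequence


def fixed_priority_grant(requests: Sequence[bool]) -> Optional[int]:
    """Grant the bus to the highest-priority requesting master.

    Assumption for illustration: a lower index means a higher priority,
    so master 0 always wins whenever it requests.
    """
    for master, pending in enumerate(requests):
        if pending:
            return master
    return None  # no master is requesting in this cycle


# Master 0 requests in every cycle, so master 3 is never served.
grants = [fixed_priority_grant([True, False, False, True]) for _ in range(5)]
print(grants)  # prints [0, 0, 0, 0, 0]
```

Because the scan order never changes, the low-priority master in the example is starved for as long as master 0 keeps requesting.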

A. Time Division Multiple Access (TDMA):

Time division multiplexed scheduling [5, 7] divides execution time on the bus into time slots and allocates the time slots to the adapters requesting use of the bus. A request for use of the bus might require multiple time slots to perform all required transfers. If the master associated with the current time slot has a pending request, the arbiter grants the transaction immediately and the time wheel is rotated to the next slot.

Fig:3 Schematic Diagram of TDMA Architecture

The advantage of this algorithm is that it is easy to implement. Its disadvantages are wasted data-transfer slots and poor response latency. In this architecture, however, the components are provided access to the communication channel in an interleaved manner, using a two-level arbitration protocol. To alleviate the problem of wasted slots, a second level of arbitration permits the bus to be granted to other requesting masters. For example, suppose the current slot is reserved for M1, which has no pending request. The arbitration pointer is then incremented from its current position to the next pending request (Fig. 3). The major drawback is its poor bandwidth.
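A minimal Python sketch of the two-level TDMA decision described above follows. The names `tdma_grant` and `run_tdma`, the representation of the time wheel as a list of slot owners, and the choice to scan upward from the slot owner at the second level are assumptions for illustration; the source does not specify these details.

```python
from typing import List, Optional, Sequence


def tdma_grant(slot_owner: int, requests: Sequence[bool]) -> Optional[int]:
    """Two-level TDMA arbitration for a single time slot.

    Level 1: the owner of the current slot is granted if it is requesting.
    Level 2 (assumed policy): otherwise the pointer advances from the
    owner to the next master that has a pending request.
    """
    n = len(requests)
    for offset in range(n):
        candidate = (slot_owner + offset) % n
        if requests[candidate]:
            return candidate
    return None  # the slot is wasted only if nobody is requesting


def run_tdma(wheel: Sequence[int],
             request_trace: Sequence[Sequence[bool]]) -> List[Optional[int]]:
    """Rotate the time wheel by one slot per cycle and collect the grants."""
    grants = []
    for cycle, requests in enumerate(request_trace):
        owner = wheel[cycle % len(wheel)]
        grants.append(tdma_grant(owner, requests))
    return grants


trace = [
    [True, False, False, False],   # slot of M0, M0 requests
    [False, False, True, False],   # slot of M1, idle, so M2 is served
    [False, False, False, False],  # slot of M2, nobody requests
]
print(run_tdma([0, 1, 2, 3], trace))  # prints [0, 2, None]
```

In the second cycle the slot reserved for M1 would be wasted under single-level TDMA; the second level hands it to M2 instead.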

B. Round Robin Algorithm:

The round-robin algorithm can reallocate the available slots to other requesting masters [5, 8]. It is a fair arbitration style when used with a limited transfer length. Whenever a turn ends, whether unused or because of the end of a transfer or the transfer length limit, the turn is passed to the next component in order. A bounded maximum access time and equal bandwidth can be achieved with a limited transfer length. However, it provides poor performance if requests vary dynamically.

A round-robin arbitration policy is a token-passing scheme in which fairness among masters is guaranteed and no starvation can take place. In each cycle, one of the masters (in round-robin order) has the highest priority for access to the shared resource. If the token-holding master does not need the bus in this cycle, the master with the next-highest priority that sends a request can be granted the resource. A key advantage of round-robin is that unused time slots are immediately re-allocated to masters that are ready to issue a request, regardless of their access order. This reduces bus under-utilization in comparison with a static slot allocation, which might grant the bus to a master that is not going to carry out any communication.
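The token-passing behaviour can be sketched as follows. The class name `RoundRobinArbiter` and the token update rule (the token moves to the master after the one just granted, and advances by one position when a turn goes unused) are assumptions chosen for the example; other round-robin variants update the token differently.

```python
from typing import Optional, Sequence


class RoundRobinArbiter:
    """Round-robin arbiter model with a rotating highest-priority token."""

    def __init__(self, n_masters: int) -> None:
        self.n = n_masters
        self.token = 0  # master that currently has the highest priority

    def grant(self, requests: Sequence[bool]) -> Optional[int]:
        # Scan from the token holder; the first requester found wins.
        for offset in range(self.n):
            candidate = (self.token + offset) % self.n
            if requests[candidate]:
                # Assumed policy: the token moves past the granted master.
                self.token = (candidate + 1) % self.n
                return candidate
        # Unused turn: pass the token to the next master in order.
        self.token = (self.token + 1) % self.n
        return None


arbiter = RoundRobinArbiter(4)
# Masters 0 and 3 request continuously; they alternate, so neither starves.
print([arbiter.grant([True, False, False, True]) for _ in range(4)])
# prints [0, 3, 0, 3]
```

Unlike the fixed-priority case, a continuously requesting master cannot monopolize the bus, because the highest priority moves on after every grant.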

In lottery arbitration, the partial sum of the tickets held by the requesting masters is obtained and compared with a random number. The master whose lottery tickets correspond to that number is granted the bus. The number of tickets held by a master in the lottery arbitration algorithm equals the weight of that master. Lottery arbitration is a probability-based distribution scheme [8] that can avoid bus starvation. It also gives fine control over the communication bandwidth allocated to each master, but a master that owns fewer tickets has a higher average latency than the other masters.

C. Lottery-Based Arbitration Algorithm

Let the set of bus masters be C1, C2, C3 and C4, and let the number of tickets [5, 10] held by each master be $t_1$, $t_2$, $t_3$ and $t_4$. At any bus cycle, let the set of pending bus access requests be represented by a set of Boolean variables $req_i$ ($i = 1, 2, \dots, n$), where $req_i = 1$ if the corresponding master $C_i$ has a pending request for access to the bus, and $req_i = 0$ otherwise. The master $C_i$ to be granted is selected in a pseudo-random fashion, favoring the components having a larger number of tickets.

The probability of granting component $C_i$ is given by

$$P(C_i) = \frac{req_i \cdot t_i}{\sum_{j=1}^{n} req_j \cdot t_j}$$
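As a worked instance of this probability, take the ticket assignment used in the paper's own example, $t_1, t_2, t_3, t_4 = 1, 2, 3, 4$, with only C1, C3 and C4 requesting ($req_2 = 0$). The total number of contending tickets is then $1 + 3 + 4 = 8$, and each master's chance of winning the current draw is its ticket count divided by that total:

$$P(C_1) = \frac{1}{8}, \qquad P(C_2) = \frac{0 \cdot 2}{8} = 0, \qquad P(C_3) = \frac{3}{8}, \qquad P(C_4) = \frac{4}{8} = \frac{1}{2}$$

The probabilities sum to one, and a master without a pending request can never be granted.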

Fig:4 Lottery Arbiter for Shared Bus

Under round-robin arbitration, the worst-case waiting time for a master's bus access request is reliably predictable (being proportional to the number of instantaneous requests minus one), even though the actual waiting time is not. The uncertainty in the actual bandwidth that can be granted to a master is the major drawback of that scheme.

D. Lottery Bus Architecture:

The core of the LOTTERYBUS architecture is a probabilistic arbitration algorithm implemented in a centralized lottery manager for each bus in the communication architecture [9]. The architecture does not presume any fixed communication topology; hence the various SoC components may be interconnected by an arbitrary network of shared channels. In the lottery bus arbitration algorithm (Fig. 4) [10], the arbiter acts as a lottery manager, which decides which lucky master wins the prize. The lottery manager accumulates the bus access requests from all of the masters, and each master is statically assigned a number of lottery tickets [8, 12]. A pseudo-random number is generated, which corresponds to one ticket number. Based on the requests and the tickets owned, the partial sums $\sum_{j=1}^{i} req_j \cdot t_j$ are computed for $i = 1, \dots, n$.

For example, consider three out of four masters requesting the bus, so that the contention must be resolved by the arbitration policy. Let the masters hold tickets in the ratio 1:2:3:4. To decide which master is to own the bus, the arbiter examines the total number of tickets held by the masters that have pending requests, given by $\sum_{j=1}^{n} req_j \cdot t_j$. It then generates a pseudo-random number (or picks a winning "ticket") from the range $[0, \sum_{j=1}^{n} req_j \cdot t_j)$ to determine which component to grant the bus to. If the number falls in the range $[0, req_1 \cdot t_1)$, the bus is granted to component $C_1$; if it falls in the range $[req_1 \cdot t_1, req_1 \cdot t_1 + req_2 \cdot t_2)$, it is granted to component $C_2$; and so on. In general, if it lies in the range

$$\left[\sum_{k=1}^{i} req_k \cdot t_k,\ \sum_{k=1}^{i+1} req_k \cdot t_k\right)$$

it is granted to component $C_{i+1}$. The component with the largest number of tickets occupies the largest fraction of the total range and is consequently the most likely candidate to receive the grant, provided the random numbers are uniformly distributed over the interval.

For example, components C1, C2, C3 and C4 are assigned 1, 2, 3 and 4 tickets, respectively. However, at the instant shown, only C1, C3 and C4 have pending requests, hence the number of current tickets is calculated as

$$\sum_{j=1}^{n} req_j \cdot t_j = 1 + 3 + 4 = 8$$

Therefore, a random number is generated uniformly in the range $[0, 8)$. In the example, the generated random number is 5, which lies between $req_1 t_1 + req_2 t_2 + req_3 t_3 = 4$ and $req_1 t_1 + req_2 t_2 + req_3 t_3 + req_4 t_4 = 8$. Therefore the grant signal is generated for component C4 and the bus is granted to it.

Fig:5 Lottery Arbiter for Dynamic varying Tickets
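The lottery draw can be sketched in Python as follows. This is a software model under stated assumptions, not the lottery manager hardware: the function names `select_winner` and `lottery_grant` are hypothetical, masters are indexed from 0 (so index 3 is C4), and Python's `random` module stands in for the pseudo-random number generator of a real arbiter.

```python
import random
from itertools import accumulate
from typing import Optional, Sequence


def select_winner(requests: Sequence[int], tickets: Sequence[int],
                  winning_ticket: int) -> Optional[int]:
    """Map a winning ticket number to the index of the granted master.

    Master i owns the half-open range between consecutive partial sums
    of req_j * t_j, so a non-requesting master owns an empty range.
    """
    partial_sums = accumulate(r * t for r, t in zip(requests, tickets))
    for master, upper in enumerate(partial_sums):
        if winning_ticket < upper:
            return master
    return None  # ticket number outside the range of current tickets


def lottery_grant(requests: Sequence[int], tickets: Sequence[int],
                  rng=random) -> Optional[int]:
    """Draw a uniform ticket in [0, total) and return the winner."""
    total = sum(r * t for r, t in zip(requests, tickets))
    if total == 0:
        return None  # no pending requests, nothing to draw
    return select_winner(requests, tickets, rng.randrange(total))


# The paper's example: tickets 1, 2, 3, 4; C1, C3 and C4 are requesting.
print(select_winner([1, 0, 1, 1], [1, 2, 3, 4], 5))  # prints 3 (i.e. C4)
```

Splitting the draw from the range lookup keeps `select_winner` deterministic, which makes the worked example with ticket number 5 easy to check.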
1. Token Passing:

  In this protocol, ring-based architectures [10] are used. A special data word, called a token, circulates on the ring. An interface that receives the token is allowed to initiate a transaction. When the transaction completes, the interface releases the token and sends it to the neighboring interface.
2. Code Division Multiple Access (CDMA):

  This protocol has been proposed for sharing an on-chip communication channel [10]. In a shared medium, it provides better resilience to noise and interference and can support simultaneous transfer of data streams. However, this protocol requires the implementation of complex direct-sequence spread-spectrum coding schemes at the component bus interfaces.
Review of Arbiter Techniques Design Issues

The fuzzy arbiter devised by Preeti et al. (2011) [10] is found to be complex to implement, and its complexity increases with the number of processors. It also responds slowly, since it requires many calculations.

Sonntag and Helmut (2008) devised a weighted round-robin (WRR) algorithm to optimize the traffic characteristics in a multiprocessor architecture. WRR has been shown to outperform a round-robin arbiter in throughput by 44%, depending on the traffic patterns used. The latency of cache refills is also reduced by 34% using this arbiter.

Ari Kulmala et al. (2008) present a thorough measurement of the effect of different arbitration algorithms on a real MPEG-4 implementation on FPGA, comparing various shared bus algorithms [11]. The measured quantities include video encoding performance, area usage and the effect of different maximum transfer lengths. At high utilization, the priority algorithm yields up to 60% better performance. At lower utilization, it is preferable to use round-robin, or a combination of round-robin and priority, with a limited transfer length to avoid starvation.

Krishna Sekar et al. (2008) designed FLEXIBUS [14], a new architecture that can efficiently adapt the logical connectivity of the communication architecture and the modules connected to it. It has been implemented as an extension of the AMBA bus [15]. They applied it in two SoC designs and analyzed the performance. FLEXIBUS is found to provide gains of up to 34.55% compared to conventional architectures.

Wei Zhang et al. (2007) describe an MPSoC FPGA prototype based on a hierarchical bus using four ARM processor cores. Satisfactory results were achieved through FPGA implementation, and the platform works efficiently under higher workloads. Yao et al. (2006) proposed the RB-Lottery algorithm, which solves the starvation problem that exists in conventional algorithms and reduces average latency. Simulation shows that the algorithm meets bandwidth requirements better and has a lower average latency of bus requests than lottery arbitration, at the cost of increased chip area and power consumption.

Ryu et al. (2001) presented different MPSoC bus architectures and compared their performance. They conclude that the bus architecture for a given system must be determined by the type of application. The performance of these architectures is evaluated using wireless communication applications, an OFDM transmitter and an MPEG-2 decoder. Among the five bus architectures, the Bi-FIFO and crossbar switch bus architectures perform best for the OFDM transmitter and the MPEG-2 decoder, respectively.
Performance Comparison of Arbiters

The performance of the designed arbitration schemes has been compared on the basis of parameters such as latency, acceptance rate of masters, average waiting time of masters and shared bus bandwidth utilization by individual masters [8, 9].

Average Latency (Cycles/word): The time delay between the moment something is initiated and the moment one of its effects begins or becomes detectable. Ideally this should be zero or as small as possible.

Acceptance Rate: The percentage of a master's requests for the shared bus that are granted, so that the bus is allotted to it. The acceptance rate of every processor should be as high as possible.

Average Waiting Time: The average time a particular master spends between requesting and being granted the shared bus. The average waiting time of every processor should be as low as possible.

Average Bandwidth Utilization: A measure of how much of the shared bus is utilized by each master. Ideally, the bus should be utilized equally by all masters.
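To make these definitions concrete, the sketch below computes three of the metrics from a cycle-by-cycle simulation trace. The trace format, the function name `arbiter_metrics`, and the exact formulas are assumptions for illustration: a request is counted once for every cycle in which it is asserted, each grant lasts one cycle, and bandwidth share is taken as granted cycles over total cycles. The paper does not state the formulas used in its VHDL test bench.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Cycle = Tuple[Sequence[bool], Optional[int]]  # (requests, granted master)


def arbiter_metrics(trace: Sequence[Cycle],
                    n_masters: int) -> List[Dict[str, float]]:
    """Per-master acceptance rate (%), average wait (cycles), bus share (%)."""
    request_cycles = [0] * n_masters
    grants = [0] * n_masters
    wait_total = [0] * n_masters
    pending_for = [0] * n_masters  # cycles the current request has waited

    for requests, grant in trace:
        for m in range(n_masters):
            if not requests[m]:
                pending_for[m] = 0
                continue
            request_cycles[m] += 1
            if grant == m:
                grants[m] += 1
                wait_total[m] += pending_for[m]
                pending_for[m] = 0
            else:
                pending_for[m] += 1

    total = len(trace)
    return [
        {
            "acceptance_rate": 100.0 * grants[m] / request_cycles[m]
            if request_cycles[m] else 0.0,
            "avg_wait_cycles": wait_total[m] / grants[m] if grants[m] else 0.0,
            "bandwidth_share": 100.0 * grants[m] / total if total else 0.0,
        }
        for m in range(n_masters)
    ]


trace = [
    ([True, True], 0),     # both request, M0 wins, M1 waits
    ([False, True], 1),    # M1 is served after one cycle of waiting
    ([True, False], 0),    # M0 is served immediately
    ([False, False], None),
]
for master, row in enumerate(arbiter_metrics(trace, 2)):
    print(f"M{master}", row)
```

Feeding the same request trace through different arbiter models and comparing the resulting rows gives a comparison of the kind the following section reports.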
Results

The discussed algorithms were designed in VHDL, and the simulation results are presented in Figs. 6, 7, 8 and 9 [8, 9]. The schemes were tested against the performance parameters using a VHDL test bench, and the comparative results are presented below.

While testing the schemes, the length of the data is not considered, which is why the grant signal is asserted for only one cycle. In the above analysis, four processors are taken into consideration. Labels M0, M1, etc. indicate Master (Processor) 0, 1 and so on.

1. Average Latency (Cycles/word)

  Average latency (cycles/word) of every processor under different arbitration schemes.

  Fig:6 Comparative Graph for Average Latency (Cycles/Word)
2. Acceptance rate

  Acceptance rate of every processor under different arbitration schemes.

  Fig:7 Comparative Graph for Acceptance Rate
3. Average waiting time

  Average waiting time (ps) of every processor under different arbitration schemes.

  Fig:8 Comparative Graph for Average Waiting Time
4. Average Bandwidth

  Fig:9 Comparative Graph for Average Bandwidth Utilization
Conclusion:

In this paper we have discussed some of the issues related to the design of SoCs with regard to inter-processor communication. Various bus architectures and protocols have been reviewed. Currently, on-chip communication networks are mostly implemented using shared interconnects such as buses. For shared bus communication architectures like AMBA, designers should select the right arbitration technique to meet the requirements with improved performance. Future research will therefore focus on designing an arbiter that dynamically schedules simultaneous requests from various masters, thus improving the performance of a multiprocessor with respect to latency and bandwidth. This paper has presented a study of different performance parameters, namely latency, bandwidth, acceptance rate and average waiting time.

REFERENCES

[1] Hardwick Shah, Rabbe, Knoll, "Priority division: A high-speed shared-memory bus arbitration with bounded latency," in EDAA, 2011.

[2] Ahmed Amine Jerraya, Wayne Wolf, Multiprocessor Systems-on-Chips, Morgan Kaufmann Publishers Inc., San Francisco, 2005, pp. 1-18.

[3] R. Ho, K. W. Mai, and M. A. Horowitz, "The future of wires," Proc. IEEE, vol. 89, no. 4, pp. 490-504, Apr. 2001.

[4] Dinesh Padole, P. R. Bajaj, et al., "Dynamic Lottery Bus Arbiter for Shared Bus System-on-Chip: A Design Approach with VHDL," First International Conference on Emerging Trends in Engineering and Technology, IEEE, 2008.

[5] Shanthi, R. Amutha, "Performance Analysis of On-Chip Communication Architecture in MPSoC," in Proceedings of IEEE ICETECT, 2011.

[6] Bu-chung Lin, Geeng-Wei Lee, Juninn Dar Huang and Jing-Yang Jou, "A Precise Bandwidth Control Arbitration Algorithm for Hard Real Time SOC Buses," DAC 2007, pp. 165-170.

[7] Prakash Srinivasan, Ali Ahmadinia, Ahmet T. Erdogan, Tughrul Arslan, "Power Evaluation of the Arbitration Policy for Different On-Chip Bus Based SoC Platform," IEEE SOC Conference, Taiwan, 26-29 Sept. 2007, p. 159.

[8] Dinesh Padole, Deepsheekha, Dr. Preeti Bajaj, "Fuzzy Logic Arbiter for Shared Bus Multiprocessor System: A Design Approach," First International Conference on Emerging Trends in Engineering and Technology, IEEE, 2008.

[9] Dinesh Padole et al., "Design and Performance Analysis of Efficient Bus Arbitration Schemes for On-Chip Shared Bus Multiprocessor SoC," International Journal IJCSNS, vol. 8, no. 9, pp. 250-255, Sept. 2008.

[10] Shanthi, R. Amutha, "Design Approach to Implementation of Arbitration Algorithm in Shared Bus Architectures (MPSoC)," in CEIS, vol. 2, no. 4, 2011.

[11] S. Hemachitra and P. T. Vanathi, "Design and Analysis of Dynamically Configurable Bus Arbiters for SoCs," Trans. on ICGST, vol. 8, issue 1, Dec. 2008.

[12] Dr. Preeti Bajaj and Dinesh Padole, "Arbitration Schemes for Multiprocessor Shared Bus," New Trends and Developments in Automotive Engineering, INTECH Publisher, Jan. 2011.

[13] Ahmed Amine Jerraya, Wayne Wolf and Grant Martin, "Multiprocessor System-on-Chip (MPSoC) Technology," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 27, no. 10, Oct. 2008.

[14] Kanchan Warathe, Padole and Bajaj, "A Design Approach to AMBA Bus Architecture with Dynamic Lottery Arbiter," IEEE, 2009.

[15] Anurag Shrivastava, G. S. Tomar, Singh, "Design and Implementation of High Performance AHB Reconfigurable Arbiter for On-Chip Bus Architecture," in IEEE CSNT, 2011.

[16] K. Lahiri, A. Raghunathan, G. Lakshminarayana, "LOTTERYBUS: A new high-performance communication architecture for system-on-chip designs," in Proc. Design Automation Conf., 2001, pp. 15-20.
