Performance Balance and Idle Time Utilization in Hadoop MapReduce Framework

DOI : 10.17577/IJERTV4IS060359

Download Full-Text PDF Cite this Publication

Text Only Version

Performance Balance and Idle Time Utilization in Hadoop MapReduce Framework

Sankarsh O. A Mr. Divakar K. M,

M tech, 4th Sem, Dept. of CS&E Assistant Professor, Dept. of CS&E, SJCIT. SJCIT.

Chickaballapur, India Chickaballapur, India

AbstractEnormous information is all over. MapReduce is one of the procedures utilized by Big information. MapReduce is a parallel and one of the famous processing procedures used to process and register extensive scale information. MapReduce uses space based framework to build execution and use asset to finish execution of undertakings. To build execution and assets distribution MapReduce system utilizes three unique ideas like Dynamic Hadoop Slot Allocation which helps in preconfiguration of spaces to assignments for speedier execution. These procedure helps in using spaces by both guide assignments and lessen undertakings. The other system called Speculative Execution Performance Balancing is utilized to enhance execution by recognizing the stragglers in the procedures. This aides in enhancing the group proficiency and in addition execution tradeoff for both single and different executions of employments in an entirety. The other method called postponement booking would be utilized to enhance information region at the expense of reasonableness. Contrasting and Hadoop information region ought to be enhanced with ease, so we add to a system called Slot PreScheduling. This Slot PreScheduling enhances information area without effect on reasonableness. At long last utilizing all these three strategies we frame a regulated assignment framework for execution of errands and in a roundabout way enhance use of unmoving time spaces. At long last, by consolidating all procedures together, we frame a regulated opening designation framework, that can enhance the execution of MapReduce employments and uses the unmoving time significantly for quicker execution of undertakings be using whatever number assets as would be prudent.

Indexing termsMapReduce, Hadoop Fair Scheduler, Slot PreScheduling, Delay Scheduler, Dynamic Slot Allocation.


As of late, Data has become more the ever envisioned. After cloud its currently huge information that has turn into a pattern in innovation. Keeping up and handling these immense measures of information is turning into a gigantic undertaking to registering world. Huge information utilizes two numerous systems to do these work in a more proficient and legitimate way. These two strategies are Hadoop and MapReduce. MapReduce has turn out to be all that much imperative and well known superior registering strategy to process a lot of information in bunches and server farms. Hadoop is a huge scale, open source programming system utilized by Yahoo!, Facebook, Google, IBM, Twitter and so forth to deal with gigantic measures of information to process from numerous clients.

Numerous studies with respect to MapReduce, there are sure key difficulties for the use and execution change of a Hadoop group. Firstly, the figure assets a MapReduce work execution has two one of a kind elements: 1) the space assignment requirement suspicion that guide spaces must be apportioned to guide errands and diminish openings must be allotted to lessen assignments, and 2) the general execution imperative that guide undertakings are executed before decrease assignments. Because of these we have two perceptions: (I). There are fundamentally distinctive execution and framework usage for a MapReduce workload under diverse space designs, and (II). Indeed, even under the ideal guide/diminish opening design, there can be numerous unmoving lessen/guide spaces, which unfavorably influences the framework usage and execution.

Moreover, due to unavoidable run-time question and diverse resources, there can be straggled guide or reduce assignments, which cause tremendous delay of the whole occupation [1]. Thirdly, data region development is critical for opening use efficiency and execution change of MapReduce workloads.To address the aforementioned difficulties, we show a dynamic space assignment structure to enhance the execution of a group by means of advancing opening use. In particular, Dynamic opening assignment concentrates on Hadoop Fair Scheduler. This is on the grounds that the group use and execution for employments under HFS are much poorer (or genuine) than that under FIFO scheduler. In any case, it merits saying that our procedure can be utilized for FIFO scheduler also. This comprises of three streamlining methods, to be specific, Dynamic Hadoop Slot Allocation (DHSA), Speculative Execution Performance Balancing (SEPB) and Slot PreScheduling from diverse key angles:

Dynamic Hadoop Slot Allocation (DHSA). As opposed to YARN which proposes another asset model of "holder" that both guide and decrease undertakings can keep running on, DHSA keeps the space based asset model. The thought for DHSA is to break the presumption of space assignment imperative to permit that:

  1. Openings are nonexclusive and can be utilized by either guide or lessen undertakings, albeit there is a preconfiguration for the quantity of guide and decrease spaces.

  2. Guide undertakings will like to utilize guide openings and similarly lessen assignments want to utilize decrease slots.

Speculative Execution Performance Balancing (SEPB) is an essential method to address the issue of moderate running errand's impact on a solitary occupations execution time by running a reinforcement assignment on another machine. In any case, it takes a stab at the expense of bunch productivity for the entire employments because of its asset rivalry with other running assignments. We propose a dynamic opening distribution strategy called Speculative Execution Performance Balancing (SEPB) for the theoretical undertaking which can adjust the execution tradeoff between a solitary employments and a cluster of occupations execution time by deciding rapidly when the time it now, time to calendar and dispense openings for speculative undertakings.

Space PreScheduling. Postponement booking has been indicated as compelling methodology for the information region change in MapReduce [6]. It accomplishes better information area by postponing spaces for employments where there are no right now nearby errands accessible. In any case, it is at the expense of reasonableness. In perspective of this, we propose an option method named Slot PreScheduling enhance the information territory with no effect on reasonableness. It is accomplished to the detriment of burden harmony between slave hubs. By watching that there are frequently some unmoving spaces which can't be allotted because of the heap adjusting compel amid runtime, we can preallocate these openings of the hub to occupations to augment the information territory.


The main contributions of this paper are summarized as follows:

  • Propose a Dynamic Hadoop Slot Allocation (DHSA) technique to maximize the slot utilization for Hadoop.

  • Propose a Speculative Execution Performance Balancing (SEPB) technique to balance the performance tradeoff between a single job and a batch of jobs.

  • Propose a PreScheduling technique to improve the data locality at the expense of load balance across nodes, which has no negative influence on fairness.

  • Develop a system which would combine these three techniques in Hadoop.

The rest of the paper is organized as follows. Section 2 introduces our Dynamic framework, starting with an overview and then the details on the three techniques namely, DHSA, SEPB, Slot PeScheduling. Section 3 reviews related work. Finally, Section 4 concludes the paper and gives the future work.


We enhance the execution of a MapReduce bunch by means of upgrading the opening usage principally from two viewpoints. Initially, we can arrange the openings into two sorts, in particular, occupied spaces and unmoving spaces. Second, it is significant that not every occupied space can be productively used. Especially, we distinguish two principle influencing variables: (1). Theoretical undertakings (2). Information territory. Taking into account these, a dynamic use improvement system for MapReduce, to enhance the execution of a common Hadoop group under a reasonable planning between clients.

Figure 1 gives an outline of Dynamic structure. It comprises of three opening designation methods, i.e., Dynamic Hadoop Slot Allocation (DHSA), Speculative Execution Performance Balancing (SEPB), and Slot PreScheduling.

Every method considers the execution change from diverse perspectives. DHSA endeavors to amplify opening usage while keeping up the decency, when there are pending errands. SEPB recognizes the opening asset in- proficiency issue for a Hadoop bunch, brought on by theoretical undertakings. It takes a shot at top of the Hadoop theoretical scheduler to adjust the execution of occupations. Opening PreScheduling enhances the space use effectiveness and execution by enhancing the information region for guide assignments while keeping the reasonableness. It preschedules undertakings when there are pending guide assignments with information on that hub.

By consolidating the three procedures, it empowers Dynamic structure to upgrade the usage and execution of a Hadoop group generously with the accompanying regulated procedures:

  1. At whatever point there is an unmoving space accessible, Dynamic Technique will first endeavor to enhance the opening usage with DHSA. It chooses rapidly whether to distribute it or not, subject to the various compels.

    Figure 1: Overview of Dynamic.

  2. If the allocation is true, will further optimize the performance by improving the efficiency of slot utilization with SEPB. Since the speculative execution can improve the performance of a single job but at the expense of cluster efficiency, SEPB acts as an efficiency balancer between a single job and a batch of jobs. It works on top of Hadoop

    speculative scheduler to determine dynamically whether allocating the idle slot to the pending task or speculative task.

  3. When to allocate the idle slots for pending/speculative map tasks, Dynamic technique will be able to further improve the slot utilization efficiency from the data locality optimization aspect with Slot PreScheduling.

Moreover, we want to mention that the three techniques are at different levels, applied together or individually. The detailed description for each technique is given as follows.

    1. A Dynamic Hadoop Slot Allocation (DHSA)

      Current outline of MapReduce experiences an under-use of the separate openings as the quantity of guide and decrease errands shifts over the long run, bringing about events where the quantity of spaces dispensed for guide/diminish is littler than the quantity of guide/lessen undertakings. Our dynamic space designation approach is in light of the perception that at diverse time of time there may be sit guide/diminish spaces, as the employment continues from guide stage to lessen stage. We can utilize the unused guide spaces for those over-burden lessen undertakings to enhance the execution of the MapReduce workload, and the other way around.

      Then again, there are two difficulties that ought to be considered as takes after:

      (C1). Reasonableness is an imperative metric in Hadoop Fair Scheduler (HFS). We say it is reasonable when the sum total of what pools have been designated with the same measure of assets. In HFS, errand spaces are initially allotted over the pools, and afterward the openings are distributed to the occupations inside of the pool. Besides, a MapReduce work calculation comprises of two sections: guide stage assignment processing and diminish stage errand reckoning.

      (C2). The asset necessities between the guide space and decrease opening are for the most part distinctive. This is on account of the guide assignment and decrease undertaking frequently display totally diverse execution designs. Decrease undertaking has a tendency to expend substantially more assets, for example, memory and system transfer speed. Basically permitting decrease assignments to utilize guide spaces obliges arranging every guide space to take more assets, which will hence lessen the successful number of openings on every hub, bringing on asset under-used amid runtime. Subsequently, a cautious configuration of element assignment approach is essential and should have been mindful of such contrast.

    2. Speculative Execution Performance Balancing (SEPB)

      MapReduce work's execution time is extremely touchy to moderate running undertakings. There are different reasons that cause stragglers, including defective equipment and programming mis configuration. We order

      the stragglers into two sorts, in particular, Hard Straggler and Soft Straggler, characterized as takes after:

      • Hard Straggler: An errand that goes into a stop status because of the unending sitting tight for specific assets. It can't stop and complete unless we murder it.

        Soft Straggler: An assignment that can finish its calculation effectively, however will take any longer time than the basic assignments.

        For the hard straggler, we ought to slaughter it and run another equal assignment, or called a reinforcement errand, instantly once it was identified. Conversely, there are two conceivable outcomes between the delicate straggler and its reinforcement undertaking:

        (S1). Delicate straggler finishes first or the same as its reinforcement assignment. For this case, there is no compelling reason to run a reinforcement errand.

        (S2). Delicate straggler completes later than the reinforcement undertaking. We ought to murder it and run a reinforcement undertaking instantly.

        To manage the straggler issue, theoretical execution is utilized as a part of Hadoop. As opposed to diagnosing and altering straggling undertakings, it distinguishes the straggling errand alertly utilizing heuristic calculations, for example, LATE. Once distinguished, on the other hand, it can't just slaughter the straggler instantly because of the accompanying actualities:

      • Hadoop does not have a component to recognize the hard straggler and the delicate straggler;

      • Moreover, for the delicate straggler, its additionally hard to judge whether it has a place with (S1) or (S2) before running. Just slaughtering the straggler will hurt the instance of (S1).

        Rather, it brings forth a reinforcement undertaking and permits it to run simultaneously with the straggler, i.e., there is a reckoning cover between the straggler and the reinforcement assignment. The errand murdering operation will happens when both of the two undertakings finished. Speculative undertakings are not free, i.e., they seek certain assets, including guide openings, decrease spaces and system, with other running assignments of occupations, which could have a negative effect for the execution of a group of employments. Consequently, it emerges a test issue for speculative assignments on the best way to relieve its negative effect for the execution of a cluster of employments.

        To boost the execution for a group of employments, an instinctive arrangement is that, given accessible assignment spaces, we ought to fulfill pending undertakings first before considering theoretical assignments. That is, the point at which a hub has an unmoving guide space, we ought to pick pending guide undertakings first before searching for speculative guide assignments for a group of occupations. Also, review that in our dynamc scheduler, the guide opening is no more limited to guide undertaking, it can serve diminish assignment. It implies that, for a vacant guide space, we

        ought to consider picking assignments in the accompanying request: pending guide errand, pending decrease assignment, theoretical guide undertaking, and speculative lessen undertaking on the off chance that we need to further expand the execution for a group of occupations.

    3. Slot PreScheduling

Keeping the undertaking processing at the registering hub with the neighborhood information (i.e., Data region) is a productive and critical way to deal with enhance the effectiveness of space use and execution. Delay Scheduler has been proposed to enhance the information region in MapReduce by [8]. It defers the booking for an occupation by a little measure of time, when it identifies there are no nearby guide undertakings from that occupation on a hub where its information live. Nonetheless, it is to the detriment of decency. There is a tradeoff between the information area and decency streamlining with deferral scheduler. It implies that, in HFS, deferral booking is insufficient and there is still streamlining space for information area change.


There is a large body of research work on the performance optimization for MapReduce jobs. We summarize and categorize the closely related work to ours as follows.

  • Scheduling and Resource Allocation Optimization.

    There are some reckoning booking and asset designation improvement work for MapReduce employments consider occupation requesting advancement for MapReduce workloads. They show the MapReduce as a two-stage cross breed stream shop with multiprocessor assignments, where diverse occupation accommodation requests will bring about fluctuated group use and framework execution. On the other hand, there is a suspicion that the execution time for guide and decrease undertakings for every employment ought to be known ahead of time, which may not be accessible in some genuine applications. Additionally, it is suitable for free occupations, yet neglects to consider those employments with reliance, e.g., MapReduce work process. In examination, our DHSA is not compelled by such suspicion and can be utilized for any sorts of MapReduce workloads.

    Hadoop arrangement streamlining is another methodology. Case in point, Starfish [3] is a self tuning structure that can change the Hadoop setup naturally for a MapReduce employment such that the usage of Hadoop bunch can be amplified in view of the expense based model and examining system. Then again, even under an ideal Hadoop setup, e.g., Hadoop guide/decrease space design, there is still space for execution change of a MapReduce employment or workload, by expanding the usage of guide and diminish openings.

  • Speculative Execution Optimization.

    Theoretical execution is a vital errand planning system in MapReduce for managing straggler issue for a solitary occupation, including LATE [5], BASE [8], Mantri [1], MCP [8]. Longest Approximate Time to End (LATE) [5] is a theoretical execution calculation that spotlights on heterogeneous situations by organizing assignments to guess, selecting quick hubs to keep running on, and topping speculative errands. Guo et al. [8] further enhance the execution for LATE by proposing a Benefit Aware Speculative Execution (BASE) calculation that can assess the potential advantage of speculative undertakings and wipe out pointless runs. Mantri [1] gave a theoretical execution procedure that concentrates all the more on sparing group processing asset, i.e., assignment spaces, by checking errands and separating exceptions in view of their reasons. Chen et al. [8] proposed another theoretical execution calculation called Maximum Cost Performance (MCP) to conquer the issues that influence the execution for past speculative execution techniques, e.g., information skew, assignment that begin nonconcurrently, disgraceful setup of stage rate. On the other hand, it merits specifying that every single theoretical execution specified above are not free. They take a stab at the expense of group proficiency, which could have a negative effect for the execution of a clump of occupations. We consequently proposed SEPB to adjust the execution tradeoff between a solitary occupation and a cluster of occupations for every single speculative execution.

  • Data Locality Optimization.

Information territory enhancement has been demonstrated to be a basic system for the execution and proficiency change of the group use by past work[6]. For MapReduce, there are guide side and diminish side information area. The guide side information area streamlining considers moving the guide assignments reckoning near to the data information pieces. Case in point, when there are bunches of little size occupations in a situation, Delay Scheduler can enhance the information area by postponing the booking of guide errands whose information region can't be fulfilled for a brief time of time, to the detriment of decency [6].

Opening PreScheduling has a place with the guide side information area enhancement. Rather than Delay Scheduler, Slot PreScheduling, as its integral part, considers an alternate situation that there are neighborhood guide assignments for a vocation on a hub, yet no permissible unmoving guide openings accessible on that hub because of the heap adjusting oblige. It preschedules nearby guide errands utilizing additional unmoving spaces to expand the information territory while keeping up the reasonableness. Imperatively, in correlation to those guide side improvement routines previously stated, we contend that both Delay Scheduler and Slot PreScheduling are straightforward, non specific and much compelling for decency and information region amplification.


This paper proposes a Dynamic structure expecting to enhance the execution of MapReduce workloads while keeping up the reasonableness. It comprises of three methods, in particular, DHSA, SEPB, Slot PreScheduling, all of which concentrate on the space usage streamlining for MapReduce group from alternate points of view. DHSA concentrates on the opening use boost by allotting guide/decrease openings to guide and diminish undertakings alertly. Especially, it doesn't have any supposition or require any former learning and can be utilized for any sorts of MapReduce jobs.In complexity to DHSA, SEPB and Slot PreScheduling consider the productivity enhancement for a given space use. SEPB recognizes the space wastefulness issue of theoretical execution. It can adjust the execution tradeoff between a solitary employment and a cluster of occupations alertly. Opening PreScheduling enhances the effectiveness of space use by amplifying its information area.


  1. G. Ananthanarayanan, S. Kandula, A. Greenberg, I. Stoica, Y. Lu,

    B. Saha, and E. Harris, Reining in the outliers in map-reduce clusters using 5antra, in OSDI10, pp. 1-16, 2010.

  2. Apache Hadoop NextGen MapReduce (YARN). yarn/hadoop-yarnsite/YARN.html.

  3. Hadoop.


  5. M. Zaharia, A. Konwinski , A.D. Joseph , R. Katz , I. Stoica, Improving MapReduce performance in heterogeneous environments. In OSDI08, pp.29-42, 2008.

  6. M. Zaharia, D. Borthakur, J. Sarma, K. Elmeleegy,S. Schenker,I. Stoica, Delay scheduling: A simple technique for achieving locality and fairness in cluster scheduling. In EuroSys10, pp. 265-278, 2010.

  7. J. Chao, R. Buyya. MapReduce Programming Model for .NET- Based Cloud Computing. In Euro-Par09, pp. 417-428, 2009.

  8. Z.H. Guo, G. Fox, M. Zhou, Y. Ruan.Improving Resource Utilization in MapReduce. In IEEE Cluster12. pp. 402-410, 2012.

  9. Max-Min Fairness (Wikipedia). min fairness.

Leave a Reply