Survey of Fault Tolerant Techniques for Heterogeneous Mobile Distributed System

Simpy Kataria; Tejinder Thind

doi:10.17577/IJERTV3IS041277

Volume 03, Issue 04 (April 2014)


Open Access
Paper ID : IJERTV3IS041277
Published (First Online): 24-04-2014
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Simpy Kataria, M.Tech (CSE) student, Lovely Professional University, India

Tejinder Thind

A.P., HOL (CSE/IT) and COD-Placements, Lovely Professional University,

Jalandhar – Delhi G.T. Road (NH-1), Phagwara, Punjab (India) – 144411

Abstract: Fault tolerance is a major issue for any system, because if the system fails, the processing and execution of an application stop entirely. Fault tolerance can be defined as the ability of a system to respond to an unexpected hardware or software failure [1]. Fault detection in a distributed system identifies the faulty entities at different levels of operation, as a fault tree of the entire network. Various fault tolerance techniques have been designed to provide reliable execution of agents even in the face of failures arising from migration request failures, communication exceptions, system crashes, or security violations. This paper surveys various methods used for fault tolerance in mobile heterogeneous systems and concludes which method works better. A new approach is also described to enhance the ability of the system to be fault tolerant.

Keywords: Heterogeneous mobile distributed system (MDS), distributed computing system, mobile host (MH), mobile support station (MSS).

INTRODUCTION

A distributed computer system is considered to be a collection of programmable processors, each with private memory, interconnected by communication links; the communication cost between two processors is presumed to be negligible [3]. A distributed system is a system with multiple components attached to multiple computers but appearing to the user as a single computer system. The components participating in a distributed system are attached to each other by local networks and may or may not be physically connected to each other. Heterogeneous distributed systems (HDSs) are made of different resources with different capabilities, connected with each other by arbitrary networks, which helps to meet different user requirements. A heterogeneous MDS is a system in which some of the processes run on mobile hosts. Several issues and challenges still need to be addressed in the context of heterogeneous mobile distributed systems, such as resource sharing, openness, concurrency, scalability, fault tolerance, and transparency [2].

Fig 1. Heterogeneous Mobile distributed system

SURVEY OF VARIOUS FAULT TOLERANCE METHODS

Many methods exist for fault tolerance in distributed mobile heterogeneous systems. We can classify them into agent-dependent systems, cooperating witness agents, checkpointing, antecedence graph approaches, and load sharing techniques.

TECHNIQUE	DESCRIPTION OF TECHNIQUE
1. Cooperating witness agent method: In this method a new agent, known as a witness agent, is created to watch the actual working agent and check whether it is alive. Log information in stable storage is used for fault recovery.	In [4], the dependency of witness agents on each other is reduced by making the number of witness agents less than or equal to the number of servers, which helps to increase the reliability of the mobile agent. The scheme works at three levels: level0, level1, and level2. Level0 has no fault tolerance, so a failed server is never repaired. At level1, reliability decreases as the number of servers grows, because while the agent (on server Si) waits for the recovery of server Si+1, a failure may occur again on the server where the agent resides. At level2, the number of witness agents equals the number of servers, so the reliability of the mobile agent on all servers is 1; that is, the minimum number of witness agents equals the number of servers, which increases reliability and reduces faults.
2. Checkpointing scheme: Mobile computing raises many issues, such as lower throughput, latency, low bandwidth, lack of stable storage on mobile hosts, connection loss, and limited battery life, which conventional checkpointing protocols are not suited to handle. New checkpointing methods have therefore been developed.	A low-overhead checkpointing method [5] reduces the factors described above. A global checkpoint is attempted with a message total of 2*(m-1) + ndep + nopt, where ndep = 5% of n, nopt = 80% of ndep, and m << n (n = number of MHs, m = number of MSSs), thus reducing blocking, the number of communication messages, and the number of checkpoints. Another algorithm uses the 2PC protocol [6] to reach a stable termination state. 2PC works in two phases: voting and decision making. In the voting phase the coordinator sends a message to the participants asking whether to globally commit or abort (i.e. failure); in the decision phase all participants reply and the coordinator acts on the global decision, thus reaching a global termination state (commit or abort). After a failure is detected with 2PC, a checkpointing algorithm is used to reach a global state; it uses a marker to coordinate the different originating messages, and messages on each incoming channel are saved until a marker is received along that channel. This method helps to achieve reliability, making the system free from faults.
3. Antecedence graph based checkpointing and recovery: Graph-based fault tolerance protocols are used effectively in the execution of fault-tolerant distributed systems.	In [7], antecedence graphs and message logs are used to maintain fault tolerance information for mobile agents. Checkpointing is performed in parallel with maintenance of the antecedence graph for dependent agents or agents that are executing the process. When a failure occurs, the information from the graph and the log-based checkpoints is used to resume from a stable state. Simulation results indicate that checkpointing only the dependent agents, together with the use of the antecedence graph, gives better execution time and lower checkpointing time. In case of failure, recovery time is reduced considerably by using a minimum number of messages.
4. Lightweight fault-tolerance mechanism for distributed mobile agent-based monitoring: A number of mobile agent-based monitoring mechanisms have been established to monitor large-scale distributed systems.	The paper [8] uses this monitoring approach. First, it helps to detect the failure of a domain manager as quickly as possible and with low overhead, by transmitting messages to its immediate higher-level manager. Second, it helps to minimize the number of non-faulty managers that are affected by the failure of other managers. Third, it permits consistent failure detection actions, performed continuously during agent creation, termination, and migration, so that reliable takeover actions can be carried out when managers fail.
5. Modeling by groups for fault tolerance based on multi-agent systems: The aim of this method is to build fault-tolerant ad hoc networks to improve data availability. It is composed of four stages: clustering, decision, replication by prediction, and consistency.	The approach [9] forecasts a problem and delivers conclusions with respect to critical nodes. First, an algorithm is proposed to model a wireless ad hoc network by groups. Second, it studies fault tolerance by predicting disconnection and partition in the network. An approach is provided that distributes information efficiently in the network by selecting some objects of the network to hold replicas of the information.
6. Using host criticalities for fault tolerance	In [10], a new approach is described for building a more reliable mobile agent system. The authors propose introducing fault tolerance into a multi-agent system through checkpointing, based on periodically updating the weights used to calculate the dependence of hosts. From the experimental results it can be inferred that the proposed monitoring technique for multi-agent distributed applications may effectively increase the system's fault tolerance, besides effectively recognizing vulnerabilities in the system.
7. Load sharing approach for fault tolerance	This paper [11] addresses the problem of maximizing the reliability of a mobile HDS in which a random node can fail permanently. Load sharing policies are used to handle node failures. The approach is a two-phase hybrid. In the first phase, a candidate node is selected based on the requirements of the user and the capability of the computing device and communication link. In the second phase, load sharing is used to handle the execution of tasks. If a node fails before executing the tasks assigned to it, the remaining tasks are shifted to the next most reliable and cost-effective node. The whole procedure is repeated until the overall task is completed or cost-effective nodes are found.
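
Two pieces of the checkpointing row in the table above lend themselves to a short sketch: the message-total estimate quoted from [5] and the voting and decision phases of 2PC from [6]. The Python below is only an illustration under stated assumptions, not the code of either paper. The names `checkpoint_message_total`, `Participant`, `Vote`, and `two_phase_commit` are hypothetical, the `healthy` flag stands in for whatever local failure check a real participant would run, and n and m are assumed to be the numbers of mobile hosts and mobile support stations. The 2PC part follows the textbook form of the protocol, in which votes are collected first and the coordinator then announces the global decision.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "commit"
    ABORT = "abort"


def checkpoint_message_total(n: int, m: int) -> float:
    """Message total = 2*(m-1) + ndep + nopt, as quoted from [5].

    Assumption: n is the number of mobile hosts, m the number of
    mobile support stations (m << n).
    """
    ndep = n * 5 / 100       # ndep = 5% of n
    nopt = ndep * 80 / 100   # nopt = 80% of ndep
    return 2 * (m - 1) + ndep + nopt


class Participant:
    """Hypothetical participant; `healthy` models a local failure check."""

    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy
        self.state = "init"

    def vote(self) -> Vote:
        # Voting phase: answer the coordinator's request.
        return Vote.COMMIT if self.healthy else Vote.ABORT

    def apply(self, decision: Vote) -> None:
        # Decision phase: adopt the global decision.
        self.state = decision.value


def two_phase_commit(participants) -> Vote:
    """Coordinator logic: commit only if every participant votes COMMIT."""
    # Phase 1 (voting): collect one vote per participant.
    votes = [p.vote() for p in participants]
    # Phase 2 (decision): a single ABORT vote aborts globally.
    decision = Vote.COMMIT if all(v is Vote.COMMIT for v in votes) else Vote.ABORT
    for p in participants:
        p.apply(decision)
    return decision
```

With n = 1000 and m = 10, the estimate is 2*9 + 50 + 40 = 108 messages. In the 2PC sketch, a single unhealthy participant drives every participant to the abort state, which is the point at which the checkpoint-based recovery described in the table would take over.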

CONCLUSION

Real-time distributed systems such as grid, robotics, nuclear, and air traffic control systems are highly dependent on deadlines. Reviewing the techniques used for fault tolerance in MDS, it can be concluded that most of the methods are agent based, but if the agent itself becomes faulty it is difficult to overcome the failure. To achieve reliability, we can enhance the tolerance factor by dynamically allocating the tasks of the failed node to other nodes that are active at that time. This is the proposed future work, which I will try to achieve.
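
The proposed idea of moving a failed node's tasks to nodes that are currently active can be sketched in a few lines of Python. This is a minimal illustration and not an implementation from this paper: the function name `reallocate`, the dictionary layout, and the "least loaded active node first" rule are all assumptions chosen for the example, since the text does not specify a selection policy.

```python
def reallocate(assignments: dict, alive: set, failed: str) -> dict:
    """Return a new assignment with the failed node's tasks moved
    to active nodes.

    assignments: node name -> list of task names
    alive:       names of nodes that are active right now
    failed:      name of the node that has failed
    Assumption: each orphaned task goes to the active node that
    currently holds the fewest tasks.
    """
    # Copy every surviving node's task list; drop the failed node.
    new = {node: list(tasks) for node, tasks in assignments.items()
           if node != failed}
    active = [node for node in new if node in alive]
    if not active:
        raise RuntimeError("no active node available to take over tasks")
    for task in assignments.get(failed, []):
        # Pick the least loaded active node at this moment.
        target = min(active, key=lambda node: len(new[node]))
        new[target].append(task)
    return new
```

For example, if node C fails while holding two tasks and nodes A and B are alive, the two tasks are spread over A and B and no task is lost. A reliability or cost weight could replace the load-based rule without changing the overall structure.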

REFERENCES:

[1] Sarmistha Neogy, WTMR - A new Fault Tolerance Technique for Wireless and Mobile Computing Systems, IEEE, 2007.
[2] Andrew S. Tanenbaum and Maarten Van Steen, Distributed Systems: Principles and Paradigms, Pearson Prentice Hall, 2nd Edition, 2007.
[3] Sol M. Shatz, Jia-Ping Wang, and Masanori Goto, Task Allocation for Maximizing Reliability of Distributed Computer Systems.
[4] Sareh Beheshti, Ali Movaghar, Fault Tolerance in Mobile Agent Systems by Cooperating the Witness Agents, IEEE, 2006.
[5] Suparna Biswas (Saha), Sarmistha Neogy, A Low Overhead Checkpointing Scheme for Mobile Computing Systems, IEEE, 2007.
[6] Tejinder Thind, Rachit Garg, Uminder Kaur, Dinesh Kumar, Paradigms in Fault Tolerant Checkpointing Protocols in Distributed Mobile Systems, IEEE, 2012.
[7] Ramandeep Kaur, Rama Krishna Challa, Rajwinder Singh, Antecedence Graph based Checkpointing and Recovery for Mobile Agents, IEEE, 2010.
[8] Jinho Ahn, Lightweight Fault-tolerance Mechanism for Distributed Mobile Agent-based Monitoring, IEEE, 2008.
[9] Asma Insaf Djebbar, Ghalem Belalem, Modeling by groups for faults tolerance based on multi agent systems, IEEE, 2009.
[10] Rajwinder Singh, Mayank Dave, Using Host Criticalities for Fault Tolerance in Mobile Agent Systems, 2nd IEEE, 2012.
[11] Vinod Kumar Yadav, Mahendra Pratap Yadav, and Dharmendra Kumar Yadav, Reliable Task Allocation in Heterogeneous Distributed System with Random Node Failure: Load Sharing Approach, International Conference of Computing Science, 2012.
