Innovative Fault Tolerance Mechanism for Cloud Storage using Network Coding

DOI : 10.17577/IJERTCONV3IS14035


Hemanth Kumar A

M.Tech IV Semester, CSE, Sai Vidya Institute of Technology

Bangalore, India

Sreelatha P K

Assistant Professor, Department of CSE, Sai Vidya Institute of Technology, Bangalore, India

Abstract: Cloud storage is a service model for data storage that provides remote backup and data synchronization facilities. Cloud storage providers are responsible for keeping the data available and accessible. However, storing data in a single cloud raises concerns such as cloud outages and vendor lock-in. One solution to this problem is multi-cloud storage: a proxy/gateway server is deployed between users and multiple clouds, and each file is striped across the clouds to provide fault tolerance and redundancy. The proxy/gateway is built on the concept of network coding, and a new type of regenerating code called the FMSR (functional minimum-storage regenerating) code is applied to the storage nodes. Like erasure codes, FMSR codes provide a double-fault-tolerant mechanism. However, FMSR codes are simpler and offer high data availability, and they are called minimum-storage regenerating codes because the data occupies very little storage space. Integrity is checked by generating a MAC for the uploaded file and comparing it with the MAC of the downloaded file. An iterative repair operation is performed to regenerate lost data blocks.

Keywords: Regenerating codes, network coding, fault tolerance, repair

  1. INTRODUCTION

    In recent years, cloud storage has played an important role in backing up data for growing and emerging information technology. One of the main advantages of the cloud is the flexibility it provides for sharing and editing files. However, storing data in a single cloud raises two concerns: cloud outage, where a cloud failure may result in the loss of valuable data, and vendor lock-in, a situation in which a customer of one service cannot easily move to the services of another provider. There are many real-life examples of cloud failures; some of them are shown in Table I.

    TABLE I. Examples of transient failures in different cloud services

    | Cloud service | Failure reason          | Duration   | Date               |
    |---------------|-------------------------|------------|--------------------|
    | Google Gmail  | Software bug            | 4 days     | Feb 27-Mar 2, 2011 |
    | Google Search | Programming error       | 45 minutes | Jan 31, 2009       |
    | Amazon        | Gossip protocol blow-up | 6-8 hours  | July 20, 2008      |

    The storage overhead of replication is high, which makes it very expensive. Two kinds of failures are observed: transient and permanent. A cloud typically recovers from a transient failure within hours, although it may take a day to recover the lost data, whereas after a permanent failure the stored data is permanently unavailable. To overcome the cost of replication, erasure codes emerged. However, erasure codes offer lower data availability and are complicated to implement, since they involve Galois field GF(2^x) arithmetic, and a single-parity configuration tolerates only one fault.

    A new type of code called regenerating codes [5] has emerged. These codes maintain the same storage size as erasure codes while being double-fault tolerant and offering very high data availability. FMSR codes are built on the concept of network coding and preserve its benefits: they stripe data across multiple clouds and activate a repair operation whenever a cloud fails. FMSR codes are well suited to long-term archival of data, since archival involves infrequent read and write operations. Data integrity is checked using a simple hash algorithm. Figure 1 shows the normal and repair operations.

    Fig 1. a) Normal operation b) Repair operation


  2. BACKGROUND

    1. Replication

      Replication [4] is one of the simplest and most commonly used techniques to provide fault tolerance and maintain redundancy. The availability and communication efficiency of replication are high.

      However, its storage efficiency is poor, and since cloud storage providers also charge for inbound data, storing full copies is costly. To achieve double-fault tolerance, two replicas in addition to the original are needed. Therefore, replication is the most expensive option.
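As a rough illustration of the cost argument above, the sketch below compares the storage overhead of replication with that of an (n, k) maximum distance separable (MDS) erasure code. The function names and the choice n = 4, k = 2 are assumptions made for illustration; they are not taken from the paper.

```python
from fractions import Fraction


def replication_overhead(faults_tolerated: int) -> Fraction:
    """Stored bytes per byte of user data when full copies are kept."""
    # Surviving f failures requires f + 1 complete copies of the data.
    return Fraction(faults_tolerated + 1)


def mds_overhead(n: int, k: int) -> Fraction:
    """Stored bytes per byte of user data for an (n, k) MDS erasure code."""
    # Any k of the n nodes can rebuild the file, so n - k failures are tolerated.
    return Fraction(n, k)


print(replication_overhead(2))  # 3
print(mds_overhead(4, 2))       # 2
```

Under these assumptions, both schemes tolerate two failures, but replication stores three times the data while the (4, 2) code stores twice the data.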

    2. Erasure coding

      Erasure coding [2] maintains redundancy of data across multiple cloud storage nodes. Erasure codes based on Reed-Solomon codes [3] stripe data across multiple clouds and activate a repair operation to recover lost data. However, they are complex to implement, since recovering from a two-cloud failure involves Galois field GF(2^x) arithmetic. Figure 2 shows the erasure coding scheme.

      Fig. 2. Traditional erasure code based on Reed-Solomon codes
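To give a sense of the Galois field arithmetic that Reed-Solomon coding relies on, here is a minimal sketch of multiplication in GF(2^8). The field size (8 bits), the reduction polynomial 0x11D, and the function name are assumptions chosen for illustration; the paper does not specify them.

```python
def gf256_mul(a: int, b: int, poly: int = 0x11D) -> int:
    """Multiply two bytes in GF(2^8) by shift-and-add, reducing modulo `poly`."""
    result = 0
    while b:
        if b & 1:
            result ^= a   # addition in a binary field is XOR
        a <<= 1
        if a & 0x100:
            a ^= poly     # reduce once the degree reaches 8
        b >>= 1
    return result


print(hex(gf256_mul(2, 0x80)))  # 0x1d
```

Unlike plain XOR, every multiplication needs a loop or a lookup table, which is part of the implementation complexity that the text attributes to erasure codes.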

  3. CONTRIBUTION OF THE PAPER

    This paper presents the design of the FMSR code, which is double-fault tolerant and has the same storage cost as traditional erasure codes based on Reed-Solomon codes. FMSR codes are also called minimum-storage regenerating codes, since they use much less storage space than replication and therefore require less bandwidth. The code is built on the concept of network coding and preserves its benefits [1] [6] while eliminating the need for storage nodes to perform encoding operations; all encoding is performed by the proxy server. FMSR codes are designed mainly for long-term archival applications, such as backups, in which data is rarely read. They allow enterprises and organizations to store data redundantly with fault tolerance. Figure 3 shows the FMSR architecture.

    Fig 3. FMSR Code

    The original file is divided into four native chunks A, B, C, and D, and different XOR combinations of these chunks, called code chunks, are generated. As the figure shows, the data in node one is regenerated by XORing the data in node two with the data in node three, which restores the lost data.
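The XOR-based regeneration described above can be sketched as follows. Because the figure is not reproduced in the text, the chunk layout used here (two chunks per node across four nodes) is an assumption, chosen so that node one equals the XOR of node two and node three; the function names are likewise illustrative.

```python
def xor(*blocks: bytes) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def encode(a: bytes, b: bytes, c: bytes, d: bytes) -> dict:
    """Assumed layout: each of the four nodes stores two chunks."""
    return {
        1: (a, b),
        2: (c, d),
        3: (xor(a, c), xor(b, d)),
        4: (xor(a, d), xor(b, c, d)),
    }


def regenerate_node1(node2: tuple, node3: tuple) -> tuple:
    """Rebuild node one by XORing node two with node three, chunk by chunk."""
    c, d = node2
    a_xor_c, b_xor_d = node3
    return (xor(c, a_xor_c), xor(d, b_xor_d))


nodes = encode(b"AAAA", b"BBBB", b"CCCC", b"DDDD")
assert regenerate_node1(nodes[2], nodes[3]) == nodes[1]
```

With this assumed layout, the four native chunks can be recovered from any two of the four nodes, which corresponds to the double-fault tolerance claimed for the code.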

  4. PROPOSED METHODOLOGY

    The proposed system is a proxy-based storage system that provides fault-tolerant storage over multiple cloud storage providers. The proxy/gateway server interconnects different clouds and transparently stripes data across them. On top of the proxy, we propose the first implementable design for functional minimum-storage regenerating (FMSR) codes. Our FMSR code implementation maintains double-fault tolerance, has the same storage cost as traditional erasure coding schemes, and performs single-fault recovery during a single-cloud failure. FMSR codes can also find applications in general distributed storage systems where storage nodes are prone to failures and network transmission bandwidth is limited. They also offer high data availability. Figure 4 shows the functional architecture of the system.

    Fig. 4 Functional Architecture

    The proxy acts as an interface between the user and the clouds, and it interconnects multiple clouds.

    The basic operations involved are as follows:

      1. File upload

        To upload a file F, the proxy/gateway server first divides it into four equal-size native chunks and encodes them to form code chunks. It also generates a MAC (message authentication code) for each of the four native chunks. Each node holds two blocks, giving eight data blocks in total.
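A minimal sketch of the chunking and MAC steps of the upload is given below. The paper does not name the MAC algorithm or say how a file whose size is not a multiple of four is handled, so HMAC-SHA256, zero padding, and all function names here are assumptions.

```python
import hashlib
import hmac
from typing import List, Tuple


def split_into_native_chunks(data: bytes, k: int = 4) -> List[bytes]:
    """Split data into k equal-size native chunks, zero-padding the last one."""
    chunk_len = (len(data) + k - 1) // k  # ceiling division
    padded = data.ljust(chunk_len * k, b"\0")
    return [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]


def mac_chunks(chunks: List[bytes], key: bytes) -> List[bytes]:
    """One HMAC-SHA256 tag per native chunk (the MAC algorithm is an assumption)."""
    return [hmac.new(key, chunk, hashlib.sha256).digest() for chunk in chunks]


def prepare_upload(data: bytes, key: bytes) -> Tuple[List[bytes], List[bytes], int]:
    """Return the native chunks, their MACs, and the original file length.

    The length is kept so that the padding can be stripped on download.
    """
    chunks = split_into_native_chunks(data)
    return chunks, mac_chunks(chunks, key), len(data)


chunks, macs, length = prepare_upload(b"hello world", key=b"demo-key")
print(chunks)  # [b'hel', b'lo ', b'wor', b'ld\x00']
```

The proxy would then encode the native chunks into code chunks and send two blocks to each node; that step is omitted here.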

      2. File download

        The user selects the file to be downloaded, and the proxy/gateway server checks the status of the clouds. If the clouds are active, all four data blocks are downloaded, and a MAC is generated for each and compared with the previously generated MACs to ensure the integrity of the data.
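The download path can be sketched as a status check followed by MAC verification. In this simplified model each cloud is a dictionary holding an "active" flag and one native chunk; that representation, the HMAC-SHA256 choice, and the function name are assumptions for illustration only.

```python
import hashlib
import hmac
from typing import Dict, List


def download_file(clouds: List[Dict], expected_macs: List[bytes],
                  key: bytes, original_length: int) -> bytes:
    """Fetch one chunk per cloud, verify each MAC, and reassemble the file."""
    # Step 1: check cloud status before downloading anything.
    if not all(cloud["active"] for cloud in clouds):
        raise RuntimeError("a cloud is unavailable; repair is needed first")
    # Step 2: download the chunks and recompute their MACs.
    chunks = [cloud["chunk"] for cloud in clouds]
    for index, (chunk, expected) in enumerate(zip(chunks, expected_macs)):
        actual = hmac.new(key, chunk, hashlib.sha256).digest()
        # compare_digest avoids leaking information through timing.
        if not hmac.compare_digest(actual, expected):
            raise ValueError(f"integrity check failed for chunk {index}")
    # Step 3: join the chunks and strip any padding added at upload time.
    return b"".join(chunks)[:original_length]


key = b"demo-key"
stored = [b"hel", b"lo ", b"wor", b"ld\x00"]
macs = [hmac.new(key, c, hashlib.sha256).digest() for c in stored]
clouds = [{"active": True, "chunk": c} for c in stored]
print(download_file(clouds, macs, key, 11))  # b'hello world'
```

A chunk that was modified in storage produces a different MAC, so the function raises an error instead of returning corrupted data.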

      3. Iterative repairs

        An iterative repair operation is performed whenever a cloud failure occurs. The repair operation retrieves data blocks from the surviving clouds and regenerates the lost data by XORing the surviving blocks. All operations are performed by the proxy server.
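The repair loop can be sketched as below, with one node failing per round and the proxy rebuilding it from the survivors by XOR. The four-node layout is an assumption, since the figure is not reproduced in the text. This sketch also restores the exact lost blocks, which is a simplification: the conclusion notes that the FMSR repair regenerates new parity chunks.

```python
from typing import Dict, Tuple

Node = Tuple[bytes, bytes]


def xor(*blocks: bytes) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def repair(survivors: Dict[int, Node], failed: int) -> Node:
    """Rebuild the two blocks of the failed node from surviving nodes.

    Assumed layout: node 1 = (A, B), node 2 = (C, D),
    node 3 = (A^C, B^D), node 4 = (A^D, B^C^D).
    """
    if failed in (1, 2, 3):
        # Nodes 1, 2 and 3 satisfy node1 XOR node2 == node3 block by block,
        # so any one of them is the XOR of the other two.
        x, y = (survivors[i] for i in (1, 2, 3) if i != failed)
        return (xor(x[0], y[0]), xor(x[1], y[1]))
    if failed == 4:
        (a, b), (c, d) = survivors[1], survivors[2]
        return (xor(a, d), xor(b, c, d))
    raise ValueError(f"unknown node {failed}")


a, b, c, d = b"AAAA", b"BBBB", b"CCCC", b"DDDD"
nodes = {1: (a, b), 2: (c, d), 3: (xor(a, c), xor(b, d)), 4: (xor(a, d), xor(b, c, d))}
for failed in (1, 2, 3, 4):      # one failure per repair round
    lost = nodes.pop(failed)     # simulate the cloud failure
    nodes[failed] = repair(nodes, failed)
    assert nodes[failed] == lost
```

In every round the storage nodes only serve their stored blocks; all XOR computation happens in the proxy, as the text describes.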

  5. CONCLUSION

We present a proxy-based, multiple-cloud storage system that practically addresses the reliability of today's cloud backup storage. The proxy/gateway server not only provides fault tolerance in storage but also allows cost-effective repair when a cloud permanently fails. The proxy implements a practical version of FMSR codes, which regenerate new parity chunks during repair subject to the required degree of data redundancy. The FMSR code implementation eliminates the encoding requirement of storage nodes (or clouds) during repair, while ensuring that the new set of stored chunks after each round of repair preserves the required fault tolerance. FMSR codes also avoid the complexities of Galois field arithmetic. The proxy/gateway server prototype shows the effectiveness of FMSR codes for cloud backup, in terms of monetary cost and response time.

REFERENCES

  1. A.G. Dimakis, K. Ramchandran, Y. Wu, and C. Suh, "A Survey on Network Codes for Distributed Storage," Proc. IEEE, vol. 99, no. 3, pp. 476-489, Mar. 2011.

  2. "Erasure Coding and Cloud Storage Eternity," Wikibon, 2011.

  3. J.S. Plank, "A Tutorial on Reed-Solomon Coding for Fault-Tolerance in RAID-Like Systems," Software: Practice & Experience, vol. 27, no. 9, pp. 995-1012, Sept. 1997.

  4. H. Weatherspoon and J.D. Kubiatowicz, "Erasure Coding vs. Replication: A Quantitative Comparison," in IPTPS, 2002.

  5. A. Duminuco and E. Biersack, "A Practical Study of Regenerating Codes for Peer-to-Peer Backup Systems," in 29th IEEE International Conference on Distributed Computing Systems (ICDCS '09), pp. 376-384, 2009.

  6. A.G. Dimakis, P.B. Godfrey, Y. Wu, M.J. Wainwright, and K. Ramchandran, "Network Coding for Distributed Storage Systems," IEEE Transactions on Information Theory, vol. 56, no. 9, pp. 4539-4551, 2010.
