Peer-to-Peer File Sharing Systems

DOI : 10.17577/IJERTCONV7IS05002


Jerin Jayaraj

Department of Computer Science and Engineering Federal Institute of Science And Technology (FISAT)® Mookkannoor, Angamaly

Paul Antony

Department of Computer Science and Engineering Federal Institute of Science And Technology (FISAT)® Mookkannoor, Angamaly

Abstract: The popularity of peer-to-peer systems has grown in recent years. This is because, compared with traditional systems, peer-to-peer systems offer greater scalability, availability, security, anonymity, and performance. This survey paper explores the design of such systems by examining different peer-to-peer architectures.

Keywords: Peer-to-peer file sharing, Farsite, Gnutella, FastTrack, BitTorrent

  1. INTRODUCTION

    Traditional online file sharing systems are usually client-server based [1]. These systems require numerous resources to set up and maintain, and they are usually not scalable, fault-tolerant, or secure. P2P systems offer a decentralized, self-sustained, scalable, fault-tolerant and symmetric network of machines that balances storage and bandwidth resources effectively [2].

    With advancements in technology, efforts to create a stable, reliable and efficient central storage system have grown. Experience has shown that a distributed approach is better for achieving these goals [2]. Important features of these distributed systems included a client-server architecture that was fundamental to their design, together with caching, replication and availability.

    In P2P systems, users exchange resources without an intermediary. The files are distributed among the member nodes instead of being concentrated at a single server. These systems depend on the voluntary participation of peers, whose contributed resources make up the infrastructure. The architecture of these systems should organize the peers so that they can cooperate to provide a useful service to the entire community of users [3].

    A pure peer-to-peer system is a distributed system without centralized control, where the software running at each node is equal in functionality. In such a system, the total workload of the network is distributed among its peers, reducing the strain on each machine. This survey paper explores the working of P2P systems by analysing the characteristics of some major distributed P2P file systems: Gnutella, FastTrack, Farsite and BitTorrent.

    The rest of the paper is organized as follows: Section 2 explores the design aspects and desired properties of distributed P2P file systems. Section 3 analyses and compares a few major P2P systems, followed by the conclusion in Section 4.

  2. DESIGN ASPECTS IN P2P FILE SYSTEMS

    Peer-to-peer systems have basic properties that separate them from conventional distributed systems. The architecture may differ from system to system, but they share common characteristics that affect their behaviour. This section discusses different design aspects of a P2P file system and the potential effect of each on performance.

    1. Operation with Unmanaged Volunteer Participants

      P2P systems work with the participation of voluntary nodes. The participation of a given node can neither be expected nor enforced. Each node is assumed to be prone to failure and may be removed from the system at any time, so the system must be able to handle the removal or failure of nodes at any moment.

    2. Load Balancing

      P2P systems must be able to balance the load on the network. The system should have mechanisms that distribute resources optimally based on the capability and availability of each node. The system must also prevent the creation of hotspots, where the load is disproportionately high.

    3. Scalability

      Scalability is the capability of the network to handle a growing number of peers, and it determines how large the system can grow. Traditional distributed systems usually do not scale beyond a few hundred or a few thousand nodes.

    4. Security

      We must design P2P systems to be secure against attacks and system failure. As these systems are made of unmanaged, geographically distributed hosts, we must build security measures into the system.

    5. Churn Protection

      Churn describes the oscillations in the P2P system caused by the rapid joining and leaving of nodes. Churn causes reduced performance in any distributed system. One form of a denial of service attack is to introduce churn in the system [2]. Hence, a P2P distributed file system should be able to resist the churn effect.

    6. Fast Resource Location

      As resources are distributed over the network, one of the important features of a P2P system is the ability to locate resources. The efficiency of the method used to locate resources is the deciding factor in the performance of the system. These methods must be able to adapt to different topologies.

      One of the strategies used in several systems is the Distributed Hash Table (DHT) [2]. It hashes the file or resource name to locate the object.
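The idea of hashing a name to find the responsible node can be illustrated with a small sketch. This is not the lookup scheme of any specific system surveyed here; the `HashRing` class, the `ring_position` helper, the node names and the 2**32 ring size are all assumptions chosen for illustration.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Hash a node or file name to a position on a 2**32 ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    """Toy DHT: a key belongs to the first node at or after its ring position."""

    def __init__(self, nodes):
        # Sort nodes by ring position so lookups can use binary search.
        self._ring = sorted((ring_position(n), n) for n in nodes)
        self._positions = [pos for pos, _ in self._ring]

    def lookup(self, filename: str) -> str:
        """Return the node responsible for `filename`."""
        idx = bisect.bisect_left(self._positions, ring_position(filename))
        # Wrap around to the first node when the key hashes past the last one.
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])  # hypothetical node names
owner = ring.lookup("report.pdf")
print(owner)  # the same name always maps to the same node for a fixed node set
```

Because every peer computes the same hash, any peer can work out which node to ask without a central directory. Real DHTs add routing tables so that each node only needs to know a few others.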

    7. Decentralization

      P2P systems are decentralized by nature. We must design mechanisms supporting distributed storage, processing, information sharing, etc. as part of the system. As the system is decentralized, system behaviour no longer remains deterministic and getting a global view of the system becomes difficult.

  3. ANALYSIS OF EXISTING SYSTEMS

    Designing a P2P file system that implements all the properties described in Section 2 is exceedingly difficult. In this section, we analyse the working of P2P systems such as Gnutella, FastTrack, Farsite and BitTorrent to understand how they achieve these goals.

    1. Farsite

      Farsite stands for Federated, Available, and Reliable Storage for an Incompletely Trusted Environment [4]. Farsite was developed by Microsoft Research in the early 2000s. It is designed as a serverless distributed file system that logically acts as a centralized NTFS file system [5] but is physically distributed among a network of untrusted desktop workstations. Farsite has practically no central administration, and only minimal administrative effort is required to configure it initially. It is designed for the file I/O workload of academic and corporate environments rather than the high-performance I/O workload of scientific applications.

      1. Mechanism: Every machine in Farsite may perform three roles: client, member of a directory group, and file host. A client is a machine that interacts directly with the user. A file host is a machine that stores a particular file. A directory group is a group of machines that manages the file information in a directory. Farsite uses a hierarchical namespace. It does not restrict the system to a single root but allows multiple roots, each of which can be regarded as the name of a virtual file server. When a client requests a file, the directory group that manages the file issues a lease to the client, granting it access to the file for a specified period, so a client with an active lease and a cached copy of a file can perform operations entirely locally. After the client has made changes, it delays pushing updates to the directory group. Because most file writes are deleted or overwritten shortly after they occur, this delay helps reduce network traffic.
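The lease idea can be sketched as follows. This is a minimal illustration, not Farsite's actual data structures: the `Lease` and `Client` classes, the `pending` buffer for deferred updates and the numeric clock are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    path: str
    expires_at: float  # seconds on some shared clock (assumption)

    def is_active(self, now: float) -> bool:
        return now < self.expires_at


class Client:
    def __init__(self):
        self.leases = {}   # path -> Lease issued by the directory group
        self.cache = {}    # path -> cached file contents
        self.pending = {}  # path -> update not yet pushed to the directory group

    def accept_lease(self, lease: Lease, contents: bytes) -> None:
        self.leases[lease.path] = lease
        self.cache[lease.path] = contents

    def can_work_locally(self, path: str, now: float) -> bool:
        # Local operation needs both an active lease and a cached copy.
        lease = self.leases.get(path)
        return lease is not None and lease.is_active(now) and path in self.cache

    def write(self, path: str, contents: bytes, now: float) -> bool:
        """Apply a write locally and defer the update; False means a new lease is needed."""
        if not self.can_work_locally(path, now):
            return False
        self.cache[path] = contents
        self.pending[path] = contents  # a later write replaces the deferred one
        return True


client = Client()
client.accept_lease(Lease("/docs/a.txt", expires_at=100.0), b"v1")
client.write("/docs/a.txt", b"v2", now=10.0)
client.write("/docs/a.txt", b"v3", now=20.0)
print(client.pending)  # only the latest version is waiting to be pushed
```

The second write replaces the first in `pending`, which is the traffic saving that delayed updates aim for: a short-lived intermediate version never has to cross the network.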

        The clients encrypt written file data with the public keys of all authorized readers to provide read-access control, and the directory group enforces write-access control by cryptographically validating requests from users before accepting updates. Farsite handles faults by replicating files in a Byzantine Fault Tolerant (BFT) manner. BFT allows the system to continue functioning correctly as long as fewer than a third of the machines it is running on are faulty [5].
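The "fewer than a third" condition translates into simple arithmetic: a group of n machines tolerates f faulty ones only if 3f < n. The helper names below are hypothetical and the sketch only checks the bound; it does not implement a BFT protocol.

```python
def tolerates(group_size: int, faulty: int) -> bool:
    """True if strictly fewer than one third of the machines are faulty."""
    return 3 * faulty < group_size


def max_faulty(group_size: int) -> int:
    """Largest number of faulty machines a group of this size can tolerate."""
    return max(0, (group_size - 1) // 3)


for n in (3, 4, 7, 10):
    print(n, max_faulty(n))
```

A group of 4 machines tolerates 1 faulty member and a group of 7 tolerates 2, while a group of 3 tolerates none.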

      2. Advantages:

        • Provides security by using cryptography and only gives access to authorised users.

        • Designed to work as file I/O in academic and corporate environments.

        • Provides fault-tolerance using file replication.

        • Minimal administrative effort for initial configuration and almost no central administration to maintain.

        • Designed to be scalable up to 10^5 machines.

      3. Disadvantages:

      • Farsite uses a lazy update scheme. The content of newly written files will briefly live on only one machine. Loss of that machine will result in loss of the update.

      • It cannot tolerate one-third or more of its machines being malicious nodes.

      • It does not address rapid membership changes.

      • It is not designed for high bandwidth applications.

    2. Gnutella

      Gnutella was the first fully peer-to-peer (P2P) distributed system. The main aim in creating Gnutella was to eliminate servers and use clients instead to search for and retrieve messages. Because the clients also act as servers, Gnutella clients are called servents.

      1. Mechanism: In Gnutella, peers communicate directly with each other. Peers store files and peer pointers, which point to their neighbour peers in the network. The Gnutella system forms an overlay network on top of the Internet. To search for a file, it routes messages within the overlay graph using five message types:

        1. Query

        2. Query Hit

        3. Ping

        4. Pong

        5. Push

        A peer floods the Query message to all its neighbouring peers except the one from which it received the message, and forwarding continues until the TTL (Time-To-Live) value reaches zero. Query messages are therefore TTL-restricted, and each peer forwards a given query only once to avoid duplicate transmission. When a peer finds that it has a file matching the incoming keywords, it creates a Query Hit message, which is routed back along the reverse path. Hence, the querying peer learns which peers have the corresponding file. After receiving the Query Hit message, the requesting peer sends an HTTP GET request to the responder's IP address and port number. The responder replies with an HTTP OK and then sends the file packets.
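TTL-limited flooding can be simulated on a small overlay graph. The sketch below is an illustration under simplifying assumptions: the function name `flood_query`, the dictionary-based overlay and the substring keyword match are hypothetical, and Query Hits are simply collected instead of being routed back hop by hop.

```python
from collections import deque


def flood_query(overlay, shared_files, origin, keyword, ttl):
    """Return the peers that would answer a query flooded from `origin`."""
    seen = {origin}                      # each peer handles a given query once
    hits = set()
    queue = deque([(origin, None, ttl)])
    while queue:
        peer, sender, remaining = queue.popleft()
        if peer != origin and any(keyword in name for name in shared_files.get(peer, ())):
            hits.add(peer)               # this peer would send a Query Hit
        if remaining == 0:
            continue                     # TTL exhausted: do not forward further
        for neighbour in overlay.get(peer, ()):
            # Never send the query back to the sender, and skip duplicates.
            if neighbour == sender or neighbour in seen:
                continue
            seen.add(neighbour)
            queue.append((neighbour, peer, remaining - 1))
    return hits


# Hypothetical overlay: a chain A - B - C - D.
overlay = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
shared_files = {"C": ["song.mp3"], "D": ["song-live.mp3"]}

print(flood_query(overlay, shared_files, "A", "song", ttl=2))  # {'C'}
print(sorted(flood_query(overlay, shared_files, "A", "song", ttl=3)))  # ['C', 'D']
```

With a TTL of 2 the query never reaches peer D, which shows why a file that exists in the network can still go unfound.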

        If the responder is behind a firewall, the requesting peer sends a Push message through the overlay, and the Push message is routed along the reverse path of the Query Hit. When the responder gets the Push message, it can open an outgoing TCP connection to the requester.

        Ping and Pong messages are used by the peers to update their neighbour list which is done periodically. When a peer receives a Ping message, it responds in reverse route using a Pong message [3].

      2. Advantages:

        • Transfer of data is available under most firewall systems.

        • Flexibility in Query processing.

      3. Disadvantages:

      • A file request has a limited lifespan, so there is no guarantee that the file you want is on any of the machines your request reaches.

      • It takes time to get the complete response to file request queries.

    3. FastTrack

      FastTrack is a proprietary peer-to-peer (P2P) protocol that was used by Kazaa, Grokster and iMesh [6]. It is a hybrid between Gnutella and Napster. In 2003, an open-source alternative called OpenFT was developed by the giFT project through reverse engineering.

      1. Mechanism: Active clients in the network are called supernodes and are used to store directory information as (filename, peer pointer) listings, similar to Napster servers. A peer can become a supernode if it gains enough reputation [2]. The reputation of a peer is based on its bandwidth and on how long it has been in the network.

        To ensure the constant availability of the network, dedicated peers are needed to monitor and keep track of the network; such peers are called bootstrapping nodes. When a peer joins the network, it first contacts a bootstrapping node, which determines whether the peer is a client or a supernode.

        A peer searches by contacting a nearby supernode. When the peer sends out a request for a file, the request is funnelled through the supernode. The supernode communicates with other supernodes, which are connected to regular nodes that are in turn connected to even more regular nodes. The search request extends seven levels into the network before it stops propagating. Once the correct file has been located, it is transferred directly from the file owner to the requester.
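A supernode search can be sketched as a bounded walk over supernode directories. FastTrack is proprietary, so this is not its wire protocol: the `Supernode` class, the `search` function and the reading of "seven levels" as seven supernode hops are assumptions made for illustration.

```python
class Supernode:
    """Hypothetical supernode holding (filename, peer pointer) directory entries."""

    def __init__(self, name):
        self.name = name
        self.index = {}       # filename -> set of peer pointers
        self.neighbours = []  # other supernodes this one talks to

    def register(self, peer, filenames):
        """Record the files a regular peer shares."""
        for filename in filenames:
            self.index.setdefault(filename, set()).add(peer)


def search(start, filename, max_levels=7):
    """Collect owners of `filename` within `max_levels` supernode hops of `start`."""
    seen = {start.name}
    frontier = [start]
    owners = set()
    for _ in range(max_levels + 1):
        next_frontier = []
        for node in frontier:
            owners |= node.index.get(filename, set())
            for neighbour in node.neighbours:
                if neighbour.name not in seen:
                    seen.add(neighbour.name)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return owners


# Three supernodes in a chain; only the last one knows a peer with the file.
sn1, sn2, sn3 = Supernode("sn1"), Supernode("sn2"), Supernode("sn3")
sn1.neighbours, sn2.neighbours, sn3.neighbours = [sn2], [sn1, sn3], [sn2]
sn3.register("peer-42", ["talk.ogg"])

print(search(sn1, "talk.ogg"))  # {'peer-42'}
```

The search returns peer pointers only; the file itself would then be fetched directly from the owning peer, as in the description above.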

      2. Advantages:

        • Systems with less bandwidth can also search files faster as the supernode does the search.

        • No centralised server like in the case of Napster.

      3. Disadvantages:

      • Supernodes are points of vulnerability whose failure can bring down the entire system.

    4. BitTorrent

    Among the P2P file-sharing systems in existence, BitTorrent is one of the few that has attracted millions of users. The basic idea is to ignore search and focus on efficient fetching, which lets it handle flash crowds. BitTorrent gives peers incentives to take part and to share bandwidth. Files are split into blocks, typically 256 KB in size [8]. A peer connects to several other peers and downloads the file from more than one of them simultaneously.

    1. Mechanism: BitTorrent relies on centralized tracker software. Trackers coordinate the file transfer and keep track of the peers in the swarm. There is one tracker per file. Peers are of two types: a seed, which has the whole file, and a leecher, which has only some blocks of the file. A new peer that joins the network is also a leecher, even though it has no portion of the file yet. A peer that wants to download a file connects to the tracker, and the tracker returns a list of peers that have the file. To select which block to download, and from which peer, the peer uses the Local Rarest First block policy, which favours the blocks that are least replicated among its neighbours.
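Local Rarest First can be sketched as a count over the blocks that neighbours advertise. The function name `rarest_first`, the dictionary layout and the lowest-index tie-break are assumptions made here; the source does not specify how ties are resolved.

```python
from collections import Counter


def rarest_first(have, neighbour_blocks):
    """Pick the missing block held by the fewest neighbours, or None if nothing is missing."""
    counts = Counter()
    for blocks in neighbour_blocks.values():
        counts.update(blocks)
    candidates = [block for block in counts if block not in have]
    if not candidates:
        return None
    # Fewest copies first; ties go to the lowest block index (assumption).
    return min(candidates, key=lambda block: (counts[block], block))


have = {0}
neighbour_blocks = {
    "p1": {0, 1, 2},
    "p2": {1, 2},
    "p3": {2, 3},
}
print(rarest_first(have, neighbour_blocks))  # 3
```

Block 3 is chosen because only one neighbour holds it, whereas blocks 1 and 2 are held by two and three neighbours. Fetching rare blocks first keeps them from disappearing if their few holders leave the swarm.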

      The choking algorithm is used to select which peer to contact for a particular piece of the file. BitTorrent uses a tit-for-tat strategy: a peer uploads to the peers that are uploading to it [8]. This results in connections actively transferring in both directions. After a peer becomes a seeder, it uploads to the peers that have higher upload rates. This is done to ensure that there will be more seeders in the future.
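The tit-for-tat rule can be sketched as ranking peers by how fast they are currently uploading to us and serving only the best of them. The function name `choose_unchoked` and the default of four upload slots are assumptions, and the sketch leaves out refinements that real clients use, such as periodically trying a random peer.

```python
def choose_unchoked(rates_from_peers, slots=4):
    """Return the peers to upload to: those currently uploading to us fastest."""
    # Sort by descending rate; the peer name breaks ties deterministically.
    ranked = sorted(rates_from_peers, key=lambda peer: (-rates_from_peers[peer], peer))
    return set(ranked[:slots])


# Hypothetical measured download rates from each peer, in KB/s.
rates_from_peers = {"p1": 50, "p2": 10, "p3": 0, "p4": 80, "p5": 30}
print(sorted(choose_unchoked(rates_from_peers, slots=2)))  # ['p1', 'p4']
```

Peer p3 uploads nothing and is never selected here, which is how the strategy discourages freeloaders.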

    2. Advantages:

      • Encourages peers to share resources, discourages freeloaders.

      • Can resume partially downloaded files.

    3. Disadvantages:

      • Works well for hot content, not so much for obscure content.

      • Single point of failure (tracker).

  4. CONCLUSION

P2P systems are a good alternative to traditional file transfer systems. Their popularity suggests that it is possible to design a system with the cooperation of multiple unrelated peers working together to achieve a common goal. Through our study, we were able to understand different techniques that can be used to design a P2P system. We were able to understand the merits and demerits of each design feature in different scenarios. In future work, we plan to implement a file sharing system designed using features of popular P2P systems.

TABLE I

COMPARISON OF DIFFERENT PEER-TO-PEER FILE SYSTEMS

| System | Location Scheme | Load Balancing | Scalability | Encryption | Adaptability | Anonymity | Stored | Read/Write |
|---|---|---|---|---|---|---|---|---|
| Farsite | Byzantine Groups | Yes | 10^5 hosts | Yes | Yes | No | File | Read/Write |
| FastTrack | Super Nodes | Yes | Yes | Partial (MetaData) | Yes | Yes | Chunks | Read Only |
| BitTorrent | Global Components | Yes | Yes | No | Yes | Yes | Chunks | Read Only |
| Gnutella | Querying | Yes | Yes | No | N/A | No | File | Read Only |

ACKNOWLEDGMENT

We thank the Cognitive Computing Research Center (CCRC), FISAT for the constant support and guidance for our work. We would also like to show our gratitude to Mr Pankaj Kumar G., who always guided us in all the phases of our project.

REFERENCES

    1. Client-server model, En.wikipedia.org, 2019. [Online]. Available: https://en.wikipedia.org/wiki/Client-server_model.

    2. R. Hasan, Z. Anwar, W. Yurcik, L. Brumbaugh and R. Campbell, A survey of peer-to-peer storage techniques for distributed file systems, International Conference on Information Technology: Coding and Computing (ITCC'05) - Volume II, Las Vegas, NV, 2005, pp. 205-213 Vol. 2. doi: 10.1109/ITCC.2005.42

    3. Stefan Saroiu, P. Krishna Gummadi, Steven D. Gribble, Measurement study of peer-to-peer file sharing systems, Proc. SPIE 4673, Multimedia Computing and Networking 2002.

    4. Atul Adya, William J. Bolosky, Miguel Castro, Gerald Cermak, Ronnie Chaiken, John R. Douceur, Jon Howell, Jacob R. Lorch, Marvin Theimer, and Roger P. Wattenhofer. 2002. Farsite: federated, available, and reliable storage for an incompletely trusted environment. SIGOPS Oper. Syst. Rev. 36, SI (December 2002), pp. 1-14. DOI: https://doi.org/10.1145/844128.844130

    5. William J. Bolosky, John R. Douceur, and Jon Howell. 2007. The Farsite project: a retrospective. SIGOPS Oper. Syst. Rev. 41, 2 (April 2007), pp. 17-26. DOI: http://dx.doi.org/10.1145/1243418.1243422

    6. Thomas Karagiannis, Andre Broido, Nevil Brownlee, kc claffy and Michalis Faloutsos. File-sharing in the Internet: A characterization of P2P traffic in the backbone. (2003). Available: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tech.pdf

    7. Dongyu Qiu and R. Srikant. 2004. Modeling and performance analysis of BitTorrent-like peer-to-peer networks. In Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications (SIGCOMM '04). ACM, New York, NY, USA, 367-378. DOI: https://doi.org/10.1145/1015467.1015508

    8. Johan Pouwelse, Paweł Garbacki, Dick Epema, and Henk Sips. 2005. The BitTorrent P2P file-sharing system: measurements and analysis. In Proceedings of the 4th international conference on Peer-to-Peer Systems (IPTPS'05), Miguel Castro and Robbert van Renesse (Eds.). Springer-Verlag, Berlin, Heidelberg, 205-216. DOI: http://dx.doi.org/10.1007/11558989_19
