Analysis and Comparison of NOSQL's Key-Value & Document-Oriented Databases

Tushar Seth; Bhawna Minocha

doi:10.17577/IJERTV3IS040926

Volume 03, Issue 04 (April 2014)


Open Access
Paper ID : IJERTV3IS040926
Published (First Online): 17-04-2014
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Tushar Seth

Amity Institute of Information Technology, Amity University, Noida, India

Bhawna Minocha (Guide)

Amity Institute of Information Technology, Amity University, Noida, India

Abstract: For many years, relational databases have been used to organize data in the form of tables, or relations. With the growth of data along the four Vs (Volume, Variety, Velocity, and Veracity), a new class of databases named NoSQL (Not Only SQL) has been introduced. NoSQL databases are now widely used by many big companies for storing and managing data. They overcome a few drawbacks of SQL databases and, for some workloads, are more flexible and better optimized. In this paper, we address the advantages and disadvantages of two widely used NoSQL storage techniques, key-value and document-oriented databases, and compare their characteristics.

We compare and discuss the two techniques, key-value stores and document-oriented databases, on different parameters. Each NoSQL technique has its own advantages and uses. We also discuss a few differences between RDBMS and NoSQL databases.

Keywords- NOSQL, Big data, Key-value databases, Document oriented databases, RDBMS, CAP theorem, ACID, BASE.

INTRODUCTION

The term NoSQL was first used by Carlo Strozzi in 1998 to name his fast, portable, lightweight, shell-based open-source relational database [1]. Strozzi pronounced it "noseequel". He first introduced it for the Unix operating system, and it was later developed by Walter W. Hobbs, as referenced in [1]. NoSQL is often interpreted as "Not SQL", but it should be interpreted as "Not Only SQL".

NoSQL databases provide a mechanism for storing and retrieving data modelled in structures such as key-value stores, document-oriented stores, graphs, and trees. They do not use the long-established, robust, and widely used relational model. NoSQL became popular because of its ability to store large amounts of data (big data), its horizontal scaling, and the simplicity of its design. It is one of the growing fields in big data and is widely used in distributed and real-time web applications [6].

Many big companies depend on NoSQL databases: Google uses BigTable, Amazon uses Dynamo [2], Facebook, Zynga, and Twitter use Memcached [21][3], Foursquare uses MongoDB, Mozilla and Adobe use HBase, and LinkedIn uses Voldemort [5]. NoSQL databases have a scalable architecture, so they can scale out to many machines with minimal effort. NoSQL is schema-less, i.e. it supports dynamic schemas. If we later want to change the length or data type of a column, or add a new column, we do not need to change the whole table structure; new records can be stored with the new structure without affecting data stored under the previous one.

NoSQL also supports auto-sharding, one of its most important and useful features. SQL databases generally scale vertically, i.e. a single server stores the database and serves the application, while NoSQL databases scale horizontally, i.e. more servers are added to store the data instead of keeping it on a single server. With auto-sharding, the database distributes its data across different servers and manages that distribution automatically [7]. Adding more servers increases both capacity and the performance of read and write operations [13]. It is even possible to shrink a sharded database cluster when demand decreases.

Many NoSQL databases also support automatic replication, i.e. copying data between nodes or systems, which provides high availability as well as disaster recovery [11]. Replication improves read performance, because a load balancer can distribute read operations over many machines. It also helps when a machine fails: at least one other machine holds the same data and can replace the lost node.
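The sharding and replication ideas described above can be illustrated with a minimal sketch. It is not the mechanism of any particular NoSQL product: the class `ShardedStore`, its methods, and the server names are hypothetical, servers are simulated with in-memory dictionaries, and keys are placed by a simple hash-modulo rule (production systems typically use schemes such as consistent hashing or range partitioning so that adding a server moves less data).

```python
import hashlib


class ShardedStore:
    """Toy cluster: each key is stored on `replicas` consecutive servers."""

    def __init__(self, server_names, replicas=2):
        if not 1 <= replicas <= len(server_names):
            raise ValueError("replicas must be between 1 and the number of servers")
        self.server_names = list(server_names)
        self.replicas = replicas
        # One in-memory dict per server stands in for a real machine.
        self.servers = {name: {} for name in self.server_names}

    def _owners(self, key):
        """Return the servers responsible for `key`, primary first."""
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        start = int(digest, 16) % len(self.server_names)
        return [
            self.server_names[(start + i) % len(self.server_names)]
            for i in range(self.replicas)
        ]

    def put(self, key, value):
        """Write the value to every owner (sharding plus replication)."""
        for name in self._owners(key):
            self.servers[name][key] = value

    def get(self, key, failed=()):
        """Read from the first owner that is not listed in `failed`."""
        for name in self._owners(key):
            if name not in failed:
                return self.servers[name][key]
        raise KeyError(f"no live replica for {key!r}")
```

With three servers and two replicas, a value written with `put` can still be read through `get(key, failed=[primary])` after its primary server is marked as failed, which is the availability benefit that replication is meant to provide.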

In this paper we briefly discuss the differences between RDBMS and NoSQL, and the two most widely used NoSQL techniques: key-value stores and document-oriented stores.

Fig 1: Google Trends showing interest in NoSQL over time [8]
RDBMS VS NOSQL


The growth of big data and analytics has led to NoSQL databases, which are useful in distributed and cloud-based web applications and outperform RDBMS in a few areas, such as schema flexibility, avoidance of JOIN operations, scalability, fast reads, and performance [19]. Apart from these advantages, NoSQL generally lacks ACID (Atomicity, Consistency, Isolation, Durability) transaction support. Eric Brewer, a computer scientist at Berkeley, proposed a theorem for distributed environments that is known as the CAP theorem or Brewer's theorem. It names three requirements, Consistency, Availability, and Partition tolerance, and states that a distributed system can fulfil only two of the three at the same time [18].

Most NoSQL databases relax consistency in order to achieve scalability, since cloud-based applications mostly perform read operations. Instead of satisfying the ACID properties, they use the BASE approach (Basically Available, Soft state, Eventually consistent) [16][17]. In practice this means choosing two of the requirements and managing the third as well as possible [14][15].
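To make the "eventually consistent" part of BASE concrete, the following is a minimal sketch under simplifying assumptions. The classes `Replica` and `EventuallyConsistentStore`, the single global version counter, and the explicit `sync` step are all hypothetical devices for illustration; real systems propagate writes in the background and use more elaborate versioning.

```python
class Replica:
    """One copy of the data; stores key -> (version, value)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def apply(self, key, version, value):
        """Keep the write only if it is newer than what this replica has."""
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


class EventuallyConsistentStore:
    """Writes are acknowledged after reaching one replica (availability first)."""

    def __init__(self, replica_names):
        self.replicas = [Replica(name) for name in replica_names]
        self.pending = []  # undelivered writes: (replica, key, version, value)
        self.clock = 0     # hypothetical global version counter

    def write(self, key, value):
        """Update the first replica now and queue the write for the others."""
        self.clock += 1
        first, *others = self.replicas
        first.apply(key, self.clock, value)
        for replica in others:
            self.pending.append((replica, key, self.clock, value))

    def read(self, key, replica_index):
        """Read from one replica; the answer may be stale."""
        return self.replicas[replica_index].read(key)

    def sync(self):
        """Deliver all queued writes so that the replicas converge."""
        for replica, key, version, value in self.pending:
            replica.apply(key, version, value)
        self.pending.clear()
```

Between `write` and `sync`, a read from the second replica may return an old value or nothing at all; after `sync`, every replica returns the latest value. This is the trade that BASE makes: approximate answers are accepted for a while in exchange for availability.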

Fig 2: CAP Theorem [19]

ACID	BASE
Strong consistency	Weak consistency
Isolation	Availability first
Focus on commit	Best effort
Nested transactions	Approximate answers OK
Availability	Optimistic
Pessimistic	Simpler and faster
Evolution difficult (e.g. schema)	Easy evolution
Fig 3 : ACID vs BASE [20]
KEY-VALUE AND DOCUMENT-ORIENTED DATABASES COMPARISON

Key-value databases store schema-less data [9]. Data is stored as an associative array of entries; an associative array is also called a map, symbol table, or dictionary, and is sometimes referred to as a hash table [10]. An associative array is a collection of (key, value) pairs in which the value may be a string, any primitive data type of a programming language, or an object. It is called associative because it binds a key and a value together. Creating such a binding is also known as adding a new association; other operations include insert, remove, search, and reassign. The most popular key-value stores are Amazon's Dynamo, MemcacheDB, Project Voldemort, and Redis.

Document-oriented databases are designed for storing data in the form of documents. They use encodings or file formats such as XML, YAML, JSON (JavaScript Object Notation), and BSON (Binary JSON) [11]. Each document is addressed by a unique key, which can be a URI, a path, or a string [11]. The idea behind this design is that fields can be added to or deleted from an individual record without creating dummy empty fields, which implies that any number of fields of any length can be added [12]. Document-oriented databases also provide an API or query language for retrieving documents easily.

Stephen Yen, in his blog, describes NoSQL as a "Horseless Carriage" and suggests a classification of NoSQL data models [22][23]. Table 1 shows this classification.

Data model	Matching Databases
Key-value Cache	Memcached, Hazelcast, Repcached, Coherence, Infinispan, eXtreme Scale, JBoss Cache, Velocity, Terracotta
Key-Value Store	Keyspace, Flare, SchemaFree, RAMCloud
Key-Value Store (Eventually consistent)	Dynamo, Voldemort, Dynomite, SubRecord, MotionDB, DovetailDB
Data-structures server	Redis
Key-Value Store (Ordered)	Actord, Lightcloud, NMDB, TokyoTyrant, Luxio, MemcacheDB, Scalaris
Tuple Store	Apache River, Coord, Gigaspaces
Object Database	ZopeDB, db4o, shoal
Document store	CouchDB, MongoDB, MarkLogic, Jackrabbit, XML Databases, ThruDB, CloudKit, Persevere, Riak, Scalaris
Wide Columnar Store	BigTable, HBase, Cassandra, Hypertable, KAI, OpenNeptune, Qbase, KDI

Table 1: Classification of data models [22][23]
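The difference between the two data models described in this section can be sketched in a few lines. This is an illustration only, not the API of any database named here: `KeyValueStore` and `DocumentStore` are hypothetical classes, data is held in memory, and JSON is used as the document encoding.

```python
import json
import uuid


class KeyValueStore:
    """Associative array: the store never looks inside the value."""

    def __init__(self):
        self._data = {}

    def insert(self, key, value):
        """Bind a new key to a value."""
        if key in self._data:
            raise KeyError(f"{key!r} is already bound; use reassign")
        self._data[key] = value

    def reassign(self, key, value):
        """Replace the value bound to an existing key."""
        if key not in self._data:
            raise KeyError(key)
        self._data[key] = value

    def search(self, key):
        """Look a value up by key; the key is the only access path."""
        return self._data.get(key)

    def remove(self, key):
        return self._data.pop(key, None)


class DocumentStore:
    """Documents are JSON objects; fields may differ between documents."""

    def __init__(self):
        self._docs = {}  # unique key -> JSON text

    def save(self, document, key=None):
        """Store a document under a unique key and return that key."""
        key = key or uuid.uuid4().hex
        self._docs[key] = json.dumps(document)
        return key

    def get(self, key):
        text = self._docs.get(key)
        return None if text is None else json.loads(text)

    def find(self, **criteria):
        """Return documents whose top-level fields equal all criteria."""
        results = []
        for text in self._docs.values():
            doc = json.loads(text)
            if all(f in doc and doc[f] == v for f, v in criteria.items()):
                results.append(doc)
        return results
```

A key-value store can only answer "what is bound to this key?", whereas the document store can also query on document contents, for example `find(city="Noida")`, and two documents in the same store need not share the same fields.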

Comparing key-value and document-oriented databases:

Database Name	MongoDB	CouchDB	Voldemort	Riak	MemcacheDB	DynamoDB
Developer	10gen	Apache Software Foundation	LinkedIn	Basho Technologies	Steve Chu	Amazon
Initial Release	2009	2005	2009	2010	2008	–
Major Users	Craigslist, Foursquare, Shutterfly, Intuit	Lotsofwords.com	LinkedIn	Mozilla, Comcast, AOL	Facebook	Amazon
Storage type	Document	Document	Key-value	Key-value	Key-value	Key-value
Licence	AGPL, Open Source	Apache, Open Source	Apache, Open Source	Apache, Open Source, Proprietary	BSD, Open Source	Proprietary
Implementation Language	C++	Erlang	Java	Erlang	C, Python	Java
Best Use	Dynamic queries, frequently written, rarely read statistical data	Accumulating, occasionally changing data with predefined queries	–	High Availability	Small pieces of data with many concurrent connections. Transient data	large to big db solution
Key Points	Retains some properties of SQL such as query and index	DB consistency, easy to use	Data automatically replicated & partitioned to multiple servers	A truly fault-tolerant system; Riak has no single point of failure	Emphasis on persistence	–
Concurrency Control	Locks	MVCC (multi-version concurrency control)	MVCC (multi-version concurrency control)	–	–	ACID
Replication	Async	Async	Async	Async	–	Sync
Deployment Model (On Premise)	Yes	Yes	No	Yes	No	No
Platform	Mac OS X, Windows, Linux	Mac OS X, Windows, Linux	Mac OS X, Windows, Linux	Mac OS X, Unix, Linux	Mac OS X, Windows, Linux	Linux
Data Storage (1 = BDB, 2 = Disk, 3 = Plug-in, 4 = RAM)	2	2	1, 4	3	–	2
Characteristics (1 = Consistency, 2 = High Availability, 3 = Partition Tolerance, 4 = Persistence)	1, 3, 4	2, 3, 4	2, 3, 4	2, 3, 4	1, 3	1, 2
Persistence Design	B-tree	Append-only B-tree	Pluggable (primarily BDB, MySQL)	–	No	No
Query API	Cursor	Map/reduce views	Get/put	Nested hashes	–	–

Table 2: Comparison between key-value and document-oriented databases [24][25][21]
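Table 2 lists locks and MVCC as concurrency-control approaches. The general idea behind MVCC can be sketched as follows; this is an assumption-laden illustration rather than the implementation used by any database in the table. The names `VersionedStore` and `ConflictError` are hypothetical, and revisions are plain integers.

```python
class ConflictError(Exception):
    """Raised when a writer's revision is stale."""


class VersionedStore:
    """Each write appends a new version instead of overwriting in place."""

    def __init__(self):
        self._history = {}  # key -> list of values; index + 1 is the revision

    def read(self, key):
        """Return (revision, value); revision 0 means the key is absent."""
        versions = self._history.get(key, [])
        if not versions:
            return 0, None
        return len(versions), versions[-1]

    def write(self, key, value, expected_revision):
        """Append a new version only if the writer saw the latest one."""
        versions = self._history.setdefault(key, [])
        if expected_revision != len(versions):
            raise ConflictError(
                f"{key!r} is at revision {len(versions)}, not {expected_revision}"
            )
        versions.append(value)
        return len(versions)

    def read_at(self, key, revision):
        """Older versions stay readable, so readers need not block writers."""
        return self._history[key][revision - 1]
```

A writer first reads the current revision and passes it back with its update. If another writer got there first, the update is rejected with `ConflictError` and must be retried, instead of both writers waiting on a lock.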

CONCLUSION

Section 2 analysed and compared RDBMS and NoSQL databases. Section 3 described key-value and document-oriented databases and compared them. From this analysis we conclude that, although NoSQL is an emerging, fast, and highly scalable class of databases used by big companies, it still has shortcomings, such as limited transactional support, which keep it from capturing large market segments like the banking sector that depend fully on transactions. NoSQL's main objective is high availability, with less emphasis on consistency, so relational databases are still a good choice for handling relations and consistency requirements. Although NoSQL was designed to be a zero-admin solution, it still requires skill to install and manage such databases. There are many RDBMS experts available, whereas most developers are still learning NoSQL, so finding experts to support these databases is somewhat difficult.

REFERENCES

[1] http://www.strozzi.it/cgi-bin/CSA/tw7/I/en_US/nosql/Home%20Page
[2] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, Dynamo: Amazon's Highly Available Key-Value Store, in Proceedings of the 21st ACM SIGOPS Symposium on Operating Systems Principles. New York, NY, USA: ACM, 2007, pp. 205-220.
[3] B. Fitzpatrick, Distributed caching with Memcached, Linux Journal, vol. 2004, no. 124, p. 5, August 2004.
[4] J. Petrovic, Using Memcached for Data Distribution in Industrial Environment, in Proceedings of the 3rd International Conference on Systems. Washington, DC, USA: IEEE Computer Society, 2008, pp. 368-372.
[5] M. Berezecki (Facebook), E. Frachtenberg (Facebook), M. Paleczny (Facebook), and K. Steele (Tilera), Many-Core Key-Value Store. http://gigaom2.files.wordpress.com/2011/07/facebook-tilera-whitepaper.pdf
[6] http://en.wikipedia.org/wiki/NoSQL
[7] http://www.tutorialindustry.com/nosql-tutorial-for-beginners
[8] http://www.google.com/trends/explore#q=NOSQL&cmpt=q
[9] M. Seeger, Key value Store: a practical overview (21 September 2009). http://blog.marc-seeger.de/2009/09/21/key-value-store-a-practical-overview/ Retrieved 1 January 2012.
[10] http://en.wikipedia.org/wiki/Key-value_store
[11] http://en.wikipedia.org/wiki/Document-oriented_database
[12] A. Lith and J. Mattson, Investigating storage solutions for large data: A comparison of well performing and scalable data storage solutions for real time extraction and batch insertion of data (PDF). Göteborg: Department of Computer Science and Engineering, Chalmers University of Technology, 2010, p. 70. Retrieved 12 May 2011.
[13] K. Orend, Analysis and Classification of NoSQL Databases and Evaluation of their Ability to Replace an Object-relational Persistence Layer. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.184.483&rep=rep1&type=pdf
[14] E. A. Brewer, Towards robust distributed systems (abstract), in Proceedings of the Nineteenth Annual ACM Symposium on Principles of Distributed Computing (Portland, Oregon, United States, July 16-19, 2000), PODC '00. ACM, New York, NY, 7. DOI=http://doi.acm.org/10.1145/343477.343502
[15] S. Gilbert and N. Lynch, Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services, SIGACT News 33, 2 (Jun. 2002), 51-59. DOI=http://doi.acm.org/10.1145/564585.564601
[16] D. Pritchett, BASE: An Acid Alternative, Queue 6, 3 (May 2008), 48-55. DOI=http://doi.acm.org/10.1145/1394127.1394128
[17] M. Paksula, Introduction to store data in Redis, a persistent and fast key-value database, Department of Computer Science, University of Helsinki. http://www.cs.petrsu.ru/fdpw/2010/article/paksula.pdf
[18] http://en.wikipedia.org/wiki/CAP_theorem
[19] Joshi, Graph Visualization Using the NoSQL Database. http://library.ndsu.edu/repository/handle/10365/22972
[20] ACID vs BASE. http://www.yeeach.com/post/583
[21] http://en.wikipedia.org/wiki/Memcached
[22] http://en.wikipedia.org/wiki/NoSQL
[23] A Yes for NoSQL Taxonomy. High Scalability (2009-11-05). Retrieved on 2013-09-18.
[24] http://nosql.findthebest.com/
[25] http://www.christof-strauch.de/nosqldbs.pdf

