Friday, 17 January 2014

MongoDB vs Cassandra: Difference and Similarities between MongoDB and Cassandra

MongoDB and Cassandra together hold a large share of the market for schema-free NoSQL databases, so before choosing one of them it makes sense to compare their basic features. Both are fully featured NoSQL databases, and the choice between them depends heavily on your application's requirements. There are a lot of similarities and differences between the two. The following is a basic comparison of MongoDB and Cassandra in terms of data storage model, usage and so on.

Differences between MongoDB and Cassandra

1. MongoDB has a document-oriented data model while Cassandra has a column-oriented data model and storage type.

MongoDB acts much like a relational database. Its data model consists of a database at the top level, then collections, which are like tables in MySQL (for example), and then documents, which are contained within a collection like rows in MySQL. Each document is made up of fields and values, similar to columns and values in MySQL. A field can hold a simple value, e.g. { 'name': 'David Mytton' }, but it can also hold another document, e.g. { 'name': { 'first': 'David', 'last': 'Mytton' } }.

In Cassandra the basic unit is the “column”, which is really just a single key and value, e.g. { 'key': 'name', 'value': 'David Mytton' }. Each column also has a timestamp field, which is used internally for replication and consistency. The value can be a single value but can also contain another “column”. These columns exist within column families, which order data based on a specific value in the columns, referenced by a key. At the top level there is a keyspace, which is similar to the MongoDB database.
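To make the two models concrete, here is a minimal in-memory sketch in plain Python. It does not use either database's driver; the names get_field, Column, keyspace and get_column are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Any

# MongoDB-style document: fields map to values, and a value may itself
# be another document.
document = {"name": {"first": "David", "last": "Mytton"}}


def get_field(doc: dict, dotted_path: str) -> Any:
    """Walk a nested document using a dotted path such as 'name.first'."""
    value: Any = doc
    for part in dotted_path.split("."):
        value = value[part]
    return value


@dataclass(frozen=True)
class Column:
    """Cassandra-style column: one key, one value, plus a write timestamp."""
    key: str
    value: Any
    timestamp: int


# keyspace -> column family -> row key -> list of columns (illustrative data)
keyspace = {
    "users": {
        "user-1": [Column("name", "David Mytton", timestamp=1)],
    },
}


def get_column(ks: dict, family: str, row_key: str, column_key: str) -> Column:
    """Find one column inside a row of a column family."""
    for column in ks[family][row_key]:
        if column.key == column_key:
            return column
    raise KeyError(column_key)
```

The document nests structure inside a single record, while the column model keeps each value as its own small key/value/timestamp unit grouped under a row key.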

2. MongoDB is developed by MongoDB, Inc. (formerly 10gen) while Cassandra is a project of the Apache Software Foundation. The original authors of MongoDB are core contributors to the code and work for the company (indeed, 10gen was founded specifically to support MongoDB, and its CEO and CTO are the original creators). In contrast, Cassandra was created by two engineers at Facebook and was then incubated at the Apache Software Foundation. MongoDB was initially released in 2009 while Cassandra was released in 2008.

3. MongoDB is implemented in C++ while Cassandra is implemented in Java.

4. MongoDB supports Linux, OS X, Solaris and Windows server operating systems while Cassandra supports BSD, Linux, OS X and Windows server operating systems.

5. MongoDB has drivers for more programming languages than Cassandra. MongoDB supports ActionScript, C, C#, C++, Clojure, ColdFusion, D, Dart, Delphi, Erlang, Go, Groovy, Haskell, Java, JavaScript, Lisp, Lua, MATLAB, Perl, PHP, PowerShell, Prolog, Python, R, Ruby, Scala and Smalltalk. Cassandra supports C#, C++, Clojure, Erlang, Go, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby and Scala.

6. Handling of indexes is different in MongoDB and Cassandra

MongoDB indexes work very much like those in relational databases. You create single or compound indexes at the collection level, and every document inserted into that collection has those fields indexed. Querying by index is extremely fast so long as you have all your indexes in memory.

Before version 0.7, Cassandra was essentially a key/value store, so if you wanted to query by the contents of a key (i.e. the value) you needed to create a separate column which referenced the other columns, i.e. you created your own indexes. This changed in Cassandra 0.7, which added secondary indexes on column values, but only through the column families mechanism. Cassandra requires a lot more metadata for indexes and requires secondary indexes if you want to do range queries.
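The "create your own indexes" approach can be sketched in plain Python as a second lookup structure that the application keeps in step with the primary data. This is an in-memory illustration only, not Cassandra client code, and users, users_by_city, insert_user and find_by_city are hypothetical names.

```python
from collections import defaultdict

# Primary "column family": row key -> {column name: value}
users = {}
# Hand-maintained index "column family": city value -> set of row keys
users_by_city = defaultdict(set)


def insert_user(row_key, columns):
    """Write the row and keep the hand-built index in step with it."""
    old = users.get(row_key)
    if old is not None and "city" in old:
        # Drop the stale index entry before re-indexing the new value.
        users_by_city[old["city"]].discard(row_key)
    users[row_key] = dict(columns)
    if "city" in columns:
        users_by_city[columns["city"]].add(row_key)


def find_by_city(city):
    """Query by value through the index instead of scanning every row."""
    return sorted(users_by_city.get(city, set()))
```

The cost of this pattern is that every write path has to remember to update the index, which is the bookkeeping that built-in secondary indexes take over.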

7. Handling of replication is different in MongoDB and Cassandra

In MongoDB replication is achieved through replica sets. This is an enhanced master/slave model where you have a set of nodes where one is the master. Data is replicated to all nodes so that if the master fails, another member will take over. There are configuration options to determine which nodes have priority and you can set options like sync delay to have nodes lag behind (for disaster recovery, for example).

Writes in MongoDB are “unsafe” by default: data isn’t written to disk right away, so it’s possible that a write operation could return success but be lost if the server fails before the data is flushed to disk. This is how Mongo attains high performance. If you need increased durability then you can specify a safe write, which will guarantee the data is written to disk before returning. Further, you can require that the data also be successfully written to n replication slaves.

MongoDB drivers also support the ability to read from slaves. This can be done on a connection, database, collection or even query level and the drivers handle sending the right queries to the right slaves, but there is no guarantee of consistency (unless you are using the option to write to all slaves before returning). In contrast Cassandra queries go to every node and the most up to date column is returned (based on the timestamp value).

Cassandra has much more advanced support for replication by being aware of the network topology. The server can be set to use a specific consistency level to ensure that queries are replicated locally, or to remote data centres. This means you can let Cassandra handle redundancy across nodes where it is aware of which rack and data centre those nodes are on. Cassandra can also monitor nodes and route queries away from “slow” responding nodes.

The only disadvantage with Cassandra is that these settings are made at the node level with configuration files, whereas MongoDB allows very granular ad-hoc control down to the query level through driver options which can be called in code at run time.
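The timestamp-based read described in this section can be sketched as follows. This is a simplified in-memory model under stated assumptions, not Cassandra's actual read path: each replica is a plain dict, and read_latest and its consistency_level parameter are hypothetical names standing in for "how many replicas are asked before answering".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Column:
    key: str
    value: str
    timestamp: int


def read_latest(replicas, row_key, column_key, consistency_level) -> Optional[Column]:
    """Ask `consistency_level` replicas and return the newest column seen."""
    if not 1 <= consistency_level <= len(replicas):
        raise ValueError("consistency_level must be between 1 and len(replicas)")
    answers = []
    for replica in replicas[:consistency_level]:
        column = replica.get(row_key, {}).get(column_key)
        if column is not None:
            answers.append(column)
    # The copy with the highest timestamp wins.
    return max(answers, key=lambda c: c.timestamp) if answers else None
```

Asking only one replica is cheaper but may return a stale value; asking more replicas raises the chance that the most recent write is among the answers.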

8. MongoDB provides a custom map/reduce implementation while Cassandra provides native Hadoop support, including for Hive (a SQL data warehouse built on Hadoop map/reduce) and Pig (a Hadoop-specific analysis language that many think is a better fit for map/reduce workloads than SQL).
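For readers new to the pattern, map/reduce itself can be sketched in a few lines of plain Python. This shows the general idea only; it is neither MongoDB's map/reduce API nor Hadoop's, and map_reduce, emit_total and sum_values are hypothetical names.

```python
from collections import defaultdict


def map_reduce(documents, map_fn, reduce_fn):
    """Group the (key, value) pairs emitted by map_fn, then reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


def emit_total(doc):
    """Map step: emit one (customer, amount) pair per order document."""
    yield doc["customer"], doc["amount"]


def sum_values(key, values):
    """Reduce step: add up all amounts emitted for one customer."""
    return sum(values)
```

Calling map_reduce(orders, emit_total, sum_values) on a list of order documents returns one total per customer.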

9. MongoDB supports server-side scripting while Cassandra does not.

10. Major users of MongoDB include Craigslist, Foursquare, Shutterfly and Intuit, while Facebook, Twitter and Digg are heavy users of Cassandra.

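11. MongoDB is backed by a commercial vendor, while Cassandra is a community project of the Apache Software Foundation. Both are free to download and use under their open-source licenses.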

Similarities between MongoDB and Cassandra

1. Both MongoDB (AGPL) and Cassandra (Apache 2.0 license) are open-source databases.
2. Both MongoDB and Cassandra are NoSQL schema-free databases.
3. Both MongoDB and Cassandra support Sharding.
4. Both MongoDB and Cassandra support concurrency. Concurrency is supported by using Locks in MongoDB while concurrency in Cassandra is achieved by MVCC.
5. Both MongoDB and Cassandra support Eventual and Immediate Consistency. 
6. Neither supports foreign keys, transactions, triggers etc.

Comments:

  1. Darrell Burgan, 19 January 2014 04:36

    Your Cassandra information seems to be based on a much older version. The old column-store view of data in Cassandra is still available, but they are pushing the CQL-level view of data very hard. It would not surprise me if the old Thrift API is deprecated some day.

    When you use CQL, Cassandra has a much different data model. You have tables, which are composed of columns, each of which can have a certain type, and each of which can optionally have a cardinality higher than 1 (e.g. list, set, map). You can index columns as well, and Cassandra will transparently maintain the index for you, yielding very high performance.

    This brings Cassandra and MongoDB closer to each other in terms of how the developer relates to the data. A key difference remains that MongoDB is 'schema-less', where every row is a JSON/BSON document that might have different fields; whereas Cassandra has a formal schema for each table, such that each row has the same field definitions. This, to me, is the key difference from the developer's point of view.

    As you correctly note, Cassandra is much better suited for highly distributed applications due to its tunable replication engine. It was built from the ground up to be a shared-nothing data engine. MongoDB, by contrast, is better suited for applications that need a dynamic schema-less approach. Both have their relative strengths.
