Thursday, August 2, 2012

Distributed Databases - Round 2 - MongoDB

By:  J.B., Software Engineer @ G2

MongoDB

Overview

MongoDB is a schema-less, document-oriented, distributed database. Documents are built from key-value pairs, and values can be strings, numbers, arrays, or other embedded documents. Embedding related data inside a document reduces the need for SQL-style joins, which can improve read performance. The document format is based on JSON and is stored as binary JSON (BSON), which keeps it quick and easy to work with. MongoDB was designed with high performance, availability, and horizontal scaling in mind. 10gen began developing it in 2007, and it is used by MTV Networks, Craigslist, and Foursquare.

Features

Schema-less
A major selling point for MongoDB is its schema-less, document-based design. Each document is a set of key-value pairs written in a JSON-like syntax. Documents in the same collection do not have to share the same fields: a field can be included where it is needed and omitted where it is not. For example, one “blog” document might contain a title, author, date, and post body, while another “blog” document adds a “comments” field, a “tags” field, or both.
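The blog example can be sketched with plain Python dictionaries, which map closely onto JSON documents. This is only an illustration of the idea and does not talk to a MongoDB server; the field values and the `blog` list are made up for the example.

```python
import json

# Two documents destined for the same hypothetical "blog" collection.
# Neither one has to declare its fields in advance.
post_a = {
    "title": "First post",
    "author": "example author",
    "date": "2012-08-02",
    "body": "Some text.",
}

post_b = {
    "title": "Second post",
    "author": "example author",
    "date": "2012-08-02",
    "body": "Some more text.",
    "tags": ["mongodb", "nosql"],      # present only in this document
    "comments": [                      # embedded array of sub-documents
        {"user": "reader1", "text": "Nice overview."},
    ],
}

blog = [post_a, post_b]

# Fields that appear in post_b but not in post_a.
extra_fields = sorted(set(post_b) - set(post_a))
print(extra_fields)
print(json.dumps(post_b, indent=2))
```

Both dictionaries sit in the same list without any shared schema, which is the property the section describes.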
Join-less
Because documents can embed related data, MongoDB does not need joins for many queries, which can improve performance. Mongo can also index on keys inside embedded documents or data structures. It is built with fault tolerance in mind: data is replicated across servers, and automatic failover occurs when a node goes down.
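To make the idea of indexing on an embedded key concrete, here is a minimal sketch in plain Python. It is not MongoDB's index implementation; `get_path` and `build_index` are hypothetical helpers, and the dotted key `author.name` simply names a field inside an embedded document.

```python
from collections import defaultdict

def get_path(doc, dotted_key):
    """Follow a dotted key such as 'author.name' into nested dicts."""
    value = doc
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value

def build_index(docs, dotted_key):
    """Map each value found at dotted_key to the positions of matching documents."""
    index = defaultdict(list)
    for position, doc in enumerate(docs):
        value = get_path(doc, dotted_key)
        if value is not None:
            index[value].append(position)
    return index

# The author details are embedded in each post, so no join is needed to reach them.
posts = [
    {"title": "First", "author": {"name": "alice", "city": "Austin"}},
    {"title": "Second", "author": {"name": "bob", "city": "Boston"}},
    {"title": "Third", "author": {"name": "alice", "city": "Austin"}},
]

by_author = build_index(posts, "author.name")
alice_titles = [posts[i]["title"] for i in by_author["alice"]]
print(alice_titles)
```

The lookup goes straight from an author name to the matching posts, where a relational design would join a posts table to an authors table.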
Master-Slave Relationship
Mongo uses a master-slave relationship among its distributed servers. The master handles writes and can also serve reads. The slaves within each shard replicate data from their master and can serve client reads. A feature called auto-sharding partitions data across the data servers, and each shard is replicated so that its data is preserved. With auto-sharding, MongoDB scales horizontally as new servers are added.
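The partitioning idea can be illustrated with a toy router, assuming each shard owns a contiguous range of a shard key. This is a simplified sketch, not MongoDB's actual router or balancer; `RangeSharder`, the split points, and the shard names are all hypothetical.

```python
import bisect

class RangeSharder:
    """Toy router: each shard owns a contiguous range of shard-key values."""

    def __init__(self, split_points, shard_names):
        # N split points define N + 1 ranges, so N + 1 shards are needed.
        if len(shard_names) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        self.split_points = sorted(split_points)
        self.shard_names = list(shard_names)

    def shard_for(self, key):
        # bisect_right counts how many split points are <= key,
        # which is the index of the range the key falls into.
        return self.shard_names[bisect.bisect_right(self.split_points, key)]

# Two shards at first, then a third server is added and the ranges are redrawn.
two_shards = RangeSharder(["m"], ["shard0", "shard1"])
three_shards = RangeSharder(["h", "p"], ["shard0", "shard1", "shard2"])

users = ["alice", "greg", "mallory", "zed"]
before = {u: two_shards.shard_for(u) for u in users}
after = {u: three_shards.shard_for(u) for u in users}

# Keys whose owning shard changed would have to be migrated.
moved = [u for u in users if before[u] != after[u]]
print(before)
print(after)
print(moved)
```

Adding a server changes which shard owns some keys, so the data for those keys has to be copied across. That data movement is the cost of scaling out this way.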
MapReduce
Another major feature is built-in batch processing with MapReduce: the user only has to supply a mapper function and a reducer function. For datasets that grow continually, an incremental MapReduce feature avoids re-aggregating the entire dataset on every run.
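The mapper/reducer split, and the incremental variant, can be sketched in plain Python. This is a conceptual illustration rather than MongoDB's own MapReduce interface; `map_tags`, `reduce_counts`, and `map_reduce` are hypothetical names. The incremental step assumes the reducer can be re-applied to its own earlier output, which holds for a sum.

```python
from collections import defaultdict

def map_tags(doc):
    """Mapper: emit a (tag, 1) pair for every tag in a document."""
    for tag in doc.get("tags", []):
        yield tag, 1

def reduce_counts(key, values):
    """Reducer: combine all values emitted for one key."""
    return sum(values)

def map_reduce(docs, mapper, reducer, previous=None):
    """Run mapper and reducer over docs, optionally folding in earlier results."""
    grouped = defaultdict(list)
    # Seed each key with its earlier result so only the new docs need mapping.
    for key, value in (previous or {}).items():
        grouped[key].append(value)
    for doc in docs:
        for key, value in mapper(doc):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}

first_batch = [
    {"title": "First", "tags": ["mongodb", "nosql"]},
    {"title": "Second", "tags": ["nosql"]},
]
new_batch = [{"title": "Third", "tags": ["mongodb", "sharding"]}]

# Full run over the initial data.
counts = map_reduce(first_batch, map_tags, reduce_counts)
# Incremental run: only the new documents are mapped, then merged with counts.
updated = map_reduce(new_batch, map_tags, reduce_counts, previous=counts)
print(counts)
print(updated)
```

The second call never touches `first_batch`, which is the saving an incremental MapReduce offers on a growing dataset.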

Limitations

Access Control
A major limitation of MongoDB is that it has no security built into it. There is no notion of permissions or roles inherent in the database. The developers intentionally left security to the application level.
Auto-sharding
When new servers are added, the auto-sharding feature automatically starts redistributing data across them, which can slow down processing while it runs.
Document Size
Depending on what is being stored, the maximum document size can be a limiting factor: 4 MB in older releases and 16 MB in newer ones.
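An application can guard against oversized documents before trying to save them. The sketch below is a rough check only: it measures the UTF-8 JSON encoding, which is an approximation and not the true BSON size, and `approx_size` and `fits` are hypothetical helpers.

```python
import json

# The larger of the two limits mentioned above, in bytes.
MAX_DOCUMENT_BYTES = 16 * 1024 * 1024

def approx_size(doc):
    """Rough size estimate: length of the UTF-8 JSON encoding (not true BSON size)."""
    return len(json.dumps(doc).encode("utf-8"))

def fits(doc, limit=MAX_DOCUMENT_BYTES):
    """Return True if the estimated size is within the limit."""
    return approx_size(doc) <= limit

small = {"title": "short post", "body": "x" * 1000}
large = {"title": "huge post", "body": "x" * (17 * 1024 * 1024)}

print(fits(small))
print(fits(large))
```

A document that fails this kind of check would need to be split across several documents or have its large payload stored elsewhere.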

Use Cases

Rapid Development
MongoDB is well suited to rapid, agile development. It is particularly useful when the shape of the input changes, with certain fields needed only in certain cases. MongoDB was built to let users focus on the application itself rather than on the database.
