Wednesday, August 1, 2012

Databases, DHTs, and Data (Oh My!)

By:  B.Y., Software Engineer @ G2

Recently, or not so recently, a co-worker and I worked on a grant to explore the distributed databases that were available. We ended up choosing the four that seemed most prominent. Now that our research has ended, it occurred to us that others might benefit from this knowledge as well, so we decided to release it as a series of blog posts, one a week, each featuring an overview of one database. First up: Redis.

Redis

Overview

Redis is an in-memory key-value data store that supports a wide variety of atomic operations. A Redis instance executes commands on a single thread, so it effectively uses one CPU core; to scale further you run multiple instances, on separate cores of one machine or on separate physical machines, and shard the data across them. Redis was created by Italian developer Salvatore Sanfilippo, and its development is sponsored by VMware. Redis is currently used by StackOverflow, GitHub, Blizzard, and others.
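
With several independent instances, something (the client or a proxy) has to decide consistently which instance owns each key. A minimal Python sketch, assuming a simple CRC32-modulo scheme and three hypothetical instance addresses; this is one common client-side approach, not a Redis API:

```python
import zlib
from typing import List

# Hypothetical addresses of independent Redis instances (one per core or machine).
INSTANCES: List[str] = ["10.0.0.1:6379", "10.0.0.1:6380", "10.0.0.2:6379"]


def shard_for(key: str, instances: List[str] = INSTANCES) -> str:
    """Return the instance that owns `key`.

    CRC32 is stable across processes (unlike Python's salted hash()), so every
    client that uses the same instance list agrees on where a key lives.
    """
    return instances[zlib.crc32(key.encode("utf-8")) % len(instances)]


if __name__ == "__main__":
    for key in ("user:1", "user:2", "session:abc"):
        print(key, "->", shard_for(key))
```

Modulo hashing is easy to reason about, but adding or removing an instance remaps most keys; consistent hashing, the technique behind many DHTs, limits how many keys have to move.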

Features

EXPIRE and TTL

The EXPIRE command sets a time to live (TTL), in seconds, on a given key; once it elapses, the key is deleted automatically. The TTL command returns how many seconds a key has left. This ‘age-off’ capability is immediately useful for short-lived values that are queried often, such as sessions or cached results.
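
To make the age-off behavior concrete, here is a minimal Python sketch of the semantics, not of how Redis implements them. The class ExpiringStore and its methods are hypothetical; they only mimic SET, EXPIRE and TTL, and for simplicity ttl() returns None both for missing keys and for keys without an expiry. Real Redis also expires keys actively in the background rather than only when they are accessed.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ExpiringStore:
    """Toy in-process store mimicking the idea behind SET, EXPIRE and TTL.

    Expired keys are removed lazily, the next time they are touched.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (value, absolute deadline, or None for "never expires")
        self._data: Dict[str, Tuple[object, Optional[float]]] = {}

    def set(self, key: str, value: object) -> None:
        self._data[key] = (value, None)  # overwriting a key clears its old TTL

    def expire(self, key: str, seconds: float) -> bool:
        """Attach a time to live; returns False if the key does not exist."""
        if not self._alive(key):
            return False
        value, _ = self._data[key]
        self._data[key] = (value, self._clock() + seconds)
        return True

    def ttl(self, key: str) -> Optional[float]:
        """Seconds left, or None if the key is missing or has no expiry."""
        if not self._alive(key):
            return None
        deadline = self._data[key][1]
        return None if deadline is None else deadline - self._clock()

    def get(self, key: str) -> Optional[object]:
        return self._data[key][0] if self._alive(key) else None

    def _alive(self, key: str) -> bool:
        entry = self._data.get(key)
        if entry is None:
            return False
        deadline = entry[1]
        if deadline is not None and self._clock() >= deadline:
            del self._data[key]  # lazy expiry: drop the key on access
            return False
        return True


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example is deterministic
    store = ExpiringStore(clock=lambda: now[0])
    store.set("session:42", "alice")
    store.expire("session:42", 30)
    now[0] = 10.0
    print(store.ttl("session:42"))  # 20.0
    now[0] = 31.0
    print(store.get("session:42"))  # None
```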

Data Structures as Values

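Redis values are not limited to plain strings: a key can hold a data structure, namely a list (L* commands), a set (S*), or a sorted set (Z*). These structures are modified in place inside the data store (adding elements, computing unions, incrementing scores, and so on), so a client never has to fetch, change, and rewrite the whole value.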
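
As an illustration of what operating on a structured value in place means, a minimal Python sketch follows. StructStore and its methods are hypothetical; the method names merely echo the Redis commands LPUSH, LRANGE, SADD, SUNION, ZINCRBY and ZRANGE, including their inclusive, negative-index-aware ranges, but everything runs in-process rather than against a server.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Sequence, Set


class StructStore:
    """Toy store where a key can hold a list, a set or a sorted set."""

    def __init__(self) -> None:
        self.lists: Dict[str, Deque[str]] = defaultdict(deque)
        self.sets: Dict[str, Set[str]] = defaultdict(set)
        self.zsets: Dict[str, Dict[str, float]] = defaultdict(dict)  # member -> score

    @staticmethod
    def _inclusive_range(items: Sequence[str], start: int, stop: int) -> List[str]:
        """Slice with inclusive bounds; negative indexes count from the end."""
        n = len(items)
        if start < 0:
            start = max(start + n, 0)
        if stop < 0:
            stop += n
        return list(items[start:stop + 1]) if stop >= start else []

    def lpush(self, key: str, *values: str) -> int:
        for value in values:
            self.lists[key].appendleft(value)  # each value becomes the new head
        return len(self.lists[key])

    def lrange(self, key: str, start: int, stop: int) -> List[str]:
        return self._inclusive_range(list(self.lists.get(key, ())), start, stop)

    def sadd(self, key: str, *members: str) -> int:
        before = len(self.sets[key])
        self.sets[key].update(members)
        return len(self.sets[key]) - before  # how many members were new

    def sunion(self, *keys: str) -> Set[str]:
        result: Set[str] = set()
        for key in keys:
            result |= self.sets.get(key, set())
        return result

    def zincrby(self, key: str, increment: float, member: str) -> float:
        scores = self.zsets[key]
        scores[member] = scores.get(member, 0.0) + increment
        return scores[member]

    def zrange(self, key: str, start: int, stop: int) -> List[str]:
        # Order by score, breaking ties by member name.
        ranked = sorted(self.zsets.get(key, {}).items(), key=lambda kv: (kv[1], kv[0]))
        return self._inclusive_range([member for member, _ in ranked], start, stop)
```

For example, after zincrby("scores", 5, "bob") and zincrby("scores", 3, "amy"), zrange("scores", 0, -1) returns ["amy", "bob"].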

Persistence

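Redis can periodically serialize its data to disk, which provides backups and a degree of fault tolerance. To keep the performance hit small, a background save forks a child process (rather than starting a thread) that writes the snapshot while the main process keeps serving requests. After a crash or power outage, the store can be reloaded into memory from the most recent snapshot, although writes made after that snapshot are lost; Redis's append-only file mode can be enabled to narrow that window.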
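
A rough Python analogue of a background snapshot is sketched below. The function names and the JSON format are assumptions for illustration (Redis writes its own binary RDB format), and a deep copy stands in for the copy-on-write view a forked child gets. The sketch writes to a temporary file and then renames it over the old snapshot, so a crash mid-save never leaves a half-written file in place.

```python
import copy
import json
import os
import tempfile
import threading
from typing import Dict


def snapshot_to_disk(data: Dict[str, object], path: str) -> None:
    """Write `data` to `path` atomically: dump to a temp file, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic rename on POSIX systems
    except BaseException:
        os.remove(tmp_path)
        raise


def background_save(data: Dict[str, object], path: str) -> threading.Thread:
    """Freeze a copy of the data, then write it without blocking the caller."""
    frozen = copy.deepcopy(data)  # stands in for fork()'s copy-on-write view
    worker = threading.Thread(target=snapshot_to_disk, args=(frozen, path))
    worker.start()
    return worker


def load_snapshot(path: str) -> Dict[str, object]:
    """Restore the in-memory data from the last completed snapshot."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Anything written after the copy is taken is not in the snapshot, which is why a crash can lose the most recent writes.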

Limitations

Memory

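Redis keeps its entire dataset in memory, so it is bounded by the physical RAM of the machine. If the dataset grows past that, the operating system starts swapping, which badly degrades performance, and if an allocation fails outright Redis by default logs an out-of-memory error and aborts rather than quietly skipping the insert. The maxmemory setting can cap usage, after which writes are either rejected with an error or make room by evicting keys, depending on the configured policy. Further, Redis stores some bookkeeping metadata with every key-value pair, and this overhead can add up to a nontrivial share of memory when keys and values are small.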
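
The effect of a hard memory cap and of per-key overhead can be seen in a small Python sketch. CappedStore, OutOfMemory and the 64-byte PER_KEY_OVERHEAD are all hypothetical, chosen for illustration rather than taken from Redis; the store simply refuses any write that would exceed its budget, similar in spirit to a capped instance configured not to evict.

```python
from typing import Dict


class OutOfMemory(Exception):
    """Raised instead of storing a value that would exceed the budget."""


class CappedStore:
    # Hypothetical fixed bookkeeping cost per entry; NOT Redis's real figure.
    PER_KEY_OVERHEAD = 64

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used = 0
        self._data: Dict[str, bytes] = {}

    def _cost(self, key: str, value: bytes) -> int:
        return len(key.encode("utf-8")) + len(value) + self.PER_KEY_OVERHEAD

    def set(self, key: str, value: bytes) -> None:
        old = self._data.get(key)
        freed = self._cost(key, old) if old is not None else 0
        needed = self.used - freed + self._cost(key, value)
        if needed > self.max_bytes:
            # Reject the write outright rather than evicting anything.
            raise OutOfMemory(f"{key!r} would raise usage to {needed} > {self.max_bytes} bytes")
        self._data[key] = value
        self.used = needed

    def overhead_ratio(self) -> float:
        """Share of used bytes spent on bookkeeping rather than keys and values."""
        if not self._data:
            return 0.0
        return len(self._data) * self.PER_KEY_OVERHEAD / self.used


if __name__ == "__main__":
    store = CappedStore(max_bytes=200)
    store.set("k1", b"x" * 10)  # 2 + 10 + 64 = 76 bytes
    store.set("k2", b"y")  # 2 + 1 + 64 = 67 bytes, 143 in total
    print(round(store.overhead_ratio(), 2))  # 0.9: tiny values are mostly overhead
    try:
        store.set("k3", b"z" * 100)  # would need 309 bytes
    except OutOfMemory as err:
        print("rejected:", err)
```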

Use Cases

Cycling Data

Redis is very good at aging out data. Thanks to the EXPIRE and TTL commands, it fits anywhere you need a higher-level cache for rolling or cycling data that is queried frequently.
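
This cache usage is often written as "cache-aside": look the value up, and recompute it only when it is missing or has aged off. A minimal Python sketch, assuming a hypothetical TTLCache class and a caller-supplied compute function; against a real Redis server the same flow would be a GET, then a SET plus EXPIRE on a miss. Unlike Redis, this toy never purges keys that are not requested again.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache-aside helper: serve a cached value until it ages off, then recompute."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[object, float]] = {}  # key -> (value, expires_at)

    def get_or_compute(self, key: str, compute: Callable[[], object]) -> object:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now < hit[1]:
            return hit[0]  # still fresh: skip the expensive computation
        value = compute()  # missing or aged off: rebuild and restart the TTL
        self._entries[key] = (value, now + self.ttl)
        return value


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example is deterministic
    cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
    print(cache.get_or_compute("top:posts", lambda: "computed at t=0"))
    now[0] = 30.0
    print(cache.get_or_compute("top:posts", lambda: "computed at t=30"))  # still t=0 value
```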

Small / Frequent Data

Because it works entirely in memory, with the limitations that implies, Redis is a prime candidate for small, frequently updated data, such as counters and other statistical primitives.
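
In Redis, counters of this kind map onto atomic commands such as INCR and INCRBY, which stay correct under many concurrent clients because the server executes one command at a time. The Python sketch below mimics that guarantee with a lock; StatsStore and its methods are hypothetical, and it keeps a running count, sum, min and max per metric so that a mean can be derived on demand.

```python
import threading
from typing import Dict


class StatsStore:
    """Per-metric running count/sum/min/max, each update applied atomically.

    The lock stands in for Redis's single-threaded command loop: one update
    finishes completely before the next one starts.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._stats: Dict[str, Dict[str, float]] = {}

    def record(self, metric: str, sample: float) -> None:
        with self._lock:
            s = self._stats.setdefault(
                metric, {"count": 0, "sum": 0.0, "min": sample, "max": sample}
            )
            s["count"] += 1
            s["sum"] += sample
            s["min"] = min(s["min"], sample)
            s["max"] = max(s["max"], sample)

    def snapshot(self, metric: str) -> Dict[str, float]:
        with self._lock:
            return dict(self._stats[metric])

    def mean(self, metric: str) -> float:
        s = self.snapshot(metric)
        return s["sum"] / s["count"]


if __name__ == "__main__":
    stats = StatsStore()
    workers = [
        threading.Thread(target=lambda: [stats.record("hits", 1.0) for _ in range(1000)])
        for _ in range(4)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(stats.snapshot("hits")["count"])  # 4000
```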
