Category Archives: Databases

RDBMS Alternatives – Round 2

Introduction

From our initial list of N candidates, we narrowed the field down to three:

  • LightCloud
  • Cassandra
  • MongoDB

Please see the page on RDBMS Alternatives for a discussion of the other kv store candidates and why they were eliminated.

The Candidates

Light Cloud

Since Light Cloud is based on Tokyo Tyrant, it gets initially high marks -- prior to testing -- for scalability, performance, and concurrency. Likewise, we have reason to believe it will be strong on high availability. It also gets high marks for Python support since Light Cloud itself is written in Python. At this point, we're not sure how easy it will be to use nor what its back-up capabilities are. Although it is used by the micro-blogging site Plurk.com, we don't have the impression that it has a large, active user community behind it.

We see it as having the following advantages :

  • Fact that it's built on top of Tokyo Tyrant is a plus
  • It offers scripting in Lua
  • It appears to offer the benefits of Tokyo Tyrant without needing to implement on our own hashing layer

and risks:

Cassandra

Since Cassandra is a fully distributed database, we are optimistic about it scalability potential and its high availability. We don't yet know its performance characteristics nor its ability to handle many concurrent reads and writes. We know that Python bindings exist, but it appears harder to use than the other two candidates. It has an active user community.

Additionally, it has the advantage of  being by far the most "battle tested" of any of the candidates with the following risks:

  • Doesn't seem as easy to use
  • Performance and concurrency need to be tested
  • Still under heavy development, described as "not polished"

Mongo DB

Mongo is said to have good performance, but we have some concerns about scalability given that sharding is apparently an alpha feature. We're not sure of Mongo's support for high availability nor its ability to handle high concurrency. Python support appears strong and it appears easy to use. We're unsure of its back-up capabilities. The user community appears active and enthusiastic; there are not a lot of big production developments.

We see the following advantages:

  • More features than Light Cloud
  • Probably easier to use than Cassandra
  • Lots of enthusiasm around Mongo suggests that it will continue to be used
  • Very nice documentation

and risks:

  • No huge production deployments
  • Sharding is alpha
  • Performance needs to be validated

Testing the Candidates

In order to fill out our understanding of each of the above candidates, we devised a series of tests to try to put them through their paces. The testing we did was designed to be as thorough as possible while being done in a limited amount of time. The tests below were conducted by my colleague, Matt Ernst, and the following write-up is based on the results he compiled.

Basic Installation/Read-Write Test

The first test was just to see how easy it was to get the candidate up and running and working with Python to do basic reads and writes. To this end, we did the following:

  • Download the candidate software and Python bindings
  • Install the software and bindings
  • Get basic reads and writes working

Cassandra

  • It was very simple to download a copy of Cassandra 0.4.2 and start it with the default configuration file.
  • It took some effort to get the Lazyboy client working. There are several dependencies that need to be satisfied first (this was easier under Ubuntu than under a vm running CentOs 5).
  • Adding or changing keyspaces and column families requires restarting the server after altering a configuration file. It was not clear how or if we could add a new column family to meet new data needs without service downtime.

Lightcloud

  • It was straight forward to get LightCloud working, though a number of separate source packages had to be downloaded, compiled, and installed first.
  • Setting up the client was easy.
  • High-availability is available by default.

MongoDB

  • Mongo is available in a single prebuilt package.
  • The client can be added with easy_install.

Basic Performance Test

Next we did a basic performance test defined as follows:

  • Write a Python script that generates performance numbers for the following (I would suggest using N ~ 100,000, but feel free to vary that somewhat as needed):
    • Read/write N key/value pairs - keys are longs, values are integers
    • Read/write N key/value pairs - keys are strings<10 chars, values are integers
    • Read/write N key/value pairs - keys are longs, values are dicts (or dicts rendered as strings), values have byte-length ~ 100 bytes
    • Read/write N key/value pairs - keys are longs, values are dicts (or dicts rendered as strings), values have byte-length ~ 10000 bytes

Basic Test Results

Read and write speeds seem to be in competition; no candidate won for both reads and writes. MongoDB won on combined read and write times.

Concurrency Test

After that we did a test of concurrency:

  • Write a Python script that uses M (M < 100) clients to write to/read from the db.
  • Perform the tests in the basic performance test using these clients (N can be smaller if need be)

Concurrency Results

All of the solutions had somewhat reduced performance with increasing numbers of concurrent clients, like due to the Python client global interpreter lock. If a storage backend was not limited by I/O or CPU with a single client connection it was not used any more fully by concurrent connections.

Failover Test

Next we tested failover:

  • Research how each of the candidates handles failover.
  • For each candidate, configure the software for failover.
  • Using the basic performance test above, run the script and simulate the need to failover.
  • Validate that failover works for each candidate.

Failover Results

Cassandra

Configuring replication on Cassandra was straight forward The Lazyboy client uses available Cassandra instances on a round-robin basis. It does not, however, gracefully handle failure if an instance is dead. Adding transparent failure recovery to the client would require some development effort, which we did not do.

Lightcloud

Lightcloud comes configured for high-availability by default. LightCloud's HA works fine when Tokyo Tyrant instances are killed; when instances are stopped instead of killed, performance becomes agonizingly slow.

MongoDB

MongoDB requires some simple effort to set up replication. The tests worked fine when one replica or the other was killed. When a replica was stopped instead of killed, performance was terrible as with Lightcloud. This apparently means it is better for a server or software instance to crash completely than to remain alive but unresponsive.

Final Recommendations

After my colleague, Matt Ernst, finished testing the three candidates, he made the following recommendations:

MongoDB

Mongo is the standout among the 3 options tried in depth. Its speed is very good, especially on writes. The community is active, software releases come at a quick pace (a sign of health if not exactly convenient for maintenance), and the documentation is better than that of the other options. MongoDB offers a structured approach to storage that is much more attractive than a plain persistent key/value store. I think it will be a good fit for many kinds of data that Loomia uses.

Cassandra

Cassandra seems like it might be very powerful, scalable, and resilient. Improved documentation and Python clients would make it much more attractive. It appears to be designed with the most thought toward distributed operation from the beginning. The enormous scale of Facebook's use speaks to the soundness of the design. If I could wait another year for features and clients to be polished, and for someone to write an O'Reilly book about it, it might be the best choice. Right now it has too many rough edges, not enough cohesive documentation.

LightCloud

LightCloud seems to be nice enough software, but it doesn't have enough standout points to triumph over MongoDB and Cassandra. It is even lighter on documentation than Cassandra and iterating over data without knowing keys in advance (important for testing and debugging) is very slow. There is no command line shell available. If it had been available a few years ago it might have been a nice alternative to memcache so we could implement a "bottomless cache," but for a broad SQL alternative it seems slight.

The Winner

We thus opted for MongoDB. We have not yet deployed Mongo in production, but are hoping to do so sometime in the next month.

RDBMS Alternatives

Introduction

At Loomia we recently decided that for improved scalability, we need to move some of our data out of MySQL and into another form of persistent storage better suited for what is mostly key-value data.

To that end, we started researching various ‘no sql’ options. What follows is a discussion of how we went about our research and how we reached our decision. I found the articles listed in the references below to be valuable in our research; my hope is that this write-up may be of use to others.

Our approach was to draft a longish list of candidates for a first round of analysis and winnow that down to a short list of candidates for actual deployment and testing.

Although many of the solutions we looked at are more than key-value stores, for simplicity sake I refer to them all that way (or sometimes as ‘kv stores’). And apologies in advance to anyone working on these various projects if I've misrepresented anything about them -- please let me know and I will make the appropriate corrections.

References

RDBMS alternatives are an active topic of discussion in the technology world at the moment and listed below are a number of helpful articles written on the subject. I am very grateful to the authors for their insights:

Requirements

We defined the following requirements for our candidates:

  • Scalablility
    • The kv store needs to be able to handle arbitrarily large amounts of data, which means that we need to be able to scale writes across multiple machines.
  • Performance
    • The read performance needs to be excellent (since most of the queries will be cache misses, so a caching layer won't help much)
    • The write performance needs to be good
  • Concurrency
    • The kv store needs to be readable and writable by many clients at once
  • High Availablity
    • The kv store needs to be highly available
  • Python friendliness
    • Python bindings for the kv store should exist
  • Back-up and recovery
    • There should be some facility for back-up and recovery of the data
  • Network accessibility
    • The kv store needs to be accessible over the network (e.g. Berkeley DB is not network accessible)
  • Ease of deployment
    • Ideally, the kv store would be fairly easy to deploy
  • Community
    • The kv store should have a substantial user community and should have been deployed in existing production applications

Round 1 Candidates

We immediately ruled out two solutions:

  • Redis
  • Scalaris

For Redis, the entire data set needs to fit in memory and scalaris is not persisted to disk (at least as of now).

We then settled on the following list of candidates:

Assessment of the Candidates

Light Cloud

  • Built on top of Tokyo Tyrant
  • Used by SNS plurk.com: http://opensource.plurk.com/LightCloud/
  • LightCloud is a distributed db -- it adds a universal hashing layer to Tokyo Tyrant
  • Discussion of LC here: http://news.ycombinator.com/item?id=498581
  • Supports scripting in Lua (which is supposedly extremely fast) - this is interesting because it might provide us a way to do auto increments
  • Conclusion: LightCloud is definitely worth looking at given that Tokyo Tyrant gets good reviews

Cassandra

Mongo DB

Tokyo Tyrant/Cabinet

  • Used by Mixi, which is the equivalent of Facebook in Japan
  • Good performance (1.5M writes/second)
  • Nice presentation here:  http://www.scribd.com/doc/12016121/Tokyo-Cabinet-and-Tokyo-Tyrant-Presentation
  • Replication strategy similar to MySQL
  • Supports compression
  • It's been suggested that it chokes @ about 70GB of data: http://news.ycombinator.com/item?id=841942
  • In order for Tokyo Tyrant to be a distributed database, we would need to put a consistent hashing layer in the application code that uses it (which is essentially what LightCloud does)
  • Scripting with Lua
  • Conclusion: given that TT is not distributed, we're passing on it in favor of its distributed offspring, LightCloud.

Couch DB

  • Written in Erlang
  • CouchDB is often compared to Mongo, but is said to have inferior performance
  • Conclusion: given that Couch is said to have performance is that is inferior to Mongo and given that Couch doesn't boast an install base that is larger than Mongo (it's not as though Couch is used by Facebook, for example), we decided to eliminate Couch.

MemcachedDB

  • Persistent key-value store that uses the memcache protocol
  • I believe it use's Berkeley DB so this might be a deal-breaker
  • MemcacheDB isn't a distributed database: you could treat it like a distributed database by implementing hashing at the app level, but as I understand it that's the only way to scale writes across multiple machines.
  • Performance is excellent: http://memcachedb.org/benchmark.html
  • Used by Digg: http://highscalability.com/blog/2009/2/14/scaling-digg-and-other-web-applications.html
  • Conclusion: the fact that Digg uses MemcacheDB is certainly a point in its favor, but at the same time they seem to be the only ones and I don't have the feeling that MemcacheDB has enough momentum behind it; this is in contrast to Mongo or Tokyo or Cassandra that are definitely in the ascendance.

Project Voldemort

  • Produced by Linkedin and open-sourced by them
  • No re-balancing (you set up a cluster with 10 nodes and then you need 11, you need to migrate data to a new cluster)
  • Apparently there is garbage collection in spite of what Bob Ippolito said in his talk (http://groups.google.com/group/project-voldemort/browse_thread/thread/2746cd8bd3700162)
  • From their website:
    • Data is automatically replicated over multiple servers.
    • Data is automatically partitioned so each server contains only a subset of the total data
    • Server failure is handled transparently
    • Pluggable serialization is supported to allow rich keys and values including lists and tuples with named fields, as well as to integrate with common serialization frameworks like Protocol Buffers, Thrift, and Java Serialization
    • Data items are versioned to maximize data integrity in failure scenarios without compromising availability of the system
    • Each node is independent of other nodes with no central point of failure or coordination
    • Good single node performance: you can expect 10-20k operations per second depending on the machines, the network, the disk system, and the data replication factor
    • Support for pluggable data placement strategies to support things like distribution across data centers that are geographically far apart.
  • Limited Python support -- this appears to be all there is: http://github.com/etrepum/voldemort_client.
  • Conclusion: the limited Python support makes me wary of using PV when other good options exist.

Round 2

Of our first round candidates the following made in on to round 2:

  • Light Cloud
  • Cassandra
  • MongoDB

In the next article on the topic, we discuss how we assessed the remaining candidates.