NoSQL's great, but bring your A game

MongoDB might be a popular choice among NoSQL databases, but it's not perfect, at least out of the box. At last week's MongoSV conference in Santa Clara, Calif., a number of users, including some from Disney, Foursquare and Wordnik, shared their experiences with the product. The common theme: NoSQL is necessary for a lot of use cases, but it's not for companies afraid of hard work.

If you're in the cloud, avoid the disk

According to Wordnik technical co-founder and vice president of engineering Tony Tam, unless you're willing to spend beaucoup dollars on buying and operating physical infrastructure, cloud computing is probably necessary to match the scalability of NoSQL databases.

As he explained, Wordnik actually launched on Amazon Web Services using MySQL, but the database hit a wall at around a billion records. So Wordnik switched to MongoDB, which solved the scaling problem but caused disk I/O problems of its own that resulted in a major performance slowdown. So Wordnik ported everything back onto some big physical servers, which drastically improved performance.

And then came the scalability problem again, only this time it was in terms of infrastructure. So it was back to the cloud. But this time, Wordnik got smart and tuned the application to account for the strengths and weaknesses of MongoDB ("Your app should be smarter than your database," he says), and MongoDB to account for the strengths and weaknesses of the cloud.

Among his observations was that in the cloud, virtual disks have virtual performance, meaning it's not really there. Luckily, he said, you can design to take advantage of virtual RAM. It will fill up fast if you let it, though, and there's trouble brewing if requests start hitting the disk. "If you hit indexes on disk," he warned, "mute your pager."
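As a rough illustration of the "keep it in RAM" advice, here is a minimal Python sketch that checks whether a server's indexes plus its frequently accessed data fit in memory with some headroom. The function name, the 20 percent headroom and the sizes are assumptions made for illustration, not figures from Tam's talk; in practice the index size could come from MongoDB's collStats output (the totalIndexSize field).

```python
GIB = 1024 ** 3


def fits_in_ram(index_bytes, hot_data_bytes, ram_bytes, headroom=0.2):
    """Return True if indexes plus hot data fit in RAM with headroom to spare.

    headroom is the fraction of RAM reserved for everything else
    (the OS, connections, growth). The 0.2 default is an assumption.
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable = ram_bytes * (1 - headroom)
    return index_bytes + hot_data_bytes <= usable


# Hypothetical sizes, for illustration only.
print(fits_in_ram(12 * GIB, 30 * GIB, 64 * GIB))  # True: 42 GiB fits in ~51 GiB
print(fits_in_ram(40 * GIB, 30 * GIB, 64 * GIB))  # False: 70 GiB does not
```

When the check fails, the options the speakers described are the usual ones: shrink the working set, add RAM or shard.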

Foursquare's Cooper Bethea echoed much of Tam's sentiment, noting that "for us, paging the disk is really bad." Because Foursquare works its servers so hard, he said, high latency and error counts start occurring as soon as the disk is invoked. Foursquare does use disk in the form of Amazon Elastic Block Store (EBS), but it's only for backup.

EBS also brings along issues of its own. At least once a day, Bethea said, queued reads and writes to EBS start backing up excessively, and the only solution is to "kill it with fire." What that means changes depending on the problem, but it generally means stopping the MongoDB process and rebuilding the affected replica set from scratch.

Monitor everything

Curt Stevens of the Disney Interactive Media Group explained how his team monitors the large MongoDB deployment that underpins Disney's online games. MongoDB actually has its own tool, called the MongoDB Monitoring Service (MMS), that Stevens said he swears by, but it isn't always enough. It shows traffic and performance patterns over time, which is helpful, but it's only the starting point.

Once a problem is discovered, it's "like CSI on your data" to figure out what the underlying problem is. Sometimes, an instance just needs to be sharded, he explained. Other times, the code could be buggy. One time, Stevens added, they found out a poor-performing app didn't have database issues at all, but was actually split across two data centers that were experiencing WAN issues.

Oh, and just monitoring everything isn't enough when you're talking about a large-scale system, Stevens said. You have to have alerts in place to tell you when something's wrong, and you have to monitor the monitors. If MMS or any other monitoring tools go down, you might think everything is just fine while the kids trying to have a magical Disney experience online are paying the price.
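One simple way to "monitor the monitors" is a heartbeat check: each monitoring tool records a timestamp whenever it reports in, and a separate watchdog flags any tool that has gone quiet. The Python sketch below is an assumption about how that could look, not a description of Disney's setup; the function name, monitor names and five-minute threshold are hypothetical.

```python
import time


def stale_monitors(last_heartbeats, max_age_seconds, now=None):
    """Return the names of monitors whose last heartbeat is too old.

    last_heartbeats maps a monitor name to the Unix timestamp of its
    most recent check-in. now can be passed in to make testing easy.
    """
    now = time.time() if now is None else now
    return sorted(
        name
        for name, last_seen in last_heartbeats.items()
        if now - last_seen > max_age_seconds
    )


# Hypothetical heartbeat table: one monitor has been silent for 20 minutes.
now = 1_000_000
heartbeats = {"mms-agent": now - 30, "latency-probe": now - 1200}
print(stale_monitors(heartbeats, max_age_seconds=300, now=now))  # ['latency-probe']
```

Anything this check returns should raise an alert of its own, since silence from a monitor otherwise looks exactly like a healthy system.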

By the numbers

If you're wondering what kind of performance and scalability requirements pushed these companies to MongoDB, and then to customize it so heavily, here are some statistics:

  • Foursquare: 15 million users; 8 production MongoDB clusters; 8 shards of user data; 12 shards of check-in data; ~250 updates per second on user database, with maximum output of 46 MBps; ~80 check-ins per second on check-in database, with maximum output of 45 MBps; up to 2,500 HTTP queries per second.
  • Wordnik: Tens of billions of documents with more always being added; more than 20 million REST API calls per day; mapping layer supports 35,000 records per second.
  • Disney: More than 1,400 MongoDB instances (although your eyes start watering after 30, Stevens said); adding new instances every day, via a custom-built self-service portal, to test, stage and host new games.
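Because these figures mix per-second and per-day rates, a quick conversion helps put them on the same scale. The short Python sketch below does only that arithmetic; the helper names are made up for illustration. Wordnik's figure is stated as "more than" 20 million calls a day, so the result is a lower bound on its average rate, and the Foursquare daily totals assume the quoted rates hold around the clock, which the presentations did not claim.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(count_per_day):
    """Average events per second for a given daily total."""
    return count_per_day / SECONDS_PER_DAY


def per_day(rate_per_second):
    """Daily total if a per-second rate were sustained for 24 hours."""
    return rate_per_second * SECONDS_PER_DAY


print(round(per_second(20_000_000), 1))  # Wordnik REST calls: 231.5 per second
print(per_day(250))  # Foursquare user updates: 21600000 per day
print(per_day(80))   # Foursquare check-ins: 6912000 per day
```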

For more-technical details about their trials and tribulations with MongoDB, all three presentations are available online, along with the rest of the conference's talks.

Feature image courtesy of Tony Tam, Wordnik.
