This last July I had another opportunity to attend OSCON 2011 in Portland, Oregon. So I did what any self-described beer and technology enthusiast would do, I jumped on a plane and flew north. While in attendance, I soaked in as much knowledge as I could — insight, instruction, tips, and funny stories about the open source software running most of the Internet and Web-based applications many of us use in our day-to-day online. The following is a sampling of what I took away from a week of geek speak.
Netflix is moving massive amounts of data to the cloud
Typically I don’t attend Keynotes (mostly because there is something more interesting, like lunch, going on at the same time). But this year I made an exception, and I’m glad I did. The Netflix keynote was a 19-minute insight into the inner workings of how Netflix is moving their streaming movie data to the cloud, and in the process, moving data to new stores. Did you know that one movie takes Netflix an entire day to encode into 50 different audio and video files? Then the data is replicated three times across three timezones, and then replicated two more times to two different vendors, giving Netflix geographically redundant data backups. And while moving data around to these new servers, ratings and queues are moving to Cassandra. So the next time you can’t watch an instant movie on Netflix, it’s likely that this move into the cloud is the culprit.
Postgres 9.1 is another step in the right direction
I’ve long been an advocate for Postgres as one of the best choices for an open source database. But it has been lacking in some features, mainly streaming replication. Postgres 9.0 gave us replication, and 9.1 is taking that a step further with synchronous replication. For those of us that require zero data loss in our master-to-slave replication setups, this is just what we’ve been waiting for. In addition, Postgres is replacing contrib with more pluggable extensions available at www.pgxn.com. As it continues to evolve, it’s nice to see Postgres becoming the database of choice for projects like Django and Apple’s own Lion X Server.
Facebook is still working out where to store all of our data
Did you know that Facebook is collecting 25TB of message data alone per month? All of those messages are going into Hbase and Haystack data stores on clusters upon clusters of servers. Each cluster has a stack — running open source software such as Memcached, ZooKeeper, Hbase, Hadoop and their own Haystack — spread across 5 racks of 20 servers. That’s 100 servers per cluster. As Facebook migrates their data away from MySQL and shards it onto these clusters, they reduce the number of users affected by a single outage. This also explains how they are able to roll out new updates to a percentage of users at a time. Aside from finding a place to stash all of this data, there wasn’t any mention of what Facebook is going to do with all of our messages. <insert SkyNet reference here>.
MongoDB, and its noSQL ilk, are not to be ignored
The noSQL movement is gathering a tremendous amount of momentum and popping up in places I don’t think anyone expected it to. When discussing whether or not noSQL data stores would replace relational databases, we’ve been focusing too much on an all-or-nothing approach. Meanwhile, open source noSQL, or schema-less, databases found their niche in service oriented architectures. I can tell you from personal experience that the database is always going to present a bottleneck, and furthermore, it’s gone to be one aspect of the database that does so. Companies like Netflix and Facebook have figured this out and are resolving these bottlenecks by replacing the problematic services with Hbase, Cassandra and MongoDB. These schema-less databases can be especially useful when all you need is a basic key-value store. Over the next few years we are going to see schema-less databases adopting features from relational databases, and vice-versa. There is going to be a lot of innovation in this space as these open source databases mature.
The Netflix API is not what it used to be
When I left the talk on changes coming to the Netflix API I thought they were crazy (which I’ll explain). But as I’ve been mulling over it, I can see how their situation is truly unique and they need to do something unprecedented to address it. When Netflix created their API, they did so like any of us would — creating a standards-based API that would attract developers too their platform. And it worked. In the beginning, 99% of their API usage comes from individual developers. Now the tides have turned and 99% of the API usage comes from Netflix streaming devices. This has resulted in 20 billion API requests per month, or 20,000 per second. The solution? In short, they want to provide custom API hooks for devices to cut the number of requests per month down to 5 billion (a 75% reduction). A custom API interface per device?!?!?! Yeah, that was my thought exactly. But then, if the Roku player is always making the same four requests to display your instant queue, why not give them a custom API call so they can reduce that to one request? You’ve reduced your API requests by 75% and you’re also helping the device run faster. Netflix has a long ways to go — they are still in the conceptual phases of overhauling the API — before they pull this off, but they might just pull it off and do some pretty innovative stuff in the process.
It’s really hard to test scaling problems before deployment
Try as we might, it’s quite difficult to replicate a production environment. We can match the number of servers, the amount of bandwidth, possibly even an approximation of the load we expect to receive, but, we’ll never get it right. The truth is, we can never fully estimate what the real world is going to throw at our web-based application until we put it out there as a target. Our users, simply put, are too unpredictable. That doesn’t we shouldn’t do load testing in a staging environment, we should. It means at some point we just have to put it out there and see what happens, and be on the ready to quickly stamp out any fires that may crop up. As a web developer who has deployed features into the great unknown, it is reassuring to know that other web developers are up against the same problems and are coming up with similar solutions. It’s like they say, it’s easier to beg forgiveness than it is to ask permission. Which brings us to the topic of DevOps…
DevOps are the new SysAdmins
As web developers, it used to be we could place most the blame for web-based application failures on systems outside of our control. We could point fingers at a misconfigured database, an overworked web server, or some other constellation in the corner of the infrastructure. With the widespread adoption of Memcached, NGINX, and noSQL databases, bad coding is becoming exposed more as the bottleneck in modern web-based applications. Our apps have to perform well, and when they don’t, we have to fix them. DevOps are the new SysAdmins, meaning, if something breaks, we have to go in and get the app back on its feet. I’ve always been an advocate of developers understanding how their apps interact with the server environment where they are hosted, now it’s becoming a necessity. And if you ever get a chance to see Terry Chay speak at a conference, he’s definitely good for a few laughs ;)
Infrastructure is quickly becoming a commodity
Everywhere I went people were talking about the cloud, and have been long before I went to this conference. While the cloud is a great step forward in web-based application development, it is not a one-size-fits-all solution. The good parts? A developer can spin up any number of web server instances to meet an application’s traffic demands as they ebb and flow. Just a few clicks and more servers are deployed to handle load. Storage constraints, also, are easily addressed by allotting more disk space. Paying for usage means hardware isn’t sitting around unused. It’s easy to expand only when necessary, meaning our hosting needs can grow alongside our budget. We don’t have to project our server needs out years in advance, we can meet those needs tomorrow. But, it isn’t perfect. Servers still go down, file systems are corrupted, noisy neighbors can negatively affect our apps, and performance isn’t always the best. I’ve personally seen all of these happen in the cloud and had to wait while a system administrator troubleshoots and resolves the issue. No, the cloud isn’t perfect, but it definitely has it’s advantages. Would I move a dedicated bare-metal database server to the cloud? No. But I’m considering more letting the cloud handle commodity hosting needs, like web servers and load balancing.