July 21, 2008
Who wouldn't want to live in a "cloud"? The term is a perfect marketing buzzword for the server industry, conjuring images of a gauzy, sunlit realm that moves effortlessly across the sky. There are no suits or ties in this world, just toga-clad Greek gods who do as they please and punish at whim, hurling real lightning bolts and not merely sarcastic IMs. The marketing folks know how to play to the dreams of server farm admins who spend all day in overgrown shell scripts and impenetrable acronyms.
To test out these services, I spent a few days with them and deployed a few Web sites. I opened up accounts at four providers, configured some virtual servers, and sent Web pages flowing in a few hours. My choice of four providers wasn't as scientific as it could have been because new services keep appearing, but I chose some of the big names and a few new services. Now, I can invoke Joni Mitchell and say I've looked at both sides of these services and offer some guidance.
The first surprise is that the services are wildly different. While many parts of Web hosting are pretty standard, the definition of "cloud computing" varies widely. Amazon's Elastic Compute Cloud offers you full Linux machines with root access and the opportunity to run whatever apps you want. Google's App Engine will also let you run whatever program you want -- as long as you specify it in a limited version of Python and use Google's database.
The services offer wildly different amounts of hand-holding, and at different layers in the stack. When this assistance works and lines up with your needs, it makes the services seem like an answer to your prayers, but when it doesn't, you'll want to rename it "iron-ball-and-chain computing." Every neat feature that simplifies the workload does it by removing some switches from your reach, forcing you into a set routine that is probably but not necessarily what you'd prefer.
After a few hours, the fog of hype starts to lift and it becomes apparent that the clouds are pretty much shared servers, just as the Greek gods are filled with the same flaws as earthbound humans. Yes, these services let you pull more CPU cycles from thin air whenever demand appears, but they can't solve the deepest problems that make it hard for applications to scale gracefully. Many of the real challenges lie at the architectural level, and simply pouring more server cycles on the fire won't solve fundamental mistakes in design.
By the end of my testing, the clouds seemed like exciting options with much potential, but they were far from clear winners over traditional shared Web hosting. The clouds made some things simpler, but they still seemed like an evolving experiment.
[ Jump to the reviews and the analysis: Amazon EC2 | Google App Engine | GoGrid | AppNexus | The fine print | Crashing the cloud metaphor | Best and worst ]
Amazon Elastic Compute Cloud
Amazon was one of the first companies to launch a product for the general public, and it continues to have one of the most sophisticated and elaborate sets of options. If you need CPU cycles, you can spin up virtual machines with Elastic Compute Cloud (EC2). If it's data you want to store, you can park objects of up to 5GB in the Simple Storage Service (S3). Amazon has also built a limited database, SimpleDB, on top of S3, but I didn't test it because it's still in a closed beta. To wrap it up, your machines can talk among themselves with the Simple Queue Service (SQS), a message-passing API.
All of these services are open to the Web and accessible as Web services. There's a neat demo for SimpleDB that is just a pile of HTML running in your browser while querying the distant cloud. The documentation is extensive, and Amazon makes it relatively easy to wade through the options.
The ease, though, is relative because almost everything you do needs a command line. Amazon built a great set of tools with sophisticated security options for sending orders to your collection of machines in the sky, but they all run from the command line. I found myself cutting and pasting commands from documentation because it was too easy to mistype some certificate file name, for example.
Unix jockeys will feel right at home in this world because the virtual machines at your disposal are all versions of Linux distros like Fedora Core 4. After you grab one off the shelf, you can install your own software and create a custom instance that can be loaded relatively quickly if there's space available in the cloud.
It's hard to go into enough detail about all of the offerings described here, but Amazon is the most difficult because it has the most extensive solutions. Amazon is thoroughly committed to the cloud paradigm, rethinking how we design these systems and producing some innovative tools.
Google App Engine
Google's App Engine is the polar opposite of Amazon's offering. While you get root privileges on Amazon, you can't even write a file in your own directory with the App Engine. In fact, it's not even clear that you get your own directory, although that's probably what's happening under the hood. Google ripped the file write feature out of Python, presumably as a quick way to avoid security holes. If you want to store data, you must use Google's database.
The result of all of these limitations is not necessarily a bad thing. Google has stripped Web applications down to a core set of features and built up a pretty good framework for delivering them. I was able to write a simple application with several hundred lines of Python (cutting and pasting from Google's documentation) in less than an hour. Google offers some nice tools for debugging the application on your own machine.
Deploying this application to the cloud should have taken a few seconds, but it was held up by Google's insistence that I fork over my cell phone number and wait around for a text message that tests the number. When my message didn't show up for several hours after retrying, I switched to a friend's phone and finally activated my account.
Google insists on linking your App Engine account to both your cell phone and your Gmail account because -- well, I don't know. I think it's to track down the scammers, spammers, pharmers, phishers, and other fraudsters, but it starts to feel a bit creepy. Maybe it will help customer service and allow them to field support requests with answers like, "Your cell phone shows you filed this report from a location with a liquor license. Your e-mail suggests you're coding while waiting for Chris to get off of work. We suggest going home, sleeping this off, and then it will take you only a few seconds to find the endless loop on line 432 of main.py. BTW, Chris is lying to you and is really out with someone else."
The best users for the App Engine will be groups, or most likely individual developers, who want to write a thin layer of Python that sits between the user and the database. The API is tuned to this kind of job. In the future, Google may add more features for background processing and other services such as lightweight storage, but for now, that's the core strength of the offering.
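To make the "thin layer of Python between the user and the database" idea concrete, here is a minimal sketch. It is not App Engine's API: it assumes a plain WSGI callable and an in-memory SQLite table as stand-ins for Google's framework and database, and the `notes` table and `body` form field are hypothetical names chosen for illustration.

```python
import sqlite3
from urllib.parse import parse_qs

# Stand-in for the hosted database (assumption): an in-memory SQLite table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE notes (body TEXT)")


def app(environ, start_response):
    """Thin layer: turn each HTTP request into a database write and/or read."""
    if environ["REQUEST_METHOD"] == "POST":
        # Read the form-encoded request body and store the "body" field.
        size = int(environ.get("CONTENT_LENGTH") or 0)
        form = parse_qs(environ["wsgi.input"].read(size).decode("utf-8"))
        db.execute("INSERT INTO notes (body) VALUES (?)",
                   (form.get("body", [""])[0],))
        db.commit()
    # Every request answers with the full list of stored notes, oldest first.
    rows = db.execute("SELECT body FROM notes ORDER BY rowid").fetchall()
    text = "\n".join(body for (body,) in rows)
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [text.encode("utf-8")]
```

Notice what is missing: no file writes, no background jobs, no local state beyond the database handle. That is roughly the shape of application the restrictions described above push you toward.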
GoGrid
GoGrid refers to itself as the "world's first multi-server control panel." GoGrid's offerings aren't functionally different from Amazon's EC2, but using the old term "control panel" seems to be a better description of what's going on than the trendier term "cloud." You start up and shut down load balancers in much the same way as relatively ancient tools like Plesk and cPanel let you add and subtract services.
While GoGrid offers many of the same services as Amazon's EC2, the Web-based control panel is much easier to use than the EC2 command line. You point and click. There's no need to cut and paste information because little pop-up boxes show the way, by suggesting available IP addresses, for example. The system is intuitive, and it takes only a few minutes to build up your network. A simple ledger on the left keeps track of the costs and helps you manage the budget.
GoGrid also has a wider variety of OS images ready to go. There is the usual collection of CentOS/Fedora and common LAMP stacks. If you need Windows, you can have Windows Server 2003 with IIS 6.0, and Microsoft SQL Server is available at extra cost. There are also images with Ruby on Rails, PostgreSQL, and the Facebook application server. These make it a bit easier to start up.
While GoGrid offers many of the same features as Amazon's EC2, it doesn't provide more cloudlike services for storing information in a shared way like SimpleDB. This can make it a bit harder to start up and shut down servers without a bit of grief. The startup notes for the service point out that the only way to stop paying for a server is to delete it, and that means losing all of the data on it.
There's no simple way to build custom images at this moment, but the documentation says GoGrid is working on a way to turn any running server into an image that can be restarted later. If you're going to be expanding and contracting your network as the traffic ebbs and flows, you'll have to come up with some tools of your own to add and subtract these servers.
AppNexus
If you like the idea of the cloud but aren't sure if you want to leave behind the old trustworthy world of Unix, cron jobs, and other tools, then AppNexus is a service that aims to be a bit more transparent. The company has taken a big, industrial-sized server farm with the best load-sharing tools and storage boxes and found a way to let you buy it in small portions. AppNexus provides a number of command-line abstractions that let you turn servers on and off, but they also let you drill down into the file system.
The main functions of the AppNexus cloud are similar to Amazon's EC2. You log in through a command line and boot up images of Linux distributions. AppNexus says it can rebuild images from other sources like Amazon's EC2 by replacing the kernel with a version that's more aware that it is running in a virtual environment. Then it just takes a few key clicks on a command line to set up a load balancer.
One open question in the world of cloud computing is where the abstraction occurs; that is, where do the walls between the machines become blurred and it all starts to look a bit cloudy? Amazon's SimpleDB hides the storage behind a software wall and gives you access to it through some Web service call. AppNexus is working at a lower level by building Isilon IQ X-Series storage clusters into its cloud.
This gives you the option of simply mounting the storage and sharing the data across your cluster of servers -- if you consider that simple. Instead of working with abstract keys, you use real file names as the keys. The cluster handles the rest of the work.
A better solution is to use what AppNexus calls its CDN, or Content Delivery Network. The storage cluster has its own set of HTTP servers built in, and you can automatically begin serving static data from your files. Just write the files to the /cdn directory and they become available. AppNexus will distribute this storage cloud to multiple datacenters, making it simpler to serve up the static data from the closest location.
The fine print
One of the ways to go truly insane is to read the terms of service for these clouds. While the people who wrote the old co-location contracts could try to imagine the data as living on a single server that was in a certain box owned by a certain person and residing in a certain jurisdiction, all bets are off with a cloud. The whole point is that it isn't confined to one box, one building, or even one country.
Some of the service agreements are very specific and clear. GoGrid, for instance, spells out numerical thresholds for standard values such as latency, jitter, and packet loss for the six continents. If the cloud doesn't meet them, GoGrid promises to give you service credits for 100 times the amount lost.
Other terms are deliberately murky. You might consider it fairly capricious for Amazon to demand the right to terminate your account "for any reason" and "at any time," but the company also carefully reserves the right to terminate your account for "no reason" too. In other words, "It's not you, honest. It's me. No. I take that back, it's not even me. It's just over between the two of us. No reason."
Google's terms seem more generous, indicating it will terminate accounts only if you breach the terms of the agreement or do something unlawful. But Google does reserve the right to "pre-screen, review, flag, filter, modify, refuse, or remove any or all Content from the Service." I want to say that the terms seem more reasonable than they were when I read them several weeks ago, but I can't be sure. And it doesn't matter too much because new terms apply whenever Google wants to change them, and you signify your acceptance by continuing to use the service.
If you think it's hard to work through the legal rules when a server is in one state and a user is in another, imagine the right answer when your virtual server could migrate within a cloud that might encompass datacenters spread out across the globe. Amazon's terms, for instance, prohibit you from posting content that might be "discriminatory based on race, sex, religion, nationality, disability, sexual orientation, or age." It sounds like Amazon is worried that part of the cloud might touch down in a municipality that forbids things like this.
It almost seems scary to mention this fact, but New York is insisting that Amazon charge sales taxes because Amazon pays a commission to Web sites that do business in the state. What does this mean for applications hosted by Amazon? Do you owe sales tax if your application touches down in a part of the cloud that's in New York? Do you owe income tax?
I wanted to make some allusion to Schrödinger's cat and imply that we can't know where the computation occurs in the cloud, but then I slowly realized that this is far from true. Cloud servers have log files too, and these log files can produce insanely detailed analyses of who might owe which taxes. Major league athletes already hire tax attorneys to compute their share of income earned in each stadium, and some people are suggesting that Web companies aren't paying enough to support the local fire trucks and orphanages. Say good-bye to allusions to Joni Mitchell; it's time to start invoking Warren Zevon's "Lawyers, Guns, and Money."
Crashing the cloud metaphor
The legal worries are just part of the details that aren't so certain. One of the biggest dangers is reading too much into the cloud metaphor. While it's largely true that these services are very flexible ways to build up a network of machines, they are far from perfect. What happens if a server or a hard disk crashes in the middle of an operation? Often the same thing that happens when a generic server kicks the bucket: Your data might disappear and then it might not.
An instance of a machine from Amazon's EC2 looks just like a normal machine because after you strip away the hype, it is just another version of Linux running on a chip that probably speaks x86 machine code and writes data to a spinning platter. If you write something to a good old file in the Unix file system, the cloud metaphor won't protect it. It will stay there until the machine dies. If you shut down the server to save some cash when traffic is low, that's the same thing as dying. That means you can't really scale up and down without a savvy plan for migrating data.
In other words, MySQL in the cloud works just like it does on a generic server. Everything could be lost in a poof unless you start up several instances and mirror them with each other. The magic of the cloud metaphor can't remove this fundamental rule.
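A "savvy plan for migrating data" can start as something very small: copy whatever must survive off the instance before it is shut down, and copy it back onto the next one. The sketch below shows only that pattern. It assumes the durable store is just another directory (a stand-in for a real service such as S3, whose API is not shown here), and the function names `snapshot` and `restore` are hypothetical.

```python
import shutil
from pathlib import Path


def snapshot(local_dir, durable_dir):
    """Copy every file under local_dir into durable_dir, keeping relative paths.

    Returns the relative paths that were copied, in sorted order.
    """
    local_dir, durable_dir = Path(local_dir), Path(durable_dir)
    copied = []
    for src in sorted(local_dir.rglob("*")):
        if src.is_file():
            rel = src.relative_to(local_dir)
            dest = durable_dir / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)  # copy2 also preserves timestamps
            copied.append(rel.as_posix())
    return copied


def restore(durable_dir, local_dir):
    """Repopulate a fresh instance's disk from the durable copy."""
    return snapshot(durable_dir, local_dir)
```

Anything written after the last `snapshot` call is still lost when the instance goes away, which is why a live database needs mirroring rather than periodic copies.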
If you want something to survive a crash, you've got to put it into the cloud's data stores. These are great services, but they're not cheap. One friend of mine used to back up his disks to Amazon's S3 until he started getting bills for more than $200 a month. He bought a hard disk and kept it on his desk.
The price is higher because the service level is higher. Amazon wants people to be able to trust the data store, and that means providing a level of service that would make a bank happy. Sharing data across servers takes time and careful coding. Google cautions users to be careful writing to its data store because it can be expensive. If you're someone who likes to keep lots of log files just in case, you'll probably pay much more to store them in the cloud than you would in a regular file. Alas, Google doesn't have regular files.
One of the trickier details is trying to understand the prices. GoGrid, for instance, likes to say that its Intel Xeon servers are more powerful than its competitors'. Google doesn't even sell server time per se; it just bills you for CPU megacycles, a squirrelly metric. Amazon EC2 has regular-sized machines and bigger ones that are a bit more expensive. When costs change, the companies often lower their prices. But they also raise them when a service turns out to be more expensive to provide than they thought. This complexity will have you scratching your head for a long time because it's hard to know what things will end up costing. That box from Sun may not scale up and down, but the bill isn't going to change with every hit on your Web site.
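One reason a megacycle count is hard to reason about is that it only turns into time, and then money, once you pick a reference clock speed. The arithmetic is sketched below; the 1.2 GHz clock and the $0.10 per CPU-hour rate used in the note afterward are made-up illustrative values, not any provider's published figures.

```python
MEGA = 1_000_000  # cycles per megacycle


def megacycles_to_cpu_seconds(megacycles, reference_clock_hz):
    """Convert a megacycle count to CPU-seconds on a chip of the given speed."""
    return megacycles * MEGA / reference_clock_hz


def megacycle_cost(megacycles, reference_clock_hz, price_per_cpu_hour):
    """Price a megacycle count at a (hypothetical) per-CPU-hour rate."""
    cpu_hours = megacycles_to_cpu_seconds(megacycles, reference_clock_hz) / 3600
    return cpu_hours * price_per_cpu_hour
```

With those made-up numbers, `megacycles_to_cpu_seconds(1200, 1_200_000_000)` is one CPU-second, and 4,320,000 megacycles works out to one CPU-hour, or ten cents. Double the reference clock and the same megacycle count buys half the time, which is what makes the metric hard to compare against a per-hour server price.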
Best and worst
After working through these systems, I tried to imagine the best and worst applications for these clouds. One of the best fits might be some kind of reservation system for weekend events like concerts. While there might be a small amount of the load at any time, the crunch would come each Friday afternoon when people realize they have no weekend plans. The cloud's ability to spin up more servers to handle this demand would fit this perfectly. The service might also take real reservations and sell tickets in advance, a service that would demand the higher qualities of service offered by the shared data stores.
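The appeal of the Friday-afternoon case is easy to put in numbers. The sketch below compares the server-hours consumed in a week when capacity follows demand hour by hour against a fixed fleet sized for the peak. Every figure in it (500 requests per hour at baseline, a six-hour spike of 20,000, 1,000 requests per hour per server) is invented for illustration.

```python
import math


def servers_needed(requests_per_hour, capacity_per_server):
    """Servers required for one hour of load; always keep at least one running."""
    return max(1, math.ceil(requests_per_hour / capacity_per_server))


def weekly_server_hours(hourly_load, capacity_per_server):
    """Return (elastic, fixed) server-hours for a list of hourly request counts."""
    elastic = sum(servers_needed(load, capacity_per_server) for load in hourly_load)
    fixed = servers_needed(max(hourly_load), capacity_per_server) * len(hourly_load)
    return elastic, fixed


# Hypothetical week of 168 hours, with hour 0 at Monday midnight.
load = [500] * 168
for hour in range(108, 114):  # Friday, noon to 6 p.m.
    load[hour] = 20_000

elastic_hours, fixed_hours = weekly_server_hours(load, capacity_per_server=1_000)
```

Under these assumptions the elastic fleet uses 282 server-hours and the peak-sized fleet uses 3,360. Flatten the spike and the two numbers converge, which is the situation the always-busy comment site is in.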
The worst possible application might be something like RedSoxYankeesTrashTalk.com or any Web site filled with an endless stream of mostly forgettable comments trolling for reactions from the rival fans. While there might be a slight peak around game time, I've found that sites like this keep rolling along even late at night during the off-season. And such a site would certainly attract First Amendment proponents who would look for ways to write a single sentence that could zing all seven of Amazon's protected targets of discrimination.
Furthermore, there would be no reason to pay for high-quality storage because I'm sure that even the participants wouldn't notice if their comments disappeared by mistake. For fun, read Amazon's terms on getting your data back after they shut you down. While I would probably write the same thing if it were my cloud, there are plenty of examples of applications that are better off on their own.
These examples aren't perfect, of course, but neither is cloud computing. After a few weeks of building up some machines and hearing from people who've used the services, I'm pleasantly confused and filled with curious and optimistic questions. Will these clouds be large enough to handle the Internet equivalent of the Thanksgiving weekend traffic jams? Will the cloud teams be able to find a way to offer simple options that are priced correctly for the serious and not-so-serious data wrangler? Will they ever find an adequate meter for computation time?
I suspect the only people who know the answers to these questions today are living in the real clouds where they went after a life ministering to the IBM mainframes. If we could get those guys back here today, we might be able to get this cloud thing up and running smoothly. We just have to convince Intel to build a chip that understands IBM 360 binaries.