Thursday, July 31, 2008

Cloud Computing: The Nine Features of an Ideal PaaS Cloud

An article that lays out the requirements for PaaS-based cloud computing.
With so many cloud computing environments now appearing, calls for open interfaces are becoming increasingly common, and this article is one example.
It does a good job of identifying the set of interfaces that ought to be standardized, and of making the point that capabilities cloud computing presupposes, such as scalability and load balancing, should be automated.

Cloud Computing: The Nine Features of an Ideal PaaS Cloud

Clouds Should Be Open, Not Proprietary


What sort of cloud computer(s) should we be building or expecting from vendors? Are there issues of lock-in that should concern customers of either SaaS clouds or PaaS clouds? I've been thinking about this problem as the CEO of a PaaS cloud computing company for some time. Clouds should be open. They shouldn't be proprietary. More broadly, I believe no vendor currently does everything that's required to serve customers well.

What's required for such a cloud? I think an ideal PaaS cloud would have the following nine features:

1. Virtualization Layer Network Stability

Cloud computers must operate on some sort of virtualization technology for many of the following features to even be feasible. But as general purpose computing moves from dedicated hardware to on-demand computing, one key feature of the dedicated model for web applications is a stable, static IP address. If the virtualization layer borks (and this happens), then once the cloud has recovered the compute instances, the developer should be able to rely on the web application just working, without having to re-jigger network settings.

2. API for Creation, Deletion, Cloning of Instances

Developers should be able to interact with the cloud computer, to do business with it, without having to get on the phone with a sales person or submit a help ticket. In other words, the customer should be able to truly get on-demand computing when they demand it, whenever they demand it. Joyent only began to offer this recently through Aptana and their Aptana Studio product. However, the API is only available to Aptana at this point. The API needs to be publicly available to everyone. Provide a credit card (that works and is yours) and you should get compute, storage, and RAM on demand. The challenge for cloud computing companies is to figure out the just-in-time economics that allow us to provide on-demand infrastructure without having lots of infrastructure sitting around waiting to be used. I think this means that cloud computing companies will, just like banks, begin more and more to "loan" each other infrastructure to handle our own peaks and valleys. But in order for this to happen, we'd need the next requirement.
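To make the idea concrete, here is a minimal sketch of what such a self-service provisioning API could look like from the developer's side. Everything in it (the endpoint, the payload fields, the token) is hypothetical; no particular vendor's API is implied.

```python
import requests

# Hypothetical endpoint and token; no specific cloud vendor's API is implied.
API = "https://cloud.example.com/v1"
HEADERS = {"Authorization": "Bearer <api-token>"}

def create_instance(image, ram_mb, cpus):
    """Provision a new compute instance on demand."""
    r = requests.post(f"{API}/instances", headers=HEADERS,
                      json={"image": image, "ram_mb": ram_mb, "cpus": cpus})
    r.raise_for_status()
    return r.json()["id"]

def clone_instance(instance_id):
    """Clone an existing instance, e.g. to absorb a traffic spike."""
    r = requests.post(f"{API}/instances/{instance_id}/clone", headers=HEADERS)
    r.raise_for_status()
    return r.json()["id"]

def delete_instance(instance_id):
    """Release an instance when it is no longer needed."""
    requests.delete(f"{API}/instances/{instance_id}", headers=HEADERS).raise_for_status()

web01 = create_instance(image="ubuntu-lamp", ram_mb=2048, cpus=2)
web02 = clone_instance(web01)    # scale out
delete_instance(web02)           # scale back in
```

The point is not the particular calls but that the whole lifecycle is driven by code and a credit card, with no sales person or help ticket in the loop.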

3. Application Layer Interoperability

Cloud computers need to support a core set of application frameworks in a consistent way. I propose that cloud computers should support PHP, Ruby, Python, Java and the most common frameworks, libraries, gems/plugins, and application/web servers for each of these languages. Essentially, a developer should be able to move between Joyent, the Amazon Web Services, Google, Mosso, Slicehost, GoGrid, etc. by simply pointing the "deploy gun" at the cloud (having used the API mentioned above to spin up instances) and go. Change DNS, done. But, no cloud computing company is innovating by providing better application layer solutions. We ought to support the most popular languages and move on. However, for a developer to truly have cloud portability, we need to support another requirement.

4. State Layer Interoperability

This is the most difficult problem to solve when scaling a web application, and, consequently, the area in which cloud computing companies are innovating while sacrificing interoperability. It's not simply a question of deciding that we should all support MySQL or Postgres, because we will find that a later requirement ("Automatic Scale") is practically impossible to achieve with these tools. Amazon is innovating with SimpleDB and Google with BigTable as solutions to the problem, but developers can't leave either cloud because neither SimpleDB nor BigTable is available anywhere else. What is needed, and I'm looking ahead to the next requirement when I say this, is an XMPP-based state layer that can flush out to some SQL-y store. Think open-source Tibco. The financial markets fixed these problems years ago. This datastore needs to speak SQL, be built using open-source and free software, and be easy for developers to adopt. The value cloud computing companies provide to developers is running the state layer for them, without requiring developers to use some proprietary state layer that may or may not provide scalability upon success and represents lock-in.
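As a rough illustration of the pattern being described, a fast shared state layer whose changes are broadcast and then flushed to a SQL store, here is a toy sketch. A local queue stands in for the XMPP transport and SQLite stands in for the "SQL-y" backing store; it is illustrative only, not a scalable design.

```python
import json
import queue
import sqlite3

bus = queue.Queue()     # local stand-in for an XMPP-style message bus
cache = {}              # fast in-memory state used by the application

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")

def put(key, value):
    cache[key] = value                            # fast write path for the application
    bus.put(json.dumps({"k": key, "v": value}))   # change event for other nodes / the flusher

def flush():
    """Drain pending change events out to the SQL store."""
    while not bus.empty():
        evt = json.loads(bus.get())
        db.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (evt["k"], evt["v"]))
    db.commit()

put("user:42:name", "Ada")
flush()
print(db.execute("SELECT v FROM kv WHERE k = ?", ("user:42:name",)).fetchone())
```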

5. Application Services (e.g. email infrastructure, payments infrastructure)

A cloud computer should provide scaled application services consumable by developers in developing and delivering their own applications. There are two types of application services. The first group is delivered using open protocols/formats. Examples would be IMAP/SMTP, LDAP/vCard, iCal/ICS, XMPP, OpenID, OPML. All clouds should offer these open protocols/formats so that developers can move between clouds without having to rewrite their applications. The second group is delivered as web services, is often proprietary to the cloud (and therefore a means of differentiation), and includes services such as payments and inventory.
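The portability argument for the first group is that application code is written against the open protocol rather than against any one cloud. For example, sending mail through a provider's SMTP service looks the same wherever the application runs; only the endpoint and credentials change (the host name and login below are placeholders).

```python
import smtplib
from email.message import EmailMessage

SMTP_HOST = "smtp.example-cloud.com"   # hypothetical; each cloud exposes its own endpoint

msg = EmailMessage()
msg["From"] = "app@example.com"
msg["To"] = "user@example.com"
msg["Subject"] = "Welcome"
msg.set_content("Thanks for signing up.")

with smtplib.SMTP(SMTP_HOST, 587) as server:
    server.starttls()                                   # open protocol, standard TLS upgrade
    server.login("app@example.com", "app-password")     # placeholder credentials
    server.send_message(msg)
```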

6. Automatic Scale (deploy and forget about it)

All things being equal, a competent developer should be able to deploy to a cloud and grow to five billion page views a month without having to think about "scale". Just write the code, the cloud computer does the rest.

Is this achievable? Today, no. No cloud computer automatically scales applications. Part of the problem lies in the state layer. Part of the problem lies in what it means to scale. What is the measure of scale? Responsiveness? Scaling the state layer (e.g. the database) is a black art. Scaling the application layer or the static assets layer relies, in part, on load balancing and storage.
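To make "deploy and forget" slightly more concrete, here is a toy sketch of the control loop a cloud would have to run on the developer's behalf. The metric, the thresholds, and the helper functions are all assumptions; real scaling decisions are far messier, which is exactly the point being made above.

```python
import random
import time

TARGET_LATENCY_MS = 200   # assumed responsiveness target

def current_p95_latency_ms():
    return random.uniform(50, 400)      # stand-in for a real monitoring feed

def add_app_server():
    print("scaling out: starting another application instance")

def remove_app_server():
    print("scaling in: stopping an idle application instance")

def autoscale(iterations=5, interval_s=1):
    for _ in range(iterations):
        latency = current_p95_latency_ms()
        if latency > TARGET_LATENCY_MS * 1.2:
            add_app_server()            # responsiveness degrading: scale out
        elif latency < TARGET_LATENCY_MS * 0.5:
            remove_app_server()         # plenty of headroom: stop paying for idle capacity
        time.sleep(interval_s)

autoscale()
```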

7. Hardware Load Balancing

The cloud computer should provide the means to achieve five billion page views a month. I picked that number because it is big. If you're writing an application, and you want to be able to achieve tremendous scale, the answer shouldn't be to move off the cloud onto your own "private" cloud of dedicated servers. Of course, if the cloud computer is open as we've described, you can build your own cloud. It's also true you can generate your own electricity from coal, if you want to bother. But why bother? Software load balancers will get you nowhere close to the throughput required to achieve 5 billion page views per month. The state of the art is hardware load balancers.
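For a sense of what that number implies, some back-of-envelope arithmetic (the 5x peak factor is an assumption, not from the article):

```python
page_views_per_month = 5_000_000_000
seconds_per_month = 30 * 24 * 3600              # 2,592,000
average_rps = page_views_per_month / seconds_per_month
peak_rps = average_rps * 5                      # assume peaks run ~5x the monthly average
print(f"{average_rps:.0f} req/s average, {peak_rps:.0f} req/s at an assumed peak")
# -> roughly 1929 req/s on average, ~9645 req/s at peak
```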

8. Storage as a Service

Storage should be available to developers as a service. Where this is done today, it is done using a proprietary API and represents lock-in. The storage service should allow customers to consume endless amounts of storage and pay for only what is used. Objects on the storage service should be accessed by developers as objects rather than as nodes in a hierarchical tree. This way developers don't have to understand the hierarchy.

WebDAV could be an open protocol version of the storage service, but fails to provide the abstraction of treating objects as objects rather than nodes in a hierarchical tree. At present, I don't believe there is a reasonable solution to the problem that isn't also proprietary. We need to develop one that is open and free.
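A hypothetical sketch of the kind of flat, object-addressed interface being asked for: keys are opaque strings handed to a put/get/delete API, not paths in a directory tree the client has to manage. The in-memory class below is only a stand-in; no real service's API is implied.

```python
class ObjectStore:
    """Toy stand-in for a flat object store: opaque keys mapped to blobs."""

    def __init__(self):
        self._objects = {}          # in-memory substitute for the remote service

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

    def delete(self, key):
        self._objects.pop(key, None)

store = ObjectStore()
store.put("invoices/2008-07.pdf", b"%PDF- ...")   # the "/" is just part of the key,
print(len(store.get("invoices/2008-07.pdf")))     # not a directory the client must traverse
```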

9. "Root", If Required

The cloud computer vendor can't think of everything a developer or application might need or want to do. So the cloud needs to be hackable and extensible by the developer and that means an administrative account of some sort that allows the developer to shape and mold the cloud to their specific needs. By definition, cloud computers must be built on top of some sort of virtualization technology, so the developer never has "root" to the cloud, only "root" to the developer's part of the cloud.

Microsoft's Cloud Computing Strategy

At Microsoft's Financial Analyst Meeting held on July 24, the company's Chief Software Architect, Ray Ozzie, was interviewed and spoke about the Software + Services strategy Microsoft is pursuing.
The content is very interesting, but the lack of concrete detail should probably be read as a sign that productization is still a long way off.
That said, given the sheer strength of Microsoft's datacenter infrastructure, it is clear that once its cloud computing strategy gets under way, it will become a formidable presence for its competitors.
 

Peering Into Microsoft's Cloud

Written by Sarah Perez / July 28, 2008 8:19 AM

On July 24th, Microsoft held their annual Financial Analyst Meeting (FAM), an event where many of Microsoft's top executives come together to talk about the company's progress and achievements. At this year's meeting, Microsoft Chief Software Architect, Ray Ozzie hinted at Microsoft's cloud initiatives, a part of their Software + Services (S+S) strategy. While Ozzie did not reveal either codenames or ship dates during his speech, there is still some information we can piece together to help determine what Microsoft's cloud will look like.

The Connected OS

Ozzie Said:

"We believe in a future, again, in many ways analogous to Xbox LIVE, in which Windows Live acts as a strategic extension to both Windows on the PC and Windows Mobile on the phone. You can think of this as the connected OS, Windows beyond the level of a single device or PC. How the OS connects to services and how it synchronizes with other devices are key. Your PC's config settings, your apps and their settings, your files and folders, are transparently synchronized across a mesh of PCs and other devices by Windows."

Takeaway:

This is a reference to Windows Live Mesh, a platform that currently only does file sync but is designed to also sync settings and applications. Today, many people are still confused about Mesh, thinking it's Microsoft's competitor to Apple's MobileMe service, but the hint here is that Mesh will go deeper than just a premium service for cloud storage and sync, to be more of an overall cloud computing platform.

The Software Stack

Ozzie Said:

"Most major enterprises today find themselves in the early stages of a two-stage infrastructure transition. The first stage is the consolidation of many dedicated application servers into a fewer number of larger application servers using virtualization to combine those workloads into a single high-scale box. The second stage they're heading into is the shift toward leveraging utility computing services, a new kind of system designed for massive scale-outs, running on large redundant arrays of inexpensive commodity servers in the cloud. Both of these trends, consolidation and utility computing, are motivated by the same two things: First, to make the best use of expensive IT personnel; and second, to increase the agility of IT -- agility in deployment, agility in management, deploying and scaling IT systems in just minutes or hours that might have taken in the past weeks or months to get up and running. Earlier you heard Stephen and others talk about online services, Exchange Online, SharePoint Online, CRM Online. These are the high-level service analogs to on-premises server offerings. And to power these services at scale, we're bringing some of our most capable server assets to the cloud beyond those key building blocks, like SQL Server and BizTalk."

Takeaway:

Hey, that looks familiar! Microsoft's software stack hasn't changed - it has just moved from the data center to the cloud. For IT, this means faster deployments and the power to scale up as needed. IT admins will spend less time staging servers which means they'll have more time to focus on other areas of the business. We envisioned what this will look like here, but to sum up - your company's computer guy/gal will be less of a "geek" and more of a "facilitator."

Developers, Developers, Cloud App Developers

Ozzie Said:

"The third principle that I've been talking for several years now with people in the organization is one that targets developers, and it's one that I refer to as the trend toward small pieces loosely joined in how you build programs. When I talked about my first two principles, software that spans multiple devices and software that spans from the enterprise into the cloud, you might have gathered that the nature of software development is also being transformed in moving toward a world of software plus services...Software on the back end is also being transformed from being a single program running on an enterprise server that scales in a scale-up manner to programs that are spread out across hundreds or even thousands of PCs running in a cloud-based datacenter that appears like one datacenter to the programmer, but is actually spread across the world. So, what does this principle mean for Microsoft's business perspective? Well, many business ISVs and many VARs will be looking to move their applications and solutions to the cloud just like we have. For them, like us, this technology shift towards services represents a significant opportunity, a chance for them to deliver to their enterprise customers the power of choice within their own application or solution. And so Microsoft's opportunity in this space is perfectly aligned with that of our partners to provide them with the platforms and the tools to make this transition, leveraging our experience as well as our substantial economies of scale in embracing the cloud."

And later...

Question: "How do you convince the market and the customers that you're going to be moving into the cloud, that Microsoft should be the platform play versus a Google and/or an Amazon?"

RAY OZZIE: "I think, on that front, if we solve a problem for these folks, it will prove itself out in a positive way. Web developers -- you can just look out there right now -- are extremely pragmatic. They're very, very pragmatic. If something works for them and solves a problem, they're just going to use it. And, you know, the onus is on us to prove, to show through what we deliver, that it's very, very valuable. And, you know, brand will -- if there is a brand perception, for example, within the open-source community about Microsoft, they'll be a bit perplexed when they find out -- when they see that the best way to run what they're trying to do is on our infrastructure. And I think that will improve. That will improve brand perception in that realm."

Takeaway:

Obviously, the key to a good cloud strategy is getting software developers on board, so beyond just providing the cloud itself, this sounds like he's hinting at some sort of development tool (or tools) that will provide developers a way to build apps in the cloud. In order to compete with both Google and Amazon, the Microsoft cloud has to be better - that is, it has to be richer and more well-defined than what currently exists today. This could be tough. Says Ozzie, "Amazon has done a terrific job...I think we've all learned a lot from it."

Cloud Datacenters To Run It All

Ozzie Said:

"And, yes, the datacenters that we invest in...it's the same datacenters that host Search and our MSN apps and our Windows Live apps, Office Live apps. And this platform infrastructure is also...going out there. We do careful staged investments...as you're expanding, you have to have different projects in different phases...we have to have the footprint to be able to build at the right rate when the demand emerges. You don't want to overbuild too much in advance of the demand. But we're preparing for a fairly significant transformation."

Takeaway:

In other words, the Microsoft cloud isn't just about business apps and SLAs. Along with running Exchange Online and SharePoint Online, their same datacenters will run Live Search, Windows Live Services, MSN, Office Live, and more. "Significant transformation?" Was Microsoft really blindsided by the shift to cloud computing, or have they been quietly ramping up for a massive shift of their business? ZDNet recently reported that Microsoft's corporate vice president of Global Foundation Services, Debra Chrapaty, is on record saying Microsoft is adding 10,000 new servers a month. Put that in perspective - Facebook is estimated to have 10,000 total.

Conclusion

Although we don't have all the pieces yet nor any sort of ship dates, what we can see here is that Microsoft does indeed have a cloud computing strategy, and it's huge. They're not just moving their business services from the data center to the cloud - they're also providing cloud services for consumers, too. And then there is Live Mesh, their cloud computing platform that will sync apps, settings, and files. How does it all tie together? We don't have all the answers today, but it looks like it's going to be a big reveal when the time comes.

Tuesday, July 29, 2008

Cloud versus cloud: A guided tour of Amazon, Google, AppNexus, and GoGrid | InfoWorld | Test Center | July 21, 2008 | By Peter Wayner

An article comparing the cloud computing services offered by Amazon, Google, AppNexus, and GoGrid by actually using each of them. Since only simple applications were run, it is hard to judge whether they are suited to enterprise applications, but the differences between them become visible to some extent, which makes it an interesting read.

Taking everything together, the overall verdict came down to comments like these:
  • Applications that temporarily need extra CPU/memory resources for periodic traffic spikes are a good fit
  • Using the storage services for backup gets expensive
  • Applications whose content needs close monitoring, such as social networking sites, are a poor fit (there is a high risk of problems from usage that violates the terms of service)
 
 

Cloud versus cloud: A guided tour of Amazon, Google, AppNexus, and GoGrid

Cloud computing offerings differ in depth, breadth, style, and fine print; beneath the heady metaphor lurk familiar pitfalls, complex pricing, and many questions



By Peter Wayner


July 21, 2008

Who wouldn't want to live in a "cloud"? The term is a perfect marketing buzzword for the server industry, heralding images of a gauzy, sunlit realm that moves effortlessly across the sky. There are no suits or ties in this world, just toga-clad Greek gods who do as they please and punish at whim, hurling real lightning bolts and not merely sarcastic IMs. The marketing folks know how to play to the dreams of server farm admins who spend all day in overgrown shell scripts and impenetrable acronyms.

To test out these services, I spent a few days with them and deployed a few Web sites. I opened up accounts at four providers, configured some virtual servers, and sent Web pages flowing in a few hours. Our choice of four providers wasn't as scientific as it could have been, because a number of new services keep appearing, but I chose some of the big names and a few newer services. Now, I can invoke Joni Mitchell and say I've looked at both sides of these services and offer some guidance.


The first surprise is that the services are wildly different. While many parts of Web hosting are pretty standard, the definition of "cloud computing" varies widely. Amazon's Elastic Compute Cloud offers you full Linux machines with root access and the opportunity to run whatever apps you want. Google's App Engine will also let you run whatever program you want -- as long as you specify it in a limited version of Python and use Google's database.

The services offer wildly different amounts of hand-holding, and at different layers in the stack. When this assistance works and lines up with your needs, it makes the services seem like an answer to your prayers, but when it doesn't, you'll want to rename it "iron-ball-and-chain computing." Every neat feature that simplifies the workload does it by removing some switches from your reach, forcing you into a set routine that is probably but not necessarily what you'd prefer.

After a few hours, the fog of hype starts to lift and it becomes apparent that the clouds are pretty much shared servers just as the Greek gods are filled with the same flaws as earthbound humans. Yes, these services let you pull more CPU cycles from thin air whenever demand appears, but they can't solve the deepest problems that make it hard for applications to scale gracefully. Many of the real challenges lie at the architectural level, and simply pouring more server cycles on the fire won't solve fundamental mistakes in design.

By the end of my testing, the clouds seemed like exciting options with much potential, but they were far from clear winners over traditional shared Web hosting. The clouds made some things simpler, but they still seemed like an evolving experiment.


Amazon Elastic Compute Cloud
Amazon was one of the first companies to launch a product for the general public, and it continues to have one of the most sophisticated and elaborate sets of options. If you need CPU cycles, you can spin up virtual machines with Elastic Compute Cloud (EC2). If it's data you want to store, you can park objects of up to 5GB in the Simple Storage Service (S3). Amazon has also built a limited database, SimpleDB, on top of S3, but I didn't test it because it's still in a closed beta. To wrap it up, your machines can talk among themselves with the Simple Queue Service (SQS), a message-passing API.

All of these services are open to the Web and accessible as Web services. There's a neat demo for the SimpleDB that is just a pile of HTML running in your browser while querying the distant cloud. The documentation is extensive, and Amazon makes it relatively easy to wade through the options.
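For flavor, here is a rough sketch of what storing an object over S3's REST interface looked like with the original HMAC-SHA1 request signing (Signature Version 2). Credentials are placeholders and error handling is omitted; treat it as a sketch from memory rather than a reference implementation.

```python
import base64
import hashlib
import hmac
import http.client
from email.utils import formatdate

ACCESS_KEY = "AKIAEXAMPLE"        # placeholder credentials
SECRET_KEY = "secret"
BUCKET, KEY = "my-bucket", "demo/hello.txt"
BODY = b"hello from the cloud"

date = formatdate(usegmt=True)
# Canonical string for the legacy (v2) S3 signature: verb, MD5, type, date, resource
string_to_sign = f"PUT\n\ntext/plain\n{date}\n/{BUCKET}/{KEY}"
signature = base64.b64encode(
    hmac.new(SECRET_KEY.encode(), string_to_sign.encode(), hashlib.sha1).digest()
).decode()

conn = http.client.HTTPSConnection(f"{BUCKET}.s3.amazonaws.com")
conn.request("PUT", f"/{KEY}", body=BODY, headers={
    "Date": date,
    "Content-Type": "text/plain",
    "Authorization": f"AWS {ACCESS_KEY}:{signature}",
})
print(conn.getresponse().status)   # 200 on success
```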

The ease, though, is relative because almost everything you do needs a command line. Amazon built a great set of tools with sophisticated security options for sending orders to your collection of machines in the sky, but they all run from the command line. I found myself cutting and pasting commands from documentation because it was too easy to mistype some certificate file name, for example.

Unix jockeys will feel right at home in this world because the virtual machines at your disposal are all versions of Linux distros like Fedora Core 4. After you grab one off the shelf, you can install your own software and create a custom instance that can be loaded relatively quickly if there's space available in the cloud.

It's hard to go into enough detail about all of the offerings described here, but Amazon is the most difficult because it has the most extensive solutions. Amazon is thoroughly committed to the cloud paradigm, rethinking how we design these systems and producing some innovative tools.


Google App Engine
Google's App Engine is a polar opposite of Amazon's offering. While you get root privileges on Amazon, you can't even write a file in your own directory with the App Engine. In fact, it's not even clear that you get your own directory, although that's probably what's happening under the hood. Google ripped the file write feature out of Python, presumably as a quick way to avoid security holes. If you want to store data, you must use Google's database.

The result of all of these limitations is not necessarily a bad thing. Google has stripped Web applications down to a core set of features and built up a pretty good framework for delivering them. I was able to write a simple application with several hundred lines of Python (cutting and pasting from Google's documentation) in less than an hour. Google offers some nice tools for debugging the application on your own machine.
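For context, a minimal handler under the original Python SDK looked roughly like the following. This is a from-memory sketch of the webapp framework and datastore API of that era, not code taken from the article.

```python
from google.appengine.ext import db, webapp
from google.appengine.ext.webapp.util import run_wsgi_app

class Greeting(db.Model):
    content = db.StringProperty()
    created = db.DateTimeProperty(auto_now_add=True)

class MainPage(webapp.RequestHandler):
    def get(self):
        # Read the ten most recent rows from Google's datastore (no files, no MySQL).
        greetings = Greeting.all().order('-created').fetch(10)
        self.response.out.write('<br>'.join(g.content for g in greetings))

    def post(self):
        Greeting(content=self.request.get('content')).put()
        self.redirect('/')

application = webapp.WSGIApplication([('/', MainPage)], debug=True)

def main():
    run_wsgi_app(application)

if __name__ == '__main__':
    main()
```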

Deploying this application to the cloud should have taken a few seconds, but it was held up by Google's insistence that I fork over my cell phone number and wait around for a text message that tests the number. When my message didn't show up for several hours after retrying, I switched to a friend's phone and finally activated my account.

Google insists on linking your App Engine account to both your cell phone and your Gmail account because -- well, I don't know. I think it's to track down the scammers, spammers, pharmers, phishers, and other fraudsters, but it starts to feel a bit creepy. Maybe it will help customer service and allow them to field support requests with answers like, "Your cell phone shows you filed this report from a location with a liquor license. Your e-mail suggests you're coding while waiting for Chris to get off of work. We suggest going home, sleeping this off, and then it will take you only a few seconds to find the endless loop on line 432 of main.py. BTW, Chris is lying to you and is really out with someone else."

The best users for the App Engine will be groups, or most likely individual developers, who want to write a thin layer of Python that sits between the user and the database. The API is tuned to this kind of job. In the future, Google may add more features for background processing and other services such as lightweight storage, but for now, that's the core strength of the offering.

GoGrid
GoGrid refers to itself as the "world's first multi-server control panel." GoGrid's offerings aren't functionally different from Amazon's EC2, but using the old term "control panel" seems to be a better description of what's going on than the trendier term "cloud." You start up and shut down load balancers in much the same way as relatively ancient tools like Plesk and cPanel let you add and subtract services.

While GoGrid offers many of the same services as Amazon's EC2, the Web-based control panel is much easier to use than the EC2 command line. You point and click. There's no need to cut and paste information because little pop-up boxes show the way, by suggesting available IP addresses, for example. The system is intuitive, and it takes only a few minutes to build up your network. A simple ledger on the left keeps track of the costs and helps you manage the budget.

GoGrid also has a wider variety of OS images ready to go. There is the usual collection of CentOS/Fedora and common LAMP stacks. If you need Windows, you can have Windows Server 2003 with IIS 6.0, and Microsoft SQL Server is available at extra cost. There are also images with Ruby on Rails, PostgreSQL, and the Facebook application server. These make it a bit easier to start up.

While GoGrid offers many of the same features as Amazon's EC2, it doesn't provide more cloudlike services for storing information in a shared way like SimpleDB. This can make it a bit harder to start up and shut down servers without a bit of grief. The startup notes for the service point out that the only way to stop paying for a server is to delete it, and that means losing all of the data on it.

There's no simple way to build custom images at this moment, but the documentation says GoGrid is working on a way to turn any running server into an image that can be restarted later. If you're going to be expanding and contracting your network as the traffic ebbs and flows, you'll have to come up with some tools of your own to add and subtract these servers.

AppNexus
If you like the idea of the cloud but aren't sure if you want to leave behind the old trustworthy world of Unix, cron jobs, and other tools, then AppNexus is a service that aims to be a bit more transparent. The company has taken a big, industrial-sized server farm with the best load-sharing tools and storage boxes and found a way to let you buy it in small portions. AppNexus provides a number of command-line abstractions that let you turn servers on and off, but they also let you drill down into the file system.

The main functions of the AppNexus cloud are similar to Amazon's EC2. You log in through a command line and boot up images of Linux distributions. AppNexus says it can rebuild images from other sources like Amazon's EC2 by replacing the kernel with a version that's more aware that it is running in a virtual environment. Then it just takes a few key clicks on a command line to set up a load balancer.

One open question in the world of cloud computing is where the abstraction occurs; that is, where do the walls between the machines become blurred and it all starts to look a bit cloudy? Amazon's SimpleDB hides the storage behind a software wall and gives you access to it through some Web service call. AppNexus is working at a lower level, building Isilon IQ X-Series storage clusters into its cloud.

This gives you the option of simply mounting the storage and sharing the data across your cluster of servers -- if you consider that simple. Instead of working with abstract keys, you use real file names as the keys. The cluster handles the rest of the work.

A better solution is to use what AppNexus calls its CDN, or Content Delivery Network. The storage cluster has its own set of HTTP servers built in, and you can automatically begin serving static data from your files. Just write the files to the /cdn directory and they become available. AppNexus will distribute this storage cloud to multiple datacenters, making it simpler to serve up the static data from the closest location.
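In practice that workflow is about as simple as it sounds; a hypothetical sketch, where the mount point behavior and the resulting URL pattern are assumptions based on the description above:

```python
import shutil

# Copy a static asset into the mounted /cdn path; the storage cluster's built-in
# HTTP servers then take care of serving it, per the description above.
shutil.copyfile("build/logo.png", "/cdn/site/logo.png")

# The object would then be reachable at whatever hostname the CDN exposes, e.g.
# http://<your-cdn-host>/site/logo.png   (URL pattern assumed, not documented here)
```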

The fine print
One of the ways to go truly insane is to read the terms of service for these clouds. While the people who wrote the old co-location contracts could try to imagine the data as living on a single server that was in a certain box owned by a certain person and residing in a certain jurisdiction, all bets are off with a cloud. The whole point is that it isn't confined to one box, one building, or even one country.

Some of the service agreements are very specific and clear. GoGrid, for instance, spells out numerical thresholds for standard values such as latency, jitter, and packet loss for the six continents. If the cloud doesn't meet them, GoGrid promises to give you service credits for 100 times the amount lost.

Other terms are deliberately murky. You might consider it fairly capricious for Amazon to demand the right to terminate your account "for any reason" and "at any time," but the company also carefully reserves the right to terminate your account for "no reason" too. In other words, "It's not you, honest. It's me. No. I take that back, it's not even me. It's just over between the two of us. No reason."

Google's terms seem more generous, indicating it will terminate accounts only if you breach the terms of the agreement or do something unlawful. But Google does reserve the right to "pre-screen, review, flag, filter, modify, refuse, or remove any or all Content from the Service." I want to say that the terms seem more reasonable than they were when I read them several weeks ago, but I can't be sure. And it doesn't matter too much because new terms apply whenever Google wants to change them, and you signify your acceptance by continuing to use the service.

If you think it's hard to work through the legal rules when a server is in one state and a user is in another, imagine the right answer when your virtual server could migrate within a cloud that might encompass datacenters spread out across the globe. Amazon's terms, for instance, prohibit you from posting content that might be "discriminatory based on race, sex, religion, nationality, disability, sexual orientation, or age." It sounds like Amazon is worried that part of the cloud might touch down in a municipality that forbids things like this.

It almost seems scary to mention this fact, but New York is insisting that Amazon charge sales taxes because Amazon pays a commission to Web sites that do business in the state. What does this mean for applications hosted by Amazon? Do you owe sales tax if your application touches down in a part of the cloud that's in New York? Do you owe income tax?

I wanted to make some allusion to Schrodinger's cat and imply that we can't know where the computation occurs in the cloud, but then I slowly realized that this is far from true. Cloud servers have log files too, and these log files can produce insanely detailed analyses of who might owe which taxes. Major league athletes already hire tax attorneys to compute their share of income earned in each stadium, and some people are suggesting that Web companies aren't paying enough to support the local fire trucks and orphanages. Say good-bye to the Joni Mitchell allusions; it's time to start invoking Warren Zevon's "Lawyers, Guns, and Money."

Crashing the cloud metaphor
The legal worries are just part of the details that aren't so certain. One of the biggest dangers is reading too much into the cloud metaphor. While it's largely true that these services are very flexible ways to build up a network of machines, they are far from perfect. What happens if a server or a hard disk crashes in the middle of an operation? Often the same thing that happens when a generic server kicks the bucket: Your data might disappear and then it might not.

An instance of a machine from Amazon's EC2 looks just like a normal machine because after you strip away the hype, it is just another version of Linux running on a chip that probably speaks 8080 machine code and writes data to a spinning platter. If you write something to a good old file in the Unix file system, the cloud metaphor won't protect it. It will stay there until the machine dies. If you shut down the server to save some cash when traffic is low, that's the same thing as dying. That means you can't really scale up and down without a savvy plan for migrating data.

In other words, MySQL in the cloud works just like it does on a generic server. Everything could be lost in a poof unless you start up several instances and mirror them with each other. The magic of the cloud metaphor can't remove this fundamental rule.

If you want something to survive a crash, you've got to put it into the cloud's data stores. These are great services, but they're not cheap. One friend of mine used to back up his disks to Amazon's S3 until he started getting bills for more than $200 a month. He bought a hard disk and kept it on his desk.

The price is higher because the service level is higher. Amazon wants people to be able to trust the data store, and that means providing a level of service that would make a bank happy. Sharing data across servers takes time and careful coding. Google cautions users to be careful writing to its data store because it can be expensive. If you're someone who likes to keep lots of log files just in case, you'll probably pay much more to store them in the cloud than you would in a regular file. Alas, Google doesn't have regular files.

One of the trickier details is trying to understand the prices. GoGrid, for instance, likes to say that its Intel Xeon servers are more powerful than its competitors. Google doesn't even sell server time per se; it just bills you for CPU megacycles, a squirrelly metric. Amazon EC2 has regular-sized machines and bigger ones that are a bit more expensive. When costs change, the companies often lower their prices. But they also raise them when a service turns out to be more expensive to provide than they thought. This complexity will have you scratching your head for a long time because it's hard to know what things will end up costing. That box from Sun may not scale up and down, but the bill isn't going to change with every hit on your Web site.

Best and worst
After working through these systems, I tried to imagine the best and worst applications for these clouds. One of the best fits might be some kind of reservation system for weekend events like concerts. While there might be a small amount of the load at any time, the crunch would come each Friday afternoon when people realize they have no weekend plans. The cloud's ability to spin up more servers to handle this demand would fit this perfectly. The service might also take real reservations and sell tickets in advance, a service that would demand the higher qualities of service offered by the shared data stores.

The worst possible application might be something like RedSoxYankeesTrashTalk.com or any Web site filled with an endless stream of mostly forgettable comments trolling for reactions from the rival fans. While there might be a slight peak around game time, I've found that sites like this keep rolling along even late at night during the off-season. And such a site would certainly attract First Amendment proponents who would look for ways to write a single sentence that could zing all seven of Amazon's protected targets of discrimination.

Furthermore, there would be no reason to pay for high-quality storage because I'm sure that even the participants wouldn't notice if their comments disappeared by mistake. For fun, read Amazon's terms on getting your data back after they shut you down. While I would probably write the same thing if it were my cloud, there are plenty of examples of applications that are better off on their own.

These examples aren't perfect, of course, but neither is cloud computing. After a few weeks of building up some machines and hearing from people who've used the services, I'm pleasantly confused and filled with curious and optimistic questions. Will these clouds be large enough to handle the Internet equivalent of the Thanksgiving weekend traffic jams? Will the cloud teams be able to find a way to offer simple options that are priced correctly for the serious and not-so-serious data wrangler? Will they ever find an adequate meter for computation time?

I suspect the only people who know the answers to these questions today are living in the real clouds where they went after a life ministering to the IBM mainframes. If we could get those guys back here today, we might be able to get this cloud thing up and running smoothly. We just have to convince Intel to build a chip that understands IBM 360 binaries.

The Definition of Cloud Computing

An article offering yet another interpretation of cloud computing. The picture is still somewhat fuzzy, and I imagine the differences in how each company thinks about this area, along with a degree of confusion, will continue for a while.
 

What cloud computing really means

The next big trend sounds nebulous, but it's not so fuzzy when you view the value proposition from the perspective of IT professionals



By Galen Gruman


April 07, 2008

Cloud computing is all the rage. "It's become the phrase du jour," says Gartner senior analyst Ben Pring, echoing many of his peers. The problem is that (as with Web 2.0) everyone seems to have a different definition.

As a metaphor for the Internet, "the cloud" is a familiar cliché, but when combined with "computing," the meaning gets bigger and fuzzier. Some analysts and vendors define cloud computing narrowly as an updated version of utility computing: basically virtual servers available over the Internet. Others go very broad, arguing anything you consume outside the firewall is "in the cloud," including conventional outsourcing.


Cloud computing comes into focus only when you think about what IT always needs: a way to increase capacity or add capabilities on the fly without investing in new infrastructure, training new personnel, or licensing new software. Cloud computing encompasses any subscription-based or pay-per-use service that, in real time over the Internet, extends IT's existing capabilities.

Cloud computing is at an early stage, with a motley crew of providers large and small delivering a slew of cloud-based services, from full-blown applications to storage services to spam filtering. Yes, utility-style infrastructure providers are part of the mix, but so are SaaS (software as a service) providers such as Salesforce.com. Today, for the most part, IT must plug into cloud-based services individually, but cloud computing aggregators and integrators are already emerging.

InfoWorld talked to dozens of vendors, analysts, and IT customers to tease out the various components of cloud computing. Based on those discussions, here's a rough breakdown of what cloud computing is all about:

1. SaaS
This type of cloud computing delivers a single application through the browser to thousands of customers using a multitenant architecture. On the customer side, it means no upfront investment in servers or software licensing; on the provider side, with just one app to maintain, costs are low compared to conventional hosting. Salesforce.com is by far the best-known example among enterprise applications, but SaaS is also common for HR apps and has even worked its way up the food chain to ERP, with players such as Workday. And who could have predicted the sudden rise of SaaS "desktop" applications, such as Google Apps and Zoho Office?

2. Utility computing
The idea is not new, but this form of cloud computing is getting new life from Amazon.com, Sun, IBM, and others who now offer storage and virtual servers that IT can access on demand. Early enterprise adopters mainly use utility computing for supplemental, non-mission-critical needs, but one day, they may replace parts of the datacenter. Other providers offer solutions that help IT create virtual datacenters from commodity servers, such as 3Tera's AppLogic and Cohesive Flexible Technologies' Elastic Server on Demand. Liquid Computing's LiquidQ offers similar capabilities, enabling IT to stitch together memory, I/O, storage, and computational capacity as a virtualized resource pool available over the network.

3. Web services in the cloud
Closely related to SaaS, Web service providers offer APIs that enable developers to exploit functionality over the Internet, rather than delivering full-blown applications. They range from providers offering discrete business services -- such as Strike Iron and Xignite -- to the full range of APIs offered by Google Maps, ADP payroll processing, the U.S. Postal Service, Bloomberg, and even conventional credit card processing services.

4. Platform as a service
Another SaaS variation, this form of cloud computing delivers development environments as a service. You build your own applications that run on the provider's infrastructure and are delivered to your users via the Internet from the provider's servers. Like Legos, these services are constrained by the vendor's design and capabilities, so you don't get complete freedom, but you do get predictability and pre-integration. Prime examples include Salesforce.com's Force.com, Coghead and the new Google App Engine. For extremely lightweight development, cloud-based mashup platforms abound, such as Yahoo Pipes or Dapper.net.


5. MSP (managed service providers)
One of the oldest forms of cloud computing, a managed service is basically an application exposed to IT rather than to end-users, such as a virus scanning service for e-mail or an application monitoring service (which Mercury, among others, provides). Managed security services delivered by SecureWorks, IBM, and Verizon fall into this category, as do such cloud-based anti-spam services as Postini, recently acquired by Google. Other offerings include desktop management services, such as those offered by CenterBeam or Everdream.

6. Service commerce platforms
A hybrid of SaaS and MSP, this cloud computing service offers a service hub that users interact with. They're most common in trading environments, such as expense management systems that allow users to order travel or secretarial services from a common platform that then coordinates the service delivery and pricing within the specifications set by the user. Think of it as an automated service bureau. Well-known examples include Rearden Commerce and Ariba.

7. Internet integration
The integration of cloud-based services is in its early days. OpSource, which mainly concerns itself with serving SaaS providers, recently introduced the OpSource Services Bus, which employs in-the-cloud integration technology from a little startup called Boomi. SaaS provider Workday recently acquired another player in this space, CapeClear, an ESB (enterprise service bus) provider that was edging toward b-to-b integration. Way ahead of its time, Grand Central -- which wanted to be a universal "bus in the cloud" to connect SaaS providers and provide integrated solutions to customers -- flamed out in 2005.

Today, with such cloud-based interconnection seldom in evidence, cloud computing might be more accurately described as "sky computing," with many isolated clouds of services which IT customers must plug into individually. On the other hand, as virtualization and SOA permeate the enterprise, the idea of loosely coupled services running on an agile, scalable infrastructure should eventually make every enterprise a node in the cloud. It's a long-running trend with a far-out horizon. But among big metatrends, cloud computing is the hardest one to argue with in the long term.

Microsoft to Acquire DATAllegro - MarketWatch

Microsoft announced its intention to acquire DATAllegro, a data warehouse vendor.
It is interesting that Microsoft is focusing on the importance of the data warehouse business, and in particular on vendors like DATAllegro that have the capabilities and tools to manage very large data volumes on the order of hundreds of terabytes.
 

Microsoft to Acquire DATAllegro

Leaders in data warehousing team to provide large-scale business intelligence solutions.


Last update: 2:40 p.m. EDT July 24, 2008
ALISO VIEJO, Calif. and REDMOND, Wash., July 24, 2008 /PRNewswire-FirstCall via COMTEX/ -- Microsoft Corp. today announced that it intends to acquire DATAllegro Inc., a provider of breakthrough data warehouse appliances. The acquisition will extend the capabilities of Microsoft's mission-critical data platform, making it easier and more cost-effective for customers of all sizes to manage and glean insight from the ever-expanding amount of data generated by and for businesses, employees and consumers.
 
 
 
 
"DATAllegro is a tremendously innovative company that has started to redefine the data warehouse market," said Ted Kummert, corporate vice president of the Data and Storage Platform Division at Microsoft. "Microsoft SQL Server 2008 delivers enterprise-class capabilities in business intelligence and data warehousing, and the addition of the DATAllegro team and its technology will take our data platform to the highest scale of data warehousing."
 
"Integrating DATAllegro's nonproprietary hardware platform and flexible software architecture into Microsoft SQL Server will provide customers with the strongest offering in the market," said Stuart Frost, CEO of DATAllegro. "We are excited to join forces with Microsoft and continue the innovation this company was founded on."
 
Unlike most data warehouse appliance vendors targeting the 1-25 terabyte range, DATAllegro has specialized in large-volume, high-performance data warehouses. DATAllegro's data warehouse appliance installations boast some of the largest data volume capacities in the industry -- up to hundreds of terabytes on a single system. DATAllegro clients span such markets as retail, telecommunications and manufacturing.
 
According to a report by Donald Feinberg of Gartner Inc., "As data warehouses are becoming more strategic to organizations and as data warehouse appliances mature, the adoption rate of the data warehouse appliance is increasing rapidly." ("Data Warehouse Appliances Are More Than Just Plug-And-Play," July 13, 2007.)
 
In addition to offering large capacities, DATAllegro's patent-pending technology is designed for complex workloads including high concurrency and mixed queries. DATAllegro is one of the few data warehouse appliances built on a nonproprietary hardware platform including Dell and Bull servers and EMC storage. This flexible architecture makes it ideally suited to integrate with Microsoft SQL Server.
 
After completing the acquisition, Microsoft will retain most of DATAllegro's team as well as its headquarters in Aliso Viejo, Calif., making it a Center of Excellence for data warehousing. Existing DATAllegro customers will continue to be supported.
 
"We are pleased to support DATAllegro's pending acquisition," said Lisa Lambert, managing director of the Software and Solutions Group for Intel Capital. "DATAllegro's integration with SQL Server is the optimal next-generation solution, and the acquisition by Microsoft is a great conclusion for the company."
 
To help customers of all sizes keep up with the current "data explosion" and allow them to benefit from the next generation of data-driven applications, Microsoft is focused on delivering not just a database, but a data platform. A leader in data warehousing and business intelligence (BI), Microsoft SQL Server includes comprehensive, tightly integrated functionality for data management as well as advanced BI out of the box. SQL Server delivers on Microsoft's vision for pervasive BI by providing capabilities for large-scale data warehousing, rich interoperability with Microsoft Office, and enhanced functionality for Microsoft's BI solutions. SQL Server is a key element of the broader Microsoft Application Platform, a portfolio of technology capabilities and core products that help organizations develop, deploy and manage dynamic applications and IT infrastructure.
 
About DATAllegro
DATAllegro offers the most advanced data warehouse appliance on an enterprise-class platform. By combining DATAllegro's patent-pending software with the industry's leading hardware, storage and database technologies, DATAllegro has taken data warehouse performance and innovation to the next level. DATAllegro v3 goes beyond the high performance of first generation data warehouse appliances and adds the flexibility and scalability that only an open platform can offer. The result is a complete data warehouse appliance that enables companies with large volumes of data to increase their business intelligence.

Friday, July 25, 2008

Organizing the Benefits and Drawbacks of SaaS

 

Benefits and Drawbacks of SaaS

Benefits

For the Consumer:

No client/server software installation or maintenance - that's right, no more 800-page planning and implementation guides.

Shorter deployment time - potentially minutes, as opposed to a phased implementation that could take months.

Global availability - sure, the technology exists to make on-premise software available outside of the premises, but we're talking about functionality that is available from anywhere on the internet natively.

Service Level Agreement (SLA) adherence - reported bugs can be fixed minus any rollout overhead. Sure, the provider actually has to fix the issue, but assuming they've deployed a moderately efficient SaaS application, the rollout of a patch or fix should happen in the blink of an eye.

Constant, smaller upgrades - when you use a SaaS application, it is in the best interest of the provider to keep you happy, and they can do so by constantly improving the application experience. With SaaS this can come in the form of consistent minuscule changes that add up over time, instead of monster patches and upgrades that cost you time and money to implement.

Ease your internal IT pains - this is a big one. Most of the last several points here highlight that SaaS offloads a great deal of the IT pain incurred by software consumers in the traditional client/server model. This leaves IT personnel to focus on improving the day-to-day technical operations of your company instead of being called upon to troubleshoot 3rd-party software or maintain aging infrastructure.

Redistribute IT budget - by outsourcing software functionality to a provider, the enterprise realizes a cost savings in infrastructure requirements and IT personnel knowledge requirements. This allows the enterprise to focus on core competencies. It also means that the cost savings from using SaaS applications can be flat-out saved, or reallocated to boost productivity through other services.

For the Provider:

Aggregate operating environment - as a provider, you own your domain. No longer are you sending technicians to fix or customize your software because it doesn't fit into a customer's highly specialized (or horribly outdated) infrastructure. You have complete control to optimize an infrastructure to your SaaS application's specific requirements. This is synergy at its best, and leads to financial savings as well as fewer headaches.

Predictable revenue stream - the subscription model associated with SaaS means that your customers will pay you on a recurring schedule. If you make this cycle flexible enough, you can get a real handle on forecasting revenues. The payment may be tied to your product (think cell phone plans), where everybody pays according to the same term, or tied to your individual subscribers, where some may pay monthly, some yearly, and some quarterly.

Predictable growth - same as above, but here we're talking about the sheer volume of subscribership. The fact that users hit your site to access the application means that with the right tools you can monitor their usage pretty closely - something that's not so easy with all your customers running the application on premise.

Focus on smaller upgrades instead of monster patch rollouts - and while you're at it, don't worry about rollout logistics across all of your customer sites either. Your development teams can focus on fixing core application functionality, tackling bugs, and enhancing features in smaller incremental rollouts because it's just easier to do so.

Sales becomes customer relationship management - when you are selling a subscribable service, the game of gaining subscribership becomes one of balancing user retention vs. attrition more than a game of landing the 'big deals'. Sure, it's important to have a team out there pounding the pavement to sell your application - i.e. getting subscribers in the door - but the real thrust of the new sales and marketing in SaaS is customer relationship management. The equation becomes quite simple - keep retention rates higher than attrition rates and focus on bringing in new customers.

 


 

Drawbacks

For the Consumer:

No direct control of the data - one of the biggest hurdles to get over is control of the data. Specifically, what happens when things go wrong? I'm sure every company trying to sell you a product will tell you that things can't go wrong and that they will be there to support you for years to come. It is important that you ask the difficult questions: How safe is my data? Will I be able to download it? Will it be disposed of properly and safely? Can it be sold? Can anyone else host the application and my data? Will the application source be opened so that hosting can happen in house or by another provider? Stories of companies going belly up are not uncommon, and not only for SaaS companies but for traditional software companies as well.

Internet connection required - I don't know of many businesses that run without an internet connection these days. Nonetheless, it could affect your operations if you need to access an application and the internet connection is down. A good number of companies are trying to solve this problem by allowing their applications to continue to work in a disconnected fashion for a period of time, but at some point you will need to sync back up to the server. If this is a big concern for you, make sure that your provider can address this need.

Dependence on an outsider to run your business - in a big way, you are trusting an outsider to help you run your business, and if they are not keeping their end of the bargain it can really affect you. To keep it in perspective, these people are out there to stay in business and they do this for a living, so arguably 95 out of 100 times they can do it a lot better than you could in house. This does not mean that you shouldn't be aware of the implications, so make sure you ask the tough questions.

Security awareness - another big hurdle is security. This concern is the umbrella that is home to the concerns above, as the common thread among them all is that they make you consider how "secure" you feel with SaaS. You are trusting your really valuable data to someone else. This can be a painful reality to accept, but most security breaches occur because of disgruntled internal employees who end up selling or releasing the data when they are fired or when they quit. Having your data managed and stored by an expert in the application is not a bad idea, as long as they take it as seriously as you would.

For the Provider:

Focus on customer satisfaction - this is one of those "bad" things that it's good to have and that makes great companies, but we have to mention it anyway. SaaS providers need to focus on customer satisfaction month in and month out or they will lose their customers. They need to earn their customer's business every month, or the customer can simply leave. Unlike on-premise deployments, which are very costly and time consuming, if your customer is unhappy with the service they can up and leave at any time with very minimal cost. Some might argue that you can negotiate longer-term contracts, make it hard to take their data, and all other kinds of shenanigans, but if you ask me, that is bad practice, and if you are not the best, then you had better have one damn good reason why they should stick with you other than a binding contract.

Harder development process - there are many different approaches to writing SaaS applications, and they are outside the scope of this article, but the bottom line is that there is a whole new set of things you need to worry about when writing a SaaS application that you otherwise wouldn't if it were a traditional on-premise deployment. Things like tenant isolation, provisioning, and scalability, to mention a few, can be hard to tackle, whereas you wouldn't even have to think about them if you were writing an on-premise application (see the sketch after this list).

Compensation issues - one of the early problems for SaaS providers is how to maintain operations when there is only very little money coming in. Unlike traditional on-premise deployments, where one deal could bring you $60,000 upfront and carry you for a couple of months while you close more deals, SaaS deals are MUCH smaller, so initially it will be a lot harder to maintain operations unless you are properly funded and can survive until enough money is coming in.

Success can be a problem - you've heard many times that being too successful is a great problem to have, but in the case of SaaS it can literally bring you to your knees if you are not prepared. This goes back to my earlier point about SaaS being harder to develop. Things can grow out of control if the application is not architected properly and doesn't address scalability issues, and your service can become unusable over time if it does not scale properly with the addition of new tenants. Make sure you don't leave the hard decisions for later, because you will run into a wall down the road.

This blog entry is a tabulated version of SaaS' Benefits and Drawbacks

Eucalyptus - Providing a common interface across cloud computing environments

Eucalyptus adopts Amazon Web Services' EC2 as a common interface and develops/ports an EC2-compatible layer to other cloud computing environments, providing technology for interoperability across multiple cloud services.
While cloud computing offers an environment in which web applications can be developed easily, there is a concern about being locked into a particular cloud environment, and the need for ways to bridge multiple clouds is increasingly being voiced.

Eucalyptus - Build Your Own Private EC2 Cloud

Update: InfoQ links to a few excellent Eucalyptus updates: a Velocity Conference video by Rich Wolski and a Virtualization.com interview, Rich Wolski on Eucalyptus: Open Source Cloud Computing.

Eucalyptus is generating some excitement on the Cloud Computing group as a potential vendor-neutral, EC2-compatible cloud platform. Two reasons why Eucalyptus is potentially important: private clouds and cloud portability:

Private clouds. Let's say you want a cloud-like infrastructure for architectural purposes but you want it to run on your own hardware in your own secure environment. How would you do this today? Hm....

Cloud portability. With the number of cloud offerings increasing, how can you maintain some level of vendor neutrality among this "swarm" of different options? Portability is a key capability for cloud customers, as the only real power customers have is in where they take their business, and the only way you can change suppliers is if there's a ready market of fungible services. And the only way there can be a market is if there's a high degree of standardization.

What should you standardize on? The options are usually to form a great committee and take many years to spec out something that doesn't exist, that nobody will build, and that will never really work. Or have each application code to a high enough abstraction layer that portability is difficult, but possible. Or you can take a popular existing API, make it the general API, and accommodate everyone else using an adapter layer and the necessary special glue to take advantage of each cloud's value-add features.

With great foresight Eucalyptus has chosen to create a cloud platform based on Amazon's EC2. As this is the most successful cloud platform it makes a lot of sense to use it as a model. We see something similar with the attempts to port Google AppEngine to EC2 thus making GAE a standard framework for web apps. So developers would see GAE on top of EC2. A lot of code would be portable between clouds using this approach. Even better would be to add ideas in from RightScale, 3Tera, and Mosso to get a higher level view of the cloud, but that's getting ahead of the game.

Just what is Eucalyptus?

From their website:

Overview

Elastic Computing, Utility Computing, and Cloud Computing are (possibly synonymous) terms referring to a popular SLA-based computing paradigm that allows users to "rent" Internet-accessible computing capacity on a for-fee basis. While a number of commercial enterprises currently offer Elastic/Utility/Cloud hosting services and several proprietary software systems exist for deploying and maintaining a computing Cloud, standards-based open-source systems have been few and far between.

EUCALYPTUS -- Elastic Utility Computing Architecture for Linking Your Programs To Useful Systems -- is an open-source software infrastructure for implementing Elastic/Utility/Cloud computing using computing clusters and/or workstation farms. The current interface to EUCALYPTUS is interface-compatible with Amazon.com's EC2 (arguably the most commercially successful Cloud computing service), but the infrastructure is designed to be modified and extended so that multiple client-side interfaces can be supported. In addition, EUCALYPTUS is implemented using commonly-available Linux tools and basic web service technology making it easy to install and maintain.

Overall, the goal of the EUCALYPTUS project is to foster community research and development of Elastic/Utility/Cloud service implementation technologies, resource allocation strategies, service level agreement (SLA) mechanisms and policies, and usage models. The current release is version 1.0 and it includes the following features:

* Interface compatibility with EC2
* Simple installation and deployment using Rocks cluster-management tools
* Simple set of extensible cloud allocation policies
* Overlay functionality requiring no modification to the target Linux environment
* Basic "Cloud Administrator" tools for system management and user accounting
* The ability to configure multiple clusters, each with private internal network addresses, into a single Cloud.

The initial version of EUCALYPTUS requires Xen to be installed on all nodes that can be allocated, but no modifications to the "dom0" installation or to the hypervisor itself.
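
To make that EC2 interface compatibility concrete, here is a minimal sketch (my own illustration, not taken from the EUCALYPTUS documentation) of pointing Amazon's standard EC2 command-line tools at a private Eucalyptus front end simply by overriding the service endpoint; the host name, port, credential paths, and image ID below are assumptions.

    # Point Amazon's EC2 API tools at a hypothetical Eucalyptus front end
    # instead of Amazon; host name, port, and credential paths are made up.
    export EC2_URL=http://cloud.example.com:8773/services/Eucalyptus
    export EC2_PRIVATE_KEY=~/.euca/pk-cloud.pem
    export EC2_CERT=~/.euca/cert-cloud.pem

    # The same commands used against Amazon EC2 should then work against
    # the private cloud (Eucalyptus image IDs use an "emi-" prefix).
    ec2-describe-images
    ec2-run-instances emi-12345678 -n 1 -k mykey -t m1.small
    ec2-describe-instances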

Thursday, July 24, 2008

A case study of building an application on Amazon EC2

A case study of building a system using EC2.
It is described in considerable detail and serves as a useful reference.

How We Built a Web Hosting Infrastructure on EC2

In the months prior to leaving Heavy, I led an exciting project to build a hosting platform for our online products on top of Amazon's Elastic Compute Cloud (EC2).  We eventually launched our newest product at Heavy using EC2 as the primary hosting platform.

I've been following a lot of what other people have been doing with EC2 for data processing and handling big encoding or rendering jobs.  This is not one of those projects.

We set out to build a fairly standard LAMP hosting infrastructure where we could easily and quickly add additional capacity.  In fact, we can add new servers to our production pool in under 20 minutes, from the time we call the "run instance" API at EC2, to the time when public traffic begins hitting the new server.  This includes machine startup time, adding custom server config files and cron jobs, rolling out application code, running smoke tests, and adding the machine to public DNS.

What follows is a general outline of how we do this.

Architecture Summary

Heavy makes use of a pretty standard LAMP stack.  Administration scripts are written in PHP, Perl, or Bash.  There is a lot of caching in memory (memcached), file caching, and HTTP caching (Akamai).  The new site requires a layer of front-end web servers that double as application servers and a database layer (with replication).  The site is built entirely in PHP, making use of Zend Framework.  The database is MySQL.  We are not using Amazon's SimpleDB service.

EC2 Hosting Architecture (diagram)

EC2 Images

I chose CentOS for the operating system on our machine images.  All of the machine images we built are designed specifically for their purpose, and there are a handful of them.  For example, web servers run Apache, PHP, and some Perl libraries.  Databases are installed with MySQL and PHP (for administration scripts).  Memcached nodes are built with memcached and barely anything else.

Thorsten von Eicken at RightScale has written a lot of great material about their use of EC2.  I took a lot of ideas from their blog, including the use of RightScripts.  After banging around with some publicly available images, I started building my own by modifying their scripts for 32-bit and 64-bit CentOS images.

It took a little getting used to the manner of building images with these scripts, and getting the right software packages installed.  Eventually it clicked and what I ended up with was a really simple script for building each type of machine that we would need.  Even better, it was simple to go back and re-configure any of the images and roll them into new machines.

Many thanks to Thorsten for providing these scripts.

Running Instances

Amazon provides some fantastic command line tools for managing EC2 instances and getting status on the service.  Unfortunately, these tools don't really help much in terms of documenting what each server is doing. To keep track of this, I went to work building a control panel for our EC2 account that documents the roles for each machine and what products are running on it.  Our plan was not to run a single web site, but multiple sites/products each with their own database and web server clusters.
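
For reference, the basic lifecycle operations with Amazon's EC2 command-line tools look roughly like the sketch below; the AMI ID, key pair, zone, and instance ID are placeholders, and our control panel drives the equivalent API calls rather than these exact commands.

    # Rough sketch of the instance lifecycle with the EC2 API tools;
    # the AMI ID, key pair, zone, and instance ID are placeholders.
    ec2-run-instances ami-12345678 -n 1 -t m1.small -z us-east-1a -k heavy-keypair

    # Poll until the new instance reports "running" and note its public DNS name.
    ec2-describe-instances

    # Tear capacity back down by instance ID when it is no longer needed.
    ec2-terminate-instances i-87654321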

The control panel, by the way, lived on a physical machine of ours (at RackSpace), and not on EC2.

We realized early that all of these machines would need to know how to find each other.  Our control panel manages a global configuration file that lives in our S3 account and documents all of our servers' roles.  Every server is set up to inspect this file and adjust its own application environment.  For example, when a new web server comes online it grabs the configuration file and figures out which databases belong to the web site running on the instance.  If a database fails and a slave takes over as the new master, web servers can figure that out on their own without anyone manually logging in to change configs or host files.
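
As a rough sketch of how this can work (the bucket name, file format, and paths below are invented for illustration, not our actual configuration), each server pulls the shared file from S3 and rewrites its own local settings:

    #!/bin/bash
    # Hypothetical example: fetch the shared role/host map from S3 and point
    # this web server at the current master database.
    s3cmd get --force s3://example-config/servers.conf /tmp/servers.conf

    # servers.conf might contain simple KEY=VALUE lines such as:
    #   DB_MASTER=10.251.42.17
    #   DB_SLAVE=10.251.43.22
    source /tmp/servers.conf

    # Rewrite the local application config to use the current master.
    sed -i "s/^db.host = .*/db.host = ${DB_MASTER}/" /etc/example/app.ini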

Load Balancing

Although we had tested the EC2 servers to handle very high loads of traffic, we have the luxury of using Akamai's Site Accelerator product in front of our web servers.  This allows for easy page caching in front of our web servers, and actually handles about 90% of the hits to the site.  Our web servers serve as the origin for Akamai's proxy.  Rather than fooling around with additional servers to handle load balancing and configuring proper failover between them, we simply use round-robin DNS.  As it turns out, our load is very evenly distributed amongst the web servers.

Database Replication

Most people I've talked to about this setup want to know how we felt about hosting our database on EC2.  The best answer I can give is, "nervous".  Since EC2 doesn't yet have persistent storage on machine instances, we were liberal with setting up replication and backup servers.  A single master database is replicated to a slave (master candidate), and that slave replicates to a second slave (slave candidate).  Scripts were written to handle automated failover if the master becomes inaccessible; the master candidate is automatically promoted to master, and our global configuration file is updated so that all of the web servers are aware of the change.
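
A drastically simplified sketch of such a failover check is shown below; the host names, shared config path, and bucket are placeholders rather than our production script, MySQL credentials are assumed to live in ~/.my.cnf, and real fencing and sanity checks are omitted.

    #!/bin/bash
    # Simplified failover sketch: if the master stops answering, promote the
    # master candidate and publish the change for the web servers to pick up.
    MASTER=db-master.internal          # placeholder host names
    CANDIDATE=db-candidate.internal

    if ! mysqladmin --host="$MASTER" --connect_timeout=5 ping >/dev/null 2>&1; then
        # Stop replication on the candidate so it can act as a standalone master.
        mysql --host="$CANDIDATE" -e "STOP SLAVE; RESET MASTER;"

        # Update the shared configuration so every web server learns the new master.
        sed -i "s/^DB_MASTER=.*/DB_MASTER=${CANDIDATE}/" /etc/example/servers.conf
        s3cmd put /etc/example/servers.conf s3://example-config/servers.conf
    fi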

Furthermore, we run a second slave from the master database. This slave has a single role: dump snapshots of the database every 15 minutes and store them on S3.  If all of our EC2 instances should ever disappear from EC2, we have recent copies of the database on S3.
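
The backup slave's job amounts to a small cron-driven script along these lines; the database name and bucket are placeholders, and credentials are again assumed to be in ~/.my.cnf:

    #!/bin/bash
    # Illustrative snapshot script for the backup slave, run from cron every
    # 15 minutes; database name and S3 bucket are placeholders.
    STAMP=$(date +%Y%m%d-%H%M)
    DUMP=/var/backups/site-${STAMP}.sql.gz

    # Take a consistent dump from the dedicated backup slave and compress it.
    mysqldump --single-transaction sitedb | gzip > "$DUMP"

    # Ship the snapshot off to S3 so it survives the loss of every EC2 instance.
    s3cmd put "$DUMP" s3://example-db-snapshots/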

In all, four database instances.  We're being careful.

Server Configuration

Our images are designed to support a specific "role" for a machine, such as a database, web server, etc.  Once started, we identify "products" that will run on each machine.  These might be things like "HuskyMedia.com", "Heavy.com", "Video Encoding", etc.  Obviously, each of our products requires its own set of service configurations (Apache, MySQL), users and groups, and cron jobs.

We chose Puppet to roll out these configurations to new machines (hat tip to Justin Shepherd at RackSpace for this suggestion).  If you're not familiar with Puppet, you can create classes, or roles, for each of your servers.  In the classes, you define configurations files, cron jobs, packages to install, etc.  Finally, you identify the hosts that belong to each class.  When a new machine is started up (plain old vanilla), it checks into the Puppet "master" server, and the master sends over the proper configs.
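
On the node side, that check-in is essentially a single command (shown here with the 2008-era puppetd client; the master host name is a placeholder):

    # A freshly booted, vanilla instance registers with the Puppet master and
    # keeps retrying until its certificate is signed, then applies its configs.
    puppetd --test --server puppet.example.com --waitforcert 60

    # On the master, an administrator (or our control panel) signs the new node:
    #   puppetca --sign web01.example.com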

When Puppet works, it totally rocks.  It has its drawbacks, however.  There is (what I consider) a steep learning curve for its configuration language.  It's also still very much in development.  When we upgraded to a new software version, the master server didn't seem to play well with clients that were still on the old version.  We jumped through some hoops to get all of our clients talking to the master server again.

On the upside, however, we must have gone through setting up over 100 machine instances using Puppet.  And that would have taken hundreds of hours in server administration to get each machine configured.  Additionally, Puppet can do package management, tying into whatever package manager you use in your Linux distribution.  If we had started using Puppet earlier, we might have stuck with two baseline machine images, one for 32-bit, and one for 64-bit architectures.  Then we could allow Puppet to handle all of the software installations.

Many thanks to the guys at Reductive Labs.  Puppet is a very cool piece of software!

Monitoring

We use two pieces of software for monitoring: Munin and Nagios.  The Munin server we use is the same one that we use for our physical machines.  The simple configuration needed for Munin nodes is built into our machine images, along with the properly installed plugins.  The control panel we built for EC2 also updates our local Munin server configuration to listen for the new machines.  As soon as we start a new EC2 instance, it begins to show up in our reports.

Our Nagios configuration is a work in progress.  There are two installations: one that we use on our physical machines, and one that lives within EC2.  The EC2-based installation is monitored by the one installed on our physical machines.  It is not tied as tightly to our control panel yet, but that seems likely to happen soon.

Availability Zones

Not to be overlooked, availability zones allow you to distribute your EC2 instances across separate fault-tolerant groups.  If one availability zone goes down, machines in other zones should theoretically be insulated from the same issue, i.e. separate power, separate network connectivity, etc.

We built a color-coded indicator in our control panel of the availability zone where each machine is running.  This makes it easy for us to make sure that we balance our servers equally throughout all of the zones.

Failover

It's always handy to have a physical backup, especially since EC2 is currently still in "beta".  Since our installation on EC2 uses essentially the same architecture as we use on our physical machines at RackSpace, it would be simple for us to move the entire site back to those servers.  In fact, most of the configurations are already in place.  We also use Neustar as our DNS provider, so we can keep very low TTLs on our hostnames.  When we need to change the location of our origin servers, it's done in a matter of seconds.

Successes

Here are some successes we took away from this project:

  1. Twenty-minute start up time. Hands down, this is the most impressive part for me.  We can spin up new machines and put them into production in under 20 minutes.  This isn't SkyNet, but it's pretty darn cool.
  2. Loads of scripts and automation. We moved from mostly manual server administration, which we got used to by running only a few physical machines, to a much more automated process.  This improves our general workflow for server administration, whether the servers are virtual machines or physical machines.
  3. Documentation of images. Using image builders based on RightScripts, we have a catalog of what software goes into each new server, cleanly spelled out in Bash. :)
  4. Fault tolerance. We don't know what is going to happen with our virtual machines.  We've seen some unexpected behavior from EC2, and have designed with that in mind.
  5. Portable hosting. I didn't want to build a hosting architecture just for Amazon Web Services.  I wanted to build a fairly standard LAMP stack, but one that is redundant.  We can take all of these learnings and re-apply them to the physical servers we still run at RackSpace.

Acknowledgments

While I researched and developed much of this project, I couldn't have finished it off without the help of a couple of other guys at Heavy, Matt Spinks and Henry Cavillones.  Matt led the database effort and all of the scripting involved for our automated failover.  Henry took care of our monitoring needs and image maintenance, and helped me iron out some of the issues we were originally seeing with our CentOS configuration for EC2.  Thanks, guys!

I also want to mention Scott Penberthy, our CTO, who kept us on track and was an excellent sounding board.  Without Scott at the top of this project, it wouldn't have come together.  Thanks!

Finally, thanks for the clever work put together and discussed by the guys at RightScale and SmugMug, and for the countless other blog and forum postings I read during this project that kept me pointed in the right direction.