Wednesday, July 16, 2008

Demystifying Clouds | IT Management and Cloud Blog

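An article that attempts to classify the vendors in the Cloud Computing market.
Its classification scheme is a bit complicated, but it offers a glimpse of how far Cloud Computing has advanced in a short period. The Cloud Computing industry can be expected to keep changing in many ways.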

Demystifying Clouds

By John | February 5, 2008

force ma·jeure - noun Etymology: French, superior force
Date: 1883 1 : superior or irresistible force 2 : an event or effect that cannot be reasonably anticipated or controlled; compare act of God

I'll admit it: I am caught up in the cloud hype. The caveat, however, is that I truly believe that this is disruptive technology. In this article, I am going to try to demystify some of the hype around utility cloud computing and focus in on the companies that are providing cloud solutions and the technology components that they are using. By no means am I professing to be an expert on this subject. My only intent is to render what I have learned thus far in my quest to understand cloud computing.

The Myths

Cloud computing will eliminate the need for IT personnel.

Using my 30 years of experience in IT as empirical proof, I am going to go out on a limb and suggest that this is a false prophecy. One of my first big projects in IT was in the 1980s, when I was tasked with implementing “Computer Automated Operations.” Everyone was certain that all computer operators would lose their jobs. In fact, one company I talked to said that its operators were thinking of starting a union to prevent automated operations. The fact was that no one really lost his or her job. The good computer operators became analysts, and the bad ones became tape operators.

There will be only five supercomputer-utility-like companies in the future.

Again, I will rely on empirical data. I have been buying automobiles for as long as I have been grinding IT, and all one has to do is look at the automotive industry's history as a template to falsify this myth. Some clever person will always be in a back room somewhere with an idea for doing it better, faster, cheaper, and cleaner. There will probably be a smaller number of mega-centers, but they will most likely be joined by a massive eco-grid of small-to-medium players interconnecting various cloud services.

The Facts

Since cloud computing is in a definite hype cycle, everyone is trying to catch the wave (myself included). Therefore, a lot of the things you will see will have cloud annotations. Why not? When something is not clearly defined and mostly misunderstood, it becomes one of God's great gifts to marketers. I remember that, in the early days of IBM's SOA talk, IBM was calling everything in Tivoli an SOA. So I did a presentation at a Tivoli conference called “Explaining the 'S' in SOA and BSM.” Unfortunately, one of IBM's lead SOA architects, not a Tivoli person and not a marketer, was in my presentation and tore me a new one. I was playing their game, and I forgot that it was “Their Game.” Therefore, in this article I will try to minimize the hype and lay down some markers for the current variations of everything being called a cloud.

Level 0

As flour is to a cookie, virtualization is to a cloud. People are always asking me (their first mistake) what the difference is between clouds and the “Grid” hype of the 1990s. My pat answer is “virtualization.” Virtualization is the secret sauce of a cloud. Like I said earlier, I am by no means an expert on cloud computing, but every cloud system that I have researched includes some form of hypervisor. IMHO, virtualization is the differentiator between the old “Grid” computing and the new “Cloud” computing. Therefore, my “Level 0” definition for cloud providers is anyone who is piggy-backing, intentionally or unintentionally, on cloud computing by means of virtualization. The first company that comes to mind is Rackspace, which recently announced that it is going to add hosted virtual servers to its service offering. In fact, its new offering will allow a company to move its current in-house VMware servers to a Rackspace glass house. A number of small players are producing some rain in this space. A quick search on Google will yield plans as low as $7 per month for XEN VPS hosting. It's only a matter of time before cloned Amazon EC2 providers start proclaiming themselves cloud computing providers because they host XEN services in their own glass houses. These services will all be terrific offerings and will probably reduce costs, but they will not quite be clouds, leaving them, alas, at “Level 0.”

Level 1

My “Level 1” cloud players are what I call niche players. “Level 1” actually has several sub-categories.

Service Providers

Level 1 service provider offerings are usually on-ramp implementations relying on Level 2 or Level 3 backbone providers. For example, a company called RightScale un-mangles Amazon's EC2 and S3 APIs and provides a dashboard and front-end hosting service for Amazon's Web Services (AWS) offering (i.e., EC2 and S3). AWS is what I consider a “Level 2” offering, which I will discuss later in this article.

Service Hybrids

Service Hybrids are players like ENKI and Enomaly. Both companies offer services and software around backbone cloud providers. In fact, I was baptized in the clouds by Reuven Cohen, the founder of Enomaly, on a plane ride from Austin to Chicago. I sat next to Reuven, and he was gracious enough to school me on Amazon's AWS. Enomaly offers services and software around Amazon's AWS, and they are clearly the go-to guys for EC2/S3. ENKI seems to be Enomaly's equivalent, but for 3Tera's Applogic application. 3Tera is what I consider a “Level 3” technology, which I discuss below.

Pure Play Application Specific

This is where I will admit it gets a little “cloudy.” Seriously, companies such as Box.Net, and EMC with its latest implementation with Mozy, are appearing as SaaS storage plays and jumping on the cloud bandwagon. I am almost certain that companies like SalesForce.com will be confused with, or will legitimately become, cloud plays. Probably the best example of a “Level 1 Pure Play” is EnterpriseDB's latest announcement of running its implementation of PostgreSQL on Amazon's EC2. There are also a few rumors of services that are trying to run MySQL on EC2, but most experts agree that this is a challenge on the EC2/S3 architecture. It will be interesting to see how Sun's cloud formations flow with regard to its recent acquisition of MySQL.

Pure Play Technology

Whenever you hear the terms MapReduce, Hadoop, and Google File System in regard to cloud computing, they primarily refer to “Cloud Storage” and the processing of large data sets. Cloud Storage relies on an array of virtual servers and programming techniques based on parallel computing. If things like $S(P) = P - \alpha (P - 1)$ (Gustafson's law, the scaled speedup on $P$ processors when a fraction $\alpha$ of the work is serial) get you excited, then I suggest that you have a party here. Otherwise, I am not going anywhere near there. I will, however, try to take a crack at explaining MapReduce, Hadoop, and the Google File System. It all started in 2004, when the boys at Google published a paper describing a programming model called MapReduce. MapReduce is used for processing and generating large data sets across a number of distributed systems. In simplistic terms, MapReduce is made up of two functions: a map function that turns input key/value pairs into intermediate key/value pairs, and a reduce function that merges all of the intermediate values for each key into output values. The original Google paper, “MapReduce: Simplified Data Processing on Large Clusters,” uses simple examples such as a distributed grep and a count of URL access frequencies. Those Google boys and girls have come a long way since 2004. Certainly, it is much more complicated than I have described. The real value in MapReduce is its ability to break a job into many small computations distributed across many machines.
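To make the map/reduce split a little more concrete, here is a minimal single-process sketch of the URL access count idea. The log format (“client URL” on each line) and the function names are invented for this illustration; a real MapReduce system runs the map and reduce calls on many machines and performs the grouping (“shuffle”) step over the network.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(line: str) -> Iterator[tuple[str, int]]:
    """Map: for each log line, emit an intermediate (url, 1) pair.

    Assumes a hypothetical "<client> <url>" log format; malformed lines emit nothing.
    """
    parts = line.split()
    if len(parts) >= 2:
        yield parts[1], 1


def reduce_fn(url: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce: merge all intermediate values for one key into a single total."""
    return url, sum(counts)


def map_reduce(lines: Iterable[str]) -> dict[str, int]:
    # Shuffle: group intermediate values by key. On a cluster this step moves
    # data between machines; here it is just an in-memory dictionary.
    groups: dict[str, list[int]] = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())


logs = [
    "10.0.0.1 /index.html",
    "10.0.0.2 /about.html",
    "10.0.0.3 /index.html",
]
print(map_reduce(logs))  # {'/index.html': 2, '/about.html': 1}
```

Because each map call looks at only one line and each reduce call looks at only one key, both can be farmed out to as many machines as you have, which is exactly the “many small distributed computations” point above.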

Next in this little historical adventure, a gentleman named Doug Cutting implemented MapReduce within the Apache Lucene project (in its Nutch web crawler), and that work later evolved into what is now commonly known as Hadoop. Hadoop is an open source Java-based framework that implements MapReduce on top of a special file system called the Hadoop Distributed File System (HDFS). The relationship between HDFS and the Google File System (GFS) is not exactly clear, but I do know that HDFS is open and that it is based on the concepts of GFS, which is proprietary and most likely very specific to Google's voracious appetite for crunching data. The bottom line is that a technology like Hadoop and all its sub-components allows IT operations to process millions of bytes of data per day (only kidding, I couldn't resist a quick Dr. Evil joke here: “Dr. Evil: I demand the sum… OF 1 MILLION DOLLARS”). Actually, what I meant to say was quintillions of bytes per day.
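One GFS concept that HDFS clearly shares is storing a big file as fixed-size blocks, each copied to several machines. The toy model below sketches that idea under a few assumptions: the 64 MB block size and replication factor of 3 are the HDFS defaults of this era, but the simple round-robin placement is a hypothetical simplification (real HDFS uses a rack-aware placement policy), and none of these function names come from Hadoop itself.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the HDFS default block size of this era
REPLICATION = 3                # HDFS default number of copies per block


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the sizes of the blocks a file of file_size bytes is split into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, nodes: list[str],
                   replication: int = REPLICATION) -> list[list[str]]:
    """Toy round-robin placement: each block lands on `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return [
        [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    ]


MB = 1024 * 1024
blocks = split_into_blocks(200 * MB)  # three 64 MB blocks plus one 8 MB block
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])
for i, nodes in enumerate(placement):
    print(f"block {i}: {blocks[i] // MB} MB on {nodes}")
```

Losing any single node still leaves two copies of every block, which is why this style of file system can live on cheap commodity hardware.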

Most of the experts with whom I have talked say that Hadoop is really only a technology that companies like Google and Yahoo can use. I found, however, a very recent blog post on how a Rackspace customer is using Hadoop to offer special services to its customers, processing massive amounts of mail server logs to reduce the wall-clock time of service analytics. Now you're talking my language.

Level 2

Level 2 cloud providers are basically the backbone providers for the other cloud providers. Amazon's AWS Elastic Compute Cloud (EC2) and Simple Storage Service (S3) are the leaders in this space at this time. My definition of a “Level 2” provider is a backbone hosting service that runs virtual images in a cloud of distributed computers. This delivery model supports anywhere from one to thousands of virtual images on a vast array of typically commodity-based hardware. The key differentiator of a “Level 2” provider vs. a “Level 3” provider is that the “Level 2” cloud is made up of distinct single images that are not treated as a holistic grid, as they are by a “Level 3” provider (more on this later). If I want to implement a load balancer front end with multiple Apache servers and a MySQL server on EC2, I have to provide all the nasty glue to make that happen. The only difference between running on Amazon's EC2 and running one's own data center is the hardware. Mind you, that is a big difference, but, even with EC2, I still might need an army of system administrators to configure file system mounts, network configurations, and security parameters, among other things.
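Here is a small taste of that “nasty glue.” The sketch assumes you front your Apache instances with HAProxy (the article does not name a load balancer, so that choice is an assumption) and renders a backend section from whatever private addresses your instances happened to get. The server names and 10.x addresses are hypothetical; on EC2 you would have to rediscover them every time an image restarts.

```python
def render_haproxy_backend(name: str, servers: dict[str, str], port: int = 80) -> str:
    """Render an HAProxy backend section that round-robins across the given servers.

    `servers` maps a server label to its (hypothetical) private IP address.
    """
    lines = [f"backend {name}", "    balance roundrobin"]
    for label, ip in servers.items():
        # "check" asks HAProxy to health-check each server.
        lines.append(f"    server {label} {ip}:{port} check")
    return "\n".join(lines) + "\n"


apache_instances = {"web1": "10.0.0.11", "web2": "10.0.0.12"}
print(render_haproxy_backend("web", apache_instances))
```

And that is just the load balancer. The Apache servers still need to be told where MySQL lives, the file systems still need mounting, and the security groups still need opening, all by hand or by script.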

Amazon's EC2 is based on XEN images, and an EC2 customer gets to create images or pick from a template list of already created XEN images. There is a really nice Firefox extension for starting and stopping images at will. Still, if you want to do fancy things like on-demand or autonomic computing, you will have to use the AWS APIs or use a “Level 1” provider to do it for you. I currently run this web site on an EC2 cloud. I have no idea what the hardware is and basically only know that it is physically located somewhere in Oklahoma. At least that's what one of the SEO tools says. If I were to restart it, it might wind up in some other city; who cares? Clouds are convectious.
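What does “autonomic” look like in practice? At its core it is a loop that compares the load to the capacity and decides whether to launch or terminate images. The sketch below shows only that decision logic; the function names, thresholds, and request-per-second numbers are hypothetical, and in a real setup the resulting action would be carried out through EC2's RunInstances and TerminateInstances API calls (or by a “Level 1” provider doing it for you).

```python
import math


def desired_instances(current_load: float, capacity_per_instance: float,
                      min_instances: int = 1, max_instances: int = 20) -> int:
    """Number of instances needed for current_load, clamped to [min, max]."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(current_load / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))


def scaling_action(running: int, desired: int) -> str:
    """Turn the gap between running and desired counts into an action."""
    if desired > running:
        return f"launch {desired - running}"
    if desired < running:
        return f"terminate {running - desired}"
    return "hold"


# Hypothetical numbers: 450 requests/s, each image handles about 100 requests/s.
want = desired_instances(450, 100)
print(want, scaling_action(running=3, desired=want))  # 5 launch 2
```

The minimum keeps at least one image alive when traffic drops to zero, and the maximum keeps a runaway spike from running up an unbounded bill.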

The biggest problem with Amazon's EC2 is that the disk storage is volatile, which means that, if the image goes offline, all of the data that were not part of the original XEN image will be lost. For example, this blog article will disappear if my image goes down. Of course, I take backups. One might say, “Hey, that is what S3 is for.” Good luck. S3 is only for the most nimbus of folks. S3 is only a web services application to put and get raw, unformatted data in buckets. S3 is NOT a file system, and, even though some reasonable applications can make it look like one, it is still not a file system. For example, the tool Jungle Disk can be set up to mount an S3 bucket so that it looks like a mounted file system. Under the covers, however, it is continually copying data to temporary space that looks like a mounted file system. We have found most (not all) of the open tools around S3 to be not ready for production. Also, remember that EC2 and S3 are still listed as beta services. I list at the end of this article a number of good articles about the drawbacks of using EC2/S3 as a production RDBMS data store. Recently, someone made an interesting point to me: a lot of how EC2/S3 works is really based on Amazon's legacy. Before Amazon offered EC2/S3 as a commercial service, it was more than likely used as the company's core e-tailer infrastructure. Although EC2/S3 might seem like an odd way to provide this kind of service, I am certain that it rocks as an infrastructure for selling books and CDs.
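Why is S3 “NOT a file system”? The toy model below is a hypothetical in-memory stand-in, not the S3 API, but it captures the shape of the service: you put and get whole objects by key inside a bucket. There is no append, no seek, no rename, and “directories” are nothing more than keys that share a prefix.

```python
class ToyObjectStore:
    """Hypothetical in-memory model of an S3-style object store."""

    def __init__(self) -> None:
        self._buckets: dict[str, dict[str, bytes]] = {}

    def create_bucket(self, bucket: str) -> None:
        self._buckets.setdefault(bucket, {})

    def put(self, bucket: str, key: str, data: bytes) -> None:
        # A put always replaces the whole object; there is no append or
        # in-place write the way there would be on a real file system.
        self._buckets[bucket][key] = bytes(data)

    def get(self, bucket: str, key: str) -> bytes:
        return self._buckets[bucket][key]

    def list_keys(self, bucket: str, prefix: str = "") -> list[str]:
        # "Folders" are only a naming convention: keys sharing a prefix.
        return sorted(k for k in self._buckets[bucket] if k.startswith(prefix))


store = ToyObjectStore()
store.create_bucket("blog")
store.put("blog", "posts/2008/clouds.html", b"<p>draft</p>")
store.put("blog", "posts/2008/clouds.html", b"<p>final</p>")  # whole-object rewrite
store.put("blog", "images/logo.png", b"\x89PNG")
print(store.list_keys("blog", "posts/"))  # ['posts/2008/clouds.html']
```

Under this model, fixing one typo in a file means uploading the whole object again, which is presumably one reason a tool that makes a bucket look like a mounted drive has to stage data in local temporary space.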

Another player in the “Level 2” game is Mosso. Mosso is a customer of Rackspace, and it has added some secret sauce to VMware to provide an EC2 look-alike. The good news is that its storage is permanent and that there is no S3 foolishness. It will be interesting to see whether Mosso, built on a proprietary hypervisor (VMware), can compete with EC2 and its open source hypervisor, XEN.

In historic IBM fashion, IBM has announced its “Blue Cloud” a few years before it plans to deliver it. When it was announced last year, it had a specified delivery date in the first half of 2008. As most of you know, I have been a flea on IBM's back for over 30 years and have always been amazed at the brilliance of its slow-play marketecture strategies. It usually announces a new model for IT very early in its internal development cycle (i.e., before development has even started). Then, IBM sits back, watches the model mature, and decides on the best entry point for joining. More often than not, its entry point is an acquisition. IMHO, this is a likely scenario for IBM's entry into the cloud space. The bold reality is that IBM will be a significant player in cloud computing. In fact, its announcement last year alone has already put it in play. IBM's strongest position in cloud computing is probably going to be with its enterprise-facing customers. The enterprise 5k will probably trust IBM to help them make the cloud transition when the time is right. IBM also recently announced a large cloud computing initiative in China. Sit back, have some popcorn, and enjoy the fun.

Obviously, other potential players are Dell, Oracle, Sun, Microsoft, and HP. There are not enough hours in a day for me to start guessing what those boys and girls are going to do. However, two giants that should be discussed are Google and Yahoo. If Microsoft pulls off its acquisition of Yahoo, it will be interesting to see how, or if, MS can make some real commercial value out of Hadoop and Yahoo's other cloud initiatives. Google is the biggest question mark. Will it ever decide to commercialize its cloud initiatives? Most experts whom I have talked to say that Google's infrastructure is so unique that its cloud implementations probably wouldn't make sense for anyone but Google. Kind of like the legacy EC2/S3 story on steroids. As I have stated before, though, when the really big brick-and-mortar data centers start getting that deer-in-the-headlights look over IT costs, who do you think Jamie Dimon is going to call (IBM or possibly Google)? If Google gets enough of those calls, it might start to listen.

Level 3

Level 3 providers, IMHO, are currently the highest level of the cloud food chain. IMO, 3Tera stands all alone at this level. I first heard about 3Tera when I read a Linux magazine article listing it as a top company to watch in 2008. 3Tera provides software that allows a company to run its own Virtual Private Data Center (VPD). A company or hosting provider can install 3Tera's Applogic software on a grid of commodity-based hardware and enjoy all the rewards of having a self-contained cloud. 3Tera has partnered with a number of hosting providers that will provide customers with their own private VPD isolated on a grid. 3Tera is only about three years old, and I was told that its goal was to provide Google-style commodity computing to the masses. The primary differentiator between the 3Tera offering and the EC2/Mosso offerings is that 3Tera's approach is holistic. When you get a grid using 3Tera's Applogic software, you get a blank template, sort of like a workflow editor, to build your data center. Then, you can select from a catalog of firewall servers, load balancer servers, Apache servers, and MySQL servers. Basically, they are predefined virtual images. The kicker, however, is that, when you select one of the cataloged servers, the Applogic software understands the context in which you are selecting the server and makes the appropriate configurations automatically. You literally drag the icons (i.e., servers) onto the canvas and then draw lines to connect the servers. All of the default mounted file systems are connected in all the right ways. All the nasty network configuration parameters are set up with best practices. In fact, in a demo/briefing that I had with 3Tera, they built a three-tier Apache web and MySQL grid, bundled it, and started it, all in less than 10 minutes. The images were as real as any EC2 image that I have ever used.
I was so blown away that I didn't trust them at first and asked them to PuTTY into the servers behind the icons and run some basic Linux system commands to make sure it wasn't a demo system.
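The “drag the icons and draw the lines” idea can be sketched as a declarative topology in which drawing a connection is what fills in each server's configuration. To be clear, this is not Applogic's API or data model; the `Appliance` class, the `connect` function, and the setting names (`DB_HOST`, `BACKENDS`, `FORWARD_TO`) are all hypothetical, chosen only to show how context from a connection can replace hand-edited configuration files.

```python
from dataclasses import dataclass, field


@dataclass
class Appliance:
    """A hypothetical catalog server dropped onto the canvas."""
    name: str
    kind: str  # "firewall", "loadbalancer", "apache", or "mysql"
    settings: dict[str, str] = field(default_factory=dict)


def connect(client: Appliance, server: Appliance) -> None:
    """Drawing a line from client to server fills in the client's settings."""
    if client.kind == "apache" and server.kind == "mysql":
        client.settings["DB_HOST"] = server.name
    elif client.kind == "loadbalancer" and server.kind == "apache":
        existing = client.settings.get("BACKENDS", "")
        # Append the new backend, skipping the empty string on the first line drawn.
        client.settings["BACKENDS"] = ",".join(filter(None, [existing, server.name]))
    elif client.kind == "firewall":
        client.settings["FORWARD_TO"] = server.name
    else:
        raise ValueError(f"no rule for connecting {client.kind} to {server.kind}")


fw = Appliance("fw", "firewall")
lb = Appliance("lb", "loadbalancer")
web1, web2 = Appliance("web1", "apache"), Appliance("web2", "apache")
db = Appliance("db", "mysql")

for client, server in [(fw, lb), (lb, web1), (lb, web2), (web1, db), (web2, db)]:
    connect(client, server)

print(lb.settings)  # {'BACKENDS': 'web1,web2'}
```

Adding a second Apache server is then just one more line on the canvas (one more `connect` call here), and nobody touches a configuration file.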

The original demo configuration was made up of four systems (a firewall server, a load balancer, an Apache server, and a MySQL server). Then, we went back and added another Apache server and a clustered MySQL server. We rebuilt the package and restarted, and we had six systems running. Not once did we have to touch a configuration file. Then, they started showing off by ssh'ing into the Apache server and trying to ping yahoo.com. It didn't work, because their best-practice out-of-the-box implementation had already isolated the servers behind the firewall. I think that at one point I shouted, “My God... this is how IT has to be.” If you want more ranting on this, listen to my recent IT Management Guys podcast over at RedMonk. As Cote pointed out in that podcast, it is certainly not all fairy tales and pixie dust, but it is a hell of a lot further along than EC2 and S3, and look at how excited everyone is about those services. 3Tera has graciously lent me a development grid to play with, and, unbeknownst to them, I am going to see if this is something that could be used for DevCampTivoli (I guess they know now). Since I have been having so much fun on AWS, I thought that I would spread the fun to other tools like 3Tera and start blogging about some of my experiences using its grid software.
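The failed ping is the classic “default deny” firewall behavior: traffic is allowed only if an explicit rule says so, and everything else is dropped. Here is a minimal first-match rule evaluator to illustrate the idea. The 10.0.0.0/24 private network, the rule set, and the stand-in public address 203.0.113.10 (a documentation range, not Yahoo's real address) are all assumptions for the example, and the model ignores protocols and ports, which a real firewall would also match on.

```python
import ipaddress
from typing import NamedTuple


class Rule(NamedTuple):
    src: ipaddress.IPv4Network
    dst: ipaddress.IPv4Network
    action: str  # "allow" or "deny"


def decide(rules: list[Rule], src: str, dst: str) -> str:
    """First matching rule wins; anything that matches no rule is denied."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for rule in rules:
        if s in rule.src and d in rule.dst:
            return rule.action
    return "deny"  # default deny


net = ipaddress.ip_network
# Hypothetical policy: servers on the private grid network may talk to each other only.
rules = [Rule(net("10.0.0.0/24"), net("10.0.0.0/24"), "allow")]

print(decide(rules, "10.0.0.12", "10.0.0.13"))     # allow (Apache to MySQL)
print(decide(rules, "10.0.0.12", "203.0.113.10"))  # deny (Apache to the Internet)
```

Starting from “deny everything” and opening holes on purpose is the safer default, which is presumably why it makes sense as the out-of-the-box choice.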

Summary

Minus some of my possible grammatical errors, I think this is probably a good first cut at demystifying the clouds. If you have any input, corrections, or things that I have missed, please feel free to add comments to this blog article or contact me directly.

Disclaimer: I have not read Nick Carr's book “The Big Switch.”