Tuesday, March 10, 2009

The Skinny on Cloud Lock-in

An article in which RightScale, one of the vendors offering hybrid cloud computing services, explains strategies around lock-in, a problem in the cloud computing industry, and how the company itself addresses it.

The topic of cloud lock-in has been getting quite a bit of attention lately, and it definitely needs to be a primary concern for anyone planning to move business-critical applications to the cloud. (And who isn't planning on that these days?) Given all the different layers of cloud computing, the conversation can quickly become more confusing than enlightening. At Cloud Connect a few weeks ago, the lock-in discussion bounced from Salesforce.com to Google App Engine and then to Amazon Web Services within a single argument, which makes no sense. Put simply, the different layers of cloud offerings vary widely in their danger of lock-in.

Lock-in hypothesis

Let me state Thorsten's Lock-in Hypothesis:

The higher the cloud layer you operate in, the greater the lock-in.


[Figure: lock-in increases with the cloud layer]

This means that if you use an application in the cloud, such as an all-in-one CRM package, you have the highest chance of getting locked in. Move one level down to a platform in the cloud and you are somewhat less likely to get locked in. Google App Engine is one example: you can move a simple Python app off that platform fairly easily, but anything of substance that uses its BigTable storage and other services will end up relying on a lot of proprietary technology. This "black box" effect locks you in more than, for example, a platform like Heroku, where apps follow more of a standard Rails code base. When you move down to an infrastructure cloud, such as Amazon Web Services, it becomes even easier to see how you can move your application stack from one provider to another. After all, there's not much distinguishing the Linux box you get in EC2 from the Linux box you get at GoGrid. But even here, lock-in needs to be thought through, because system behavior, from storage persistence to networking details and much more, is far from identical.

So where does this leave us? I've been talking about lock-in, but what does that really mean? Well, with cloud computing you outsource the operation of compute resources to a cloud vendor who "runs" your application and "stores" your data. You are locked in to this vendor to the extent that it is prohibitively expensive or time-consuming to run your application or move your data elsewhere. Whether this "elsewhere" is another vendor or your own infrastructure is not important: if you can't move, or if moving costs a lot or takes a long time, you're locked in. We recently asked our customers and prospects what concerned them most about lock-in. Here are the results:

[Figure: survey results, what concerns customers and prospects most about lock-in]
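Part of the definition of lock-in is how long it takes to move your data elsewhere. Simple arithmetic gives a rough estimate: transfer time is about data volume divided by usable bandwidth. The sketch below assumes decimal units and an 80% link efficiency. `transfer_days` is a hypothetical helper, and the example figures are made up rather than taken from the survey.

```python
# Rough estimate of how long it takes to copy a data set out of a cloud.
# All inputs are hypothetical; plug in your own volume and sustained bandwidth.

def transfer_days(data_gb: float, bandwidth_mbps: float, efficiency: float = 0.8) -> float:
    """Days needed to copy data_gb gigabytes over a bandwidth_mbps (megabit/s) link.

    efficiency discounts protocol overhead and contention (assumed 80% by default).
    """
    if data_gb < 0 or bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid input")
    bits = data_gb * 8 * 1000**3                     # decimal gigabytes -> bits
    effective_bps = bandwidth_mbps * 1000**2 * efficiency
    seconds = bits / effective_bps
    return seconds / 86_400                          # seconds -> days


if __name__ == "__main__":
    # Hypothetical example: 10 TB over a 100 Mbit/s link at 80% efficiency.
    print(f"{transfer_days(10_000, 100):.1f} days")
```

With these assumed inputs the estimate comes to about 11.6 days. That is before counting any time spent converting the data or rewriting the code that reads it.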

The layer cake

Lock-in can actually occur at many levels in the stack, and that's why the cloud layers differ in their effective lock-in risk. The more code that is controlled "behind the curtain" by the cloud, the more freedom you tend to lose. Conversely, the more that is under your control, the easier it is to replicate it elsewhere and retain your freedom. Here are a number of different layers at which you could find yourself locked in:

  • Application: do you own the application that manages your data or do you need to find/write another one to move?
  • Web services: does your app make use of third-party web services that you would have to find or build alternatives to (e.g. storage, search, billing, accounting, …)?
  • Development & run-time environment: does your app run in a proprietary run-time environment and/or is it coded in a proprietary development environment? Would you need to retrain programmers and rewrite your app to move to a different cloud?
  • Programming language: does your app make use of a proprietary language, or language version? Would you need to look for new programmers to rewrite your app to move?
  • Data model: is your data stored in a proprietary or hard-to-reproduce data model or storage system? If you moved, could you continue to use the same type of database or data storage organization, or would you need to transform all your data (and the code accessing it)?
  • Data: can you actually bring your data with you and if so, in what form? Can you get everything exported raw, or only certain slices or views?
  • Log files and analytics: do you own your history and metrics, and can you move them to a new cloud, or do you have to start from scratch?
  • Operating system and system software: do your sysadmins control the operating system platform and the versions of libraries and tools, so that you can move the know-how and operational procedures from one cloud to another?
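The questions in the list above can be turned into a simple self-assessment. This is a hypothetical sketch, not a RightScale tool. `LayerAnswer` and `locked_layers` are illustrative names. A layer counts as "locked in" if moving would force you to rebuild, rewrite, or abandon it.

```python
# Hypothetical lock-in self-assessment: one answer per layer from the list above.
from dataclasses import dataclass


@dataclass
class LayerAnswer:
    layer: str
    locked_in: bool      # True if moving would force a rebuild, rewrite, or loss
    note: str = ""


def locked_layers(answers: list[LayerAnswer]) -> list[str]:
    """Return the names of layers whose answer indicates lock-in."""
    return [a.layer for a in answers if a.locked_in]


if __name__ == "__main__":
    # Example answers for a made-up app running on a platform cloud.
    answers = [
        LayerAnswer("Application", False, "we own the code"),
        LayerAnswer("Web services", True, "uses the platform's proprietary search"),
        LayerAnswer("Development & run-time environment", True),
        LayerAnswer("Programming language", False),
        LayerAnswer("Data model", True, "schema tied to a proprietary store"),
        LayerAnswer("Data", False, "full raw export available"),
        LayerAnswer("Log files and analytics", True),
        LayerAnswer("Operating system and system software", False),
    ]
    hits = locked_layers(answers)
    print(f"{len(hits)} of {len(answers)} layers locked in: {', '.join(hits)}")
```

Counting layers is deliberately crude. In practice some layers, such as data, weigh far more than others, so a weighted score may fit better.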

All these issues become pertinent when you face questions such as: "How can I move my Force.com application or my web site running in Google App Engine to my own data center?" Or "Can I get the click-stream data for my site out of the platform so I can analyze, for example, last year's traffic compared to this year's?" Or "Can I easily move an application between my datacenter and EC2?"

Altitude increases lock-in

The value proposition of the higher cloud layers is appealing and I predict more and more movement in that direction. But lock-in is one of the issues that really gives me pause and that has kept me in the past from adopting some of the services that otherwise have looked compelling.

Let me pick on Google App Engine for a minute. Suppose you develop your site on App Engine and find yourself having to move away for whatever reason. I don't know of a good solution for you at that point. While there are ways to port an app from App Engine to Django, it's not clear this is really an answer if you're running a high-volume production app. It will be interesting to see whether we end up with commercial or perhaps open-source App Engine clones that are "industrial strength" to the point where one can really contemplate moving a big app from one App Engine vendor to another. (Well, first Google App Engine needs to be complete enough to host the types of apps where this is a real concern.)

An example closer to home is Amazon's Simple DB. I've been interested in Simple DB since I first heard about it, but we have yet to use it as part of the RightScale service, and the number one reason is lock-in. For example, we store audit entries for everything that happens with our users' servers, and I'd love to get those out of the SQL database they're in. Simple DB may be a good solution to the problem from a technical point of view, but we don't see how we'd be able to move that data out of Amazon without major headaches. In addition, we need to be able to run all pieces of the RightScale service in other clouds, and we'd have to build an alternate storage solution there. By the time we do that, we might as well use only this alternate solution and forgo Simple DB altogether.
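The Simple DB concern comes down to keeping data movable. One way to hedge against it is to put audit entries behind a small storage interface that always offers a full raw export. This is a sketch under our own assumptions, not RightScale's actual design. `AuditStore`, `SqliteAuditStore`, and `migrate` are hypothetical names, and SQLite stands in for any standard SQL database so the example is self-contained.

```python
# Hypothetical sketch: audit entries live behind a small interface so the backing
# store can be swapped, and a full raw export (JSON Lines) is always available.
import json
import sqlite3
from abc import ABC, abstractmethod
from collections.abc import Iterator


class AuditStore(ABC):
    @abstractmethod
    def append(self, entry: dict) -> None: ...

    @abstractmethod
    def scan(self) -> Iterator[dict]: ...

    def export_jsonl(self) -> str:
        """Dump every entry as JSON Lines, a format any other store can ingest."""
        return "".join(json.dumps(e, sort_keys=True) + "\n" for e in self.scan())


class SqliteAuditStore(AuditStore):
    """A standard SQL backing store (SQLite keeps the example self-contained)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS audit (id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
        )

    def append(self, entry: dict) -> None:
        self.conn.execute("INSERT INTO audit (body) VALUES (?)", (json.dumps(entry),))
        self.conn.commit()

    def scan(self) -> Iterator[dict]:
        # Entries come back in insertion order.
        for (body,) in self.conn.execute("SELECT body FROM audit ORDER BY id"):
            yield json.loads(body)


def migrate(src: AuditStore, dst: AuditStore) -> int:
    """Copy all entries from one store to another; return the number copied."""
    count = 0
    for entry in src.scan():
        dst.append(entry)
        count += 1
    return count


if __name__ == "__main__":
    old, new = SqliteAuditStore(), SqliteAuditStore()
    old.append({"server": "web-1", "action": "launch", "user": "alice"})
    old.append({"server": "web-1", "action": "terminate", "user": "bob"})
    print(migrate(old, new))  # 2
    print(new.export_jsonl(), end="")
```

If a proprietary store were adopted later, it would be one more `AuditStore` implementation. `migrate` and `export_jsonl` would still provide a way back out.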

At the level of infrastructure clouds like Amazon EC2 the questions around lock-in are somewhat different but still pertinent. The cloud vendor provides what I like to think of as the "atoms of computing," namely processing, storage, and networking. You get to build your infrastructure using virtual machines (EC2), disk block devices (EBS), hashed storage buckets (S3), security groups, etc. This means that the choices of programming language, development environment, runtime environment, database storage and so forth are all yours and can all at least in principle be duplicated in another cloud, at a traditional hosting provider, or in your own datacenter. Where lock-in starts to creep in is in the system architecture and in the operations infrastructure (automation, scripts, procedures) that your sysadmins put in place to manage everything.
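At the infrastructure layer, lock-in tends to creep into automation and scripts. One common mitigation is to have scripts target a thin provider-neutral interface instead of one cloud's API. The sketch below is illustrative only. `Provider`, `FakeProvider`, `ServerSpec`, and `deploy_web_tier` are hypothetical, and the fake provider does not call any real cloud API. It only shows where provider-specific details such as instance sizes get translated.

```python
# Hypothetical sketch: operations scripts depend on a small provider-neutral
# interface; each provider maps abstract settings to its own specifics.
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ServerSpec:
    image: str   # e.g. a standard Linux image you can rebuild anywhere
    size: str    # abstract size label, mapped per provider
    role: str


class Provider(ABC):
    @abstractmethod
    def launch(self, spec: ServerSpec) -> str:
        """Start a server and return a provider-specific server id."""


@dataclass
class FakeProvider(Provider):
    """Stand-in for a real cloud or datacenter; it only records launches."""
    name: str
    size_map: dict[str, str]
    launched: list[str] = field(default_factory=list)

    def launch(self, spec: ServerSpec) -> str:
        instance_type = self.size_map[spec.size]  # translate the abstract size
        server_id = f"{self.name}-{len(self.launched) + 1}-{instance_type}"
        self.launched.append(server_id)
        return server_id


def deploy_web_tier(provider: Provider, count: int) -> list[str]:
    """An automation script that runs unchanged against any Provider."""
    spec = ServerSpec(image="standard-linux", size="medium", role="web")
    return [provider.launch(spec) for _ in range(count)]


if __name__ == "__main__":
    cloud_a = FakeProvider("cloud-a", {"medium": "vm-large"})
    own_dc = FakeProvider("own-dc", {"medium": "2cpu-8gb"})
    print(deploy_web_tier(cloud_a, 2))  # ['cloud-a-1-vm-large', 'cloud-a-2-vm-large']
    print(deploy_web_tier(own_dc, 1))   # ['own-dc-1-2cpu-8gb']
```

The interface keeps procedures portable, but it does not hide the behavioral differences (storage persistence, networking) mentioned earlier. Those still have to be handled inside each provider implementation.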

Maintaining freedom of choice

One of the principles that I've upheld in the design of the RightScale system from the beginning is transparency. Everything happening on your systems should be visible to you. This not only means that you can find out why something happened and who did it, but also that you can replicate it elsewhere. There's no magic happening behind the curtain to which you're held hostage. I love it when others can do magic for me and save me a lot of time and effort by providing a pre-built platform. But there are solid reasons — both business and technology-related — to demand the ability to look into the "secret sauce." That way, I can be enchanted by the magic but not locked-in to the magician. Our users need to be able to enjoy the same capability.

A second principle we follow is to focus as much as possible on standard software, architectures, and configurations. This means that our solutions can easily be replicated elsewhere, such as in your own datacenter. This can present more of a challenge when designing for a cloud environment, which is why we provide cloud-ready solutions for various types of scalability, but it also frees you from being tied to a particular cloud.

[Figure: lock-in details]

In the end, there may not exist a zero lock-in option. In fact, certain kinds and degrees of lock-in are probably unavoidable and are actually tolerable. The point is that the lock-in question is an important consideration to take into account when choosing among different cloud computing alternatives, and it's equally important to keep the differences among cloud layers in mind when you decide what you're willing to live with. All clouds are not created equal, and all clouds do not create equal lock-in. The key is to know the implications of your cloud choices.