Clouds make it all seem so easy: a tool that is simple to learn yet powerful. As always, the devil is in the details, and a few missteps in a cloud can brew the perfect storm. As more companies weigh their cloud options, it seems timely to share some thoughts and recommendations.
We all know that the cloud is simply a chunk of resources, both server and storage, that can be allocated to specific users on an as-needed basis. The resources are priced by both their size or capacity and the amount of time they are used. Virtualization technology has enabled the packaging of workloads in nice little Virtual Machine Disk Format (VMDK) or Virtual Hard Disk (VHD) files which, due to their portability, can be copied, turned on, suspended and removed with relative ease. Thus, the cloud is a perfect place to toss an overflow workload to run for days or weeks or, if the cloud vendors have their way, forever.
But then, if we scratch our heads for a minute, we realize that the cloud is just a very large resource pool, like a mega-cluster. Every IT shop that has adopted virtualization today has a cluster or two (or 10) running virtual machines for a whole host of business customers. Those VMs have associated storage and network resources, and at provisioning time, each VM is assigned its own chunk—just like an external cloud. With one exception: when it is an external cloud, it isn't your problem.
External clouds live out there in the ether, like a power station down the street, or the wind farm on the hill. You don't have to worry if your brand-new plasma TV is sucking down three times the power—the power station keeps on piping you the electrons. And when the power does go out? Sure, you light some candles, but none of us go climbing poles and driving cherry pickers to fix it—it's the power company's problem. And when you move out of your apartment to a new place down the street, and someone has to disconnect your account and reconnect the other account and bill some landlord for the period in between tenants—that, too, is not your problem.
So, let's talk about what is your problem. The cloud down the hall—that's yours. All yours. And there are a few basic rules you have to follow when you're running a cloud down the hall:
Rule No. 1: You can't run out
Unlike a power company imposing rolling blackouts, you can't tell your application owners that human resources put up some giant new application and now their Web server is going to be slow. You have to grow the cloud to meet the consumption needs. And these days, lightning-fast provisioning and cheaper-than-physical pricing are really driving up demand.
Rule No. 2: You can't store leftovers
Until someone manages to put time in a bottle, all that unused storage, idle server capacity, underutilized machines and over-provisioned space is waste. Unless you're among the mythical few IT shops that don't feel the squeeze of budget cuts, there is no room for waste in your cloud.
Rule No. 3: You can't have outages
Now that it's virtual, one outage can mean a whole neighborhood without power (even a 99 percent uptime service-level agreement (SLA) allows more than three and a half days of outage per year!). Since every VM is owned by a different group—each tinkering around in there to make it work for their application—it becomes a veritable game of bumper cars around the electric poles, and every so often, something's going down. And you are the only one driving the cherry picker.
Rule No. 4: You're the meter reader
The guy who comes by to check the meter? That's you. And you'd better be sure you know whose bill to tack that onto. And when they moved. And whether it is time to cut off power. Right now you have a 32-character string, an Excel spreadsheet and a pad of Post-it notes. You'd better remember your arithmetic, because this is your justification for new investment. And per Rule No. 1, you can't run out.
That's a lot. But if the power company can do it, we should be able to do it too. There are really two categories of things that would make these challenges easier:
1. Better information
Whose VM is whose? When did they use it? How much resource did it consume? When should I shut it down? How full is my environment? Which VMs are over-provisioned? There's a long laundry list of fairly basic questions to which we just don't have answers now. It's like plugging a subdivision into a power strip—how could you possibly allocate, grow, charge, decommission or manage those customers?
2. Demarcated control
Certain things are fixed and your customers can't change them. You get 120-volt power, whether you like it or not. You probably haven't haggled over your power bill lately either, and you don't get power unless you have an account, so there are no squatting tenants running hair dryers all day. To get a piece of the cloud, you have to play by the rules.
But there is also flexibility. If I want to stick three PCs, four monitors, a surround sound system, and a keyboard into the same power strip, and my circuit breaker flips, taking down my unsaved masterpiece, that's my own problem to fix. Power is still running to my house; what I do with it is my problem.
Your cloud needs the same thing: controls that prevent individual users from compromising the cloud itself (the barriers around the electric poles) and freedoms to enable users to effectively use their piece.
The analog way to get information and control would be to send out meter readers, collate Post-it notes and become veritable policemen of your own cloud. Most of us don't enjoy those tasks, nor do we have the time or extra people to staff them. Luckily for us, we work in the digital age, so our solution may come from something more automatic.
I haven't seen an actual person reading meters in quite some time. For that matter, when the electric company comes to repair a downed line, they probably have a pretty good idea where the problem is and aren't circling neighborhoods looking for dark windows. It's automated. Similarly, we need some automation in the cloud, as follows:
Automated data collection
Any time spent transcribing, documenting and manually keeping track of VMs—their owners, their settings, their service windows, their file system access, their storage stack, their lineage and so on—is really tedious and unnecessary. Tools should be able to answer all these questions for you, in real time.
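As a rough illustration, a collector along these lines could poll the hypervisor on a schedule and tag each VM with its owner. The sketch below assumes a libvirt-managed host and a hypothetical owners.csv file mapping VM UUIDs to business owners; a real deployment would pull that mapping from a CMDB and push the results into a reporting database.

```python
# A rough sketch of automated VM data collection, not a production collector.
# Assumes a libvirt-managed host and a hypothetical owners.csv mapping VM
# UUIDs to business owners (a stand-in for a real CMDB).
import csv
import time

import libvirt  # provided by the libvirt-python package


def collect_inventory(uri="qemu:///system", owners_file="owners.csv"):
    owners = {}
    with open(owners_file, newline="") as fh:
        for row in csv.DictReader(fh):          # columns: uuid, owner
            owners[row["uuid"]] = row["owner"]

    conn = libvirt.open(uri)
    try:
        snapshot = []
        for dom in conn.listAllDomains():
            state, _max_kib, mem_kib, vcpus, cpu_ns = dom.info()
            snapshot.append({
                "collected_at": time.time(),
                "uuid": dom.UUIDString(),
                "name": dom.name(),
                "owner": owners.get(dom.UUIDString(), "unassigned"),
                "running": state == libvirt.VIR_DOMAIN_RUNNING,
                "vcpus": vcpus,
                "memory_mib": mem_kib // 1024,
                "cpu_seconds": cpu_ns / 1e9,
            })
        return snapshot
    finally:
        conn.close()
```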
Automated historical analysis
If you are collecting data from the present, surely you can keep track of yesterday, last week and last month. Over time, utilization patterns, growth patterns and other trends emerge. Those should be clear to you with some historical perspective. No need to go calculating it yourself—automate it.
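Once those snapshots pile up, the trend falls out of simple arithmetic. As a sketch, assuming you have been appending (timestamp, allocated GB) samples from a collector like the one above, a least-squares slope yields a daily growth figure without anyone reaching for a calculator:

```python
# A rough sketch of historical trend extraction. Assumes `samples` is a list
# of (unix_timestamp, allocated_gb) tuples accumulated over time; a
# least-squares fit turns them into GB-per-day growth.
def daily_growth_gb(samples):
    if len(samples) < 2:
        return 0.0
    days = [ts / 86400.0 for ts, _ in samples]
    gbs = [gb for _, gb in samples]
    n = len(samples)
    mean_d, mean_g = sum(days) / n, sum(gbs) / n
    numerator = sum((d - mean_d) * (g - mean_g) for d, g in zip(days, gbs))
    denominator = sum((d - mean_d) ** 2 for d in days)
    return numerator / denominator if denominator else 0.0
```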
Automated capacity management
From the past, you may be able to learn about the future. With a baseline, average growth rates, and a sense for VM decommissioning and reclamation expectations, you can get a lot closer to real capacity planning—which is the justification for additional investment.
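For instance, a back-of-the-envelope forecast only needs the growth rate from the previous sketch, the size of the pool and an estimate of how much capacity reclamation gives back. The function below is a simplified illustration, not a substitute for a real capacity-planning tool:

```python
# A back-of-the-envelope capacity forecast. The growth rate comes from the
# trend sketch above; the reclamation rate is an assumed estimate of capacity
# returned by decommissioned VMs and cleaned-up storage.
def days_until_full(capacity_gb, used_gb, growth_gb_per_day, reclaim_gb_per_day=0.0):
    """Rough days of headroom remaining; None means no exhaustion on this trend."""
    net_growth = growth_gb_per_day - reclaim_gb_per_day
    if net_growth <= 0:
        return None
    return (capacity_gb - used_gb) / net_growth


# Example: a 100 TB pool with 72 TB used, growing 80 GB/day and reclaiming
# 20 GB/day, has roughly 467 days of headroom:
#   days_until_full(100_000, 72_000, 80, 20)  -> ~466.7
```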
Automated barriers to maintain environmental integrity
As these environments grow, you can't monitor every VM user and make them swear not to touch the system directory. Automate the enforcement of read-only zones in the VM, whether directories, configuration elements or application stacks. Set it and forget it.
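One simplified way to picture this is a periodic check that walks a small policy list of protected paths and flags anything writable inside them. The zones listed below are illustrative; real enforcement would hook into the virtualization or configuration-management layer rather than a script inside the guest:

```python
# A simplified read-only-zone check. PROTECTED_ZONES is an illustrative policy
# list; the check reports group- or world-writable files inside each zone so
# they can be locked down or investigated.
import os
import stat

PROTECTED_ZONES = ["/etc", "/opt/app/config"]  # hypothetical paths


def writable_violations(zones=PROTECTED_ZONES):
    violations = []
    for zone in zones:
        for root, _dirs, files in os.walk(zone):
            for name in files:
                path = os.path.join(root, name)
                try:
                    mode = os.stat(path).st_mode
                except OSError:
                    continue
                if mode & (stat.S_IWGRP | stat.S_IWOTH):
                    violations.append(path)
    return violations
```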
Automated identification of changes
Some things will change—and some things should change. But nothing should change without leaving a trail, both because best practice requires it and because it speeds up troubleshooting significantly. Automate your system to leave bread crumbs as it changes over time.
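A bare-bones version of those bread crumbs might hash the files in a watched tree and diff them against the last recorded baseline. The baseline.json store here is an assumption for illustration; commercial change-audit tools capture who, when and why alongside the what:

```python
# A bare-bones change trail: hash every file under a watched tree, diff against
# the previous baseline, and record the new baseline. baseline.json is a
# hypothetical local store; real change-audit tools record far richer context.
import hashlib
import json
import os


def fingerprint(tree):
    prints = {}
    for root, _dirs, files in os.walk(tree):
        for name in files:
            path = os.path.join(root, name)
            try:
                with open(path, "rb") as fh:
                    prints[path] = hashlib.sha256(fh.read()).hexdigest()
            except OSError:
                continue
    return prints


def changed_paths(tree, baseline_file="baseline.json"):
    current = fingerprint(tree)
    try:
        with open(baseline_file) as fh:
            baseline = json.load(fh)
    except FileNotFoundError:
        baseline = {}
    added = sorted(p for p in current if p not in baseline)
    removed = sorted(p for p in baseline if p not in current)
    modified = sorted(p for p in current
                      if p in baseline and current[p] != baseline[p])
    with open(baseline_file, "w") as fh:
        json.dump(current, fh)   # leave the new bread-crumb baseline
    return added, removed, modified
```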
Luckily, a growing number of vendors in this space are rapidly solving the challenges of managing your internal infrastructure. But let's stop short of calling it an internal cloud. When it's right down the hall, there's nothing ephemeral about it.
But those external clouds are growing by the minute and, who knows, your company may want a piece of them one day too. They are pretty cost-effective and getting increasingly sophisticated. They also turn the tables on you, the IT department: all that was your problem down the hall is now their problem in the ether. You're the customer now, and with that comes a list of requests. At a minimum, you would like to:
1. Get what you pay for
As a customer of a cloud, you want to ensure that the service levels you've been promised are being delivered, leverage all the resources you've ordered, and make use of that VM for every minute it is running. When you pay by the hour, every hour counts. And if there is an outage or a problem, you want to be alerted and have a rapid resolution.
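To make that concrete, here is a minimal sketch of the two reconciliations involved: measured downtime against the promised SLA, and billed hours against the hours you actually observed the VM running. The outage minutes and billing figures are assumed inputs from your own monitoring and the provider's invoice:

```python
# A minimal reconciliation sketch: measured availability against the promised
# SLA, and billed hours against observed running hours. Outage minutes and
# billing figures are assumed inputs from monitoring and the provider invoice.
def availability_ok(outage_minutes, period_days, promised_uptime=0.99):
    total_minutes = period_days * 24 * 60
    measured = 1.0 - outage_minutes / total_minutes
    return measured >= promised_uptime, measured


def billing_gap_hours(billed_hours, observed_running_hours):
    """A positive result means paying for hours the VM was not seen running."""
    return billed_hours - observed_running_hours


# Example: 9 hours of outage in a 30-day month misses a 99 percent SLA:
#   availability_ok(9 * 60, 30)  -> (False, 0.9875)
```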
2. Track what's changed
Even though it isn't in your infrastructure, the workload is still your own. To aid your side of troubleshooting, and to satisfy your audit processes, changes should continue to be tracked. This is harder than tracking your own local infrastructure, of course, and sophisticated tools must be used to automate the change audit, giving both the cloud provider and the customer the visibility they need.
3. Protect your privacy
A public cloud, like any public resource, is not built for you alone. But you still want a sense of privacy and isolation, ensuring that your system is not tainted by communicating with others in the cloud. Just because it isn't home doesn't mean it can ignore the house rules.
Prognosticators have gotten a bit starry-eyed about the cloud, envisioning a world in which multiple clouds are at your disposal, where your workloads run on the nearest, cheapest, fastest cloud and are easily moved around the ether. This dream, beautiful in the soft lighting and morning mist of a good fairy tale, may be somewhere in our future. Nearer term, companies may build a relationship with a single cloud vendor. And today, that cloud lives down the hall, encased in the steel and silicon of your own data center.
To get to tomorrow, we'll conquer today's challenges of scale and automation, mastering the local before exploring the beyond. Because we know: if you try to push your problems onto the cloud, things will only get stormier.