Sunday, July 17, 2011

On the General Topic of the Cloud

"Cloud computing" seems, largely, to be a semantic void into which one pours one's hopes and wishes for the future of one's IT resources, with added hopes that it will lead to a Promised Land of savings and profitability.  There are a few commonly-accepted connotations to the term, however, and much of the answer to the question posed depends on which "cloud computing" we're talking about:  the "cloud computing" that means "linearly- and infinitely-scalable architecture"?  The "cloud computing" that means "computer stuff I just pay somebody else to take care of for me"?  The "cloud computing" that implies a utility model, provided by specialized businesses as reliably dreary as the gas company, consisting of a supply of computing resources as indistinguishable as molecules of propane?

All of these meanings are frequently confused in the minds of many people, the definition shifting from minute to minute in conversations about "the cloud".  Most of these people are managers.  Like cowbell, the cure for all problems now appears to be "more cloud".

So let's parse the three meanings given above.  They are actually very old concepts, and by way of clarification, I will call them by their old names:

(a)  Parallel processing
(b)  Outsourcing
(c)  Utility computing

(b) and (c) have the most effect on systems administrators in terms of employment, at least in the short term; the dream of managers everywhere is the day when they will be able to fire all of their smelly, weird, obstinate, technobabble-loving nerds and replace them with a nice monthly billing statement from a Fortune 500 company.  This goal is non-viable for many well-understood and well-explored reasons, but such people, like Spain's Philip II, are not deterred from their belief in the essential excellence of their policy by any experience of its failure, and so we must turn our attention to the first option.

Part of the attraction of so-called "cloud computing" is the notion that, with the application of More Cloud(tm), all of the impediments to "business agility" (read:  "the immediate satisfaction of management fiat") will magically go away.  Do we need to expand our capacity 300%?  Well then, get quotes from three Certified Cloud Providers, pick the cheapest one, and we'll have that done by close of business today!

The problem, as most computer scientists have known for decades, is that parallel processing is HARD.  Designing systems and processes to efficiently take advantage of parallelism in problems is HARD.  And make no mistake, this is about parallelism - we cannot obtain a computer of infinite speed or capacity, but we can obtain a whole bunch of computers to throw at our problems.  Which sounds swell!  If only we had designed our processes to make the use of such a system possible!
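To put a rough number on "a whole bunch of computers", here is a minimal sketch using Amdahl's law.  This is a standard textbook result that I am bringing in purely for illustration, not something specific to any cloud provider, and the function name and the 10% serial fraction are assumptions made up for the example.  The idea is that if some fraction of a job cannot be parallelized, that fraction caps the speedup no matter how many machines you rent.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Upper bound on speedup under Amdahl's law.

    serial_fraction: share of the work that cannot run in parallel (0.0 to 1.0).
    workers: number of machines (or cores) thrown at the problem.
    Both parameters are illustrative assumptions, not measurements.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0.0 and 1.0")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical workload in which 10% of the work is inherently serial.
    for workers in (1, 10, 100, 1000):
        speedup = amdahl_speedup(0.10, workers)
        print(f"{workers:>5} workers: at most {speedup:.2f}x faster")
```

With a 10% serial share, the bound never reaches 10x however many workers are added; only a process with essentially no serial portion scales linearly, and that is a property of the design, not of the datacenter it runs in.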

Where this notion breaks down, in other words, is that most of the broken designs dealt with by today's IT departments do not permit efficient (preferably linear) scaling, since they were never designed for it in the first place.  IT installations are full of special mail servers, one-off clustered database servers, and weird dependencies on bizarre hardware and software that cannot easily be moved to commodity or open-source tools (the actual basis of "the cloud", Microsoft ads notwithstanding).  Massive, bone-shattering amounts of pain are lurking in the shadows for anybody who wants to migrate their current core IT infrastructure into "the cloud"; for those who do not believe this, consider your current "must-support" IT infrastructure, then imagine your CIO walking up to you and saying "We need to do what we're doing now, only ten times more of it.  All of it.  … By next week."  The simple truth is that while you might be able to teleport your entire infrastructure into somebody else's datacenter - with all the logistical and security problems that implies, for a net gain of "nothing" - your environment will almost certainly not scale to that extent.  If it is like most installations, it is very probably scaled past its original design parameters right now, with kludges, hacks, bubblegum, and baling wire holding things together.  To scale it efficiently to some arbitrary extent, then, would require going all the way back to the drawing board and making massive changes, with all the cost and disruption that implies.  To design it so that not only could you scale it up to any arbitrary extent now, but also to any arbitrary extent in the future as well, without redesign?  That cost would be immense.  The time required would be daunting.  There would not be "disruptions" of current IT services so much as "disintegrations" of them.  And, worst of all…  it would require lots of those malodorous nerds with weird personal habits.
And they would have to be listened to.  And they would have to be good at what they do, which means that they would want lots of money.

It is precisely that sort of design that is necessary to take advantage of "the cloud", and that only exists in the IT infrastructure of about two companies in the world.  (Google and Amazon, hey, how about that!)  Most IT departments are still dealing with the aftermath of bad decisions made over the last decade by a constantly-churning personnel, management, and vendor pool.  And so, we see one important reason why the promise of "the cloud" is largely a mirage; little businesses see giants like Google and Amazon lumbering about the landscape, giving wonderful presentations on their awesome tools for solving problems like "Download the entire Web and index it", and wonder how they can get them some of that, how they can be Big Businesses too.  They hear buzz about how Google and Amazon are structured to permit this colossal growth, and think "Wow!  It must be their cloudiness!  Where can I get some cloud?"  What these people do not take away from these presentations - and they should - is the understanding that it took man-centuries of dedicated engineering effort by really smart people to carve up their particular business problems and processes in the right way such that those problems were susceptible to infinite horizontal scale-out of the type promised by cloud computing, and then to build and test the tools that allowed them to do that.  They actually paid smelly, weird nerds lots of money to sit down and think about how to make that happen - with all of the expense and uncertainty that implies, since there is no guarantee that anybody can design something like that successfully - instead of buying a million-dollar product from a snickering vendor and a hundred-million-dollar ad campaign to push the brand.  It is the personnel, in other words, that made Google and Amazon, and precisely those personnel that everybody else wants to, ahem, "right-size" and overwork.

Which brings us to the point:  due to the immense cost of "doing things right" from the cloud perspective, it will be several years before sysadmins experience any problems in the job market due to migrations to the cloud.  Any way you slice the "cloud migration" scenario, it plays out the same way:  somebody has to be in the sewers of the business with a pipe wrench until the migration is done, and because most of these migration projects are doomed to failure (as are most projects in IT), most of the perceived benefits - namely, the ability to cut in-house IT support personnel to the absolute minimum - will never materialize.  These migrations will, however, waste lots of money and time, and will become part of that vast interlocking dependency chain, produced by historical bad decisions made by a churning personnel, management, and vendor pool, that tomorrow's system administrators will get to deal with.