Sunday, July 8, 2012

Electrical and cloud outages: Is it time to bring both on premise?

Amazon experienced an outage that affected a number of companies that rely on its cloud service. The company informed its users that the service went down because of a power outage, stating:

"On June 29, 2012 at about 8:33 PM PDT, one of the Availability Zones (AZ) in our US-EAST-1 Region experienced a power issue. While we were able to restore access to a vast majority of RDS DB Instances that were impacted by this event, some Single-AZ DB Instances in the affected AZ experienced storage inconsistency issues and access could not be restored despite our recovery efforts. These affected DB Instances have been moved into the “failed” state."

This notice was actually taken from CodeGuard (a start-up that takes snapshots of websites, enabling owners to undo unwanted changes), which was one of the companies affected by the outage.

As can be expected, many will use this as an opportunity to illustrate the danger of moving from on premise to the cloud. A parallel argument would be to highlight the dangers of drawing electricity from the central grid. One could argue that we are more reliant on power than on computing, so why not bring electricity "back" on premise? This is an absurd argument, but that is exactly the point. Companies, as pointed out by Nicholas Carr in The Big Switch, used to produce their own electricity, but eventually moved to rely on the grid for power. Today hardly anyone produces their own power; instead, companies keep backup generators in place to provide power should the grid go down. And that's the right question to ask: why was there inadequate backup power at Amazon? In other words, society has decided to live with the fact that electricity is delivered centrally, but has built in controls to manage the issues that may arise.

Instead of viewing this as a black mark against cloud computing, it is important to view this discussion in the context of risk. Charles Babcock of InformationWeek published a good article on the reaction to the outage. He noted that some are leaving AWS in response to it. Specifically, one online dating service is moving to a hosted solution, away from the cloud. However, he also mentions Okta (an identity management service), which was unaffected by the outage because they designed their application to be fault tolerant.

In other words, companies need to focus on whether the benefits of cloud computing outweigh its risks. The cloud provides pay-as-you-go computing, giving companies that have uneven workloads the ability to buy compute resources when they need them. It also gives start-ups, like CodeGuard, a chance to get their offerings into the market. Their follow-up posts on the outage show that they were able to get back online and that they are sticking with Amazon. And this should not be a surprise to anyone. Technology start-ups can leverage the pay-as-you-go model of cloud computing to conserve their capital and instead focus on getting their offering out. For example, the founder of Animoto points out that they went from 50 to 80 compute instances to 3,500 instances over three days (they were signing up 25,000 new users per hour at the peak) when their app went viral. So companies will hopefully use the cloud outage to highlight the need for good design and appropriate controls instead of as an excuse to stick to the status quo of on-premise computing.
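
To make the pay-as-you-go point concrete, here is a rough back-of-the-envelope sketch in Python. The instance counts loosely follow the Animoto example, but everything else is an assumption for illustration only: the price per instance-hour is made up, the spike is treated as a flat three days at 3,500 instances, and the "sized for peak" alternative is priced at the same hourly rate, which real on-premise hardware would not be.

```python
# Illustrative only: the price and the demand curve below are assumptions,
# not figures from Amazon or Animoto.
PRICE_CENTS_PER_INSTANCE_HOUR = 10  # hypothetical price, in cents


def pay_as_you_go_cost(instances_per_hour, price=PRICE_CENTS_PER_INSTANCE_HOUR):
    """Cost (in cents) when you pay only for the instances running each hour."""
    return sum(instances_per_hour) * price


def peak_sized_cost(instances_per_hour, price=PRICE_CENTS_PER_INSTANCE_HOUR):
    """Cost (in cents) when capacity for the busiest hour runs all the time."""
    return max(instances_per_hour) * len(instances_per_hour) * price


# A 30-day month: 27 days at a baseline of 50 instances,
# then a simplified three-day spike at 3,500 instances.
demand = [50] * (27 * 24) + [3500] * (3 * 24)

on_demand = pay_as_you_go_cost(demand)
peak_sized = peak_sized_cost(demand)

print(f"pay-as-you-go: ${on_demand / 100:,.2f}")
print(f"sized for peak: ${peak_sized / 100:,.2f}")
```

Under these assumed numbers, keeping enough capacity running all month to survive the spike costs several times more than paying only for the hours actually used. The exact ratio depends entirely on the assumptions, but the shape of the argument holds for any workload that is mostly quiet with occasional peaks.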
