Friday, April 12, 2013

Cloud Computing Dangers: Just Forget About It

This is the final posting (Part 10) of the Case Study in Cloud Computing Dangers.

By the end of May 15, 2012, Day 7 of our outgoing mail Denial-of-Service on Office 365, everything had returned to normal. I was thrilled to find my VA email address flooded with test messages from the preceding week.

Relief. And then, nothing.

We received no update from Microsoft, no communication from senderbase.org/Cisco, no satisfactory closure of any help desk tickets. Nothing, except for business as usual.

Have you ever gone out to a restaurant, ordered, and then waited 30 minutes for even a glass of water to come out? You may have looked around and seen staff working the room, beverages being delivered, food exiting the kitchen, but still you sit there with nothing. You then get up to go find someone, anyone, who will help you, and you may even complain to the manager about the service. But then, when you get back to the table, everything is sitting there waiting for you and the rest of your family is happily eating. Taking your seat, you wonder if you just looked like a schmuck.

You did. But, that doesn't mean that you were wrong. Only that no one was willing to own up to the fact that you were right.

That's kind of what it felt like once our Office 365 service returned to normal. The wizard behind the curtain had fixed our ailment and then simply moved on without another word. We were just expected to go on with our jobs, simply content that our benevolent service provider even took the time to listen.

The lack of communication raises one simple question: Was anything actually fixed?

Like the magic purveyed by that ruler of Oz, I'm not convinced that Microsoft actually did (or even had the power to do) anything at all. One moment, the SenderBase Reputation Score (SBRS) for the Outlook.com gateway we were on was around 0.0. The next, a more respectable 2.8.

Did Cisco simply go in and reset the reputation? Seriously…7 days of suffering through this whole event, and all Cisco had to do was reset the score? At the end of the day, all that had to happen was the IT equivalent of flipping a switch. That was it. A high school intern could have fixed the problem if he or she had just had the access.

If only one theme has threaded itself through this blog series, it's that of control. Organizations considering a move to cloud services need to be firmly aware that, no matter how simple the solution, their usually able IT personnel have absolutely no control to do anything. Nothing.

But, it's even worse than that. Not only do organizations have no control over the quality of service that they receive from cloud service providers, they have limited ability to be heard.

A long time ago, in a lifetime far, far, away (before the consumer Internet), I was sitting at the freshman convocation for my beloved alma mater in Cambridge. With the sun beating down on us and the Charles River silently flowing by, President Charles Vest asked all of the high school valedictorians to raise their hands. Then, after asking that the hands stay up, he asked all of the salutatorians to do the same. He then encouraged us to look around and note that around two-thirds of the members of our freshman class had their hands raised. His final message was simple, "You're no longer alone at the top."

Corporate executives, especially those from small-to-medium-sized companies, need to ask themselves how comfortable they are being just one of the nameless faces in a crowd of executives. That's what they become when they choose to subscribe to cloud services for their back office, sales, travel, storage, virtual servers, etc.

When I started this blog series nearly 6 months ago, I stated, "It's not hard to understand why business executives are completely intoxicated by cloud computing." The potential cost savings are really extraordinary. But, the cost savings that an organization can achieve through cloud computing don't come cheap.

That's not to say that I see no benefit in cloud computing. As an entrepreneur, I recognize that cloud computing empowers organizations to focus on their core business services and ultimately outsource non-core functions to external providers. That fact alone allows them to not only increase productivity, but to focus that productivity on the areas that support the business rather than losing it to overhead. That's the true power of cloud computing. But, executives should acknowledge that subscribing to cloud services means that they must shift their expectations and be honest about how to limit the damage from outages.

There are several areas where I see opportunities for innovation. They include:
  • Collaborative Forums. I suggest that cloud service providers should participate in some sort of joint forum for discussing and resolving service disruptions. In our case, I would suggest that an "email providers" forum would be much more effective at providing good customer service than having one forum for Microsoft and one for Cisco. It also would demonstrate a level of recognition that multiple nested providers may need to respond to a given issue. This type of innovation could have a great positive impact on service quality and subscriber satisfaction.
  • Managed Cloud Service Providers. Rather than force organizations to subscribe to individual services that best suit individual needs, I suggest that managed service providers that specialize in cloud services could charge a reasonable premium to manage all of an organization's cloud services. Then, when outages occur, that single provider could accept responsibility to identify and work with all of the associated providers.
  • Reform Reputation Scoring. My partner in our incident response, Jason Shropshire, equated our situation to that of consumers dealing with credit rating agencies in his blog posting on the subject.

    [In days past], consumers had little recourse to resolve problem credit scores, and it was difficult to even find out what was in their credit report. The scoring mechanisms were unpublished, and were trusted implicitly by those issuing credit. If a mistake was made with a credit score, it was solely up to the consumer to resolve with very little recourse or information at their disposal.

    Cisco has no responsibility or accountability over its reputation scoring algorithms. Nor is it a neutral party over the distribution of network traffic on the Internet. The industry needs to come up with a better solution than to put so much power in the hands of one for-profit organization (the non-profit status of senderbase.org aside).

For organizations, I only suggest that they take care to evaluate service providers from the perspective of tolerance to failure. Our example illustrates that services could be down for as long as a week. Use that knowledge wisely when considering operations, contingency, and disaster recovery plans.
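
To put a week-long outage in planning terms, here is a rough back-of-the-envelope sketch. It converts our 7 days of downtime into an annual availability figure and compares it against a downtime budget. The 99.9% target is only an assumed value for illustration, not a number taken from our agreement with any provider, and the function names are my own.

```python
# Back-of-the-envelope availability math for a single long outage.
# ASSUMPTION: the 99.9% target below is illustrative only, not a
# figure from any specific provider's service agreement.

HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def availability(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Fraction of the period during which the service was up."""
    return 1.0 - downtime_hours / period_hours


def allowed_downtime_hours(target, period_hours=HOURS_PER_YEAR):
    """Hours of downtime permitted by an availability target (e.g. 0.999)."""
    return (1.0 - target) * period_hours


week_outage_hours = 7 * 24                # the 7-day outage described in this series
observed = availability(week_outage_hours)
budget = allowed_downtime_hours(0.999)    # assumed target, see note above
overrun = week_outage_hours / budget      # how many times over the budget we went

print(f"Annual availability after one week down: {observed:.2%}")
print(f"Downtime budget at the assumed target:   {budget:.2f} hours/year")
print(f"A week-long outage uses that budget {overrun:.1f} times over")
```

Under those assumptions, a single week of lost outgoing mail drops the year to roughly 98% availability, which is the kind of number worth carrying into contingency and disaster recovery planning.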

Then again, perhaps ours is a rare case. But, as I implied in my first posting, we may never know how common or rare the case is, since I suspect many go unreported. If and when it does happen to you, remember to just forget about it when it's over. Your service provider most certainly will.