Tuesday, February 12, 2013

Cloud Computing Dangers: A Case of the Mondays

This posting is Part 8 of the Case Study in Cloud Computing Dangers.

We started the business day on May 14, 2012 finally able to send email to the primary contractor on our VA project, but not to the VA email accounts. This development did not mean that Day 5 marked the end of the outgoing mail Denial-of-Service between our Office 365 cloud service and virtually any mail gateway built on Cisco devices, or on any other devices that relied on senderbase.org for SPAM reputation scoring. The organization had simply been shamed (either from within or without) into lowering its SPAM blocking threshold to allow through any email rated Neutral. Not only was the organization the victim of being unable to receive legitimate email from business partners and clients, it was forced into making a business decision that would allow more malicious messages to pass through its gateway. It was not a good sign.
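To make the trade-off concrete, here is a minimal sketch of reputation-based gateway filtering, assuming Cisco-style SenderBase Reputation Scores (SBRS) in the -10 to +10 range; the function name, threshold values, and sample score below are illustrative, not the actual configuration of any gateway in this story.

    # A minimal sketch of reputation-based filtering at a mail gateway.
    # Assumes Cisco-style SenderBase Reputation Scores (SBRS), which run
    # from -10 (worst) to +10 (best); all values here are illustrative.

    def gateway_action(sbrs: float, block_below: float) -> str:
        """Accept or reject a sending host based on its reputation score."""
        return "REJECT" if sbrs < block_below else "ACCEPT"

    NEUTRAL_SENDER = -0.5  # a hypothetical score in the "Neutral" band

    # Before: a strict threshold rejects a Neutral-rated sender outright.
    print(gateway_action(NEUTRAL_SENDER, block_below=0.0))   # REJECT

    # After lowering the threshold, the same Neutral sender gets through,
    # along with any malicious mail that also happens to score Neutral.
    print(gateway_action(NEUTRAL_SENDER, block_below=-1.0))  # ACCEPT

Lowering the threshold widens the accept window for every sender that scores Neutral, which is precisely the trade-off the organization was forced into.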

Friday, February 8, 2013

Cloud Computing Dangers: Blame When Things Go Wrong

This posting is Part 7 of the Case Study in Cloud Computing Dangers.

When technology problems occur, IT folks typically focus first on finding a technical solution. It's in our nature, because solving technical problems is what we've been trained to do. Waking up on Sunday, May 13 to find ourselves still suffering from an outgoing mail Denial-of-Service on our Office 365 business platform, we were in disbelief that the technical problem still had not been solved. Our challenge was to move past our confidence in understanding the problem's technical nature and to recognize the broader issue we were falling victim to: in a massively distributed communications system, there was no way to assign responsibility.

Friday, February 1, 2013

Cloud Computing Dangers: False Hope

This posting is Part 6 of the Case Study in Cloud Computing Dangers.

On Saturday, May 12, as my company continued to suffer from an Office 365 outgoing mail Denial-of-Service, I woke up to an email from a colleague at the primary contractor that we were unable to communicate with. A test message I had sent at 3:33 PM on Thursday, May 10 had been received at 2:24 AM Saturday morning. Despite a transit time of nearly 35 hours, I was elated to discover that a message had gotten through. Perhaps Microsoft was really true to its word, and we could expect to have the problem resolved soon so that we could move on with our lives. Or perhaps it was just a fluke, since I hadn't seen any other messages get through.
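The transit time is easy to verify from the timestamps above; here is a quick Python check (dates taken from the message itself):

    from datetime import datetime

    sent = datetime(2012, 5, 10, 15, 33)     # Thursday, May 10, 3:33 PM
    received = datetime(2012, 5, 12, 2, 24)  # Saturday, May 12, 2:24 AM

    print(received - sent)  # 1 day, 10:51:00 -- nearly 35 hours in transit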

Saturday, January 26, 2013

Cloud Computing Dangers: Seeking Problem Clarity

This posting is Part 5 of the Case Study in Cloud Computing Dangers.

After establishing the legitimacy of our outgoing mail Denial-of-Service on the morning of May 11, we expected Microsoft to resolve the issue by the end of the day. Since the problem was tied to some SPAM condition on the Office 365 outgoing mail gateways, Microsoft should have been able to rally its resources, quickly address the technical problem, and let us re-establish communications with our largest customers. We were overly optimistic.

Wednesday, January 16, 2013

Cloud Computing Dangers: Pointing the Finger

This posting is Part 4 of the Case Study in Cloud Computing Dangers.

All businesses face significant IT challenges, but those challenges are far more daunting for small businesses with limited resources to tackle them. Cloud computing in any form, be it Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), Software-as-a-Service (SaaS), or WhateverYouImagine-as-a-Service (WYIaaS), promises to level the playing field by providing small businesses a level of enterprise support that they couldn't possibly retain individually, all for a "low" regular subscription fee (at least lower than the alternative CapEx/OpEx values). With the level of support that a small business receives from a large organization such as Microsoft, the business should reasonably expect a much more available and resilient resource than it could ever provide for itself. Most business executives can easily see the benefits and are generally eager to sign up.

As someone who has run an IT operations group, I can tell you that IT people immediately blame the user when the user reports a problem. Perhaps it's driven by pride in the environment they maintain, or by some sense of self-preservation. For whatever reason, the user is wrong until proven right. You can see the results of this in large business help desks that immediately try to pass you off to an online "knowledge base" or threaten you by offering to "take away your computer" to examine the problem more deeply. If the problem is an outlier, then it is more likely related to the user than to the system or application. That culture of denial is only reinforced in a cloud environment, where the service provider knows how to run the system far better than any individual user; if the provider doesn't detect a problem, then there is no problem.

Sunday, January 13, 2013

Cloud Computing Dangers: Establishing Responsibility

This posting is Part 3 of the Case Study in Cloud Computing Dangers.

At around 4:30 PM on Wednesday, May 9, I was preparing to make the trek from my VA site near DC's Union Station to my home in Fairfax City, VA. For anyone who isn't well versed in the journey, understand that it is something you really need to psyche yourself up for. It wasn't uncommon for me to lose 90 minutes of my life making the one-way trip of just 17 miles. Doing the math, I averaged a little over 11 miles per hour, covering a mile in roughly five minutes. Knowing that you will never get that time back, that most of the time you'll be staring at dozens or hundreds of taillights, that you could probably cover the distance faster by bike if you didn't have to wear a suit, is an excruciating fall from innocence that I would promote as the contemporary definition of madness. You have to develop a dissonant optimism to keep from just barreling through a crowded street in a moment of temporary relief. "Maybe it won't be that bad today." "My kids will thank me some day for working so hard." "I'll be able to make soccer practice…no problem."

Jason and I both knew how critical our email communications were to maintaining business continuity. As a small business with fewer than a dozen revenue-producing employees, our position was tenuous and depended on the perception of always being present, available, and responsive. This problem had cut off our communications with our two largest revenue generators, representing over half of our active business, and with a contractor with which we were working on several proposals. We had to solve the problem, and fast. It seemed obvious to me that I should just break out my iPad and troubleshoot while navigating DC/Northern VA traffic. When Jason realized what I was doing, he simply cautioned, "Please don't kill yourself over this." At least I was able to justify not riding a bike to the office for another day.

Friday, October 26, 2012

Cloud Computing Dangers: Incident Detection

This posting is Part 2 of the Case Study in Cloud Computing Dangers.

At around noon U.S. Eastern Daylight Time (EDT) on Wednesday, May 9, I forwarded a calendar invite from my corporate account to my VA address. The message included some important attachments that originated from a prime-contractor colleague. I also responded to several email messages from the same colleague, sending mail to both his corporate and his VA accounts. Everything that had seemed to work fine a few minutes earlier was about to blow up in my face.