Friday, March 1, 2013

Cloud Computing Dangers: Stand By and Wait

This posting is Part 9 of the Case Study in Cloud Computing Dangers.

It took six days after I detected an outgoing mail Denial-of-Service for Microsoft to publish a public admission that a problem truly existed. In today's fast-paced IT world, taking six days just to acknowledge a problem is like waiting to be ferried across the river Styx. But I doubt that Microsoft was working on its obituary.

Cause

Currently, Office 365 outbound email servers have a SenderBase reputation of neutral and a score of 0. As a result, any policy set to throttle or reject mail from a server rated neutral, or with a score of less than or equal to 0, may impact delivery of mail from Office 365 customers.

Microsoft currently believes this is due to an instance where a large number of bulk mail messages were sent to a user via a server that contributes reputation information. This mail did not get classified as spam by us and the sender is reputable, but the volumes, combined with Cisco's rating system, have temporarily reduced our email servers' reputation in their SenderBase service. According to Cisco, it will take time and additional mail flowing through their system to retrain it and restore our email servers' reputation.
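To make the failure mode concrete, here is a minimal sketch of the kind of reputation-threshold policy the advisory describes. The `SenderReputation` class, the score range, and the threshold value are my own illustrative assumptions, not Cisco's actual SenderBase implementation; the point is only that a sender sitting exactly at "neutral, score 0" falls on the throttle/reject side of such a policy.

```python
# Hypothetical sketch of a reputation-based gateway policy (not Cisco's code).
from dataclasses import dataclass

@dataclass
class SenderReputation:
    category: str   # e.g. "good", "neutral", "poor" (assumed labels)
    score: float    # SenderBase-style score, roughly -10 (worst) to +10 (best)

def gateway_action(rep: SenderReputation, threshold: float = 0.0) -> str:
    """Decide what an inbound gateway does with mail from this sender.

    Mirrors the policy in the advisory above: mail from a server rated
    neutral, or with a score at or below the threshold, is throttled or
    rejected rather than accepted outright.
    """
    if rep.category == "poor" or rep.score < threshold:
        return "reject"
    if rep.category == "neutral" or rep.score <= threshold:
        return "throttle"
    return "accept"

# Office 365's outbound servers at the time: neutral category, score 0 --
# exactly on the boundary, so this policy throttles their mail.
print(gateway_action(SenderReputation("neutral", 0.0)))  # throttle
print(gateway_action(SenderReputation("good", 3.5)))     # accept
```

Note that under this scheme there is nothing the blocked sender can do locally: the decision is made entirely from a third party's reputation feed, which is exactly why the outage dragged on until the feed itself was retrained.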

Tuesday, February 12, 2013

Cloud Computing Dangers: A Case of the Mondays

This posting is Part 8 of the Case Study in Cloud Computing Dangers.

We started the business day on May 14, 2012, finally able to send email to the primary contractor on our VA project, but not to the VA email accounts. This development was not an indication that Day 5 represented the end of our outgoing mail Denial-of-Service between our Office 365 cloud service and just about any mail gateway using Cisco devices, or any other devices that relied on senderbase.org for SPAM reputation scoring. The contractor's organization had simply been shamed (from within or without) into lowering its SPAM blocking threshold to allow through any email rated Neutral. Not only was the organization the victim of being unable to receive legitimate email from business partners and clients, it was forced into making a business decision that would allow more malicious messages to pass through its gateway. It was not a good sign.

Friday, February 8, 2013

Cloud Computing Dangers: Blame When Things Go Wrong

This posting is Part 7 of the Case Study in Cloud Computing Dangers.

When technology problems occur, IT folks will typically focus first on finding a technical solution. It's in our nature because solving technical problems is what we've been trained to do. Waking up on Sunday, May 13 to find ourselves still suffering from an outgoing mail Denial-of-Service on our Office 365 business platform, we were in disbelief that the technical problem still had not been solved. Our challenge was to move past our confidence in understanding the problem's technical nature and to recognize that we were falling victim to a broader issue of being unable to assign responsibility in a massively distributed communications system.

Friday, February 1, 2013

Cloud Computing Dangers: False Hope


This posting is Part 6 of the Case Study in Cloud Computing Dangers.

On Saturday, May 12, as my company continued to suffer from an Office 365 outgoing mail Denial-of-Service, I woke up to an email a colleague sent me from the primary contractor that we were unable to communicate with. A test message that I had sent at 3:33 PM on Thursday, May 10 had been received at 2:24 AM Saturday morning. Despite a transit time of just under 35 hours, I was elated to discover that a message had gotten through. Perhaps Microsoft was really true to its word and we could expect to have the problem resolved soon so that we could move on with our lives. Or perhaps it was just a fluke, since I hadn't seen any other messages get through.

Saturday, January 26, 2013

Cloud Computing Dangers: Seeking Problem Clarity


This posting is Part 5 of the Case Study in Cloud Computing Dangers.

After establishing the legitimacy of our outgoing mail Denial-of-Service on the morning of May 11, we expected Microsoft to resolve the issue by the end of the day. Since it was related to some SPAM condition on the Office 365 outgoing mail gateways, Microsoft should have been able to rally its resources, quickly address the technical problem, and enable us to re-establish communications with our largest customers. We were overly optimistic.

Wednesday, January 16, 2013

Cloud Computing Dangers: Pointing the Finger

This posting is Part 4 of the Case Study in Cloud Computing Dangers.

All businesses face significant IT challenges, but those challenges loom far larger for small businesses with limited resources with which to tackle them. Cloud computing in any form, be it Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), Software-as-a-Service (SaaS), or WhateverYouImagine-as-a-Service (WYIaaS), promises to level the playing field by providing small businesses a level of enterprise support that they couldn't possibly retain individually, all at a "low" regular subscription fee (at least lower than the alternative CapEx/OpEx values). With the level of support that a small business receives from a large organization such as Microsoft, the business should reasonably expect a much more available and resilient resource than it could provide for itself. Most business executives can easily see the benefits and are generally eager to sign up.

As someone who has run an IT operations group, I can tell you that IT people immediately blame the user when the user reports a problem. Perhaps it's driven by pride in the environment that they maintain or by some sense of self-preservation. For whatever reason, the user is wrong until proven right. You can see the results of this in large business help desks that immediately try to pass you off to an online "knowledge base" or threaten you by offering to "take away your computer" to examine the problem more deeply. If the problem is an outlier, then it is more likely related to the user than to the system or application. That culture of denial is amplified in a cloud environment, where the service provider knows how to run the system far better than any individual user, so if it doesn't detect a problem, then there is no problem.

Sunday, January 13, 2013

Cloud Computing Dangers: Establishing Responsibility

This posting is Part 3 of the Case Study in Cloud Computing Dangers.

At around 4:30 PM on Wednesday, May 9, I was preparing to make the trek from my VA site location near DC's Union Station to my home in Fairfax City, VA. For anyone who isn't well versed in the journey, understand that it is something you really need to psych yourself up for. It wasn't uncommon for me to lose 90 minutes of my life making just the one-way trip over the course of just 17 miles. Doing the math, I could travel at just a little over 11 miles per hour, covering a mile in a bit over 5 minutes. Knowing that you will never get that time back, that most of the time you'll be staring at dozens or hundreds of taillights, that you could probably cover the distance faster by bike if you didn't have to wear a suit, is an excruciating fall from innocence that I would promote as the contemporary definition of madness. You have to develop a dissonant optimism to keep from just barreling through a crowded street in a moment of temporary relief. "Maybe it won't be that bad today." "My kids will thank me some day for working so hard." "I'll be able to make soccer practice…no problem."

Jason and I both knew how critical our email communications were for maintaining business continuity. As a small business with fewer than a dozen revenue-producing employees, our position was tenuous and depended on the perception of always being present, available, and responsive. This problem had cut off our communications with our two largest revenue generators, representing over half of our active business, and with a contractor with which we were working on several proposals. We had to solve the problem, and fast. It seemed obvious to me that I should just break out my iPad and troubleshoot while navigating DC/Northern VA traffic. When Jason realized what I was doing, he simply cautioned, "Please don't kill yourself over this." At least I was able to justify not riding a bike to the office for another day.