Monday, September 30, 2013

Porter's Outage: Dealing with an outsourcer's system failure

A couple of weeks ago, I got caught in the Porter Airlines network outage. I was heading back from a meeting in Ottawa and we had managed to get to the airport on time, only to find that we could not board our flight because the "system was down". Although I was scrambling to figure out how to get back to Toronto, my colleague had it much worse, as she had a connecting flight back to Windsor! For me it was one of those "check out" moments. You know when you are at the grocery store and the guy ahead of you is haggling with the attendant, and you think to yourself: "Should I wait for this situation to resolve itself or move to the next line?" As the Porter folks informed us that they would give us a refund, I decided to book the next Air Canada flight back to Pearson (instead of the Billy Bishop airport, where I had parked). Although I was supposed to fly out at 9:20 PM, they managed to put me on the 7:30 flight. A number of us at the back were "refugees" from the Porter flight. It is tempting to get exasperated and complain in these situations, but one of my fellow refugees pointed out that this is essentially a "first world problem": we only ended up waiting about an hour and we had all the amenities (food, water, shelter, etc.) waiting for us when we got back to Toronto!
As reported in the Toronto Star, the outage was caused by a failure at Navitaire, the company to which Porter outsourced its "reservation and flight planning system". It turns out that other airlines, such as AirTran, were also affected by the outage.

Surprisingly, this is not the first time that Navitaire has experienced an outage: the company also had one in 2010 that affected Virgin Blue airlines. As would be expected, Virgin sued Navitaire. The case was settled out of court. As The Register noted in its commentary on the 2010 outage:

"It is becoming more and more obvious that Navitaire's business continuance and disaster recovery provisions failed completely in this outage. There should have been standby systems ready to take on the load of any failed system or system component, but there weren't any. That is a blunder of the first magnitude by whoever designed, implemented and ran the system."

Well, it seems that the "blunder of the first magnitude" has repeated itself only three years later.

As you know from my previous posts, I have written about the cloud from a CPA perspective, so the logical question is: where is the SysTrust or other third-party review of their IT controls to ensure that this type of thing doesn't happen?

Well, I could not find it. The brochure for Navitaire's services makes no mention of a third-party audit report. However, it is possible (although unlikely, due to the cost) that Navitaire allows its customers to send in their own auditors.

Regardless, the incident illustrates why customers who outsource their operations to third parties need to obtain an assurance report (e.g. Trust Services) confirming that such controls (e.g. disaster recovery) are in place.

To Porter's credit, they gave me a refund and also a free flight to anywhere they fly. So from their end, they did their best to make amends for the fiasco.

