Tuesday, October 4, 2022

Fiona’s Fury: Flashback to Summer’s Great Rogers Outage (Part 1)

Canadians continue to pick up the pieces after tropical storm Fiona battered the Maritime provinces. Although the full damage has yet to be estimated, “Nova Scotia Premier Tim Houston announced over C$40 million ($29.1 million) in support to help those who were impacted by Fiona” (link). In terms of cellphone outages, CBC reported that “there are still areas of the province without cellphone service Monday although companies declined to say exactly how many customers have been affected.”

 

The Canadian Radio-television and Telecommunications Commission (CRTC) has asked for estimates of how many people were affected by the outage, but the telecom companies are reluctant to share this information. As CBC reported: “Bell and Telus asked for some of their submissions to be redacted, while Eastlink and Rogers demanded their entire reports be kept confidential.”


Photo by Pixabay: link

 

Rogers Outage in Review: What happened last summer?

The outage that hit the Maritimes is reminiscent of the situation that unfolded over the summer. The Rogers outage in July 2022, however, was not limited to the East Coast; it affected the entire country. When Rogers was asked to explain what happened, it appeared to take a more conciliatory tone:

“Rogers Communications Canada Inc. (“Rogers”) is in receipt of a letter containing Requests for Information (“RFIs”) from the Canadian Radio-television and Telecommunications Commission (“CRTC” or the “Commission”), dated July 12, 2022, concerning the above-mentioned subject. Attached, please find our Response to that letter… At the outset, Rogers appreciates the opportunity to explain to the Commission, the Government of Canada and all Canadians what transpired on July 8th, 2022. The network outage experienced by Rogers was simply not acceptable. We failed in our commitment to be Canada’s most reliable network. We know how much our customers rely on our networks and we sincerely apologize.” [Emphasis added]

 

Though the document was redacted, it did provide some background as to what happened. In this post, we will take a look at the outage itself. In the next post, we will look at the lessons learned.

 

Cause of the outage

Rogers explained the cause of the outage as follows:

Given the magnitude of the outage, it appears that Rogers had to be more forthcoming about what happened: “Maintenance and update windows always take place in the very early morning hours when network traffic is at its quietest. At 4:43AM EDT, a specific coding was introduced in our Distribution Routers which triggered the failure of the Rogers IP core network starting at 4:45AM… The configuration change deleted a routing filter and allowed for all possible routes to the Internet to pass through the routers. As a result, the routers immediately began propagating abnormally high volumes of routes throughout the core network. Certain network routing equipment became flooded, exceeded their capacity levels and were then unable to route traffic, causing the common core network to stop processing traffic. As a result, the Rogers network lost connectivity to the Internet for all incoming and outgoing traffic for both the wireless and wireline networks for our consumer and business customers.” [Emphasis added]

In other words, the change inadvertently produced a pattern similar to a denial-of-service attack, where the network shut down because it became overwhelmed. In this case the routers were flooded with routes rather than with malicious traffic.
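To make the mechanism concrete, here is a minimal sketch of how deleting a routing filter can flood a downstream router. It is a toy model, not a description of Rogers’ actual equipment or configuration: the `CoreRouter` class, the `advertise` function, the route counts and the capacity figure are all assumptions chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CoreRouter:
    """Toy core router with a fixed routing-table capacity (hypothetical)."""
    capacity: int
    routes: set = field(default_factory=set)
    failed: bool = False

    def receive(self, advertised):
        # Accept the advertised routes; if the table grows past capacity,
        # mark the router as failed (it can no longer route traffic).
        self.routes.update(advertised)
        if len(self.routes) > self.capacity:
            self.failed = True


def advertise(all_routes, route_filter=None):
    """Return the routes a distribution router passes on to the core.

    With a filter in place, only routes the filter allows are propagated.
    With the filter deleted (None), every known route is propagated.
    """
    if route_filter is None:
        return set(all_routes)
    return {route for route in all_routes if route_filter(route)}


def is_internal(route):
    # Hypothetical filter rule: only the first 1,000 routes are allowed through.
    return int(route.split("-")[1]) < 1_000


# Hypothetical numbers, chosen only to show the effect.
known_routes = [f"route-{i}" for i in range(90_000)]

healthy = CoreRouter(capacity=10_000)
healthy.receive(advertise(known_routes, route_filter=is_internal))

flooded = CoreRouter(capacity=10_000)
flooded.receive(advertise(known_routes, route_filter=None))

print(healthy.failed, len(healthy.routes))  # False 1000
print(flooded.failed, len(flooded.routes))  # True 90000
```

The only difference between the two runs is whether the filter is present, which mirrors the point in the Rogers response: a single configuration change was enough to push the routing equipment past its capacity.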

They also go on to explain that the company “uses a common core network, essentially one IP network infrastructure, that supports all wireless, wireline and enterprise services. The common core is the brain of the network that receives, processes, transmits and connects all Internet, voice, data and TV traffic for our customers… Certain network routing equipment became flooded, exceeded their memory and processing capacity and were then unable to route and process traffic, causing the common core network to shut down.” The implication is that the common core network became a single point of failure.
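The single-point-of-failure idea can be sketched as a small dependency check. The `common_core` mapping follows the description quoted above, where every service relies on the same core. The `segmented` mapping is a purely hypothetical alternative included for contrast; nothing in the Rogers response says the network is or will be built that way, and the component names are made up.

```python
def available_services(dependencies, failed_components):
    """Return the services whose dependencies are all still running."""
    failed = set(failed_components)
    return sorted(
        service
        for service, needs in dependencies.items()
        if not (set(needs) & failed)
    )


# Common core: every service depends on the same component.
common_core = {
    "wireless": ["core"],
    "wireline": ["core"],
    "enterprise": ["core"],
}

# Hypothetical alternative for contrast: separate cores per service family.
segmented = {
    "wireless": ["wireless_core"],
    "wireline": ["wireline_core"],
    "enterprise": ["wireline_core"],
}

print(available_services(common_core, ["core"]))         # []
print(available_services(segmented, ["wireline_core"]))  # ['wireless']
```

When the one shared component fails, no service survives. In the hypothetical segmented layout, a failure in one core still leaves the services on the other core available.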

 

What was and was not impacted

With respect to Rogers Bank (yes, Rogers operates a bank):

“The impact to the Bank’s customers was minimal as the Bank services were available and the Bank’s customers were able to transact on their Rogers Bank credit cards. There was no interruption in the Bank’s core systems (credit card processing, Interactive Voice Response (“IVR”), Call Centre and customer self-serve mobile application) and these core systems remained available to the Bank’s customers. No critical Bank systems were impacted, and all daily processing was completed as required, including by the Bank’s statement printing vendor and its card personalization bureau which received their daily files and were processing them per standard service level agreements and procedures.”

 

It was a different story for those who relied on Rogers phone lines to process payments at their businesses, as Interac tweeted:

“There is a nationwide Rogers outage that encompasses all their business and consumer network services. This is impacting INTERAC Debit and INTERAC eTransfer. INTERAC Debit is currently unavailable online and at checkout.”

 

Beyond the millions who had no service, emergency communications were also impacted:

  • “Unfortunately, the outage of July 8th did impact 9-1-1 service across Rogers’ service area, to both wireline and wireless services.
  • Wireline impact: There were approximately [REDACTED] 9-1-1 calls placed successfully across Rogers’ network on July 8th. The typical daily average of total wireline 9-1-1 calls is [REDACTED] per day. Data is unavailable for unsuccessful wireline 9-1-1 calls. On July 9th, there were approximately [REDACTED] 9-1-1 calls placed successfully across Rogers’ network.
  • Wireless impact: As can be seen in table below, the outage similarly affected wireless 9-1-1. Total successful calls were [REDACTED] the average daily amount of about [REDACTED] 9-1-1 calls made from Rogers wireless devices.”

Rogers offered service outage credits

The key remedy offered was service credits, but Rogers maintained that these were not due to any breach of its service agreements:

“There was no breach of our service agreements with our retail customers. However, in order to address our customers’ disappointment with the outage, Rogers has already announced it will be crediting 5 days of service fees to its customers. This will be applied automatically to their next invoice.”
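As a rough illustration of what “5 days of service fees” might mean for an individual bill, here is a small sketch. Rogers’ response does not say how the credit is calculated, so the pro-rating method, the 30-day billing cycle and the $90 monthly fee below are all assumptions made only for the arithmetic.

```python
from decimal import Decimal, ROUND_HALF_UP


def outage_credit(monthly_fee, credit_days=5, days_in_cycle=30):
    """Pro-rate a monthly fee to a number of credited days.

    Assumption: the credit is the daily share of the monthly fee times
    the number of credited days, rounded to the nearest cent.
    """
    daily_fee = Decimal(monthly_fee) / Decimal(days_in_cycle)
    credit = daily_fee * Decimal(credit_days)
    return credit.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Hypothetical customer paying $90.00 per month.
print(outage_credit("90.00"))  # 15.00
```

Under these assumptions, the credit works out to one sixth of a month’s fees for a disruption that, for most customers, was concentrated on a single day.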

 

Cooperation with Bell and Telus

Despite the highly competitive nature of the business, it does appear that Rogers, Bell and Telus were coordinating with each other:

  • “On July 17th, 2015, the Canadian Telecom Resiliency Working Group (“CTRWG”), formerly called Canadian Telecom Emergency Preparedness Association, established reciprocal agreements between Rogers and Bell, and between Rogers and TELUS, to exchange alternate carrier SIM cards in support of Business Continuity.”
  • “As we stated in Rogers(CRTC)11July2022-1.xviii above, our Chief Technology and Information Officer reached out to his counterparts at Bell and TELUS early on July 8th. Assistance was offered by both Bell and TELUS. However, given the nature of the issue, Rogers rapidly assessed and concluded that it was not possible to make the necessary network changes to enable our wireless customers to move to their wireless networks.”
  • “Rogers, Bell and TELUS are presently assessing potential options and will report further findings and potential solutions per the creation of the Memorandum of Understanding that will be delivered in September 2022 to the Minister of ISED by CSTAC.”

In closing, the outage comes down to change management. The error was exacerbated by the industry-standard approach of using a single platform to provide the various telecommunication services. Rogers did offer service credits, but was careful to note that this was not due to a breach of its agreements. Finally, the industry does come together during a crisis, putting competitive differences aside.


In our next post, we’ll take a look at the lessons learned from this outage. Stay tuned!

Author: Malik Datardina, CPA, CA, CISA. Malik works at Auvenir as a GRC Strategist that is working to transform the engagement experience for accounting firms and their clients. The opinions expressed here do not necessarily represent UWCISA, UW, Auvenir (or its affiliates), CPA Canada or anyone else.
