So what was the cause of this fiasco?
According to CNN, the "BNY Mellon outage occurred after a SunGard accounting system it uses became 'corrupted' following an upgrade. A back-up also failed."
Normally, an incident of this magnitude subjects the responsible party to intense scrutiny over what went wrong. However, as I went through the timeline posted by the company, I found (reading between the lines) that they did a number of things right, such as:
- Incident Management Communication Plan: One aspect of incident management is communicating with the public and making them aware of the issue. As can be seen, the company posted estimates of when the system would be restored, as well as the source of the delay when it missed those estimates.
- Contingency planning: BNY Mellon was able to deliver Net Asset Values (NAVs) to clients "using a secondary and more manual process", without the production system. According to the WSJ, this involved "mobilizing more than 100 accountants to manually calculate the values of thousands of fund securities". The secondary procedure was not perfect: small differences were noted in fund valuations, and, as the WSJ reported, this could expose the bank to "potential liability from the resolution of trades that may have been made during the pricing fog".
- Effective Service Level Agreement (SLA) with SunGard: Based on the close coordination between the two companies, it appears that BNY Mellon was able to get SunGard to eventually rebuild the system "from scratch".
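To appreciate what "mobilizing more than 100 accountants" means, it helps to see the arithmetic they were performing by hand for each fund: NAV is the fund's total asset value minus liabilities, divided by shares outstanding. A minimal sketch of that calculation follows; the function name and all figures are hypothetical, not BNY Mellon's actual data.

```python
# Sketch of the per-fund NAV arithmetic done manually during the outage.
# All figures below are hypothetical illustrations.

def net_asset_value(security_values, liabilities, shares_outstanding):
    """NAV = (sum of security market values - liabilities) / shares outstanding."""
    total_assets = sum(security_values)
    return (total_assets - liabilities) / shares_outstanding

# Hypothetical fund: three holdings, some liabilities, one million shares.
holdings = [5_000_000.00, 3_250_000.00, 1_750_000.00]
nav = net_asset_value(holdings, liabilities=250_000.00, shares_outstanding=1_000_000)
print(round(nav, 2))  # 9.75
```

Now multiply this by "thousands of fund securities" whose market values must each be looked up and verified by hand, and the scale of the contingency effort, and the room for the small pricing differences the WSJ noted, becomes clear.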
The root-cause explanation in the company's timeline reads:

> "The issue appears to have been caused by an unforeseen complication resulting from an operating system change performed by SunGard on Saturday, August 22nd. This maintenance was successfully performed in a test environment, per our standard operating procedure, and then replicated in SunGard's U.S. production environment for BNY Mellon. This change had also been previously implemented, without any issues, in other InvestOne environments. Unfortunately, in the process of applying this change to the SunGard production environment of InvestOne supporting BNY Mellon's U.S. fund accounting clients, that environment became corrupted. Additionally, the back-up environment hosted by SunGard, supporting BNY Mellon's U.S. fund accounting clients, was concurrently corrupted, thus impeding automatic failover. Because of the unusual nature of the event, we are confident this was an isolated incident due to the physical/logical system environment and not an application issue with InvestOne itself."
Given my background as a CA, CPA, and CISA, I have always thought it an odd contradiction that we expect infrastructure (roads, dams, bridges, etc.) to be certified by engineers as being in working order (the key word is "expect"; as John Oliver notes in the video below, reality is not exactly up to snuff!), but do not have the same expectations for the technology that runs the Information Age.
And that's why I have always argued that it is necessary to have a framework like SysTrust (now SOC 2 and SOC 3) in place that requires companies to ensure that their systems are reliable: secure, available, and able to process information without corrupting it.
The experience of SunGard and BNY Mellon, I think, actually proves the case. Although companies like SunGard likely have such controls in place, it is beneficial to have a second set of eyes confirm that those controls exist, are designed effectively, and are operating effectively. Mandatory audits would also circulate best practices across the industry, much as happens in the financial auditing world through "management letter points".
One other area we should explore is the total impact of this error, as it will give insight into the "total impact of failed IT controls". That will be the topic of the next blog post.