Monday, June 16, 2014

Auditing the Algorithm: Is it time for AlgoTrust?

This is the third instalment of a multi-part exploration of the audit, assurance, compliance and related concepts brought up in the book, Big Data: A Revolution That Will Transform How We Live, Work, and Think (the book is also available as an audiobook and hey, while I am at it, here's the link to the e-book). In the last two posts we explored tactical examples of how big data can help auditors execute more efficient and effective audits. The book, however, also examines the societal implications of big data. In this instalment, we explore the role of the algorithmist.

Why do we need to audit the "secret sauce"?
When it comes to big data analytics, the decisions and conclusions an analyst makes hinge greatly on the underlying algorithm. Consequently, as big data analytics increasingly drive actions in companies and societal institutions (e.g. schools, governments, non-profit organizations, etc.), society becomes more and more dependent on the "secret sauce" that powers these analytics. The term "secret sauce" is quite apt because it highlights the technical opaqueness that is commonplace with such things: the average person will likely not be able to understand how a big data analytic arrived at a specific conclusion. We discussed this in the previous post as the challenge of explainability, but the nuance here is how you explain algorithms to external parties, such as customers, suppliers, and others.

To be sure, this is not the only book that points to the importance of algorithms in society. Another example is "Automate This: How Algorithms Came to Rule Our World" by Chris Steiner, which (as you can see by the title) explains how algorithms have come to dominate our society. The book brings up common examples such as the "flash crash" and the role that "algos" are playing on Wall Street and in the banking sector, as well as how NASA used algorithms to assess personality types for its flight missions. It also goes into the arts. For example, it discusses how there is an algorithm that can predict the next hit song or hit screenplay, and how algorithms can generate classical music that impresses aficionados - until they find out an algorithm generated it! The author, Chris Steiner, discusses this trend in the following TEDx talk:



So what Mayer-Schönberger and Cukier suggest is the need for a new profession, which they term "algorithmists". According to them:

"These new professionals would be experts in the areas of computer science, mathematics, and statistics; they would act as reviewers of big-data analyses and predictions. Algorithmists would take a vow of impartiality and confidentiality, much as accountants and certain other professionals do now. They would evaluate the selection of data sources, the choice of analytical and predictive tools, including algorithms and models, and the interpretation of results. In the event of a dispute, they would have access to the algorithms, statistical approaches, and datasets that produced a given decision."

They also extrapolate this thinking to an "external algorithmist", who would "act as impartial auditors to review the accuracy or validity of big-data predictions whenever the government required it, such as under court order or regulation. They also can take on big-data companies as clients, performing audits for firms that wanted expert support. And they may certify the soundness of big-data applications like anti-fraud techniques or stock-trading systems. Finally, external algorithmists are prepared to consult with government agencies on how best to use big data in the public sector.

As in medicine, law, and other occupations, we envision that this new profession regulates itself with a code of conduct. The algorithmists’ impartiality, confidentiality, competence, and professionalism is enforced by tough liability rules; if they failed to adhere to these standards, they’d be open to lawsuits. They can also be called on to serve as expert witnesses in trials, or to act as “court masters”, which are experts appointed by judges to assist them in technical matters on particularly complex cases.

Moreover, people who believe they’ve been harmed by big-data predictions—a patient rejected for surgery, an inmate denied parole, a loan applicant denied a mortgage—can look to algorithmists much as they already look to lawyers for help in understanding and appealing those decisions."

They also envision such professionals working internally within companies, much the way internal auditors do today.

WebTrust for Certification Authorities: A model for AlgoTrust?
The authors bring up a good point: how would you go about auditing an algo? Although auditors lack the technical skills of algorithmists, that doesn't prevent them from auditing algorithms. WebTrust for Certification Authorities (WebTrust for CAs) could be a model: assurance practitioners would develop a standard in conjunction with algorithmists and then perform audits against that standard. Why is WebTrust for CAs a model? WebTrust for CAs is a technical standard under which an audit firm would "assess the adequacy and effectiveness of the controls employed by Certification Authorities (CAs)". That is, although the cryptographic key generation process goes beyond the technical discipline of a regular CPA, it did not prevent assurance firms from issuing an opinion.

So is it time for CPA Canada and the AICPA to put together a draft of "AlgoTrust"?

Maybe.

Although the commercial viability of such a service would be hard to predict, it would at least help start the discussion around how society can achieve the outcomes Mayer-Schönberger and Cukier describe above. Furthermore, some of the groundwork for such a service is already established. Fundamentally, an algorithm takes data inputs, processes them and then delivers a certain output or decision. Therefore, one aspect of such a service is to understand whether the algo has "processing integrity" (i.e., as the authors put it, to attest to the "accuracy or validity of big-data predictions"), which is something the profession established a while back through its SysTrust offering. To be sure, this framework would have to be adapted. For example, algos are used to make decisions, so there needs to be some thinking around how we would define materiality in terms of the total number of "wrong" decisions, as well as how we would define "wrong" in an objective and auditable manner.
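To make the "processing integrity" idea a little more concrete, here is a minimal sketch (in Python) of what one piece of such a test could look like: replay a sample of decisions through the algorithm, compare them against a benchmark of agreed "correct" outcomes, and check the error rate against a materiality threshold. The decision function, the benchmark data and the 2% threshold are all made up for illustration - none of this comes from the book or from any existing standard.

```python
# Hypothetical sketch of a "processing integrity" test for an algorithm.
# The decision function, benchmark data and 2% materiality threshold are
# illustrative assumptions, not part of any existing assurance standard.

def loan_decision(applicant: dict) -> bool:
    """Stand-in for the algorithm under audit: approve if income comfortably covers the payment."""
    return applicant["income"] >= 3 * applicant["requested_payment"]

# Benchmark: a sample of cases where the "correct" decision has been agreed up front
# (e.g. by re-performing the assessment manually).
benchmark = [
    {"income": 9000, "requested_payment": 2000, "expected": True},
    {"income": 2500, "requested_payment": 1000, "expected": False},
    {"income": 6100, "requested_payment": 2000, "expected": True},
    {"income": 1500, "requested_payment": 600,  "expected": False},
]

MATERIALITY = 0.02  # tolerate at most 2% "wrong" decisions -- an assumed threshold

wrong = sum(1 for case in benchmark if loan_decision(case) != case["expected"])
error_rate = wrong / len(benchmark)

print(f"Wrong decisions: {wrong}/{len(benchmark)} ({error_rate:.1%})")
print("Within materiality" if error_rate <= MATERIALITY else "Exceeds materiality")
```

In practice the hard part is the benchmark itself - agreeing what "wrong" means before the test is run - which is exactly the adaptation the framework would need.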

AlgoTrust, as a concept, illustrates not only how auditors can move their assurance skill set into an emerging area, but also how the profession can provide thought leadership on dealing with the opaqueness of algorithms - just as it did with financial statements nearly a century ago.




Thursday, June 5, 2014

Big Data Audit Analytics: Dirty data, explainability and data driven decision making

This is the second instalment of a multi-part exploration of the audit, assurance, compliance and related concepts brought up in the book, Big Data: A Revolution That Will Transform How We Live, Work, and Think (the book is also available as an audiobook and hey, while I am at it, here's the link to the e-book). In this instalment, I explore another example of big data audit analytics noted in the book and highlight the lessons learned from it.

Con Edison and Exploding Manhole Covers
The book discusses the case of Con Edison (the public utility that provides electricity to New York City) and its efforts to better predict which of its manhole covers will experience "technical difficulties", from the relatively benign (e.g. smoking, heating up, etc.) to the potentially deadly (where a 300 pound manhole cover can explode into the air and potentially harm someone). Given the potential implications for life and limb, Con Edison needed a better audit approach, if you will, than random guessing as to which manhole covers would need maintenance to prevent such problems from occurring.

And this is where Cynthia Rudin, currently an associate professor of statistics at MIT, comes into the picture. She and her team of statisticians at Columbia University worked with Con Edison to devise a model that would predict where the maintenance dollars should be focused.

The team developed a model with 106 data predictors (the biggest factors being the age of the manhole covers and whether there were previous incidents) that ranked manhole covers from those most likely to have issues to those least likely. How accurate was it? As noted in the book, the top 10% of those ranked most likely to have incidents ended up accounting for 44% of the manhole covers with potentially deadly incidents. In other words, through big data analytics, Con Edison was able to better "audit" the population of manhole covers for potential safety issues. The following video goes into some detail on what the team did:

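To get a feel for what that 44% figure means, here is a hedged sketch of the kind of "capture rate" calculation involved: score each manhole with a (made-up) two-factor risk model, rank the population, and measure what share of the actual incidents fall within the top 10% of the ranking. The weights and the simulated data below are invented for illustration - the real model used 106 predictors.

```python
# Illustrative only: a toy risk ranking and "top-decile capture" calculation.
# The weights and simulated data are invented; the real model used 106 predictors.
import random

random.seed(42)

# Simulate a population of manholes with two of the strongest real-world predictors:
# age and prior incidents. "had_serious_event" is the outcome we measure against.
manholes = []
for i in range(1000):
    age = random.randint(1, 120)                 # years in service
    prior = random.choices([0, 1, 2, 3], weights=[70, 20, 7, 3])[0]
    true_risk = 0.0005 * age + 0.05 * prior      # hidden risk, for simulation only
    manholes.append({
        "id": i,
        "age": age,
        "prior_incidents": prior,
        "had_serious_event": random.random() < true_risk,
    })

def score(m: dict) -> float:
    """Toy risk score: older covers and prior incidents rank higher."""
    return 0.01 * m["age"] + 1.0 * m["prior_incidents"]

ranked = sorted(manholes, key=score, reverse=True)
top_decile = ranked[: len(ranked) // 10]

total_events = max(1, sum(m["had_serious_event"] for m in manholes))
captured = sum(m["had_serious_event"] for m in top_decile)

print(f"Top 10% of the ranking captures {captured}/{total_events} "
      f"({captured / total_events:.0%}) of simulated serious events")
```

The auditing analogy is direct: instead of sampling the population at random, the ranking concentrates the inspection (or audit) effort where the risk is.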
What lessons can be drawn from this use of Big Data Analytics?
Firstly, algorithms can overcome dirty data. When Professor Rudin was putting together the data to analyse, it included records from the early days of Con Edison - as far back as the 1880s, when Thomas Edison was still alive! To illustrate, the book notes that there were 38 different ways the term "service box" had been entered into service records. This is on top of the fact that some of these records were handwritten by people who had no concept of a computer, let alone big data analytics.
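As a small illustration of the kind of clean-up involved, here is a sketch of collapsing multiple spellings of "service box" down to one canonical label before analysis. The variants and normalisation rules are my own invented examples, not the ones Rudin's team actually encountered.

```python
# Hypothetical illustration of normalising messy free-text equipment labels.
# The variants and rules are invented; the book only notes that "service box"
# appeared in 38 different forms in the historical service records.
import re

raw_entries = ["SERVICE BOX", "serv box", "S/BOX", "Serv.Box", "service  bx", "S.B."]

CANONICAL = "service box"
KNOWN_VARIANTS = {"serv box", "s box", "service bx", "s b", "sbox"}

def normalise(entry: str) -> str:
    """Lower-case, strip punctuation, collapse whitespace, then map known variants."""
    cleaned = re.sub(r"[^a-z ]", " ", entry.lower())
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return CANONICAL if cleaned == CANONICAL or cleaned in KNOWN_VARIANTS else cleaned

for raw in raw_entries:
    print(f"{raw!r:15} -> {normalise(raw)!r}")
```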

Second, although the biggest factors seem obvious in hindsight, we should be wary of such conclusions. The point is that data driven decision making is more defensible than "gut feel", which speaks directly to the professional judgement versus statistical approaches to performing audit procedures. The authors further point out that there were at least 104 other variables that were contenders, and their relative importance could not have been known without performing such a rigorous analysis. The point here is that organizations that want to succeed and take analytics to the next level need to culturally embrace the idea that, where feasible, they should invest in the necessary legwork to reach conclusions based on solid analysis.

Third, the authors highlight the importance of "explainability". They attribute the term to the world of artificial intelligence; it refers to the ability of the human user to drill deeper into the analysis generated by the model and explain to operational and executive management why a specific manhole needs to be investigated. In contrast, the authors point out that models made complex by the inclusion of numerous variables are difficult to explain. This is a critical point for auditors: since auditors must be able to defend why a particular transaction was chosen over another for audit, big data audit analytics needs to incorporate this concept of explainability.
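One way to make a ranking "explainable" in the sense the authors describe is to decompose an item's score into per-factor contributions, so an auditor (or a Con Edison manager) can see which factor pushed a specific manhole to the top of the list. The sketch below reuses the made-up two-factor score from the earlier example; a 106-variable model would need a far more careful treatment, which is exactly the authors' point about complex models being harder to explain.

```python
# Hedged sketch: explaining one item's score as a sum of per-factor contributions.
# Weights and features are illustrative, not the actual Con Edison model.
WEIGHTS = {"age": 0.01, "prior_incidents": 1.0}

def explain(item: dict) -> None:
    """Print each factor's contribution to the item's total risk score."""
    contributions = {factor: WEIGHTS[factor] * item[factor] for factor in WEIGHTS}
    total = sum(contributions.values())
    print(f"Manhole {item['id']}: total score {total:.2f}")
    for factor, value in sorted(contributions.items(), key=lambda kv: -kv[1]):
        print(f"  {factor:<16} contributes {value:.2f} ({value / total:.0%} of the score)")

explain({"id": 17, "age": 95, "prior_incidents": 2})
```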

Finally, it is but another example of how financial audits can benefit from such techniques, given the way non-financial "audits" are already using big data techniques to audit and assess information. Internal and external auditors can highlight this (along with the two examples identified in the previous post) as part of their big data audit analytics business case.