Thursday, June 5, 2014

Big Data Audit Analytics: Dirty data, explainability and data driven decision making

This is the second instalment of a multi-part exploration of the audit, assurance, compliance and related concepts brought up in the book,  Big Data: A Revolution That Will Transform How We Live, Work, and Think (the book is also available as an audiobook and hey while I am at it, here's the link to the e-book ). In this instalment, I explore another example of Big Data Audit analytics noted in the book and highlight the lessons learned from it. 

Con Edison and Exploding Manhole Covers
The book discussed the case of Con Edison (the public utility that provides electricity to New York City)  and its efforts to better predict, which of their manhole covers will experience "technical difficulties" from the relatively benign (e.g. smoking, heating up, etc) to the potentially deadly (where a 300 pound manhole can explode into the air and potentially harm someone). Given the potentially implications on life and limb, Con Edison needed a better audit approach, if you will, then random guessing as to which manhole cover would need maintenance to prevent such problems from occurring.

And this is where Cynthia Rudin, currently an associate professor of statistics at MIT, comes into the picture. She and her team of statisticians at Columbia University worked with Con Edison to devise a model that would predict, where the maintenance dollars should be focused.

The team developed a model with 106 (with the biggest factors being age of the manhole covers and if there were previous incidents) data predictors that ranked manhole covers in terms of which ones were most likely to have issues to those least likely. How accurate was it?  As noted in the book, the top 10% of those ranked most likely to have incidents ended up accounting for 44% of the manhole covers with potentially deadly incidents. In other words, Con Edison through big data analytics was able to better "audit" the population of manhole covers for potential safety issues.  The following video goes into some detail on what the team did:

What lessons can be drawn from this use of Big Data Analytics?
Firstly, algorithms can overcome dirty data. When Professor Rudin was putting together the data to analyse, it included data from the early days of Con Edison, i.e. as in 1880s when Thomas Edison was alive! To illustrate the book notes how there 38 different ways to enter the word "service box" into service records. This is on top of the fact that some of these records were hand written and were documented by people who didn't have a concept of a computer let alone big data analytics.

Second, although the biggest factors seem obvious in hindsight, we should be aware of such conclusions. The point is that data driven decision making is more defensible than a "gut feel", which speaks directly to the professional judgement versus statistical approach of performing audit procedures. The authors further point out that there at least 104 other variables that were contenders and their relative importance cannot be known without preforming such a rigorous analysis.  The point here is that for organizations to succeed and take analytics to the next level need to embrace culturally the concept that, where feasible, organizations should invest in the necessary leg work to obtain conclusions based on solid analysis.

Third, the authors highlight the importance of "explainability". They attribute to the world of artificial intelligence, which refers to the ability of the human user to drill deeper into the analysis generated by the model and explain to operational and executive management why a specific manhole needs to be investigated. In contrast, the authors point out that models that are complex due to the inclusion of numerous variables are difficult to explain. This is a critical point for auditors. As the auditors must be able defend why a particular transaction was chosen over another for audit, big data audit analytics needs to incorporate this concept of explainability.

Finally, it is but another example of how financial audits can benefit from such techniques, given the way non-financial "audits" are using big data techniques to audit and assess information. So internal and external auditors can highlight this (along with two examples identified in the previous post) as part of their big data audit analytics business case.







No comments: