
Sunday, January 10, 2021

Data Tsunami: How big is the Data Deluge? (Part 1)

I was invited last year to speak about the Data Tsunami at the AICPA Engage conference, but I didn't quite make it there in person! Instead, I presented virtually.

So, I will be breaking out some of those topics over a few blog posts.

How big is the data tsunami?

Probably the first thing that comes to mind is social data. The Internet truly unleashed the first torrent of the data tsunami. Google's search index alone is 100,000,000 GB. In terms of social data, we are looking at the following:

  • Twitter: 200 billion tweets per year (Twitter)
  • Facebook: 4 petabytes of data per day (WEF)
  • WhatsApp: 65 Billion Messages per day (WEF)
  • YouTube: 250 million hours per day (Variety)
  • Apple: 50 billion podcast downloads (Fast Company)
It's interesting how the data tsunami encompasses print, sight and sound. This of course lends itself to analytics, but we will discuss that in a future post.

In terms of organizational data, Walmart generates 2.5 petabytes of data per hour. According to American Banker, 12 million petabytes of data flows through the financial industry per year. In terms of manufacturing, 6,000 fan blades manufactured by Rolls Royce generate 3 petabytes. This gives an idea of how much data is generated by the millions of parts that go into airplanes, trains and automobiles.

In terms of medical data, Stanford published the following:

“The sheer volume of health care data is growing at an astronomical rate: 153 Exabyte…were produced in 2013 and an estimated 2,314 Exabyte will be produced in 2020, translating to an overall rate of increase at least 48 percent annually.”

This obviously raises tremendous privacy concerns.

How big will the data tsunami get? 

A couple of key contributors to this 'tsunami of data' stand out. The first will likely be the Internet of Things (IoT).


IDC predicts that 40+ billion IoT devices will generate 79.4 ZB of data by 2025. The other generator of 'digital exhaust' will likely be autonomous vehicles, which according to Intel produce about 4 terabytes of data per hour.

But the big question is: so what?

We'll take a look at this question in the next post. 

Author: Malik Datardina, CPA, CA, CISA. Malik works at Auvenir as a GRC Strategist that is working to transform the engagement experience for accounting firms and their clients. The opinions expressed here do not necessarily represent UWCISA, UW, Auvenir (or its affiliates), CPA Canada or anyone else.




 

Friday, August 7, 2020

CPAs to the Future: Why Data Governance?

In 2018, CPA Canada held the Foresight Sessions, where they consulted CPAs and others on how the profession should move forward. CPA Canada took a broad view of the topic and brought in a diverse crowd of people to look at how things could unfold. Facilitated sessions examined a number of possible scenarios and how the profession could thrive in each of them. What I liked about the sessions was the diversity of thought. The environment was so open that attendees were even willing to talk about things like wealth inequality and its potential impact on the profession.

So where did things end up? 

A report was published, and the two key areas that became the focus were Value Creation and Data Governance.

Before looking at where we are now, it is good to take a step back and look at the underlying need to re-examine the profession. The CPA profession was born in a book-based world where knowledge went through a manufacturing process of sorts. Whether it is the accounting standards themselves or the actual financial statements, the idea was that there was a sense of finality to the process. The Internet, and more specifically the hyperlink, changed that. Data, information and knowledge are now networked.

It's not to say that the profession was unaware of this. 

I got my start as a CPA in the world of Audit Data Analytics back in 2000 (yes, 20 years ago, when this type of work was known as computer-assisted audit techniques). Back then, IT-focused CPAs like myself used tools like Audit Command Language or IDEA (sometimes referred to as 'generalized audit software'). This work involved the analysis of data largely for audit support.

CPA Canada also published the Information Integrity Control Guidelines (authored by Efrim Boritz and myself), which looked at how controls and "enablers" would create information integrity. The project was designed to take a fresh look at the traditional dichotomy between "general computer controls" and "application controls". For example, the publication also looked at controls specifically around content.

Why Data Governance? 

The challenge I have found is how to succinctly articulate how CPAs can play on the dividing line between business and technology. Data governance is probably a good place to start. Even when you consider something more technical like a 'data scientist', a key component is business domain knowledge. Hence, to capture the future it makes sense to look at something beyond technology: data and information. After all, accountants have experience with data, not with configuring routers. Furthermore, as pointed out in this CPA Canada article, "there is already a need for foundational standards of practice around all aspects of data governance and the data value chain".

Why are CPAs suited for data governance? 

I have always felt that CPAs have a solid foundation in understanding information. Through the FASB framework, we understand the trade-offs between relevance and reliability, as well as the reality of what is needed to audit something. In the work Efrim and I have done around information integrity, this framework was a key resource because it is unique in defining the parameters of information.


When teaching a class at Waterloo, I linked this framework to something it is now relevant to: social media companies. Google/YouTube, Facebook, and Twitter have all been "auditing" posts on their respective sites due to misinformation about COVID-19 and other matters. When covering this in class, the concern I raised was the "slippery slope": for example, does that mean all the other posts are "materially correct"? Such questions illustrate how CPAs can add value when it comes to data governance.

 

Monday, October 2, 2017

What can driving algorithms tell us about robo-auditors?

On a recent trip to the US, I decided to opt for a vehicle with sat-nav, as I was going to need directions and wanted to save on roaming charges. I normally rely on Google Maps to guide me around traffic jams but thought that the sat-nav would be a good substitute.

Unfortunately, it took me on a wild goose chase more than once – to avoid the traffic. I had blindly followed the algorithm's suggestions, assuming it would save me time. Instead, I ended up stuck at traffic lights, waiting to make a left turn for what seemed like forever.

Then I realized what I was missing: the feature in Google Maps that tells you how much time you will save by taking the path less traveled. If it only saves me a few minutes, I normally stick to the highway, as there are no traffic lights and things may clear up. Effectively, Google gives you a way to supervise its algorithmic decision-making process.


How does this help with understanding the future of robot auditors?

Algorithms, and AI robots more broadly, need to provide sufficient data for us to judge whether the algorithm is driving in the right direction. Professional auditing standards currently require supervision of junior staff – and the analogy can be applied to AI-powered audit-bots. For example, let's say there is an AI auditor assessing the effectiveness of access controls, and it suggests not relying on the control. The supervisory data needs to give enough context to assess the consequences of taking such a decision and the alternatives. This could include:

  • Were controls relied on in previous years? This would give some context as to whether this recommendation is in-line with prior experience.
  • What are the results of other security controls? This would give an understanding whether this is actually an anomaly or part of the same pattern of an overall bad control environment.
  • How close is the call between the reliance and non-reliance decision? Perhaps this is more relevant in the opposite situation, where the system says to rely on controls even though it has found weaknesses. Either way, the auditor should understand how close the system was to making the opposite judgment.
  • What is the impact on substantive test procedures? If access controls are not relied on, the impact on substantive procedures needs to be understood.
  • What alternative procedures can be relied on? If the algorithm recommends not relying on a control, the auditor needs to understand what alternative procedures could compensate.
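To make the idea concrete, here is a minimal sketch of what such supervisory data might look like as a structured record. All field names, the confidence margin, and the review rule are hypothetical, invented purely to illustrate the bullet points above; they are not from any actual audit system.

```python
from dataclasses import dataclass, field

@dataclass
class RelianceRecommendation:
    """Hypothetical supervisory context surfaced alongside an
    audit algorithm's control-reliance recommendation."""
    control: str
    rely: bool                      # the algorithm's recommendation
    confidence: float               # how firm the call is, 0.0-1.0
    prior_year_reliance: bool       # in line with prior experience?
    related_control_results: dict = field(default_factory=dict)  # anomaly or pattern?
    substantive_impact: str = ""    # effect on substantive procedures
    alternative_procedures: list = field(default_factory=list)   # compensating procedures

    def needs_review(self, margin=0.15):
        """Flag for human supervision when the call is close to the
        decision boundary or contradicts prior-year experience."""
        close_call = abs(self.confidence - 0.5) < margin
        return close_call or (self.rely != self.prior_year_reliance)
```

A record like this would let the supervising auditor see not just the recommendation but how close the algorithm was to the opposite judgment, and what the downstream consequences would be.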

What UI does the auditor need to run algorithmic audit?

On a broader note, what is the user interface (UI) to capture this judgment and enable such supervision?

Visualization (e.g. the vehicle moving on the map), mobile technology, satellite navigation and other technologies are assembled to guide the driver. Similarly, auditors need a way to pull together not just the data necessary to answer the questions above, but also a way to understand which risks within the audit require greater attention. This will help the auditor understand where audit resources need to be allocated from a nature, extent and timing perspective.

We all feel a sense of panic when reading the latest study predicting the pending robot-apocalypse in the job market. The reality is that even driving algos need supervision and cannot be wholly trusted on their own. Consequently, when it comes to applying algorithms and AI to audits, it's going to take some serious effort to define the map that enables such automation, let alone build the automation itself.


Monday, June 13, 2016

Can accounting errors ruin your life? John Oliver explains how they can.

In this episode of Last Week Tonight, John Oliver explores the world of debt buying:



The segment received wide publicity as he tried to outdo Oprah by conducting the biggest giveaway on television - he bought $15 million worth of medical debt and forgave it. This article on Fortune does a good job of summarizing the show:
  • US households owe $12 trillion in debt of which $436 billion is 90+ days past due. 
  • Companies who discharge the debt sell it for pennies on the dollar to a growing number of companies that specialize in debt buying.
  • One company, Encore Capital, notes that 1 in 5 Americans owes or has owed them money.
  • Debt that's been paid can "come back to life" - affectionately known as zombie debt.
There was some controversy, however, about who he worked with to write-off the debt (they noted their grievances here, to which John responded here) and the value of the debt. On the latter count, is it really fair to criticize an act of charity that improved the lives of approximately 9,000 people?  

Nothing good happens in Excel. 
But the segment most relevant to us is when he starts talking about how the information is actually sold: on spreadsheets. Oliver gets quite dramatic as he shares his phobia of Excel and notes how "nothing good happens in Excel". He also explains that the spreadsheets are sold "as is", meaning that the seller does not guarantee the accuracy of the information related to the debt contracts being sold.

And that's where the jokes stop.

In the segment, he includes footage from interviews with Jake Halpern, who wrote "Bad Paper: Chasing Debt from Wall Street to the Underworld". The book follows the life of debt buyer Aaron Siegel, who was born to a wealthy family in Buffalo, New York. It takes in an array of characters, including Brandon, an ex-con who does the gritty work of finding the debt, ensuring it's good, and collecting on it.

What caught my attention as I was going through the book is that it gives a bit more detail on what John Oliver mentioned about the banks selling the paper "as is". Halpern notes on page 58 of his book (see below for the link) that when Washington Mutual sold Joanna and Theresa's debt to Aaron, the credits awarded against their accounts were not reflected in the spreadsheet given to the debt buyer.


And that's how accounting errors can ruin lives.

When you read the life stories of these two women, it's heart-wrenching to think that a few lines on an Excel spreadsheet could have such a detrimental impact on their lives. Some would cynically say this is overdramatic and try to find reasons to blame Joanna and Theresa for falling into this problem. But I don't think that's fair. When you read about their lives, it's clear that they were affected by factors beyond their control. It's really this broken system of debt collection that is responsible for them failing to get the debt relief they were owed.

The way accounting systems and spreadsheets are designed and operated can have a real impact on real people. As an accountant myself, I have often wondered what value accounting has in the grand scheme of things. But as Halpern's story illustrates, the accountants, bookkeepers, etc. had a real impact on the lives of these two women.

No one is saying that accountants have the same impact on people's lives the way a cancer specialist does. But at the same time, a few lines on an Excel spreadsheet could be the difference between perpetual anxiety and a good night's sleep.

Thursday, April 23, 2015

Google's Mobile Launch: It really may be about the big data!

Yesterday Google launched "Project Fi" - Google's foray into providing mobile service. As CBC reported the service "will cost $20 US a month plus $10 per gigabyte of data used" (I am still an accountant, trained to find the numbers!). According to the Google blog post on Project Fi, the service will:
  • Find the fastest connection: The service will enable the Google Nexus 6 to switch to the fastest mobile connection, whether it's home/work WiFi, WiFi hot spot, Sprint's network or T-Mobile's network.  
  • Seamless transition between networks: The above service is not just about data, but also voice: when you transition between networks, you can keep on talking without any disruption. 
  • Ties cellphone number to the cloud, not the device: Is this the end of SIM cards? With this service, you can take a call on any device (tablet, laptop, etc.) 
  • Refund for unused data: While implied in the CBC article above, Google has structured the plan to refund the customer for the amount of unused data. 
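Since I'm still an accountant, here is a back-of-the-envelope sketch of how a bill under the plan described above might be computed. The function and the pro-rating of the refund are my own simplification for illustration, not Google's actual billing logic.

```python
def monthly_bill(prepaid_gb, used_gb):
    """Hypothetical illustration of the pricing described above:
    $20 base + $10 per GB prepaid, with unused data refunded
    at the same $10/GB rate (my simplifying assumption)."""
    base = 20.0
    data_charge = 10.0 * prepaid_gb
    refund = 10.0 * max(prepaid_gb - used_gb, 0.0)
    return base + data_charge - refund
```

So a customer who prepays for 3 GB but uses only 1 GB would, under this simplified model, pay $20 + $30 - $20 = $30 for the month.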
As I had noted in an earlier blog post, one of the possible reasons that Google is entering into mobile world is to get access to mobile data. Specifically:

"the hidden strategic objective is a big data play: what could Google do with the new data feeds? Sure they already get from being able to correlate the information it already gets from their Android devices. However, they will now be able to analyze this data with the additional data that moves through their MVNO network, such as demographic information and location data. What good is this to Google? In a word: advertising. Advertising is still the biggest source of Google's revenue and adding this pool of data to their reservoir can only add to the bottom line."

Although this project is in "user testing" mode, the video indicates that this is not simply a giant "user acceptance test". Specifically, the announcer says "Getting it in users' hands and finding out all the new amazing things we can build that will make your lives easier." (Go to 1:34 if you don't have the 2 minutes to spare.)


In other words, the service will actively work with early adopters to target services that work for users. Of course, these services will offer a better way to target ads, such as location-based advertising or augmented reality. With respect to the latter, you could use your phone to interact with an augmented-reality billboard, store, etc. And Google could turn these numbers back to potential advertisers to demonstrate the effectiveness of such technology. In fact, Google (according to The Verge) invested over half a billion dollars in Magic Leap, an augmented reality firm. But let's see how this rolls out.





Monday, March 2, 2015

Explaining Big Data Technology in under 2 minutes

The class I was teaching this week was looking at Big Data from multiple perspectives, including security. The approach I used with cloud last week was to identify the key differences between ASP and cloud. With Big Data the key difference is between the SQL world of relational databases and the non-relational world of NoSQL technologies, such as Hadoop.

I took a course on Big Data that explained how a distributed architecture enables a "master" to send out a job to a vast army of "slaves" to complete the processing. But how do I explain this in a succinct and effective way to the students?

In a word, YouTube.

I found this video that gives a pretty good overview of Big Data, but its real value is how it explains, at a high level, how Hadoop works (go to 4:10):
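For readers who prefer code to video, the master/slaves split described above can be sketched in miniature. This is a toy, single-machine illustration of the MapReduce idea behind Hadoop, not Hadoop itself: the function names and worker count are my own, and real Hadoop distributes the chunks across a cluster of machines.

```python
from collections import Counter
from functools import reduce

def map_chunk(lines):
    """One 'worker' counts words in the chunk it was handed (the map step)."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

def word_count(lines, n_workers=3):
    # 'Master': split the job into roughly equal chunks of input lines.
    chunks = [lines[i::n_workers] for i in range(n_workers)]
    # 'Workers': each chunk is processed independently (in Hadoop,
    # these would run in parallel on different machines).
    partials = [map_chunk(chunk) for chunk in chunks]
    # Reduce: merge the partial results into the final answer.
    return reduce(lambda a, b: a + b, partials, Counter())
```

The point the video makes is the same one the code shows: no single worker ever sees the whole dataset, yet the merged result is as if one machine had.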

Of course, we will be covering  social media later in the term :)

Wednesday, August 6, 2014

Worth mentioning: KPMG's take on the state of tech in the audit profession

In a recent post (as in just this week) on Forbes, KPMG's James P. Liddy, Vice Chair, Audit and Regional Head of Audit, Americas, put out a great piece that summarizes the current state of analytics in financial audits.

He diplomatically summarizes the current state of the financial audit as "unchanged for more than 80 years since the advent of the classic audit" while stating "[a]dvances in technology and the massive proliferation of available information have created a new landscape for financial reporting. With investors now having access to a seemingly unlimited breadth and depth of information, the need has never been greater for the audit process to evolve by providing deeper and more relevant insights about an organization’s financial condition and performance –while maintaining and continually improving audit quality." [Emphasis added]

For those of us who started off our careers in the world of financial audit as professional accountants and then moved to the world of audit analytics or IT risk management, we have always felt that technology could help get audits done more efficiently and effectively.

I was actually surprised that he stated that auditors "perform procedures over a relatively small sample of transactions – as few as 30 or 40 – and extrapolate conclusions across a much broader set of data". We usually don't see this kind of openness when it comes to discussing the inner workings of the profession. However, I think that discussing such fundamentals is inevitable, given that those outside the profession are embracing big data analytics in "non-financial audits". For example, see this post where I discuss the New York City fire department's use of big data analytics to build a better audit population of illegal conversions that are high risk and need to be evacuated.

For those who take comfort in the regulated nature of the profession as protection from disruption, we should take note of how the regulators are embracing big data analytics. First, the SEC is using RoboCop to better target financial irregularities. Second, according to the Wall Street Journal, FINRA is eyeing an automated approach to monitoring risk. The program is known as the "Comprehensive Automated Risk Data System" (CARDS). As per FINRA:

"CARDS program will increase FINRA's ability to protect the investing public by utilizing automated analytics on brokerage data to identify problematic sales practice activity. FINRA plans to analyze CARDS data before examining firms on site, thereby identifying risks earlier and shifting work away from the on-site exam process". In the same post, Susan Axelrod, FINRA's Executive Vice President of Regulatory Operations, is quoted as saying "The information collected through CARDS will allow FINRA to run analytics that identify potential "red flags" of sales practice misconduct and help us identify potential business conduct problems with firms, branches and registered representatives".

As a result, I agree with Mr. Liddy: sticking to the status quo is no longer a viable strategy for the profession.

Tuesday, July 16, 2013

The Power of Visualized Analytics

In my new role at Deloitte, I have recently come across tools, such as Tableau and QlikView, that allow users to "visualize data". To be honest, I didn't think they would add much value compared to "rule-based" analytic tools, such as IDEA and ACL. However, after using these tools I realized the real power of being able to visualize data, in contrast to producing a list of exceptions. It brings the dashboard concept from the executive management suite to the analyst or other business professional. But as they say, "seeing is believing".

So let's try an experiment.

I recently came across an amazing visualization that really illustrates this power. It visualizes economic data, specifically the distribution of wealth.

But don't click on it yet!

To get the most out of the experiment, first read this report (which the visualization is based on) to see how the stats hit you in terms of impact.

So here is an excerpt from the Oxfam report which the visualization is based on. (The numbers at the end of the sentences are footnotes; see the original report for the sources.)

[The excerpts appeared here as images; see the original report.]
So now let's see how this data (plus other sources) hits you when it is visualized:


Is there really any contest?

What I've realized is that visualization enables the business user to bring together multiple dimensions onto a single sheet of paper and tell a story about the underlying data. Having said that, I do believe there is a complementary relationship between visualized analytics and rule-based analytics. For example, if you want to quantify the difference between budgets and actuals, produce a list of exceptions, etc., then rule-based analytics are better suited for the purpose. Furthermore, visualizations can help explain the results of rule-based analytic procedures.
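To show what I mean by "rule-based", here is a minimal sketch of the budget-to-actual exception analytic mentioned above. The function name, record layout, and 10% tolerance are all hypothetical, invented for illustration; tools like IDEA and ACL express the same idea through their own filter expressions.

```python
def variance_exceptions(records, threshold=0.10):
    """Flag line items whose actual deviates from budget by more than
    `threshold` (as a fraction of budget) - a classic rule-based test."""
    exceptions = []
    for item in records:
        budget, actual = item["budget"], item["actual"]
        if budget and abs(actual - budget) / budget > threshold:
            # Keep the original fields and attach the dollar variance.
            exceptions.append({**item, "variance": actual - budget})
    return exceptions
```

The output is exactly the kind of exception list the rule-based tools produce, and it is also the kind of result a visualization can then help explain, for example by plotting variances by account over time.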