
Friday, August 29, 2025

MIT’s GenAI Freakout: A "95% Failure Rate" or 95 Years’ Worth of Productivity?

The now-infamous MIT study found that 95% of enterprise AI projects are generating zero returns. Like many statistics from the early days of an emerging technology, though, the truth is more complicated. When we look beyond the headlines, the story isn’t about the failure of GenAI; it’s about how we define success, what we expect from AI, and how employees are already rewriting the rules of enterprise adoption.
 
The Measurement Trap: Financial Metrics vs. Productivity Reality

The main shortcoming of the article is its focus on financial returns rather than on the success of the technology itself. It highlights the difficulty of quantifying the small time savings GenAI delivers, citing a Fortune 1000 procurement executive:

"If I buy a tool to help my team work faster, how do I quantify that impact? How do I justify it to my CEO when it won't directly move revenue or decrease measurable costs?"

Those of us who advocate for GenAI can empathize with the executive’s dilemma. I call these “micro-productivity gains” because, although the minutes saved with GenAI are hard to quantify, these small efficiencies accumulate across the economy.

A great example is using GenAI to generate images.

Let’s say we save 5 minutes per image by using GenAI instead of hunting for the “perfect pic” for a presentation. Over a handful of images, we don’t notice the gains. Over 10 million images, however, those time savings amount to 95 years of productivity!
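
To make the arithmetic concrete, here is a quick back-of-the-envelope check in Python (the 5-minutes-per-image and 10-million-image figures are illustrative assumptions, not data from the MIT study):

minutes_saved_per_image = 5        # illustrative assumption
images_generated = 10_000_000      # illustrative assumption

total_minutes = minutes_saved_per_image * images_generated
total_hours = total_minutes / 60
total_years = total_hours / (24 * 365)  # round-the-clock calendar years

print(f"{total_minutes:,} minutes is roughly {total_years:.0f} years")
# Output: 50,000,000 minutes is roughly 95 years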

AI Has Already Won—Where It Can

The article itself testifies to the significant success the technology is already bringing to the average knowledge worker. Remarkably, it concedes the following:

"AI has already won the war for simple work."

The core argument of the article is that standard generative AI technology is not yet equipped to fully replace human workers. For example, only 10% of respondents would entrust multi-week client management projects to AI rather than to human colleagues.

This, however, is not surprising. Anyone with a paid GenAI subscription knows that it takes multiple iterations to get the desired output.

The idea that we hold the technology to such high expectations – that it should replace a junior lawyer – is a function of hype, automation bias, and science-fiction movies.

From BYOD to BYOAI? AI Governance in Crisis

Perhaps the most interesting finding is that 90% of employees use generative AI regularly, regardless of official policies. The study found that “almost every single person used an LLM in some form for their work”.

History does not repeat itself, but it certainly rhymes. This is not the first time that employees have tried to impose consumer tech on enterprise IT. With the ascent of the iPhone and Android in the early 2010s, workers demanded the IT department figure out a way to make their devices work with the corporate email server. This Bring Your Own Device (BYOD) movement ultimately displaced BlackBerry's enterprise dominance.

Shadow AI, as the report aptly termed this trend, is more problematic than BYOD ever was. Formerly, it took someone quite technically adept to figure out how to get corporate data onto a personal device. With Shadow AI, it is only a matter of copy and paste. Consequently, AI adoption raises a range of considerations around privacy and confidentiality, data leakage, and regulatory compliance that organizations must address.

Although Shadow AI speaks to the resounding success of the tech, it also speaks to the urgent need to get AI governance in place.

Beyond the Hype: What the Study Actually Reveals

Though the headlines were laser-focused on the lack of cash returns from the money invested in AI, a more careful read of the article reveals the productivity boom the technology has produced. It’s startling to think that three years ago GenAI was virtually unknown to most people. Today, we are disappointed with it because it can’t replace a junior at a professional services firm.

That said, the article offered some valuable insights into what success with GenAI can look like—a topic I'll be unpacking in a future post.

Author: Malik D. CPA, CA, CISA. The opinions expressed here do not necessarily represent UWCISA, UW, or anyone else. This post was written with the assistance of an AI language model.



Tuesday, December 31, 2019

If Artificial Intelligence can identify Shakespeare's linguistic signature, can similar techniques be used in audit?

Can AI help us identify the real authors of classic literature?

According to MIT, the answer is yes. In a recent article, they noted how machine learning was used to identify how much a co-author helped fill in the blanks for Shakespeare’s Henry VIII. Scholars had long suspected that John Fletcher was that co-author but couldn’t identify which passages he wrote into the play.

Petr Plecháč at the Czech Academy of Sciences in Prague trained the algorithms using plays by Fletcher that corresponded with the time Henry VIII was written, “because an author’s literary style can change throughout his or her lifetime, it is important to ensure that all works have the same style”.

Based on his analysis, it appears that half the play was written by Fletcher.

The experiment is a proof of concept that there is a certain linguistic signature to how people write. In a sense, it means each of us has a unique pattern in how we construct sentences. In Dr. Plecháč’s experiment, the algorithm was able to detect what was written by Fletcher because he “often writes ye instead of you, and ’em instead of them. He also tended to add the word sir or still or next to a standard pentameter line to create an extra sixth syllable.”
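
For a sense of how such a signature can drive a classifier, here is a minimal stylometry sketch in Python using scikit-learn. This is not Dr. Plecháč’s actual pipeline (he combined lexical and versification models); the training passages below are hypothetical stand-ins for real play texts, and the marker words come from the quote above:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-ins for passages of known authorship.
passages = [
    "ye shall see 'em anon sir and ye shall know 'em still",  # Fletcher-like
    "I pray ye sir bring 'em hither next",                    # Fletcher-like
    "you shall see them anon and you shall know them",        # Shakespeare-like
    "I pray you bring them hither to me",                     # Shakespeare-like
]
authors = ["Fletcher", "Fletcher", "Shakespeare", "Shakespeare"]

# Count only the telltale marker words noted in the article.
markers = ["ye", "you", "'em", "them", "sir", "still", "next"]
vectorizer = CountVectorizer(vocabulary=markers, token_pattern=r"[a-z']+")
X = vectorizer.fit_transform(passages)

model = LogisticRegression().fit(X, authors)

# Attribute an unseen passage by its marker-word frequencies.
unseen = "mark ye well sir for 'em all shall answer still"
print(model.predict(vectorizer.transform([unseen]))[0])  # prints: Fletcher

Real stylometric work relies on hundreds of function-word and versification features and far more text, but the principle is the same: the frequencies of small, habitual words betray the author.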

Can this be used within an audit? 

A paper co-authored by Dr. Kevin Moffitt of Rutgers University, entitled "Identification of Fraudulent Financial Statements Using Linguistic Credibility Analysis", suggests it can. In the paper, the authors explain how they used a "decision support system called Agent99 Analyzer" to "test for linguistic differences between fraudulent and non-fraudulent MD&As". The decision support system was configured to identify linguistic cues used by "deceivers". As examples, the paper cites that deceivers "display elevated uncertainty, share fewer details, provide more spatio-temporal details, and use less diverse and less complex language than truthtellers".
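
To illustrate what extracting such cues might look like, here is a hypothetical sketch in Python. It is not the Agent99 Analyzer; the hedging-word list and the two cues computed (lexical diversity and an uncertainty rate) are my own simplified assumptions based on the cues the paper names:

import re

# Hypothetical hedging terms signalling "elevated uncertainty".
UNCERTAINTY_TERMS = {"may", "might", "could", "possibly", "approximately", "believe"}

def linguistic_cues(text):
    """Compute two simple credibility cues for a passage of MD&A text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {"lexical_diversity": 0.0, "uncertainty_rate": 0.0, "word_count": 0}
    return {
        "lexical_diversity": len(set(words)) / total,  # lower => less diverse language
        "uncertainty_rate": sum(w in UNCERTAINTY_TERMS for w in words) / total,
        "word_count": total,
    }

mdna = ("We believe revenues may improve and costs could possibly "
        "decline, although results might vary.")
print(linguistic_cues(mdna))

In a real deployment, cues like these would be computed for each MD&A and fed into a classifier trained on known fraudulent and non-fraudulent filings.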

The paper’s result?

Per the paper, the "modest success in classification results demonstrates that linguistic models of deception are potentially useful in discriminating deception and managerial fraud in financial statements".

Results like these are a good indication of how the audit profession can move beyond traditional audit procedures.

Author: Malik Datardina, CPA, CA, CISA. Malik works at Auvenir as a GRC Strategist, working to transform the engagement experience for accounting firms and their clients. The opinions expressed here do not necessarily represent UWCISA, UW, Auvenir (or its affiliates), CPA Canada, or anyone else.