Monday, December 29, 2014

Low Decision Agility: BigData's Insurmountable Challenge?

Having worked in the field of data analytics for over a decade, I have seen one recurring theme that never seems to go away: the struggle organizations have with getting their data in order.

Although this is normally framed in terms of data quality and data management, it's important to link it back to the ultimate raison d'être for data and information: organizational decision making. When an organization has significant data and information management challenges, the result is a lack of "decision agility" for executive or operational management. I define decision agility as follows:

"Decision agility is the ability of an entity to provide relevant information 
to a decision maker in a timely manner." 

Before getting into the field, you would think that, with all the hype of the Information Age, it would be as easy as pressing a button for a company to get you the data you need for your analysis. However, once you are in the field, you soon realize how wrong this thinking is: most organizations have low decision agility.

I think it is fair to say that this problem hits those involved in external (financial) audits the hardest. Because we work with tight budgets, low decision agility at the clients we audit makes it cost-prohibitive to perform what is now known as audit analytics (previously known as CAATs). Our work is often reined in by the (non-IT) auditors running the audit engagement because it is "cheaper" to do the same test manually than to work our way through the client's data challenges.

So what does this have to do with Big Data Analytics?

As I noted in my last post, there is the issue of veracity, the final V in the 4 Vs definition of Big Data. However, veracity is part of the larger problem of low decision agility found at many organizations. Low decision agility emerges from a number of factors, which in turn have implications for a big data analytics initiative at an organization. These factors and implications include:

  • Wrong data: Fortune, in this article, notes the obvious issue of data records that themselves contain "obsolete, inaccurate, and missing information". Consequently, the big data analytics initiative needs to assess the veracity of the underlying data to understand how much cleanup is required before meaningful insights can be drawn from it.
  • Disconnect between business and IT: The business has one view of the data and the IT folks see the data in a different way. So when you try to run a "simple" test, it takes a significant amount of time to reconcile the business's view of the data model with IT's view. To address this, there needs to be an ongoing effort to keep the users' view of the data in sync with IT's view, so that the big data analytics rely on data that matches the ultimate decision maker's view of the world.
  • Spreadsheet mania: Let's face it: organizations treat IT as an expense, not as an investment. Consequently, they rely on spreadsheets to do some of the heavy lifting for information processing because it is the path of least resistance. The overuse of spreadsheets can be a sign of an IT system that fails to meet the needs of its users. However, regardless of why they are used, the underlying problem is dealing with this vast array of business-managed applications that are often fraught with errors and sit outside the controls of production systems. The control and related data issues become obvious during compliance efforts, such as SOX 404, or major transitions to new financial/data standards, such as the move to IFRS. When developing big data analytics, how do you account for the information trapped in these myriad little apps outside of IT's purview?
  • Silo thinking: I remember the frustration of dealing with companies that lacked a centralized function with a holistic view of the data. Each department would know its portion of the processing rules but would have no idea of what happened upstream or downstream. Consequently, an organization needs to create a data governance structure that understands the big picture and can identify and address potential gaps in the data set before it is fed into the Hadoop cluster.
  • Heterogeneous systems: Organizations with a patchwork of systems require extra effort to get the data formatted and synchronized. InfoSec specialists deal with this issue of normalization when it comes to security log analysis: the event IDs, codes, etc. in logs extracted from different systems need to be "translated" into a common language to enable a proper analysis of events occurring across the enterprise. Big data analytics must perform a similar "translation" to enable analysis of data pulled from different systems. Josh Sullivan of Booz Allen states that "...training your models can take weeks and weeks" to recognize which content fed into the system actually represents the same value. For example, it will take a while for the system to learn that "female" and "woman" are the same thing when looking at gender data.
  • Legacy systems: Organizations may have legacy systems that do not retain data, are hard to extract from, and produce output that is difficult to import into other tools. Getting that data into a usable format costs time and money, which also needs to be factored into the big data analytics initiative.
  • Business rules and semantics: Beyond the heterogeneity between systems, there can also be a challenge in how something is commonly defined. A simple example is currency: in an ERP that spans multiple countries, the amount reported may be in the local currency or in dollars, and metadata is required to give the number that meaning. Another issue is that different user groups can define the same thing differently. For example, a sale for the sales/marketing folks may not mean the same thing as a sale for the finance/accounting group (e.g. the sales and marketing people may not account for doubtful accounts or incentives that need to be factored in for accounting purposes).
Of course, this is not an exhaustive list of issues, but it gives an idea of how the promise of analytics is obscured by the tough reality of the state of data.
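The "translation" and veracity problems in the list above can be made concrete with a small Python sketch. It assumes two hypothetical source systems, a hand-built code table that maps "F", "female" and "woman" to one value, and made-up exchange rates; a real initiative would need far richer mappings and proper data governance. The point is that every translation rule has to be written down somewhere, and anything that cannot be translated gets counted rather than silently passed through to the analysis.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical code table: each source system spells gender differently.
GENDER_MAP = {
    "f": "female", "female": "female", "woman": "female",
    "m": "male", "male": "male", "man": "male",
}

# Assumed, illustrative exchange rates to USD (not real market rates).
FX_TO_USD = {"USD": 1.0, "CAD": 0.8, "EUR": 1.25}


@dataclass
class Record:
    customer_id: str
    gender: Optional[str]
    amount_usd: Optional[float]


def normalize(raw: dict) -> Record:
    """Translate one raw record from any source system into the common format."""
    code = raw.get("gender")
    # Unrecognized or blank codes become None so they surface in the report.
    gender = GENDER_MAP.get(code.strip().lower()) if code else None
    amount, currency = raw.get("amount"), raw.get("currency")
    # Without currency metadata an amount has no reliable meaning, so drop it.
    if amount is not None and currency in FX_TO_USD:
        amount_usd = round(amount * FX_TO_USD[currency], 2)
    else:
        amount_usd = None
    return Record(raw.get("customer_id", ""), gender, amount_usd)


def veracity_report(records: list[Record]) -> dict:
    """Count records that still need cleanup before they feed any analysis."""
    return {
        "total": len(records),
        "missing_gender": sum(r.gender is None for r in records),
        "missing_amount": sum(r.amount_usd is None for r in records),
    }


# Two hypothetical source systems with different conventions.
erp_canada = [{"customer_id": "A1", "gender": "F", "amount": 100.0, "currency": "CAD"}]
crm_us = [
    {"customer_id": "B7", "gender": "woman", "amount": 50.0, "currency": "USD"},
    {"customer_id": "B8", "gender": "", "amount": 20.0},  # no currency recorded
]
records = [normalize(r) for r in erp_canada + crm_us]
print(veracity_report(records))  # {'total': 3, 'missing_gender': 1, 'missing_amount': 1}
```

Keeping the mapping tables as explicit data, rather than burying them in code, is one way to let the business side review and own the "translation" rules.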

In terms of the current state of data quality, a recent blog post by Michele Goetz of Forrester noted that 70% of the executive-level business professionals they interviewed spent more than 40% of their time vetting and validating data. (Forrester notes the following caveat about the data: "The number is too low to be quantitative, but it does give directional insight.")

Until organizations reach a state of high decision agility, where business users spend virtually no time vetting and validating the data, they may not be able to reap the full benefits of a big data analytics initiative.



Tuesday, December 23, 2014

How would you explain BigData to a business professional? (Updated)

Most people are familiar with the 4 Vs definition of Big Data: Volume, Variety, Velocity and Veracity. (If you are not, here is an infographic courtesy of IBM.)


I have written about big data in the past, specifically its implications for financial audits (here, here, and here) as well as for privacy. However, in a recent meeting where we were discussing big data, I found that business professionals understood what it was, but only in isolation from its operational implications. This is problematic, as the potential of big data is lost if we don't understand how big data has changed the underlying analytical technique.

But first we must look at the value perspective: how is big data different from the business intelligence techniques that businesses have used for decades?

From a value perspective, big data analytics and business intelligence (BI) ultimately have the same value proposition: mining the data to find trends, correlations and other patterns to identify new products and services or improve existing offerings and services.

However, what big data is really about is that the technological constraints that previously limited our analytical techniques no longer exist. What I am saying is that big data is more about how we can do analysis differently than about the data itself. To me, big data is a trend in analytical technique where volume, variety, or velocity is no longer an issue in performing analysis. In other words, to flip the official definition into an operational statement: the size, shape (e.g. unstructured or structured) and speed of the data are no longer impediments to your analytical technique of choice.

And this is where you, as a TechBiz Pro, need to weigh the merits of walking them through the technological advances in the NoSQL realm. That is, how did we go from the rows-and-columns world of BI to the open world of Big Data? Google is a good place to start. It is a pretty good illustration of big data techniques in action: using Google, we can extract information from the giant mass of data we know as the Internet (volume), within seconds (velocity), regardless of whether it's video, image or text (variety). However, Internet companies found the existing SQL technologies inadequate for the task, so they moved into the world of NoSQL technologies such as Hadoop (Yahoo), Cassandra (Facebook), and Google's BigTable/MapReduce. The details aren't really important; what matters is that these companies had to invent tools to deal with the world of big data.

And this leads to how it has disrupted conventional BI thinking when it comes to analysis.
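For readers who want a feel for what these tools do under the hood, here is a single-process Python sketch of the MapReduce programming model, using the classic word-count example. It only mimics the map, shuffle and reduce steps; it is not Hadoop's or Google's actual API, and a real cluster would spread the map and reduce work across many machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(doc: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every word in one document.
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group all emitted values by key, as the framework does between phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Combine all values collected for one key.
    return key, sum(values)


def word_count(docs: list[str]) -> dict[str, int]:
    # On a cluster, each document's map step could run on a different machine.
    pairs = (pair for doc in docs for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["big data big insight", "data beats opinion"]))
# {'big': 2, 'data': 2, 'insight': 1, 'beats': 1, 'opinion': 1}
```

Because each map call only looks at its own document and each reduce call only looks at one key, the work splits naturally across machines, which is what lets these systems scale with volume.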

From a statistical perspective, you no longer have to sample the data and extrapolate to the larger population. You can just load up the entire population, apply your statistical modeling imagination to it, and identify the correlations that are there. Chris Anderson of Wired noted that this is a seismic change in nothing less than the scientific method itself. In a way, what he is saying is that once you can put your arms around all the data, you no longer really need a model. He took a lot of heat for saying this, but he penned the following to explain his point:

"The big target here isn't advertising, though. It's science. The scientific method is built around testable hypotheses. These models, for the most part, are systems visualized in the minds of scientists. The models are then tested, and experiments confirm or falsify theoretical models of how the world works. This is the way science has worked for hundreds of years.

But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete. Consider physics: Newtonian models were crude approximations of the truth (wrong at the atomic level, but still useful). A hundred years ago, statistically based quantum mechanics offered a better picture — but quantum mechanics is yet another model, and as such it, too, is flawed, no doubt a caricature of a more complex underlying reality. The reason physics has drifted into theoretical speculation about n-dimensional grand unified models over the past few decades (the "beautiful story" phase of a discipline starved of data) is that we don't know how to run the experiments that would falsify the hypotheses — the energies are too high, the accelerators too expensive, and so on."

Science aside, the observation that Chris Anderson makes has big implications for business decision making. Advances in big data technologies can enable the deployment of statistical techniques that were previously not feasible and can yield insights without the need for model development. Statisticians and data scientists can play with the data and find something that works through trial and error. From a financial audit perspective, this has tremendous implications, once we figure out the data extraction challenge. And that's where veracity comes in, which is the topic of a future blog post.
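To make the "no more sampling" point concrete from an audit angle, here is a Python sketch that compares a sample-based test with a test over the entire population. The ledger is synthetic, and the "weekend posting" rule, the function names and the sample size are all hypothetical stand-ins for whatever test an auditor would actually run. With the full population loaded, every exception is listed directly instead of being estimated from a sample.

```python
import random
from dataclasses import dataclass


@dataclass
class Entry:
    entry_id: int
    amount: float
    weekday: int  # 0 = Monday ... 6 = Sunday


def is_exception(entry: Entry) -> bool:
    # Hypothetical audit rule: flag journal entries posted on a weekend.
    return entry.weekday >= 5


def make_ledger(n: int, seed: int = 42) -> list[Entry]:
    # Synthetic ledger in which roughly 1% of entries are posted on a weekend.
    rng = random.Random(seed)
    ledger = []
    for i in range(n):
        amount = round(rng.uniform(10, 10_000), 2)
        weekday = rng.choice([5, 6]) if rng.random() < 0.01 else rng.randrange(5)
        ledger.append(Entry(i, amount, weekday))
    return ledger


def sample_test(ledger: list[Entry], size: int, seed: int = 7) -> tuple[list[int], float]:
    # Traditional approach: test a random sample, then extrapolate a total.
    rng = random.Random(seed)
    sample = rng.sample(ledger, size)
    found = [e.entry_id for e in sample if is_exception(e)]
    estimated_total = len(found) / size * len(ledger)
    return found, estimated_total


def population_test(ledger: list[Entry]) -> list[int]:
    # Full-population approach: apply the same rule to every entry.
    return [e.entry_id for e in ledger if is_exception(e)]


ledger = make_ledger(100_000)
found, estimate = sample_test(ledger, size=60)
print(f"sample: {len(found)} found, ~{estimate:.0f} estimated in total")
print(f"population: {len(population_test(ledger))} actual exceptions")
```

With a rare exception like this one, a small sample often finds none at all, while the population test returns the actual list of entries to follow up on.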

But to close on a more practical level, companies such as Tesco are leveraging big data analytics to improve their bottom line. An example, courtesy of Paul Miller from the Cloud of Data blog/podcast site, is how Tesco extracted the following insight: “[a] 16 degree sunny Saturday in late April will cause a spike. Exactly the same figures a couple of weeks later will not, as people have had their first BBQ of the season”. In terms of overall benefits to the company, he notes “Big Data projects deliver huge returns at Tesco; improving promotions to ensure 30% fewer gaps on shelves, predicting the weather and behaviour to deliver £6million less food wastage in the summer, £50million less stock in warehouses, optimising store operations to give £30million less wastage.”

Wednesday, December 17, 2014

SEC and the Quants: Will RoboCop get a BigData overhaul?

As reported in this Forbes article in 2013, the SEC began using the so-called "RoboCop" to assist with its regulatory duties.

Who is RoboCop?


No, it's not that infamous crime-fighting cyborg from the late 1980s (coincidentally remade in 2014). It is actually the Accounting Quality Model (AQM): not quite as exciting, I know. According to Forbes:

"AQM is an analytical tool which trawls corporate filings to flag high-risk activity for closer inspection by SEC enforcement teams. Use of the AQM, in conjunction with statements by recently-confirmed SEC Chairman Mary Jo White and the introduction of new initiatives announced July 2, 2013, indicates a renewed commitment by the SEC to seek out violations of financial reporting regulations. This pledge of substantial resources means it is more important than ever for corporate filers to understand SEC enforcement strategies, especially the AQM, in order to decrease the likelihood that their firm will be the subject of an expensive SEC audit."

Another interesting point raised by the Forbes article is the use of XBRL in this accounting model: "AQM relies on the newly-mandated XBRL data which is prone to mistakes by the inexperienced. Sloppy entries could land your company’s filing at the top of the list for close examination."

(On a side note: the AICPA has published this study to help XBRL filers ensure that they are preparing quality statements, given that there are many possible errors, as noted in this study.)
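The Forbes article does not describe how the AQM works internally, so the following is not the AQM. It is a hypothetical Python sketch of the kind of mechanical consistency check that makes sloppy XBRL entries easy to spot and rank. It assumes a filing's facts have already been extracted into a plain dictionary keyed by US GAAP element names such as `Assets` and `LiabilitiesAndStockholdersEquity`; real XBRL parsing, contexts, units and scaling are out of scope, and the company names are made up.

```python
from typing import Mapping


def check_filing(facts: Mapping[str, float], tolerance: float = 1.0) -> list[str]:
    """Return human-readable issues found in one filing's tagged values."""
    issues = []
    assets = facts.get("Assets")
    total = facts.get("LiabilitiesAndStockholdersEquity")
    if assets is None or total is None:
        issues.append("balance sheet totals not tagged")
    elif abs(assets - total) > tolerance:
        # A scale error (e.g. thousands entered as units) would show up here.
        issues.append(f"balance sheet does not balance: {assets} vs {total}")
    for tag in ("Assets", "Liabilities"):
        value = facts.get(tag)
        if value is not None and value < 0:
            # A sign error, for example a balance tagged as negative.
            issues.append(f"{tag} reported as negative: {value}")
    return issues


def rank_for_review(filings: Mapping[str, Mapping[str, float]]) -> list[tuple[str, int]]:
    """Order filers so the ones with the most issues land at the top of the list."""
    scores = [(name, len(check_filing(facts))) for name, facts in filings.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical filers: one clean, one with a scale error and a sign error.
filings = {
    "Careful Corp": {"Assets": 5_000_000, "Liabilities": 3_000_000,
                     "LiabilitiesAndStockholdersEquity": 5_000_000},
    "Sloppy Inc": {"Assets": 5_000_000, "Liabilities": -3_000_000,
                   "LiabilitiesAndStockholdersEquity": 5_000},
}
print(rank_for_review(filings))  # [('Sloppy Inc', 2), ('Careful Corp', 0)]
```

Even checks this simple are cheap to run across every filing at once, which is why data-entry mistakes in XBRL are likely to be noticed by an automated screen.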

Within this context, we should take note of how the SEC is hiring "quantitative analysts" (or "quants" for short). As noted in this WSJ article:

"And Wall Street firms, for their part, are able to offer quantitative analysts—or “quants”—far higher pay packages than the regulator. The SEC’s access to market data also remains limited. In 2012, it approved a massive new computer system to track markets, known as the Consolidated Audit Trail, but the system isn’t likely to come online for several years, experts say."

Could the SEC pull a fast one and become the source of innovation? Although the WSJ article seems to downplay the possibility that the SEC can outpace the firms, it is not something that the audit industry can ignore.

As noted in a previous post on Big Data, it was just this type of mindset that Mike Flowers of New York City used to revolutionize how NYC leveraged big data to improve its "audit" of illegal conversions. Perhaps the SEC will follow his lead.

Thursday, December 11, 2014

Time for Windows 10? I can't wait!

I have been overly optimistic about Windows in the past, but hear me out!

Boy Genius published a post on Windows 10, which Microsoft is releasing next year. (Note that Microsoft decided to skip Windows 9 altogether.)

And does it look good. As can be seen in this video, it will feature Cortana, Microsoft's personal digital assistant, which incorporates voice search, voice commands (i.e. you can get Cortana to set up an appointment for you) and machine learning (i.e. it learns from your interactions).
Impressed?

It speaks to the overall move towards using natural language processing (NLP) and elements of artificial intelligence. Apple was arguably first on the scene with Siri. However, IBM's Watson is an even clearer example of where this technology is heading. Gartner refers to these types of technologies as "smart machines", which it claims have the following implications:

"Most business and thought leaders underestimate the potential of smart machines to take over millions of middle-class jobs in the coming decades," said Kenneth Brant, research director at Gartner. "Job destruction will happen at a faster pace, with machine-driven job elimination overwhelming the market's ability to create valuable new ones."

Will Cortana take your job? Well, let's just enjoy the possibility that Microsoft may build some really cool NLP technology into your everyday computer, and worry about that one in a future post!



Tuesday, December 9, 2014

Europe vs Google et al: Long term ramifications of the Snowden Revelations?

The Wall Street Journal had an interesting piece today discussing the "clash that pits [European] governments against the new tech titans, established industries against upstart challengers, and freewheeling American business culture against a more regulated European framework". For example, "[t]he European Parliament in late October called on Internet companies operating in the region to “unbundle” its search engines from its other commercial properties". The obvious company that would be impacted by this is Google (and the WSJ article notes that Microsoft is aiding and abetting such calls to help boost its own profile).

However, the WSJ article notes: "And perhaps most fundamentally, it is about control of the Internet, the world’s common connection and crucial economic engine that is viewed as being under the sway of the U.S. This exploded following the revelations by Edward Snowden of widespread U.S. government surveillance of Americans and Europeans—sometimes via U.S. company data and telecommunications networks."

This would not be the first article to note that the Snowden revelations have put a chill on the move to the (US) cloud. However, it does highlight how far the revelations have gone in forcing the hand of European regulators to at least appear publicly to be doing something to protect their companies' data.

What the article did not go into in much detail is the likely reason the Europeans are concerned. Although it may be presented as an issue of privacy or anti-surveillance, the likely real reason is industrial espionage. As per the Snowden revelations, governmental spy agencies are not just interested in obtaining information on matters relating to national security, but also in obtaining data related to international trade or other business dealings. As noted by the CBC, “NSA does not limit its espionage to issues of national security and he cited German engineering firm, Siemens as one target”. It is unfair to single out the US for such actions, as other governments do it as well. For example, Canada’s CSEC is also alleged to be involved in similar activity, with The Globe & Mail reporting that “Communications Security Establishment Canada (CSEC) has spied on computers and smartphones affiliated with Brazil’s mining and energy ministry in a bid to gain economic intelligence.” Former Carleton University Professor Martin Rudner explains (in the same G&M article) that the objective of such surveillance is to give the Canadian government a leg up during negotiations, such as NAFTA.

Although most have forgotten the commercial rivalries that exist between the G8 nations (see the quote from then US president Woodrow Wilson about the roots of international conflict), it is important to understand the implications this has for data security in the cloud. Anything that is sensitive and relevant to business dealings should never be put on the cloud. Of course, what constitutes "sensitive" is a matter of judgment, but the criteria can effectively be "reverse engineered" from what was revealed.

Friday, December 5, 2014

Remembering those Blackberry days

The Globe and Mail reported on BlackBerry's latest approach to rebuilding its mobile user base. The company is offering a $400 trade-in credit plus a $150 gift card to anyone who trades in their iPhone for the rather odd, square-shaped Passport. Here is The Verge's review of the device:


Coincidentally, I came across an old BlackBerry of mine: the Torch. I remember thinking, after using the device, that it was the perfect compromise between a touch screen and the classic keyboard. However, that feeling faded quite quickly: the device was underpowered compared to the competition and, of course, it lacked the apps that you could find in Apple's App Store. But at the time I could never imagine giving up the physical QWERTY keyboard.

Since then I have moved on to Android, and more specifically to the SwiftKey keyboard, to the point where I can't go back to a physical keyboard!

How did BlackBerry fail to keep up with the times?

As noted in this article, Mike Lazaridis, the company's founder and former co-CEO, was inspired to develop the BlackBerry when he recalled his teacher's advice while watching a presentation in 1987 (almost a decade before the Internet went mainstream) on how Coke used wireless technology to manage the inventory in its vending machines. What was his teacher's advice? Not to get swept up in the computer craze, as the real boon lay in integrating wireless technology with computers.

BlackBerry took the corporate world by storm when it introduced its smartphones in 1998. It went on to dominate the corporate smartphone market as the gold standard in mobile communications. The following graphic from Bloomberg captures the subsequent rise and fall quite well:


What happened? How did the iPhone, unveiled in 2007, and the Android operating system outflank the BlackBerry? This article in the New Yorker largely blames BlackBerry's inability to understand the trend of "consumerization of IT": users wanted to use their latest iPhone or Android device instead of a BlackBerry in the corporate environment, and it was only a matter of time before the technology made this possible.

Luminaries such as Clay Christensen have written extensively on the challenge of innovation, and there's always the problem of hindsight bias. But is the problem more basic? When we look at the financial crisis, some people like to blame poor modeling. But I think that is more convenient than accepting the reality that people got swept up in the wave.

Isn’t it fair to say that people knew the house of cards was going to come down (and some of the investment banks were even betting on it falling apart), but were overly optimistic that they would get out before everyone else?

But that’s the point.

When we are surrounded by people who confirm our understanding of the world, we may believe them instead of checking whether our understanding of the situation is correct. With the housing bubble, the key players wanted to believe that their models were correct, even though models had already failed the infamous Long-Term Capital Management. With BlackBerry, what was it? Did they think their hold over corporate IT was unassailable? What I wonder is whether they even looked at their own families and the people around them who were using iPhone or Android devices. Weren’t they curious what “all the fuss was about”?

Although this is a problem for many of us who want to believe that the present situation will continue indefinitely (especially when things are going our way), there are others who do stay on top of things. Most notable is Encyclopedia Britannica, which stopped issuing physical encyclopedias and moved to the digital channel instead.

Change is a challenge, but the key is to be prepared to admit that the current way of doing things can be done better, faster and in a radically different way.