Monday, December 29, 2014

Low Decision Agility: BigData's Insurmountable Challenge?

Working in the field of data analytics for over decade there is one recurring theme that never seems to go away: the overall struggle organizations have with getting their data in order.

Courtesy of this link
Although this is normally framed in terms of data quality and data management, it's important to link this back to the ultimate raison d'etre for data and information: organizational decision making. Ultimately, an organization has significant data and information management challenges it culminates into a lack of "decision agility" for executive or operational management. I define decision agility as follows:

"Decision agility is the ability of an entity to provide relevant information 
to a decision maker in a timely manner." 

Prior to getting into the field, you would think that with all the hype of the Information Age it would be easy as pressing a button for a company to get you the data that you need to perform the analysis you need to do. However, after getting into the field, you soon realize how wrong this thinking: most organizations have low-decision agility.

I would think it is fair to say that this problem hits those involved in external (financial) audits the hardest. As we have tight budgets, low-decision agility at the clients we audit makes it cost-prohibitive to perform what is now known as audit analytics (previously known as CAATs). Our work is often reigned in by the (non-IT) auditors running the audit engagement because it is "cheaper" do the same test manually rather than parse our way through the client's data challenges

So what does this have to do with Big Data Analytics?

As I noted in my last post, there is the issue of veracity - the final V in the 4 Vs definition of Big Data. However, veracity is part of the larger problem of low decision agility that you can find at organizations. Low-decision agility emerges from a number of factors and can have implications on a big data analytics initiative at an organization. These factors and implications include:

  • Wrong data:  Fortune, in this article, notes there is the obvious issue of "obsolete, inaccurate, and missing information" data records itself. Consequently, the big data analytics initiative needs to assess the veracity of the underlying data to understand how much work needs to be done to clean up the data before meaningful insights can be drawn from the data. 
  • Disconnect between business and IT: The business has one view of the data and the IT folks see the data in a different way. So when you try to run a "simple" test it takes a significant amount of time to reconcile business's view of the data model to IT's view of the data model. To account for this problem there needs to be some effort in determining how to sync the user's view of the data and IT's view of the data on an ongoing basis to enable the big data analytic to rely on the data that sync's up with the ultimate decision maker's view of the world.  
  • Spreadsheet mania: Let's fact it: organizations treat IT as an expense not as an investment. Consequently, organizations will rely on spreadsheets to do some of the heavy lifting for the information processing because it is the path of least resistance. The overuse of spreadsheets can be a sign of an IT system that fails to meets the needs of the users. However, regardless of why they are used, the underlying problem is dealing with these vast array of business-managed applications that are often fraught with errors and outside the controls of production system. The control and related data issues become obvious during compliance efforts, such as SOX 404 or major transitions to new financial/data standards, such as the move to IFRS. When developing big data analytics, how do you account for the information trapped in these myriad little apps outside of IT's purview? 
  • Silo thinking: I remember the frustration of dealing with companies that lacked a centralized function that had a holistic view of the data. Each department would know it's portion of the processing rules, etc. but would have no idea of what happened upstream or downstream. Consequently, an organization needs to create a data governance structure that understands the big picture and can identify and address the potential gaps in the data set before it is fed into the Hadoop cluster.  
  • Heterogenous systems: Organizations with a patch-work of systems require extra effort from getting the data formatted and synchronized. InfoSec specialists deal with this issue of normalization when it come to security log analysis: the security logs that are extracted from different systems need to have the event IDs, codes, etc. "translated" into a common language to enable a proper analysis of events that are occurring across the enterprise. The point is that big data analytics must also perform a similar "translation" to enable analysis of data pulled from different systems. Josh Sullivan of Booz Allen states: "...training your models can take weeks and weeks" to recognize what content fed into the system are actually the same value. For example, it will take a while for the system to learn that female and woman are the same thing when looking at gender data. 
  • Legacy systems:  Organizations may have legacy systems which do not retain data, are hard to extract from and difficult to import into other tools. Consequently, this can cost time and money to get the data into a usable format that will also need to be factored into the big data analytics initiative.
  • Business rules and semantics: Beyond the heterogenity differences between systems there can also be a challenge in how something is commonly defined. A simple example is currency: an ERP that expand multiple countries the amount reported may be in the local currency or the dollar, but requires the metadata to give that meaning. Another issue can be that different user group define something different. For example, for a sale for the sales/marketing folks may not mean the same thing as a sale for the finance/accounting group (e.g. the sales & marketing people may not account for doubtful accounts or incentives that need to be factored in for accounting purposes). 
Of course these are not an exhaustive list of issues, but it gives an idea of how the reality of analytics is obscured the tough reality of state of data.  

In terms of the current state of data quality, a recent blog post by Michele Goetz of Forrester noted that 70% of the executive level business professionals they interviewed spent more than 40% of their time vetting and validating data. (Forrester notes the following caveat about the data: "The number is too low to be quantitative, but it does give directional insight.")

Until organizations get to a state of high decision agility - where business users spend virtually no time vetting/validating the data - organizations may not be able to reap the full benefits of a big data analytics initiative. 



No comments: