Big Data and analytics are, as we keep being told, hot, hot, hot. Trolling through masses of input from as many sources as we can grab will provide all sorts of valuable insights, and help us grow our businesses. But just as numerically illiterate (or maybe sneaky) marketers don’t seem to grasp that 9000% of nothing is still nothing – hence some impressive-sounding “growth” claims that mean zip – people haven’t quite wrapped their brains around the fact that when you analyse rubbish, you get statistically significant rubbish, also worth zip.
Yet despite this painful truth, vendors of all sorts of products keep flogging the notion that analytics is king, without even a peep about data quality. Granted, it’s probably because many don’t offer tools to validate and cleanse the data and so have nothing to talk about, but the result is often unfulfilled expectations for users.
That’s why it was refreshing to hear about Information Builders’ “3 Is”: Intelligence, Integrity, and Integration. The company not only sells analytics, it also makes sure that the data being analysed is clean.
At a recent breakfast event, the company told attendees that over half of respondents in InformationWeek’s 2015 BI and Analytics survey cited data quality issues as the main barrier to analytics and BI success, topping ease of use, and by far trumping lack of analytics talent and expertise.
The issues, according to a report by Ventana Research, include the fact that the data is spread across too many places, that there’s no single version of the truth, that the data isn’t timely enough, and that it’s just plain not clean enough to use.
“Five or ten years ago, they said data integration would be simpler,” noted Jeremy Ballanco, VP of iWay software sales at Information Builders, “but it’s actually more complex.”
And, he added, there’s a lot of competition in the business intelligence (BI) and analytics space, mostly from companies that only focus on the BI and analytics, not the integration of the data. Yet with so much data coming from the outside that needs to be connected, there’s a huge challenge. People want information now.
“We’re in the smartphone generation,” Ballanco said. “People don’t want to wait 24 hours for ETL (extract, transform, load) – they want it right away. We need to deal with data in real time. But, at the same time, ensuring the integrity of the data adds value. We can build reports until the cows come home, but if the data is suspect, so are the results. It needs to be standardized and governed for it to have the most value. And data integrity is not a one-time process; it needs to be ongoing.”
“Data needs to be treated as a business asset,” he said. And since it’s a business asset, it’s owned by the business, not by IT.
Yet despite its importance, Gartner says that 25 percent of core business data is flawed. That means that potentially one quarter of decisions based on analyses are also suspect. “Do you want to make good decisions only 75 percent of the time?” Ballanco asked.
However, not all data is equal, he added. Companies should put their efforts into the data that matters the most, since it’s virtually impossible to fix everything. That means investing in master data management (MDM) and governance processes.
Why? Because with data scattered around the company, there’s no one version of the truth. There’s duplicate data, often incomplete, that some employees use, and a tremendous amount of time wasted trying to fix the same data issues, over and over, in those duplicate datasets. Ballanco dislikes the term MDM, though; he doesn’t think it’s meaningful to most people. Instead, he suggests “single view of the customer,” or “golden record,” to describe that one record that contains the correct, complete picture of the customer that’s actively managed and kept up-to-date.
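To make the “golden record” idea concrete, here is a minimal sketch in Python of merging duplicate customer records into one. It assumes a simple survivorship rule (for each field, the most recently updated non-empty value wins); the function name, field names, and sample data are invented for illustration and are not drawn from any Information Builders product.

```python
from datetime import date


def build_golden_record(records):
    """Merge duplicates: for each field, keep the newest non-empty value."""
    golden = {}
    # Process oldest first, so newer non-empty values overwrite older ones.
    for rec in sorted(records, key=lambda r: r["updated"]):
        for name, value in rec.items():
            if name == "updated":
                continue  # bookkeeping field, not part of the customer picture
            if value not in (None, ""):
                golden[name] = value
    return golden


# Two incomplete, hypothetical copies of the same customer.
duplicates = [
    {"name": "A. Smith", "email": "asmith@example.com", "phone": "",
     "updated": date(2014, 3, 1)},
    {"name": "Alice Smith", "email": "", "phone": "555-0100",
     "updated": date(2015, 6, 15)},
]

golden = build_golden_record(duplicates)
```

Neither source record is complete on its own, but the merged record carries the newer name and phone number alongside the older email address. Real MDM tools apply far richer matching and survivorship rules; this only shows the shape of the problem.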
“It’s harder to sell ‘MDM’ than ‘customer centricity’,” he observed. “Use words that resonate with stakeholders in the organization.” For IT, he suggested selling the idea of needing fewer staff for data cleansing, and improved productivity of standards-based application development. For finance and corporate, the hot button could be compliance, enhanced and accurate reporting, or improved decision-making.
Regardless, the process has to entail collaboration between IT and the business. “If the business is not in tune, things will not continue to get better,” Ballanco stated. “Focus on the end state will be lost.”
That requires tools, and Ballanco cited Information Builders’ new Omni-Gen platform, which helps automate the creation of golden records according to accepted best practices, as an example of the functionality to look for. There needs to be complete auditability, he said, so years later it’s still easy to discover where a data element came from, what in it has been changed over time, and by whom. It needs to be simple for business users, and it needs to provide a way to flag elements that look wrong so they can be verified and fixed if necessary.
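The auditability and flagging requirements can be sketched in a few lines of Python. This is an illustrative toy, not how Omni-Gen works: the `AuditedRecord` class, its methods, and the sample values are all hypothetical. It assumes every change is appended to a history list along with its source, author, and timestamp, and that suspicious fields are flagged with a reason.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditedRecord:
    """Hypothetical golden record that remembers every change made to it."""
    values: dict = field(default_factory=dict)
    history: list = field(default_factory=list)  # append-only audit trail
    flags: dict = field(default_factory=dict)    # field name -> why it looks wrong

    def set_value(self, name, value, source, changed_by):
        # Log what changed, where it came from, who changed it, and when.
        self.history.append({
            "field": name,
            "old": self.values.get(name),
            "new": value,
            "source": source,
            "changed_by": changed_by,
            "changed_at": datetime.now(timezone.utc),
        })
        self.values[name] = value

    def flag(self, name, reason):
        # Mark a data element for a person to verify and fix if necessary.
        self.flags[name] = reason

    def lineage(self, name):
        # Every recorded change to one data element, oldest first.
        return [entry for entry in self.history if entry["field"] == name]


record = AuditedRecord()
record.set_value("postcode", "M5V 2T6", source="crm", changed_by="loader")
record.flag("postcode", "does not match billing address")
record.set_value("postcode", "M5V 3L9", source="billing", changed_by="jsmith")
trail = record.lineage("postcode")
```

Years later, `trail` still answers the three questions Ballanco raised: where the value came from, what it was before, and who changed it.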
Not only do these golden records provide a single version of the truth, they can also provide context to data in Hadoop data lakes, he added, bridging the gap between the traditional world of data warehouses, OLTP, OLAP, and data marts and the modern Big Data world of data lakes, the Internet of Things, and streaming data.
“It will establish trust in Big Data,” he concluded. And Big Data is what feeds much of today’s analytics.