Wikibon: Big Data Adds Complexity, Nuance to the Data Quality Equation

InsightaaS: Jeff Kelly of Wikibon, the open source community of world-leading storage analysts, questions whether the “GIGO” principle that governs the utility of smaller data sets can be meaningfully applied in a big data environment.


There’s an old saying in the data management world: garbage in, garbage out, or GIGO. It means that the results of any data analysis project are only as good as the quality of the data being analyzed. Data quality is of critical importance when data sets are relatively small and structured. If you only have a small sample of data on which to perform your analysis, it had better be good data. Otherwise, the resulting insights aren’t insights at all.

But does GIGO apply in Big Data scenarios? From a purely practical standpoint, is it realistic to cleanse and scrub data sets that reach hundreds of terabytes, or even petabytes, to the same degree as smaller data sets? And since most Big Data lacks traditional structure, just what does data quality look like?
