Across the Net

Wikibon: Big Data Adds Complexity, Nuance to the Data Quality Equation

2013-07-18

InsightaaS: Jeff Kelly of Wikibon, the open source community of world-leading storage analysts, questions whether the “GIGO” principle that governs the utility of smaller data sets can be meaningfully applied in a big data environment.

There’s an old saying in the data management world: garbage in, garbage out, or GIGO. It means that the results of any data analysis project are only as good as the quality of the data being analyzed. Data quality is of critical importance when data sets are relatively small and structured. If you only have a small sample of data on which to perform your analysis, it better be good data. Otherwise, the resulting insights aren’t insights at all.

But does GIGO apply in Big Data scenarios? From a purely practical standpoint, is it realistic to cleanse and scrub data sets that reach the hundreds of terabytes to petabytes level to the same degree as smaller data sets? And since most Big Data lacks traditional structure, just what does data quality look like?

Read the entire post: http://wikibon.org/blog/big-data-adds-complexity-nuance-to-the-data-quality-equation/
