Context: Paul Lewis is a CTO with a deep interest in Big Data. Now with Hitachi Data Systems after holding IT executive positions at Davis + Henderson and other firms, Lewis has a deep understanding of the importance of data to business strategy, and the relationship between that data and storage strategies, and between storage strategies and IT budgets.
In this video blog, Lewis starts by describing situations that may require attention but which aren’t Big Data problems: situations that don’t meet the “five Vs” definition of Big Data. He illustrates several scenarios that involve data, but not Big Data: a single application that produces less than 1 TB of data per year, an environment that has hundreds rather than thousands of transactions per day, a data warehouse that is accessed via a limited number of queries, and a data repository based entirely on outputs from company-owned applications. Lewis goes on to say that “Big Data, like cloud, is not an implementation of technology, nor does it have an explicit set of solutions. Big Data defines a class of information management problems that are difficult or impossible to effectively solve using conventional tools and techniques.” In the video, Lewis explains how each of the “five Vs” (volume, velocity, variety, veracity and value) creates a need for, and is addressed by, Big Data solutions.
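The "not Big Data" scenarios Lewis lists can be read as a rough screening checklist. The sketch below is only an illustration of that reading, not a method Lewis presents: the `Workload` fields, the function name, and the cut-off of 1,000 transactions per day (standing in for "hundreds rather than thousands") are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workload:
    """Hypothetical description of a data environment (all fields are assumptions)."""
    tb_per_year: float          # new data produced per year, in terabytes
    transactions_per_day: int   # average daily transaction count
    limited_queries: bool       # warehouse is accessed via a limited set of queries
    all_sources_internal: bool  # all data comes from company-owned applications


def not_big_data_signals(w: Workload) -> List[str]:
    """Return the 'not a Big Data problem' scenarios that this workload matches."""
    signals = []
    if w.tb_per_year < 1.0:
        signals.append("less than 1 TB of data per year")
    if w.transactions_per_day < 1000:  # assumed cut-off for "hundreds, not thousands"
        signals.append("hundreds rather than thousands of transactions per day")
    if w.limited_queries:
        signals.append("data warehouse accessed via a limited number of queries")
    if w.all_sources_internal:
        signals.append("repository based entirely on company-owned applications")
    return signals


# Usage: a small internal reporting system matches every scenario.
small = Workload(tb_per_year=0.4, transactions_per_day=300,
                 limited_queries=True, all_sources_internal=True)
for signal in not_big_data_signals(small):
    print("Not a Big Data signal:", signal)
```

A workload that matches most of these signals probably has a conventional data or storage problem rather than a Big Data one; an empty result only means none of the listed scenarios apply, not that Big Data tooling is required.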
Three keys:
- The absence of a “Big Data” problem doesn’t imply the absence of storage challenges. In the video, Lewis notes that storage managers constantly struggle with many issues, such as failed backups, managing within and across silos of heterogeneous storage, the difficulty of keeping pace with compliance requests, and simply understanding the types and purpose of stored data. These aren’t “Big Data problems” specifically, though — they are storage problems common to many different environments.
- Big Data problems are seen in environments that are beyond the structured control of the IT department. Lewis uses a series of questions to identify situations that indicate that a firm does not have a Big Data issue — but these same questions indicate cases where Big Data is an appropriate business response. Examples include environments where the IT team can’t keep pace with information management project demands, where existing BI tools are incapable of meeting user needs, where those needs themselves are beyond IT’s ability to categorize and define, where immediate feedback is required, and where at least some of the data sources required to meet user needs are created outside of the organization, and are not well-defined or understood.
- Deployment reference architectures are important. Lewis states that deployment reference architectures “are likely available for your specific industry.” Technical components should include cloud-delivered infrastructure elements, structured and unstructured data sources, ETL, real-time streams, real-time structured databases and real-time processing, interactive analytics, and batch processing. These elements can be combined via individual software products, appliances, or converged Big Data workload-specific solutions (such as Hitachi’s Unified Compute Platform).
The bottom line
There is currently “more smoke than fire” around Big Data. Thanks to examples that illustrate the business benefits that analytics based on Big Data can deliver to companies, many business leaders would like to jump headlong into their own Big Data analytics initiatives. However, there is a disconnect within this phrase: “Big Data” and “analytics” are not a single concept. As research from Techaisle has shown, analytics/Business Intelligence is a “Business-led” solution area, one where business sets direction and defines outputs, and IT acts in a supporting role — and as Lewis points out in his video blog, this kind of analysis can in many cases be conducted without access to a wide variety of high-velocity, voluminous data; it can be conducted on a relatively low volume of structured data from internal systems. This approach to analytics has its own challenges, and can yield substantial business benefit. But it isn’t, as Lewis points out, a “Big Data” problem.
Firms looking to base their analytics on the Big Data sources that Lewis describes in the second half of his video have two discrete challenges: they need not only to use analytics tools to extract value from the data (and to ensure that the data itself is trustworthy), but also to aggregate, store and process large volumes of fast-moving data of many types and from many sources. As the Techaisle chart above (based on a survey of 635 Canadian IT and business leaders) illustrates, unlike analytics/BI, Big Data is not a “Business-led” type of solution. It requires collaboration between IT and business: the business leaders define the operational use for the different data sources, but the IT department is intrinsic to the initiative, building the systems that are capable of responding to these new demands.