For many organizations, Big Data is “a riddle wrapped in a mystery inside an enigma,” but as the celebrated British wartime statesman advised, “difficulties mastered are opportunities won.” Potential opportunity in Big Data has been defined, many times, as the ability to gain new business insight from analysis of new and existing sources of data – in large volume. But how is this achieved, what do businesses need to understand about the relative advantages of Big Data vs. traditional data architectures, and how does the former lend itself to analytics that can unlock incremental business value? These questions and more were addressed at a Toronto lunchtime session hosted by partners in Big Data crime, Cloudera and Information Builders (IB).
Cloudera burst onto the IT scene back in 2009 with a commercial distribution of Hadoop software, which took an open source version of the file system and MapReduce technology developed by Google to run rapid searches on vast data stores and adapted it to enterprise customer needs. While Hadoop remains a free, Java-based programming framework for distributed storage and processing across clusters of commodity servers, Cloudera has bundled core elements of Hadoop that provide scalable processing of large data sets with security, high availability and integration capabilities, and added support and consulting services to great effect. The Silicon Valley startup has garnered considerable investment interest – including a $750 million cash infusion in 2014 by Intel, which now owns 18 percent of the company – and significant market traction. According to Brian Rowatt, Cloudera sales director for Canada, Cloudera holds 65 percent of the market, making it the largest open source provider of Hadoop, and is installed in over 90 percent of financial institutions. In other words, Cloudera is well positioned to speak to issues and emerging opportunity in Big Data.
Cloudera’s Rowatt kicked off the session by asking why Big Data is putting strain on traditional data architectures. Some of his answers will be familiar to IT managers and the businesses they serve: while the volume of structured data is growing linearly, unstructured data is growing exponentially, driven by increasing regulatory compliance requirements, increasing implementation of the “concept of instrumentation” (with its pesky sensor data that does not lend itself to relational database environments), and a growing business trend towards “experimentation,” or the search for new patterns in data that can produce new business outcomes. But Rowatt’s key message for traditional enterprises may be less familiar: “Big Data is the data you have [13 copies are typically kept by banks],” he observed, underscoring the fact that data challenges are here and now, and are felt not just by the Internet giants. Straining under the weight of data volume, enterprises face additional issues with storage that can test traditional data management practices, he added. And while data privacy and compliance are important system-wide requirements that must be addressed, organizations are hard pressed to decide what data to save. According to Rowatt, the typical enterprise saves only 12 percent of the unstructured data it collects, a practice that necessarily limits the kind of insights that may be gleaned from this information.
So how do new architectures help? Essentially, by turning on its head the traditional data management paradigm, in which applications are federated out to structured databases. While traditional approaches can push out only one terabyte of data before an application crashes, Rowatt argued that with the Hadoop approach, it’s possible to provide unlimited storage of both structured and unstructured data in perpetuity (obviating the need for archiving), and to “bring compute to the data” by running logic on storage clusters. An additional advantage of the Cloudera approach is the ability to apply security, including perimeter, data access, etc., in the cluster, native with the application, a capability that Rowatt claims is unique in the marketplace.
To demonstrate the impact of Hadoop-based storage technology, Rowatt introduced a number of Cloudera case studies designed to show “the steak” (new operational efficiencies) and its “sizzle” (data monetization). Allstate, for example, has been able to process 80 years’ worth of data (500 times what was previously possible) 75 times more quickly, running on commodity hardware; FINRA, which evaluates 30 billion market events per day, has managed to save $10-20 million on an annual basis; and eBay, which implemented Cloudera to optimize search, estimates a 2 percent increase to top line revenues.
This enhanced storage and processing capability aligns well with Information Builders’ shared focus on the “steak.” As Don Grady, director of channels for Information Builders, noted at the event, “Cloudera is a great partner because of its ability to manage the volume and velocity of real time, streaming data,” which is becoming ubiquitous with new applications such as the connected car. Information Builders is in the business of data analysis: with over 40 years of experience helping customers manage the “three ‘I’s” – data Integrity, Integration and Intelligence – Information Builders is also well qualified to speak to Big Data challenges, and to a fourth ‘I’ – data Impact. In Grady’s view, business “impact” is achieved when insights generated through analysis of massive amounts of data are delivered to hundreds of users within the organization who can turn this information into action.
To explain how the IB analytics platform can help organizations achieve business impact, a second Grady – son Dan, who acts as social media analytics and enterprise search sales manager for Information Builders – offered the packed lunchtime crowd a walk-through of IB capabilities designed to deliver actionable Big Data information to multiple users. As the excerpted video of Grady’s presentation below shows, IB divides analytics solutions into “tools,” which are powerful but limited in their use to specific user groups, such as the analyst or the ‘data scientist’, who have specialized knowledge but are rare, and the “app,” which can be embedded into operational processes to present data to many more users. In Grady’s view, impact, a more rapid return on analytics investment or even incremental revenue streams are achieved when data is delivered within operational applications to internal workers, partners and even customers, as in the case of IB and Cloudera customer YellowPages. And the elements of operationalizing Big Data? IB’s cornerstone “three ‘I’s” – the delivery of business intelligence/predictive analytics insights, developed through integration of the “Cloudera hub” and/or other enterprise data sets, to operational applications built on data integrity.