A deep dive into predictive analytics

Data is the new gold, but how do we mine it for actionable insight into business potential? Increasingly, organizations are turning to analytics, and to a specific subset of that, predictive analytics, in order to capture and create previously unimagined sources of value. Analysis of business information is not new. Rooted in decision support systems that emerged in the 1960s, Executive Information Systems, OLAP and data warehousing began to take shape in the late 1980s, morphing into business intelligence solutions by the late 90s. Today, powerful BI systems serve as the foundation for the analysis and sharing of information in most organizations. But predictive analytics represent a horse of a different colour. According to Peter O’Grady, Information Builders director of global product marketing, while business intelligence is a technology that typically enables review of historical data on internal processes, predictive analytics look forward rather than back. And because the better predictive analytics solutions combine new and existing information sources (from inside and outside the organization) in a unique way, they enable users to ask more, and different, questions of the data.

If the promise of predictive analytics is gaining recognition, the mechanics of the technology’s use are less well understood. In a webinar on Predictive Analytics for Better Decision Making, Bruce Kolodziej, analytics manager with Information Builders (IB), attempted to change that with a presentation outlining key features of the IB platform for building, deploying and testing predictive models, along with a sampling of applications that can deliver analytics results. Kolodziej’s talk concluded with a software demo (available here) showing how historical data is captured, patterns are identified, and a scoring model is delivered to an application to support better decision-making.

Kolodziej explained that, compared to traditional BI, which relies on KPI alerts, drill-down queries and reporting, predictive analytics employs more advanced techniques, including statistical analysis, forecasting and extrapolation, predictive modeling and optimization, to understand why something is happening, what will happen if this trend continues, what will happen next and, ultimately, what is the best that can happen. "We employ techniques for pattern recognition," he added, "allowing a mathematical model to find patterns in data that we can then exploit on new data to make predictions about particular behaviours, whether it is customer churn, or fraud, or risk or acquisition targets." The IB platform combines both approaches, the ‘what’ view into historical and real-time data and the forward-looking ‘why’ provided by predictive techniques, merging them into a single application to support data-driven end user decisioning.

The IB solution also features capabilities designed to establish a data-driven foundation for action. Automated methods for finding data patterns, for example, complement decision making based on experience, while the ‘forward view’ allows proactive, as opposed to reactive, decisioning, using data that organizations typically maintain (customer accounts, transactions, sales history, responses to offers, claims, etc.) to model prospect or customer behaviour. A key feature is distribution of analytics results to multiple employees ("taking predictive analytics out of the back end," according to Kolodziej) to ensure that predictive insights support action on business goals.

In addition to process improvement, Kolodziej’s slide below shows the two key objectives to which predictive analytics are productively applied: revenue and profit generation, and reduction of costs, risk and fraud.

IB slide


The WebFOCUS Analytic environment spans three steps that work together to build predictive models: data access, predictive analytics and delivery of results. According to Kolodziej, the first stage often consumes the bulk of time and energy in an analytics project, as access must be created to disparate sources such as flat files, databases, mainframe and third-party information, and data must be joined, cleaned and organized into new fields. "The models are only as good as the data," he noted, and so data preparation is critical. Merging, sampling, filtering, aggregating, deriving, transforming and improving data quality are important IB platform capabilities in both BI and predictive analytics environments.
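To make the data preparation stage more concrete, here is a minimal sketch in plain Python of the kind of filtering, aggregating, merging and deriving described above. It is not IB's implementation: the record layout, the field names (`customer_id`, `segment`, `amount`) and the `prepare` function are all hypothetical, invented only for illustration.

```python
from collections import defaultdict

# Hypothetical sample records standing in for two separate data sources.
accounts = [
    {"customer_id": 1, "segment": "retail"},
    {"customer_id": 2, "segment": "business"},
    {"customer_id": 3, "segment": "retail"},
]
transactions = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 2, "amount": None},   # incomplete row, filtered out below
    {"customer_id": 2, "amount": 300.0},
]


def prepare(accounts, transactions):
    """Filter, aggregate and merge the two sources, then derive new fields."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tx in transactions:
        if tx["amount"] is None:                      # filtering
            continue
        totals[tx["customer_id"]] += tx["amount"]     # aggregating
        counts[tx["customer_id"]] += 1

    prepared = []
    for acct in accounts:                             # merging on customer_id
        cid = acct["customer_id"]
        n = counts[cid]
        prepared.append({
            **acct,
            "tx_count": n,
            "total_spend": totals[cid],
            # deriving: average spend per transaction, 0.0 when there are none
            "avg_spend": totals[cid] / n if n else 0.0,
        })
    return prepared


rows = prepare(accounts, transactions)
```

Each row in `rows` is one customer with the derived fields attached, which is the shape a modeling step would typically expect as input.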

The second stage consists of model development, where a predictive model is trained with an algorithm based on patterns discovered in historical data, and these patterns are then deployed in an end user application via IB’s WebFOCUS functions. For this process, IB relies on the open source R language, providing users with a GUI that obviates the need for coding and syntax knowledge to speed development. If custom analysis is required, customers can use R to supplement techniques that are available in the RStat extension module for model development. To assess model performance in advance of production, RStat also enables testing through evaluation of metrics such as error rates, lift and predicted vs. observed results, while serving as the means to convert information from the R language to a user application.
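For readers unfamiliar with the testing metrics mentioned here, the following sketch shows one common way to compute an error rate and lift on a holdout set. It is written in Python rather than R purely for illustration and says nothing about how the IB tooling calculates these figures. The function names, the 0.5 cut-off and the sample scores are assumptions.

```python
def error_rate(predicted, observed):
    """Share of records where the predicted label differs from the observed one."""
    wrong = sum(1 for p, o in zip(predicted, observed) if p != o)
    return wrong / len(observed)


def lift_at(scores, observed, fraction):
    """Response rate in the top-scored fraction divided by the overall rate.

    Assumes at least one positive outcome in `observed`.
    """
    ranked = sorted(zip(scores, observed), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    top_rate = sum(o for _, o in ranked[:top_n]) / top_n
    base_rate = sum(observed) / len(observed)
    return top_rate / base_rate


# Hypothetical holdout set: model scores and what actually happened (1 = churned).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02]
observed = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

# Turn scores into yes/no predictions with an assumed 0.5 cut-off.
predicted = [1 if s >= 0.5 else 0 for s in scores]

holdout_error = error_rate(predicted, observed)
top_fifth_lift = lift_at(scores, observed, 0.2)
```

A lift above 1 in the top-scored group means the model concentrates the outcome of interest there better than random selection would.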

In Kolodziej’s view, the third step, delivery of results, is potentially the most important stage, as this is when an organization begins to see business value. With WebFOCUS, information in the scoring model can be ‘productionized’ for delivery to business users in a variety of ways ("limited only by your imagination," he added), including dashboards, core reporting, charts, scorecards, geospatial maps, queries, active reports, OLAP, mobile apps or a downstream feed to a user system. Results can be delivered in different formats depending on business need, such as a simple binary yes/no flag, scores (e.g. 1 to 10) that users can rank, sort or filter, and/or numeric values.
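The three delivery formats can be illustrated with a short Python sketch that turns a single model probability into a yes/no flag, a 1 to 10 score and a raw numeric value. The `to_outputs` function, the 0.5 threshold and the equal-width banding into ten scores are assumptions made for this example, not a description of WebFOCUS behaviour.

```python
import math


def to_outputs(probability, threshold=0.5):
    """Express one model probability as a flag, a 1-10 score and a raw value."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # Binary flag: compare against an assumed business threshold.
    flag = "yes" if probability >= threshold else "no"
    # Score: ten equal-width bands, with a probability of 1.0 capped at 10.
    score = min(10, math.floor(probability * 10) + 1)
    return {"flag": flag, "score": score, "value": probability}


# Hypothetical customers with model probabilities of churning.
customers = {"A": 0.87, "B": 0.12, "C": 1.0}
outputs = {name: to_outputs(p) for name, p in customers.items()}

# Business users can then rank by score, highest risk first.
ranked = sorted(outputs, key=lambda name: outputs[name]["score"], reverse=True)
```

The flag suits a simple approve/decline style decision, while the score and raw value leave room for ranking, sorting and filtering in a report or dashboard.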

As with virtually all technology projects, key predictors of success for organizations considering deployment of predictive solutions are collaboration between business and IT, pre-implementation review and alignment of business processes with solution capabilities, and access to requisite skills. This last item has been much discussed in industry circles, with the ‘data scientist’ emerging as a rare but critical input to realizing full business value from analytics. According to Kolodziej, different skills and contexts are needed at different stages of predictive analytics projects. Involvement of business managers, who define project objectives, and of IT, which has access to and knowledge of the source data, is essential at the first and third stages, while the data analyst, who is responsible for analyzing the source data and developing the predictive model, provides support and interpretation throughout the project but is most critical during the second (model development) stage of the process.

Using sophisticated tools like RStat, and with some training in model development and testing, the data analyst is typically able to produce actionable results: as Kolodziej explained, "In most cases a data scientist is not needed, just someone with knowledge of predictive algorithms and best practices of preparing the data and testing model performance." Another challenge for potential users is how to conceptualize new questions in order to move beyond BI, a function that is often relegated to the data scientist. IB services offer support on this front through "discovery sessions" that identify use cases where predictive analytics can help an organization generate new revenue/profits, reduce risk or fraud costs or allocate resources more effectively, while advising on gritty issues like legal compliance in cases where personal data is used.

With this approach, which connects data sources, model development and delivery of results, Kolodziej has seen a "growing number of customers" transition to delivering predictive analytics into the daily workflow of non-technical users. To illustrate, Kolodziej offered some colourful case studies. The Charlotte-Mecklenburg Police Department, for example, is using the BI platform to analyze historical crime data, combining this with time, weather and mapping information to build models that predict where and when crime is more likely to occur, and to display this information in a real-time, geo-location format. The outcome is better allocation of personnel resources and, ultimately, better citizen and officer safety.

Charlotte crime map

With the IB platform, Taylor University is assessing student enrollment risk factors such as GPA, gender, midterm grades, financial aid status and SAT scores to perform student retention modelling that identifies not only who has left the school but also who is likely to leave, enabling proactive intervention such as registration in mentoring programs. For Taylor, the goal was to maintain enrollment levels and the associated income. Other applications include analytics in a manufacturing environment to predict failure rates of manufactured parts in customer environments, so that repair persons can be queued up and parts ordered into the supply chain; a financial services investigation into the likelihood of loan default through analysis of customer demographics such as age, income level, occupation and credit score (IB customer Dealer Services has built a similar solution to assess the creditworthiness of car dealerships); and affinity analysis in marketing to develop product recommendations for cross-selling. Across these examples, the ability to apply different views of data, to apply predictive models and then to integrate information into other applications helps customers not only take quick action based on the information provided but also discover new insights that can be mined within the data trove.