‘Governance’ is a term that is heard frequently in business and technology circles. It can be a confusing word, as it has several different but related meanings. Generally, governance is used to describe objectives and/or activities that connect to “control” or “process” (or “the control of processes”). However, some objectives are components in achievement of others, and the various governance processes are often overlapping. In most organizations, there is a need to understand the hierarchy of governance in order to ensure that its components are correctly aligned.
The figure below, drawn from discussions in the Governance, Risk and Compliance working group of the Toronto Cloud Business Coalition, illustrates these interconnected requirements around governance. At the top level, governance is a corporate requirement – the responsibility of the board of directors, managed by senior executives from across the organization. At an operational level, each major function within the enterprise has a specific set of issues and responsibilities that flow from the corporate governance framework. IT needs to address policy, technology and risk issues that are specific to its role in supporting corporate governance. And at the core of IT’s responsibilities is the need to provide data governance – to ensure that corporate data is appropriately managed, and that it is clean and consistent enough to support business activities across the organization.
As the figure suggests, cloud computing, user managed applications (often SaaS applications such as Salesforce), and user-managed storage (for example, via Dropbox) can complicate data governance, weakening the ability to secure and audit data, and creating ‘multiple versions of the truth’ – disconnected and inconsistent records purporting to represent a single data set – within an enterprise. How can management reconcile the need to take advantage of advanced cloud and intelligence applications with the need to address core governance hierarchy requirements?
IB tackles data governance
Don Bowers, senior director of iWay Integrity products at Information Builders, provided guidance on using advanced tools to deliver data governance at a downtown presentation earlier this fall. Bowers began his session by working with audience members to identify some of the real-world concerns faced by data management professionals: dealing with missing data, ensuring information integrity through ongoing data cleansing and data mastering initiatives (especially with respect to data sourced from external organizations), de-duplication, support for end-to-end compliance activities, addressing PCI requirements, and the struggles associated with establishing a “single version of the truth.”
Bowers addressed a number of questions by highlighting iWay’s position as a platform capable of delivering support for data stewardship and data quality – functions that are either not addressed or handled via discrete point products in many environments. Session attendees expressed strong interest in finding a means of connecting data management with data analysis, to provide a basis for delivery of insights rooted in complete, reliable data supporting the core of data-dependent corporate activities – governance, and also business objectives ranging from fraud detection to decision support.
According to Bowers, “the starting point is to get a good assessment of the state of the data today… is it correct? Is it complete? Is it valid? Is it populated?” – that is, do all important fields contain necessary data? This is, in Bowers’s view, a key stage in a cycle rather than a set of discrete questions. Data quality “is not a one-off process…there really is a need to monitor the state of the data – not only the source systems, but also the target.”
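To make the idea concrete, the sketch below shows the kind of completeness-and-validity profiling pass Bowers describes, written in Python over a couple of toy records. The field names, validity rules and `profile` helper are illustrative assumptions for this article, not part of the iWay toolset.

```python
from datetime import date

# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"name": "A. Singh", "birth_date": date(1979, 4, 2), "postal_code": "M5V 2T6"},
    {"name": "", "birth_date": None, "postal_code": "99999"},
]

REQUIRED_FIELDS = ["name", "birth_date", "postal_code"]

def profile(recs):
    """Count, per field, how many records are populated and how many pass a basic validity check."""
    report = {f: {"populated": 0, "valid": 0} for f in REQUIRED_FIELDS}
    for rec in recs:
        for field in REQUIRED_FIELDS:
            value = rec.get(field)
            if value in (None, ""):
                continue                      # not populated
            report[field]["populated"] += 1
            if field == "birth_date" and value <= date.today():
                report[field]["valid"] += 1   # a plausible (non-future) date
            elif field == "postal_code" and len(str(value).replace(" ", "")) == 6:
                report[field]["valid"] += 1   # Canadian postal codes have six characters
            elif field == "name":
                report[field]["valid"] += 1   # any non-empty name counts here

    return report

print(profile(records))
```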
The next step in the cycle is to analyze outputs from the target systems to identify key data elements and data problems. For example, does a system using age as a decision criterion have birth dates that are in the future? This type of problem appears as an analytical issue in its application to business decisions, and is also an indicator of data quality problems that can (and should) be addressed programmatically with tools like IBI’s Data Quality Suite, an asset management platform tuned to the specific requirements associated with corporate information, regardless of whether it is real-time or batch input, and whether it is sourced from a message queue, internal databases and/or data feeds from external sources.
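As a rough illustration of how an analytical result can double as a data quality signal, the hypothetical snippet below computes an age-based eligibility decision and flags the negative ages that point back to a source data problem. The eligibility threshold and field names are assumptions made for the example.

```python
from datetime import date

def age(birth_date, today=None):
    """Compute age in whole years as of 'today'."""
    today = today or date.today()
    return today.year - birth_date.year - (
        (today.month, today.day) < (birth_date.month, birth_date.day))

applicants = [
    {"id": "A-1", "birth_date": date(1958, 6, 30)},
    {"id": "A-2", "birth_date": date(2091, 1, 1)},   # bad data: birth date in the future
]

for a in applicants:
    a_age = age(a["birth_date"])
    eligible = a_age >= 65        # the analytical decision (assumed threshold)
    quality_issue = a_age < 0     # the underlying data quality problem
    print(a["id"], "age:", a_age, "eligible:", eligible, "flag for cleansing:", quality_issue)
```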
Bowers turned his attention next to Master Data Management (MDM), which is the key to “getting a single version of the truth.” In many cases, a business (or health care institution, government, etc.) will have multiple records for a single individual (or company, or other important entity), with different addresses, spelling variants or other inconsistencies. MDM consolidates these records to ensure that an organization, such as a bank or hospital or public safety agency, has a single and complete view of an individual or entity. But MDM relies on source data – and as Bowers said, “you can’t have MDM if you don’t have data cleansing…if you can’t standardize” the key identifying data factors so that they can be matched across multiple sources or instances.
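A minimal sketch of that standardize-then-match step, assuming two hypothetical source records for the same person, might look like the following; the normalization rules and field names are invented for illustration and are not IBI’s matching engine.

```python
import re

# Hypothetical records for the same person held in two source systems.
crm_record  = {"name": "Jane  Doe", "postal_code": "m5v2t6", "phone": "(416) 555-0100"}
billing_rec = {"name": "DOE, JANE", "postal_code": "M5V 2T6", "phone": "4165550100"}

def standardize(rec):
    """Normalize the identifying fields so that records can be compared."""
    name = re.sub(r"\s+", " ", rec["name"]).strip().upper()
    if "," in name:                       # "DOE, JANE" -> "JANE DOE"
        last, first = [p.strip() for p in name.split(",", 1)]
        name = f"{first} {last}"
    return {
        "name": name,
        "postal_code": rec["postal_code"].replace(" ", "").upper(),
        "phone": re.sub(r"\D", "", rec["phone"]),
    }

def is_match(a, b):
    """Treat two records as one entity if all standardized identifiers agree."""
    return standardize(a) == standardize(b)

print(is_match(crm_record, billing_rec))   # True: one golden record, not two customers
```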
In the real world, as another Information Builders staffer pointed out, enterprises generally employ a blend of different data management strategies. Many firms use centralized repositories, but while “the easiest thing is to take data and dump it somewhere,” these repositories can become very large if not carefully managed. Other approaches include coexistence, in which data exists in both a master and the source systems and changes to the master must be propagated back to those source systems, and registry approaches, which rely on links to the other sources. Analytical MDM strategies apply intelligence to management across these different models to provide the single best view of data needed to support business objectives.
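One way to picture the difference between these models is a registry-style hub that stores only links to the source records (where a centralized hub would hold the consolidated copy itself), with a coexistence-style propagation step pushing master changes back out. The sketch below is a simplified illustration with made-up system and record identifiers.

```python
from dataclasses import dataclass, field

@dataclass
class SourceLink:
    system: str        # e.g. "CRM", "BILLING" (hypothetical names)
    record_id: str     # key of the record inside that system

@dataclass
class RegistryEntry:
    master_id: str
    links: list = field(default_factory=list)

registry = RegistryEntry(master_id="CUST-0001",
                         links=[SourceLink("CRM", "c-1029"),
                                SourceLink("BILLING", "b-88231")])

def propagate(master_change, entry):
    """In a coexistence model, a change to the master must reach every linked source system."""
    for link in entry.links:
        print(f"apply {master_change} to {link.system}:{link.record_id}")

propagate({"postal_code": "M5V 2T6"}, registry)
```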
Bowers continued his examination of MDM by discussing data remediation – manual changes made to exceptions after programmatic data cleansing. There are, he stated, “a lot of one-off tools” positioned as solutions to this problem, but these lack characteristics needed by an enterprise: an integrated view of data assets, support for consistent data governance policies, the ability to verify accuracy, flags that highlight situations in which manual or programmatic data cleansing has been applied to a record – and, most importantly, support for a data quality cycle rather than a single point in time. Variances from a “golden record” may indicate a data quality problem, but they may also indicate a legitimate change – for example, an address that is different from what is contained on a master record may simply be wrong, or it may reflect a recent move by an individual or a new business location. “Traceability” – tracking changes and causes for changes – is important to data quality and data remediation.
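A bare-bones version of that traceability might be an audit log that captures the old value, the new value, the reason code and the steward for every manual correction, as in the hypothetical sketch below; the field and reason names are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Remediation:
    record_id: str
    field: str
    old_value: str
    new_value: str
    reason: str          # e.g. "DATA_ERROR" vs. "LEGITIMATE_CHANGE" (customer moved)
    steward: str
    timestamp: datetime

audit_log = []

def remediate(record, record_id, field, new_value, reason, steward):
    """Apply a manual correction and keep the trace needed to explain it later."""
    audit_log.append(Remediation(record_id, field, record[field], new_value,
                                 reason, steward, datetime.now()))
    record[field] = new_value

customer = {"address": "12 Front St"}
remediate(customer, "CUST-0001", "address", "48 King St W",
          reason="LEGITIMATE_CHANGE", steward="j.lee")
print(customer, len(audit_log))
```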
Security is also an essential aspect of MDM. In many business contexts (and especially, within regulated industries), there is a requirement to protect against release of sensitive data. It’s important to note that this requirement often applies to specific fields within a record – for example, a patient’s address is less sensitive than details of her medical record, a customer’s email address is not as critical as credit card information. Different individuals or providers may have access to some of this information (for example, for marketing campaigns), but this access can’t extend to private data.
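A simple way to express field-level protection is a policy table that maps each attribute to the roles allowed to see it, with everything else masked. The roles, fields and `redact` helper below are illustrative assumptions, not a vendor schema.

```python
# Which roles may see which attributes (assumed role and field names).
FIELD_POLICY = {
    "email":       {"marketing", "care_team"},
    "address":     {"marketing", "care_team"},
    "diagnosis":   {"care_team"},
    "card_number": {"payments"},
}

def redact(record, role):
    """Return a copy of the record with fields the role may not see masked out."""
    return {f: (v if role in FIELD_POLICY.get(f, set()) else "***")
            for f, v in record.items()}

patient = {"email": "a@example.com", "address": "48 King St W",
           "diagnosis": "Type 2 diabetes", "card_number": "4111 1111 1111 1111"}

print(redact(patient, "marketing"))
# email and address are visible; diagnosis and card_number come back as "***"
```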
Tactics addressing data governance issues
If data quality and MDM describe important corporate objectives, what are the steps needed to support these outcomes? Bowers dedicated much of his presentation to examining ways of delivering real-world answers to this question.
Data cleansing
Bowers started his data governance ‘deep dive’ with a description of data cleansing, offering examples of three different methods of improving data quality. The highest-level approach involves general cases. Here, the objective isn’t to alter a single record, but rather, to create a ticket that will require a technical staff member or data steward to look at the matching rules that have flagged a potential match. This approach can be used to help address issues with the overall system, fine-tuning the cases that are flagged for individual review.
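In code terms, this general-case approach amounts to opening a review case when a matching rule fires, without touching the records themselves. The sketch below assumes hypothetical rule names and record identifiers.

```python
import itertools
from dataclasses import dataclass

@dataclass
class Ticket:
    ticket_id: int
    rule: str
    record_ids: tuple
    note: str

_ticket_ids = itertools.count(1)
queue = []   # cases for a technical staff member or data steward to review

def flag_potential_match(rule_name, record_ids, note=""):
    """Open a ticket for review; no record is altered here."""
    queue.append(Ticket(next(_ticket_ids), rule_name, tuple(record_ids), note))

flag_potential_match("same_name_same_postal_code",
                     ("CRM:c-1029", "BILLING:b-88231"),
                     note="review matching rule threshold")
print(queue[0])
```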
What happens if a record needs to be changed? Bowers described two approaches that respond to different types of requirements. Manual cleansing is applied when data can’t be automatically corrected. For example, a record containing a social insurance number that has fewer than nine digits or a date of birth that is in the future may be flagged for manual treatment, but the correct value can’t be extrapolated from the record or addressed via data handling rules. In this case, Information Builders supports manual cleansing of a record in either the master or source repository, allowing corrections to be applied at the level that best meets business requirements. The need for cleansing is triggered by analytical reports that highlight data problems, and the system tracks remediation activities so that staff dealing with future issues have visibility into earlier manual corrections.
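A rough sketch of that screening step, assuming invented field names, appears below: values that are demonstrably wrong but have no programmatically inferable correction are routed to a manual queue rather than auto-corrected.

```python
import re
from datetime import date

manual_queue = []   # cases a data steward must resolve; no rule can infer the right value

def screen(record_id, record):
    """Flag values that are clearly wrong but cannot be corrected programmatically."""
    sin_digits = re.sub(r"\D", "", record.get("sin", ""))
    if len(sin_digits) != 9:
        manual_queue.append((record_id, "sin", "SIN does not contain nine digits"))
    dob = record.get("birth_date")
    if dob and dob > date.today():
        manual_queue.append((record_id, "birth_date", "date of birth is in the future"))

screen("CUST-0002", {"sin": "12345678", "birth_date": date(2091, 1, 1)})
print(manual_queue)
```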
Another data cleansing approach involves matching override. It is used in cases where two records appear to refer to a single entity (such as a person) but contain different or conflicting attribute values. This approach uses a scoring method (based on a unique key or a unique key hierarchy) that examines each record for completeness and matching. Records that have a high likelihood of referring to the same individual are flagged for a data steward, who can link or unlink the records as needed to correct “false negatives” (separate records that refer to the same individual) and “false positives” (linked records that refer to different individuals). In an automated platform, creating a link to correct a false negative creates an entry on a “white list” that acts as a repository of records that appear separately in different operational systems but should be linked to identify a single individual; similarly, “false positive” corrections appear on a “black list” that maintains corporate memory of seemingly identical records which should not be combined under a single individual’s profile.
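A simplified version of matching override could combine a weighted field-agreement score with steward-maintained white and black lists that override it, as in the hypothetical sketch below; the weights, threshold and identifiers are assumptions, not IBI’s scoring model.

```python
# Steward-maintained overrides (made-up record identifiers).
WHITE_LIST = {frozenset({"CRM:c-1029", "BILLING:b-88231"})}   # confirmed same person
BLACK_LIST = {frozenset({"CRM:c-2204", "CRM:c-7751"})}        # confirmed different people

WEIGHTS = {"name": 0.4, "birth_date": 0.35, "postal_code": 0.25}

def match_score(a, b):
    """Score two standardized records by how many weighted identifiers agree."""
    return sum(w for f, w in WEIGHTS.items() if a.get(f) and a.get(f) == b.get(f))

def same_entity(id_a, id_b, rec_a, rec_b, threshold=0.6):
    pair = frozenset({id_a, id_b})
    if pair in WHITE_LIST:
        return True            # steward corrected a false negative
    if pair in BLACK_LIST:
        return False           # steward corrected a false positive
    return match_score(rec_a, rec_b) >= threshold

a = {"name": "JANE DOE", "birth_date": "1979-04-02", "postal_code": "M5V2T6"}
b = {"name": "JANE DOE", "birth_date": "1997-02-04", "postal_code": "M4C1B5"}
print(same_entity("CRM:c-1029", "BILLING:b-88231", a, b))  # True via the white list
```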
Governance in practice
As part of his presentation, Bowers demoed the IB data quality suite, starting with Integrity Management Central, which acts as a home page for the data governance team. It provides a view of current activity, including (in the demo) a graph showing data steward workloads and a list of cases with issues.
Stepping through the workflow associated with the platform, Bowers showed how a supervisor approves changes from data stewards, working backwards to show how a ticket flagging the record and identifying both the issue and its severity was created by data cleansing rules (for example, an address field that is clearly wrong) or by matching rules. The case is created at an attribute level, so that individual attributes can be treated separately – which may be important in situations where some data (such as health records) can’t be exposed to all data quality staff members.
Each record contains original data and comments that have been entered by the data stewards, providing context for changes and history that traces activity associated with the record. If something in that history prompts a need for further review, the supervisor can assign the record to another data steward; the system itself keeps track of which data stewards have access to specific data domains (for example, address vs. credit card). Changes are scored for probable accuracy – for example, by checking spelling of a city against legitimate values; in some cases, the system can be set up to provide proposals (for example, valid provinces or states containing the specified city).
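The scoring idea can be illustrated with a small reference lookup: a proposed city/province change is scored against known values, and the valid provinces are returned as proposals for the steward. The reference table and score values below are invented for the example.

```python
# Tiny assumed reference table, standing in for a real gazetteer.
CITY_REFERENCE = {
    "TORONTO": ["ON"],
    "LONDON": ["ON"],
    "ST. JOHN'S": ["NL"],
}

def score_change(city, province):
    """Return a rough confidence score for a proposed (city, province) value, plus proposals."""
    city_key = city.strip().upper()
    if city_key not in CITY_REFERENCE:
        return 0.0, []                  # unknown spelling: lowest confidence
    valid_provinces = CITY_REFERENCE[city_key]
    if province in valid_provinces:
        return 1.0, valid_provinces
    return 0.5, valid_provinces         # known city, inconsistent province

print(score_change("toronto", "BC"))   # (0.5, ['ON']) -> propose ON to the steward
print(score_change("Tronto", "ON"))    # (0.0, [])     -> likely a misspelling
```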
One interesting feature in Integrity Management Central is the 360 Viewer, which provides a view through to the master record. 360 Viewer includes capabilities that are important to the compliance process – for example, it will show which data stewards have accessed which data fields (e.g., health records) to support regulatory audits. It also allows for comparisons between master and instance records, which may be important when new data – from an external supplier, marketing campaign, or other source – is introduced to an existing repository. As is the case throughout the iWay platform, changes made to these new records are tracked to support future analysis, and tied back to the master record.
Concluding observations
As we move ever deeper into the world of Big Data, there is potential for the enterprise user to become overwhelmed by the volume of records and information sources that need to be aggregated for analysis. At the same time, errors in individual records can result in critical errors in the business processes that rely on that data; a false positive or false negative may result in a healthcare professional, a public safety official, a fraud adjudicator or a civil servant taking actions that are appropriate for a class of individuals but that do not apply to a specific case, compromising treatment, safety, credit ratings, approvals, etc. These kinds of errors result in rework (and potentially, liability) at the business process level. It is far better, and more cost effective, to take the steps needed to ensure that the underlying data in the system is as correct and consistent as possible at the outset. As analytics become an integral component of enterprise process management, data governance becomes an integral part of corporate governance. Platforms that connect clean data to business analytics create the connections that tie seemingly-limitless data resources to an enterprise’s specific business objectives.