“IT as an industry has declared war on data modeling for all the wrong reasons,” said Dr. Rado Kotorov, chief innovation officer and VP of global product marketing at Information Builders. “The self-service business intelligence movement declares: ‘Let’s not model data. Let’s just give access to people who know what to do with data for one-off analysis.’ The problem is, you can rarely propagate those results back to operations.”
Kotorov believes this dismissiveness of data modeling arose because fewer IT professionals know how to do it. “If you look at the computer science department, users are not taught this discipline, so there is no systematic treatment of data outside of a data warehouse.”
New business intelligence tools like QlikView or Tableau have zero logical modeling capabilities, a fact that is creating a market perception that modeling isn’t needed, he argued. These self-service business intelligence tools hold appeal because they allow analysts access to data without the support of IT. “But once you start combining different data sets, and haven’t modeled them properly, there are many possibilities for errors. Without logical data modeling, analysis is neither repeatable nor usable by other people. In many cases, the analysis is completely wrong.”
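To make the risk concrete, here is a small, hypothetical illustration (the tables and figures below are invented, not drawn from the article) of how combining two data sets recorded at different grains, without modeling the relationship first, silently inflates a total:

```python
# A minimal sketch of an unmodeled join going wrong: orders joined to a
# shipments table at a different grain double-counts revenue.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (order_id INTEGER, amount REAL);
INSERT INTO orders VALUES (1, 100.0), (2, 50.0);

-- Order 1 shipped in two parcels: one row per shipment, not per order.
CREATE TABLE shipments (order_id INTEGER, parcel TEXT);
INSERT INTO shipments VALUES (1, 'A'), (1, 'B'), (2, 'A');
""")

correct = con.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
naive = con.execute("""
    SELECT SUM(o.amount)
    FROM orders o JOIN shipments s ON o.order_id = s.order_id
""").fetchone()[0]

print(correct)  # 150.0
print(naive)    # 250.0 -- order 1 is counted once per parcel
```

The analyst gets an answer either way; only the modeled relationship between the two tables reveals which answer is right.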
Because the self-service approach has been designed to serve more immediate needs, it does not offer the benefits of repeatability, he added. “Modeling opens the gates to more advanced analytics – if you do it right.”
One particular challenge on the modeling front is that IT is typically conversant with transactional data. However, it has always been difficult to apply data modeling on top of that data: not only is querying slow, but the links between data can be problematic, Kotorov explained. “Data has to be normalized to be efficient.”
There are capable modeling tools that can be run against transactional databases to make analytics both usable and shareable. But these are rare, he added: “There are few tools that can do logical modeling against transactional data within organizations. In fact, there are only two vendors that have a metadata modeling approach to address this challenge.”
But that’s not the only option. Manon Tremblay, an IT consultant with DIR Systemes, has built her career working on the data management side of the Quebec pulp and paper industry and now consults for industrial customers in the arcane discipline of data modeling and analysis.
For Tremblay, the key to optimizing business intelligence and advanced analytics capabilities lies in re-engineering what’s available to develop a data warehouse that makes sense for BI applications. More often than not, the databases she starts with are transactional. “BI requires something completely different,” she said. “So when I talk to customers, I always ask them – do you want an analysis or a report? If you want good performance, you always need to model the data. That’s the first place you need to begin.”
The process she uses is dimensional data modeling, but Tremblay describes it in simpler terms – as reverse engineering. “You have to look at the database, extract the data to understand its structure. Only then can you have a global view of the links between data. Once you have that, you can use data modeling tools to reverse engineer the databases to meet data warehousing needs for BI.”
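As a rough sketch of that reverse-engineering step, the snippet below walks a database catalog to list each table, its columns and its foreign-key links. It uses SQLite and an invented database file purely for illustration; Tremblay’s own work relies on the dedicated modeling tools named next.

```python
# A minimal sketch of "reverse engineering" an existing database:
# read the catalog to recover tables, columns and the links between them.
import sqlite3

con = sqlite3.connect("source_system.db")  # hypothetical transactional database

tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

for table in tables:
    # PRAGMA table_info: one row per column; PRAGMA foreign_key_list: one row per link.
    columns = [c[1] for c in con.execute(f"PRAGMA table_info({table})")]
    print(table, columns)
    for fk in con.execute(f"PRAGMA foreign_key_list({table})"):
        ref_table, from_col, to_col = fk[2], fk[3], fk[4]
        print(f"  {table}.{from_col} -> {ref_table}.{to_col}")
```

The output of this kind of pass is the “global view of the links between data” Tremblay describes, which then feeds the warehouse design.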
In her projects, she uses two tools from Embarcadero: ER/Studio Data Architect, a data modeling and data architecture solution; and DBArtisan, which allows her to connect to almost any native database and “look inside” to troubleshoot performance problems. ER/Studio supports dimensional models by allowing users to map data lineage and transform transactional data into a dimensional model.
Ron Huizenga, senior product manager for ER/Studio at Embarcadero Technologies, explains the distinction between transactional and dimensional modeling using order entry as an example: “With a transactional database, you’re working with order details and their relationships to master data such as products and customers. In this modeling, normalization is used to minimize redundancy as much as possible.”
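An illustrative, hypothetical order-entry schema in that normalized, transactional style might look like this (table and column names are invented for the example):

```python
# A normalized order-entry sketch: order lines reference master data
# (customers, products) through foreign keys, so nothing is stored twice.
# Answering an analytic question means joining most of these tables.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products  (product_id  INTEGER PRIMARY KEY, name TEXT, list_price REAL);

CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    order_date  TEXT
);

CREATE TABLE order_details (
    order_id   INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES products(product_id),
    quantity   INTEGER,
    unit_price REAL
);
""")
```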
A dimensional model, by contrast, brings together information from different areas so users can work with both facts (transactional information) and dimensions such as customer, store and product line categorization. “With business intelligence solutions, you are typically sourcing information from a number of different sources,” he added, “including ERP and departmental systems, which leads to complex architectures where data is extracted from the source systems, staged and then loaded into the data warehouse.”
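For contrast, a hypothetical star schema over the same order data, with a fact table surrounded by the kinds of dimensions Huizenga mentions, might be sketched as follows (again, the names are invented):

```python
# A star-schema sketch: measures live in one fact table, descriptive
# attributes live in the surrounding dimensions, denormalized for analysis.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, segment TEXT);
CREATE TABLE dim_store    (store_key    INTEGER PRIMARY KEY, store_name TEXT, region TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, name TEXT, product_line TEXT);
CREATE TABLE dim_date     (date_key     INTEGER PRIMARY KEY, full_date TEXT, month TEXT, year INTEGER);

CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    store_key    INTEGER REFERENCES dim_store(store_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    quantity     INTEGER,
    sales_amount REAL
);
""")

# A typical analytical question then reads as one simple aggregation:
# sales by product line and region.
query = """
SELECT p.product_line, s.region, SUM(f.sales_amount)
FROM fact_sales f
JOIN dim_product p ON f.product_key = p.product_key
JOIN dim_store   s ON f.store_key   = s.store_key
GROUP BY p.product_line, s.region
"""
```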
Huizenga says an organization’s use of data modeling depends very much on its maturity. While many large enterprises with complex infrastructures are well entrenched and familiar with the process, “There are many who are not mature in terms of data modeling and may not know how to create dimensional models, nor the architecture required to take information from all different systems and stage them for loading into a data warehouse.”
For those unfamiliar with dimensional modeling, he outlined the basic steps. “The first is defining the dimensional model and what it looks like, which involves understanding and mapping the data in source systems as well as the data warehouse. For example, the tables and columns in different databases may have cryptic names which need to be deciphered. Data for some concepts may be fragmented or duplicated across several systems. Creation of a logical model is a valuable first step towards building this understanding.
“Universal mapping is a capability in our tool that allows these different instances to be identified and linked across models,” he said. “Using models to map the data landscape is necessary to identify the correct data sources.”
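The underlying idea can be illustrated, very loosely, in plain code. This is not ER/Studio’s universal mapping feature, and the system, table and column names below are invented; it simply records where one logical concept lives in each source system so the correct source can be chosen later.

```python
# A plain-Python illustration of cross-system mapping: one logical entity,
# several physical instances with cryptic names that had to be deciphered.
customer_mapping = {
    "logical_entity": "Customer",
    "instances": [
        {"system": "ERP",       "table": "CUST_MST",     "key_column": "CST_ID"},
        {"system": "CRM",       "table": "accounts",     "key_column": "account_id"},
        {"system": "warehouse", "table": "dim_customer", "key_column": "customer_key"},
    ],
}

for inst in customer_mapping["instances"]:
    print(f'{customer_mapping["logical_entity"]}: '
          f'{inst["system"]}.{inst["table"]}.{inst["key_column"]}')
```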
Once that analysis is complete and the model defined, the next stage involves mapping out the transformations needed to take data from the source systems to the staging area and then load it into the data warehouse.
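A schematic sketch of that source-to-staging-to-warehouse flow might look like the following; the table names are invented, and the surrogate-key lookups against dimension tables that a real load would include are omitted for brevity.

```python
# Extract from a normalized source, land the rows in a staging table,
# then load the dimensional fact table in the warehouse.
import sqlite3

source = sqlite3.connect(":memory:")      # stands in for an ERP or departmental system
warehouse = sqlite3.connect(":memory:")   # stands in for the data warehouse

source.executescript("""
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_date TEXT);
CREATE TABLE order_details (order_id INTEGER, product_id INTEGER, quantity INTEGER, unit_price REAL);
INSERT INTO orders VALUES (1, 10, '2016-03-01');
INSERT INTO order_details VALUES (1, 500, 2, 25.0);
""")

warehouse.executescript("""
CREATE TABLE stg_sales (order_id INTEGER, customer_id INTEGER, order_date TEXT,
                        product_id INTEGER, quantity INTEGER, amount REAL);
CREATE TABLE fact_sales (date_key TEXT, customer_key INTEGER, product_key INTEGER,
                         quantity INTEGER, sales_amount REAL);
""")

# Extract and transform: join the normalized source and compute line amounts.
rows = source.execute("""
    SELECT o.order_id, o.customer_id, o.order_date, d.product_id,
           d.quantity, d.quantity * d.unit_price
    FROM orders o JOIN order_details d ON o.order_id = d.order_id
""").fetchall()

# Stage: land the extract so loads can be audited and rerun.
warehouse.executemany("INSERT INTO stg_sales VALUES (?, ?, ?, ?, ?, ?)", rows)

# Load: move staged rows into the fact table.
warehouse.execute("""
    INSERT INTO fact_sales
    SELECT order_date, customer_id, product_id, quantity, amount FROM stg_sales
""")
warehouse.commit()
```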
For user organizations, the outcome of all this is confidence that they are sourcing the correct information for advanced analytics, he added. “That’s why data modeling is so important. If you don’t know what your data is, how do you know you are moving the right information to your data warehouse? Modeling is the way to figure out what source systems are relevant, what information is in them, and how to utilize it. It is imperative to map out what your data landscape looks like. Otherwise you could be making decisions based on incorrect or incomplete data. The implications of that could be minor…or disastrous.”
Whatever some IT professionals may think, anyone entrenched in data analytics is a strong proponent of data modeling, Kotorov said. “There’s no way around it. Anyone seriously approaching analytics knows the value of modeling.”