Many of the organizations I consult with are looking for dashboards to help them gain better visibility into their data. Most of the time, their focus is on the analytics portion of the undertaking and not on the data itself. But once requirements are gathered, they quickly realize that where data is stored, how it is loaded and managed, what transformations are required, and where those transformations will happen are all critical to a successful project. The realization that data integration can make or break an analytics project applies to any technology project that involves moving data. Some industry research estimates that data integration activities represent 80 percent of total project effort. Failing to identify and address data integration considerations early can therefore cause a lot of headaches.
Since data integration plays such a big role in any analytics or data management project, organizations need to understand how to develop the right framework and infrastructure so they can scale the environment as needed. Within analytics, this means identifying the number of data sources and the volume of data needed, and confirming that latency requirements can be supported. Other data-related initiatives are the same: system and data migration projects require similar considerations, as do Big Data projects. Not factoring in scalability can lead to project failure or to the need to re-evaluate the infrastructure that has been developed. If scale is not addressed, performance roadblocks emerge over time and the broader needs of the business go unsupported.
Irrespective of data project type, evaluating scalability involves reviewing the following:
- Storage: Data can grow exponentially as organizations capture historical data and then add external and diverse data sources to their databases. When selecting an infrastructure to house data, it is important to understand storage requirements up front, covering not only what will be stored now but also what will be stored in the future. If the database selected cannot support future expansion, additional storage will be required. This means the infrastructure needs to handle expansion either natively or through the ability to add storage to the current environment.
- Processing and delivery: Data needs to be processed quickly enough to be delivered in a timely fashion. This extends beyond latency to any issue associated with query performance. Business use cases can be very diverse, and differing business requirements will lead to different expectations surrounding data processing and delivery. Because data might not be loaded in an operational fashion, infrastructure design needs to take into account data delivery as well as transformation and loading.
- Analytics capabilities: Though not all data integration projects will be associated with analytics, many data integration activities do require some level of metrics management. Organizations need to decide where data transformations will occur and where within the data integration process algorithms will be applied.
- Future projects: Organizations may be hard-pressed to identify what their future projects will entail. However, it is important to make sure that the current initiative can support additional data sources, fields, and added complexities over time, because data needs will change as data management within the company matures. Sometimes organizations evaluate solutions based on their current project requirements without taking into account changes that are planned in the near future.
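To make the storage consideration concrete, here is a minimal sketch of a capacity projection. It assumes a simple compound annual growth rate plus a fixed yearly volume from newly added sources; the function names, parameters, and figures are hypothetical and only illustrate the arithmetic, not any particular platform's sizing method.

```python
from typing import List, Optional


def project_storage_gb(current_gb: float,
                       annual_growth_rate: float,
                       years: int,
                       new_sources_gb_per_year: float = 0.0) -> List[float]:
    """Return the projected storage footprint (GB) at the end of each year.

    Assumes existing data compounds at annual_growth_rate and that new
    sources add a fixed volume every year (both are illustrative inputs).
    """
    projections = []
    size = current_gb
    for _ in range(years):
        size = size * (1 + annual_growth_rate) + new_sources_gb_per_year
        projections.append(round(size, 1))
    return projections


def years_until_full(current_gb: float,
                     capacity_gb: float,
                     annual_growth_rate: float,
                     new_sources_gb_per_year: float = 0.0,
                     horizon: int = 50) -> Optional[int]:
    """Return the first year in which projected data exceeds capacity,
    or None if capacity holds for the whole planning horizon."""
    projections = project_storage_gb(
        current_gb, annual_growth_rate, horizon, new_sources_gb_per_year)
    for year, size in enumerate(projections, start=1):
        if size > capacity_gb:
            return year
    return None


if __name__ == "__main__":
    # Hypothetical figures: 1 TB today, 50% yearly growth, 3 TB provisioned.
    print(project_storage_gb(1000, 0.5, 3))
    print(years_until_full(1000, 3000, 0.5))
```

Running a projection like this against the capacity that is actually provisioned gives a rough sense of how soon an expansion would be needed, which is the kind of forward-looking check the storage and future-projects points call for.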
It is easy to see that data integration considerations involve looking beyond the simple movement of data to the infrastructure designed to support it. Organizations require a more holistic approach to data integration to ensure that platforms designed to manage data movement and storage can also support business needs, whether these relate to data migration, analytics, or Big Data. From a business perspective, building a solution that can scale helps the organization save time and money, because the resulting infrastructure can keep delivering information efficiently over time. Challenges that may exist when scale is not considered include:
- Redesigning data frameworks to take into account workarounds for data that cannot be supported within a system.
- Latency issues that include the inability to deliver data on time (inability to meet SLAs), longer loading times, and query processing limitations.
- Lack of flexible data delivery caused by the inability of the platform to support slowly changing dimensions (in the case of analytics) or new data requirements.
- The need for additional storage in advance of planned upgrades, leading to increased expenditures and additional time required to load data.
- Support for a data infrastructure that doesn’t meet business needs, leading to dissatisfaction among business units and IT frustration.
- The requirement to evaluate complementary technologies and databases to support additional data storage or more robust processing.
- Added costs and resources required to meet the data need gaps.
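For readers unfamiliar with slowly changing dimensions, mentioned in the list above, the sketch below shows one common way of handling them, often called Type 2, in which a change closes the current row and appends a new version instead of overwriting history. The `DimensionRow` structure, the customer and region fields, and both helper functions are hypothetical names chosen for illustration; a real platform would typically do this inside the database or integration tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DimensionRow:
    customer_id: int
    region: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_change(history: List[DimensionRow], customer_id: int,
                 new_region: str, effective: date) -> List[DimensionRow]:
    """Record an attribute change without losing the previous value."""
    current = next(
        (row for row in history
         if row.customer_id == customer_id and row.valid_to is None),
        None,
    )
    if current is not None:
        if current.region == new_region:
            return history  # nothing changed, keep the current version
        current.valid_to = effective  # close out the old version
    history.append(DimensionRow(customer_id, new_region, effective))
    return history


def region_as_of(history: List[DimensionRow], customer_id: int,
                 when: date) -> Optional[str]:
    """Look up the value that was valid on a given date."""
    for row in history:
        if (row.customer_id == customer_id
                and row.valid_from <= when
                and (row.valid_to is None or when < row.valid_to)):
            return row.region
    return None


if __name__ == "__main__":
    history: List[DimensionRow] = []
    apply_change(history, 1, "East", date(2020, 1, 1))
    apply_change(history, 1, "West", date(2021, 6, 1))
    print(region_as_of(history, 1, date(2020, 12, 31)))
    print(region_as_of(history, 1, date(2021, 6, 1)))
```

Because every change adds a row, this approach trades storage and load time for historical accuracy, which is one reason support for it belongs in the scalability evaluation.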
When organizations fail to consider scalability, they run the risk of investing time and money in a project that becomes a short-term fix for current data issues but cannot meet future competitive market challenges. The roles of data access and data management are becoming more prominent within organizations as business decision makers place more value on the role data plays in daily operations. Making sure that information architectures can support expanding data integration requirements means incorporating scalability needs into any data management initiative. Doing so helps create a data infrastructure that supports broad business needs and efficient data delivery over time.