Exploding Data Volumes. Real-Time Demand. System Overload.

What to do about it.

Most organizations need to consolidate data from multiple systems to get the information they require to be successful. Over the last few decades, organizations have built Operational Data Stores (ODSs) and Data Warehouses to accommodate the requirements for consolidated data and high-performance data analysis. The ODS and the Data Warehouse are traditionally populated using batch Extract, Transform and Load (ETL) or Extract, Load and Transform (ELT) jobs. These jobs run at an interval, e.g. nightly, or sometimes even less frequently, e.g. weekly, monthly or only at the end of the quarter.

The traditional Data Warehouse and ETL/ELT approach has served organizations well, but is under stress for multiple reasons:

More data: organizations grow, either organically or through acquisitions, and existing systems collect more and more data, all of which has to be processed during the batch window.
Shrinking batch windows: as organizations increase services, expand beyond their home region and provide online, self-service access to their customers, there is no longer a long natural batch window “outside of office hours” during which it is safe to assume that no users will be impacted by the batch loads. Users will primarily notice a slowdown in system performance while the heavy batch extracts run. A secondary challenge is that the ETL/ELT job may assume that no changes are made to the source system during the extract in order to maintain transactional consistency, an assumption that no longer holds.
Real-time requirements: organizations realize that, in order to be competitive, analysis should be done sooner than (at best) a day after the data originated.

Add to these reasons sensor-generated Internet of Things (IoT) data, which not only causes data volumes to explode but also increases the urgency to analyze data closer to real time. Imagine that the sensors on a power generation turbine report an anomaly that, when analyzed, indicates an imminent failure. Making sure a backup generator is ready to take over and shutting the turbine down for maintenance before extensive damage occurs is obviously better than activating emergency backup routines after the turbine fails unexpectedly.

Continuous Data Integration is the strategy of continuously integrating data to support more frequent analysis.

No single technology addresses the full spectrum of challenges related to continuous data integration. There is still a need to ensure Data Quality and Master Data Management in addition to the ability to capture data changes as they happen and, depending on the requirement, the desire to set alerts if certain conditions occur.

(We recently discussed continuous data integration in a webinar with 451 Research Director Matt Aslett, without a doubt one of the leading analysts on the topic of data integration and analytics. To learn more about Continuous Data Integration, view the recording.)

The HVR technology is a strong enabler of continuous data integration. Through our support of log-based Change Data Capture (CDC) on a multitude of database technologies, we eliminate the need for a batch window whilst keeping the impact on the source system very low. Changes are captured from the database’s transaction log as soon as the transaction is committed. HVR also keeps track of transaction boundaries so that the destination technology always presents a transactionally consistent representation of the source system.
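To make the idea of respecting transaction boundaries concrete, here is a minimal sketch in Python. It is not HVR's implementation or API: the `LogRecord` type, the operation names and the `apply_committed` function are all hypothetical, and the "destination" is just an in-memory dictionary. The sketch only illustrates the general principle that changes read from a log are held back until their transaction commits, and are discarded if it rolls back.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    """One entry read from a (hypothetical) database transaction log."""
    txn_id: int
    op: str                      # "insert", "update", "delete", "commit" or "rollback"
    key: Optional[int] = None
    row: Optional[dict] = None


def apply_committed(log, target):
    """Replay log records into target, one whole transaction at a time."""
    pending = defaultdict(list)  # txn_id -> changes seen but not yet committed
    for rec in log:
        if rec.op == "commit":
            # Apply every change of this transaction together, in log order.
            for change in pending.pop(rec.txn_id, []):
                if change.op == "delete":
                    target.pop(change.key, None)
                else:  # insert or update
                    target[change.key] = change.row
        elif rec.op == "rollback":
            pending.pop(rec.txn_id, None)  # discard work that never committed
        else:
            pending[rec.txn_id].append(rec)
    return target


log = [
    LogRecord(1, "insert", 10, {"balance": 100}),
    LogRecord(2, "insert", 20, {"balance": 50}),
    LogRecord(1, "update", 10, {"balance": 70}),
    LogRecord(2, "rollback"),
    LogRecord(1, "commit"),
    LogRecord(3, "delete", 10),  # transaction 3 has not committed yet
]
replica = apply_committed(log, {})
print(replica)  # {10: {'balance': 70}}
```

In this sketch the replica never shows the rolled-back insert from transaction 2 or the uncommitted delete from transaction 3. A real destination would additionally wrap each group of changes in a destination-side transaction so that readers never observe a half-applied source transaction.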

It is also worth noting that HVR features built-in optimizations for the various destination technologies to apply changes highly efficiently. These optimizations ensure that, although a destination system is typically tuned for analytical rather than transactional processing, it can continuously integrate changes not just from one source (which may process far more transactions than the destination could process) but from many transactional systems whose data is being consolidated.

If you would like to discuss your data integration challenges, we invite you to schedule a consultation with us. We have industry experts on staff who are happy to speak with you.

About Mark

Mark Van de Wiel is the CTO for HVR. He has a strong background in data replication as well as real-time Business Intelligence and analytics.

© 2017 HVR Software
