7 major benefits of using log-based CDC for data replication from SAP
What a Global Water Technology Company Achieved with Real-Time Data Replication
The world’s water resources are stretched beyond all previous limits, and the toughest industrial water process challenges still lie ahead. A Global Water Technologies and Solutions company (referred to as GWT) is taking on the challenge of reducing costs, meeting environmental regulations, and preparing for an evolving future, in part by using data from its SAP systems.
Like more than 64,000 companies around the world, GWT runs SAP ECC as its core Enterprise Resource Planning (ERP) system. Thousands of employees and contractors use its financial, accounting, inventory and purchasing components to run the day-to-day business.
GWT uses SAP data stored in Oracle databases to forecast purchasing, materials, and inventory, which helps it make better business decisions. For example, forecasting can predict when chemicals in inventory will expire and, under government regulations, need to be disposed of and replaced. Internal IT teams manage these SAP deployments on premises, and that is not cheap. Fortunately, these systems contain a goldmine of data that GWT can use through analytics and reporting to improve processes, decrease costs, and ultimately provide the world with better, cleaner water.
Drain on resources
GWT’s SAP data didn’t always flow into the analytics systems as freely as it does today. In the beginning, analytics drained resources away from the source SAP system. This hurt daily operations, and because the source was designed for many small parallel transactions rather than large batch analytics, the analytic results only slowly dripped out. This clearly didn’t scale, so GWT used ETL processes (Business Objects Data Services, or BODS) to periodically extract large volumes of data and load them into a separate Oracle reporting database instance. This worked well for a while, and it was certainly better than repeatedly buying a bigger box. But challenges started to surface:
- Bulk extracts put a massive load on the source SAP transactional database
- Data needed to be fresher; ETL latency was too long, so the data only supported historical reporting, not real-time analytics
- Detecting deletes was labor intensive and inconsistent, resulting in some bad data
- Analytics on Oracle was slow because tables contained a very large number of rows and columns
A State-of-the-Art Approach: Three Phases
Phase One: Moving to State-of-the-Art
After an extensive vetting and testing process, GWT chose two new state-of-the-art technologies to better handle increasing data volumes on the target system. Vectorwise was chosen as the analytics database for its ability to handle massive data queries. HVR was chosen for real-time data replication for its ability to handle massive data volumes with little to no impact on the source system.
But problems remained:
- BODS bulk ETL extracts were still needed for critical SAP cluster and pool tables, because HVR did not yet support these complex, compacted SAP tables. Extracting these tables continued to put a heavy load on the source.
- Data quality suffered because BODS bulk ETL extracts missed data, which led to incorrect assumptions about restated historical data.
Phase Two: Here Comes the Cloud
While the new analytics platform performed much better than its predecessor, it still required resources to manage and scale. GWT eventually replaced it with a hosted cloud service that could also handle massive data queries: Amazon Redshift on AWS. HVR, with its commitment to supporting the latest cloud technologies, helped GWT expedite testing of several cloud options. It was again chosen to move ever-increasing volumes of data in real time, for the following reasons:
- Flexibility with analytics – Support for heterogeneous platforms enables multi-terabyte data centers to span on-premises and the cloud
- Near-zero overhead – Minimal load on the source SAP transactional database for all tables. Support for cluster and pool tables was also added.
- Data trust – Unique in the industry, HVR has built-in data validation and repair capabilities
- Security – Industry best practices to encrypt data in transit
However, one lingering issue was still clogging up the data flow: the BODS ETL process still caused a heavy load on the source and often missed data, calling some of the analytics into question.
Phase Three: How It Flows Today
Several years ago, when HVR first started supporting SAP cluster and pool tables, GWT was among the first to adopt this unique capability. This greatly simplified the data flow and eliminated the BODS ETL overhead on the source, freeing SAP resources for users instead of data extraction. And with HVR’s continued innovation in SAP data movement, GWT is exploring SAP HANA as both a data target and a data source. It is also exploring other cutting-edge technologies that will give it an advantage in a world with growing water needs.
The 7 benefits GWT gained by choosing HVR early and evolving with it over the years are:
- Near-zero overhead – Minimal load on the source SAP transactional database for all tables: load once, then stream changes. Cluster and pool tables are supported.
- Lower latency between source and target – Change data is queued off-box and applied to the target once per hour.
- Improved data quality – Log-based CDC guarantees zero change data loss, including deletes and transient updates.
- Flexibility with analytics – Support for heterogeneous platforms enables high-volume data centers to span on-premises and cloud environments.
- Data trust – Built-in data validation and repair capabilities.
- Security – Industry best practices to encrypt data in transit.
- Room to grow – HVR is able to evolve along with best-of-breed analytics platforms.
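Benefits 1 and 3 rest on one idea: log-based CDC loads the table once and then replays every committed change, while a periodic extract only sees whatever the source looks like at extract time. The minimal Python sketch below contrasts the two approaches. It assumes a simplified, hypothetical change-record format (`Change`), hypothetical material keys, and in-memory tables keyed by primary key. It is an illustration of the concept only, not how HVR reads SAP or database transaction logs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Change:
    """A simplified change record, as a log reader might emit it (hypothetical format)."""
    seq: int              # position in the transaction log (commit order)
    op: str               # "I" = insert, "U" = update, "D" = delete
    key: str              # primary key of the affected row
    row: Optional[dict]   # new row image; None for deletes


def apply_changes(target: Dict[str, dict], changes: List[Change]) -> List[tuple]:
    """Replay captured changes against the target in commit order.

    Returns (seq, op, key) for every applied change, so intermediate
    ("transient") updates remain visible for auditing.
    """
    applied = []
    for ch in sorted(changes, key=lambda c: c.seq):
        if ch.op in ("I", "U"):
            target[ch.key] = dict(ch.row)
        elif ch.op == "D":
            target.pop(ch.key, None)
        else:
            raise ValueError(f"unknown operation {ch.op!r}")
        applied.append((ch.seq, ch.op, ch.key))
    return applied


def incremental_extract(source: Dict[str, dict], since: int) -> Dict[str, dict]:
    """Hypothetical timestamp-based ETL extract: rows modified after `since`.

    A deleted row no longer exists in the source, so it can never be returned,
    and only the latest value of each surviving row is visible.
    """
    return {k: {"qty": v["qty"]} for k, v in source.items() if v["modified"] > since}


if __name__ == "__main__":
    # Initial load: copy the source table to the target once.
    initial = {"MAT1": {"qty": 10}, "MAT2": {"qty": 5}}

    # Changes committed on the source between two hourly cycles.
    log = [
        Change(1, "U", "MAT1", {"qty": 7}),   # transient value
        Change(2, "U", "MAT1", {"qty": 12}),  # overwrites it
        Change(3, "D", "MAT2", None),         # delete
        Change(4, "I", "MAT3", {"qty": 1}),   # insert
    ]

    # Log-based CDC: replay every change.
    cdc_target = dict(initial)
    applied = apply_changes(cdc_target, log)

    # Periodic extract: merge whatever the source shows now.
    source_now = {"MAT1": {"qty": 12, "modified": 2}, "MAT3": {"qty": 1, "modified": 4}}
    etl_target = dict(initial)
    etl_target.update(incremental_extract(source_now, since=0))

    print(cdc_target)    # {'MAT1': {'qty': 12}, 'MAT3': {'qty': 1}}
    print(etl_target)    # {'MAT1': {'qty': 12}, 'MAT2': {'qty': 5}, 'MAT3': {'qty': 1}}
    print(len(applied))  # 4
```

In this toy run, the CDC target drops the deleted row MAT2 and records the transient quantity 7. The extract-based target keeps the stale MAT2 row and never sees the intermediate value. This mirrors the "detecting deletes" problem described earlier.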
How can you use HVR to better leverage your SAP data?
- Connectivity to SAP dictionaries to explore table definitions, including full support for any custom ZZ columns.
- Target table creation based on the SAP application table definition, and initial data load for transparent, cluster and pool tables.
- Log-based CDC from transparent, cluster and pool tables.
- Data compare for transparent, cluster and pool tables.
- Log-based CDC support for SAP HANA.
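The "data compare" and "validation and repair" capabilities mentioned in this post can be pictured as a keyed, row-by-row comparison that yields the inserts, updates, and deletes needed to bring the target back in line with the source. The Python sketch below is a minimal illustration under that assumption. The function names `compare_tables` and `repair` are hypothetical, and this is not HVR's compare algorithm.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]
Action = Tuple[str, str, Optional[Row]]


def compare_tables(source: Dict[str, Row], target: Dict[str, Row]) -> List[Action]:
    """Row-wise compare keyed by primary key (hypothetical helper).

    Returns repair actions that would make the target match the source:
    ("insert", key, row) for rows missing on the target,
    ("update", key, row) for rows whose values differ,
    ("delete", key, None) for rows that exist only on the target.
    """
    actions: List[Action] = []
    for key in sorted(source.keys() | target.keys()):
        if key not in target:
            actions.append(("insert", key, source[key]))
        elif key not in source:
            actions.append(("delete", key, None))
        elif source[key] != target[key]:
            actions.append(("update", key, source[key]))
    return actions


def repair(target: Dict[str, Row], actions: List[Action]) -> None:
    """Apply repair actions to the target in place (hypothetical helper)."""
    for op, key, row in actions:
        if op == "delete":
            target.pop(key, None)
        else:
            target[key] = dict(row)


if __name__ == "__main__":
    source = {"MAT1": {"qty": 12}, "MAT3": {"qty": 1}}
    target = {"MAT1": {"qty": 10}, "MAT2": {"qty": 5}}
    actions = compare_tables(source, target)
    print(actions)
    # [('update', 'MAT1', {'qty': 12}), ('delete', 'MAT2', None), ('insert', 'MAT3', {'qty': 1})]
    repair(target, actions)
    print(compare_tables(source, target))  # []
```

This sketch holds both tables in memory for clarity. A tool working on multi-terabyte SAP tables would more likely compare in batches or by checksum before fetching individual rows.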
Interested in learning more about the benefits of log-based CDC for moving data out of SAP? Contact us for more information; we are happy to discuss your use case and show you a demo.
Joseph has nearly twenty years of experience in the data integration and high availability market, delighting thousands of customers by producing and delivering innovative real-time data integration products and solutions.