7 Major Benefits of Using Log-Based CDC for Data Replication from SAP
What a Global Water Technology Company Achieved with Real-Time Data Replication
The world’s water resources are stretched beyond all previous limits, and the toughest industrial water process challenges still lie ahead. A global water technologies and solutions company (referred to here as GWT) is taking on the challenge of reducing costs, meeting environmental regulations, and preparing for an evolving future, in part with data from their SAP systems.
Drain on Resources
GWT’s SAP data didn’t always flow into the analytics systems as freely as it does today. In the beginning, analytics drained resources away from the source SAP system, impacting daily operations. The source was designed for many small parallel transactions, not large batch analytics, so the analytic results slowly dripped out. This clearly didn’t scale, so ETL processes (Business Objects Data Services – BODS) were used to periodically extract large volumes of data and transport it to a separate Oracle reporting instance. This worked well for a while, certainly better than trying to always buy a bigger box. But challenges started to surface.
1. Bulk extracts put a massive load on the source SAP transactional database
2. Data needed to be fresher; ETL latency was too long and only worked for historical reporting, not real-time analytics
3. Detecting deletes was labor-intensive and inconsistent, resulting in some bad data
4. Analytics on Oracle was slow because tables contained a very large number of rows and columns
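The delete-detection problem deserves a closer look, since it motivates the move to log-based CDC later in the story. The sketch below is a hypothetical illustration (the table contents and event format are invented, and this is not HVR's or BODS's actual implementation): with bulk extracts, finding one deleted row requires comparing two full snapshots, whereas a transaction log delivers the delete as a single explicit event.

```python
# Hypothetical example: why delete detection is expensive with bulk extracts
# but trivial with a change log. Data and event shapes are invented.

# --- Snapshot-diff approach: two full extracts needed to find one delete ---
previous_extract = {1: "pump A", 2: "pump B", 3: "valve C"}
current_extract = {1: "pump A", 3: "valve C"}  # row 2 was deleted at the source

# Every key of every row must be extracted and compared.
deleted_keys = set(previous_extract) - set(current_extract)
print(deleted_keys)  # {2}

# --- Log-based CDC: the delete arrives as one explicit change event ---
change_log = [
    {"op": "insert", "key": 4, "value": "filter D"},
    {"op": "delete", "key": 2},  # no full-table diff required
]
deleted_from_log = [e["key"] for e in change_log if e["op"] == "delete"]
print(deleted_from_log)  # [2]
```

The snapshot diff costs a full read of the source on every cycle; the log read costs only the volume of actual changes, which is why it puts near-zero load on the transactional system.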
A State-of-the-Art Approach: Three Phases
Phase One: Moving to State-of-the-Art
After a detailed and extensive vetting and testing process, GWT decided on two new state-of-the-art technologies to better handle increasing data volumes on the target system. With its ability to handle massive data queries, Actian Vector (formerly known as VectorWise) was chosen as the analytics database. With its ability to handle massive data volumes while having little to no impact on the source system, HVR was chosen for real-time data replication.
But problems remained:
– BODS bulk ETL extracts still had to be used to extract data from critical SAP cluster and pool tables because HVR did not yet support these complicated, compacted SAP tables. This continued to cause a heavy load on the source when extracting these tables.
– BODS bulk ETL missed data, resulting in data quality issues.
Phase Two: Here Comes the Cloud
While the analytics platform performed much better than the previous incarnation, it still required resources to manage and scale. A hosted cloud service, Amazon Web Services (AWS) Redshift, was chosen as a replacement because it could handle massive data queries.
Since HVR supports the latest in cloud technologies, it was able to help GWT expedite the testing of several cloud options. In addition to its ability to move ever-increasing volumes of data in real time, HVR was selected for the following reasons:
Flexibility with analytics – Support for heterogeneous platforms enables multi-terabyte data centers to span on-premises and the cloud.
Near-zero source load – Near-zero overhead load on the source SAP transactional database for all tables. Support for cluster and pool tables was added, with decoding of the cluster and pool table data performed downstream in the data replication flow, away from the SAP system.
Data trust – Unique in the industry, HVR has built-in data validation and repair capabilities.
Security – Industry best practices to encrypt data in transit and the use of a proxy to avoid the need to open the firewall directly into the on-premises SAP database system.
However, there was still one lingering issue clogging up the data flow: the BODS ETL process still caused a heavy load on the source and would often miss data, putting some of the analytics into question.
Phase Three: What it Flows Like Today
Several years ago, when HVR first started supporting SAP cluster and pool tables, GWT was among the first to adopt this unique capability. This greatly simplified the data flow and eliminated the BODS ETL overhead on the source, freeing up source SAP resources to be allocated to users instead of data extraction.
And with HVR’s continued innovation in SAP data movement, using SAP HANA as both a data target and a data source is a possibility being explored, as are other new cutting-edge technologies that will give GWT an advantage in a world with growing water needs.
Here are the 7 benefits GWT gained from picking HVR early and choosing to evolve with it over the years:
1. Near-zero overhead load on the source SAP transactional database for all tables. Load once, stream changes. Cluster and pool tables supported.
2. Lower latency between source and target – Change data queued off box and updated on target once per hour.
3. Improved data quality – Log-based CDC guaranteed zero change data loss – including deletes and transient updates.
4. Flexibility with analytics – Support for heterogeneous platforms enables the high volume data centers to span on-premises and in the cloud.
5. Data trust – Built-in data validation and repair capabilities.
6. Security – Industry best practices to encrypt data in transit.
7. HVR is able to evolve alongside best-of-breed analytics platforms.
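Benefit 3 above rests on a core property of log-based CDC: every committed change is replayed against the target in commit order, so nothing is lost between cycles. The minimal sketch below illustrates the idea with an invented event format (not HVR's wire protocol): a transient update that a periodic snapshot would never see, and a delete, both reach the replica because they exist as entries in the log.

```python
# Conceptual sketch of log-based change replay; event shapes are assumed
# for illustration and do not represent HVR's actual format.

def apply_change(replica, event):
    """Apply one committed change event, in commit order, to the replica."""
    if event["op"] in ("insert", "update"):
        replica[event["key"]] = event["value"]
    elif event["op"] == "delete":
        replica.pop(event["key"], None)

replica = {}
log = [
    {"op": "insert", "key": 1, "value": "draft"},
    {"op": "update", "key": 1, "value": "final"},  # transient update: a
                                                   # snapshot taken after the
                                                   # fact would miss "draft"
    {"op": "insert", "key": 2, "value": "row B"},
    {"op": "delete", "key": 2},                    # delete captured explicitly
]
for event in log:
    apply_change(replica, event)

print(replica)  # {1: 'final'}
```

Because every event passes through the replay loop, a downstream consumer can observe the intermediate "draft" value even though the final replica state only keeps "final" – something no snapshot-based extract can offer.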
How can you use HVR to better leverage your SAP data?
1. Connectivity to SAP dictionaries to explore table definitions, including full support for any custom ZZ columns.
2. Target table creation based on the SAP application table definition, and initial data load for transparent, cluster and pool tables.
3. Log-based CDC from transparent, cluster and pool tables.
4. Data compare for transparent, cluster and pool tables.
5. Log-based CDC support for SAP HANA.
Interested in learning more about the benefits of log-based CDC for moving data out of SAP? Contact us for more information; we are happy to discuss your use case and show you a demo.