Several TB of Data Changes Per Day? No problem.
How HVR enables high volume data replication from systems such as SAP into Snowflake
Snowflake is a modern data platform built for any cloud. With storage separate from computing resources, it provides sheer unlimited scalability, on-demand, with no need to tune the system. Its flexibility to ingest data of virtually any kind easily, makes it a very popular database for organizations worldwide. And its unique Secure Data Sharing technology revolutionizes how organizations leverage and share data.
Snowflake is one of many target technologies that HVR supports. With the increase in popularity for the platform, why do customers use HVR’s heterogeneous data replication to ingest data into it?
In this post, I share some of the common reasons customers such as Pitney Bowes chose HVR for data replication from SAP HANA® to Snowflake.
Large organizations often use the SAP ERP suite. For years SAP has been running a campaign “The best-run businesses run SAP“, and its blog adopted this phrase as its title. At present, SAP ECC (Enterprise Core Components), released in 2004 but still supported today, is the most commonly deployed SAP version. SAP ECC is supported on a number of different database technologies including Oracle, SQL Server, DB2, Sybase (now owned by SAP), and SAP HANA. SAP’s release after ECC is called S/4HANA and is only supported on the SAP HANA database. SAP is working with its customers to migrate ECC deployments to S/4HANA by the end of 2027 when regular support for ECC ends.
Data Replication from SAP
HVR can replicate data from SAP. Typically HVR discovers table definitions from the source databases’ dictionary, but for SAP, there is integration with the SAP dictionary that is used by the application. As a result, HVR obtains current pool and cluster table definitions, including any custom Z-columns. Similar to transparent tables, the raw data for pool and cluster data is also captured using log-based CDC, with decompressing and decoding of the data taking place on the integration side, away from the source SAP system.
Sooner or later most organizations will move the application suite to S/4HANA, and with that, the underlying database technology becomes HANA. HVR’s unique log-based CDC support for SAP HANA enables customers to continue to replicate SAP data into Snowflake.
Data replication and integration from most common sources to Snowflake using HVR:
Optimal change delivery
Identifying what changed on the source is only part of the problem. To apply change data with optimum performance and efficiency into Snowflake, HVR utilizes either Snowflake’s internal or explicit external data staging. In AWS, the staging area is S3. In Microsoft Azure, it is Blob Storage, and in the Google Cloud Platform, it is Google Cloud Storage (GCS). Data is subsequently bulk loaded through the copy command. The final stage of applying the changes to the target tables is performed using set-based SQL server merge operations, maintaining transactional consistency from the source.
Out-of-the-box, HVR provides three key capabilities to facilitate downstream ELT/ETL:
– “Soft delete” is a transformation to mark a row as deleted when it was physically deleted on the source. The soft-deleted row can now easily be identified on the target ODS as a delete that must be processed by ELT/ETL downstream. Without the soft delete, the alternative would be to use a resource-intensive query to identify what data is currently available in the target but no longer in the source.
– The ability to include metadata from the source like the commit sequence number from the source, or the commit timestamp, based upon which downstream ELT/ETL can determine what data set to process next.
– Support for the so-called agent plugin, which is a call-out to a database stored procedure, or a script or a program at certain moments during the data delivery, such as at the end of the integration cycle, when the target database is transactionally consistent. Customers often use an agent plugin to call their ELT/ETL routines.
End-to-end data replication
Organizations adopting Snowflake need a solution to set up end-to-end data replication between their transaction processing data source(s) and Snowflake.
For Snowflake integration, HVR supports:
– Discovery of table definitions on the source, including the ability to capture DDL changes from several source database technologies.
– Automatic mapping of data types to compatible, loss-less data types, with the ability to create the tables.
– One-time load through efficient copy commands, including the ability to parallelize load across tables, and to split large tables into multiple slices. The one-time load is integrated with CDC and incremental data integration.
– Log-based CDC from several database technologies, and continuous data replication into cloud-based data warehouses.
– Monitoring through HVR Insights, a browser-based environment providing a historical view into metrics aggregated by time, as well as a single topology overview.
– Automatic alerting to enable the system to operate without continuous monitoring.
– A Graphical User Interface (GUI) to configure data replication.
Cloud-based data integration requires strong security. At-rest Snowflake provides this, but you must ensure data security during data transfer and ingestion. HVR provides enterprise-grade security features to avoid data breaches:
– AES256 encryption throughout the entire data replication pipeline, using industry best practices including a wallet and unique encryption certificates.
– Support for OS-level and LDAP authentication, as well as custom plugin-based authentication.
– 2-Step verification support with unique certificates, avoiding a so-called man-in-the-middle attack.
– Pre-defined authorizations with full rights to make modifications, execute-only privileges, and read-only access.
– Auditing/logging of all operations that may affect data replication.
Time for a Test Drive
How can you most efficiently, and with the lowest possible latency, get data into Snowflake?
That is where HVR can help. Supporting customers generating multiple TBs of changes per day on a single system, HVR is well-suited to help large organizations populate Snowflake from operational systems with log-based CDC support on various commonly-used relational database technologies.
Interested in experiencing HVR’s high-volume data replication technology for yourself? Take HVR for a Test Drive.