Skip to main content

HVR is now a Fivetran enterprise solution. LEARN MORE.

/blog Data Replication: SAP HANA to Snowflake

by MarkVan de Wiel

Several TB of Data Changes Per Day? No problem.

How HVR enables high volume data replication from systems such as SAP into Snowflake


snowflake-logomarkSnowflake is a modern data platform built for any cloud. With storage separate from computing resources, it provides sheer unlimited scalability, on-demand, with no need to tune the system. Its flexibility to ingest data of virtually any kind easily, makes it a very popular database for organizations worldwide. And its unique Secure Data Sharing technology revolutionizes how organizations leverage and share data.

Snowflake is one of many target technologies that HVR supports. With the increase in popularity for the platform, why do customers use HVR’s heterogeneous data replication to ingest data into it?

In this post, I share some of the common reasons customers such as Pitney Bowes chose HVR for data replication from SAP HANA® to Snowflake.

Change Data Capture

Large organizations (still) perform a lot of their primary business processes through traditional transaction processing databases. These databases, critical business operations, have grown large over time, and now process large volumes of changes. Organizations have a need to include data from these systems in their cloud data warehouse or data lake in Snowflake but cannot afford a significant incremental load on the database. Log-based Change Data Capture (CDC), in which changes are identified by parsing the database’s transaction logs, is generally considered the approach with the lowest overhead to individual transactions, but also to the system overall.

HVR performs log-based CDC on a number of different database platforms, with a recommendation to deploy agents for optimum performance and efficiency and to distribute the load. To address many organizations’ preference to avoid any extra load on the mission-critical transaction processing database systems, depending on the database technology, HVR supports alternative approaches to perform log-based CDC including capture on a standby system, or by just parsing the backups of the transaction log on a separate system altogether.


SAP to Snowflake 

Large organizations often use the SAP ERP suite. For years SAP has been running a campaign “The best-run businesses run SAP“, and its blog adopted this phrase as its title. At present, SAP ECC (Enterprise Core Components), released in 2004 but still supported today, is the most commonly deployed SAP version. SAP ECC is supported on a number of different database technologies including Oracle, SQL Server, DB2, Sybase (now owned by SAP), and SAP HANA. SAP’s release after ECC is called S/4HANA and is only supported on the SAP HANA database. SAP is working with its customers to migrate ECC deployments to S/4HANA by the end of 2027 when regular support for ECC ends.

At a high level the SAP ECC suite features three different kinds of tables:

1) Transparent tables, for which an application table maps one-to-one to a database table and a database-level query retrieves the actual data visible to the application.

2) Pool tables, mapping multiple application tables to a single database table. In the database, the actual data for a pool table is stored in a compressed and encoded format, so a SQL query retrieves unusable binary data.

3) Cluster tables, mapping one or more application tables to a single database table. Again, in the database, the actual data for the cluster table is also stored in a compressed and encoded format.

With ECC some of the most important business data is stored in so-called cluster tables. S/4HANA has largely moved away from the use of cluster and pool tables.

Since its inception SAP developed its own programming language ABAP (Advanced Business Application Programming) to interface with its application tables, with BAPIs (Business Application Programming Interfaces) code fragments to make remote function calls. A challenge with these interfaces is that there is no CDC interface, and the BAPI is often limited in the amount of information it provides, with also no access to custom Z-columns. On top of that ABAP and BAPIs run through the SAP Application Servers, typically the heaviest-loaded part of the SAP infrastructure. ABAP and BAPIs are available on both ECC and S/4HANA.


Data Replication from SAP

HVR can replicate data from SAP. Typically HVR discovers table definitions from the source databases’ dictionary, but for SAP, there is integration with the SAP dictionary that is used by the application. As a result, HVR obtains current pool and cluster table definitions, including any custom Z-columns. Similar to transparent tables, the raw data for pool and cluster data is also captured using log-based CDC, with decompressing and decoding of the data taking place on the integration side, away from the source SAP system.

Sooner or later most organizations will move the application suite to S/4HANA, and with that, the underlying database technology becomes HANA. HVR’s unique log-based CDC support for SAP HANA enables customers to continue to replicate SAP data into Snowflake.


Data replication and integration from most common sources to Snowflake using HVR:

Optimal change delivery
Identifying what changed on the source is only part of the problem. To apply change data with optimum performance and efficiency into Snowflake, HVR utilizes either Snowflake’s internal or explicit external data staging. In AWS, the staging area is S3. In Microsoft Azure, it is Blob Storage, and in the Google Cloud Platform, it is Google Cloud Storage (GCS). Data is subsequently bulk loaded through the copy command. The final stage of applying the changes to the target tables is performed using set-based SQL server merge operations, maintaining transactional consistency from the source.

Out-of-the-box, HVR provides three key capabilities to facilitate downstream ELT/ETL:

– “Soft delete” is a transformation to mark a row as deleted when it was physically deleted on the source. The soft-deleted row can now easily be identified on the target ODS as a delete that must be processed by ELT/ETL downstream. Without the soft delete, the alternative would be to use a resource-intensive query to identify what data is currently available in the target but no longer in the source.

– The ability to include metadata from the source like the commit sequence number from the source, or the commit timestamp, based upon which downstream ELT/ETL can determine what data set to process next.

– Support for the so-called agent plugin, which is a call-out to a database stored procedure, or a script or a program at certain moments during the data delivery, such as at the end of the integration cycle, when the target database is transactionally consistent. Customers often use an agent plugin to call their ELT/ETL routines.

End-to-end data replication
Organizations adopting Snowflake need a solution to set up end-to-end data replication between their transaction processing data source(s) and Snowflake.

For Snowflake integration, HVR supports:

– Discovery of table definitions on the source, including the ability to capture DDL changes from several source database technologies.
– Automatic mapping of data types to compatible, loss-less data types, with the ability to create the tables.
– One-time load through efficient copy commands, including the ability to parallelize load across tables, and to split large tables into multiple slices. The one-time load is integrated with CDC and incremental data integration.
– Log-based CDC from several database technologies, and continuous data replication into cloud-based data warehouses.
– Monitoring through HVR Insights, a browser-based environment providing a historical view into metrics aggregated by time, as well as a single topology overview.
– Automatic alerting to enable the system to operate without continuous monitoring.
– A Graphical User Interface (GUI) to configure data replication.

Cloud-based data integration requires strong security. At-rest Snowflake provides this, but you must ensure data security during data transfer and ingestion. HVR provides enterprise-grade security features to avoid data breaches:

– AES256 encryption throughout the entire data replication pipeline, using industry best practices including a wallet and unique encryption certificates.
– Support for OS-level and LDAP authentication, as well as custom plugin-based authentication.
– 2-Step verification support with unique certificates, avoiding a so-called man-in-the-middle attack.
– Pre-defined authorizations with full rights to make modifications, execute-only privileges, and read-only access.
– Auditing/logging of all operations that may affect data replication.

Time for a Test Drive

How can you most efficiently, and with the lowest possible latency, get data into Snowflake?

That is where HVR can help. Supporting customers generating multiple TBs of changes per day on a single system, HVR is well-suited to help large organizations populate Snowflake from operational systems with log-based CDC support on various commonly-used relational database technologies.

Interested in experiencing HVR’s high-volume data replication technology for yourself? Take HVR for a Test Drive.

Related Resources:
Transforming SAP Data into Insights with HVR and Snowflake (Part 1)
Modeling SAP Data in Snowflake (Part 2)

What to read next

Data Security and SOC 2 Certification
Data is one of the most important assets to modern organizations. Organizations digitally transform to [...]
Read on
HVR SAP AppConnect
HVR stands for High Volume Replication. The technology is renowned for helping enterprise customers implement [...]
Read on