FAQs

HVR Real Time Data Integration

What sources and destinations are supported?
A rich set of commonly-used sources and destinations is supported, including Oracle, SQL Server and DB2 on commonly-used platforms. File systems including SFTP and HDFS are supported as sources and destinations as well. For a complete overview check the Platform Support page.

What OS platforms are supported?
Commonly-used operating systems are supported, including Linux, Windows, Solaris, AIX and HP-UX (virtual or physical). On most of these, HVR is available for both 32-bit and 64-bit architectures, depending on the databases supported on each platform.

Do I need any on-premises software?
When you obtain a license from HVR you will be given access to software downloads on our website. All functionality of HVR is contained in a single product in one download and installation.

Where do I run the HVR GUI?
The HVR GUI is part of the standard HVR installation. If you install the HVR hub on-premises on a Windows server, you can run the HVR GUI there too and operate it using a remote desktop connection. Alternatively, you can install the HVR GUI on a PC (using the standard HVR installation) and connect from it to a hub running on any OS (Windows, Unix, etc.). The GUI is supported on Unix (X) too.

Does the GUI run on my Mac?
At present the GUI does not run natively on a Mac. To use it from a Mac, run a Windows or Linux-based virtual machine, or use X-forwarding or a remote desktop connection.

What are the network and firewall requirements?
If the recommended architecture is implemented, with an installation of HVR on or very near the source and destination data stores, then the HVR executables send data and changes over TCP/IP on the port of your choice. The hub installation of HVR always initiates the communication, which means that firewalls must be opened relative to the hub (on the port you select; the default is 4343). For example, if the hub resides on-premises and the target is in the cloud, then a firewall port must be opened to reach the cloud installation of HVR from the on-premises hub. If, for the same scenario, the hub resides in the cloud, then the firewall must be opened to reach the on-premises server from the cloud instance.

If you set up HVR to use a remote database connection, then the database listener port must be open to allow the connection.

HVR supports the use of a proxy server to route requests and act as an additional security buffer.

Is network encryption supported?
Yes, bi-directional SSL using custom-generated SSL public/private keys can be set up to secure data transfers.

What is the architecture?
HVR uses a distributed architecture. In any setup one of the HVR installations must be nominated to be the hub. The architecture is flexible and modular, which means the hub can be collocated with the source, with the destination, or the hub can run on its own server or virtual machine. The hub requires a connection to a database (the hub database). Setting up (real-time) data integration in HVR is done through a GUI connection with the hub. It is highly recommended to have an installation of HVR on the source database server, and on or close to the target. Such a setup optimizes resource utilization and generally minimizes latency. In some cases, an installation on the source database server is a requirement to support log-based change data capture. This type of installation is called an HVR remote listener agent and is fully controlled by the hub – no local configuration is needed.

How will HVR help manage my replication?
HVR will do most of the management tasks itself. Its robust communication will automatically recover from most errors. HVR can send alerts by email or SNMP if recovery takes too long or an unrecoverable error occurs. In the HVR GUI, graphs and reports are available on the past and current replication status.


Oracle Replication

What versions of Oracle Database does HVR support?
Oracle Database 9.2 and above are supported by HVR.

What edition of the Oracle Database does HVR support?
HVR supports all editions of the Oracle Database, i.e. Express Edition, Standard Edition, Standard Edition One, and Enterprise Edition.

Does HVR support Oracle RAC (Real Application Clusters) and ASM (Automatic Storage Manager)?
Yes, HVR supports all combinations of versions and editions, clustered and non-clustered, with the various options for storing database files.

Does HVR support Oracle Exadata as a source and target?
Yes, from HVR’s perspective Oracle Exadata is an Oracle RAC on Linux environment.

What operating systems does HVR support for Oracle Capture and integration?
Linux, Windows, Solaris, AIX and HP-UX, both on physical and virtual environments.

How does HVR connect to the Oracle Database?
HVR relies on Oracle Client libraries to connect to the Oracle Database, and to ASM if it is used. To establish the connection HVR requires access to Oracle Client libraries. With that HVR supports connections to a local database and remote connections using Oracle’s TNS. For a RAC environment a connection can be made through the SCAN listener.

Can HVR be set up to be highly available in an Oracle RAC environment?
Yes, HVR services can be enrolled in Oracle Clusterware in order to be highly available in a RAC environment.

Does HVR support pluggable databases?
Yes, HVR can be used for log-based change data capture from and delivery into pluggable databases, as well as traditional non-pluggable databases.

How does HVR perform transactional data capture from the Oracle Database?
HVR directly accesses the Oracle Database transaction logs on the file system, including when the data resides in ASM. In rare cases HVR may (transparently) retrieve data from the database using an SQL statement.

Does HVR support Oracle Transparent Data Encryption?
Yes, HVR integrates with the Oracle Wallet and supports all flavors of Transparent Data Encryption in Oracle.

Do I have to install HVR on the Oracle Database server(s)?
For optimum performance, HVR recommends a local installation on the Oracle Database server. However, depending on the setup, there are options for remote capture on a different server running the same operating system using remote TNS connects, or file sharing e.g. using NFS. HVR also supports a so-called archive log only mode to run capture on a different server that does not run any database processing.

What are the minimum requirements to run HVR on an Oracle Database?
At the database level the DBA must enable minimal supplemental logging using an alter database statement. In addition, HVR will create supplemental log groups on all tables that will be replicated in order to capture at least the primary key columns for updates. The HVR solution connects to the database using a database user account with elevated privileges.
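
For illustration, this is what enabling that logging looks like on an Oracle source. The database-level statement is standard Oracle syntax; the table and column names in the table-level statement are hypothetical (HVR creates such log groups itself for the replicated tables):

    -- Database-level minimal supplemental logging (run by the DBA):
    ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

    -- Table-level supplemental log group capturing the key column(s),
    -- with illustrative table and column names:
    ALTER TABLE app.orders
      ADD SUPPLEMENTAL LOG GROUP orders_key (order_id) ALWAYS;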

Can HVR run change data capture on a standby database?
Yes, HVR can run directly on a Data Guard physical standby database. The physical standby can be an active standby database but doesn't have to be. Note that for the initial load (the so-called refresh in HVR) HVR will have to run against the primary database if the standby is not open for read-only. Also, supplemental logging has to be enabled on the primary database.

Does HVR provide an archive log only capture mode?
Yes, HVR can be run on a separate server (or virtual machine) in an archive log only mode with archives copied to the server or made accessible through a file share. Note the operating system on the capture machine must match the database operating system, and supplemental logging has to be enabled through a connection to the actual source database. Also, the initial load has to be performed directly from the source database.

How does HVR ensure that no changes are lost when capturing from an Oracle Database?
On the capture side HVR positions a capture process, per thread, at a position in the transaction log based on the initialization time. From that starting point forward HVR captures any changes against the tables that are part of the setup. Following the initial positioning the capture process keeps track of the log sequence number of the oldest open transaction it is tracking, and the relative byte address within that. If the capture is restarted for whatever reason then HVR simply goes back to the point in the logs where it left off when it last checkpointed. This type of recovery is fundamental to HVR. On the integration side, HVR uses a state table to ensure recoverability. As part of every transaction HVR applies to the target database it processes an update to the state table. In case of any interruptions, HVR relies on the state table in the destination database, and the fact that Oracle performs transactional processing, to ensure no changes are lost and changes are not applied more than once.

Does HVR support active/active replication between multiple Oracle Databases?
Yes, by default HVR will not capture changes applied by HVR, which makes setting up active/active replication straightforward. A quick video on how to set up active/active replication on Oracle is here: https://www.hvr-software.com/resource/setup-multi-active-active-environment/

What Oracle data types does HVR support?
HVR supports almost all scalar data types through log-based change data capture, including large objects (CLOB, NCLOB and BLOB). Data types that are not natively supported (including XML data types, Spatial and user-defined data types) can typically be included in the replication using a capture expression to retrieve the data through a SQL expression.

Does HVR support DDL replication out of an Oracle Database?
Yes, HVR supports DDL replication, but only DDL against tables and changes to the primary key. Other DDL, such as secondary indexes, triggers, and statements creating other database objects (PL/SQL stored objects, data types, users, tablespaces, etc.), is ignored. With the changes captured, HVR supports delivery of the DDL changes in a heterogeneous environment, i.e. against any of the supported targets, even if there are transformations in the setup.

What database objects does HVR replicate?
At present HVR only supports tables and their primary keys, and database sequences. Any other database objects are ignored during replication.

Can HVR capture changes from a view?
No, changes to the view are recorded against the underlying tables that make up the view definition. In order to capture changes against a view you should replicate the underlying tables, and re-create the view on the target database. HVR does support replication of changes against a materialized view given the materialized view is implemented using its own table.

Why would I use HVR instead of Oracle Data Guard?
Oracle Data Guard is included with the Enterprise Edition Database, providing a disaster recovery solution for the Oracle Database. An Oracle Data Guard standby database – in physical standby mode, which is the most commonly used mode – has to be the same version as the source database. Also, the entire database is replicated: all database objects are duplicated, and there is no flexibility for any transformations, or to replicate only one schema or a subset of the tables from the database. The Data Guard destination database is closed for DML, and unless the extra-paid Active Data Guard option is in place there is not even read-only access on the destination database. HVR on the other hand provides logical database replication with the ability to filter tables, columns and even rows, as well as the ability to deliver changes in a heterogeneous environment. The target database for data replication is always open for DML and DDL, so for example a custom indexing strategy can be implemented on the HVR target database. Also, HVR is not restricted to the Enterprise Edition database but supports all other Oracle Database editions as well.

Why would I use HVR instead of Oracle GoldenGate?
Oracle GoldenGate is Oracle's data replication technology. It is a very powerful data replication solution similar to HVR's that comes at a significant cost. Recent versions of Oracle GoldenGate use integrated components in the database, which means that bug fixes or enhancements often require a database patch. With similar Change Data Capture capabilities and richer support for heterogeneous environments, HVR is generally more cost-effective than Oracle GoldenGate. In addition, the HVR solution provides powerful capabilities to perform table creation (in a heterogeneous environment), initial data load (again heterogeneously), compare/repair, as well as a graphical user interface and automatic monitoring, all in a single product. Oracle provides products for data integration and management, but these are separate, separately priced tools and options that are not necessarily integrated (Data Integrator for heterogeneous initial loads, Veridata for compare/repair, an Enterprise Manager plugin for graphical monitoring, and GoldenGate Studio for a GUI on top of Oracle GoldenGate).

Can I filter tables when replicating from the Oracle Database?
Yes, HVR can be set up to replicate a subset of the tables from the Oracle Database. Even better, per table a subset of the columns can be replicated, and even a subset of the rows can be replicated. In addition, extra columns can be defined with values populated during the replication.



Azure

Getting Started

How do I get started?
You may access the getting started documentation here: Azure getting started

How do I start my first replication setup (channel)?
See the getting started guide in the wiki and watch the video. In the demo directory in the HVR installation you’ll find some sample channels which you can load into your HVR setup using the GUI.

Where do I obtain a license?

If you are just getting started and wish to get data into Azure from 1-5 sources, then you can acquire HVR from the Azure marketplace here. Alternatively, you can request a free trial.

Where do I find additional documentation?

When obtaining the license and on-premises software, you will be given access to the documentation wiki. The software comes with online help and a PDF manual. Besides this FAQ, a quick start guide is provided with the Azure Image in the marketplace. Finally, a forum is available.

Systems and Setup


What sources and destinations are supported?

A rich set of commonly-used sources and destinations is supported, including Oracle, SQL Server and DB2 on commonly-used platforms. File systems including SFTP and HDFS are supported as sources and destinations as well. For a complete overview check the Platform Support page. In Azure any of the supported sources and destinations can be used, including the Azure SQL Database.

What OS platforms are supported?
Commonly-used operating systems are supported, including Linux, Windows, Solaris, AIX and HP-UX (virtual or physical). On most of these, HVR is available for both 32-bit and 64-bit architectures, depending on the databases supported on each platform.

Do I need any on-premises software? If so, where do I get it?
When you obtain a license from HVR you will be given access to software downloads on our website. All functionality of HVR is contained in a single product in one download and installation.

Where do I run the HVR GUI?
The HVR GUI is part of the standard HVR installation. If you install the HVR hub on-premises on a Windows server, you can run the HVR GUI there too and operate it using a remote desktop connection. Alternatively, you can install the HVR GUI on a PC (using the standard HVR installation) and connect from it to a hub running on any OS (Windows, Unix, etc.). The GUI is supported on Unix (X) too.

Does the GUI run on my Mac?
The HVR GUI is also available on the Mac. You can find it in the download area on our website, to which you were given access when you obtained a license. Note that on macOS only the GUI is available; the HVR hub runs on Windows/Unix/Linux systems only.

What are the network and firewall requirements?
If the recommended architecture is implemented, with an installation of HVR on or very near the source and destination data stores, then the HVR executables send data and changes over TCP/IP on the port of your choice. The hub installation of HVR always initiates the communication, which means that firewalls must be opened relative to the hub (on the port you select; the default is 4343). For example, if the hub resides on-premises and the target is in the cloud, then a firewall port must be opened to reach the cloud installation of HVR from the on-premises hub. If, for the same scenario, the hub resides in the cloud, then the firewall must be opened to reach the on-premises server from the cloud instance.

If you set up HVR to use a remote database connection, then the database listener port must be open to allow the connection.

HVR supports the use of a proxy server to route requests and act as an additional security buffer.

In what region should I run HVR?
Generally, HVR should be deployed in the same region as your Azure data sources. If this region is far away from you, you will appreciate HVR's efficient and robust communications. Connecting from the HVR Image to data in another region will involve communication using the slower and more expensive database protocol. If you have data in several regions, consider deploying the HVR agent multiple times (by deploying the HVR Image for Azure in several regions), or deploy an HVR hub in one of your regions and connect from there to the other regions.

Is network encryption supported?
Yes, bi-directional SSL using custom-generated SSL public/private keys can be set up to secure data transfers.

Installation


How much time should I set aside for installation?
Installation in Azure takes approximately 5 minutes, though the provisioning within Azure may take up to 15 minutes (an unattended process). During those 15 minutes you can install HVR on-premises, provided all prerequisites are met. After that you will need to open a remote desktop connection to the Azure VM just created to obtain the SSL certificate.

How do I perform the initial data load?
HVR Refresh can be used to perform the initial load. Depending on source and target, other options are available to perform the initial load, including native database utilities. The advantage of using HVR for the initial load is the tight integration between the initial load and ongoing real-time replication.

How do I connect to HDFS / HDinsight on Azure?
The HVR Image for Azure currently does not support Azure HDFS locations. To connect to HDFS/HDInsight, deploy a Linux VM in Azure and install HVR there as described in the Unix installation chapter of the manual on the wiki.

Can I transform the data from source to target?
Yes, several transformations are possible:

  • Data type mappings between source and target are automatically handled by HVR.
  • Renaming of schemas, tables and columns is supported.
  • Replication of subsets of data (horizontal or vertical) is possible.
  • Column-level transformations and lookups can be defined.
  • Special commonly-used options, like soft deletes (in which an extra column marks a row as deleted; see the sketch after this list) and auditing tables, are supported out of the box.

Since data comes in as a flow of changes it is not recommended to perform heavy transformations that involve table joins and aggregations.
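
As a sketch of the soft delete option mentioned in the list above (table and column names are hypothetical): instead of deleting the target row, HVR can mark it as deleted, roughly equivalent to:

    -- A delete captured on the source becomes an update on the target:
    UPDATE dw.orders
       SET is_deleted = 1
     WHERE order_id = :order_id;  -- key value taken from the captured delete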


I want to learn more about HVR


What is HVR?
HVR stands for High Volume Replicator. Our software offers real-time heterogeneous data replication across platforms, including Azure. With HVR you get everything you need for data replication, including schema creation, initial data load, real-time change data capture and delivery, and compare/repair. A typical implementation requires an installation of the software on the source and on the target server. The cloud image (HVR Image for Azure) found in the Azure marketplace can be used if the source or target is in the cloud.

Why would I use HVR in Azure?
The most efficient and secure way to transfer data between two systems is to use a copy of HVR close to the source and close to the destination. If your source and/or destination for real-time replication with HVR is in the Azure cloud then you should use HVR in the cloud. The image saves you the effort of installing the software and includes commonly used database drivers.

What is HVR for Azure?
HVR for Azure is a prepackaged Azure image of a Windows VM, offered in the Azure Marketplace. It contains a standard HVR installation which is preconfigured to be used immediately to connect to Azure with HVR. With a few clicks, the Azure platform is ready to receive or send data through HVR.

The HVR Image for Azure takes care of the firewall settings, necessary database drivers and required HVR components on Azure. Connecting from on-premises to Azure normally does not require any additional installation, as the package contains all the drivers needed to connect to Azure SQL, Microsoft SQL Server and Oracle.

What are common Azure cloud real-time replication use cases?
The two most common use cases are:

  1. Real-time reporting, from an on-premises or other cloud-based system into a database in the Azure Cloud.
  2. Database migration. Real-time replication software is used to minimize the downtime for the migration, and can be used to reverse the flow of data post migration to minimize the risk of the migration. On top of that with HVR data can be validated before systems are migrated.

What is the architecture?
HVR uses a hub and spoke architecture. In any setup, one of the HVR installations must be nominated to be the hub. The architecture is flexible and modular which means the hub can be collocated with the source, with the destination, or the hub can be running on its own server or virtual machine. The hub requires a connection to a database (the hub database). Setting up (real-time) data integration in HVR is done through a GUI connection with the hub.

It is highly recommended to have an installation of HVR on the source database server, and on or close to the target. Such a setup optimizes resource utilization and generally minimizes latency. In some cases an installation on the source database server is a requirement to support log-based change data capture. This type of installation is called an HVR remote listener agent and is fully controlled by the hub – no local configuration is needed.

How will HVR help manage my replication?
HVR will do most of the management tasks itself. Its robust communication will automatically recover from most errors. HVR can send alerts by email or SNMP if recovery takes too long or an unrecoverable error occurs. In the HVR GUI, graphs and reports are available on the past and current replication status.



Database Replication Software Solution

Can HVR replicate data from a single source to multiple targets?
Yes, in fact that is one of the advantages of HVR’s architecture. HVR can capture from a single Oracle instance, queue the captured changes on the hub, and then integrate those changes to as many targets as needed. HVR does not have any limitation on the number of targets.

Can HVR replicate data from multiple sources to a single target?
Yes, HVR can be configured to capture data from many sources and then replicate to a single target. Many data warehousing solutions require data to be collected from any number of sources, to be combined either into a single target warehouse database or into separate target schemas. Some applications and data are designed so that there will not be any conflicts on primary key constraints. If that is not the case for your scenario, then HVR offers the option to add extra columns, set to values stored in the metadata, to make sure you don't have any conflicting primary keys.
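
A minimal sketch of that option, with hypothetical table and column names: the target table is keyed on the original key plus a source identifier column that HVR populates during integration:

    CREATE TABLE dw.orders (
      source_id VARCHAR2(20) NOT NULL,  -- set per source location during integration
      order_id  NUMBER       NOT NULL,  -- original source key
      status    VARCHAR2(20),
      CONSTRAINT orders_pk PRIMARY KEY (source_id, order_id)
    );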

Do my source and target tables need to have the same structure and layout?
No, tables do not need to have the same layout. You can instruct HVR to ignore certain columns or populate extra columns during replication. Column values can also be changed through transformations, as well as enriched with the results of querying other tables, either on the source or the target. HVR also makes additional transaction metadata values, such as source timestamps or transaction identifiers, available to be mapped to columns.

To minimize any impact to our network, can we compress the change data before it is sent over the network?
Yes, in fact HVR already compresses the data by default before sending it over the network, using an internal algorithm which achieves very high compression rates. The impressive compression ratio reduces impact on your corporate network while adding little overhead on the source.

When instantiating the target database, does the user have to pre-create the target tables, or can HVR help with that?
The initial load of the target tables takes place by running an HVR Refresh operation. The Refresh can create all the target tables if they don't already exist. The target tables are created based on the DDL of the source tables, in conjunction with any column re-mapping that the user has configured in the replication channel.

Can HVR convert all insert, update, and delete operations and insert them into a time-based journal or history table?
Yes, HVR Integrate provides a feature known as TimeKey which converts all changes (inserts, updates, and deletes) into inserts into separate tables. HVR will log both the before and after image for update operations, the after image for insert operations, and the before image for delete operations. HVR also logs additional transaction metadata to provide more time-based details for every row replicated. HVR will also automatically create the tables with the preferred structure for TimeKey integration.
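
As an illustration (HVR creates the actual table automatically, and the names and exact data types below are hypothetical), a TimeKey-style history table combines the source columns with change metadata columns such as the documented {hvr_op} and {hvr_integ_seq} values:

    CREATE TABLE orders_history (
      hvr_op         INTEGER   NOT NULL,  -- operation type (insert/update/delete)
      hvr_integ_seq  CHAR(36)  NOT NULL,  -- monotonically increasing change sequence
      hvr_cap_tstamp TIMESTAMP NOT NULL,  -- commit timestamp on the source
      order_id       INTEGER,             -- source columns follow
      status         VARCHAR(20)
    );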

Can the data replication software keep current with transaction log generation?
Whether replication can keep current depends on the transaction rate (lots of small transactions or fewer larger ones), the percentage of the transaction log that is relevant for replication, how many database objects must be tracked, what data types are involved, etc. If two replication products can easily keep up with the transaction log generation volume, it is more relevant to talk about other aspects, such as how many resources (CPU, memory, IO) the products use to perform at the “speed limit”, or how much effort it takes an administrator to implement the technology. One aspect of performance that remains relevant is how long it takes to get current again if replication is running behind.

How fast can the software capture and replicate transactions?
HVR is a replication technology that can capture and replicate transactions at amazing speed. The team at HVR recently ran a TPC-C workload on a local Oracle database at about 800 transactions per second; the transactions were captured and replicated into a SQL Server database with at most a couple of seconds of latency, all running on a two-year-old laptop.

How much resources will the data replication software consume?
Resource consumption includes CPU, memory, storage and network resources. The HVR engineers decided that it was best by default to trade CPU resources for network resources and always compress transactions across the wire. This is probably the right default for hybrid cloud environments or for replication across a wide-area network, in which network bandwidth is typically severely limited. On the other hand it may not be the best default for replication within a data center on an extremely busy database system that already averages 90% CPU utilization. Likewise, HVR does not store transaction files on the source system, saving storage and IO resources.

How much ongoing maintenance is required? How long does it take to get up and running?
Consider the effort it takes to set up and maintain the environment. Of course there is the setup of real-time replication, but that is certainly not all of it. In an Oracle to SQL Server environment, for example, aspects like DDL generation and initial load are extremely important to get up and running quickly. Also, it is comforting to be able to compare these heterogeneous environments to know whether the databases are in sync. HVR provides capabilities like these out of the box within a single offering, but not every replication tool does. On top of that HVR provides a GUI to set up and maintain real-time replication. Capabilities like these directly affect the cost of implementation and ongoing maintenance.


Data Lakes

How would building a data lake in HDFS contrast with building it on S3?
S3 is the central storage environment for AWS services. A lot of AWS services start with data in S3, like EMR, Athena and Redshift, as do partner services like Qubole and Snowflake. S3 has near-infinite scalability and uses a pay-as-you-go model.

HDFS on the other hand is a storage environment on Hadoop. It is scalable, but an increase in size requires an increase in the number of servers hosting the cluster (and a rebalancing of the storage, which happens automatically). HDFS is a central storage environment for the Hadoop ecosystem, with access to lots of technologies like Hive, Pig, HBase etc.

Technologies available on top of HDFS are generally available on AWS, with AWS supporting other services not available on HDFS.

Would HDFS be a good fit for a Data Lake?
HVR is agnostic to database vendors and big data platforms, meaning that we support a variety of big data technologies such as HDFS, Hive, and others. Whether HDFS is a good choice for any organization depends on its familiarity with the platform, but of course it can well be.



Performance

Does HVR deal with segments for parallel data upload?
Yes, HVR supports so-called sliced refreshes to enable a single table to be split into multiple smaller segments. Depending on the target technology these segments can be loaded in parallel. Note HVR generally performs direct loads into the target technology so database software that only supports one direct path load per target table cannot take advantage of parallel segment loads.

What are the minimum requirements for 50 GB of real-time replication between source and destination on a daily basis?
The most important factor that determines the sizing of an HVR environment is the transaction volume. Generally, on the source, the HVR process capturing the changes uses only a small percentage of the resources in use when the database is busy. The hub – which may be co-located with source or target – requires enough disk space to store compressed transaction files for some period of time in case integration does not proceed. With a very efficient compression algorithm the transaction file volume is generally less than one tenth of the transaction log volume, also depending on the percentage of database changes replicated. On the target the most resource-intensive operation is generally the database applying changes. In the case of a file system target like S3 or HDFS, or in the case of Kafka, it is HVR's formatting of the rows that takes up to one CPU core per integration process (again depending on how busy the system is).

Is it possible to activate real-time replication against a complete source database (e.g. a 350 GB online database)?
Existing database volume is almost irrelevant to HVR; what matters is the incremental transaction volume. Of course the data must be applied to the target system, but for a large database it is often a matter of ensuring the target database system is properly sized and tuned to process transactions.

What is the minimum Latency and Maximum TPS (Transactions Per Second) supported in Active Active Homogeneous replication setup?
By default HVR's minimum latency is between one and two seconds, due to optimizations that make the most efficient use of system resources like network and CPU. Note these optimizations can be tuned, so lower latency is possible, but in many cases it is not worth the resource utilization.

Maximum TPS depends on many factors, including transaction size, the size(s) of the tables changed in the transaction, data types, and more. In addition, in an active/active setup, HVR may or may not be requested to perform conflict detection and resolution (note it is always preferred to ensure the application is as prepared as it can be for active/active replication, and to avoid data collisions as much as possible). HVR's conflict detection can have a very significant impact on TPS: where without conflict detection TPS may be in the hundreds if not thousands, with conflict detection it may go down to tens or hundreds (also depending on the setup and the number of tables requiring conflict detection).

Does HVR commonly handle high volume algorithmic trading types of applications?
HVR supports replication across all industries for many use cases. Most customers require little if any transformations on the database beyond schema name changes, and we don’t always know the source application. We do run log-based CDC for multiple customers who generate over 1 TB of transaction logs in a day.

What would be expected latency and transfer rate, in terms of number of rows and TXs per second?
Expected latency and transfer rate, both in rows per second and transactions per second, always vary based on a number of factors, including the size of the transactions on the source (how many tables and rows are modified in a single transaction), table data types, what percentage of database changes have to be replicated, the target technology (columnar and/or scale-out databases have different characteristics than single-instance row-based databases), and more.

HVR's solution is designed to provide high volume replication with near real-time results. Once a transaction is committed on the source, the changes are routed to our hub server generally within a second. Then, in the default “continuous” mode that works very well on an indexed OLTP database, the data is often processed in about another second, which means data is replicated end-to-end within seconds. A single channel can handle 100+ GB/hour of transaction logs, which can be integrated across as many integrate jobs as needed. Channels can be scaled for higher volume scenarios.

From the seminar, we learnt that expected latency stays around single-digit seconds, and HVR replication supports replicating over 1 TB/day of Oracle redo logs, which are both impressive. Our concern is: how much performance degradation may hit us if XML/LOB columns are also included in replication?
HVR performance varies based on data types, table widths, and the distribution of DML across tables with different data types. High volume tables with XML and/or LOB columns may slow down replication rates, depending on the operations on the table (inserts, updates, deletes) and other configuration considerations.

How do the source and target side agents respond to very large, long-running transactions?
HVR will always capture transactions as they start on the source database. Transactions are kept in memory until we see a commit, unless memory thresholds (per transaction) are exceeded and we spill information to disk. Only once the commit is seen will the transaction be propagated to the hub, and from there to the target. A large transaction that spilled to disk may take a measurable amount of time to make it across the wire to the hub, and on to the target.

Please note that for the continuous mode there is an option to limit the size of the transaction on the target by setting option TxBundleSize as part of the Integrate action.


Our source database is 5.5TB with 16,000 connections hammering it daily. Will this present any issue in syncing with our targets?
The initial load on a large, busy database can be a challenge, especially if some of the biggest tables get the most transactions. Depending on the source database technology, the database may have a hard time maintaining a read-consistent snapshot on the busiest tables in the system for the duration of the table load. There are workarounds available to manage the initial load: (1) slice the load into multiple smaller chunks (available with the latest HVR version), (2) run the initial load from a standby database that is temporarily put in a steady state, or (3) use another mechanism for the initial load, e.g. backup/restore.
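
To illustrate the idea behind slicing (workaround 1), a large table can be split into slices that are loaded by parallel jobs; the modulo predicates below are a generic sketch with a hypothetical table, not HVR's exact mechanism:

    -- Four illustrative slices over a numeric key, each handled by its own job:
    SELECT * FROM app.orders WHERE MOD(order_id, 4) = 0;
    SELECT * FROM app.orders WHERE MOD(order_id, 4) = 1;
    SELECT * FROM app.orders WHERE MOD(order_id, 4) = 2;
    SELECT * FROM app.orders WHERE MOD(order_id, 4) = 3;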

Note HVR refresh aligns the initial load and the incremental changes. For data refreshes outside of HVR a so-called resilient mode has to be used that merges ongoing changes into the target table to deal with an overlap between the initial load and incremental changes.


Connectivity

Can HVR handle SAP and IBM's DB2 data sources also?
Yes, HVR has many existing customers that source from SAP as well as IBM’s DB2 on LUW or iSeries servers. HVR even supports decoding of cluster and pool tables in SAP ECC, as well as log-based CDC from SAP HANA.

Does HVR support capture/integration of LOBs?
Yes, by default HVR captures full values of LOBs to be integrated into the destination database. Of course processing large LOBs will limit the number of rows per second processed.

Does integration speed vary for different destination file types?
HVR supports multiple different file types as a target: XML, CSV, Avro, JSON and Parquet. Performance for different file types will vary slightly due to the file format, how verbose the output is (particularly relevant for XML), and in the case of for example Parquet whether the data compression is bottlenecked on CPU processing. In general HVR can write thousands to tens of thousands of rows per second into any one file format, depending on a variety of factors including data types, row width, number of columns and more.

Does HVR read the Online Tlog for SQL Server 2016?
HVR supports online transaction log reading from SQL Server 2016. Please see HVR’s platform compatibility matrix: https://www.hvr-software.com/wiki/HVR_Platform_Compatibility_Matrix .

How do you address data type differences between Oracle and SQL Server?
HVR uses an internal data type representation that is mapped to every supported database's data type representation. As a result HVR can ensure lossless data transfers between databases in a heterogeneous environment. For example, a DATE data type in Oracle has a time component, so in SQL Server HVR would create the target table with a DATETIME or DATETIME2 data type, given that the DATE data type in SQL Server does not include a time component.
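
A sketch of that example with hypothetical table and column names (the exact mapping HVR chooses may differ):

    -- Oracle source; DATE stores both date and time:
    CREATE TABLE app.orders (order_id NUMBER(10) PRIMARY KEY, created DATE);

    -- SQL Server target as it might be created; DATETIME2 preserves the time component:
    CREATE TABLE app.orders (order_id INT PRIMARY KEY, created DATETIME2);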

Can HVR replicate from MySQL to SQL?
MySQL capture is planned to be available in the second half of 2018. Generally HVR supports all targets for any supported source technology, so as soon as MySQL as a source is available it will be supported for all database technologies as a target. Our website shows the supported platforms at https://www.hvr-software.com/product/platform-support/ .

What is an alternative to running agent on SAP system (non HANA) for tracking changes?
HVR has flexible topologies in that we can run the capture process from a middle-tier replication server. We can also process changes on a physical standby database, or capture from only the backup logs on a separate system (depending on the underlying database).

Is HVR able to read from multi-dimensional data such as Infocubes from BW system?
HVR does not currently support multi-dimensional databases. That is, unless the dimensionality is implemented as metadata on top of a (currently supported) relational database (so-called ROLAP), HVR will not be able to support CDC from the database.

Does the declustering feature of SAP (i.e. for ATAB, RFBLG) work with Teradata as the target?
Yes, declustering will work for any supported target. However for Teradata as a target we recommend interested organizations evaluate Teradata’s out-of-the-box Analytics Enterprise Applications solution.

Is there a source agent plugin for databases that are not on your list?
Yes, a source agent plugin can be written for databases that are not on the list.

Is HVR integrated with RMAN and Data Guard? Will it affect my RMAN backups, in particular the deletion of archivelogs that have not been applied to the target yet?
HVR is not integrated with RMAN or Data Guard. However, HVR provides a utility, hvrlogrelease, that can be used to manage archive files (available not just for Oracle as a source database) so that they remain available if needed by HVR log-based capture jobs. Note HVR can capture from a Data Guard physical standby database directly, as well as from a so-called archive log only source (with the archive logs possibly made available through RMAN or Data Guard).

For CDC, what types of permissions are required on the source database? Is any special configuration required on the source database to enable CDC?
Required database privileges vary from database to database. Please refer to the documentation at https://www.hvr-software.com/wiki/Main_Page to review the requirements for the various platforms.

Do you support materialized views as source on Oracle side?
Yes, the storage object for a materialized view in Oracle is a table, and DML against that table is logged in the transaction log. Note that in some cases it may make more sense to replicate the tables used to define the materialized view and create an identical materialized view on the target. If HVR replicates the materialized view data, then the target will (by default) see a regular table rather than a materialized view, and Oracle features like query rewrite will either not work or require a different level of query rewrite integrity.
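
A sketch of the recommended alternative, with hypothetical object names: replicate the base table (app.sales) with HVR and recreate the materialized view on the target using standard Oracle syntax:

    CREATE MATERIALIZED VIEW sales_by_region
      REFRESH COMPLETE ON DEMAND
      AS SELECT region, SUM(amount) AS total
           FROM app.sales
          GROUP BY region;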

What is replicated for Oracle database? All objects, or only tables?
Table DML, basic table DDL (add/drop/modify column, add/drop table), changes to the primary key, table truncates, and sequences are captured. The focus for HVR is on real-time integration so HVR does not capture for example secondary indexes, triggers, stored procedures and other objects. Note that once captured these changes can be delivered to a heterogeneous target.

Is it possible to use a Data Guard environment as a source for the replication?
Yes, with an Oracle source database you can use a Data Guard physical standby database as a source (both active and passive). You can also configure an off-box capture that reads only archived logs.

Equivalent capabilities are available for SQL Server also. HVR can capture from a standby node in the AlwaysOn cluster, and HVR can also capture from a separate system that only receives the transaction log backups.


AWS

How does HVR work with RDS, S3 and Redshift?

  • RDS is a platform service offered by Amazon to host a relational database service such as Oracle, SQL Server, PostgreSQL or MySQL.
  • S3 is a service that provides storage. There is no direct relationship between RDS and S3.
  • Redshift is a data warehouse platform available on AWS. Data loads into Redshift are most efficient when data is staged in S3.
HVR recommends that customers run HVR (as agent, or as hub) in the same availability zone as the target, whether the target is RDS, S3 (the zone where the bucket was created) or Redshift. Any communication between HVR executables takes advantage of compression, large block transfers, and optionally encryption.

Can HVR use Amazon RDS as a target? What about Aurora? What about Redshift?
HVR supports all RDS database flavors in AWS as a target, including PostgreSQL-compatible Aurora and MySQL-compatible Aurora. Redshift is also supported as a native target through ODBC. Note that to move the data into Redshift HVR will always first write the data into S3, after which – transparently – the data is moved into Redshift using a copy command.
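
For context, the statement below is a generic Redshift COPY from S3 of the kind HVR issues transparently; the table, bucket and IAM role names are hypothetical:

    COPY analytics.orders
    FROM 's3://my-bucket/hvr-staging/orders/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-load'
    FORMAT AS CSV;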

When migrating from on-premises to, for example, AWS, do you need a complete copy of the database? How is this achieved?
Most database migrations start with an initial data load, followed by ongoing CDC (for some time), and possibly reverse replication for some time post-switchover as a fallback in case the new setup is not adequate for the load on the system.

HVR provides all capabilities to perform these steps, and more, most notably the ability to compare data between source and target, in a heterogeneous environment, to ensure data consistency and accuracy before switching over to perform the migration.

Can HVR support the following approach for migration: first a full load, then for a certain period replication from on-prem to AWS, and after cut-over reverse replication? We cannot miss any object created in the source database once replication is going on.
Fundamentally HVR can support this model as long as the database on either side of the replication stream is supported. If the source database is Oracle or SQL Server then HVR will be able to capture table-level DDL including create table, and replicate the result to the target (even in a heterogeneous scenario).

Note if the AWS database is an RDS system then we must use log-reading through SQL because direct access to transaction logs is not available on RDS.


Usage

Can we use a data load utility that is not HVR for the initial load, such as a backup of the database as of some consistent point in time?
Yes, HVR can exactly align an initial load performed by an external utility with ongoing CDC. This is particularly useful if the external utility took a consistent snapshot of the entire data set, in which case HVR can be instructed to start emitting the changes after this system commit number.

HVR can also work with an initial load that was not as a whole transactionally consistent, but for such a scenario so-called resilient processing has to be enabled temporarily to handle the overlap between the initial load and on-going CDC. Resilient processing results in changes being merged into the target data set. Note resilient processing is only available on database targets.
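
A sketch of the consistent-snapshot case on an Oracle source, using standard Oracle syntax (the table name is hypothetical): record the SCN before the external load, extract every table as of that SCN, and then instruct HVR to start emitting changes from that point:

    -- Record the current SCN before starting the external load:
    SELECT current_scn FROM v$database;

    -- Extract each table as of that SCN (flashback query):
    SELECT * FROM app.orders AS OF SCN 1234567;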


Can a captured workload be used to perform a load test on a QA or test system?
HVR keeps track of a rich set of statistics to demonstrate the usage profile of the database. Transaction files can automatically be journaled and quite easily be replayed again.

Do note that HVR’s load pattern on the database is different from the way most applications use a database. Most applications will handle a multi-user workload with many concurrent transactions. HVR will generally run as a single session (coming from a single source) with only one open transaction at any one point in time. Also HVR bundles small transactions to more efficiently apply the load to the target.

How long does a typical HVR implementation take?
The amount of time it takes to implement HVR varies greatly depending on the scope and ambitions of the project, and to some extent on the data volume. Some organizations implement one-to-one replication in less than a week. Other organizations continuously evolve their data integration infrastructure, continue to add source systems to their data lake, and then decide to shift to the cloud. Projects like these take many months, and with first targets and then sources shifting to the cloud, they become almost ongoing implementations.

Can HVR be called on the command line or through a scripting language?
At present HVR can be called through a command line interface. Various organizations use this to complement scripts they created using the operating system’s shell environment (e.g. sh or bash on Linux or PowerShell on Windows). Note the GUI will often show command line equivalents to achieve the same result, and the command reference in the documentation is complete.

Future versions of HVR will expose REST interfaces.

In your experience working with clients, which product is seen as the most preferable target system?
At HVR we want to enable our customers to perform the data integration projects to support their business using the technologies they prefer to use and/or own. HVR is used for a variety of use cases and nowadays cloud targets are very common. HVR does not recommend any specific target technologies.

What happens if the initial load does not succeed? Will it start loading the batch from start or from last failed job?
The HVR Refresh (initial load) can be configured to run in parallel jobs, either on-demand or scheduled. In other words, you can run online or in batch for a set of tables. A job that fails will automatically restart, but it will restart from the beginning. To avoid redoing a large load we make it easy for the user to create small bundles of tables in a single job so that jobs finish relatively quickly and if one of them restarts then the amount of work that is redone is limited. How many jobs run in parallel can be controlled through the scheduler.

Can I save the output of Compare automatically on a scheduled process?
HVR compare jobs can be run by the scheduler i.e. on a defined schedule, and their output is stored in the hvr.out log (as well as in the channel log, and in the job log). Detailed output currently has to be retrieved from the log directly.

How can I save the output from a scheduled process and re-direct it to a text file?
HVR jobs can easily be run from the command line, and as a result as a scheduled job outside of HVR. This makes redirecting the output to a file easy, but of course one loses the overview of the job in the HVR GUI.

Where does the HVR Repository reside?
The HVR repository is stored in one of HVR's supported databases. The database can be local to the hub, or remote (e.g. in a relational database service, or physically on a different machine). For a remote database, connectivity libraries for that database flavor must be installed.

Can we load just specific tables?
The HVR developer creates so-called channels to indicate which tables should be replicated. As part of the channel definition the HVR developer indicates which actions are relevant to each table, and during the initial load just a subset of the tables can be refreshed.

Can the bulk data load be performed while the target table is in use? What if the source table is in use?
Yes. Different databases behave differently when certain operations are performed, but in cases where a bulk (append) load would not be possible HVR supports a so-called row-wise load. On the source side HVR simply performs a non-intrusive SQL select from the table, so as long as that select is not blocked by some other session the source table can also be in use by multiple transactions.

Do you support wildcards for table filtering (to automatically capture new objects in the schema and add them into replication without reconfigure replication process)?
Yes, the AdaptDDL action provides the ability to automatically add new tables to the channel based on source to target schema mapping.

What kind of metadata can I add to captured records behind the scenes (e.g. the SCN for an Oracle source) which can be used on the target to insert as extra column values when converting DML into operation logging instead of applying real DML on the target?

HVR provides a number of environment values that can be included as part of an integrated row. Commonly used examples include:
  • {hvr_op} to indicate the operation type (truncate, insert, update (before non-pk change/before pk change/after), delete)
  • {hvr_cap_tstamp} to store the commit timestamp on the source.
  • {hvr_integ_tstamp} to store the integration timestamp.
  • {hvr_integ_seq} a unique, monotonically increasing value that is built up from the commit sequence number on the source and the order of the change in the transaction (i.e. row changes are made in the order of hvr_integ_seq).
These and more environment variables are documented under /IntegrateExpression at https://www.hvr-software.com/wiki/ColumnProperties .

What DDLs are supported? Can we add new tables and/or columns to replications? Can we alter column length, for example, from varchar(100) to varchar(200)?

HVR generally supports table-level DDL, and certainly everything listed. More details can be found here:

Does HVR support replicating only a subset of columns for a specific table? Can we have different indexes on source and target sides?
Yes. By default HVR will replicate all columns, but you can define the replication channel to exclude any columns you want. Because the target database is open for reads and writes, you can define as many indexes as needed on your target.

The compare/repair feature sounds nice, but how expensive (time and resource consumption-wise) is this operation? We have tables containing billions of rows and hundreds of columns.
We can work with you on best practices for establishing data validation services to meet your needs. For example, compare jobs can be filtered so that generally only incremental data sets are validated and not historical data.

How flexible is HVR replication? Can it suppress certain deletes, or filter some replicated DML?
Yes, HVR is flexible in that you can filter data at many points, such as when data is captured, integrated, refreshed, or compared. The number of filtering options is very extensive.

Can we replicate to multiple targets simultaneously? We have 3 report server targets that need refreshing nightly.
Yes, with HVR it is trivial to set up capture once, and deliver to multiple targets.


Architecture

How does HVR handle long-running transactions on the source? If CDC stops in the middle of capturing a long-running transaction, can it be restarted post the large transaction?
HVR always captures changes as they happen to tables that will be replicated. HVR will however not propagate the changes until the transaction is committed.

Starting with version 5.3 HVR will automatically checkpoint the capture state at an interval to avoid having to go back into the log to re-read the entire long-running transaction.

Note HVR's goal is to keep the system transactionally consistent, which means that by default no transactions will be skipped. An HVR operator can always intervene and reset the capture time to skip over transactions for a time period.

Does the HVR hub provide a high availability setup?
HVR supports running the hub in a clustered environment for high availability.

Will HVR honor enforced foreign key constraints? Will enforcing constraints on the target have a performance impact?
If HVR is processing changes in its default continuous mode, then all transactions go into the target in the order they were applied on the source, and within a transaction changes are also applied in order. As a result – assuming source and target are in sync – integrity constraints can be enabled and enforced on the target. Of course the database must perform extra work to validate integrity constraints, so data integration with enforced integrity constraints will always be slower than integration without them. On a well-indexed database the difference should be small, and depending on the change data volume, latency may be the same, i.e. extremely low.

Note if HVR is processing changes in so-called burst mode, then integrity constraints will be violated while changes are applied to the target tables. By default this constraint violation only happens inside a transaction, with the transaction boundary always exposing a consistent data representation. Integrity constraints could be enabled and enforced in this model only if the constraint validation is performed at the end of the transaction, i.e. deferred. Not many databases support deferrable constraints (Oracle does).
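
For reference, this is what a deferrable constraint looks like on Oracle, using standard syntax (table and constraint names are hypothetical):

    ALTER TABLE app.order_lines
      ADD CONSTRAINT fk_order FOREIGN KEY (order_id)
      REFERENCES app.orders (order_id)
      DEFERRABLE INITIALLY IMMEDIATE;

    -- In the applying session, postpone constraint validation until commit:
    SET CONSTRAINTS ALL DEFERRED;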

What are the implications of enabling replication on the database transaction log?
Like any other log-based data replication product, HVR needs to know the row identifier in order to replicate changes to the target. Databases will generally log this regardless for insert and delete operations, but not for updates (since the core recovery concepts use internal row identifiers to replay updates). As a result HVR will generally activate so-called supplemental logging at the table level to be able to get the row identifier for updates. The impact this has on the transaction log depends on a number of factors, including the percentage of tables in the database that will be replicated (only replicated tables need supplemental logging), the percentage of updates relative to inserts and deletes on the replicated tables, the size of the key columns or whether the entire row is logged (some databases only support all-column logging), the row width, and the number of columns that are generally not touched as part of an update.

With an increase in transaction log generation, the transaction log and transaction log backups will increase in size, and the log rollover frequency may increase. Note that in the case of an Oracle Database not only will the transaction log rollover frequency increase, but undo tablespace utilization will also increase for tables with supplemental logging enabled.

Is a primary key on the tables I want to replicate mandatory on both source and target?
No. HVR will use all non-LOB columns to identify a row if there is no primary or unique key on the table. HVR will also work fine if there are duplicate rows in the table.
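
Conceptually, a captured update on a key-less table is applied by matching on every non-LOB column, as in this sketch (hypothetical table and values, not HVR-generated SQL):

    UPDATE orders
       SET status = 'shipped'
     WHERE customer_id = 42
       AND status = 'open'
       AND order_date = DATE '2018-03-01';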

How do you sequence the load with foreign key constraints?
In continuous data integration mode HVR maintains the sequence of transactions, as well as the order of row changes within each transaction. As a result HVR will apply changes successfully even with enabled and enforced foreign key constraints on the database.

During the initial load (refresh) HVR will by default automatically disable foreign key constraints, knowing that the initial load will temporarily result in an inconsistent state. However, once continuous replication catches up to the point in the log at which the system is consistent again, HVR will automatically re-enable all constraints it previously disabled.

There is a special mode for the initial load that enables a customer with deferrable constraints on the tables to perform the entire initial load as a single transaction, with the constraints set to deferred for the session performing the data load and validated at commit time. This capability is only available for an Oracle Database target, and it requires that a consistent snapshot is taken from the source database (either because the database is idle, or by using a flashback query – only available on Oracle – as of the same SCN for all selects that are part of the initial load).
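
On Oracle, such a consistent snapshot can be taken by reading every table as of the same SCN; a sketch (the SCN value is illustrative):

    -- capture the current SCN once...
    SELECT current_scn FROM v$database;           -- e.g. returns 1234567
    -- ...then read each table as of that SCN
    SELECT * FROM app.orders AS OF SCN 1234567;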

Does HVR require the SQL Server CDC option on the table?
In order to get the row identifier for updates HVR needs so-called supplemental logging enabled on the source database. SQL Server does not currently support a simple alter table command to enable supplemental logging, but two native SQL Server capabilities have the same effect: (1) CDC tables, and (2) Articles (as part of SQL Server replication). Unfortunately both come with side effects that HVR does not need and actively tries to disable. CDC tables on SQL Server have fewer limitations than Articles (e.g. CDC tables don’t require a primary key on the table, and CDC tables don’t have to be dropped in order to alter the table), so HVR prefers to create CDC tables over Articles. Note that even though the CDC tables are created, HVR prevents them from being populated because HVR parses the log itself.
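
The underlying native mechanism looks like this (hypothetical schema and table; HVR sets this up itself when it initializes the channel):

    -- SQL Server: enable CDC on the database and on a table
    EXEC sys.sp_cdc_enable_db;
    EXEC sys.sp_cdc_enable_table
         @source_schema = N'dbo',
         @source_name   = N'orders',
         @role_name     = NULL;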

Does HVR require SQL Server replication to be turned on?
No, HVR does not require SQL Server replication to be turned on. However, on SQL Server Standard Edition prior to 2016 Service Pack 1 (when CDC tables became available on Standard Edition), HVR will want to use Articles to enable supplemental logging. To be able to create the Articles, SQL Server replication must be installed, but it does not have to be turned on.

If there are changes to source system tables, will HVR loads to the target database/tables fail and need immediate adjustment to HVR and the target system, or can the adjustment be done at a later point?
HVR is extremely flexible on the initial load process, known as Refresh. The Refresh will automatically create the target tables (optional) and load the data directly from the source. The Refresh is integrated with the Capture/Integrate jobs, which will then keep the tables synchronized.

Does HVR offer functionality to aggregate and filter data when transferring it to target system? Is there a transformation layer to add business logic and if yes, what would be the programming language?
Yes, HVR provides filtering/restrict conditions that can be set for capturing data, integrating data, or while performing refresh/compare. These conditions are written in SQL.
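
For example, a restrict condition is a plain SQL predicate; a condition like the following (column names are hypothetical) would limit replication to a subset of rows:

    region = 'EMEA' AND order_date >= DATE '2018-01-01'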

To populate extra columns there is support for a SQL fragment or the output of a stored function call.

We also support extensions that let you write your own logic using agent plug-ins. An agent plug-in can be implemented in any programming language, as a script, or as a database procedure.

Will the compare function take a filter condition into account for record counts?
When HVR compares the data it will honor a filter on the data and report the row count accordingly, as well as the extent to which the data is in sync; row-wise compare provides more detail than bulk compare.

Can we have our own database triggers on the target tables?
Yes. HVR applies DML to the target table in the same way as any other application, so you can have triggers, etc. on the target table. You just need to be aware of how they affect the DML being applied. We have come across scenarios where clients reported out-of-sync conditions, only to realize that the changes were introduced by triggers on the target. Also, by default the triggers will be disabled during the initial load, but there is an option to leave them enabled (behind the scenes, on most databases, this disables the direct path load that would otherwise be used for the initial load, i.e. the load will take significantly longer).
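
For example, a target-side trigger like the following (an Oracle PL/SQL sketch; table and column are hypothetical) silently modifies replicated rows, so a subsequent compare would flag them as out of sync:

    CREATE OR REPLACE TRIGGER orders_stamp
    BEFORE INSERT OR UPDATE ON orders
    FOR EACH ROW
    BEGIN
      :NEW.last_touched := SYSDATE;  -- row now differs from its source copy
    END;
    /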

In the hub connection screen, I don't see a connection to SAP. Can you explain how the connection to SAP ECC or HANA system works? Is HVR pointing to the database directly?
HVR does not currently support SAP HANA as a repository database, but SAP HANA is supported as a replication end point (source or target). The connection to SAP HANA is set up in the Location Configuration, after the connection to a repository database is made. For both SAP ECC systems and SAP HANA, HVR performs log-based CDC. Certain SAP ECC tables – the so-called cluster and pool tables – contain compressed and encoded data, which HVR’s SAP transform dynamically decompresses/decodes on the way into the target database. Note that HVR does not use ABAP code when accessing HANA or ECC, so the software makes no use of the SAP application servers.

Supplemental logging is given as an option when configuring HVR against an Oracle database. Does this imply that we can choose not to activate supplemental logging? Does this mean that supplemental logging is not necessarily needed?
Supplemental logging is always needed. However, to add supplemental logging to tables, the database user HVR uses to connect to the database needs elevated privileges, and DBAs don’t always grant these. The option to not activate supplemental logging exists as a performance optimization for the initialization process when supplemental logging is already enabled on the tables (either because the DBA did this, or because initialize was run before and no table-level changes were made to the channel since that run).

Are we able to encrypt data as it moves from source to target?
Yes, HVR provides the ability to encrypt the data over the network using SSL certificates. For certain platforms (like S3) we also support the choice between HTTP and HTTPS calls into the target (optionally with client-side encryption, i.e. the data is encrypted before it leaves the server).

Can we skip the initial load and warm start at a specific point for DB2 and MS SQL? We already keep the source and target sides in sync through an existing mechanism.
Yes. Capture allows you to rewind the starting point into the logs based on a specific date and time, as long as those transaction logs are still available on disk. You can also control the point from which Capture starts to emit those changes, which would be based on the time at which your target was last synchronized using your previous replication technology. We also provide a built-in exception handling mechanism to deal with any overlap in time.

It sounds like HVR does provide an HA/rapid-failover support mechanism, especially when shared storage is not available.
Correct, the HVR_CONFIG location contains the state information required to resume in a different environment without missing a beat. We have helped customers set up a shared-nothing DR environment and would be happy to discuss how we can implement this for your scenario.

Does all the data reside within our network or does it go to your cloud? If it goes to your cloud are you able to offer HIPAA compliance?
Data only moves between the servers that you configure, and not through HVR’s cloud.
