HVR replicates transactions between databases that HVR calls 'locations'. Each change it captures is applied to the target locations. It can also replicate between directories (file locations) or replicate between databases and directories.
The HUB DATABASE is a small database from which HVR controls replication of the other databases. It can be an Oracle schema (a username within an instance), an Ingres database, a SQL Server database or a DB2 database. It is created especially for the HVR installation and can have any name (in most examples in this manual it is called hubdb). It contains HVR catalog tables that hold all replication specifications, such as the names of the replicated databases, the replication direction and the list of tables to be replicated. These catalog tables are created in the hub database during installation (see also sections Installing HVR on Unix or Linux and Installing HVR on Windows).
A LOCATION is a database that HVR will replicate to or from. A location can also be a Salesforce endpoint or a directory (a 'file location') from which HVR replicates files.
A CHANNEL is an object in HVR that groups together the locations that should be connected together and the tables that should be replicated. It also contains actions that control how replication should be done. For example, to capture changes a Capture action must be defined on a database location. Channels are defined in the hub database. They can be configured for replication between various databases, or between files, or between databases and files. As well as replicating changes, channels can also be used to refresh data. Refresh means all data is read from a source and loaded into another database, without replication.
Process Architecture and Network Connections
The hub machine contains the HVR hub database and an HVR scheduler (which controls the replication jobs) and all the log files. Locations can be either local (i.e. on the hub machine) or remote.
To access a remote location HVR normally connects to an HVR installation on that remote machine using a special TCP/IP port number. If the remote machine is Unix then the INETD daemon is configured to listen on this TCP/IP port. If it is a Windows machine then HVR listens with its own HVR Remote Listener (a Windows Service). Alternatively, HVR can connect to a remote database location using a DBMS protocol such as Oracle TNS.
The HVR Scheduler on the hub machine starts capture and integrate jobs that connect out to the remote location and either capture or apply ('integrate') changes to the remote location. HVR on a remote machine is quite passive; the executables are acting as slaves for the hub machine. Replication is entirely controlled from the hub machine.
Overview of Steps to Set Up a Channel
HVR must first be installed on the various machines involved. For the installation steps, see sections Installing HVR on Unix or Linux, Installing HVR on Windows or Installing HVR on MacOS. These installation steps also create a hub database containing empty catalog tables.
Once HVR is installed, it can be managed using a Graphical User Interface (GUI). The GUI can run directly on the hub machine if the hub machine is Windows or Linux. Otherwise, it should be run on the user's PC and connect to the remote hub machine.
To start the GUI, double-click its shortcut or, on Linux, run the command hvrgui. See section Hvrgui for more information. The HVR GUI allows a channel to be defined in the hub database. The channel must contain at least two locations (e.g. an Oracle schema, or an Ingres or SQL Server database). It must also contain location groups; a location group is a collection of locations belonging to a channel. HVR actions are typically defined on the channel's location groups. HVR channels can be for database replication or for replicating files.
To make a channel for database replication, choose the Table Explore option in the GUI to import a list of tables from a database location. Action Capture is defined on the source database to capture database changes and Integrate is defined on the target database to apply ('integrate') changes. HVR behavior can be reconfigured by specifying other parameters on the actions or by adding other actions. The quick start steps for setting up database replication are available in Quick Start Guides.
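The relationship between a channel, its locations, its location groups and its actions can be pictured with a small data model. The following Python sketch is purely illustrative: the class names, the group names SOURCE and TARGET, and the problems() check are assumptions made for this example and do not reflect HVR's actual catalog tables or any HVR API.

```python
from dataclasses import dataclass, field


@dataclass
class Location:
    name: str   # short location name, e.g. "src" (hypothetical)
    kind: str   # "database" or "file" (illustrative labels only)


@dataclass
class Channel:
    name: str
    locations: dict = field(default_factory=dict)  # location name -> Location
    groups: dict = field(default_factory=dict)     # group name -> list of location names
    tables: list = field(default_factory=list)     # tables to be replicated
    actions: list = field(default_factory=list)    # (group name, action name) pairs

    def add_location(self, loc, group):
        """Register a location and place it in a location group."""
        self.locations[loc.name] = loc
        self.groups.setdefault(group, []).append(loc.name)

    def add_action(self, group, action):
        """Define an action on a location group of this channel."""
        if group not in self.groups:
            raise ValueError(f"unknown location group: {group}")
        self.actions.append((group, action))

    def problems(self):
        """Return a list of reasons why the channel is not ready for replication."""
        found = []
        if len(self.locations) < 2:
            found.append("channel needs at least two locations")
        defined = {action for _, action in self.actions}
        for required in ("Capture", "Integrate"):
            if required not in defined:
                found.append(f"no {required} action defined")
        return found


channel = Channel("demo", tables=["orders", "customers"])
channel.add_location(Location("src", "database"), group="SOURCE")
channel.add_location(Location("tgt", "database"), group="TARGET")
channel.add_action("SOURCE", "Capture")
channel.add_action("TARGET", "Integrate")
print(channel.problems())  # an empty list means no problems were found
```

Defining actions on groups rather than on individual locations means that a location added to a group later is covered by that group's actions without any further change.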
Managed File Transfer
A file replication channel is built with file locations. A file location can be a directory or a tree of directories on a machine where HVR is installed. It can also be a location that HVR can access using FTP, SFTP, WebDAV, HDFS or S3 protocols.
HVR can either replicate new files from one file location to a different file location or it can replicate between file locations and database locations. If HVR replicates between file locations it treats these files simply as a stream of bytes. But if a channel has database and file locations then each file is interpreted as containing database changes, by default in HVR's XML format.
When HVR is capturing changes from a file location's directory it can either move each file (delete it after it is captured) or make a copy to the other location.
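The difference between moving and copying a captured file can be sketched in a few lines of Python. This is only an illustration of the two behaviors on a local directory; the function capture_file and its move parameter are hypothetical names and are not part of HVR.

```python
import shutil
import tempfile
from pathlib import Path


def capture_file(src_file, target_dir, move=False):
    """Transfer one file to target_dir; delete the source afterwards if move=True."""
    src = Path(src_file)
    dest_dir = Path(target_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)  # the file is treated as an opaque stream of bytes
    if move:
        src.unlink()         # "move" = copy, then delete the captured file
    return dest


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    source_dir = Path(tmp) / "in"
    source_dir.mkdir()
    (source_dir / "a.csv").write_bytes(b"1,2,3\n")
    (source_dir / "b.csv").write_bytes(b"4,5,6\n")

    capture_file(source_dir / "a.csv", Path(tmp) / "out", move=False)
    capture_file(source_dir / "b.csv", Path(tmp) / "out", move=True)

    # a.csv was copied and still exists; b.csv was moved and is gone.
    print((source_dir / "a.csv").exists(), (source_dir / "b.csv").exists())
```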
The runtime replication system is generated by command HVR Initialize in the GUI or hvrinit from the command line on the hub machine. HVR Initialize checks the channel and then creates the objects needed for replication, plus replication jobs in the HVR Scheduler. Also for trigger based capture (as opposed to log based capture), HVR creates database objects such as triggers (or 'rules') for capturing changes.
Once HVR Initialize has been performed, the process of replicating changes from source to target location occurs in the following steps:
- Changes made by a user are captured. In case of log based capture these are automatically recorded by the DBMS logging system. For trigger based capture, this is done by HVR triggers inserting rows into capture tables during the user's transaction.
- When the 'capture job' runs, it transports changes from the source location to router transaction files on the hub machine. Note that when the capture job is suspended, changes will continue being captured (step 1).
- When the 'integrate job' runs, it reads from the router transaction files and performs insert, update and delete statements on the target location to mimic the original change made by the user.
HVR Initialize creates jobs in suspended state. These can be activated using the GUI by right clicking on a channel and selecting Start. Like other operations in the HVR GUI, starting jobs can also be done from the command line. See section Hvrstart.
The HVR Scheduler collects output and errors from all its jobs in several log files in directory $HVR_CONFIG/log/hubdb. Each replication job has a log file for its own output and errors. There are also log files containing all output, and log files containing only errors, for each channel and for the whole of HVR.
A job's errors can be viewed by clicking on the job underneath the scheduler in the tree-view and clicking View Log. These log files are named chn-cap-loc.out or chn-integ-loc.out and can be found in directory $HVR_CONFIG/log/hubdb.
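The three steps above can be imitated end to end with a small simulation of trigger based capture. The following Python sketch uses two in-memory SQLite databases as stand-ins for the source and target locations, and a plain list as a stand-in for the router transaction files. All table, trigger and function names are assumptions made for this example; they do not show the objects that HVR Initialize actually creates, nor the format of HVR's router files.

```python
import sqlite3

source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
router = []  # stand-in for the router transaction files on the hub machine

for db in (source, target):
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")

# Step 1: triggers record each change in a capture table during the user's transaction.
source.executescript("""
CREATE TABLE orders_cap (seq INTEGER PRIMARY KEY AUTOINCREMENT,
                         op TEXT, id INTEGER, amount INTEGER);
CREATE TRIGGER orders_ins AFTER INSERT ON orders BEGIN
    INSERT INTO orders_cap (op, id, amount) VALUES ('insert', NEW.id, NEW.amount);
END;
CREATE TRIGGER orders_upd AFTER UPDATE ON orders BEGIN
    INSERT INTO orders_cap (op, id, amount) VALUES ('update', NEW.id, NEW.amount);
END;
CREATE TRIGGER orders_del AFTER DELETE ON orders BEGIN
    INSERT INTO orders_cap (op, id, amount) VALUES ('delete', OLD.id, OLD.amount);
END;
""")


def capture_job():
    """Step 2: transport captured changes from the source to the router."""
    rows = source.execute(
        "SELECT seq, op, id, amount FROM orders_cap ORDER BY seq").fetchall()
    router.extend(rows)
    source.execute("DELETE FROM orders_cap")
    source.commit()


def integrate_job():
    """Step 3: replay routed changes on the target as insert/update/delete."""
    while router:
        _seq, op, row_id, amount = router.pop(0)
        if op == "insert":
            target.execute("INSERT INTO orders VALUES (?, ?)", (row_id, amount))
        elif op == "update":
            target.execute("UPDATE orders SET amount = ? WHERE id = ?", (amount, row_id))
        else:
            target.execute("DELETE FROM orders WHERE id = ?", (row_id,))
    target.commit()


# User activity on the source; the triggers capture it even though no job has run yet.
source.execute("INSERT INTO orders VALUES (1, 10)")
source.execute("INSERT INTO orders VALUES (2, 20)")
source.execute("UPDATE orders SET amount = 15 WHERE id = 1")
source.execute("DELETE FROM orders WHERE id = 2")
source.commit()

capture_job()
integrate_job()
print(target.execute("SELECT id, amount FROM orders ORDER BY id").fetchall())  # [(1, 15)]
```

Because the triggers fire inside the user's transaction, rows keep accumulating in the capture table while capture_job() is not being called. This mirrors the note that changes continue to be captured while the capture job is suspended. The update trigger in this sketch does not handle changes to the key column.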
Refresh and Compare
As well as actual replication (capturing each change and applying it to another database), HVR also supports refresh and compare. HVR Refresh copies all existing data from one location to another location and HVR Compare checks whether two locations have identical rows.
There are two flavors of refresh and compare: bulk and row-wise.
- Bulk refresh: HVR extracts the data from all the tables in one database, compresses it, brings it across the network, uncompresses it and loads that data into the target database. After the data has been bulk loaded into the target database, indexes are reinitialized.
- Row-wise refresh (repair): HVR compares the copies of each table, identifies which rows are different and applies those changes (as repair). Internally HVR selects the data from one database, compresses it, pipes it to the target machine and compares those rows one by one with the rows in the target table. For each difference detected a 'repair' SQL statement is performed: an insert, update or delete.
- Bulk compare: HVR performs checksums on each of the tables in the replication channel. The actual data does not travel across the network so this is very efficient for large databases over a WAN.
- Row-wise compare: HVR extracts the data from one database, compresses it and, on the target machine, compares it row by row with the data in the target database. For each difference detected an SQL statement is written: an insert, update or delete.
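The contrast between the bulk and row-wise flavors can be illustrated with a short Python sketch. It is a simplified model, not HVR's implementation: the use of SHA-256, the (key, value) row layout and the function names table_checksum and row_wise_diff are all assumptions made for this example.

```python
import hashlib


def table_checksum(rows):
    """Bulk-compare style: reduce a table to one digest so only the digest is exchanged."""
    digest = hashlib.sha256()
    for row in sorted(rows):
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


def row_wise_diff(source_rows, target_rows):
    """Row-wise style: list the statements that would make the target match the source.

    Rows are (key, value) tuples; the key is assumed to be unique within a table.
    """
    src = dict(source_rows)
    tgt = dict(target_rows)
    repairs = []
    for key in sorted(src.keys() - tgt.keys()):
        repairs.append(("insert", key, src[key]))      # row missing on target
    for key in sorted(src.keys() & tgt.keys()):
        if src[key] != tgt[key]:
            repairs.append(("update", key, src[key]))  # row differs on target
    for key in sorted(tgt.keys() - src.keys()):
        repairs.append(("delete", key, tgt[key]))      # extra row on target
    return repairs


source_rows = [(1, "alice"), (2, "bob"), (3, "carol")]
target_rows = [(1, "alice"), (2, "bobby"), (4, "dave")]

# Bulk compare only answers "identical or not".
print(table_checksum(source_rows) == table_checksum(target_rows))  # False

# Row-wise compare reports each difference; a row-wise refresh would execute them.
for statement in row_wise_diff(source_rows, target_rows):
    print(statement)
```

The checksum approach sends very little across the network but cannot say which rows differ. The row-wise approach moves the data but yields the exact insert, update and delete statements needed for repair.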