Welcome to your guide!

Real-Time Data Replication Test Drive

Replicating Changes

With the correct channel definition in place, you will now promote the Channel Definition into a runtime environment. After that, you will perform the initial data load and start the jobs that synchronize the databases.

Initialize

Deploying the Channel Definition into a runtime replication environment is called HVR Initialize. This step is always required for a new Channel Definition, and again whenever changes to an existing Channel Definition are ready to be implemented in the runtime environment.

1. Below your Channel Definition in the tree on the left side of your window, right-click on TARGET to show the context menu.

2. Select HVR Initialize.

3. Since this is the first time you will initialize the channel, all options are checked.

More on this topic

  • State Tables creates state tables in the target databases. These keep track of the replication so that, if the network connection to the target is lost or a crash occurs, HVR can resume replication exactly where it left off. By default, state tables are also used to prevent loopback of transactions in an active/active setup.
  • Transaction Files and Capture Time creates a capture state file that maintains the position in the transaction log from which to read. By default, the capture time is Now, but capture rewind is available if, during testing, you want to move back or forward through the transaction stream (as long as the backups of the transaction log are still around).
  • Table Enrollment queries the source database’s dictionary to map table and column names to object IDs; transaction logs identify objects by ID rather than by name, and HVR parses the log to obtain transaction information.
  • Supplemental Logging verifies whether the source tables already have supplemental logging (i.e. additional logging of key values for updates) defined, and, if not, adds it as needed.
  • Scripts and Jobs creates the Capture and Integrate jobs under the Scheduler node that will eventually be run to keep the systems in sync.
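As a conceptual illustration of why the capture state matters, the following Python sketch (not HVR code; all names are hypothetical) persists a log position so that a capture run can resume exactly where the previous one left off after a crash or connection loss:

```python
import json
import os
import tempfile

# Conceptual sketch only (not HVR's actual implementation): a capture state
# records the last log position processed, so that after a crash or network
# loss capture resumes exactly where it left off instead of starting over.

STATE_FILE = os.path.join(tempfile.gettempdir(), "cap_state_demo.json")

def load_state():
    """Return the last captured log position, or 0 for a fresh start."""
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f)["log_position"]
    return 0

def capture(log_records):
    """Process only records past the saved position, then persist the new position."""
    pos = load_state()
    new_records = [r for r in log_records if r["position"] > pos]
    if new_records:
        pos = new_records[-1]["position"]
        with open(STATE_FILE, "w") as f:
            json.dump({"log_position": pos}, f)
    return new_records

log = [{"position": i, "op": "insert"} for i in range(1, 6)]
if os.path.exists(STATE_FILE):
    os.remove(STATE_FILE)   # fresh start for the demo
first = capture(log)        # captures positions 1 through 5
again = capture(log)        # nothing new: resumes after position 5
```

The second call returns nothing because the persisted position already covers the whole log, which is the resume behavior the state tables and capture state file provide.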

1. Click Initialize with all defaults checked.

2. Wait for the initialization to complete.

3. When Initialize finishes, a pop-up shows that three jobs have been created and how they can be started from the command line if desired. Later you will start the jobs through the GUI.

4. Click OK on the pop-up.

5. Click Close on the Initialize dialog. (Or read more about this topic in the section below before closing the dialog).

More on this topic

Before closing the HVR Initialize dialog, notice how the status bar shows the equivalent command, hvrinit, that could be executed on the hub to achieve the same result. As you check or uncheck options, HVR updates the command, helping operators who prefer to operationalize their environment through scripts learn the command-line equivalents.

Start the Capture Job

1. Navigate to the Scheduler node for your hub and click the plus button to expand the tree.

2. Right-click on orcl2mdbpg-cap-orcl and select Start to start the job, then confirm the “really start…” message.

3. Since this is the first job you are starting and the Scheduler is not yet running, HVR will prompt whether you want to start the Scheduler. Click START.

Do not start the integrate jobs yet, because the target databases do not yet have the target table definitions, let alone the data from the initial load. However, with the source database actively processing transactions, you may notice that changes are already being captured and made available for the integration targets to pick up once they are ready.

More on this topic

At this point, Change Data Capture has been initialized, and from this point forward HVR will capture any changes in the transaction log that are relevant to the channel, irrespective of whether they occur as part of a new transaction or in the middle of an ongoing one. In this trial environment the source system runs a typical OLTP (On-Line Transaction Processing) workload consisting of only short-running transactions.

In a production environment with long-running transactions, you must consider that HVR will only capture new changes from the initialization point forward. Also bear in mind that HVR will not be able to capture changes on tables that do not yet have supplemental logging enabled. This means, especially on a database like Oracle where it is fine to have very long-running transactions, that you may have to enable supplemental logging on the tables some time before performing the initial data load, to ensure you do not lose any transactions.

Production systems, as well as this trial environment, don’t keep the archive log files around forever. Hence, as a best practice, start the capture job right after Initialize so that HVR can capture any changes of interest and start queuing them on the hub for the integrate jobs to pick up following the initial load.

Refresh and Repair

HVR uses the term Refresh for a one-time data load. Refresh is often used for an initial load because HVR automatically aligns the one-time load with incremental CDC to keep the systems in sync. In a heterogeneous environment, Refresh also automatically maps source data types to compatible, loss-less data types on the target. Refresh can also be used to create the tables in the target database.
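To illustrate the idea of loss-less type mapping, here is a hypothetical Python sketch. The mappings shown are common Oracle-to-PostgreSQL choices and are not taken from HVR's actual mapping matrix:

```python
# Illustrative only: map source (Oracle-style) types to loss-less equivalents
# on a PostgreSQL target. HVR's real mapping matrix is far more complete and
# also accounts for precision, scale, and length.
TYPE_MAP = {
    "NUMBER": "numeric",
    "VARCHAR2": "varchar",
    "DATE": "timestamp",   # Oracle DATE carries a time part, so timestamp is loss-less
    "CLOB": "text",
}

def map_column(src_type):
    """Return a loss-less target type; unknown types pass through in this sketch."""
    return TYPE_MAP.get(src_type, src_type)

cols = ["NUMBER", "VARCHAR2", "DATE"]
mapped = [map_column(c) for c in cols]
# mapped == ["numeric", "varchar", "timestamp"]
```

The point is the direction of the mapping: always toward a target type that can hold every value the source type can, so the refresh never silently loses precision.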

1. In the HVR GUI, right-click the Channel Definition of your orcl2mdbpg Channel in the tree to pull up the context menu.

2. Select HVR Refresh to pull up the Refresh dialog.

3. On the top left, select Location: orcl.

4. On the top right, select both mdb and pg.

5. In the Table Name box, check the All Tables box.

More on this topic

HVR retrieves information from the Channel Definition to make assumptions about which location should be the source and which the target for the Refresh. These can be changed if needed, for example in a scenario where one target is used to populate another target.

For a Channel that has already been initialized, HVR will, by default, not include any tables, to safeguard you from accidentally refreshing tables that may already be in sync.

Also, by default, HVR will refresh data (as opposed to just the table definitions) as part of the Refresh. The default method to refresh the data is a bulk load using a fast, database-specific, direct path loading approach into the target.

Row-wise mode is equivalent to Repair because it first computes and then applies the difference between source and target table.

The bulk Refresh method is often faster than row-wise mode, because no difference has to be computed and the direct load is fast. However, bulk mode is more intrusive on targets that already contain data, because the table is reloaded in full.
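The row-wise comparison can be sketched conceptually as follows (hypothetical Python, keyed on a primary key; this is not HVR's actual algorithm):

```python
# Conceptual sketch (not HVR's actual algorithm): row-wise Repair computes the
# difference between source and target, then applies only the inserts, updates,
# and deletes needed to bring the target in line with the source.

def row_wise_diff(source, target):
    """source/target: dicts mapping primary key -> row values."""
    inserts = {k: v for k, v in source.items() if k not in target}
    deletes = [k for k in target if k not in source]
    updates = {k: v for k, v in source.items()
               if k in target and target[k] != v}
    return inserts, updates, deletes

src = {1: "Alice", 2: "Bob", 3: "Carol"}
tgt = {1: "Alice", 2: "Bobby", 4: "Dave"}
ins, upd, dels = row_wise_diff(src, tgt)
# ins == {3: "Carol"}, upd == {2: "Bob"}, dels == [4]
```

Row 1 is untouched, which is exactly why row-wise mode is less intrusive than a bulk reload on a target that already holds mostly correct data.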

6. Leave the default option Bulk Granularity selected.

7. To the right, there are options for parallelism to speed up the load. Select Parallelism for Locations value 2, and Parallelism for Tables value 3.

8. By default, HVR will align the initial load with ongoing replication. The option Online Refresh controls this behavior and is generally set correctly by default.

9. Make sure Online Refresh is checked, with the option Only Skip Previous Integration selected.

This is because you have two targets that are populated independently; skipping on the capture side could result in data loss.

More on this topic

On the far right, the Refresh dialog shows options to manipulate Foreign Key Constraints, which you will leave at their defaults, as well as Select Moment, which you will also leave untouched.

Select Moment is useful for scenarios where an exact moment in time for the select from the source is important, for example when doing an initial load into a filesystem-based data lake (like S3, Azure Data Lake Store or Blob Storage, or HDFS). On Oracle, Select Moment leverages Oracle’s flashback capability.

10. Toward the bottom of the dialog, check the box for Create Absent Tables. This option instructs HVR to discover which table definitions already exist on the target system(s), create the absent tables, and adjust existing ones as needed.

11. Leave the default option With Index checked. This results in a primary/unique index on the key column(s), or a non-unique index on tables without a unique key on the source, to speed up data integration.

More on this topic

Again, note that the status bar at the bottom of the dialog has been populated with a command that can be executed on the hub to achieve the same result.

HVR Refresh provides additional capabilities to assist with refreshing table definitions and data:

  • Scheduling, in its own tab, to create jobs that are run by the scheduler (either one-time or using a schedule created by the Schedule Action). Jobs running through the scheduler have their output appended to the scheduler log, and in case of a job failure, the scheduler will automatically retry the job.
  • Slicing is also defined on the Scheduling tab, to sub-divide a large table into multiple smaller increments that, depending on the target, can be run in parallel. There are GUI options to use slicing on a numeric column by simple division, using ranges or using a list of values, with additional flexibility when defining slicing through the command-line interface.
  • The Contexts tab is used for more complex scenarios with actions defined using the parameter /Context. For example, a filter restricting the data Refresh to only the year 2019 may have been created with a Restrict Action using /Context=refresh2019. In that case, selecting the Context refresh2019 activates the filter, whereas a Refresh without the Context would refresh all data (unless another filter is activated instead).
  • A Refresh run may use variables that are defined as part of the Channel Definition. For example, a filter condition using a Restrict Action may filter based on a date range with lower and upper boundaries as parameters to be provided at runtime (using a specific Context or not).
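As a conceptual illustration of slicing by simple division over a numeric column (hypothetical Python, not HVR's implementation):

```python
# Conceptual sketch of slicing a large table on a numeric column using simple
# division. HVR also supports ranges and value lists; this is not its actual
# code, just the idea of sub-dividing a big load into parallelizable slices.

def slice_boundaries(min_val, max_val, num_slices):
    """Split [min_val, max_val] into num_slices contiguous (low, high) ranges."""
    step = (max_val - min_val + 1) // num_slices
    slices = []
    low = min_val
    for i in range(num_slices):
        # the last slice absorbs any remainder so the full range is covered
        high = max_val if i == num_slices - 1 else low + step - 1
        slices.append((low, high))
        low = high + 1
    return slices

# e.g. slice a table whose key column spans 1..1000 into 4 refresh slices
parts = slice_boundaries(1, 1000, 4)
# parts == [(1, 250), (251, 500), (501, 750), (751, 1000)]
```

Each slice becomes an independent refresh unit, so, depending on the target, the slices can be loaded in parallel.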

12. Leave all options on the tabs Scheduling and Contexts default.

13. Click Refresh to create the tables and load the data interactively.

14. Once the Refresh is done, close the pop-up and the Refresh dialog.

Start Integrate Jobs

With the initial load finished, you can now start the integrate jobs:

1. In the tree on the left, navigate to the Scheduler node and click on it.

2. Below Scheduler, navigate to the folder orcl2mdbpg-integ, which has the two integrate jobs underneath.

3. Right-click on that folder to bring up the context menu and select Start to start both jobs at the same time.

4. Confirm the Really start… message.

Checkpoint

Starting from the previous checkpoint with a complete configuration, you should now have replication running.

1. In the HVR GUI, select the node Scheduler for your hub, and look at the state on the right-hand side. You should have three jobs running and regularly see data flowing.

2. If this is not the case, go back through the last few sections. If something went wrong during Initialize or Refresh, you can go back and redo the steps to get to a working setup at this stage.
