Comparing Data

Last updated on May 04, 2022

Contents

Option Compare Data allows to compare data in two or more locations (databases) in a channel. It compares not only the table structures, but also the data in the tables itself. You can compare a single source location with multiple target locations. You can compare an entire database (all tables) or specific tables only. HVR also supports comparing data in heterogeneous DBMSes when source and target data types may not match and have different character encoding. The HVR’s Compare functionality standardizes data types internally, coercing data to fit the internal representation. In the channel definition, you can control how HVR deals with coercion errors. By default, a coercion error results in a fatal error. 

Incompatible data types, differing data structures and alternative naming can be compared too, by applying transformations using actions that control specific compare behaviour. For example, to compare lowercase data, the ColumnProperties action with the CaptureExpression parameter can be defined on one of the locations  specifying the lowercase conversion operation. Actions can be defined per table, location, channel or installation.

HVR allows to compare file locations. For more information on that, see Direct File Compare.

Option Compare Data is equivalent to the hvrcompare CLI command. 

This option is available on pages the following pages:

  • Channel Details: the Compare Data button at the top right of the page.
  • Locations: the Compare Data option under the More Options menu  at the top right of a page.
  • Location Details: the Compare Data option under the More Options menu  related to each channel in the Channel Membership pane. 
  • Tables: the Compare Data option under the More Options menu  at the top right of a page.
  • Table Details: the Compare Data option under the More Options menu  at the top right of a page. 
  • Event Details: (related to a compare event): the Repeat Compare button at the to right of the page.
  • Jobs: the New Compare option under the More Options menu  related to a compare job.

The Compare Data option may appear disabled in certain cases, for example, on the Channel Details page, if no locations are added to a channel. On the Tables page, you need to select one or more tables to enable the option, etc. When you hover over the disabled option, a tooltip will appear with an appropriate explanation.

The option opens the Compare Data dialog allowing you to choose specific locations and tables to be compared and configure different options to customize the compare operation. For detailed information about each of the options available in the dialog, see section Compare Options below.

Compare Options

OptionDescription

Locations

Select the source and target location(s) in which the tables will be compared. In HVR, the location selected in the SOURCE field is also called a 'read' location, while the location in the TARGET field is called a 'write' location.

The 'read' and 'write' concepts are used to define the behavior of the compare operation, in particular the location where the actual data comparison is performed. For example, if you are comparing data between different types of DBMSes, an ambiguity may occur due to certain data type coercions. The HVR's coercion feature maps an empty string from certain DBMSes into a null value in the Oracle varchar data type. For example, if an Ingres location (ing) contains an empty string mapped to a null in an Oracle location (ora), then should HVR report that these tables are the same or different? The Compare Data option allows both behaviors by applying the sensitivity of the 'write' location rather than the 'read' location. The 'read' location is passive: the data is piped from the ‘read' location to the ‘write’ location, and the work of comparing the data is performed in the ‘write’ location. This means that comparing location ing as a source and location ora as a target will report that the tables are identical, but comparing ora as a source and ing as a target will say the tables are different.

Tables

Select the specific tables to be compared.

Table Checksums Only

Enables the bulk compare mode. This comparison mode computes a checksum for every row based on all data values, with a final checksum computed across all the checksums for every row. The compare result is determined by whether or not the checksum from the source location matches the checksum from the target location. HVR will report a difference between the source and target tables if the checksum is different, irrespective of whether the ­reported row count is identical. The bulk mode runs on a database server and only passes the checksum to the hub so it is very efficient in terms of network traffic. However, the bulk mode provides limited detail on the difference. Only the row count is reported, and whether or not the checksum values came out identical.

If this option is not enabled, the defaultcompare mode is row by row. In this mode, HVR extracts the data from a source (read) location, compresses it and transfers the data to a target (write) location(s) to perform a row by row comparison. Each individual row is compared to produce a 'diff' result.

Keep Difference Files

Verbose. This option creates binary diff files containing individual differences detected.

Section Analyzing Diff File explains how to view and interpret the contents of a diff file.

This option is only displayed for the row by row compare mode (not available when option Table Checksums Only is selected).

Online Compare

Performs live compare between locations where data is rapidly changing. 

This option is only displayed for the row by row compare mode (not available when option Table Checksums Only is selected).

The online compare has the following modes:

  • Combine Initial Differences with Changes Captured while Compare is Running

    Compare tables once and combine differences with captured transactions. Performs a regular compare which produces a result (also known as diff). This result (diff) is compared with the transaction files (which are continuously created by the Capture job) to identify and remove the pending transaction from the result (diff) for producing the final compare result. This compare mode is supported for comparing databases and files.

    If this online compare job is suspended, HVR will continue to accumulate tx files with each capture cycle, which will lead to higher disk space usage. For the list of pending events associated with the compare job, use hvreventview. To cancel the job, use hvreventtool. Scheduling a new event driven compare with the same task name will also cancel the previous event automatically.

  • Do Compare Twice and Report Only Differences which Occur Twice
    Delay Between First and Second Compares
    • Wait
      Compare tables twice with a delay in between. Capture and Integrate jobs are not required for performing this mode of online compare. This online compare mode is similar to Flush Capture and Integrate Jobs, however, with one difference: HVR waits for a specified amount of time, instead of waiting for the completion of the Capture and Integrate cycle in between compares. HVR performs a regular compare which produces a result (referred to as diff). HVR then waits for the specified time, after which it again performs the regular compare and produces a second result (diff). The compare results generated in the first and second compares are combined to produce a final compare result.
    • Flush Capture and Integrate Jobs

      Compare tables twice with a Capture and Integrate flush in between. Performs a regular compare which produces a result (also known as diff). HVR then waits for the completion of the full Capture and Integrate cycle after which it again performs the regular compare which produces a second result (diff). The compare results generated in the first and second compare operations are combined to produce a final compare result. This compare mode is only supported for comparing databases.

  • Compare Oracle Flashback Moment of Source with Time-Restricted View of Target

    Select data from each table of source from same consistent moment in time. Options can be one of the following:

    • Now: If this option is selected, then a new "SCN time" will be retrieved each time the compare job is run. So if Now is run on Monday, and the compare job it creates starts running at 10:00 Tuesday and runs again 10:00 on Wednesday, then the first compare will do a flashback query (for all tables) with an SCN corresponding to Tuesday at 10:00 and the second job run will use flashback query with an SCN corresponding to Wednesday at 10:00.

    • Specific Time: Flashback query with select … as of timestamp.
    • hvr_tx_seq: Value from HVR column hvr_tx_seq is converted back to an Oracle SCN number (by dividing by 65536) and used for flashback query with select … as of scn. Value is either in decimal or in hex (when it starts with 0x or contains hex digits).
    • Oracle SCN: Flashback query with select … as of scn. Value is an Oracle SCN number, either in decimal or in hex (when it starts with 0x or contains hex digits).

Parallel Sessions

Parallelism for sessions. Perform compare for different tables in parallel using sub-processes. The job will start processing tables in parallel; when the first of these is finished the next table will be processed, and so on.

File Prereaders per Table

File prereaders per table. Define the number of prereader subtasks per table while performing direct file compare. For more information, see section Direct File Compare.

This option is only displayed if the source or target location is a file location.

Contexts

This controls whether actions defined with parameter Context are effective or are ignored.

Defining an action with Context can have different uses. For example, if action Restrict with parameters CompareCondition="{id}>22" and Context=qqq is defined, then normally all data will be compared, but if context qqq is enabled (-Cqqq), then only rows where id>22 will be compared. Variables can also be used in the restrict condition, such as "{id}>{hvr_var_min}". This means that hvrcompare -Cqqq -Vmin=99 will compare only rows with id>99. To supply variables for restrict condition use option -V.

Action ColumnProperties with parameter Context can also be defined on . This can be used to define parameter CaptureExpression parameters which are only activated if a certain context is supplied. For example, to define a context for case-sensitive compares.

For more information, see the concept page - Refresh and Compare Contexts.

This option is only displayed if you have an action (CollisionDetect, ColumnProperties, Environment, FileFormat, Integrate, Restrict, TableProperties, Transform) defined with parameter Context.

Variables

Supply value for compare restrict condition or add a new variable.

Slicing

Slicing section allows you to configure the slicing options. For more information, see the Slicing concept page.

Table Slicing dialog can be invoked by clicking the Add Table button. In this dialog, you can choose the slicing separately for each table in your source.

Choose Table

Table for which the slicing is to be configured.

Choose Type

Types of slicing that can be applied to the table:

  • Modulo
  • Boundary
  • Count
  • Series

Choose Column


Table will be sliced by the chosen column's data.

Slicing on a non-key column can cause errors if an update moves a row between slices.

The Distinct Values field is shown if dbms_stats gathering is enabled for the database. The Distinct Values column is only available for Oracle.

This option is available when Modulo or Boundary slicing type is chosen.

Number of Slices

Number of slices the table will be divided into.

Boundaries

Set the boundaries for each slice. To set them, you have to know the data in your table.

This option is available when Boundary slicing type is chosen.


Data Type

Choose the data type for slicing. To set it, you have to know the data in your table.

This option is available when Boundary slicing type is chosen.


Series Values

Set values for each slice. To set them, you have to know the data in your table.

This option is available when Series slicing type is chosen.

Slice Selection

Choose which slices you want to perform the job on.

For example, you have chosen Boundary slicing and have set the boundaries to 1000, 3500, and 6000. In this case, you will have 5 slices: with values 0 to 999, 1000 to 3499, 3500 to 5999, and 6000 to the end of the table.

You can choose only second and third slices so that the Refresh job is performed on rows with values 1000 to 5999.

Slicing Suggestions dialog can be invoked by:

  • clicking the Suggest Slicing button in the Slicing section;
  • or by clicking the Suggest button in the Table Slicing dialog.

In the Slicing Suggestions dialog, you can configure the slicing settings for the current job.


Slicing Suggestions

Repeat a previous slicing job or base it upon certain data:

  • Repeat slicing from last refresh. Repeats slicing from the last Refresh event.
  • Repeat slicing from last compare. Repeats slicing from the last Compare event.
  • Based on row-count from last refresh. Inspects previous Refresh events and suggests the new slicing based on its result.
  • Based on row-count from last compare. Inspects previous Compare events and suggests the new slicing based on its result.
  • Based on row-counts from DB statistics. Gets the row count from the Oracle package.

Tuning Preferences

Rows per slice – sets the number of rows per slice. Set to 10 million by default.

Max slices per table – sets the maximum number of suggested slices per table. By default, the number is set to 5.

Compare Task Name

Specify an alternative name for a compare task to be used for naming scripts and jobs. The task name must start with a 'c'.

When this option is not defined, the default task name is cmp, so the compare jobs are named chn-cmp-l1-l2.

Scheduling Options

Schedule the time to run the compare job. Available options are:

  • Start Immediately: Invoke the compare job immediately after clicking the Compare Data button.
  • Schedule Once at: Schedule the compare job to run at a specific time once.
  • Schedule Repeatedly at: Schedule the compare job to run at specific times repeatedly.
  • Delay Running Compare Job: Schedule invocation of the compare job by leaving it in the SUSPEND state.

Show Equivalent HVR Command Line

The equivalent command in the command line interface (CLI) to those options configured/selected in the dialog.

In cases when the command line equivalents are different for Linux/Unix and Windows, both options are shown.

Viewing Compare Results

Clicking the Compare Data button in the Compare Data dialog will start the compare job. You will see the following notification at the top of the page.

Once the compare job has started, the following notification will appear at the top of the page. Click the View Compare event link to open the Event Details page displaying detailed information about the compare event.

On the Channel Details page, the compare job state is displayed on the Jobs pane. To open the Event Details, click the More Options icon  related to the compare job and select Go To Event

The following is an example of the Event Details page showing a compare event.

  • The top pane shows information about the channel and location(s) related to the compare event, the event state, the job name associated with the event, the time the event was started and the event duration.
  • The middle pane shows additional details related to the configuration parameters set in the Compare Data dialog, such as granularity, the number of parallel sessions, etc.
  • The Results pane shows the compare statistics for each table involved in the compare event, such as the number of rows in source and target that were compared, rows that differ, and others. The View file link opens the View Diff File dialog that allows to inspect a diff file containing the list of differences detected. Section Analyzing Diff File explains how to view and interpret the contents of the diff file. For details on each parameter in the Results pane, see section Refresh and Event-Driven Compare Results.