Hvrcompare


Name

hvrcompare – Compare data in tables.

Synopsis

hvrcompare [–options] hubdb chn

Description

Command hvrcompare compares the data in different locations of channel chn. The locations must be databases, not file locations.

The first argument hubdb specifies the connection to the hub database; this can be an Oracle, Ingres, SQL Server, DB2, DB2 for i, PostgreSQL or Teradata database depending on its form. See section Calling HVR on the Command Line.

Table compare can be performed row by row or as a bulk operation, depending on which –g option is supplied. Bulk compare calculates a checksum for each table and reports whether the replicated tables are identical.

Row–wise compare causes data to be piped from one location to the other location, where each individual row is compared. This results in the minimal set of inserts, updates and deletes needed to resynchronize the tables.
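The row-wise diffing described above can be sketched in Python. This is an illustrative model only, not HVR's implementation; rowwise_diff and the key/value row model are hypothetical:

```python
# Illustrative sketch: derive the minimal inserts/updates/deletes
# needed to make the target table match the source table.
# Rows are modeled as a dict keyed by primary key; values are the
# remaining column data.

def rowwise_diff(source, target):
    stmts = []
    for key, row in source.items():
        if key not in target:
            stmts.append(("INSERT", key, row))   # row missing on target
        elif target[key] != row:
            stmts.append(("UPDATE", key, row))   # row differs
    for key in target:
        if key not in source:
            stmts.append(("DELETE", key, None))  # row extra on target
    return stmts

src = {1: "alice", 2: "bob"}
tgt = {2: "bobby", 3: "carol"}
print(rowwise_diff(src, tgt))
```

Masking a difference type (option –m below) would simply filter the corresponding statement kind out of this list.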

[Image: SC-Hvr-Compare.png]

Options

Parameter Description
–Cctx Enable context ctx. This controls whether actions defined with parameter /Context are effective or are ignored. Defining an action with /Context can have different uses. For example, if action Restrict /CompareCondition="{id}>22" /Context=qqq is defined, then normally all data will be compared, but if context qqq is enabled (–Cqqq), then only rows where id>22 will be compared. Variables can also be used in the restrict condition, such as "{id}>{hvr_var_min}". This means that hvrcompare –Cqqq –Vmin=99 will compare only rows with id>99.

Parameter /Context can also be defined on action ColumnProperties. This can be used to define /CaptureExpression parameters which are only activated if a certain context is supplied, for example to define a context for case-sensitive compares.
[Image: SC-Hvr-Compare Context.png]

–d Remove (drop) the scripts and the scheduler jobs and job groups generated by a previous hvrcompare command.
–gx Granularity of compare operation in database locations. Valid values of x are:
b Bulk compare using checksums. This is the default.
r Row–wise compare of tables.
–hclass Specify hub database. Valid values are oracle, ingres, sqlserver, db2, db2i, postgresql, and teradata. See also section Calling HVR on the Command Line.
–jnum_jobs
  Since    v5.3.1/6  
Sets the quota_run attribute on the compare job group, which defines the number of jobs that can run simultaneously. This option cannot be used without scheduling turned on (option -s).
–lx Target location of compare. The other location (the read location) is specified with option –r. If this option is not supplied then all locations except the read location are targets. Values of x may be one of the following:
loc Only location loc.
l1l2 All locations that fall alphabetically between l1 and l2 inclusive.
!loc All locations except loc.
!l1l2 All locations except for those that fall alphabetically between l1 and l2 inclusive.

Several –lx instructions can be supplied together.

–mmask Mask (ignore) some differences between the tables that are being compared. Valid values of mask can be:
d Mask out delete differences.
u Mask out update differences.
i Mask out insert differences.

Letters can be combined, for example –mid means mask out inserts and deletes. If a difference is masked out, then the verbose option (–v) will not generate SQL for it. The –m option can only be used with row–wise granularity (option –gr).

–Mmoment Select data from each table of source from same consistent moment in time. Value moment can be one of the following:
time Flashback query with select … as of timestamp. Valid formats are YYYY-MM-DD [HH:MM:SS] (in local time), YYYY-MM-DDTHH:MM:SS+TZD, YYYY-MM-DDTHH:MM:SSZ, today, now[[+|-]SECS], or an integer (seconds since 1970-01-01 00:00:00 UTC). Note that if a symbolic time like -Mnow is supplied then a new "SCN time" is retrieved each time the compare job runs, not only when the hvrcompare command is called. So if hvrcompare -Mnow is run on Monday and the compare job it creates runs at 10:00 on Tuesday and again at 10:00 on Wednesday, then the first run will do a flashback query (for all tables) with an SCN corresponding to Tuesday at 10:00 and the second run will use an SCN corresponding to Wednesday at 10:00.
scn=val Flashback query with select … as of scn. Value is an Oracle SCN number, either in decimal or in hex (when it starts with 0x or contains hex digits).
hvr_tx_seq=val Value from HVR column hvr_tx_seq is converted back to an Oracle SCN number (by dividing by 65536) and used for flashback query with select … as of scn. Value is either in decimal or in hex (when it starts with 0x or contains hex digits).

This parameter only affects the selects on the leftmost (source) database, not any selects on the rightmost (target) database.
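The hvr_tx_seq conversion described above (divide by 65536) can be sketched as follows. tx_seq_to_scn is a hypothetical helper, and this sketch only handles the 0x-prefixed form of hex input:

```python
# Sketch of the documented conversion: an hvr_tx_seq value divided by
# 65536 (i.e. shifted right 16 bits) yields the Oracle SCN it encodes.

def tx_seq_to_scn(val):
    # Accept a decimal int, a decimal string, or a 0x-prefixed hex string.
    if isinstance(val, str) and val.lower().startswith("0x"):
        n = int(val, 16)
    else:
        n = int(val)
    return n // 65536

print(tx_seq_to_scn("0x10000"))  # 1
print(tx_seq_to_scn(131072))     # 2
```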

–n numtabs
  Since    v5.3.1/6  
Create ‘sub-jobs’ which each compare a bundle of no more than numtabs tables.

For example, if a channel contains 6 tables then option -n1 will create 6 jobs, whereas option -n4 on the same channel will create only 2 jobs (the first with 4 tables, the last with just 2). Tables excluded using option -t do not count towards the bundling.

Jobs are named by adding a number (starting at 0) to the task name, which defaults to cmp (the task name can be overridden using option -T). Normally the first job is named chn-cmp0-x-y, but numbers are left-padded with zeros, so if 10 jobs are needed the first is named chn-cmp00-x-y instead.
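The bundling and naming behavior can be sketched as follows. This is an illustrative model: bundle_jobs is hypothetical, and the zero-padding rule is approximated here as "pad to the width of the largest job number":

```python
# Sketch: bundle tables into sub-jobs of at most numtabs tables each,
# naming each job by appending a (zero-padded) number to the task name.

def bundle_jobs(tables, numtabs, task="cmp"):
    # Chunk the table list into bundles of at most numtabs tables.
    bundles = [tables[i:i + numtabs] for i in range(0, len(tables), numtabs)]
    # Approximation of the padding rule: pad to the widest job number.
    width = len(str(len(bundles) - 1))
    return {task + str(n).zfill(width): b for n, b in enumerate(bundles)}

# The 6-table example from the text: -n4 yields 2 jobs (4 tables + 2).
print(bundle_jobs(["tab1", "tab2", "tab3", "tab4", "tab5", "tab6"], 4))
```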

One technique is to generate many jobs for the compare of a big channel (using this option and option -s) and add scheduler attribute quota_run to the job group (named CHN-CMP) so that only a few (say 3) can run simultaneously. Scheduler attributes can be added by right-clicking the job group and selecting Add Attribute.

Another technique to manage the compare of a channel with thousands of tables is to use this option along with options -R (ranges) and -T (task name) to do 'power of ten' naming and bundling, in case a single table encounters a problem. The following illustrates this technique:

First use [-n100] so each job tries to compare 100 tables.

If one of these jobs fails (say job chn-cmp03-x-y) then use options [-n10 -R30-39 -Tcmp03] to replace it with 10 jobs which each do 10 tables.

Finally, if one of those jobs fails (say chn-cmp037-x-y) then use options [-n1 -R370-379 -Tcmp037] to replace it with 10 'single table' jobs.

–O
  Since    v5.3.1/6  
Only show the OS commands implied by options -n (jobs for bundles of tables) or -S (table slices), instead of executing them. This can be used to generate a shell script of 'simpler' hvrcompare commands.

For example, if a channel contains only tables tab1, tab2, tab3 and tab4, then this command:

hvrcompare -rcen -O -n2 myhub/pwd mychn

will only generate this output;

hvrcompare -rcen -Tcmp0 -ttab1 -ttab2 myhub/pwd mychn
hvrcompare -rcen -Tcmp1 -ttab3 -ttab4 myhub/pwd mychn
–pN Perform compare on different locations in parallel using N sub–processes. This cannot be used with option –s.
–PM Perform compare for different tables in parallel using M sub–processes. The compare will start processing M tables in parallel; when the first of these is finished the next table will be processed, and so on.
–Q Do not compare database sequences matched by action DbSequence. If this option is not specified, then database sequences in the source database are compared with matching sequences in the target database. Sequences that only exist in the target database are ignored.
–rloc Read location. This means that location loc is passive; the data is piped from here to the other location(s) and the work of comparing the data is performed there instead.
–R rangeexpr
  Since    v5.3.1/6  
Only perform certain 'sub-jobs' implied by either option -n (jobs for bundles of tables) or -S (table slices). This option cannot be used without one of those options.

Value rangeexpr should be a comma-separated list of one of the following:

N Only perform 'sub-job' number N. Note that these jobs are numbered starting from zero (e.g. the first is chn-cmp0-rloc-wloc).
N-M Perform jobs N to M inclusive.
N- Perform jobs from N onwards.
-M Perform jobs from the first job up to and including job M.

For example, if a channel contains 20 tables then option -n1 would cause 20 jobs to be created (named chn-cmp00-x-y, chn-cmp01-x-y, chn-cmp02-x-y … chn-cmp19-x-y), but options -n1 -R0,10- would restrict job creation to only 11 jobs (chn-cmp00-x-y, then chn-cmp10-x-y, chn-cmp11-x-y … chn-cmp19-x-y).
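A sketch of how such a rangeexpr could be interpreted (parse_ranges is a hypothetical helper, not part of HVR):

```python
# Sketch: expand a -R range expression (comma-separated list of
# N, N-M, N-, -M terms) into the set of sub-job numbers to perform.
# total is the number of sub-jobs implied by option -n or -S.

def parse_ranges(rangeexpr, total):
    jobs = set()
    for part in rangeexpr.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            lo = int(lo) if lo else 0           # "-M" starts at job 0
            hi = int(hi) if hi else total - 1   # "N-" runs to the last job
            jobs.update(range(lo, hi + 1))
        else:
            jobs.add(int(part))                 # single job number
    return sorted(jobs)

# The example above: job 0 plus jobs 10 onwards, out of 20 -> 11 jobs.
print(parse_ranges("0,10-", 20))
```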

–s Schedule invocation of the compare scripts using the HVR Scheduler. Without this option the default behavior is to perform the compare immediately. The jobs created by this option are named chn-cmp-l1-l2. These jobs are initially suspended and can be invoked using command hvrstart, as in the following example:
$ hvrstart -u -w hubdb chn-cmp

The previous command unsuspends the jobs and instructs the scheduler to run them. Output from the jobs is copied to the hvrstart command's stdout and the command finishes when all jobs have finished. The jobs created are cyclic, which means that after they have run they return to PENDING state. They are not given a trig_delay attribute, so once they complete they stay in PENDING state without being retriggered.

Once a compare job has been created with option –s then it can also be run on the command line (without HVR Scheduler) as follows:

$ hvrstart -i hubdb chn-cmp-loc1-loc2

[Image: SC-Hvr-Compare Schedule.png]

–S sliceexpr
  Since    v5.3.1/6  
Process a large table using slices. These slices are controlled using value sliceexpr, which affects action Restrict /CompareCondition. That action must be defined on the table (at least on the read location) and must contain a relevant {hvr_var_slice_*} substitution. In the HVR GUI, the slicing options are available under the Scheduling tab.

Value sliceexpr must have one of the following forms:

col%num Slicing using modulo of numbers. This slicing affects the substitution {hvr_var_slice_condition} which must be mentioned in Restrict /CompareCondition for this table.

If -Sabc%3 is supplied then the conditions for the three slices are:

 mod(round(abs(coalesce(abc, 0)), 0), 3)= 0
 mod(round(abs(coalesce(abc, 0)), 0), 3)= 1
 mod(round(abs(coalesce(abc, 0)), 0), 3)= 2

Note that the extra SQL functions (round(), abs() and coalesce()) ensure that slicing also handles fractional values, negative numbers and NULLs. Modulo slicing can only be used on a column with a numeric data type.
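The modulo conditions listed above can be reproduced with a small sketch (modulo_slices is a hypothetical helper; the SQL strings mirror the conditions shown above):

```python
# Sketch: generate the modulo-slice conditions for -Scol%num.
# Each of the num slices gets one condition; abs/round/coalesce make
# the expression safe for fractions, negative numbers and NULLs.

def modulo_slices(col, num):
    return [
        "mod(round(abs(coalesce(%s, 0)), 0), %d)= %d" % (col, num, i)
        for i in range(num)
    ]

for cond in modulo_slices("abc", 3):
    print(cond)
```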

col<b1[<b2]… [<bN] Slicing using boundaries. If N boundaries are defined then N+1 slices are implied. This slicing affects the substitution {hvr_var_slice_condition} which must be mentioned in Restrict /CompareCondition for this table.

If -Sabc<10<20<30 is supplied then the conditions for the four slices are:

abc <= 10
abc > 10 and abc <= 20
abc > 20 and abc <= 30
abc > 30 or abc is null

Note that strings can be supplied by adding quotes around boundaries, i.e. -Sabc<'x'<'y'<'z'.
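The boundary conditions above can be reproduced with a small sketch (boundary_slices is a hypothetical helper; boundaries are assumed numeric here, and quoted string boundaries would need extra handling):

```python
# Sketch: generate the boundary-slice conditions for -Scol<b1<b2...<bN.
# N boundaries imply N+1 slices; the last slice also catches NULLs.

def boundary_slices(col, bounds):
    conds = ["%s <= %s" % (col, bounds[0])]
    for lo, hi in zip(bounds, bounds[1:]):
        conds.append("%s > %s and %s <= %s" % (col, lo, col, hi))
    conds.append("%s > %s or %s is null" % (col, bounds[-1], col))
    return conds

# The example above: -Sabc<10<20<30 yields four slices.
for cond in boundary_slices("abc", [10, 20, 30]):
    print(cond)
```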

For very large tables consider the DBMS query execution plan. If the DBMS decides to 'walk' an index (with a lookup for each matched row) but this is not optimal (i.e. a 'serial-scan' of the table would be faster) then either use DBMS techniques ($HVR_SQL_SELECT_HINT allows Oracle optimizer hints) or consider modulo slicing (col%num) instead.

For this type of slicing, HVR can suggest boundaries using Oracle's dbms_stats package. Click the browse ("...") button for the "Boundaries" type of slicing and then click "Suggest Values" in the "Boundaries for Table" dialog. The number of slices can also be specified.

Gathering column histogram statistics is required for this functionality to work. This can be done by calling the dbms_stats.gather_table_stats stored procedure.

Examples:

1. Gather statistics including column histograms for table table_name, using all table rows, for all columns, with a maximum of 254 histogram buckets (so up to 254 slice boundaries can be suggested):

   exec dbms_stats.gather_table_stats('schema_name', 'table_name', estimate_percent=>100, method_opt=>'for all columns size 254');

2. Gather statistics including column histograms for table table_name, using all table rows, for all indexed columns, with the default number of histogram buckets:

   exec dbms_stats.gather_table_stats('schema_name', 'table_name', estimate_percent=>100, method_opt=>'for all indexed columns');

3. Gather statistics including column histograms for table table_name, using 70% of table rows, for column table_column, with a maximum of 150 histogram buckets (so up to 150 slice boundaries can be suggested):

   exec dbms_stats.gather_table_stats('schema_name', 'table_name', estimate_percent=>70, method_opt=>'for columns table_column size 150');

4. Gather statistics including column histograms for table table_name, for all columns, with a maximum of 254 histogram buckets. This is an obsolete way to generate statistics and supports far fewer options:

   analyze table table_name compute statistics for all columns size 254;

num Numbered slices. The number of each slice is assigned to substitution {hvr_var_slice_num}, which must be mentioned in Restrict /CompareCondition for this table. Substitution {hvr_var_slice_total} is set to the total number of slices.

Example:

    In heterogeneous environments normal modulo slicing is not always possible because the syntax of the modulo operation differs between databases. For example, in Oracle the modulo operation is mod(x,y) while in Teradata it is x mod y. Negative numbers are also handled differently by these two databases. For this scenario, two Restrict actions can be defined, one for the capture location (Oracle) and the other for the integrate location (Teradata):
    Location Action
    ora Restrict /CompareCondition="abs(mod({hvr_var_slice_col}, {hvr_var_slice_total})) = {hvr_var_slice_num}" /Context=slice
    terad Restrict /CompareCondition="abs({hvr_var_slice_col} mod {hvr_var_slice_total}) = {hvr_var_slice_num}" /Context=slice

    If options -S3 -Vslice_col=abc are supplied then the conditions for the three slices are:

    Capture location (Oracle):
    abs(mod(abc, 3)) = 0
    abs(mod(abc, 3)) = 1
    abs(mod(abc, 3)) = 2

    Integrate location (Teradata):
    abs(abc mod 3) = 0
    abs(abc mod 3) = 1
    abs(abc mod 3) = 2
val1[;val2]… Slicing using a list of values. Values are separated by semicolons. Each slice has its value assigned directly into substitution {hvr_var_slice_value} which must be mentioned in Restrict /CompareCondition for this table.

Example: A large table with column country which splits the rows into three partitions (US, UK and NL) can be sliced using action Restrict /CompareCondition="country = '{hvr_var_slice_value}'" /Context=slice

If option -SUS;UK;NL is supplied then the conditions for the three slices are:

country = 'US'
country = 'UK'
country = 'NL'
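The value-list expansion can be sketched as follows (value_slices is a hypothetical helper; it substitutes each value into the Restrict /CompareCondition pattern from the example above):

```python
# Sketch: expand a -Sval1;val2;... list into one condition per slice
# by substituting each value into the {hvr_var_slice_value} placeholder.

def value_slices(condition_tmpl, values):
    return [condition_tmpl.replace("{hvr_var_slice_value}", v)
            for v in values.split(";")]

# The example above: -SUS;UK;NL yields three slice conditions.
for cond in value_slices("country = '{hvr_var_slice_value}'", "US;UK;NL"):
    print(cond)
```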

Slicing can only be used for a single table (defined with option -t).

As with option -n (bundles of tables), jobs are named by adding a number (starting at 0) to the task name, which defaults to cmp (the task name can be overridden using option -T). Normally the first slice's job is named chn-cmp0-x-y, but numbers are left-padded with zeros, so if 10 slices are needed the first is named chn-cmp00-x-y instead.

Note that if an on-line refresh is done (option -q) and no ‘select moment’ is specified (option -M) then only value no (resilience) is allowed, not rw (skip during capture and integrate) or wo (skip during integrate only).

The column used to slice a table must be 'stable': if it is updated, a row could 'move' from one slice to another while the compare is running. The row could then be processed in two slices (which will cause errors) or in no slice at all. If the source database is Oracle then this problem can be avoided using a common 'select moment' (option -M).

It is recommended that any Restrict /CompareCondition defined for slicing is also given a /Context parameter so it can easily be disabled or enabled.

–ty Only compare tables specified by y. Values of y may be one of the following:
tbl Only compare table name tbl.
t1t2 Compare all table codes that fall alphabetically between t1 and t2 inclusive.
!tbl Compare all table names except tbl.
!t1t2 Compare all table codes except for those that fall alphabetically between t1 and t2 inclusive.

Several –ty instructions can be supplied together.

–Ttsk Specify an alternative task name for naming scripts and jobs. The default task name is cmp, so for example without this –T option jobs are named chn-cmp-l1-l2.
–uuser [/pwd] Connect to hub database using DBMS account user. For some databases (e.g. SQL Server) a password must also be supplied.
–v Verbose. This causes row–wise compare to display each difference detected. Differences are presented as SQL statements. This option requires that option –gr (row–wise granularity) is supplied.
–Vnm=val Supply variable for restrict condition. This should be supplied if a Restrict /CompareCondition parameter contains string {hvr_var_nm}. This string is replaced with val.
Note: If hvrcompare connects between different DBMS types, an ambiguity can occur because of certain data type coercions. For example, HVR's coercion maps an empty string from other DBMSs into a null in an Oracle varchar. If Ingres location ing contains an empty string and Oracle location ora contains a null, should HVR report these tables as the same or different? Command hvrcompare allows both behaviors by applying the sensitivity of the 'write' location, not the 'read' location specified by –r. This means that comparing from location ing to location ora will report the tables as identical, but comparing from ora to ing will report them as different.

Example

For bulk compare of table order in location cen and location decen:

$ hvrcompare -rcen -ldecen -torder hubdb/pwd sales

Files

HVR_CONFIG
└─ job          Directory containing all scripts generated by hvrcompare.
   └─ hubdb
      └─ chn
         └─ chn-cmp-loc1-loc2   Script to compare loc1 with loc2.

See Also

Commands hvrcrypt, hvrgui and hvrproxy.