Hvrrefresh

From HVR
Jump to: navigation, search
Commands
Previous: hvrproxy
Next: hvrremotelistener
Command Reference

Name

hvrrefresh – Refresh the contents of tables in the channel.

Synopsis

hvrrefresh [–options] hubdb chn

Description

Command hvrrefresh performs the refresh of tables in a channel chn from a source location to target location(s). The source must be a database location, but the targets can be databases or file locations.

The first argument hubdb specifies the connection to the hub database; this can be an Oracle, Ingres, SQL Server, DB2, DB2 for I, PostgreSQL or Teradata database depending on its form. See further section Calling HVR on the Command Line.

For database targets, the refresh can be performed row–by–row or as a bulk operation, depending on which –g option is supplied. Bulk refresh means that the target object is truncated and then bulk copy is used to refresh the data from the read location. During bulk refresh table indexes and constraints will be temporarily dropped or disabled and will be reset after the refresh is complete. Row–wise refresh causes data to be piped from the read location to the write location whereby each individual row is compared. This results in a list of a minimal number of inserts, updates or deletes needed to resynchronize the tables. These SQL statements are then applied to the target database to affect the refresh.

An HVR channel can be defined purely for HVR Refresh, instead of being used for replication (capture and integrate jobs). In this case the channel must still be defined with actions Capture and Integrate, even though HVR Initialize will never be called.

SC-Hvr-Refresh.png

Options

Parameter Description
–cS Instruct hvrrefresh to create new tables. Only 'basic' tables are created, based on the information in the channel. A basic table just has the correct column names and datatypes without any extra indexes, constraints, triggers or tables spaces. Value S can be one of the following:
b Create basic (absent) tables only (mandatory).
e Keep existing structure.
f Force recreation (always recreate) of tables.
k Create index (unique key or non–unique index). If the original table does not have a unique key, then a non–unique index is created instead of a unique key.
o Only create tables, do not refresh any data into them.
p Preserve (keep) existing data on recreate.
r Recreate (drop and create) if the column names do not match.
u Use Unicode datatypes such as nvarchar instead of varchar.

Several –cS instructions can be supplied together, e.g. –cbkr which causes hvrrefresh to create new tables in the target database if they do not already exist and re–create if they exist but have the wrong column information. Parameter DbObjectGeneration /RefreshCreateTableClause can be used to add extra SQL to the Create Table statement which HVR will generate.

–Cctx Enable context ctx. This controls whether actions defined with parameter /Context are effective or are ignored. Defining an action with /Context can have different uses. For example, if action Restrict /RefreshCondition="{id}>22" /Context=qqq is defined, then normally all data will be refreshed, but if context qqq is enabled (–Cqqq), then only rows where id>22 will be refreshed. Variables can also be used in the restrict condition, such as "{id}>{hvr_var_min}". This means that hvrrefresh –Cqqq –Vmin=99 will only refresh rows with id>99.

Parameter /Context can also be defined on action ColumnProperties. This can be used to define /CaptureExpression parameters which are only activated if a certain context is supplied. For example, to define a bulk refresh context where SQL expressions are performed on the source database (which would slow down capture) instead of the target database (which would slow down bulk refresh).
SC-Hvr-Refresh Context.png

–d Remove (drop) scripts and scheduler jobs & job groups generated by previous hvrrefresh command.
–f Fire database triggers/rules while applying SQL changes for refresh.

Normally for Oracle and SQL Server, HVR disables any triggers on the target tables before the refresh and re-enables them afterwards. On Ingres, the refresh avoids firing databases rules using statement set no rules. This option prevents this, so if refresh does an insert statement then it could fire a trigger. But note that HVR's refresh often uses a bulk-load method to load data, in which case database triggers will not be fired anyway. Other ways to control trigger firing are described in Database Session Names and Recapturing. For integration jobs into Ingres and SQL Server, action Integrate /NoTriggerFiring can also be used.

–Fk Behavior for foreign key constraint in the target database which either reference or are referenced by a table which should be refreshed. Value for k is one or more of these letters:
i Ignore foreign key constraints. Normally this would cause foreign key constraint errors. This cannot be combined with other letters.
x Disable all such constraints before refresh and re-enable them at the end. If the DBMS does not support disable/re-enable syntax (e.g. Ingres) then constraints are instead dropped before refresh and recreated at the end. Note that for on-line refresh (option -q) without a select moment supplied (option -M) the actual re-enabling of disabled foreign key constraints is not done by the refresh itself but is instead delayed until the end of next cycle of integration.
d For a target with 'deferred' or 'deferrable' foreign key constraints, perform entire refresh in a single (very large) transaction which is committed right at the end of refresh. This can use a lot of DBMS resources. It is also slower because HVR uses SQL delete and insert statements, because both truncate and 'direct path load' both imply a commit. This is only supported for Oracle (because other DBMSs do not support deferring of foreign key constraints). If this letter is combined with letter x (disable) then only non-deferrable constraints are disabled and then re-enabled. Deferred constraint handling cannot be used with table parallelism (option -P)

If this option (-F) is not supplied then all foreign-key constraints will be disabled before refresh and re-enabled afterwards (letter x), unless HVR has no capability for foreign keys at all (letter i).

–gx Granularity of refresh in database locations. Valid values of x are:
b Bulk refresh using bulk data load. This is the default.
r Row–wise refresh of tables.
–hclass Specify hub database. Valid values are oracle, ingres, sqlserver, db2, db2i, postgresql, and teradata. See also section Calling HVR on the Command Line.
–jnum_jobs
  Since    v5.3.1/25  
Sets quota_run refresh job group attribute. It defines a number of jobs which can be run simultaneously. The option cannot be used without scheduling turned on (-s)
–lx Target location of refresh. The other (read location) is specified with option –r. If this option is not supplied then all locations except the read location are targets. Values of x maybe one of the following:
loc Only location loc.
l1l2 All locations that fall alphabetically between l1 and l2 inclusive.
!loc All locations except loc.
!l1l2 All locations except for those that fall alphabetically between l1 and l2 inclusive.

Several –lx instructions can be supplied together.

–mmask Mask (ignore) some differences between the tables that are being compared. Valid values of mask can be:
d Mask out delete differences.
u Mask out update differences.
i Mask out insert differences.

Letters can be combined, for example –mid means mask out inserts and deletes. If a difference is masked out, then the verbose option (–v) will not generate SQL for it and hvrrefresh will not rectify it. The –m option can only be used with row–wise granularity (option –gr).

–Mmoment Select data from each table from same consistent moment in time. Value moment can be one of the following:
time Flashback query with select … as of timestamp. Valid formats are YYYY–MM–DD [HH:MM:SS] (in local time) or YYYY–MM–DDTHH:MM:SS+TZD or YYYY–MM–DDTHH:MM:SSZ or today or now[[+|]SECS] or an integer (seconds since 1970–01–01 00:00:00 UTC). Note that if a symbolic time like -Mnow is supplied then a new "SCN time" will be retrieved each time the refresh job is run (not only when the hvrrefresh command is called. So if hvrrefresh -Mnow is run on Monday, and the refresh job it creates starts running at 10:00 Tuesday and runs again 10:00 on Wednesday, then the first refresh will do a flashback query (for all tables) with an SCN corresponding to Tuesday at 10:00 and the second job run will use flashback query with an SCN corresponding to Wednesday at 10:00.
scn=val Flashback query with select … as of scn. Value is an Oracle SCN number, either in decimal or in hex (when it starts with 0x or contains hex digits).
hvr_tx_seq=val Value from HVR column hvr_tx_seq is converted back to an Oracle SCN number (by dividing by 65536) and used for flashback query with select … as of scn. Value is either in decimal or in hex (when it starts with 0x or contains hex digits).
serializable Select all data from source database using a single transaction (therefore a single session) which has SQL isolation level serializable. This cannot be used with table parallelism (option -P).
snapshot Select all data from source database using a single transaction (therefore a single session) which has SQL isolation level snapshot (SQL Server only). Using snapshot isolation level requires enabling ALLOW_SNAPSHOT_ISOLATION database option in SQL Server. This cannot be used with table parallelism (option -P).

This parameter only affects the selects of the leftmost (source) database, not any selects on the rightmost (target) database.

–n numtabs
  Since    v5.3.1/6  
Create ‘sub-jobs’ which each refresh a bundle of no more than numtabs tables.

For example, if a channel contains 6 tables then option -n1 will create 6 jobs whereas were option -n4 to be used on the same channel then only 2 jobs will be created (the first with 4 tables, the last with just 2). If tables are excluded (using option -t) then these will not count for the bundling.

Jobs are named by adding a number (starting at 0) to the task name which defaults refr (although the task name can always be overridden using option -T).  Normally the first slice’s job is named chn-refr0-x-y but numbers are left-padded with zeros, so if 10 slices are needed the first is named chn-refr00-x-y instead.

One technique is to generate lots of jobs for refresh of big channel (using this option and option -s) and add ‘scheduler attribute’ quota_run to the job group (named CHN-REFR) so that only a few (say 3) can run simultaneously. Scheduler attributes can be added by right-clicking on the job group and selecting Add Attribute.

Another technique to manage the refresh a channel with thousands of tables is use this option along with options -R (ranges) and -T (task name) to do ‘power of ten’ naming and bundling, in case a single table encounters a problem. The following illustrates this technique;

First use [-n100] so each job tries to refresh 100 tables.

If one of these jobs fails (say job chn-refr03-x-y) then use options [-n10 -R30-39 -Trefr03] to replace it with 10 jobs which each do 10 tables.

Finally if one of those jobs fail (say chn-refr037-x-y) then then use options [-n1 -R370-379 -Trefr037] to replace it with 10 ‘single table’ jobs.

–O
  Since    v5.3.1/6  
Only show OS command implied by options -n (jobs for bundles of tables) or -S (table slices), instead of executing them. This can be used to generate a shell script of ‘simpler’ hvrrefresh commands;

For example if channel only contains tables tab1, tab2, tab3 and tab4 then this command;

hvrrefresh -rcen -O -n2 myhub/pwd mychn

will only generate this output;

hvrrefresh -rcen -Trefr0 -ttab1 -ttab2 myhub/pwd mychn
hvrrefresh -rcen -Trefr1 -ttab3 -ttab4 myhub/pwd mychn
–pN Perform refresh on different locations in parallel using N sub–processes. This cannot be used with option –s.
–PM Perform refresh for different tables in parallel using M sub–processes. The refresh will start by processing M tables in parallel; when the first of these is finished the next table will be processed, and so on.
–qd Online refresh of data from a database that is continuously being changed. This requires that capture is enabled on the source database. The integration jobs are automatically suspended while the online refresh is running, and restarted afterwards. The target database is not yet consistent after the online refresh has finished. Instead, it leaves instructions so that when the replication jobs are restarted, they skip all changes that occurred before the refresh and perform special handling for changes that occurred during the refresh. This means that after the next replication cycle consistency is restored in the target database. If the target database had foreign key constraints, then these will also be restored.

Valid values for d are:

wo Write only. Changes before the online refresh should only be skipped on the write side (by the integrate job), not on the read side (by the capture job). If changes are being replicated from the read location to multiple targets, then this value will avoid skipping changes that are still needed by the other targets.
rw Read/Write. Changes before the online refresh should be skipped both on the read side (by the capture job) and on the write side (by the integrate job). There are two advantages to skipping changes on the capture side; performance (those changes will not be send over the network) and avoiding some replication errors (i.e. those caused by an alter table statement). The disadvantage of skipping changes on the capture side is that these changes may be needed by other replication targets. If they were needed, then these other integration locations need a new 'online' refresh, but without –qrw, otherwise the original targets will need yet another refresh.
no No skipping. Changes that occurred before the refresh are not skipped, only special handling is activated for changes that occurred during the refresh. This is useful for online refresh of a context–sensitive restriction of data (–Cctx and Restrict /RefreshCondition /Context).

Internally online refresh uses 'control files' to send instructions to the other replication jobs (see command hvrcontrol). These files can be viewed using command hvrrouterview and option –s.
Online refresh (with option –q) can give errors if duplicate rows (/DuplicateRows) are actually changed during the online refresh.

–Q No refresh of database sequences matched by action DbSequence. If this option is not specified, then the database sequence in the source database will be compared with matching sequences in the target database. Sequences that only exist in the target database are ignored.
–rloc Read location. This means that data will be read from location loc and written to the other location(s).
–R rangeexpr
  Since    v5.3.1/6  
Only perform certain ‘sub jobs’ implied by either options -N (job for bundles of tables) or -S (table slices). This option cannot be used without one of those options.

Value rangeexpr should be a comma-separated list of one of the following:

N Only perform ‘sub job’ number N. Note that these jobs are numbered starting from zero (e.g the first is chn-refr0-rloc-wloc).
N-M Perform from jobs from N to M inclusive.
N- Perform from jobs from N onwards.
-M Perform from jobs from the first job until job M.

For example, if a channel contains 20 tables then option -n1 would cause 20 jobs to be created (with names chn-refr00-x-y,  chn-refr01-x-y, chn-refr02-x-y… chn-refr19-x-y) but options -n1 -R0,10 would restrict job creation to only 11 jobs (named chn-refr00-x-y, then chn-refr10-x-y,  chn-refr11-x-y … chn-refr19-x-y).

–s Schedule invocation of refresh scripts using the HVR Scheduler. Without this option the default behavior is to perform the refresh immediately. The jobs created by this option are named chnrefrl1l2. These jobs are initially created in SUSPEND state. They can be invoked using command hvrstart as in the following example:
$ hvrstart –u –w hubdb chn–refr

The previous command unsuspends the jobs and instructs the scheduler to run them. Output from the jobs is copied to the hvrstart command's stdout and the command finishes when all jobs have finished. Jobs created are cyclic which means that after they have run they go back to PENDING state again. They are not generated by a trig_delay attribute which means that once they complete they will stay in PENDING state without getting retriggered. Once a refresh job has been created with option –s then it can only be run on the command line (without HVR Scheduler) as follows:

$ hvrstart –i hubdb chn–refr–loc1loc2

SC-Hvr-Refresh Schedule.png

–S sliceexpr
  Since    v5.3.1/6  
Process large table using slices. These slices are controlled using value sliceexpr which affects action Restrict /RefreshCondition. That action must be defined on the table (at least on the read location) and must contain a relevant {hvr_var_slice_*} substitution. In HVRGUI, the options for slicing is available under the Scheduling tab.

Value sliceexpr must have one of the following forms:

col%num Slicing using modulo of numbers. This slicing affects the substitution {hvr_var_slice_condition} which must be mentioned in Restrict /RefreshCondition for this table.

If -Sabc%3 is supplied then the conditions for the three slices are:

 mod(round(abs(coalesce(abc, 0)), 0), 3)= 0
 mod(round(abs(coalesce(abc, 0)), 0), 3)= 1
 mod(round(abs(coalesce(abc, 0)), 0), 3)= 2

Note that the use of extra SQL functions (e.g. round(), abs() and coalesce()) ensure that slicing affect fractions, negative numbers and NULL too. Modulo slicing can only be used on a column with a numeric data type.

col<b1[<b2]… [<bN] Slicing using boundaries. If N boundaries are defined then N+1 slices are implied. This slicing affects the substitution {hvr_var_slice_condition} which must be mentioned in Restrict /RefreshCondition for this table.

If -Sabc<10<20<30 is supplied then the conditions for the four slices are:

abc <= 10
abc > 10 and abc <= 20
abc > 20 and abc <= 30
abc > 30 or abc is null

Note that strings can be supplied by adding quotes around boundaries, i.e. -Sabc<'x'<'y'<'z'.

For very large tables consider the DBMS query execution plan. If the DBMS decides to 'walk' an index (with a lookup for each matched row) but this is not optimal (i.e. a 'serial-scan' of the table would be faster) then either use DBMS techniques ($HVR_SQL_SELECT_HINT allows Oracle optimizer hints) or consider modulo slicing (col%num) instead.

For this type of slicing, HVR can suggest boundaries by using the Oracle's dbms_stats package. Click the browse ("...") button for "Boundaries" type of slicing and then click "Suggest Values" in "Boundaries for Table" dialog. Number of slices can be also specified.

Gathering column histogram statistics is required for this functionality to work. This can be done by calling the dbms_stats.gather_table_stats stored procedure

Examples:

  1. Gathers statistics including column histograms, for table 'table_name', using all table rows, for all columns, and maximum of 254 histogram buckets (therefore up to 254 slice boundaries can be suggested).
  2. exec dbms_stats.gather_table_stats('schema_name', 'table_name', estimate_percent=>100, method_opt=>'for all columns size 254);
    
  3. Gathers statistics including column histograms, for table 'table_name', using all table rows, for all indexed columns, and default number of histogram buckets.
  4. exec dbms_stats.gather_table_stats('schema_name', 'table_name', estimate_percent=>100, method_opt=>'for all indexed columns);
    
  5. Gathers statistics including column histograms, for table 'table_name', using 70% of table rows, for column 'table_column', and maximum of 150 histogram buckets (therefore up to 150 slice boundaries can be suggested).
  6. exec dbms_stats.gather_table_stats('schema_name', 'table_name', estimate_percent=>70, method_opt=>'for columns table_column size 150); 
    
  7. Gathers statistics including column histograms, for table 'table_name', for all columns, and maximum 254 histogram buckets. This is an obsolete way to generate statistics and there are much less options supported.
  8. analyze table table_name compute statistics for all columns size 254;
    
num Numbered slices. The number of each slice is assigned to substitution {hvr_var_slice_num} which must be mentioned in Restrict /RefreshCondition for this table. Substitution {hvr_var_slice_total} is also assigned to the total number of slices.

Example:

    In heterogeneous environments doing normal modulo slicing is not always possible because of different syntax of modulo operation. For example, in Oracle modulo operation is mod(x,y) and in Teradata it is x mod y. Also negative numbers are handled differently on these two databases. For this scenario, two Restrict actions can be defined, one for the capture location (Oracle) and other for the integrate location (Teradata):
    Location Action
    ora Restrict /RefreshCondition="abs(mod({hvr_var_slice_col}, {hvr_var_slice_total}) = {hvr_var_slice_num})" /Context=slice
    terad Restrict /RefreshCondition="abs({hvr_var_slice_col} mod {hvr_var_slice_total} = {hvr_var_slice_num})" /Context=slice

    If options -S3 -Vslice_col=abc are supplied then the conditions for the three slices are:

    Capture Location Integrate Location
    abs(mod(abc, 3))= 0
    abs(mod(abc, 3))= 1
    abs(mod(abc, 3))= 2
    abs(abc mod 3)= 0
    abs(abc mod 3)= 1
    abs(abc mod 3)= 2
val1[;val2]… Slicing using a list of values. Values are separated by semicolons. Each slice has its value assigned directly into substitution {hvr_var_slice_value} which must be mentioned in Restrict /RefreshCondition for this table.

Example; A large table with column country which splits the rows into thee partitions (US, UK and NL) can be sliced using action Restrict /RefreshCondition="country = '{hvr_var_slice_value}'" /Context=slice

If option -SUS;UK;NL is supplied then the conditions for the three slices are:

country = 'US'
country = 'UK'
country = 'NL'

Slicing can only be used for a single table (defined with option -t).

As with option -n (bundles of tables), jobs are named by adding a number (starting at 0) to the task name which defaults refr (although this task name can always be overridden using option -T). Normally the first slice’s job is named chn-refr0-x-y but numbers are left-padded with zeros, so if 10 slices are needed the first is named chn-refr00-x-y instead.

Note that if an on-line refresh is done (option -q) and no ‘select moment’ is specified (option -M) then only value no (resilience) is allowed, not rw (skip during capture and integrate) or wo (skip during integrate only).

The column used to slice a table must be 'stable', because if it is updated then a row could 'move' from one slice to another while the refresh is running. The row could be refreshed in two slices (which will cause errors) or no slices (data-loss). If the source database is Oracle then this problem can be avoided using a common 'select moment' (option -M).

Running bulk refresh (option -gb) for multiple slices in parallel is not supported for relational database targets. Run them one at the time instead and use Restrict /RefreshCondition with a filter such as {hvr_var_slice_condition} to protect rows on the target that will not be refreshed.

It is recommend that any Restrict /RefreshCondition defined for slicing is also given a /Context parameter so it can be easily disabled or enabled.

–ty Only refresh objects referring to table codes specified by y. Values of y may be one of the following:
tbl Only refresh table name tbl.
t1t2 Refresh all table codes that fall alphabetically between t1 and t2 inclusive.
!tbl Refresh all table names except tbl.
!t1t2 Refresh all table codes except for those that fall alphabetically between t1 and t2 inclusive.

Several –ty instructions can be supplied together.

–Ttsk Specify alternative task for naming scripts and jobs. The default task name is refr, so for example without this –T option the generated jobs and scripts are named chnrefrl1l2.
–uuser [/pwd] Connect to hub database using DBMS account user. For some databases (e.g. SQL Server) a password must also be supplied.
–v Verbose. This causes row–wise compare and refresh to display each difference detected. Differences are presented as SQL statements. This option requires that option –gr (row–wise granularity) is supplied.
–Vname=value Supply variable into refresh restrict condition. This should be supplied if a /RefreshCondition parameter contains string {hvr_var_name}. This string is replaced with value.
Note: The effects of hvrrefresh can be customized by defining different actions in the channel. Possible actions include Integrate /DbProc (so that row–wise refresh calls database procedures to make its changes) and Restrict /RefreshCondition (so that only certain rows of the table are refreshed). Parameter /Context can be used with option –C to allow restrictions to be enabled dynamically. Another form of customization is to employ SQL views; HVR Refresh can read data from a view in the source database and row–wise refresh can also select from a view in the target database, rather than a real table when comparing the incoming changes.

If row–wise hvrrefresh is connecting between different DBMS types, then an ambiguity can occur because of certain datatype coercions. For example, HVR's coercion maps an empty string from other DBMS's into a null in an Oracle varchar. If Ingres location ing contains an empty string and Oracle location ora contains a null, then should HVR report that these tables are the same or different? Command hvrrefresh allow both behavior by applying the sensitivity of the 'write' location, not the 'read' location specified by –r. This means that row–wise refreshing from location ing to location ora will report the tables were identical, but row–wise refreshing from ora to ing will say the tables were different.

Examples

For bulk refresh of table order from location cen to location decen:

$ hvrrefresh –rcen –ldecen –torder hubdb/pwd sales

To only send updates and insert to a target database without applying any deletes use the following command:

$ hvrrefresh –rcen –md –gr hubdb/pwd sales

Files

Folder-icon.png HVR_CONFIG
└─ Folder-icon.png job Directory containing all scripts generated by hvrrefresh.
└─ Folder-icon.png hubdb
└─ Folder-icon.png chn
└─ chn–refr–loc1–loc2 Script to refresh loc1 with loc2.

See Also

Commands hvrcompare, hvrgui and hvrcrypt.