Action Capture instructs HVR to capture changes from a location. Various parameters are available to modify the functionality and performance of capture.
For a database location HVR captures changes straight from the DBMS logging system, unless /TriggerBased is defined.
When defined on a file location, this action instructs HVR to capture files from the file location's directory. Changes from a file location can be replicated to both database and file locations if the channel contains table information. In this case any files captured are parsed (see action FileFormat).
If Capture is defined on a file location without table information then each file captured is treated as a 'blob' and is replicated to the integrate file locations without HVR recognizing its format. If such a 'blob' file channel is defined with only actions Capture and Integrate (no parameters) then all files in the capture location's directory (including files in subdirectories) are replicated to the integrate location's directory. The original files are not touched or deleted, and in the target directory the original file names and subdirectories are preserved. New and changed files are replicated, but empty subdirectories and file deletions are not replicated.
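The 'blob' channel behavior above can be sketched as a comparison between the current directory state and the state from the previous capture cycle: new and modified files are replicated, originals are left untouched, and deletions are not propagated. This is a hypothetical illustration, not HVR's actual implementation:

```python
# Hypothetical sketch, not HVR code: decide which files a 'blob' channel
# would replicate, given the directory state now and at the last cycle.
def files_to_replicate(source_state, last_seen):
    """Both arguments map relative file path -> modify timestamp."""
    return {path for path, mtime in source_state.items()
            if path not in last_seen or mtime > last_seen[path]}

# 'sub/b.csv' is new, 'a.csv' is unchanged, and 'old.csv' was deleted
# at the source (the deletion is simply not propagated).
print(files_to_replicate({'a.csv': 10, 'sub/b.csv': 20},
                         {'a.csv': 10, 'old.csv': 3}))  # {'sub/b.csv'}
```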
Bi-directional replication (replication in both directions with changes happening in both file locations) is not currently supported for file locations. File deletion is not currently captured by HVR.
If Capture is defined on a file location without parameter /DeleteAfterCapture and action LocationProperties /StateDirectory is used to define a state directory outside of the file location's top directory, then HVR's file capture becomes read only; write permissions are not needed.
|/IgnoreSessionName||sess_name||This action instructs the capture job to ignore changes performed by the specified session name. Multiple ignore session names can be defined for a job, either by defining /IgnoreSessionName multiple times or by specifying a comma separated list of names as its value.
Normally HVR's capture avoids recapturing changes made during HVR integration by ignoring any changes made by sessions named hvr_integrate. This prevents looping during bidirectional replication but means that different channels ignore each other's changes. The session name actually used by integration can be changed using Integrate /SessionName. For more information, see section SESSION NAMES AND RECAPTURING below. If this parameter is defined for any table with log based capture, then it affects all tables captured from that location.
|/Coalesce||Causes coalescing of multiple operations on the same row into a single operation. For example an insert and an update can be replaced by a single insert; five updates can be replaced by one update, or an insert and a delete of a row can be filtered out altogether. The disadvantage of not replicating these intermediate values is that some consistency constraints may be violated on the target database.|
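The coalescing rules above can be sketched as follows. This is a hypothetical illustration of the described behavior (an insert plus an update becomes a single insert, several updates collapse into one, an insert plus a delete cancels out), not HVR's actual implementation:

```python
# Hypothetical sketch of /Coalesce, not HVR code. Each operation on a row
# is a (kind, row_values) pair; a run of operations collapses to at most one.
def coalesce(ops):
    result = None
    for kind, values in ops:
        if result is None:
            result = (kind, values)
            continue
        prev_kind, _ = result
        if prev_kind == 'insert' and kind == 'update':
            result = ('insert', values)   # insert + update -> single insert
        elif prev_kind == 'insert' and kind == 'delete':
            result = None                 # insert + delete -> filtered out
        elif kind == 'update':
            result = ('update', values)   # keep only the last update's values
        else:
            result = (kind, values)
    return result

print(coalesce([('insert', {'id': 1}), ('update', {'id': 1, 'qty': 5})]))
print(coalesce([('insert', {'id': 2}), ('delete', {'id': 2})]))  # None
```

Note how the intermediate values disappear; this is exactly why coalescing can violate consistency constraints that depend on seeing every intermediate state.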
|/NoBeforeUpdate||Do not capture 'before row' for an update. By default when an update happens HVR will capture both the 'before' and 'after' version of the row. This lets integration only update columns which have been changed and also allows collision detection to check the target row has not been changed unexpectedly. Defining this parameter can improve performance, because less data is transported. But that means that integrate will update all columns (normally HVR will only update the columns that were actually changed by the update statements and will leave the other columns unchanged). If this parameter is defined for any table with log based capture, then it affects all tables captured from that location.|
|/NoTruncate||Do not capture SQL truncate table statements such as truncate in Oracle and modify mytbl to truncated in Ingres. If this parameter is not defined, then these operations are replicated using hvr_op value 5. This parameter is not supported for Microsoft SQL Server.|
|method||Specify what action should be performed to enable supplemental logging for tables. Supplemental logging should be enabled to make log-based capture of updates possible. Valid values
|method||Select method of reading changes from the DBMS log file. Valid values
For MySQL, the default method of reading changes from the DBMS log file is SQL.
For PostgreSQL the SQL method does not support bi-directional replication because changes will be re-captured and replicated back.
For SQL Server, the DIRECT method requires Windows Administrator privileges; reduced permission models are not supported. The SQL method supports reduced permission models but may require augmenting of incomplete rows.
For Oracle, the SQL method enables capture from LogMiner.
|method||Specify which process advances the SQL Server transaction log truncation point (truncates the log). Valid values are:
|/AugmentIncomplete||col_type||During capture, HVR may receive partial/incomplete values for certain column types. The /AugmentIncomplete option tells HVR to perform additional steps to retrieve the full value from the source database; this is called augmenting. Valid values for col_type are:
The default value is NONE.
In certain situations the default behavior changes and setting /AugmentIncomplete can only override the behavior with a 'stronger' value.
|/ArchiveLogPath||dir||Instruct HVR to search for the transaction log archives (backups) in the given directory.
For Oracle, HVR will search for the log archives in the directory dir in addition to the 'primary' Oracle archive directory. If /ArchiveLogOnly parameter is enabled then HVR will search for the log archives in the directory dir only. Any process could be copying log archive files to this directory; the Oracle archiver (if another LOG_ARCHIVE_DEST_N is defined), RMAN, hvrlogrelease or a simple shell script. Whoever sets up copying of these files must also arrange that they are purged periodically, otherwise the directory will fill up (a benefit of hvrlogrelease is that it can purge files that it copies automatically when they are no longer needed).
For SQL Server, HVR normally locates the transaction log backup files by querying the backup history table in the msdb database. Specifying this parameter tells HVR to search for the log backup files in the dir folder instead. When this parameter is defined, the /ArchiveLogFormat parameter must also be defined.
For HANA, HVR will search for the log backups in the directory dir instead of the default log backup location for the source database.
|/ArchiveLogFormat||format||Describes the filename format (template) of the transaction log archive (backup) files stored in the directory specified by the /ArchiveLogPath parameter. The list of supported format variables and the default format string are database-specific.
For Oracle, this parameter accepts the same set of format variables that the LOG_ARCHIVE_FORMAT parameter of the Oracle DBMS accepts (see Oracle's documentation for the detailed syntax). When this parameter is not defined, HVR queries the database for the LOG_ARCHIVE_FORMAT DBMS parameter and uses that value by default.
For SQL Server, this parameter accepts the following format variables:
HVR uses the %Y, %M, %D, %h, %m, %s and %n values to sort and process the log backup files in the correct (chronological) order. The combinations of the %Y, %M, %D and %h, %m, %s values are expected to form valid date and time values; however, no validation is performed. Any value that is missing from the format string is considered to be 0. When sorting the files, comparison is done in the following order: %Y, %M, %D, %h, %m, %s, %n.
For SQL Server, this parameter has no default and must be specified if the /ArchiveLogPath parameter is defined.
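The sort-key rules above can be sketched as follows. The format string, field widths, file names and parsing helper here are all assumptions for illustration, not HVR's actual parser; only the rules (missing variables count as 0, comparison order %Y, %M, %D, %h, %m, %s, %n) come from the text:

```python
import re

# Hypothetical sketch: build a chronological sort key from a log backup
# file name using an /ArchiveLogFormat-style template.
VAR_RX = {'Y': r'(?P<Y>\d{4})', 'M': r'(?P<M>\d{2})', 'D': r'(?P<D>\d{2})',
          'h': r'(?P<h>\d{2})', 'm': r'(?P<m>\d{2})', 's': r'(?P<s>\d{2})',
          'n': r'(?P<n>\d+)'}

def sort_key(fname, fmt):
    """Return the (Y, M, D, h, m, s, n) tuple; absent variables become 0."""
    rx, i = '', 0
    while i < len(fmt):
        if fmt[i] == '%' and i + 1 < len(fmt) and fmt[i + 1] in VAR_RX:
            rx += VAR_RX[fmt[i + 1]]
            i += 2
        else:
            rx += re.escape(fmt[i])
            i += 1
    m = re.fullmatch(rx, fname)
    g = m.groupdict() if m else {}
    return tuple(int(g.get(v) or 0) for v in 'YMDhmsn')

files = ['db_20240102_01.trn', 'db_20240101_02.trn', 'db_20240101_01.trn']
print(sorted(files, key=lambda f: sort_key(f, 'db_%Y%M%D_%n.trn')))
# ['db_20240101_01.trn', 'db_20240101_02.trn', 'db_20240102_01.trn']
```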
For HANA, this parameter accepts the following format variables:
The %s, %e and %t format variables are mandatory.
For HANA, this parameter is optional; the default format value is log_backup_%v_%p_%s_%e.%t.
|/ArchiveLogOnly||Capture data only from the archived redo files in the directory defined by /ArchiveLogPath, and do not read anything from the online redo files or the 'primary' archive destination.
This allows the HVR process to reside on a different machine than the Oracle DBMS or MS SQL and read changes from files that are sent to it by some remote file copy mechanism (e.g. FTP). The capture job still needs an SQL connection to the database for accessing dictionary tables, but this can be a regular connection.
Replication in this mode can have longer delays compared with the 'online' mode. To control the delay, it is possible to force Oracle to issue an archive once per predefined period of time. For SQL Server, the delay can be controlled with the HVR_MSSQL_ZREAD_ARCHIVE_SCAN_INTERVAL environment variable.
On RAC systems, the delay is determined by the slowest or least busy node, because the archives from all threads have to be merged by SCN in order to generate the replicated data flow.
|/XLogDirectory||dir||Directory containing current PostgreSQL xlog files|
|schema.journal||Capture from specified DB2 for i journal. Both the schema (library) of the journal and the journal name should be specified (separated by a dot). This parameter is mandatory for DB2 for i. All tables in a channel should use the same journal. Use different channels for tables associated with different journals. If this parameter is defined for any table, then it affects all tables captured from that location.|
|/LogJournalSysSeq||Capture from journal using *SYSSEQ.|
||secs||Checkpointing frequency in seconds for long running transactions, so the capture job can recover quickly when it restarts. Without checkpoints, capture jobs must rewind back to the start of the oldest open transaction, which can take a long time and may require access to many old DBMS log files (e.g. archive files).
Value secs is the interval (in seconds) at which the capture job creates checkpoints. The default frequency (when /CheckpointFrequency is not defined) is 300 (5 minutes). Value 0 means no checkpoints are written.
The checkpoints are written into directory $HVR_CONFIG/capckp/hub/chn. If a transaction continues to make changes for a long period then successive checkpoints will not rewrite its same changes each time; instead the checkpoint will only write new changes for that transaction; for older changes it will reuse files written by earlier checkpoints.
Only long-running transactions are saved in the checkpoint. For example, if the checkpoint frequency is every 5 minutes but users always do an SQL commit within 4 minutes, then checkpoints will never be written. If however some users keep transactions open for 10 minutes, then those transactions will be saved, but shorter-lived ones in the same period will not.
The frequency with which capture checkpoints are written is relative to the capture job's own clock, but it decides whether a transaction has been running long enough to be checkpointed by comparing the timestamps in its DBMS logging records. As a consequence, the maximum (worst-case) time that an interrupted capture job would need to recover (rewind back over all its open transactions) is its checkpoint frequency (default 5 minutes) plus the time it takes to reread the amount of changes that the DBMS can write in that period.
When a capture job is recovering it will only use checkpoints which were written before the 'capture cycle' was completed. This means that very frequent capture checkpointing (say every 10 seconds) is wasteful and will not speed up capture job recovery time.
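The checkpoint-eligibility rule above can be sketched as follows. The 300-second default comes from the text; the function name and transaction representation are hypothetical, not HVR internals:

```python
# Hypothetical sketch: when a checkpoint is written, only transactions that
# have been open longer than the checkpoint frequency (judged by timestamps
# in the DBMS log records) are saved.
CHECKPOINT_FREQUENCY = 300  # seconds; the documented default

def transactions_to_checkpoint(open_txns, log_time):
    """open_txns maps txn id -> timestamp of its first log record (seconds)."""
    return {txn for txn, started in open_txns.items()
            if log_time - started >= CHECKPOINT_FREQUENCY}

# t1 has been open for 10 minutes, t2 for under a minute: only t1 is saved.
print(transactions_to_checkpoint({'t1': 700, 't2': 1250}, 1300))  # {'t1'}
```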
||STOR||Storage location of capture checkpoint files for quick capture recovery. Value STOR can be one of the following:
Checkpoints are saved in directory $HVR_CONFIG/capckp/ (either on the capture machine or the hub).
If a capture job is restarted but cannot find the most recent checkpoint files (perhaps the contents of that directory were lost during a failover), then it will write a warning and rewind back to the start of the oldest open transaction.
|/TriggerBased||Capture changes by generating DBMS triggers, instead of using log-based capture.|
|/QuickToggle||Allows end user transactions to avoid a lock on the toggle table. The toggle table is changed by HVR during trigger based capture. Normally all changes from user transactions before a toggle are put into one set of capture tables and changes from after a toggle are put in the other set. This ensures that transactions are not split. If an end user transaction is running when HVR changes the toggle then HVR must wait, and if other end user transactions start then they must wait behind HVR. Parameter /QuickToggle allows these other transactions to avoid waiting, but the consequence is that their changes can be split across both sets of capture tables. During integration these changes will be applied in separate transactions; in between these transactions the target database is not consistent. If this parameter is defined for any table, then it affects all tables captured from that location.|
For Ingres, variable ING_SET must be defined to force readlock=nolock on the quick toggle table.
|/ToggleFrequency||secs||This parameter instructs HVR trigger based capture jobs to wait for a fixed interval before toggling and reselecting capture tables, instead of dynamically waiting for a capture trigger to raise a database alert. Raising and waiting for database alerts is an unnecessary overhead if the capture database is very busy. If this parameter is defined for any table, then it affects all tables captured from that location.|
|/KeyOnlyCaptureTable||Improve performance for capture triggers by only writing the key columns into the capture table. The non key columns are extracted using an outer join from the capture table to the replicated table. Internally HVR uses the same outer join technique to capture changes to long columns (e.g. long varchar). This is necessary because DBMS rules/triggers do not support long datatypes. The disadvantage of this technique is that 'transient' column values can sometimes be replicated, for example if a delete happens just after the toggle has changed, then the outer join could produce a NULL for a column which never had that value.|
|/IgnoreCondition||sql_expr||Ignore (do not capture) any changes that satisfy expression sql_expr. This logic is added to the HVR capture rules/triggers and procedures. This parameter differs from the Restrict /CaptureCondition as follows:
|/IgnoreUpdateCondition||sql_expr||Ignore (do not capture) any update changes that satisfy expression sql_expr. This logic is added to the HVR capture rules/triggers and procedures.|
|int||This implies that Ingres capture tables have a hash structure. This reduces the chance of locking contention between parallel user sessions writing to the same capture table. It also makes the capture table larger and I/O into it sparser, so it should only be used when such locking contention could occur. Row level locking (default for Oracle and SQL Server and configurable for Ingres) removes this locking contention too without the cost of extra I/O.|
|col_list||Specify different key for capture table hashing. The default hash key is the replication key for this table. The key specified does not have to be unique; in some cases concurrency is improved by choosing a non unique key for hashing.|
|/DeleteAfterCapture||Delete file after capture, instead of capturing recently changed files.
If this parameter is defined, then the channel moves files from the location. Without it, the channel copies files if they are new or modified. If this parameter is defined for any table, then it affects all files captured from that location.
|/Pattern||pattern||Only capture files whose names match pattern. The default pattern is '**/*' which means search all subdirectories and match all files.|
Possible patterns are:
On Unix and Linux, file name matching is case sensitive (e.g. *.lis does not match file FOO.LIS), but on Windows and SharePoint it is case insensitive. For FTP and SFTP the case sensitivity depends on the OS on which HVR is running, not the OS of the FTP/SFTP server.
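The case-sensitivity rule above can be sketched with Python's fnmatch as a stand-in for HVR's own matcher; the helper name is hypothetical:

```python
import fnmatch

# Hypothetical sketch of the case handling described above: case sensitive
# on Unix/Linux, case insensitive on Windows and SharePoint.
def name_matches(filename, pattern, case_sensitive):
    if case_sensitive:
        return fnmatch.fnmatchcase(filename, pattern)
    return fnmatch.fnmatchcase(filename.lower(), pattern.lower())

print(name_matches('FOO.LIS', '*.lis', case_sensitive=True))   # False (Unix)
print(name_matches('FOO.LIS', '*.lis', case_sensitive=False))  # True (Windows)
```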
|/IgnorePattern||pattern||Ignore files whose names match pattern. For example, to ignore all files underneath subdirectory qqq specify ignore pattern qqq/**/*. The rules and valid forms for /IgnorePattern are the same as for /Pattern, except that 'named patterns' are not allowed. If this parameter is defined for any table, then it affects all files captured from that location.|
|/IgnoreUnterminated||pattern||Ignore files whose last line does not match pattern. This ensures that incomplete files are not captured. This pattern matching is supported for UTF 8 files but not for UTF 16 file encodings. If this parameter is defined for any table, then it affects all files captured from that location.|
|/IgnoreSizeChanges||Changes in file size during capture are not considered an error when capturing from a file location.|
|/AccessDelay||secs||Delay read for secs seconds to ensure writing is complete. HVR will ignore this file until its last create or modify timestamp is > secs seconds old. If this parameter is defined for any table, then it affects all files captured from that location.|
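The /AccessDelay rule above can be sketched as a simple timestamp check; the helper is hypothetical, not HVR code:

```python
import os
import tempfile
import time

# Hypothetical sketch: skip a file until its last modify timestamp is more
# than `secs` seconds old, as /AccessDelay does.
def old_enough(path, secs, now=None):
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) > secs

# Demo: a file stamped 120 seconds in the past passes /AccessDelay 60
# but not /AccessDelay 300.
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
past = time.time() - 120
os.utime(path, (past, past))
print(old_enough(path, 60))   # True
print(old_enough(path, 300))  # False
os.unlink(path)
```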
|/UseDirectoryTime||When checking the timestamp of a file, check the modify timestamp of the parent directory (and its parent directories), as well as the file's own modify timestamp.
This can be necessary on Windows when /DeleteAfterCapture is not defined to detect if a new file has been added by someone moving it into the file location's directory; on Windows file systems moving a file does not change its timestamp. It can also be necessary on Unix/Windows if a subdirectory containing files is moved into the file location directory.
The disadvantage of this parameter is that when one file is moved into a directory, then all of the files in that directory will be captured again. This parameter cannot be defined with /DeleteAfterCapture (it is not necessary).
Database Session Names and Recapturing
Replication recapturing is when changes made by integration are captured again. Recapturing is controlled using session names. Depending on the situation recapturing can be useful or unwanted. The following are some examples:
Bi-directional replication.
- During bidirectional replication if integrated changes are captured again then they can boomerang back to the original capture database. This would form an infinite loop. For this reason capture triggers check the session name and avoid recapturing integration.
- Cascade replication is when changes from one channel are captured again by a different channel and replicated onto a different group of databases. Recapturing is necessary for cascade replication. Recapturing can be configured using action Integrate /SessionName so that integration is not recognized by the cascade channel's capture triggers.
Batch work to purge old data.
- Sometimes replication of large blocks of batch work is too expensive. It may be more efficient to repeat the same batch work on each replicated database. Capturing of the batch work can be disabled by setting the session name so that the capture triggers see the changes as belonging to an integrate session. For log based capture the session name must be set using the 'current user name' approach (see below), as other approaches are not recognized for this.
Application triggering during integration.
- Sometimes an application will have database triggers on the replicated tables. These triggers will have already been fired on the capture database, so firing them again during HVR integration is unnecessary and can cause consistency problems. For Ingres databases this rule firing can be avoided with action Integrate /NoTriggerFiring, but for other DBMSs the application triggers can be modified so that they have no effect during replication integration.
HVR has multiple implementations for session names because of variations in the DBMS features and because some implementations can only be recognized by trigger based capture jobs and others can only be recognized by log based capture.
- Current user name
- For all capture DBMSs, capture jobs (both trigger based and log based) also recognize the current DBMS user name as the session name. End user sessions can therefore make changes without trigger based or log based capture being activated by connecting to the database using a specially created user name (e.g. making database changes as user name hvr_integrate). An application database trigger can be prevented from firing on the integrate side by changing it to include the following clause: where user <> 'hvr_integrate'.
- Integrate state table
- For all capture DBMSs, transactions performed by an integrate job also contain an update to an integrate state table (named hvr_stin* or hvr_stis*), which has a column session_name containing the session name. This allows log based capture jobs to recognize the session name defined by an integrate job, but it is ignored by trigger based capture jobs.
- Oracle client information.
- For Oracle trigger-based capture only, the integrate job always sets the session name by calling set_client_info(). This is recognized by trigger based capture but not by log based capture jobs (although log based channels still set this session name during integrate). End user sessions can therefore make changes without trigger based capture being activated by calling dbms_application_info.set_client_info('hvr_integrate'). An application database trigger can be prevented from firing on the integrate side by changing it to include the following clause: where userenv('CLIENT_INFO') <> 'hvr_integrate'.
- Ingres role
- For Ingres trigger-based capture only, the integrate job always connects to the target database using an Ingres role, which is the name of the session (e.g. hvr_integrate). This is recognized by trigger based capture but not by log based capture jobs (although log based channels still set this session name during integrate). End user sessions can therefore make changes without trigger based capture being activated by connecting to the database with sql option -R. An application database rule can be prevented from firing on the integrate side by changing it to include the following clause: where dbmsinfo('role') != 'hvr_integrate'.
- SQL Server marked transaction name
- For SQL Server, log based capture jobs also recognize the marked transaction name as the session name. End user sessions can therefore make changes without log based capture being activated by executing their DML statements inside a marked transaction with a specially created name (e.g. hvr_integrate or hvr_refresh). The following SQL statement can be used to start a marked named transaction:
BEGIN TRANSACTION hvr_integrate WITH MARK;
- SQL Server application name
- For SQL Server trigger-based capture only, the integrate job always connects to the target database using an application name which is the session name (e.g. hvr_integrate). This is recognized by trigger based capture but not by log based capture jobs (although log based channels still set this session name during integrate). End user sessions can therefore make changes without trigger based capture being activated by also connecting to the database using a specific application name. An application database trigger can be prevented from firing on the integrate side by changing it to include the following clause: where app_name() <> 'hvr_integrate'.
Changes made to databases A1 and A2 must replicate via hub X to B and then cascade via hub Y to C1 and C2. Changes made to B must replicate to A1 and A2 and to C1 and C2 but must not boomerang back to B. Normally changes from A1 and A2 to B would not be cascade replicated onto C1 and C2 because they all use the same session name. This is solved by adding parameters /SessionName and /IgnoreSessionName to the channel in hub Y.
Writing Files While HVR is Capturing Files
It is often better to prevent HVR from capturing a file until it is completely written. One reason is to avoid HVR replicating an incomplete version of the file to the integrate machine. Another is that if /DeleteAfterCapture is defined, then HVR will attempt to delete the file before it is even finished.
Capture of incomplete files can be avoided by defining /AccessDelay or /IgnoreUnterminated.
Another technique is to first write the data into a filename that HVR capture will not match (outside the file location directory or into a file matched with /IgnorePattern) and then move it when it is ready to a filename that HVR will match. On Windows this last technique only works if /DeleteAfterCapture is defined, because the file modify timestamp (that HVR capture would otherwise rely on) is not changed by a file move operation.
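The write-then-move technique above can be sketched as follows, assuming files matching a pattern such as '*.tmp' are excluded with /IgnorePattern; the helper name and paths are hypothetical:

```python
import os

# Hypothetical sketch: write the data under a name the capture job ignores,
# then rename it into place once the file is complete.
def publish(capture_dir, final_name, data):
    tmp_path = os.path.join(capture_dir, final_name + '.tmp')
    with open(tmp_path, 'wb') as f:   # written under the ignored name
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    # rename is atomic within one filesystem, so capture only ever sees
    # the complete file under its final name
    os.rename(tmp_path, os.path.join(capture_dir, final_name))
```

As the text notes, on Windows this only works reliably when /DeleteAfterCapture is defined, because the rename does not change the file's modify timestamp.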
A group of files can be revealed to HVR capture together by first writing them in a subdirectory and then moving the whole subdirectory into the file location's top directory.
- If column hvr_op is not defined, then it defaults to 1 (insert). Value 0 means delete, and value 2 means update.
- Binary values can be given with the format attribute (see example above).
- If the name attribute is not supplied for the <column> tag, then HVR assumes that the order of the <column> tags inside the <row> matches the order in the HVR catalogs (column col_sequence of table hvr_column).
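The hvr_op defaulting rule above can be sketched as follows. The XML shape is a minimal hypothetical example consistent with the <row>/<column> description; it is not HVR's actual file layout:

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch of the hvr_op rule above: 0=delete, 1=insert,
# 2=update, and a missing hvr_op column defaults to 1 (insert).
OPS = {0: 'delete', 1: 'insert', 2: 'update'}

def row_operation(row_xml):
    row = ET.fromstring(row_xml)
    for col in row.findall('column'):
        if col.get('name') == 'hvr_op':
            return OPS[int(col.text)]
    return OPS[1]  # hvr_op absent -> defaults to 1 (insert)

print(row_operation('<row><column name="id">7</column></row>'))    # insert
print(row_operation('<row><column name="hvr_op">0</column>'
                    '<column name="id">7</column></row>'))          # delete
```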
Rows can be read from Salesforce locations using action Capture and integrated into any database location. They can also be sent to a file location where they are written as XML files.
A capture job reads all rows from the Salesforce table instead of just detecting changes. This means that the capture job should not be scheduled to run continuously; instead it can be run manually or periodically with action Scheduling /CaptureStartTimes.
A channel with Capture must have table information defined; it cannot be used with a 'blob' file channel. The Salesforce 'API names' for tables and columns are case-sensitive and must match the 'base names' in the HVR channel. This can be done by defining TableProperties /BaseName actions on each of the capture tables and ColumnProperties /BaseName actions on each column.
All rows captured from Salesforce are treated as inserts (hvr_op=1). Deletes cannot be captured.
Salesforce locations can only be used for replication jobs; HVR Refresh and HVR Compare are not supported.
A capture restriction can be defined on a Salesforce location in SOQL using action Restrict /CaptureCondition.