Integrate

Description

Action Integrate instructs HVR to integrate changes into a database table or file location. Various parameters are available to tune the integration functionality and performance.

If integration is done on a file location in a channel with table information, then changes are integrated as records in XML, CSV, AVRO or Parquet format. For details, see action FileFormat.

Alternatively, a channel can contain only file locations and no table information. In this case, each file captured is treated as a 'blob' and is replicated to the integrate file locations without HVR recognizing its format. If such a 'blob' file channel is defined with only actions Capture and Integrate (no parameters) then all files in the capture location's directory (including files in subdirectories) are replicated to the integrate location's directory. The original files are not touched or deleted, and in the target directory, the original file names and subdirectories are preserved. New and changed files are replicated, but empty subdirectories and file deletions are not replicated.

If a channel is integrating changes into Salesforce, then the Salesforce 'API names' for tables and columns (case-sensitive) must match the 'base names' in the HVR channel. This can be done by defining TableProperties /BaseName actions on each of the tables and ColumnProperties /BaseName actions on each column.

Parameters

Parameter Argument Description
/Burst Integrate changes into the target table using the burst algorithm. All changes for the cycle are first sorted and coalesced, so that only a single change remains for each row in the target table (see parameter /Coalesce). These changes are then bulk loaded into 'burst tables' named tbl__b. Finally, a single set-wise SQL statement is done for each operation type (insert, update and delete). The end result is the same as normal integration (called continuous integration), but the order in which the changes are applied is completely different from the order in which they occurred on the capture machine. For example, all changes are done for one table before the next table. This is normally not visible to other users because the burst is done as a single transaction, unless parameter /BurstCommitFrequency is used. If database triggers are defined on the target tables, then they will be fired in the wrong order. This parameter cannot be used if the channel contains tables with foreign key constraints. If this parameter is defined for any table, then it affects all tables integrated to that location.

This parameter cannot be used for file locations. A similar effect (reduce changes down to one per row) can be defined with parameter /ReorderRows=SORT_COALESCE.
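
As an illustration of this sequence, the sketch below outlines the burst approach in Python-style pseudocode. The helper names (bulk_load, run_sql), the key/column names k and c1, and the SQL dialect are assumptions for illustration only, not HVR internals.

  # Sketch of the burst algorithm; table, column and helper names are illustrative.
  def integrate_burst(changes, tbl, bulk_load, run_sql):
      # 1. Sort and coalesce so that only one change per row key remains
      #    (real coalescing also merges operation types, e.g. insert + update -> insert).
      coalesced = {}
      for change in changes:               # changes arrive in capture order
          coalesced[change.key] = change   # a later change replaces an earlier one
      # 2. Bulk load the remaining changes into the burst table, e.g. mytbl__b.
      burst = tbl + "__b"
      bulk_load(burst, coalesced.values())
      # 3. One set-wise SQL statement per operation type (SQL dialect is illustrative).
      run_sql("DELETE FROM " + tbl +
              " WHERE k IN (SELECT k FROM " + burst + " WHERE op = 'delete')")
      run_sql("UPDATE " + tbl + " SET c1 = b.c1 FROM " + burst + " b" +
              " WHERE " + tbl + ".k = b.k AND b.op = 'update'")
      run_sql("INSERT INTO " + tbl + " (k, c1)" +
              " SELECT k, c1 FROM " + burst + " WHERE op = 'insert'")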

/BurstCommitFrequency freq Frequency for committing burst set-wise SQL statements. Available options for freq are:
  • CYCLE (default): All changes for the integrate job cycle are committed in a single transaction. If this parameter is defined for any table, then it affects all tables integrated to that location.
  • TABLE: All changes for a table (the set-wise delete, update and insert statements) are committed in a single transaction.
  • STATEMENT: A commit is done after each set-wise SQL statement.
/Coalesce Causes coalescing of multiple operations on the same row into a single operation. For example, an insert and an update can be replaced by a single insert; five updates can be replaced by one update, or an insert and a delete of a row can be filtered out altogether. The disadvantage of not replicating these intermediate values is that some consistency constraints may be violated on the target database.

Parameter /Burst already performs coalescing as part of its algorithm, so /Coalesce should not be combined with /Burst.
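
A minimal sketch of the reduction rules described above, using illustrative operation labels (not HVR's internal representation):

  # How consecutive operations on the same row can be coalesced (illustrative only).
  COALESCE_RULES = {
      ("insert", "update"): "insert",   # insert then update -> one insert with the updated values
      ("update", "update"): "update",   # repeated updates   -> one update with the final values
      ("insert", "delete"): None,       # insert then delete -> filtered out altogether
  }

  def coalesce(first_op, second_op):
      # In this simplification, any pair not listed above simply keeps the later operation.
      return COALESCE_RULES.get((first_op, second_op), second_op)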

/ReorderRows mode Controls the order in which changes are written to files. If the target file name depends on the table name (for example, parameter /RenameExpression contains substitution {hvr_tbl_name}) and the change stream fluctuates between changes for different tables, then keeping the original order will cause HVR to create lots of small files (a few rows in a file for tab1, then a row for tab2, then another file for tab1 again). This is because HVR does not reopen files after it has closed them. Reordering rows during integration avoids these 'micro-files'.

Available options for mode are:

  • NONE: Write all rows in their original order, even if it means creating micro-files per table. This is the default if /RenameExpression does not contain a substitution which depends on the table name, which is common for XML format.
  • BATCH_BY_TABLE: Reorder interleaved rows of different tables in batches, with best effort. This mode both prevents micro-files and avoids a full-sort, so performance is good and resource usage is minimal. If there is lots of data then multiple (larger) files can be created for a single table. This is the default if /RenameExpression does contain a substitution which depends on the table name, as is common for CSV and AVRO format.
  • ORDER_BY_TABLE: Reorder rows of different tables so that only one file is created per table. This value makes HVR buffer all changes during the integrate cycle before it writes them.
  • SORT_COALESCE: Sort and coalesce all changes for the cycle, so that only a single change remains for each row in the target file. This is equivalent to /Burst for database locations. A side effect of this value is that key updates (hvr_op 3 then 2) are replaced with a delete and insert (hvr_op 0 then 1) and a 'before' value for a regular update (hvr_op 4) is not written.

A column containing the type of each operation can be added using ColumnProperties /Extra, either with /IntegrateExpression={hvr_op} or with multiple actions that combine /ExpressionScope=scope and /IntegrateExpression=const. Adding such a column does not change which operations (insert, update, delete, etc.) are actually written to files; that depends instead on whether /SoftDelete or /TimeKey is defined. Inserts (hvr_op 1) and 'after updates' (hvr_op 2) are always written. Deletes (hvr_op 0), 'before key updates' (hvr_op 3) and 'truncates' (hvr_op 5) are written if either /SoftDelete or /TimeKey is defined. 'Before non-key updates' (hvr_op 4) are only written if /TimeKey is defined.
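
For example, an action along the lines of ColumnProperties /Name=op_type /Extra /IntegrateExpression={hvr_op} (the column name op_type is purely illustrative) would add a column holding the numeric operation type to every row written.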

/Resilient mode Resilient integration of inserts, updates and deletes. This modifies the behavior of integration if a change cannot be integrated normally. If a row already exists, then an insert is converted to an update; an update of a non-existent row is converted to an insert; and a delete of a non-existent row is discarded. Existence is checked using the replication key known to HVR (rather than checking the actual indexes or constraints on the target table). Resilience is a simple way to improve replication robustness, but the disadvantage is that consistency problems can go undetected. Value mode controls whether an error message is written when this occurs (a sketch of the resilient logic follows the list of modes below). Available options for mode are:
  • SILENT
  • SILENT_DELETES
  • WARNING
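
A minimal Python-style sketch of the resilient behavior; the helper names (key_exists, do_insert, do_update, do_delete) are assumptions for illustration, not HVR internals:

  # Sketch of resilient integration; existence is checked via the replication key only.
  def apply_resilient(change, key_exists, do_insert, do_update, do_delete):
      exists = key_exists(change.key)
      if change.op == "insert":
          (do_update if exists else do_insert)(change)   # existing row: insert becomes update
      elif change.op == "update":
          (do_update if exists else do_insert)(change)   # missing row: update becomes insert
      elif change.op == "delete" and exists:
          do_delete(change)                # delete of a non-existent row is discarded
                                           # (mode decides whether a message is reported)
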
/OnErrorSaveFailed On integrate error, write the failed change into the 'fail table' tbl__f and then continue integrating other changes. Changes written into the fail table can be retried afterwards (see command hvrretryfailed). If this parameter is not defined, the default behavior on an integrate error is to write a fatal error and stop the job.

If certain errors occur, then the integrate will no longer fail. Instead, the current file's data will be 'saved' in the file location's state directory, and the integrate job will write a warning and continue processing other replicated files. The file integration can be reattempted (see command hvrretryfailed). Note that this behavior affects only certain errors, for example, if a target file cannot be changed because someone has it open for writing. Other error types (e.g. disk full or network errors) are still fatal. They will just cause the integrate job to fail. If data is being replicated from database locations and this parameter is defined for any table, then it affects all tables integrated to that location. This parameter cannot be used with parameter /Burst.
For Salesforce, this parameter causes failed rows/files to be written to a fail directory ($HVR_CONFIG/work/hub/chn/loc/sf) and an error message to be written which describes how the rows can be retried.

/DbProc Apply database changes by calling integrate database procedures instead of using direct SQL statements (insert, update and delete). The database procedures are created by hvrinit and are named tbl__ii, tbl__iu and tbl__id.
Note: /DbProc cannot be used on tables with long datatypes, for example long varchar or blob.
/TxBundleSize int Bundle small transactions together for improved performance. For example, if the bundle size is 10 and there were 5 transactions with 3 changes each, then the first 3 transactions would be grouped into a transaction with 9 changes and the others would be grouped into a transaction with 6 changes. Transaction bundling does not split transactions. The default transaction bundle size is 100. If this parameter is defined for any table, then it affects all tables integrated to that location.
/TxSplitLimit int Split very large transactions to limit resource usage. For example, if a transaction on the master database affected 10 million rows and the remote database has a small rollback segment, then with a split limit of 1,000,000 the original transaction would be split into 10 transactions of 1 million changes each. The default is 0, which means transactions are never split. If this parameter is defined for any table, then it affects all tables integrated to that location.
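
A sketch consistent with the two worked examples above (illustrative only; HVR's actual implementation may differ):

  # Bundle small transactions until adding the next one would exceed the bundle size.
  def bundle_transactions(transactions, bundle_size=100):
      bundle, count = [], 0
      for tx in transactions:                     # each tx is a list of changes
          if bundle and count + len(tx) > bundle_size:
              yield [c for t in bundle for c in t]
              bundle, count = [], 0
          bundle.append(tx)                       # a transaction is never split by bundling
          count += len(tx)
      if bundle:
          yield [c for t in bundle for c in t]

  # Split one very large transaction into chunks of at most split_limit changes (0 = never split).
  def split_transaction(changes, split_limit):
      if split_limit <= 0:
          yield changes
      else:
          for i in range(0, len(changes), split_limit):
              yield changes[i:i + split_limit]

  # With bundle_size=10, five transactions of 3 changes each yield bundles of 9 and 6 changes,
  # matching the /TxBundleSize example above.
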
/NoTriggerFiring For Ingres databases, this parameter disables the firing of all database rules during integration. This is done by executing the SQL statement set norules at connection startup.

For SQL Server databases, this parameter disables the firing of database triggers and the updating of columns during integration if those objects were defined with not for replication. This is done by connecting to the database with the SQL Server Replication connection capability. A disadvantage of this connection type is that the database connection string must have the form \\host,port instead of the form \\host\instance. This port needs to be configured in the Network Configuration section of the SQL Server Configuration Manager. Another limitation is that encryption of the ODBC connection is not supported if this parameter is used for SQL Server. This parameter has no effect for other DBMSs.

For Oracle and SQL Server, hvrrefresh will automatically disable triggers on target tables before the refresh and re-enable them afterwards, unless option -q is defined. Other ways to control trigger firing are described in Database Session Names and Recapturing.

/SessionName sess_name Integrate changes with specific session name. The default session name is hvr_integrate. Capture triggers/rules check the session name to avoid recapturing changes during bidirectional replication. For a description of recapturing and session names see action Capture. If this parameter is defined for any table, then it affects all tables integrated to that location.
/Topic expression Name of the Kafka topic. Constant strings and expressions can be used as the Kafka topic name.

The following substitutions can be used to include the capture location, table or schema name in the topic name:

  • {hvr_cap_loc} - for capture location name.
  • {hvr_tbl_name} - for current table name. This is only allowed if the channel is defined with tables.
  • {hvr_schema} - for schema name of the table. This is only allowed if the channel is defined with tables and this can only be used when an explicit TableProperties /Schema=my_schema is defined for these tables on the target file location.

If this parameter is not defined, the messages are sent to the location's Default Topic field. The default topic field may also contain the above expressions. The Kafka topics used should either already exist in the Kafka broker, or the broker should be configured to auto-create topics when HVR sends a message.
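
For example, defining /Topic={hvr_tbl_name} (an illustrative value) would send each table's changes to a topic named after that table, while a value such as chn1_{hvr_cap_loc} would combine a constant prefix with the capture location name.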

/MessageBundling mode Controls how rows are bundled into a single Kafka message. Regardless of the file format chosen, each Kafka message contains one row by default. Available options for mode are:
  • ROW: Each Kafka message contains a single row; this mode does not support bundling of multiple rows into a single message. This is the default message bundling mode. Note that this mode causes a key-update to be sent as multiple Kafka messages (first a 'before update' with hvr_op 3, and then an 'after update' with hvr_op 2).
  • CHANGE: Each Kafka message is a bundle containing two rows (a 'before update' and an 'after update' row) whereas messages for other changes (e.g. insert and delete) contain just one row. During refresh there is no concept of changes, so each row is put into a single message. Therefore in that situation, this behavior is the same as mode ROW.
  • TRANSACTION: During replication, each message contains all rows in the original captured transaction. An exception is if the message size looks like it will exceed the bundling threshold (see parameter /MessageBundlingThreshold). During refresh, all changes are treated as if they are from a single capture transaction so this mode behaves the same as bundling mode THRESHOLD.
  • THRESHOLD: Each Kafka message is bundled with rows until it exceeds the message bundling threshold (see parameter /MessageBundlingThreshold).

Note that Confluent's Kafka Connect only allows certain message formats and does not allow any message bundling; therefore /MessageBundling must either be undefined or set to ROW. Bundled messages simply consist of the contents of several single-row messages concatenated together.

Note: When the mode is TRANSACTION or THRESHOLD and if the name of the Kafka topic contains an expression such as {hvr_tbl_name} then the rows of different tables will not be bundled into the same message.
/MessageBundlingThreshold int The threshold for bundling rows in a Kafka message. The default value is 800,000 bytes. This parameter is applicable only for the message bundling modes TRANSACTION and THRESHOLD. For those bundling modes, rows continue to be bundled into the same message until after this threshold is exceeded. After that happens the message is sent and new rows are bundled into the next message.

By default the maximum size of a Kafka message is 1,000,000 bytes; HVR will not send a message exceeding this size and will instead give a fatal error. You can change the maximum Kafka message size that HVR will send by defining $HVR_KAFKA_MSG_MAX_BYTES, but ensure it does not exceed the maximum message size configured in the Kafka broker (setting message.max.bytes). If the message size exceeds that limit, then the message will be lost.
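
A simplified sketch of THRESHOLD-style bundling as described above (also used when a TRANSACTION bundle grows too large); illustrative only:

  # Rows keep being added to a message until the threshold is exceeded, then it is sent.
  def bundle_rows(encoded_rows, send, threshold=800_000, msg_max=1_000_000):
      message = b""
      for row in encoded_rows:              # each row is already encoded as bytes
          message += row
          if len(message) > threshold:      # threshold exceeded: send and start a new message
              if len(message) > msg_max:    # larger than the Kafka message size limit
                  raise RuntimeError("message exceeds the maximum Kafka message size")
              send(message)
              message = b""
      if message:
          send(message)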

/RenameExpression expression Expression to name new files. A rename expression can contain constant strings mixed with substitutions. Possible substitutions include:
  • {colname [spec]} is replaced with the value of the current table's column colname. If the column has a date and time data type, the default format is %[utc] %Y%m%d%H%M%S, but this can be overridden using the timestamp substitution format specifier spec. For more information, see Timestamp Substitution Format Specifier.
  • {hvr_cap_loc} is replaced with the name of the capture location.
  • {hvr_cap_tstamp [spec]} is replaced with the moment (time) that the change occurred in source location. The default format is %[utc] %Y%m%d%H%M%S, but this can be overridden using the timestamp substitution format specifier spec. For more information, see Timestamp Substitution Format Specifier.
  • {hvr_chn_name} is replaced with the name of the channel.
  • {hvr_integ_tstamp [spec]} is replaced with the moment (time) that the file was integrated into target location. The default format is %[utc] %Y%m%d%H%M%S.SSS, but this can be overridden using the timestamp substitution format specifier spec. For more information, see Timestamp Substitution Format Specifier. If a format is supplied with spec, then HVR will throttle file integration to no faster than one file per second.
  • {hvr_schema} is replaced with the schema name of the table. This is only allowed if the channel is defined with tables, and it can only be used when an explicit TableProperties /Schema=my_schema is defined for these tables on the target file location.
  • {hvr_tbl_name} is replaced with the name of the current table. This is only allowed if the channel is defined with tables.
  • {hvr_cap_subdirs} and {hvr_cap_filename} are also allowed if the file was captured from a file location.
  • {hvr_cap_subdirs_sharepoint} and {hvr_cap_filename_sharepoint} can be used to rename files whose original name is not legal for SharePoint file systems. Each illegal character is mapped to a hex code, so a file called sugar&spice is renamed to sugarx26spice.
  • If the file was captured with a 'named pattern' (see Capture /Pattern), then the string that matched the named pattern can be used as a substitution. So if a file was matched with /Pattern="{office}.txt" then it could be renamed with expression hello_{office}.data.
Note: When /RenameExpression is not defined:
  • The new files are named {hvr_cap_subdirs}/{hvr_cap_filename} if they are captured from another file location.
  • The new files are named {hvr_integ_tstamp}.xml if they contain database changes and the channel is defined with tables.
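
For example, an illustrative expression such as {hvr_tbl_name}_{hvr_integ_tstamp}.csv would name each file after its table followed by the moment it was integrated.
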
/ErrorOnOverwrite Error if a new file has same name as an existing file. If data is being replicated from database locations and this parameter is defined for any table, then it affects all tables integrated to that location.
/MaxFileSize int The threshold for bundling rows in a file. This parameter cannot be used for 'blob' file channels, which contain no table information and only replicate files as 'blobs'.

The rows are bundled into the same file until after this threshold is exceeded. After that happens, the file is sent and HVR will start writing rows to a new file whose name is found by re-evaluating parameter /RenameExpression (or {hvr_integ_tstamp}.xml if that parameter is not specified). XML files written by HVR always contain at least one row, which means that specifying a number between 1 and 500 will cause each file to contain a single row.

For efficiency reasons HVR's decision to start writing a new file depends on the XML length of the previous row, not the current row. This means that sometimes the actual file size may slightly exceed the value specified. If data is being replicated from database locations and this parameter is defined for any table, then it affects all tables integrated to that location.
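
A simplified sketch of this rollover behavior (illustrative only; new_file stands for re-evaluating /RenameExpression and opening the resulting file):

  # Start writing to a new file once the rows already written have exceeded the threshold.
  def write_rows(rows, max_file_size, new_file):
      f, written = new_file(), 0             # new_file re-evaluates /RenameExpression
      for row in rows:                       # each row is already serialized, e.g. as XML
          if written > max_file_size:        # decision is based on the previous rows, so a
              f.close()                      # file can slightly exceed max_file_size
              f, written = new_file(), 0
          f.write(row)
          written += len(row)
      f.close()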

/Verbose The file integrate job will write extra information, including the name of each file which is replicated. Normally, the job only reports the number of files written. If data is being replicated from database locations and this parameter is defined for any table, then it affects all tables integrated to that location.
/TableName userarg API name of Salesforce table into which attachments should be uploaded. See section Salesforce Attachment Integration below.
/KeyName userarg API name of key in Salesforce table for uploading attachments. See section Salesforce Attachment Integration below.
/CycleByteLimit int Maximum amount of routed data (compressed) to process per integrate cycle. If more than this amount of data is queued for an integrate job, then it will process the work in 'sub cycles'. The benefit of 'sub cycles' is that the integrate job won't last for hours or days. If the /Burst parameter is defined, then large integrate cycles could boost the integrate speed, but they may require more resources (memory for sorting and disk room in the burst tables tbl__b). The default is 10 MB. Value 0 means unlimited, so the integrate job will process all available work in a single cycle.

If the supplied value is smaller than the size of the first transaction file in the router directory, then all the transactions in that file will be processed. Transactions in a transaction file will never be split between cycles or sub-cycles. If this parameter is defined for any table, then it affects all tables integrated to that location.

/JournalRouterFiles Move processed transaction files to journal directory $HVR_CONFIG/jnl/hub/chn/YYYYMMDD on the hub machine. Normally an integrate job would just delete its processed transactions files. The journal files are compressed, but their contents can be viewed using command hvrrouterview.

Old journal files can be purged with command hvrmaint journal_keep_days=N. If this parameter is defined for any table, then it affects all tables integrated to that location.

/Delay N Delay integration of changes for N seconds.
/Context ctx Ignore action unless refresh/compare context ctx is enabled. The value should be the name of a context (a lowercase identifier). It can also have the form !ctx, which means that the action is effective unless context ctx is enabled. One or more contexts can be enabled for HVR Compare or Refresh (on the command line with option -Cctx). Defining an action which is only effective when a context is enabled can have different uses. For example, if action Integrate /RenameExpression /Context=qqq is defined, then the file will only be renamed if context qqq is enabled (option -Cqqq).

Columns Changed During Update

If an SQL update is done to one column of a source table, but other columns are not changed, then normally the update statement performed by HVR integrate will only change the column named in the original update. However, all columns will be overwritten if the change was captured with Capture /NoBeforeUpdate.
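
For example, if the original statement was UPDATE emp SET salary = 5000 WHERE empnum = 17 (table and column names are illustrative), then the update applied by HVR will normally also set only the salary column.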

There are three exceptional situations where columns will never be overwritten by an update statement:

  • If ColumnProperties /NoUpdate is defined;
  • If the column has a LOB datatype and was not changed in the original update;
  • If the column was not mentioned in the channel definition.

Controlling Trigger Firing

Sometimes during integration it is preferable for application triggers not to fire. This can be achieved by changing the triggers so that they check the integrate session (for example, where userenv('CLIENT_INFO') <> 'hvr_integrate'). For more information, see section Application triggering during integration in Capture.

For Ingres target databases, database rule firing can be prevented by specifying Integrate /NoTriggerFiring or with hvrrefresh option -f.

SharePoint Version History

HVR can replicate to and from a SharePoint/WebDAV location which has versioning enabled. By default, HVR's file integrate will delete the SharePoint file history, but the file history can be preserved if action LocationProperties /StateDirectory is used to configure a state directory (which is then on the HVR machine, outside SharePoint). Defining a /StateDirectory outside SharePoint does not impact the 'atomicity' of file integrate, because this atomicity is already supplied by the WebDAV protocol.

Salesforce Attachment Integration

Attachments can be integrated into Salesforce.com by defining a 'blob' file channel (without table information) which captures from a file location and integrates into a Salesforce location. In this case, the API name of the Salesforce table containing the attachments can be defined either with Integrate /TableName or using 'named pattern' {hvr_sf_tbl_name} in the Capture /Pattern parameter. Likewise, the key column can be defined either with Integrate /KeyName or using 'named pattern' {hvr_sf_key_name}. The value for each key must be defined with 'named pattern' {hvr_sf_key_value}.

For example, if the photo of each employee is named id.jpg, and these need to be loaded into a table named Employee with key column EmpId, then action Capture /Pattern="{hvr_sf_key_value}.jpg" should be used with action Integrate /TableName="Employee" /KeyName="EmpId".

Note: All rows integrated into Salesforce are treated as 'upserts' (an update or an insert). Deletes cannot be integrated.

Salesforce locations can only be used for replication jobs; HVR Refresh and HVR Compare are not supported.