An audit table contains the full history of the rows in a source table: given the primary key of a row, you can query the audit table to find out when the row was created, when it was modified (possibly many times), and whether it was eventually deleted. Most replication tools, HVR included, provide the ability to maintain audit tables. Some real-time data replication vendors list auditing as a separate product on their price list, even though it is the same core replication capability that provides the functionality (you just pay for it twice).

Traditionally HVR positioned stored procedures as the way to build an audit table: HVR generates the procedures and, in a generic way, ensures that every change is inserted into the audit table. Unfortunately, calling a stored procedure generally carries more overhead than performing direct SQL statements, so while the approach works well, there ought to be a better way.

And indeed there is. One day, one of our customers pointed out that an audit table can be maintained simply by combining built-in HVR features that take very little time to set up:

  1. Add a column to every table that records the last time the row was updated. In HVR this is done through a ColumnProperties action with an IntegrateExpression of {hvr_cap_tstamp} – i.e. the timestamp at which the change was captured from the transaction log. This column is made a key column by checking a check box in the action.
  2. Define the Transform action SoftDelete so that a second extra column indicates whether the row was deleted (1) or not (0).
  3. Use ResilientUpdate as part of the DbIntegrate action so that, if a row to be updated cannot be found, the row is inserted instead. This option exists to handle a possible overlap between the initial load and the ongoing change data capture in a regular replication setup. In this case, because the capture timestamp is part of the key, the row is never found and hence always inserted.

The result of all this is that an audit table is created with a new row for every row change or delete, and changes to the table are made using efficient SQL statements.
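To make this concrete, below is a minimal sketch of what the resulting audit table and the per-change statement could look like for a hypothetical source table customer(cust_id, name, email). The table and column names, the data types, and the exact statement are illustrative assumptions, not the DDL or SQL that HVR itself generates.

    -- Hypothetical audit table for a source table customer(cust_id, name, email).
    -- hvr_cap_tstamp and is_deleted are the two extra columns described above;
    -- all names and types here are illustrative only.
    CREATE TABLE customer__audit (
        cust_id        INTEGER      NOT NULL,
        name           VARCHAR(100),
        email          VARCHAR(100),
        hvr_cap_tstamp TIMESTAMP    NOT NULL,  -- capture timestamp, part of the key
        is_deleted     SMALLINT     NOT NULL,  -- 1 if the change was a delete, else 0
        PRIMARY KEY (cust_id, hvr_cap_tstamp)  -- timestamp in the key: every change is a new row
    );

    -- Because the capture timestamp is part of the key, the resilient update
    -- never finds an existing row, so every captured change lands as an insert:
    INSERT INTO customer__audit (cust_id, name, email, hvr_cap_tstamp, is_deleted)
    VALUES (:cust_id, :name, :email, :hvr_cap_tstamp, :is_deleted);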

When auditors query the audit table they probably want a straightforward view into the data: if they want the image of a particular row on May 29th at 3 PM, it should be easy to find. In practice that means you should store all column values whenever a change happens, irrespective of which columns the update actually touched. The alternative is to store only the changed columns, but then, in addition to the column values, the user needs an update mask to distinguish an empty column that was untouched by the update from one that was changed from a non-empty value to an empty value. With such a mask, queries to reconstruct a row image at a certain point in time become complex and resource-intensive. However, to have every column value for every update you need supplemental logging on every single column of the table. How does that work with HVR?
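Using the same hypothetical customer__audit table sketched above, with a full row image stored for every change, the point-in-time question becomes a simple "latest change at or before this time" lookup:

    -- Image of customer 42 as of 2019-05-29 15:00:00:
    -- take the most recent audit row at or before that time,
    -- and return nothing if that change was a delete (the row no longer existed).
    SELECT *
    FROM   customer__audit
    WHERE  cust_id = 42
    AND    hvr_cap_tstamp = (SELECT MAX(hvr_cap_tstamp)
                             FROM   customer__audit
                             WHERE  cust_id = 42
                             AND    hvr_cap_tstamp <= TIMESTAMP '2019-05-29 15:00:00')
    AND    is_deleted = 0;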

The nice thing is that, with HVR, the user does not need to think about this, because HVR does the right thing out of the box. The initialization step HVR Load automatically adds any necessary supplemental logging to the tables, and because ResilientUpdate is used in this setup, HVR knows it needs supplemental logging on all columns so that all values are available in case a row must be inserted rather than updated. On top of this, HVR generates the DDL for the target tables, including the extra column that stores the capture timestamp and the column that indicates whether a row was deleted.
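As an illustration of what such supplemental logging involves, on Oracle a table-level supplemental log group covering all columns can be created with a statement like the one below. HVR Load takes care of the equivalent for the source database in question, so this exact statement is shown only as an assumption for illustration.

    -- Oracle example: log all column values for every change to the table,
    -- so the audit row can always be inserted with a complete image.
    ALTER TABLE customer ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;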

So how much time does it take to implement this for a schema with 100, 1,000, 10,000 or any other number of tables? Only a few minutes…

Get in contact with HVR to learn more!

About Mark

Mark Van de Wiel is the CTO for HVR. He has a strong background in data replication as well as real-time Business Intelligence and analytics.
