Database transaction logs are very complex. A transaction commit only succeeds when the change has been written into and persisted by the transaction log (the data for the transaction may still sit in memory). Transaction logs allow a database to recover committed transactions after a crash, and many traditional relational databases also allow a point-in-time recovery based on the state of the transaction logs (a point-in-time recovery allows you to state up until what time you want to recover as opposed to up to current). Because of its central and crucial role in database recovery the transaction log is the basis for many real-time data replication solutions including HVR.

The concept of the transaction log is old and databases like Oracle have known transaction logs for decades. Transaction logs are proprietary for a database, and different versions of a database typically have incompatible transaction files due to support for new operations, new data types, etc. Given a transaction commit waits for the transaction to be written to the transaction log database vendors have done everything they can to optimize writes to the transaction log, in order to support large transaction volumes and high concurrency. Whether it is the same or different kind of destination system, it is up to the replication software to untangle the transactions to be replayed in the same order.

  • An example of an optimization most relational database vendors put into their transaction log is the use of database object identifiers (numbers) instead of table and column names.
    • For example, instead of a statement similar to “update emp set sal = 1500 where empno = 7934” the transaction log may read something like “update 73196 set 6 = 1500 where 1 = 7394”. And of course it is fair to assume that “update” and “set” won’t be spelled out in the transaction log either.

Also, building on the example from the previous paragraph, most databases won’t record “where empno = 7934” by default but instead use an internal row identifier to reference a particular row. This is where the concept of supplemental logging comes in. Databases that are open and accessible for inserts, updates and deletes will always end up with different internal row references even before you think about heterogeneous replication scenarios. But replication tools need the context of a change in order to perform the equivalent operation on a target database that is open and accessible for inserts, updates, and deletes.

Supplemental logging is the concept that allows a database administrator to add context to the transaction to be able to replay the transaction on a destination database. In the example above the column “empno” is the primary key column of the table so in order to replay the transaction on the destination database only this column value is required and values for other untouched columns are not needed. For some databases (e.g. SQL Server) supplemental logging is a per table on/off switch to get all untouched columns included in the log, whereas other databases (e.g. Oracle) allow a more granular supplemental logging configuration with the option to supplementally log just primary key values, or even whatever columns you want in a custom supplemental log group.

Database Administrators (DBAs) are always concerned about the impact supplemental logging has on their system. Of course the impact depends on a numerous factors, including:

  1. What percentage of the workload is updated versus inserts or deletes (both of which are fully logged regardless for most databases).
  2. How wide is the table and what percentage of row data is typically modified in an update statement.
  3. What data types are in the table.
  4. What percentage of database tables will need supplemental logging turned on.
  5. What is the speed for the underlying storage system to write the changes.

In most cases the impact of supplemental logging is barely noticeable to end users.

HVR performs all necessary steps to start database replication through a utility hvrload (HVR Load in the GUI) in the context of a Channel (a group of tables). “Table Enrollment” is the option to translate database object names into object identifiers, and “Supplemental Logging” is the option that creates supplemental logging on the underlying tables. If you change the definition of a channel then you typically have to re-run hvrload in order to propagate the changes into the runtime environment, and if new objects were added then “Table Enrollment” and “Supplemental Logging” must be checked when re-running hvrload.

Want to learn more? Contact us!

About Mark

Mark Van de Wiel is the CTO for HVR. He has a strong background in data replication as well as real-time Business Intelligence and analytics.

Discussion

© 2019 HVR

Live Demo Contact Us