Mark Van de Wiel

Generally, data replication products will not break transaction boundaries unless configured to do so. Also, most real-time data replication tools don’t start applying transactions to the destination system until the source system commits the change (I am aware of one exception to this last generalization but I also heard several rumors about phantom transactions in the destination system; i.e. transactions that made it into the target system but were never committed in the source).

So let’s assume that real-time data replication tools, including HVR, only apply changes to the target system once the source system committed the change, and that transaction boundaries don’t change.

Consider a long-running transaction that modifies a lot of data in tables that are replicated. Let’s say it took 10 minutes to perform the transaction on the source system until it finally commits. What would happen on the target database?

Every data replication solution will perform a set of optimizations in an effort to perform transactions faster on the destination than on the source. This is necessary because changes are serialized by a commit sequence number and (without additional information) have to be applied in that order to maintain consistency and avoid potential deadlocks. Typical optimizations replication tools build in include:

  • The use of batches of changes instead of row-by-row modifications,
  • Coalescing multiple changes to the same row into a single change,
  • Combining multiple individual transactions into fewer, larger transactions (which of course maintains the transaction boundaries).

Note that in the case of a long running transaction on the source database, part of the transaction time may have been spent performing complex calculations when the destination system only applies the result of the transactions.

End-to-end real-time data replication latency is measured by looking at the amount of time it takes between a transaction commit on the source system and the commit of the same transaction on the target system. On a busy system with very many short-running transactions a well-configured replication system should show a few seconds of latency at the most, and this is typical for HVR implementations.

But what about the long-running transaction?

If it takes 2 minutes on the destination system to process the transaction that took 10 minutes on the source, then just before the end of the long transaction on the destination the latency will be 2 minutes. And note that all transactions that committed after the long-running transactions are queued up (in commit sequence order) behind this one long-running transaction. This is what you should expect.

Should you be alarmed?

Of course this depends on the use case and the set up. In HVR’s case the capture will continue to run and capture transactions that are queued up on the hub, and if for whatever reason there is a hiccup between hub and destination database then the system will simply recover and continue where it left off. I.e. your system is recoverable, transactions have been captured, and, depending on your setup, transactions are stored persistently outside of the database.

Also note that most replication products including HVR provide a way to limit the size of the transaction on the target system by committing in the middle of a transaction. Of course a commit in the middle of a transaction breaks the transaction boundary but it may be a way to limit resource consumption on the target.

How long is my long-running transaction anyway? Many OLTP systems hardly ever run a long-running transaction, but others perform regular batch jobs that modify lots of data. For an Oracle Database here is a quick query to find out how long transactions have been running in the database (this runs in SQL Plus, but without the formatting options it should run in any SQL interface):

For an Oracle Database here is a quick query to find out how long transactions have been running in the database (this runs in SQL Plus, but without the formatting options it should run in any SQL interface):
 To discuss this and other questions you may have about HVR, feel free to contact us!

About Mark

Mark Van de Wiel is the CTO for HVR. He has a strong background in data replication as well as real-time Business Intelligence and analytics.

Test drive
Contact us