Why is data replication slow?
How can we speed up replication?
Every now and again I come across the question: how can we speed up the replication? Irrespective of the target database the desirable answer for the person responsible for the replication implementation may be: parallelize the replication. However that is only the best answer if better options have been exhausted, and users don’t always explore the better options. In most cases, the integration side is slower than the capture side because log-based capture uses large sequential block reads out of transaction files. So let’s focus on the integrate side.
How Replication Works
First, let’s understand how replication typically works.
The source database processes many operations in parallel. Most of the data changes are committed and assigned a commit sequence number. Replication products generally pick up the transactions as they are committed to replay them on the target in the order they were committed on the source. There are optimizations in this process but in most cases, by default, a single integration process on the target processes the transactions in commit order to ensure data integrity on the target database in line with the source system. Given there is usually a multi-user workload on the source system it is not a big surprise that, depending on the load (number of transactions, size of transactions, tables and data types involved, etc.), on the target side, the system can not always keep up.
The question to always ask before thinking about parallelizing the data integration – and with that splitting transactions across different processes which likely breaks data integrity (whether that is an issue depends on the application) – is: why is my replication slow? Or, is there any way I can speed up the replication?
Why is my Replication Slow?
There can be many reasons why the replication is slow:
- Single row updates and deletes typically benefit from an index on the primary key column(s). Maybe indexes are missing. Or if there are indexes, maybe they have become fragmented slowing down seeks and (some of) the indexes should be rebuilt.
- A typical optimization replication products (including HVR) implement is to combine multiple source database transactions into fewer transactions on the target database. Maybe the target database does not handle large transactions well and you should tune the replication technology to commit more frequently.
- The target database may be a columnar database. Columnar databases typically show (much) slower performance for updates and deletes than row-based databases, so maybe just to replay the load from the source database is fundamentally not the best way to go. Luckily HVR addresses this challenge with a simple option Burst.
- The target database may an MPP database, and the distributed nature of the system may not lend itself for comparable performance on OLTP operations. Commits may be a lot slower on a distributed system compared to a single instance database. Again in HVR’s case the Burst option can come to the rescue.
Problem Solve: Gather Information
The first step in all cases, rather than guessing what the problem is, is to gather information.
Oracle has a great way to get information (if you are licensed for the Oracle Diagnostic Pack) through the AWR reports. These reports tell you during a time window (ideally when you ran a relevant performance test) how the instance spent most of its time, what are the top wait events, what are the top SQL statements, etc. Looking through these reports may point you to potential problems. Likewise SQL Server has wait events and other statistics that are worth analyzing. Or you can typically find out in the database’s data dictionary what are the sizes of database objects, are indexes fragmented, etc. Implement changes you think make sense based on an analysis of the workload. Add or rebuild indexes as needed, implement partitioning if it makes sense (and you are licensed for it), change the distribution of tables in an MPP system. Etc.
And if all of that is insufficient, then parallelize the data replication. You can do this with HVR like you can with most other replication products.
Want to see for yourself? Get a trial of HVR.