Network Efficiency Matters in Data Replication
A few years ago I worked with Vectorwise (now called Vector), an ultra-fast analytical columnar database. At the time I thought Vectorwise could beat any row-based database on query performance thanks to its extremely efficient software. Then one day a user ran a set of queries that returned results much faster on a row-based database…
It turned out that the queries retrieved every column of a table (so no benefit from columnar storage), with almost no filters and no aggregation, so there was also no benefit from the advanced chip-level optimizations the software exploits. The row-based database spent maybe 5 seconds executing the query, after which the results took 30 seconds to reach the client. The columnar database ran the query in 0.2 seconds but took 10 minutes to transport the results.
Network efficiency matters, and not only for queries: it matters for real-time data replication as well.
- The only time network traffic does not matter (much) is when the source and target databases reside in the same data center. A common data replication use case is real-time reporting/analytics, or data consolidation into a big data technology. Some of our larger clients integrate data between multiple data centers across the world. More and more clients also incorporate the cloud as a separate data center, for example to implement real-time reporting on Redshift or Microsoft Azure. Network efficiency matters even more for cloud scenarios than for intra-data-center scenarios, because network speed and bandwidth vary throughout the day.
- Most data replication tools, including HVR, support network compression. Surprisingly, compressing data across the wire is not enabled by default in all replication technologies; in HVR it is. HVR’s network communication protocol is highly optimized to work well on high-latency, low-bandwidth network connections. HVR sends data between source and target in compressed form, using as few large block transfers as possible. If encryption is required, the compressed data is then encrypted.
- Compressing and decompressing data consumes extra CPU resources. In almost all cases it is worth trading higher CPU utilization for lower network utilization, not least because CPU performance has increased much faster than network performance over the years. Only on the busiest systems may it make sense to disable compression so the software can capture data faster, and HVR provides an option to do this.
- We always recommend that customers install the HVR software next to the database in order to benefit from data compression. With a source or target database in the cloud, you should always install HVR in the same part of the cloud (ideally in the same data center, although that cannot always be determined) to support large data and transaction volumes. It makes a big difference.
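The compress-then-encrypt ordering described above can be illustrated with a small sketch (this is not HVR's actual protocol, just a generic illustration using Python's `zlib` on a made-up change batch): repetitive row data compresses very well when batched into one large block, and encryption must happen after compression, because ciphertext is effectively random and no longer compressible.

```python
import zlib

# Hypothetical change batch: repetitive row data compresses well.
rows = [f"order-{i},customer-{i % 100},status=SHIPPED".encode() for i in range(10_000)]
payload = b"\n".join(rows)

# Compress the whole batch once (one large block transfer),
# rather than compressing each small row individually.
compressed = zlib.compress(payload, level=6)

print(f"raw: {len(payload)} bytes, compressed: {len(compressed)} bytes")
print(f"ratio: {len(payload) / len(compressed):.1f}x")

# If encryption is required, encrypt the *compressed* bytes:
# encrypted output looks random, so compressing it afterwards gains nothing.
```

Batching matters as much as the codec choice: compressing many rows together lets the compressor exploit redundancy across rows, which per-row compression cannot.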
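The CPU-versus-network trade-off mentioned above can also be made concrete. The sketch below (again generic, not HVR-specific) times `zlib` at different compression levels on an assumed repetitive payload: higher levels spend more CPU to shrink the bytes that cross the wire, which is why disabling or dialing down compression only pays off on CPU-bound systems.

```python
import time
import zlib

# Hypothetical change batch; the repetitive content is an assumption.
payload = b"\n".join(
    f"txn-{i},table=ORDERS,op=UPDATE".encode() for i in range(50_000)
)

for level in (1, 6, 9):
    start = time.perf_counter()
    out = zlib.compress(payload, level=level)
    elapsed = time.perf_counter() - start
    # Higher levels burn more CPU in exchange for fewer bytes on the wire.
    print(f"level {level}: {len(out):>8} bytes in {elapsed * 1000:.1f} ms")
```

On a slow or congested link the smaller payload wins easily; on a saturated CPU next to a fast link, a lower level (or no compression) can be the better trade.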
Skeptical? You are welcome to compare for yourself: try replication with and without HVR installed on a cloud server next to the database!