HVR Runs in a Scalable Distributed Architecture
How it Works
Log-based Change Data Capture (CDC) takes place on, or as close as possible to, the source server. This is where relevant transaction data is extracted and compressed. The data is then sent across the wire to the central hub, which acts as the distributor. The hub guarantees recoverability and queues the compressed transactions as needed.
Separately from capture, an integrate process picks up the compressed changes for its destination and sends them to the target, where they are unpacked and applied using the most efficient method for that target.
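The capture-hub-integrate flow described above can be sketched in a few lines of Python. This is an illustrative model only, not HVR's actual implementation or API: the function names (`capture`, `integrate`) and the in-memory queue standing in for the hub's durable queue are all assumptions for the sake of the example.

```python
import queue
import zlib

# The hub's queue of compressed transactions (durable in HVR; in-memory here).
hub_queue = queue.Queue()

def capture(transactions):
    """Extract relevant transaction data at the source and compress it
    before it crosses the wire to the hub."""
    payload = "\n".join(transactions).encode("utf-8")
    hub_queue.put(zlib.compress(payload))

def integrate():
    """Pick up compressed changes from the hub and unpack them for the target."""
    compressed = hub_queue.get()
    return zlib.decompress(compressed).decode("utf-8").split("\n")

# A source server captures two changes; the integrate side unpacks them.
capture(["INSERT INTO orders VALUES (1, 'widget')",
         "UPDATE orders SET qty = 2 WHERE id = 1"])
changes = integrate()
```

Because capture and integrate only share the hub's queue, they can run on different machines and at different speeds, which is what lets the architecture scale out to many sources.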
On-Premise to Cloud Example using Oracle, SQL Server and a Data Lake in Amazon Redshift
For example, an organization using on-premise Oracle and SQL Server databases as sources and a Data Lake in Amazon Redshift can scale to many sources, with capture running on the individual database servers. These servers send compressed (and encrypted) changes into the AWS cloud to be applied to Redshift. Changes destined for Redshift are staged in S3 and loaded into Redshift tables via COPY, followed by set-based SQL statements on the target tables, so that in aggregate the analytical database can keep up with the transaction load from multiple sources.
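The set-based apply pattern mentioned above can be illustrated with a small, self-contained sketch. Here `sqlite3` stands in for Redshift, the staging table stands in for data that would arrive via COPY from S3, and the table and column names (`orders`, `orders_stage`) are invented for the example; in Redshift the same idea would use `COPY ... FROM 's3://...'` followed by similar set-based statements.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Target table with existing rows (stands in for a Redshift table).
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, qty INTEGER)")
con.execute("INSERT INTO orders VALUES (1, 1), (2, 5)")

# Stage a batch of captured changes (in Redshift: COPY from S3 into a stage table).
con.execute("CREATE TABLE orders_stage (id INTEGER, qty INTEGER)")
con.executemany("INSERT INTO orders_stage VALUES (?, ?)", [(1, 3), (3, 7)])

# Apply the whole batch with set-based SQL, not row-by-row statements:
# one UPDATE for every changed row, one INSERT for every new row.
con.execute("""
    UPDATE orders
    SET qty = (SELECT s.qty FROM orders_stage s WHERE s.id = orders.id)
    WHERE id IN (SELECT id FROM orders_stage)
""")
con.execute("""
    INSERT INTO orders
    SELECT s.id, s.qty FROM orders_stage s
    WHERE s.id NOT IN (SELECT id FROM orders)
""")

rows = con.execute("SELECT id, qty FROM orders ORDER BY id").fetchall()
# rows -> [(1, 3), (2, 5), (3, 7)]
```

Applying each batch as a handful of set-based statements, rather than replaying every source transaction individually, is what allows a columnar analytical database to absorb the combined transaction load of many OLTP sources.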
Distributed Architecture for On-Premise to Cloud Data Integration Scenario
Data Integration Architecture: Understanding Agents
The question of whether or not to use an agent when performing data integration, especially around use cases with log-based Change Data Capture (CDC) and continuous, near real-time delivery, is common.
In this video, HVR’s CTO, Mark Van de Wiel, goes into detail about:
- The pros and cons of using an agentless setup versus an agent setup
- When to consider one over the other
- Two common distributed architectures using an agent setup