Data Replication For High Availability
High Availability (HA) has many angles. An obvious one in the context of data replication technology is high availability of a replicated database. However, there is more to HA, especially in the context of heterogeneous replication, because in such a scenario HA isn’t necessarily the primary use case replication covers. Instead, the primary use case for replication could be real-time reporting, while the source is a clustered database such as Oracle RAC or SQL Server AlwaysOn. What happens to the replication technology when the HA features in these environments are activated because of network, server, or storage failures? And what if a database uses its own native DR environment and a switchover happens?
Like most data replication technologies, HVR runs on a single server at a time and may read transaction logs from multiple active nodes (e.g. threads in an Oracle RAC database). In a clustered environment the first step is to ensure that HVR’s availability is independent of any one of the servers in the cluster.
- If the cluster is a source or destination but does not act as the hub for HVR, then running the HVR listener on every node is sufficient to ensure HVR does not lose any transactions. A system failure may cause a connection to drop, but HVR automatically restarts the job, and a new connection request will presumably be answered by an available server, for example because a virtual IP address was re-allocated or because the SCAN listener in Oracle detects that a server is no longer available.
- If the cluster hosts the hub installation, then the HVR Scheduler must be migrated to a different node. This requires a little more preparation:
  - Ensure the runtime environment is shared across the cluster on storage that remains available regardless of which node is down. In practice it makes sense to also install the software on shared storage, although that is not required. Make sure that accounts on every system have access to the runtime environment; on Linux/Unix, for example, users must have the same UID on every server.
  - Enroll the HVR Scheduler as a cluster service in the cluster management software, e.g. in Windows Cluster Services or in Oracle Clusterware.
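The UID requirement on shared storage is easy to get wrong, so it can be worth checking before the first failover. The following is a minimal sketch, not an HVR utility: it assumes the UIDs have already been collected per node (for example by running `id -u <user>` on each server), and the function name `find_uid_mismatches`, the node names, and the sample UIDs are all hypothetical.

```python
def find_uid_mismatches(uids_by_node):
    """Return {user: {node: uid}} for every user whose UID is not identical
    on all nodes. A user missing on a node is reported with uid None."""
    nodes = list(uids_by_node)
    users = set()
    for mapping in uids_by_node.values():
        users.update(mapping)

    mismatches = {}
    for user in sorted(users):
        per_node = {node: uids_by_node[node].get(user) for node in nodes}
        # More than one distinct value means the nodes disagree.
        if len(set(per_node.values())) > 1:
            mismatches[user] = per_node
    return mismatches


# Hypothetical sample data: the "hvr" account differs between the two nodes.
cluster_uids = {
    "node1": {"hvr": 1500, "oracle": 54321},
    "node2": {"hvr": 1501, "oracle": 54321},
}
print(find_uid_mismatches(cluster_uids))
# prints {'hvr': {'node1': 1500, 'node2': 1501}}
```

An empty result means every listed account maps to the same UID on all nodes, so files written to the shared runtime environment keep the same owner after the scheduler moves.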
Beyond these steps HVR relies on its general resilience to recover in case a process gets interrupted.
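To illustrate the kind of resilience involved when a connection drops and a job is restarted against whichever node now answers, here is a generic retry-with-backoff sketch. It is not HVR's internal logic; `connect_with_retry`, its parameters, and the `flaky_connect` stand-in are assumptions made purely for illustration.

```python
import time


def connect_with_retry(connect, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure.

    Re-raises the last ConnectionError once max_attempts is exhausted.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise
            sleep(delay)
            delay *= 2


# Hypothetical stand-in for a connection that fails twice, e.g. while a
# virtual IP address is being re-allocated to a surviving node.
attempts = []


def flaky_connect():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("node unavailable")
    return "connected"


waits = []  # record the waits instead of actually sleeping
print(connect_with_retry(flaky_connect, sleep=waits.append))
# prints connected
```

Passing `sleep` as a parameter keeps the sketch testable; in real use the default `time.sleep` would apply.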
If a database uses its native capabilities for DR and a switchover happens, then HVR should switch its capture source or integrate destination to the new primary. In case of a planned switch the operator can ensure that no transactions are lost and that capture or integrate on the former DR system starts immediately after the switchover. If the switchover is unplanned then, depending on how the native DR capability was configured, one of the following could be the case:
- If HVR was capturing out of the source:
  - Transactions were captured on the primary and applied to the target, but never made it to the DR system prior to the switchover. In this unusual case a reporting target could be “ahead” of the source.
  - All transactions that made it to the DR system were captured by the replication tool.
  - Some transactions made it to the DR system but were not captured by the replication technology.
- If HVR integrates changes into the system as a destination:
  - Transactions integrated by HVR into the primary target did not make it to the standby target prior to the switchover.
  - All transactions integrated by HVR made it to the DR system.
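The cases above can be told apart by comparing two positions in the transaction stream. The sketch below assumes each position is a single, monotonically increasing integer (similar in spirit to an SCN or LSN); the enum, the function names, and the sample numbers are hypothetical and not part of HVR.

```python
from enum import Enum


class CaptureOutcome(Enum):
    TARGET_AHEAD = "captured changes never reached the DR system"
    IN_SYNC = "everything on the DR system was captured"
    UNCAPTURED_ON_DR = "the DR system holds changes that were not captured"


class IntegrateOutcome(Enum):
    STANDBY_BEHIND = "integrated changes never reached the standby"
    COMPLETE = "all integrated changes reached the standby"


def classify_capture_outcome(last_captured, last_on_dr):
    """Source-side case: compare the last position captured from the old
    primary with the last position the DR system received."""
    if last_captured > last_on_dr:
        return CaptureOutcome.TARGET_AHEAD
    if last_captured == last_on_dr:
        return CaptureOutcome.IN_SYNC
    return CaptureOutcome.UNCAPTURED_ON_DR


def classify_integrate_outcome(last_integrated, last_on_standby):
    """Destination-side case: compare the last position integrated into the
    old primary target with the last position its standby received."""
    if last_integrated > last_on_standby:
        return IntegrateOutcome.STANDBY_BEHIND
    return IntegrateOutcome.COMPLETE


print(classify_capture_outcome(1050, 1000).name)    # prints TARGET_AHEAD
print(classify_capture_outcome(1000, 1000).name)    # prints IN_SYNC
print(classify_capture_outcome(980, 1000).name)     # prints UNCAPTURED_ON_DR
print(classify_integrate_outcome(500, 480).name)    # prints STANDBY_BEHIND
```

In practice the two positions would have to be obtained from the replication tool and from the database's own DR metadata; how to read them is outside the scope of this sketch.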
Irrespective of the scenario, you probably want to compare source and target to ensure no data was lost, and fix any problems that arose due to the unplanned switch. HVR provides the capabilities to do this.
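To make the idea of a source/target comparison concrete, here is a small sketch that compares two row sets by key and per-row hash. It only illustrates the general technique and does not describe how HVR's own compare works; `row_digest`, `compare_tables`, and the sample rows are hypothetical, and the rows are assumed to fit in memory.

```python
import hashlib


def row_digest(row):
    """Hash a row's values; the unit separator avoids ambiguous joins."""
    joined = "\x1f".join(repr(value) for value in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def compare_tables(source_rows, target_rows, key_index=0):
    """Report keys missing in the target, extra in the target, or present
    in both with different content."""
    src = {row[key_index]: row_digest(row) for row in source_rows}
    tgt = {row[key_index]: row_digest(row) for row in target_rows}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "extra_in_target": sorted(tgt.keys() - src.keys()),
        "different": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


# Hypothetical sample rows in the form (primary key, value).
source = [(1, "a"), (2, "b"), (3, "c")]
target = [(1, "a"), (2, "B"), (4, "d")]
print(compare_tables(source, target))
# prints {'missing_in_target': [3], 'extra_in_target': [4], 'different': [2]}
```

An "extra" row in the target corresponds to the case where the target ended up ahead of the new source after an unplanned switchover.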
Finally, there is the option to use HVR itself for HA, and optionally for real-time reporting as well. In such a scenario both the HA/DR environment and the real-time reporting destination may consume the same transactions, in which case, depending on the setup, the question is whether transactions were lost rather than whether source and reporting destination are out of sync. But again, in case of any doubt there is always HVR Compare to the rescue! After a switchover HVR should immediately start capturing out of the new primary. All of this can easily be scripted around HVR; one example is hvrfailover, part of a standard HVR installation, which uses a virtual IP address to route connections to the currently active database.