A How-to: Data Replication for Large Tables
How Do You Manage Large Tables?
Some of HVR’s largest clients store hundreds of millions of rows in tables with many columns. Dealing with tables this size is not necessarily an issue from an ongoing real-time data replication perspective – given replication volume is determined by the transaction volume against the table – but challenges may arise during the initial load, and during comparison. DBAs may think the initial load challenges are manageable in a like-to-like environment e.g. by using native database export/import or backup/restore capabilities. However, quite a few clients face these challenges in heterogeneous environments where native capabilities simply cannot be used.
The question then is: how do you deal with large tables?
- HVR provides an option through the Action Restrict to specify filters. For an initial load users can filter rows based on a condition, and the condition may contain variables. E.g. filter records in the USERS table on the USER_ID column with a between condition specifying the lower and upper limit, and load for example ten million rows at the time. HVR supports the use of variables in the condition so at runtime the values can be changed. The condition (or a different condition such as “0 = 1”) should also be used on the destination side to prevent previously loaded rows from being deleted.
With the use of the Restrict filter, the load of a large table can be split and users prevent errors such as “Snapshot too old” in an Oracle Database (yes, it still exists!). Or on some databases applying a very large transaction uses too many resources and limiting the transaction size limits resource consumption. HVR makes running this process very simple because the GUI will provide in the status bar the equivalent command line statement that users can copy/paste/modify to run on the HVR hub. So performing a load ten million rows at the time is simply a matter of changing variable values.
- Data compare has similar considerations in that select statements may take too long to run and use too many resources, so HVR provides a similar filter option for data compare. Of course for data compare the same filter condition must be applied to the source and target database. In many cases, very large tables store historical data that is mostly dormant and only recent data changes (if there are any changes at all). For such a use case a dynamic restriction to filter for example on a date column for data that is less than a month old may suffice in a comparison. All other data, once validated, doesn’t need to be compared again if it doesn’t change.
Sometimes organizations decide to remove old(er) data from their OLTP database but want to keep history in the destination system. Traditionally SAP administrators used to do this a lot. How do you deal with this?
- HVR provides another answer there. When determining in detail what are the differences between to tables or to (re-)synchronize tables in such a scenario HVR provides the flexibility to decide whether to apply inserts, updates, and deletes or only a subset of these operations.
So, if you want to keep large tables in sync in a heterogeneous environment but don’t know where to start then I invite you to download a free trial to see how HVR will help.