Mark Van de Wiel centralized database

Same problem, different tools

Back around the turn of the century, I worked for a client – an employment agency – who had to centralize data from multiple databases in local branches into a central database. We used Oracle 7.3.4 Advanced Replication with updateable snapshots (materialized views) to implement the solution. The entire project was bespoke and it took the client months of consulting resources to implement the solution. And it worked.

Data replication toolsAbout seven years ago I worked for a different client – a Telco – who had the same challenge. The systems were also running Oracle but this time the tool of choice was GoldenGate (not part of Oracle at the time). During the consulting engagement – whilst the client was figuring out how to perform the initial load of the data – I spent a couple of weeks to develop scripts to run against the Oracle Data Dictionary to generate the many scripts and parameter files GoldenGate would need to get the system in place: add trandata statements to add supplemental logging, mapping of source to target tables in parameter files, and a lot more that I cannot remember. I enjoyed this implementation because, in the end, we could just watch the script run and once it was tested it ensured the outcomes had no typos or other human-introduced mistakes. In hindsight, the solution took a while to build and may have been difficult to maintain over time, but again it worked.

How would I do this today?

Some Challenges of Database Centralization

database centralizationFirst of all, let me list some of the challenges that come into play when dealing with database centralization. The use case could be reporting: report on a single instance of all data as opposed to running reports on multiple disconnected systems.

  • One question to deal with is whether data must (at any point) be shared between instances. Often at first you might think the answer is no, but then if you think about it again then maybe the answer becomes yes. For example the employment agency had workers that could move from one city to another and be allocated to a different branch. I.e. we had to figure out how to deal with this scenario. Of course then there is also the question if there could be collisions (updates to the same row on different sites) and if so how to deal with these.
  • Another topic is primary keys: how will a row be identified on the central database? Or is it not important that there is a primary key (in most cases it is – and certainly back in the days of updateable snapshots with “fast” refresh it was…)? What does the primary key on the central database look like, and how to ensure there are no key collisions? Of course there should be minimal if any impact on the application(s) that run on the remote sites and one would avoid triggers if at all possible. The way I dealt with this before was to add a column to every table on the central database to store where the record originated (which is probably good to know anyway). But there are other approaches that are also used for active/active scenarios such as (for numeric keys) sequential numbers as a starting point with an increment by at least the number of local sites. Another approach is a unique pre/suffix.
  • What is the synchronization frequency? In the previous century once-a-day was what we did, partly because systems weren’t on-line continuously (and updateable snapshots weren’t really built for this anyway).

Back to the question how would I do this today using HVR? The implementation would look something like this (without going into details on some of the topics mentioned before):

  1. Create a single channel with all tables in it, and select as many source locations as necessary to be a member of the single source location group. Whether there are 2, 20 or 200 sites makes no difference for the implementation effort.
  2. Implement all necessary transformations once. E.g. introduce an extra column for every table on the target database that becomes part of the key and is populated with the location where the data came from.
  3. Implement a condition to allow compare between a local site and the central site using the remote site as a variable to ensure on an ongoing basis HVR monitors systems being in sync.
  4. Use HVR to create the target tables, and load the data from the local sites into the central database.
  5. Turn on real-time replication, or if preferred schedule the replication to run at intervals.
  6. Set up automatic monitoring to send emails/alerts in case there are problems with the system.
  7. Go home.

For the first implementation, a coworker and I worked through the night leading up to the go-live date. I cannot see myself doing that again…

About Mark

Mark Van de Wiel is the CTO for HVR. He has a strong background in data replication as well as real-time Business Intelligence and analytics.

Test drive
Contact us