Tracking Database Changes
To understand what happened in your database is very powerful for multiple reasons:
- You want to look at the old state of the database – i.e. you want to go back in time.
- An issue came up, and you have to diagnose when and where it originated.
- Rules and regulations force you to keep a history (e.g. in the financial and healthcare industries this is common)
- Keeping track of the history of changes gives you competitive advantage and/or revenue opportunities: think about how your internet browsing history influences the ads you see on Google, and the products Amazon recommends, but also beyond these examples a customer support organization should have insight in the history of customer support tickets, how they have been handled, and what may have led to an unhappy customer that is now ready to shift to a competitor.
- Etc. Every industry has use cases for keeping historical changes.
It is up to the IT professional – and that may well be you – to implement this history of changes in your environment. How will you go about this?
- A common approach to think about keeping a history of changes is to implement a data warehouse. Irrespective of whether you implement a third normal form (3NF) schema, you use dimensional modeling to implement your data warehouse, or you sit somewhere in between, you have probably heard about the concept of a slowly changing dimension. The slowly changing dimension – as an extension to dimensional modeling – assumes that reference data may change, and you want to keep a history of these changes. The concept of slowly changing dimensions great, but what if you are dealing with changes that don’t just come in “slowly”, or you are faced with changes to – in terms of dimensional modeling – the fact table, and you want to keep track of these too? How can you efficiently deal with all of these?
Identify Database Change with Efficient Change Data Capture (CDC)
In order to store a history of changes you will have to start by identifying the changes. Recently I wrote a blog post about Change Data Capture methods and concluded that log-based Change Data Capture is the most efficient way to identify any changes on the source system, including multiple changes to a row within a transaction, deletes and even truncates, irrespective of the size of the underlying data structures. Depending on your source database there are multiple technologies that can provide this capability.
Needless to say IT professionals already have many projects on their plates due to ever more demanding business users, and ongoing hardware and software maintenance cycles. The last thing you need is another complex data history project requiring a Ph.D. in rocket science to implement. Instead, you want something that is easy to implement across as many tables as you want, ideally with a few clicks and optionally through a wizard driven approach.
Speaking of ever more demanding business users: data cannot be delivered fast enough to the pretty Business Intelligence (BI) tools and dashboards. Whatever you implement has to deliver great performance. There are two aspects to good performance:
- Good performance, and limited resource consumption, to store the history of changes. For example SQL Server has an easy wizard-driven approach to implementing slowly changing dimensions, but does it always work well at any scale?
- Great performance to retrieve the results. Depending on data volumes, the number of systems that are potentially consolidated into a single database or your employer’s desire to move to the cloud you may be looking for an analytical database solution, quite possibly using column-oriented (as opposed to row-based) storage, quite possibly using a Massively Parallel Processing (MPP) infrastructure. You would be looking for a way to store a history of changes efficiently in an environment with different performance characteristics for a transactional workload than your source database, and – if you have to move data into a different data center or into the cloud – limited network bandwidth.
Bonus: Real-Time changes
At HVR Software – thanks to our focus on real-time data replication in heterogeneous environments – we know a thing or two about log-based Change Data Capture on commonly used OLTP databases including Oracle, SQL Server and IBM DB2. We also built optimizations to integrate changes efficiently into columnar and/or MPP databases. For the benefit of not only cloud users by default the technology uses efficient compression to limit network bandwidth, with the option to add encryption on the wire. Last, but for this blog post certainly not least, we recently introduced a Time Key feature to make it extremely easy to store a history of changes on large numbers of tables, into all the supported targets, efficiently. And as a bonus, by default changes are delivered in real-time.
To learn more about what HVR offers, feel free to contact us!