How Log-Based Change Data Capture Revolutionized Data Replication
HVR + Log-Based CDC for Oracle
I remember the very first time I heard the phrase Change Data Capture (CDC), specifically it was Oracle CDC. I was in El Segundo, CA, right across from LAX airport. I was teaching an Oracle DBA Fundamentals class to 24 students. During the break, a student asked me if I could describe a bit more about how CDC works.
Since Oracle redo logs, archive logs, along with backups and recovery were my favorite topics to teach, I was more than happy to oblige.
I explained that Oracle redo logs contain the before and after images for all changes made in the Oracle database. Oracle redo logs were created to make sure that the Oracle database could recover in the event of a failure. The redo logs contain a history of transactions. Old redo logs become archive logs. The backup, archive logs, and the current redo logs are needed in the case of a total or simple failure.
When the database is recovered after a crash, the redo logs “redo” the last transactions so the committed data is recovered and the database is back to a consistent state. The concept of taking data from Oracle in the form of ‘changes’ has been around for quite a few years. There is so much information stored in Oracle redo logs that organizations have been dying to free that data.
Database Administrators (DBAs) knew that data was trapped in redo and archive logs. At the time, triggers were the only way to gather change data. Oracle DBAs were not a fan of having a lot of triggers on their tables because a trigger fires for every single transaction could conceivably slow the database down.
In addition, many applications run on top of Oracle so it makes sense that customers want to query that data. Moving data to a data warehouse or a data lake often involves an Extract, Transform, Load (ETL) tool. In this day and age, organizations don’t have the luxury to wait for a weekly or daily batch job. They need fresh, relevant data. To resolve this, the ETL job could run more frequently but that’s extremely taxing on the source system.
Enter log-based CDC to the rescue!
In 2001, Oracle changed everything with the release of Oracle 9i. While the primary purpose of the redo logs remained database recovery, the before and after images were now available to be read. Also, Oracle introduced a new feature called Oracle Streams (Oracle Streams has since been deprecated with the release of Oracle 12c and is no longer supported starting in Oracle 19c).
With Oracle Streams, Oracle introduced a product to read redo logs. Soon after, a few third parties also realized the value in reading that data and Oracle CDC became more commonplace. Log-based CDC became a favorite of Oracle DBAs because reading data from the Oracle redo and archive logs did not strain their production systems and allowed them to get the data out of the logs in real-time.
Log-based CDC is coming up on its 19th birthday and it is just as relevant today as it was when it first arrived. Oracle databases are still number one especially when it comes to high-volume OLTP (Online Transaction Processing) databases. Whether the data is from an SAP application, Oracle EBS, or a homegrown application the data is valuable and needs to be replicated to a data lake or data warehouse. Considering speed is more relevant today than it was 19 years ago, having log-based CDC is the proven method to move data in real-time.
HVR is perfectly suited to move that data. HVR uses log-based CDC to move data efficiently from on-premise source databases like Oracle to cloud environments. HVR can move the data to an Oracle target (homogeneous replication) or to a non-Oracle target (heterogeneous replication). HVR has built-in encryption and compression so that moving data to the cloud can be done safely, securely, using as little network bandwidth as possible.
HVR has a unique hub and spoke architecture allows for capture once and deliver to many configurations. This allows organizations to be flexible with how they share data. For example, you could move data to another Oracle system that is only used for testing. You could send a subset of data to a Snowflake database that is used as a data warehouse. Or send another subset to a specifically designed database in an AWS S3 bucket. Once the data is removed from the Oracle redo logs, HVR can move it, augment the data, and partition the data in ways that suit the business.
Still not convinced? We invite you to take a Test Drive and get hands-on experience replicating data with HVR.