Problems don’t always originate where they surface
Most software is pretty complex to ensure it works well irrespective of the many ways it can be used. This applies particularly to flexible platform technologies such as operating systems, databases, and also replication technologies. Engineers try to anticipate any and all possible use scenarios so that the software can be robust and easy to use. However, as the number of ways to use the software increases it is more and more difficult for engineers to anticipate what the software may be used for, and with that what may go wrong using it. Well-designed error messages are a way to guide the user to root cause analysis and/or problem resolution.
Recently I have come across a few scenarios where it was not necessarily an HVR issue causing the problem but the problem surfaced there.
Example One: Index Corruption as Root Cause
The first example was a failed update statement. There was of course an HVR error code that came with the failed update, but inside it was also SQL Server error 8630 HRESULT message. A quick internet search indicated that index corruption may be the root cause of the error. Sure enough, a call to dbcc for the table in question showed some data related issues, and again dbcc was my friend to resolve the corruption, and HVR’s update ran fine from there onward. Did HVR cause the corruption? Unlikely, given on the SQL Server target HVR runs regular SQL statements like it does on any other database.
Example Two: Data Type Coercion as Root Cause
The second example was a real-time replication scenario based on an SAP source database. SAP has a long history of incremental improvements but also legacy which has resulted in a database design that is arguably sub-optimal for today’s database technology. For example, and relevant in the context of this use case, many columns containing only numeric data – per SAP’s dictionary – are stored using a character-based data type. In a modern analytical database it is important you choose the right data type for maximum performance and optimum benefit of hardware and software-based optimizations. So numeric columns should be stored in a numeric data type (e.g. a decimal or integer). In this case the customer does not want empty values in the table but instead use 0 as a default value, which is set during the integration into the target. To make matters more complicated, the customer uses a highly optimized columnar database and with that the Burst optimization in HVR to get maximum transaction throughput in the target database.
Having set this all up, the system ran fine for a long time but then suddenly the integration would fail with an odd error message about data type coercion. It turned out that in the very rare case there was an empty value coming from the source database the coercion of the default value into the burst table failed. Of course, this error only surfaced when there was a null value during real-time replication. And of course, the solution was to simply change the representation of the default value to be correctly coerced into the character data type in the burst table (that matches data types with the source database).
There are more examples similar to this that may not even result in an error condition but represent an issue. For example, some time ago I wrote about slow replication performance which can have multiple underlying causes such as a missing or very inefficient index.
In all cases, look for the root cause!
Do you have a data replication project on the horizon? We are here to help!