Columnar databases: How to Perform Real-Time Business Intelligence
How can you perform high-performance real-time Business Intelligence on a columnar database?
Columnar databases – in contrast to row-based databases – physically store data for columns together in data blocks (both on disk and in-memory). Queries retrieving data from many rows but only a subset of the columns in the tables in the query benefit a lot from columnar data storage. This type of workload is typical for a reporting, Business Intelligence or analytics workload.
Update and delete transactions, on the other hand, are much faster on a row-based database because data for the entire row is stored in one or only a few database blocks. Columnar databases – especially ones that don’t use indexes – show very poor update and delete performance because it takes a column scan on the primary key column(s) to identify the row, and then one or more scans to find the data to be modified or deleted.
Columnar database vendors, of course, know that updates and deletes on their database are generally a lot slower compared to row-based databases so most of them will have implemented an approach to circumvent the issue. However all columnar databases HVR has worked with perform update and delete transactions much slower than their row-based counterparts.
So then a natural question is: how can you perform high performance real-time Business Intelligence on a columnar database?
- Create staging tables to store incremental transactions before these are applied using batch statements to the target tables, in near real-time.
- Instead of updating or deleting a single row per statement the batch statement may, depending on the workload, update or delete hundreds or thousands of rows per statement. The result of course is dramatically faster performance.
- On top of this the burst optimization will also automatically coalesce changes so for example multiple updates to the same row become a single update and for example an insert followed by a delete is not processed at all.
The performance benefits of the burst optimization cannot be quantified without a test, but generally tables with more columns and a workload consisting of relatively more updates and deletes versus inserts benefit more from the burst optimization than tables with fewer columns and mostly inserts. The cost is a slight increase in latency but still measured in single-digit seconds.
The real-time Business Intelligence demo on our website uses the burst optimization and without it, latency would only increase. Don’t believe this?
Contact us to try!