Columnar vs. Row-based Databases
2 Key Differences
And the impact on real-time reporting
1. Databases using columnar storage have a major benefit over equivalent databases using row-based storage: query performance for analytical queries that crunch through lots of data are much faster. Data is only accessed if required to compute the query result. On the other hand row-based database solutions retrieve the entire row, even if only few out of many columns are required to run the query. Add that multiple optimizations in modern CPUs (e.g. Single Instruction, Multiple Data (SIMD)operations, super-scalar functions, hardware accelerated string-based operations) work well on columnar data but often don’t benefit queries on row-based data.
- Most modern analytical and reporting relational databases such as Amazon Redshift, SAP HANA and Actian’s Vector database use columnar storage. In fact, even Oracle and Microsoft SQL Server, traditionally row-based databases, provide columnar capabilities (at a cost) to improve query performance.
- If you are looking for a high performance solution to support reporting through SQL-based tools then you should be considering a columnar database. But…
2. Single-row operations on columnar databases are generally less efficient, and with that, despite attempts from database programmers to mitigate the issue, (a lot) slower compared to row-based databases. Columnar databases prefer to process inserts, updates and deletes (or merges) as batch operations. Row-based databases, built for fast and efficient On-Line Transaction Processing (OLTP) in highly concurrent environments, often heavily indexed, are very good at processing single-row operations.
- End users always want the best performance and up-to-date data, and management typically wants lowest cost. Finding a balance between these requirements is not easy. If cost was not an issue then the solution may have been to put in the biggest OLTP system around (Oracle Exadata comes to mind…) and pay for columnar optimizations on the system to achieve fast transaction processing and high performance queries on a single database. More realistically your organization may be thinking alternatives like Amazon Redshift to balance performance and cost. But what about up-to-date data?
- Keeping another database in sync can be achieved through real-time data replication. Most data replication solutions perform efficient log-based Change Data Capture (CDC) on source OLTP databases to replay the operation using SQL statements on the destination system. For a columnar database target make sure to ask about performance optimizations to ensure the system is up-to-date in (near) real-time, despite high transaction volumes on the source database(s).
Feel free to head over to our solutions page to read about what HVR can do for your business.