Column vs Row database

• By Garren Smith
Database Column Store Row Store

Rough Notes for now…

So, broadly speaking, row-oriented databases are more efficient for queries where you want to read all the columns from a single row.
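
As a rough illustration of that point, here is a minimal Python sketch of the same small table held in a row layout and in a column layout. The table contents and the helper names (`read_row_from_row_store`, `read_row_from_column_store`) are made up for this note and do not come from any real database; in-memory lists stand in for what would be pages on disk.

```python
# Hypothetical three-column table, used only for illustration.
ROWS = [
    (1, "alice", 30),
    (2, "bob", 25),
    (3, "carol", 41),
]
COLUMN_NAMES = ("id", "name", "age")

# Row-oriented layout: the values of one record are stored together.
row_store = list(ROWS)

# Column-oriented layout: the values of one column are stored together.
column_store = {
    name: [row[i] for row in ROWS] for i, name in enumerate(COLUMN_NAMES)
}


def read_row_from_row_store(position):
    """A single access returns every column of the record."""
    return row_store[position]


def read_row_from_column_store(position):
    """The record has to be reassembled with one access per column."""
    return tuple(column_store[name][position] for name in COLUMN_NAMES)
```

Both functions return the same record, but the column layout needs one lookup per column to rebuild it, which is why reading whole rows tends to favour the row layout.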

  1. Column-oriented databases allow for differently shaped data and dynamic columns
    1. Fast for aggregations because it can scan a whole column without reading the others
    2. Often doesn’t need separate indexes, because the way columns are stored means that they are basically indexes already
    3. Great for tables with lots of columns
    4. More performant with large bulk updates
  2. Row-oriented databases are more efficient for queries where you want to read all the columns from a single row.
    1. Relies on indexes, which makes lookups similar to the column format, but with the added overhead of pointing back to the primary data
    2. Better with smaller tables and frequent single-row updates
  3. Cassandra is often described as column-oriented, though it is more precisely a wide-column store
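
To make the aggregation and index points above a bit more concrete, here is a small Python sketch. It is only an assumption-laden toy: the sales table, the `values_read` counter (a crude stand-in for I/O) and the helper names are all hypothetical, and real engines add compression, vectorised execution and much more.

```python
from collections import defaultdict

# Hypothetical sales table, for illustration only.
ROWS = [
    {"id": 1, "region": "eu", "amount": 120},
    {"id": 2, "region": "us", "amount": 80},
    {"id": 3, "region": "eu", "amount": 200},
]

# The same data in a column layout.
COLUMNS = {key: [row[key] for row in ROWS] for key in ROWS[0]}


def total_amount_row_store(rows):
    """Aggregate by walking every record; each full record gets loaded."""
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # whole record is read to reach one field
        total += row["amount"]
    return total, values_read


def total_amount_column_store(columns):
    """Aggregate by scanning only the single column that is needed."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


def build_secondary_index(rows, key):
    """Row-store style index: maps a value to positions in the primary data."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[key]].append(position)
    return dict(index)


region_index = build_secondary_index(ROWS, "region")
# Using the index still needs an extra hop back to the primary rows.
eu_rows = [ROWS[position] for position in region_index["eu"]]
```

In this toy, `total_amount_row_store(ROWS)` returns `(400, 9)` while `total_amount_column_store(COLUMNS)` returns `(400, 3)`: same answer, a third of the values touched. The `region_index` lookup shows the "point back to the primary data" overhead: the index only holds positions, so the rows themselves are fetched in a second step.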

References

  1. https://www.honeycomb.io/blog/why-observability-requires-distributed-column-store
  2. https://www.polarsignals.com/blog/posts/2022/05/04/introducing-arcticdb
  3. https://help.sap.com/docs/SAP_HANA_PLATFORM/6b94445c94ae495c83a19646e7c3fd56/bd2e9b88bb571014b5b7a628fca2a132.html
  4. https://www.scattered-thoughts.net/writing/a-shallow-survey-of-olap-and-htap-query-engines/
  5. https://www.tinybird.co/blog-posts/when-to-use-columnar-database <— this is good
  6. https://news.ycombinator.com/item?id=7846779