What is HBase?

20 Jan 2024

HBase is a distributed, column-oriented database built on top of HDFS. It is best suited to workloads that need random read/write access to very large datasets.

The canonical HBase use case is the webtable, a table of crawled web pages and their attributes, with billions of rows.
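As a rough illustration of what random access to a webtable means, here is a toy in-memory sketch. It is not the HBase client API: the class ToyTable, the "contents:html" column name, and the reversed-domain row keys are all assumptions made for the example. It only models rows kept sorted by key, with versioned cells looked up directly by row and column.

```python
from collections import defaultdict


class ToyTable:
    """Toy in-memory stand-in for an HBase-style table (not the real client API)."""

    def __init__(self):
        # row key -> "family:qualifier" -> list of (timestamp, value), newest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, ts):
        versions = self._rows[row][column]
        versions.append((ts, value))
        # Keep the newest version at the front of the list.
        versions.sort(key=lambda version: version[0], reverse=True)

    def get(self, row, column):
        """Random access: return the newest value of one cell, or None."""
        if row not in self._rows or column not in self._rows[row]:
            return None
        return self._rows[row][column][0][1]

    def scan(self, start, stop):
        """Yield row keys in sorted order within [start, stop)."""
        for row in sorted(self._rows):
            if start <= row < stop:
                yield row


# Hypothetical webtable rows, keyed by reversed domain so one site's pages sort together.
webtable = ToyTable()
webtable.put("com.example.www", "contents:html", "<html>v1</html>", ts=1)
webtable.put("com.example.www", "contents:html", "<html>v2</html>", ts=2)
webtable.put("org.example.www", "contents:html", "<html>org</html>", ts=1)

latest = webtable.get("com.example.www", "contents:html")
# "/" is the character right after "." in ASCII, so this range covers every key starting with "com."
com_rows = list(webtable.scan("com.", "com/"))
```

The point of the sketch is that a single page is fetched by key without reading the rest of the table, and that related rows can be read as one sorted range.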

HBase was modelled after Google's Bigtable.

Data Models

HBase is made up of clients, workers and a master.

The master node orchestrates a cluster of regionserver workers.
Regionservers carry zero or more regions.
HBase relies on ZooKeeper as the authority on cluster state, such as host vitals and the current cluster master.
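The roles above can be sketched with a toy model. This is an assumption-heavy illustration, not how the real master works: ToyCluster, the round-robin assignment, the split keys, and the server names rs1 to rs4 are all hypothetical. It assumes each region covers a contiguous range of sorted row keys, and a direct method call stands in for a host failure being noticed.

```python
import bisect


class ToyCluster:
    """Toy model of region assignment; not how the real master is implemented."""

    def __init__(self, split_keys, servers):
        # split_keys divide the sorted row-key space into len(split_keys) + 1 regions.
        self.split_keys = sorted(split_keys)
        self.servers = list(servers)
        # Master role: assign each region to a regionserver (round-robin here).
        self.assignment = {
            region: self.servers[region % len(self.servers)]
            for region in range(len(self.split_keys) + 1)
        }

    def region_for(self, row_key):
        # Region i holds keys in [split_keys[i - 1], split_keys[i]).
        return bisect.bisect_right(self.split_keys, row_key)

    def server_for(self, row_key):
        return self.assignment[self.region_for(row_key)]

    def fail_server(self, dead):
        """Reassign the dead server's regions to the surviving servers."""
        self.servers.remove(dead)
        for region, server in list(self.assignment.items()):
            if server == dead:
                self.assignment[region] = self.servers[region % len(self.servers)]


# Three regions over four servers: rs4 ends up carrying zero regions.
cluster = ToyCluster(split_keys=["g", "p"], servers=["rs1", "rs2", "rs3", "rs4"])
before = cluster.server_for("kiwi")
cluster.fail_server("rs2")
after = cluster.server_for("kiwi")
```

A client only needs the mapping from key range to server to find its data, which is why the master can reassign regions without the row keys themselves changing.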

HBase tries to mirror Hadoop where it can, e.g. in how it is configured.
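One place the resemblance shows is the site file: HBase reads an hbase-site.xml that uses the same name/value property layout as Hadoop's own site files. The sketch below parses such a fragment with the Python standard library. The property names are real HBase settings, but the host names and values are placeholders, and load_properties is a helper written only for this example.

```python
import xml.etree.ElementTree as ET

# Example hbase-site.xml content; hosts and paths are placeholders.
HBASE_SITE = """<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example.org:8020/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.org,zk2.example.org,zk3.example.org</value>
  </property>
</configuration>
"""


def load_properties(xml_text):
    """Return the name -> value pairs from a Hadoop-style configuration document."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.findall("property")
    }


config = load_properties(HBASE_SITE)
zookeeper_hosts = config["hbase.zookeeper.quorum"].split(",")
```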

What does scaling an RDBMS look like?

Move from a local workstation to a remotely hosted MySQL instance with a well-defined schema
Add memcached to cache common queries. Reads are no longer strictly ACID.
Scale MySQL vertically with a beefier server
Denormalise data to reduce joins
Stop doing server-side computation
Periodically prematerialize the most complex queries and stop joining in most cases
Drop secondary indexes and triggers
Maybe do more partitioning?
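The memcached step is where reads stop being strictly ACID, and a small sketch shows why. This is a toy model under stated assumptions: plain dicts stand in for MySQL and memcached, CachedStore and the key "user:1:name" are made up for the example, and writes deliberately skip cache invalidation to expose the problem.

```python
class CachedStore:
    """Toy cache-aside reader; dicts stand in for MySQL and memcached."""

    def __init__(self):
        self.database = {}
        self.cache = {}

    def write(self, key, value):
        # Writes go to the database only; the cache is not invalidated here.
        self.database[key] = value

    def read(self, key):
        # Serve from the cache when possible, otherwise load and remember the value.
        if key in self.cache:
            return self.cache[key]
        value = self.database.get(key)
        self.cache[key] = value
        return value


store = CachedStore()
store.write("user:1:name", "Ada")
first_read = store.read("user:1:name")   # populates the cache
store.write("user:1:name", "Grace")      # database changes, cache does not
second_read = store.read("user:1:name")  # served from the cache, so it is stale
```

After the second write the database and the cache disagree, so a reader can see a value that no longer exists in the database. Invalidating or expiring cache entries narrows that window but does not restore transactional reads.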

HBase

HDFS was built for Hadoop MapReduce, not HBase, so HBase can run into issues when running on top of it.
