Principle:Heibaiying BigData Notes HBase Table Creation
| Knowledge Sources | |
|---|---|
| Domains | NoSQL, Big_Data |
| Last Updated | 2026-02-10 10:00 GMT |
Overview
HBase tables are defined at creation time by their column families, which determine the physical storage layout and properties of the data they contain.
Description
In HBase, a table is a sparse, distributed, sorted map indexed by row key, column (family plus qualifier), and timestamp. Relational databases declare every column in the schema. HBase tables declare only their column families, and individual columns (qualifiers) are created on write. A column family:
- Groups related columns under a common prefix (e.g., `info:name` and `info:age` share the `info` family).
- Determines physical storage: each column family is stored in its own set of HFiles on HDFS.
- Carries configuration properties such as compression algorithm, block size, time-to-live (TTL), and maximum number of versions.
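The description of a table as a sparse, sorted map can be made concrete with a small in-memory model. The sketch below is a hypothetical teaching class (`ToyTable`), not part of the HBase client. It nests `TreeMap`s as row key → family → qualifier → timestamp → value. It also uses `String` keys where HBase uses `byte[]` (the ordering matches for ASCII keys) and ignores region distribution entirely. It illustrates three points from the list above: families are fixed at creation, qualifiers appear on first write, and cells that were never written occupy nothing.

```java
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical teaching model (not an HBase class): a table as nested sorted maps,
// row key -> family -> qualifier -> timestamp -> value.
final class ToyTable {
    private final Set<String> families;  // fixed when the table is "created"
    private final NavigableMap<String,
            NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>>> rows =
            new TreeMap<>();

    ToyTable(Set<String> declaredFamilies) {
        if (declaredFamilies.isEmpty()) {
            throw new IllegalArgumentException("a table needs at least one column family");
        }
        this.families = new TreeSet<>(declaredFamilies);
    }

    // column is "family:qualifier", e.g. "info:name"
    void put(String rowKey, String column, long timestamp, String value) {
        int sep = column.indexOf(':');
        if (sep < 0) {
            throw new IllegalArgumentException("expected family:qualifier, got " + column);
        }
        String family = column.substring(0, sep);
        String qualifier = column.substring(sep + 1);
        // Families are part of the schema; qualifiers come into existence on first write.
        if (!families.contains(family)) {
            throw new IllegalArgumentException("undeclared column family: " + family);
        }
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .computeIfAbsent(qualifier, k -> new TreeMap<>())
            .put(timestamp, value);
    }

    // Newest version of a cell, or null when the cell was never written (nothing is stored).
    String get(String rowKey, String family, String qualifier) {
        NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> row =
                rows.get(rowKey);
        if (row == null) return null;
        NavigableMap<String, NavigableMap<Long, String>> cols = row.get(family);
        if (cols == null) return null;
        NavigableMap<Long, String> versions = cols.get(qualifier);
        return versions == null ? null : versions.lastEntry().getValue();
    }

    // Row keys in sorted (lexicographic) order.
    Set<String> rowKeys() {
        return rows.navigableKeySet();
    }
}
```

After writing rows `user2`, `user10`, and `user1`, `rowKeys()` yields `user1`, `user10`, `user2`, because row keys sort lexicographically rather than numerically. A write to an undeclared family such as `extra:foo` is rejected, much as real HBase rejects it with `NoSuchColumnFamilyException`.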
Table creation is an administrative operation performed through the `Admin` interface. In HBase 2.x the process uses the builder pattern:
- Obtain an `Admin` handle from the connection.
- Check whether the table already exists using `admin.tableExists(tableName)`, where `tableName` is a `TableName`.
- Build a `ColumnFamilyDescriptor` for each column family using `ColumnFamilyDescriptorBuilder`.
- Build a `TableDescriptor` using `TableDescriptorBuilder`, attaching the column family descriptors.
- Call `admin.createTable(tableDescriptor)`.
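The sequence above can be traced end to end without a cluster. In this sketch, `InMemoryAdmin`, `FamilySpec`, `TableSpec`, and `CreateUsersTable` are hypothetical stand-ins for `Admin`, `ColumnFamilyDescriptor`, and `TableDescriptor`. Descriptor construction is reduced to plain constructors so the focus stays on the order of calls, and the table and family names (`users`, `info`, `metrics`) and version counts are illustrative. Against a real cluster the same steps would use `ConnectionFactory.createConnection(...)`, `connection.getAdmin()`, and the two builder classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical immutable stand-in for a column family descriptor.
final class FamilySpec {
    final String name;
    final int maxVersions;

    FamilySpec(String name, int maxVersions) {
        this.name = name;
        this.maxVersions = maxVersions;
    }
}

// Hypothetical immutable stand-in for a table descriptor.
final class TableSpec {
    final String name;
    final List<FamilySpec> families;

    TableSpec(String name, List<FamilySpec> families) {
        this.name = name;
        this.families = Collections.unmodifiableList(new ArrayList<>(families));
    }
}

// Hypothetical in-memory "Admin": nothing more than a catalog of created tables.
final class InMemoryAdmin {
    private final Map<String, TableSpec> tables = new HashMap<>();

    boolean tableExists(String name) {
        return tables.containsKey(name);
    }

    void createTable(TableSpec spec) {
        // Refuse to overwrite an existing table.
        if (tables.containsKey(spec.name)) {
            throw new IllegalStateException("table exists: " + spec.name);
        }
        tables.put(spec.name, spec);
    }

    TableSpec describe(String name) {
        return tables.get(name);
    }
}

final class CreateUsersTable {
    // The admin handle is passed in (first step); returns true if this call created the table.
    static boolean createIfAbsent(InMemoryAdmin admin) {
        if (admin.tableExists("users")) {                        // existence check
            return false;
        }
        FamilySpec info = new FamilySpec("info", 3);             // family descriptors
        FamilySpec metrics = new FamilySpec("metrics", 1);
        TableSpec users = new TableSpec("users", Arrays.asList(info, metrics)); // table descriptor
        admin.createTable(users);                                // create
        return true;
    }
}
```

Calling `createIfAbsent` twice creates the table once and returns `false` the second time. `InMemoryAdmin.createTable` refuses to overwrite an existing table, mirroring the `TableExistsException` that the real `Admin.createTable` throws in that case.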
Important considerations:
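- Keep the number of column families small (typically 1-3). Each family has its own MemStore and its own set of store files in every region, so every additional family adds flushes, HFiles, and compaction work. In addition, a family that holds most of the data can trigger flushes of sparsely populated sibling families, leaving many small files behind.
- At least one column family must be declared when the table is created. Families can be added or removed later through schema alteration (for example `Admin.addColumnFamily` and `Admin.deleteColumnFamily` in 2.x), but this is comparatively expensive, so plan the families up front.
- The table name must be unique within its namespace. A table created without an explicit namespace belongs to the `default` namespace.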
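The considerations above lend themselves to a pre-flight check before calling `createTable`. `SchemaCheck` below is a hypothetical helper, not an HBase API. Its three-family threshold simply encodes the 1-3 rule of thumb, and the set of existing names stands in for what a real tool would read from the cluster (for example via `Admin.listTableNames()`). Unqualified names are placed in the `default` namespace, matching how HBase resolves a table name without a namespace prefix.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-flight checks reflecting the rules of thumb above (not an HBase API).
final class SchemaCheck {
    static final int RECOMMENDED_MAX_FAMILIES = 3;  // from the "typically 1-3" guidance

    // "users" and "default:users" name the same table.
    static String qualify(String tableName) {
        return tableName.contains(":") ? tableName : "default:" + tableName;
    }

    // Returns human-readable problems; an empty list means the proposed schema passes.
    static List<String> problems(String tableName, List<String> families,
                                 Set<String> existingQualifiedNames) {
        List<String> out = new ArrayList<>();
        if (families.isEmpty()) {
            out.add("at least one column family is required");
        }
        if (families.size() > RECOMMENDED_MAX_FAMILIES) {
            out.add("more than " + RECOMMENDED_MAX_FAMILIES + " column families");
        }
        if (new HashSet<>(families).size() != families.size()) {
            out.add("duplicate column family names");
        }
        // Uniqueness is per namespace, so compare fully qualified names.
        if (existingQualifiedNames.contains(qualify(tableName))) {
            out.add("table already exists: " + qualify(tableName));
        }
        return out;
    }
}
```

With `default:users` already present, `users` is reported as a duplicate while `analytics:users` passes, since the two names live in different namespaces.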
Usage
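Table creation is performed during application setup or schema migration. Guarded by an existence check, it is effectively idempotent: running the setup again leaves an existing table untouched. The check and the creation are separate calls, however, so two processes provisioning concurrently can still race; the losing `createTable` call fails with `TableExistsException`, which setup code may treat as success. Common scenarios include: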
- Initial deployment of an application that requires specific HBase tables.
- Automated provisioning scripts for development or testing environments.
- Schema evolution when new column families are needed.
Theoretical Basis
The HBase table creation model reflects its column-family-oriented storage architecture:
Table "users"
|
|-- Column Family "info" -> stored in HFiles under /hbase/data/default/users/{region}/info/
| |-- qualifier "name"
| |-- qualifier "email"
|
|-- Column Family "metrics" -> stored in HFiles under /hbase/data/default/users/{region}/metrics/
|-- qualifier "login_count"
|-- qualifier "last_seen"
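On disk, each family directory sits inside a per-region directory, `/hbase/data/<namespace>/<table>/<encoded-region-name>/<family>/` (assuming the `/hbase` root used in the diagram), and each region-family pair is a separate store with its own MemStore. The hypothetical `StoreLayout` helper below enumerates those directories, which makes the cost of extra families visible: the store count is regions × families. The region names `r1` and `r2` are placeholders; real encoded region names are hash strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: enumerate the per-store directories for a table.
// Assumed layout: <root>/data/<namespace>/<table>/<encodedRegion>/<family>
final class StoreLayout {
    static List<String> storeDirs(String root, String namespace, String table,
                                  List<String> encodedRegions, List<String> families) {
        List<String> dirs = new ArrayList<>();
        for (String region : encodedRegions) {
            for (String family : families) {
                // One store (a MemStore plus its HFiles) per region per family.
                dirs.add(root + "/data/" + namespace + "/" + table + "/" + region + "/" + family);
            }
        }
        return dirs;
    }
}
```

With two regions and the `info` and `metrics` families, the `users` table has four stores; adding a third family would add one more store to every region.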
The builder API in HBase 2.x replaces the 1.x classes `HTableDescriptor` and `HColumnDescriptor`, which are deprecated as of 2.0:
- HBase 1.x (deprecated): `HTableDescriptor` + `HColumnDescriptor`
- HBase 2.x (current): `TableDescriptorBuilder` + `ColumnFamilyDescriptorBuilder`
The builder pattern ensures immutability of the resulting descriptor objects and provides a fluent API for setting properties.
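The immutability argument can be illustrated with a hand-rolled builder in the same style. `FamilyDescriptor` and its `Builder` are hypothetical and far smaller than the real `ColumnFamilyDescriptorBuilder`; the method names only loosely echo it, and the default values are placeholders for this sketch.

```java
import java.util.Objects;

// Hypothetical family descriptor in the 2.x style: immutable, built via a fluent builder.
final class FamilyDescriptor {
    private final String name;
    private final int maxVersions;
    private final int ttlSeconds;
    private final String compression;

    // Copies the builder's state into final fields; later builder changes cannot leak in.
    private FamilyDescriptor(Builder b) {
        this.name = b.name;
        this.maxVersions = b.maxVersions;
        this.ttlSeconds = b.ttlSeconds;
        this.compression = b.compression;
    }

    String name() { return name; }
    int maxVersions() { return maxVersions; }
    int ttlSeconds() { return ttlSeconds; }
    String compression() { return compression; }

    static Builder newBuilder(String name) {
        return new Builder(name);
    }

    // Start a new builder from an existing descriptor; the original stays unchanged.
    static Builder newBuilder(FamilyDescriptor existing) {
        return new Builder(existing.name)
                .setMaxVersions(existing.maxVersions)
                .setTimeToLive(existing.ttlSeconds)
                .setCompression(existing.compression);
    }

    static final class Builder {
        private final String name;
        private int maxVersions = 1;                 // placeholder defaults for this sketch
        private int ttlSeconds = Integer.MAX_VALUE;
        private String compression = "NONE";

        private Builder(String name) {
            this.name = Objects.requireNonNull(name);
        }

        // Each setter returns the builder so calls can be chained fluently.
        Builder setMaxVersions(int versions) { this.maxVersions = versions; return this; }
        Builder setTimeToLive(int seconds) { this.ttlSeconds = seconds; return this; }
        Builder setCompression(String codec) { this.compression = codec; return this; }

        FamilyDescriptor build() {
            return new FamilyDescriptor(this);
        }
    }
}
```

Because a built descriptor exposes only getters, it can be shared freely; changing a setting means building a new descriptor from the old one, which is the role `TableDescriptorBuilder.newBuilder(TableDescriptor)` and `ColumnFamilyDescriptorBuilder.newBuilder(ColumnFamilyDescriptor)` play in the real API.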