

Principle:Heibaiying BigData Notes HBase Data Deletion

From Leeroopedia


Knowledge Sources
Domains NoSQL, Big_Data
Last Updated 2026-02-10 10:00 GMT

Overview

HBase supports deleting individual cells, entire rows, or whole tables. Cell and row deletions write tombstone markers that hide the data immediately and are resolved during major compaction; table deletion requires disabling the table first.

Description

Deletion in HBase operates at three distinct levels, each with different semantics and administrative requirements:

1. Cell Deletion:

Deleting a specific cell (identified by row key, column family, and qualifier) inserts a tombstone marker rather than physically removing the data. Depending on the delete type, the tombstone hides either a single version of the cell or every version with a timestamp at or below the marker's timestamp. The masked data, along with the tombstone itself, is physically removed only during major compaction, when HBase merges HFiles and discards tombstoned entries.
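The masking rule can be illustrated with a toy in-memory model. This is a minimal sketch, not the HBase client API: the class `CellTombstoneSketch` and all of its methods are hypothetical, and it models only the case where a marker hides every version at or below its timestamp.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of one cell (row + family + qualifier) holding several versions. Hypothetical, not HBase code. */
final class CellTombstoneSketch {
    // timestamp -> value; entries stay stored until the simulated major compaction.
    private final TreeMap<Long, String> versions = new TreeMap<>();
    // Highest timestamp covered by a delete marker; -1 means no marker is present.
    private long tombstoneTs = -1L;

    void put(long ts, String value) {
        versions.put(ts, value);
    }

    /** Logical delete: record a marker and leave the stored versions untouched. */
    void delete(long markerTs) {
        tombstoneTs = Math.max(tombstoneTs, markerTs);
    }

    /** Returns the newest version not masked by the marker, or null if none is visible. */
    String get() {
        Map.Entry<Long, String> newest = versions.lastEntry();
        return (newest != null && newest.getKey() > tombstoneTs) ? newest.getValue() : null;
    }

    /** Number of versions still physically stored, masked or not. */
    int storedVersions() {
        return versions.size();
    }

    /** Simulated major compaction: drop the masked versions and the marker itself. */
    void majorCompact() {
        versions.headMap(tombstoneTs, true).clear();
        tombstoneTs = -1L;
    }
}
```

In this model a delete changes what `get()` returns right away, while `storedVersions()` only shrinks after `majorCompact()`, which mirrors the split between logical and physical removal.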

2. Row Deletion:

Deleting an entire row does not write one tombstone per cell. A Delete object constructed with just a row key results in a delete marker for each column family, and each marker masks every qualifier in that family written at or before the marker's timestamp. As with cell deletion, physical removal occurs during major compaction.
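A small model makes the cost of a row delete concrete: the number of markers tracks the number of column families, not the number of cells. This is a sketch under stated assumptions; `RowTombstoneSketch`, its constructor taking the family names, and the family and qualifier names in the usage are all hypothetical and do not come from the HBase API.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a single row with per-family delete markers. Hypothetical, not HBase code. */
final class RowTombstoneSketch {
    private final String[] schemaFamilies;                            // families defined for the table
    private final Map<String, String> values = new HashMap<>();       // "family:qualifier" -> value
    private final Map<String, Long> writeTs = new HashMap<>();        // "family:qualifier" -> timestamp
    private final Map<String, Long> familyMarkers = new HashMap<>();  // family -> marker timestamp

    RowTombstoneSketch(String... schemaFamilies) {
        this.schemaFamilies = schemaFamilies;
    }

    void put(String family, String qualifier, long ts, String value) {
        String key = family + ":" + qualifier;
        values.put(key, value);
        writeTs.put(key, ts);
    }

    /** Row delete: one marker per column family, regardless of how many cells exist. */
    void deleteRow(long markerTs) {
        for (String family : schemaFamilies) {
            long previous = familyMarkers.getOrDefault(family, -1L);
            familyMarkers.put(family, Math.max(previous, markerTs));
        }
    }

    /** Returns the value unless the family's marker covers the cell's timestamp. */
    String get(String family, String qualifier) {
        String key = family + ":" + qualifier;
        Long ts = writeTs.get(key);
        long marker = familyMarkers.getOrDefault(family, -1L);
        return (ts != null && ts > marker) ? values.get(key) : null;
    }

    int markerCount() {
        return familyMarkers.size();
    }
}
```

For example, a row with three cells spread over the families `info` and `stats` ends up with two markers after `deleteRow(10L)`, and a later `put` with timestamp 11 is visible again because it is newer than the marker.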

3. Table Deletion:

Deleting a table is an administrative operation that permanently removes the table and all its data. HBase requires a two-step process:

  1. Disable the table -- This takes the table offline, closing all regions and flushing MemStores. The table becomes inaccessible for reads and writes.
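  2. Delete the table -- This removes the table's metadata and its data files from the table's directory in HDFS (HBase may first move the files to an archive directory, where a cleaner deletes them later).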

This two-step process is a safety measure to prevent accidental deletion of tables that are actively serving traffic.
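The guard behind the two-step process can be sketched as a tiny state check. The class below is a hypothetical stand-in, not the HBase `Admin` class: the method names only echo the `disableTable` and `deleteTable` calls, and the use of `IllegalStateException` and the `dropIfExists` helper are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for an admin interface that enforces disable-before-delete. Hypothetical, not HBase code. */
final class ToyAdminSketch {
    enum State { ENABLED, DISABLED }

    private final Map<String, State> tables = new HashMap<>();

    void createTable(String name) {
        tables.put(name, State.ENABLED);
    }

    boolean tableExists(String name) {
        return tables.containsKey(name);
    }

    /** Step 1: take the table offline. */
    void disableTable(String name) {
        requireExists(name);
        tables.put(name, State.DISABLED);
    }

    /** Step 2: remove the table; refused while the table is still enabled. */
    void deleteTable(String name) {
        requireExists(name);
        if (tables.get(name) != State.DISABLED) {
            throw new IllegalStateException("table must be disabled before deletion: " + name);
        }
        tables.remove(name);
    }

    /** Convenience wrapper (an assumption of this sketch): performs both steps in order. */
    void dropIfExists(String name) {
        if (!tableExists(name)) {
            return;
        }
        if (tables.get(name) == State.ENABLED) {
            disableTable(name);
        }
        deleteTable(name);
    }

    private void requireExists(String name) {
        if (!tables.containsKey(name)) {
            throw new IllegalArgumentException("no such table: " + name);
        }
    }
}
```

Keeping the two steps as separate calls means that a stray `deleteTable` against a live table fails loudly instead of destroying data, which is the safety property described above.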

Important characteristics of HBase deletion:

  • Tombstone-based -- Deletes are not immediate physical removals; they are logical markers.
  • Version-aware -- Delete markers interact with HBase's versioning system; a marker masks versions at or before its own timestamp, so a later write with a newer timestamp is visible again.
  • Deferred space reclamation -- The delete is visible to readers immediately, but disk space is reclaimed only during major compaction.
  • Irreversible for tables -- Once a table is deleted, its data cannot be recovered (unless HDFS snapshots or backups exist).

Usage

  • Use cell deletion to remove specific column values while preserving the rest of the row.
  • Use row deletion to remove an entire entity represented by a row key.
  • Use table deletion during schema cleanup, decommissioning of features, or in test environments for teardown.

Always verify that a table is no longer needed before deleting it, as the operation is destructive and irreversible.

Theoretical Basis

HBase's deletion model is based on the Log-Structured Merge (LSM) tree architecture:

Delete Operation:
    Client Delete -> RegionServer
        |-- Write tombstone to WAL
        |-- Write tombstone to MemStore
        (tombstone stored alongside regular data)

Read-time behavior:
    Get/Scan encounters tombstone -> cell is masked (not returned)

Compaction-time behavior:
    Major compaction -> tombstoned cells are physically removed
    Minor compaction -> tombstones are retained; masked data may persist
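The write, read, and compaction behavior above can be sketched end to end with a toy store. This is a simplified model under stated assumptions, not HBase internals: `LsmDeleteSketch` and its methods are hypothetical, the WAL is omitted, and the simulated minor compaction simply merges the two oldest files while keeping every entry.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy LSM-style store: a MemStore plus flushed files, with tombstones. Hypothetical, not HBase code. */
final class LsmDeleteSketch {
    /** One stored entry; a null value marks a tombstone. */
    static final class Entry {
        final String key;
        final long ts;
        final String value;
        Entry(String key, long ts, String value) { this.key = key; this.ts = ts; this.value = value; }
        boolean isTombstone() { return value == null; }
    }

    private final List<Entry> memStore = new ArrayList<>();
    private final List<List<Entry>> files = new ArrayList<>();  // stand-ins for HFiles, oldest first

    void put(String key, long ts, String value) { memStore.add(new Entry(key, ts, value)); }

    /** A delete is just another write: the tombstone sits alongside regular data. */
    void delete(String key, long ts) { memStore.add(new Entry(key, ts, null)); }

    void flush() {
        if (!memStore.isEmpty()) {
            files.add(new ArrayList<>(memStore));
            memStore.clear();
        }
    }

    /** Read path: newest value for the key, unless a tombstone at or after its timestamp masks it. */
    String get(String key) {
        List<Entry> all = allEntries();
        Entry newest = null;
        for (Entry e : all) {
            if (e.key.equals(key) && !e.isTombstone() && (newest == null || e.ts > newest.ts)) newest = e;
        }
        return (newest != null && newest.ts > maxTombstone(all, key)) ? newest.value : null;
    }

    /** Simplified minor compaction: merge the two oldest files, keeping tombstones and data. */
    void minorCompact() {
        if (files.size() < 2) return;
        List<Entry> merged = new ArrayList<>(files.remove(0));
        merged.addAll(files.remove(0));
        files.add(0, merged);
    }

    /** Major compaction: rewrite everything into one file without masked values or tombstones. */
    void majorCompact() {
        flush();
        List<Entry> all = allEntries();
        List<Entry> merged = new ArrayList<>();
        for (Entry e : all) {
            if (!e.isTombstone() && e.ts > maxTombstone(all, e.key)) merged.add(e);
        }
        files.clear();
        if (!merged.isEmpty()) files.add(merged);
    }

    int physicalEntries() { return allEntries().size(); }

    private List<Entry> allEntries() {
        List<Entry> all = new ArrayList<>();
        for (List<Entry> f : files) all.addAll(f);
        all.addAll(memStore);
        return all;
    }

    private static long maxTombstone(List<Entry> entries, String key) {
        long max = -1L;
        for (Entry e : entries) {
            if (e.key.equals(key) && e.isTombstone()) max = Math.max(max, e.ts);
        }
        return max;
    }
}
```

In this model `get` stops returning a deleted key as soon as the tombstone is written, `physicalEntries()` stays unchanged across `minorCompact()`, and only `majorCompact()` reduces it.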

Table deletion lifecycle:

Table State: ENABLED (serving reads/writes)
    |
    v  admin.disableTable()
Table State: DISABLED (offline, regions closed, MemStores flushed)
    |
    v  admin.deleteTable()
Table State: DELETED (metadata and HDFS files removed)

The disable-before-delete requirement ensures:

  • All in-memory data (MemStore) is flushed before removal.
  • No active region assignments exist that could conflict with deletion.
  • The operation is explicit and intentional, reducing the risk of accidental data loss.

Space reclamation timeline:

Delete Type      | Logical Effect                                | Physical Removal
Cell/Row delete  | Immediate (tombstone hides data)              | Next major compaction
Table delete     | Immediate (table inaccessible after disable)  | Immediate (HDFS files removed on deleteTable)

