Principle:Datahub project Datahub Stack Lifecycle Management

From Leeroopedia
Revision as of 17:38, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Datahub_project_Datahub_Stack_Lifecycle_Management.md)


Field | Value
Principle Name | Stack Lifecycle Management
Namespace | Datahub_project_Datahub
Workflow | Docker_Quickstart_Deployment
Type | Principle
Last Updated | 2026-02-10
Source Repository | datahub-project/datahub
Domains | Deployment, Docker, Metadata_Management

Overview

Stack lifecycle management is the set of operations for stopping, destroying, backing up, and restoring a DataHub Docker deployment. It covers the full deployment lifecycle: stop (pause without data loss), nuke (complete teardown, including volumes by default), backup (SQL dump), and restore (reload from backup). These operations manage both containers and persistent data volumes.

Description

A local DataHub deployment is a stateful system -- metadata is persisted in MySQL and indexed in Elasticsearch. Managing the lifecycle of this deployment requires operations that handle both the container runtime and the persistent data.

Lifecycle States

The DataHub Docker stack transitions through the following states:

[Not Deployed] --quickstart--> [Running] --stop--> [Stopped]
     ^                            |  ^                 |
     |                            |  +---quickstart----+
     |                            |                    |
     +------------nuke------------+                    |
     +------------nuke---------------------------------+

[Running] --backup-->  [Backup File]
[Running] <--restore-- [Backup File]

Nuke removes the data volumes unless --keep-data is specified.
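
The transitions in this section can be written down as a small lookup, which makes it easy to check which operation is valid from which state. The sketch below is only a model for illustration: the function name `next_state` and the state labels are hypothetical and are not part of the DataHub CLI. It assumes that backup and restore both require a running stack, and that quickstart brings a stopped stack back to running.

```shell
# Hypothetical model of the lifecycle transitions (not part of the DataHub CLI).
# next_state STATE OPERATION prints the resulting state, or returns 1 when the
# operation is not valid from that state.
next_state() {
  local state="$1" op="$2"
  case "$state:$op" in
    not_deployed:quickstart)        echo running ;;
    stopped:quickstart)             echo running ;;
    running:stop)                   echo stopped ;;
    running:nuke|stopped:nuke)      echo not_deployed ;;
    # Assumption: backup and restore act on a running stack and leave it running.
    running:backup|running:restore) echo running ;;
    *)                              return 1 ;;
  esac
}
```

For example, `next_state running stop` prints `stopped`, while `next_state stopped backup` prints nothing and returns a non-zero status.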

Stop

Stops all running containers without removing them or their data volumes. This is equivalent to docker compose stop. Containers can be restarted later with datahub docker quickstart without re-downloading images or losing data.

Nuke

Complete teardown of the DataHub deployment. Removes:

  1. Containers -- All containers matching the compose project label
  2. Volumes -- All data volumes (MySQL data, Elasticsearch indices, Kafka data, Zookeeper data) unless --keep-data is specified
  3. Networks -- All Docker networks created by the compose project

The nuke operation also handles legacy volume naming conventions (e.g., datahub_mysqldata, datahub_esdata) for backward compatibility.
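
To make the three removal steps concrete, the sketch below prints plain Docker commands that would approximate them by hand. It is not how the CLI is implemented. The helper name `nuke_plan` and the default project name `datahub` are assumptions, and the commands are only printed, never executed. The label `com.docker.compose.project` is the one Docker Compose attaches to the resources it creates; legacy volume names are not covered here.

```shell
# Illustrative sketch only: prints Docker commands that approximate a nuke,
# without running them. "nuke_plan" and the default project name "datahub"
# are hypothetical; the real CLI performs the cleanup itself.
nuke_plan() {
  local project="${1:-datahub}" keep_data="${2:-false}"
  local filter="label=com.docker.compose.project=${project}"

  # 1. Containers carrying the compose project label.
  #    (xargs -r skips the command when the list is empty; GNU xargs.)
  echo "docker ps -a -q --filter ${filter} | xargs -r docker rm -f"

  # 2. Volumes, skipped when the caller asks to keep data.
  if [ "$keep_data" != "true" ]; then
    echo "docker volume ls -q --filter ${filter} | xargs -r docker volume rm"
  fi

  # 3. Networks created by the compose project.
  echo "docker network ls -q --filter ${filter} | xargs -r docker network rm"
}
```

Calling `nuke_plan datahub true` mirrors the `--keep-data` case: the volume step is left out of the printed plan.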

Backup

Creates a MySQL dump of the DataHub metadata database. The backup is performed by executing mysqldump inside the running MySQL container and writing the output to a local file (default: ~/.datahub/quickstart/backup.sql).
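
Because the default target is a single file, it can be convenient to give each backup its own name. The helper below is a minimal sketch of that idea. `backup_path` is a hypothetical name, and the file naming scheme is an assumption rather than anything the CLI prescribes.

```shell
# Hypothetical helper: builds a timestamped backup path so each backup gets
# its own file instead of reusing the default backup.sql location.
backup_path() {
  local dir="${1:-$HOME/.datahub/quickstart}"
  echo "${dir}/backup-$(date +%Y%m%d-%H%M%S).sql"
}

# Usage (requires a running DataHub stack, so it is left commented out):
#   datahub docker quickstart --backup --backup-file "$(backup_path)"
```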

Restore

Restores metadata from a backup file. This is a two-phase process:

  1. Primary restore -- Loads the SQL dump back into MySQL
  2. Index rebuild -- Runs the RestoreIndices upgrade job to rebuild Elasticsearch indices from the restored database

The index rebuild can be skipped with --no-restore-indices or run independently with --restore-indices.
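
The way these flags select phases can be modelled in a few lines. This is a sketch of the behavior described above, not the CLI's actual option parsing. The function `restore_phases` and the phase labels `primary` and `indices` are hypothetical.

```shell
# Hypothetical model of how the restore flags select phases.
# Prints one line per phase that would run: "primary" and/or "indices".
restore_phases() {
  local restore=false only_indices=false skip_indices=false arg
  for arg in "$@"; do
    case "$arg" in
      --restore)            restore=true ;;
      --restore-indices)    only_indices=true ;;
      --no-restore-indices) skip_indices=true ;;
      *) echo "unknown flag: $arg" >&2; return 2 ;;
    esac
  done

  if [ "$restore" = true ]; then
    echo "primary"                          # load the SQL dump into MySQL
    [ "$skip_indices" = true ] || echo "indices"  # then rebuild the indices
  elif [ "$only_indices" = true ]; then
    echo "indices"                          # rebuild only, no primary restore
  fi
}
```

With `--restore` alone, both phases are listed in order. Adding `--no-restore-indices` leaves only `primary`, and `--restore-indices` on its own yields only `indices`.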

Usage

Use these commands when managing the lifecycle of a local DataHub deployment: stopping for maintenance, destroying for a clean restart, or preserving data.

# Stop without destroying data
datahub docker quickstart --stop

# Complete teardown (removes all data)
datahub docker nuke

# Teardown but keep data volumes
datahub docker nuke --keep-data

# Backup metadata
datahub docker quickstart --backup
datahub docker quickstart --backup --backup-file ~/my-backup.sql

# Restore from backup (includes index rebuild)
datahub docker quickstart --restore
datahub docker quickstart --restore --restore-file ~/my-backup.sql

# Restore without rebuilding indices
datahub docker quickstart --restore --no-restore-indices

# Only rebuild indices (no primary restore)
datahub docker quickstart --restore-indices

Theoretical Basis

This principle follows the lifecycle management pattern -- stateful services require explicit operations for each state transition (running to stopped to destroyed) with data persistence options. The pattern addresses three concerns:

  1. State preservation -- Stopping vs. destroying determines whether data survives the operation
  2. Data portability -- Backup/restore enables moving data between environments or recovering from failures
  3. Clean slate -- Nuke provides a guaranteed clean starting point for troubleshooting or version upgrades

The backup/restore mechanism specifically addresses the challenge of upgrading stateful containers, where schema changes between versions may require explicit migration through a database dump and reload.

Related Pages

Implementation:Datahub_project_Datahub_Docker_CLI_Lifecycle
