Principle:Datahub project Datahub Stack Lifecycle Management
| Field | Value |
|---|---|
| Principle Name | Stack Lifecycle Management |
| Namespace | Datahub_project_Datahub |
| Workflow | Docker_Quickstart_Deployment |
| Type | Principle |
| Last Updated | 2026-02-10 |
| Source Repository | datahub-project/datahub |
| Domains | Deployment, Docker, Metadata_Management |
Overview
Stack lifecycle management is the set of operations for stopping, destroying, backing up, and restoring a DataHub Docker deployment. It covers the full deployment lifecycle: stop (pause without data loss), nuke (complete teardown including volumes), backup (SQL dump), and restore (reload from backup). These operations manage both containers and persistent data volumes.
Description
A local DataHub deployment is a stateful system -- metadata is persisted in MySQL and indexed in Elasticsearch. Managing the lifecycle of this deployment requires operations that handle both the container runtime and the persistent data.
Lifecycle States
The DataHub Docker stack transitions through the following states:
- [Not Deployed] --quickstart--> [Running]
- [Running] --stop--> [Stopped]
- [Running] --nuke (no data)--> [Not Deployed]
- [Stopped] --nuke (no data)--> [Not Deployed]
- [Running] --backup--> [Backup File]
- [Backup File] --restore--> [Running]
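The transitions above can be sketched as a small lookup table. This is an illustrative model only, not code from the DataHub CLI: the `StackState` enum, the `TRANSITIONS` table, and `next_state` are hypothetical names, and treating backup and restore as operations that leave the stack running is an assumption drawn from the diagram.

```python
from enum import Enum


class StackState(Enum):
    NOT_DEPLOYED = "not deployed"
    RUNNING = "running"
    STOPPED = "stopped"


# (current state, operation) -> next state.
# Assumption: backup and restore move data but leave the stack running.
TRANSITIONS = {
    (StackState.NOT_DEPLOYED, "quickstart"): StackState.RUNNING,
    (StackState.STOPPED, "quickstart"): StackState.RUNNING,
    (StackState.RUNNING, "stop"): StackState.STOPPED,
    (StackState.RUNNING, "nuke"): StackState.NOT_DEPLOYED,
    (StackState.STOPPED, "nuke"): StackState.NOT_DEPLOYED,
    (StackState.RUNNING, "backup"): StackState.RUNNING,
    (StackState.RUNNING, "restore"): StackState.RUNNING,
}


def next_state(state: StackState, operation: str) -> StackState:
    """Return the state reached by an operation, or raise if it is not allowed."""
    try:
        return TRANSITIONS[(state, operation)]
    except KeyError:
        raise ValueError(
            f"cannot {operation} from state {state.value!r}"
        ) from None
```

Keeping the transitions in one table makes invalid requests, such as a backup on a stack that was never deployed, fail early with a clear message.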
Stop
Stops all running containers without removing them or their data volumes. This is equivalent to docker compose stop. Containers can be restarted later with datahub docker quickstart without re-downloading images or losing data.
Nuke
Complete teardown of the DataHub deployment. Removes:
- Containers -- All containers matching the compose project label
- Volumes -- All data volumes (MySQL data, Elasticsearch indices, Kafka data, Zookeeper data) unless --keep-data is specified
- Networks -- All Docker networks created by the compose project
The nuke operation also handles legacy volume naming conventions (e.g., datahub_mysqldata, datahub_esdata) for backward compatibility.
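As a rough sketch of how the resources targeted by a nuke might be discovered, the function below builds `docker` CLI listing commands filtered by the compose project label. It is not the CLI's actual implementation. The project name `datahub` inside `PROJECT_LABEL` is an assumption, `nuke_commands` is a hypothetical helper, and looking up legacy volumes by name assumes they may lack the compose label.

```python
from typing import List

# Assumption: the compose project is named "datahub".
PROJECT_LABEL = "com.docker.compose.project=datahub"
# Legacy volume names mentioned in the text above.
LEGACY_VOLUMES = ["datahub_mysqldata", "datahub_esdata"]


def nuke_commands(keep_data: bool = False) -> List[List[str]]:
    """Build docker commands that list the resources a nuke would remove."""
    label_filter = f"label={PROJECT_LABEL}"
    commands = [
        # All containers (running or stopped) in the compose project.
        ["docker", "ps", "--all", "--quiet", "--filter", label_filter],
    ]
    if not keep_data:
        # Labelled data volumes, skipped when data should be kept.
        commands.append(
            ["docker", "volume", "ls", "--quiet", "--filter", label_filter]
        )
        # Legacy volumes are looked up by name (assumed to be unlabelled).
        for name in LEGACY_VOLUMES:
            commands.append(
                ["docker", "volume", "ls", "--quiet", "--filter", f"name={name}"]
            )
    # Networks created by the compose project.
    commands.append(
        ["docker", "network", "ls", "--quiet", "--filter", label_filter]
    )
    return commands
```

The IDs these commands print would then be passed to `docker rm`, `docker volume rm`, and `docker network rm` respectively. With `keep_data=True` no volume command is produced, which mirrors the effect of `--keep-data`.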
Backup
Creates a MySQL dump of the DataHub metadata database. The backup is performed by executing mysqldump inside the running MySQL container and writing the output to a local file (default: ~/.datahub/quickstart/backup.sql).
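The backup step can be approximated as running `mysqldump` through `docker exec` and redirecting its output to a file. The sketch below is a hedged illustration rather than the CLI's own code: `mysqldump_command` and `backup` are hypothetical helpers, and the container name, credentials, and database name are deliberately left as parameters because the source does not state them.

```python
import subprocess
from pathlib import Path
from typing import List

# Default backup location stated in the text above.
DEFAULT_BACKUP = Path.home() / ".datahub" / "quickstart" / "backup.sql"


def mysqldump_command(
    container: str, user: str, password: str, database: str
) -> List[str]:
    """Build the argv that runs mysqldump inside a running container."""
    return [
        "docker", "exec", container,
        "mysqldump", f"--user={user}", f"--password={password}", database,
    ]


def backup(
    container: str,
    user: str,
    password: str,
    database: str,
    backup_file: Path = DEFAULT_BACKUP,
) -> Path:
    """Write the dump to backup_file; requires Docker and a running container."""
    backup_file.parent.mkdir(parents=True, exist_ok=True)
    with open(backup_file, "wb") as out:
        # check=True raises CalledProcessError if mysqldump fails.
        subprocess.run(
            mysqldump_command(container, user, password, database),
            stdout=out,
            check=True,
        )
    return backup_file
```

Because the dump is produced inside the container, the MySQL service must be running when the backup is taken.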
Restore
Restores metadata from a backup file. This is a two-phase process:
- Primary restore -- Loads the SQL dump back into MySQL
- Index rebuild -- Runs the RestoreIndices upgrade job to rebuild Elasticsearch indices from the restored database
The index rebuild can be skipped with --no-restore-indices or run independently with --restore-indices.
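The way the flags select phases can be summarized in a few lines. This is a simplified model of the behaviour described above, not the CLI's real option handling. `restore_phases` is a hypothetical helper, and representing the index flag as `True`, `False`, or `None` (neither flag given) is an assumption made for illustration.

```python
from typing import List, Optional


def restore_phases(restore: bool, restore_indices: Optional[bool]) -> List[str]:
    """Return the phases to run for a given flag combination.

    restore:          True when --restore is passed.
    restore_indices:  True for --restore-indices, False for
                      --no-restore-indices, None when neither is passed.
    """
    phases = []
    if restore:
        phases.append("primary")
    # The index rebuild follows a primary restore by default, and can
    # also be requested on its own.
    if restore_indices is True or (restore and restore_indices is None):
        phases.append("indices")
    return phases
```

For example, `restore_phases(True, False)` yields only the primary restore, while `restore_phases(False, True)` yields only the index rebuild.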
Usage
Use these operations when managing the lifecycle of a local DataHub deployment: stopping it for maintenance, destroying it for a clean restart, or preserving its data.
# Stop without destroying data
datahub docker quickstart --stop
# Complete teardown (removes all data)
datahub docker nuke
# Teardown but keep data volumes
datahub docker nuke --keep-data
# Backup metadata
datahub docker quickstart --backup
datahub docker quickstart --backup --backup-file ~/my-backup.sql
# Restore from backup (includes index rebuild)
datahub docker quickstart --restore
datahub docker quickstart --restore --restore-file ~/my-backup.sql
# Restore without rebuilding indices
datahub docker quickstart --restore --no-restore-indices
# Only rebuild indices (no primary restore)
datahub docker quickstart --restore-indices
Theoretical Basis
This principle follows the lifecycle management pattern -- stateful services require explicit operations for each state transition (running to stopped to destroyed) with data persistence options. The pattern addresses three concerns:
- State preservation -- Stopping vs. destroying determines whether data survives the operation
- Data portability -- Backup/restore enables moving data between environments or recovering from failures
- Clean slate -- Nuke provides a guaranteed clean starting point for troubleshooting or version upgrades
The backup/restore mechanism specifically addresses the challenge of upgrading stateful containers, where schema changes between versions may require an explicit migration through a database dump and reload.
Knowledge Sources
- DataHub GitHub Repository
- DataHub Official Documentation
- Source file: metadata-ingestion/src/datahub/cli/docker_cli.py
Related Pages
- Implemented by: Datahub_project_Datahub_Docker_CLI_Lifecycle