Implementation: Datahub_project_Datahub Docker CLI Lifecycle
| Field | Value |
|---|---|
| Implementation Name | Docker CLI Lifecycle |
| Namespace | Datahub_project_Datahub |
| Workflow | Docker_Quickstart_Deployment |
| Type | API Doc |
| Language | Python |
| Last Updated | 2026-02-10 |
| Source Repository | datahub-project/datahub |
| Source File | metadata-ingestion/src/datahub/cli/docker_cli.py, lines 906-940 (nuke) and lines 255-436, 594-641 (stop/backup/restore within quickstart) |
| Domains | Deployment, Docker, Metadata_Management |
Overview
The lifecycle management functions provide stop, nuke, backup, and restore operations for the DataHub Docker deployment. These are implemented as the nuke() command and as behavioral branches within the quickstart() command (stop, backup, restore).
Functions
nuke()
```python
@docker.command()
@telemetry.with_telemetry()
@click.option(
    "--keep-data",
    type=bool,
    is_flag=True,
    default=False,
    help="Delete data volumes",
)
def nuke(keep_data: bool) -> None:
    """Remove all Docker containers, networks, and volumes associated with DataHub."""
```
CLI Usage:
```shell
datahub docker nuke [--keep-data]
```
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| --keep-data | bool | False | If set, preserves data volumes during teardown. |
Implementation (lines 906-940):
```python
with get_docker_client() as client:
    # 1. Remove all containers matching the project filter
    for container in client.containers.list(
        all=True, filters=DATAHUB_COMPOSE_PROJECT_FILTER
    ):
        container.remove(v=True, force=True)

    # 2. Remove volumes (unless --keep-data)
    if not keep_data:
        for filter in DATAHUB_COMPOSE_LEGACY_VOLUME_FILTERS + [
            DATAHUB_COMPOSE_PROJECT_FILTER
        ]:
            for volume in client.volumes.list(filters=filter):
                volume.remove(force=True)

    # 3. Remove networks
    for network in client.networks.list(filters=DATAHUB_COMPOSE_PROJECT_FILTER):
        network.remove()
```
The nuke operation uses the Docker SDK directly (not Docker Compose) to remove resources. It handles both current and legacy volume naming conventions.
Legacy Volume Filters:
| Volume Name | Service |
|---|---|
| datahub_neo4jdata | Neo4j (legacy graph store) |
| datahub_mysqldata | MySQL primary store |
| datahub_zkdata | Zookeeper |
| datahub_esdata | Elasticsearch |
| datahub_cassandradata | Cassandra (legacy) |
| datahub_broker | Kafka broker |
Stop (within quickstart)
CLI Usage:
```shell
datahub docker quickstart --stop
```
Implementation:
When --stop is passed to quickstart(), it delegates to _attempt_stop() (lines 255-292):
```python
def _attempt_stop(quickstart_compose_file: List[pathlib.Path]) -> None:
    compose = _docker_compose_v2()
    # compose_files_for_stopping is derived from quickstart_compose_file
    # earlier in the function (derivation elided here)
    base_command = [
        *compose,
        "--profile", "quickstart",
        *itertools.chain.from_iterable(
            ("-f", f"{path}") for path in compose_files_for_stopping
        ),
        "-p", DOCKER_COMPOSE_PROJECT_NAME,
    ]
    subprocess.run(
        [*base_command, "stop"],
        check=True,
        env=_docker_subprocess_env(),
    )
```
Uses `docker compose stop`, which halts containers without removing them or their volumes.
Backup (within quickstart)
CLI Usage:
```shell
datahub docker quickstart --backup [--backup-file PATH]
```
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| --backup | bool | False | Trigger backup operation. |
| --backup-file | Path | ~/.datahub/quickstart/backup.sql | Output file for the backup. |
Implementation (lines 295-311):
```python
def _backup(backup_file: str) -> int:
    resolved_backup_file = os.path.expanduser(backup_file)
    dirname = os.path.dirname(resolved_backup_file)
    os.makedirs(dirname, exist_ok=True)
    result = subprocess.run(
        [
            "bash", "-c",
            f"docker exec {DOCKER_COMPOSE_PROJECT_NAME}-mysql-1 "
            f"mysqldump -u root -pdatahub datahub > {resolved_backup_file}",
        ]
    )
    return result.returncode
```
Executes mysqldump inside the MySQL container to dump the datahub database. Uses default MySQL credentials (root/datahub).
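The command-string construction can be isolated and inspected without Docker running. A minimal sketch, assuming the default project name "datahub" (so the container is datahub-mysql-1, per the excerpt above):

```python
import os

DOCKER_COMPOSE_PROJECT_NAME = "datahub"  # assumption: default project name

def build_backup_command(backup_file: str) -> str:
    """Mirror _backup(): expand ~ in the target path, then build the shell
    command that dumps the `datahub` database out of the MySQL container."""
    resolved = os.path.expanduser(backup_file)
    return (
        f"docker exec {DOCKER_COMPOSE_PROJECT_NAME}-mysql-1 "
        f"mysqldump -u root -pdatahub datahub > {resolved}"
    )

cmd = build_backup_command("~/.datahub/quickstart/backup.sql")
```

The shell redirection (`>`) is why the command is run through `bash -c` rather than passed directly as an argv list.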
Restore (within quickstart)
CLI Usage:
```shell
datahub docker quickstart --restore [--restore-file PATH] [--no-restore-indices]
datahub docker quickstart --restore-indices
```
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| --restore | bool | False | Trigger primary database restore. |
| --restore-file | str | ~/.datahub/quickstart/backup.sql | Input file for the restore. |
| --restore-indices | bool | False | Rebuild search indices from database. |
| --no-restore-indices | bool | False | Skip index rebuild during restore. |
Implementation (lines 314-436):
The _restore() function performs two phases:
Phase 1 -- Primary Database Restore:
```python
with open(resolved_restore_file) as fp:
    result = subprocess.run(
        [
            "bash", "-c",
            f"docker exec -i {DOCKER_COMPOSE_PROJECT_NAME}-mysql-1 "
            f"bash -c 'mysql -uroot -pdatahub datahub'",
        ],
        stdin=fp,
        capture_output=True,
    )
```
Pipes the SQL dump file into mysql running inside the MySQL container.
Phase 2 -- Index Rebuild:
Pulls and runs the acryl-datahub-upgrade Docker image with the RestoreIndices upgrade command. The container connects to the DataHub Docker network and uses environment variables for database and Elasticsearch connection details.
```shell
# Simplified
docker run --network datahub_network --env-file {env_fp.name} \
  acryl-datahub-upgrade:${DATAHUB_VERSION:-head} \
  -u RestoreIndices -a clean
```
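The `docker run` invocation above can be sketched as an argv builder. The network name, image tag fallback, and env-file handling are taken from the simplified command; the function name and signature are assumptions:

```python
import os

def build_restore_indices_command(env_file: str, version: str = None) -> list:
    """Assemble the argv for the index-rebuild container (sketch)."""
    # Fall back to the "head" tag when no version is given, mirroring
    # ${DATAHUB_VERSION:-head} in the shell form above.
    tag = version or os.environ.get("DATAHUB_VERSION", "head")
    return [
        "docker", "run",
        "--network", "datahub_network",
        "--env-file", env_file,
        f"acryl-datahub-upgrade:{tag}",
        "-u", "RestoreIndices", "-a", "clean",
    ]

cmd = build_restore_indices_command("/tmp/restore.env", version="head")
```

Joining the DataHub compose network is what lets the upgrade container reach MySQL and Elasticsearch by their service hostnames.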
Restore Option Validation:
The valid_restore_options() function (lines 841-861) validates that flag combinations are sensible:
- --no-restore-indices without --restore is invalid
- --restore-indices with --no-restore-indices is contradictory
- --restore with --restore-indices is redundant (restore implies index rebuild)
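The three rules can be expressed as a small pure predicate. This is a sketch of the logic only; the real valid_restore_options() signature and return convention may differ:

```python
def valid_restore_options(
    restore: bool, restore_indices: bool, no_restore_indices: bool
) -> bool:
    """Return True if the restore flag combination is sensible (sketch)."""
    if no_restore_indices and not restore:
        return False  # --no-restore-indices only makes sense with --restore
    if restore_indices and no_restore_indices:
        return False  # contradictory flags
    if restore and restore_indices:
        return False  # redundant: --restore already implies an index rebuild
    return True
```

Under these rules `--restore` alone and `--restore-indices` alone are both valid, matching the usage examples later in this document.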
Constants
| Constant | Value | Description |
|---|---|---|
| DOCKER_COMPOSE_PROJECT_NAME | "datahub" (default) | From DATAHUB_COMPOSE_PROJECT_NAME env var |
| DATAHUB_COMPOSE_PROJECT_FILTER | {"label": "com.docker.compose.project=datahub"} | Docker label filter for project containers |
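A sketch of how these constants could be derived at import time: the env var name comes from the table above, while the exact derivation is an assumption:

```python
import os

# Project name is overridable via DATAHUB_COMPOSE_PROJECT_NAME (assumed mechanism).
DOCKER_COMPOSE_PROJECT_NAME = os.environ.get(
    "DATAHUB_COMPOSE_PROJECT_NAME", "datahub"
)

# Docker Compose labels every resource it creates with its project name,
# so a single label filter matches all containers, volumes, and networks.
DATAHUB_COMPOSE_PROJECT_FILTER = {
    "label": f"com.docker.compose.project={DOCKER_COMPOSE_PROJECT_NAME}"
}
```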
Usage Examples
```shell
# Stop containers (preserves data and containers)
datahub docker quickstart --stop

# Complete teardown (removes everything)
datahub docker nuke

# Teardown but keep data volumes
datahub docker nuke --keep-data

# Backup current database
datahub docker quickstart --backup

# Backup to custom location
datahub docker quickstart --backup --backup-file ~/datahub-backup-2026-02-10.sql

# Full restore (database + index rebuild)
datahub docker quickstart --restore

# Restore from custom file
datahub docker quickstart --restore --restore-file ~/datahub-backup-2026-02-10.sql

# Restore database only (skip index rebuild)
datahub docker quickstart --restore --no-restore-indices

# Only rebuild indices (database already restored)
datahub docker quickstart --restore-indices
```