
Implementation:Datahub project Datahub Docker CLI Lifecycle

From Leeroopedia


Implementation Name: Docker CLI Lifecycle
Namespace: Datahub_project_Datahub
Workflow: Docker_Quickstart_Deployment
Type: API Doc
Language: Python
Last Updated: 2026-02-10
Source Repository: datahub-project/datahub
Source File: metadata-ingestion/src/datahub/cli/docker_cli.py, lines 906-940 (nuke) and lines 255-436, 594-641 (stop/backup/restore within quickstart)
Domains: Deployment, Docker, Metadata_Management

Overview

The lifecycle management functions provide stop, nuke, backup, and restore operations for the DataHub Docker deployment. These are implemented as the nuke() command and as behavioral branches within the quickstart() command (stop, backup, restore).

Functions

nuke()

@docker.command()
@telemetry.with_telemetry()
@click.option(
    "--keep-data",
    type=bool,
    is_flag=True,
    default=False,
    help="Preserve data volumes during teardown",
)
def nuke(keep_data: bool) -> None:
    """Remove all Docker containers, networks, and volumes associated with DataHub."""

CLI Usage:

datahub docker nuke [--keep-data]

Parameters:

--keep-data (bool, default: False): If set, preserves data volumes during teardown.

Implementation (lines 906-940):

with get_docker_client() as client:
    # 1. Remove all containers matching the project filter
    for container in client.containers.list(
        all=True, filters=DATAHUB_COMPOSE_PROJECT_FILTER
    ):
        container.remove(v=True, force=True)

    # 2. Remove volumes (unless --keep-data)
    if not keep_data:
        for filter in DATAHUB_COMPOSE_LEGACY_VOLUME_FILTERS + [
            DATAHUB_COMPOSE_PROJECT_FILTER
        ]:
            for volume in client.volumes.list(filters=filter):
                volume.remove(force=True)

    # 3. Remove networks
    for network in client.networks.list(filters=DATAHUB_COMPOSE_PROJECT_FILTER):
        network.remove()

The nuke operation uses the Docker SDK directly (not Docker Compose) to remove resources. It handles both current and legacy volume naming conventions.

Legacy Volume Filters:

datahub_neo4jdata: Neo4j (legacy graph store)
datahub_mysqldata: MySQL primary store
datahub_zkdata: Zookeeper
datahub_esdata: Elasticsearch
datahub_cassandradata: Cassandra (legacy)
datahub_broker: Kafka broker
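
The filter constants referenced in the nuke() excerpt can be sketched from this table. The shapes below follow the Docker SDK's filters argument; the exact values in the DataHub source may differ, so treat this as a reconstruction, not the verbatim constants.

```python
# Hypothetical reconstruction of the filter constants used by nuke();
# names come from this page, exact values are assumptions.
DATAHUB_COMPOSE_PROJECT_FILTER = {"label": "com.docker.compose.project=datahub"}

LEGACY_VOLUME_NAMES = [
    "datahub_neo4jdata",
    "datahub_mysqldata",
    "datahub_zkdata",
    "datahub_esdata",
    "datahub_cassandradata",
    "datahub_broker",
]

# One name-based filter per legacy volume, matching the table above.
DATAHUB_COMPOSE_LEGACY_VOLUME_FILTERS = [
    {"name": name} for name in LEGACY_VOLUME_NAMES
]

print(len(DATAHUB_COMPOSE_LEGACY_VOLUME_FILTERS))  # 6
```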

Stop (within quickstart)

CLI Usage:

datahub docker quickstart --stop

Implementation:

When --stop is passed to quickstart(), it delegates to _attempt_stop() (lines 255-292):

def _attempt_stop(quickstart_compose_file: List[pathlib.Path]) -> None:
    compose = _docker_compose_v2()
    base_command = [
        *compose,
        "--profile", "quickstart",
        *itertools.chain.from_iterable(
            ("-f", f"{path}") for path in quickstart_compose_file
        ),
        "-p", DOCKER_COMPOSE_PROJECT_NAME,
    ]
    subprocess.run(
        [*base_command, "stop"],
        check=True,
        env=_docker_subprocess_env(),
    )

This runs docker compose stop, which halts the containers without removing them or their volumes.
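
The argv that _attempt_stop() assembles can be reproduced without invoking Docker at all. The helper below is hypothetical (not in the DataHub source) and assumes Compose v2 resolves to the "docker compose" subcommand:

```python
import itertools
import pathlib
from typing import List

DOCKER_COMPOSE_PROJECT_NAME = "datahub"  # assumed default project name

def build_stop_command(compose_files: List[pathlib.Path]) -> List[str]:
    """Assemble the `docker compose ... stop` argv without running it."""
    return [
        "docker", "compose",
        "--profile", "quickstart",
        # Interleave a -f flag before every compose file path.
        *itertools.chain.from_iterable(("-f", str(p)) for p in compose_files),
        "-p", DOCKER_COMPOSE_PROJECT_NAME,
        "stop",
    ]

cmd = build_stop_command([pathlib.Path("docker-compose.yml")])
print(cmd)
# ['docker', 'compose', '--profile', 'quickstart',
#  '-f', 'docker-compose.yml', '-p', 'datahub', 'stop']
```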

Backup (within quickstart)

CLI Usage:

datahub docker quickstart --backup [--backup-file PATH]

Parameters:

--backup (bool, default: False): Trigger the backup operation.
--backup-file (Path, default: ~/.datahub/quickstart/backup.sql): Output file for the backup.

Implementation (lines 295-311):

def _backup(backup_file: str) -> int:
    resolved_backup_file = os.path.expanduser(backup_file)
    dirname = os.path.dirname(resolved_backup_file)
    os.makedirs(dirname, exist_ok=True)
    result = subprocess.run(
        [
            "bash", "-c",
            f"docker exec {DOCKER_COMPOSE_PROJECT_NAME}-mysql-1 "
            f"mysqldump -u root -pdatahub datahub > {resolved_backup_file}",
        ]
    )
    return result.returncode

Executes mysqldump inside the MySQL container to dump the datahub database. Uses default MySQL credentials (root/datahub).
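
The shell line that _backup() hands to bash -c can be built in isolation. This builder is a sketch for illustration; the container name and credentials follow this page rather than verified source:

```python
import os

DOCKER_COMPOSE_PROJECT_NAME = "datahub"  # assumed default project name

def build_backup_command(backup_file: str) -> str:
    """Return the shell line that dumps the datahub database to a file."""
    # Expand ~ so the redirect target is an absolute path on the host.
    resolved = os.path.expanduser(backup_file)
    return (
        f"docker exec {DOCKER_COMPOSE_PROJECT_NAME}-mysql-1 "
        f"mysqldump -u root -pdatahub datahub > {resolved}"
    )

print(build_backup_command("/tmp/backup.sql"))
# docker exec datahub-mysql-1 mysqldump -u root -pdatahub datahub > /tmp/backup.sql
```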

Restore (within quickstart)

CLI Usage:

datahub docker quickstart --restore [--restore-file PATH] [--no-restore-indices]
datahub docker quickstart --restore-indices

Parameters:

--restore (bool, default: False): Trigger the primary database restore.
--restore-file (str, default: ~/.datahub/quickstart/backup.sql): Input file for the restore.
--restore-indices (bool, default: False): Rebuild search indices from the database.
--no-restore-indices (bool, default: False): Skip the index rebuild during restore.

Implementation (lines 314-436):

The _restore() function performs two phases:

Phase 1 -- Primary Database Restore:

with open(resolved_restore_file) as fp:
    result = subprocess.run(
        [
            "bash", "-c",
            f"docker exec -i {DOCKER_COMPOSE_PROJECT_NAME}-mysql-1 "
            f"bash -c 'mysql -uroot -pdatahub datahub'",
        ],
        stdin=fp,
        capture_output=True,
    )

Pipes the SQL dump file into mysql running inside the MySQL container.
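
The stdin-piping pattern can be demonstrated locally. Here cat stands in for the docker exec -i ... mysql command, which reads the SQL from stdin in exactly the same way:

```python
import os
import subprocess
import tempfile

# Create a tiny stand-in for backup.sql.
with tempfile.NamedTemporaryFile("w", suffix=".sql", delete=False) as tmp:
    tmp.write("SELECT 1;\n")
    dump_path = tmp.name

# Open the dump file and hand it to the subprocess as stdin, as _restore()
# does; `cat` substitutes for the docker command in this sketch.
with open(dump_path) as fp:
    result = subprocess.run(["cat"], stdin=fp, capture_output=True, text=True)

print(result.stdout, end="")  # SELECT 1;
os.unlink(dump_path)
```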

Phase 2 -- Index Rebuild:

Pulls and runs the acryl-datahub-upgrade Docker image with the RestoreIndices upgrade command. The container connects to the DataHub Docker network and uses environment variables for database and Elasticsearch connection details.

# Simplified
docker run --network datahub_network --env-file {env_fp.name} \
    acryl-datahub-upgrade:${DATAHUB_VERSION:-head} \
    -u RestoreIndices -a clean

Restore Option Validation:

The valid_restore_options() function (lines 841-861) validates that flag combinations are sensible:

  • --no-restore-indices without --restore is invalid
  • --restore-indices with --no-restore-indices is contradictory
  • --restore with --restore-indices is redundant (restore implies index rebuild)
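
The three rules above can be expressed as a predicate. This is a re-derivation from the bullet list, not the verbatim valid_restore_options() implementation, and the function name is hypothetical:

```python
def restore_flags_valid(restore: bool,
                        restore_indices: bool,
                        no_restore_indices: bool) -> bool:
    """Return True when the flag combination is sensible, per the rules above."""
    if no_restore_indices and not restore:
        return False  # --no-restore-indices only modifies a --restore
    if restore_indices and no_restore_indices:
        return False  # directly contradictory
    if restore and restore_indices:
        return False  # redundant: --restore already rebuilds indices
    return True

print(restore_flags_valid(restore=True,
                          restore_indices=False,
                          no_restore_indices=True))  # True
```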

Constants

DOCKER_COMPOSE_PROJECT_NAME: "datahub" by default; overridden by the DATAHUB_COMPOSE_PROJECT_NAME environment variable.
DATAHUB_COMPOSE_PROJECT_FILTER: {"label": "com.docker.compose.project=datahub"}; Docker label filter for the project's containers.
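
These constants imply the label filter is derived from the project name, which in turn comes from the environment. A minimal sketch, assuming the env var and default named in the table:

```python
import os

# Project name defaults to "datahub" but can be overridden via the
# DATAHUB_COMPOSE_PROJECT_NAME environment variable.
DOCKER_COMPOSE_PROJECT_NAME = os.environ.get(
    "DATAHUB_COMPOSE_PROJECT_NAME", "datahub"
)

# Label filter matching every container/volume/network that Docker Compose
# created for this project.
DATAHUB_COMPOSE_PROJECT_FILTER = {
    "label": f"com.docker.compose.project={DOCKER_COMPOSE_PROJECT_NAME}"
}

print(DATAHUB_COMPOSE_PROJECT_FILTER)
```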

Usage Examples

# Stop containers (preserves data and containers)
datahub docker quickstart --stop

# Complete teardown (removes everything)
datahub docker nuke

# Teardown but keep data volumes
datahub docker nuke --keep-data

# Backup current database
datahub docker quickstart --backup

# Backup to custom location
datahub docker quickstart --backup --backup-file ~/datahub-backup-2026-02-10.sql

# Full restore (database + index rebuild)
datahub docker quickstart --restore

# Restore from custom file
datahub docker quickstart --restore --restore-file ~/datahub-backup-2026-02-10.sql

# Restore database only (skip index rebuild)
datahub docker quickstart --restore --no-restore-indices

# Only rebuild indices (database already restored)
datahub docker quickstart --restore-indices
