
Implementation:Datahub project Datahub Datahub Ingest Dry Run

From Leeroopedia


Property Value
Page Type Implementation (API Doc)
Workflow Metadata_Ingestion_Pipeline
API CLI datahub ingest run -c recipe.yml --dry-run, implemented by the run() Click command
Source File metadata-ingestion/src/datahub/cli/ingest_cli.py
Repository https://github.com/datahub-project/datahub
Implements Principle:Datahub_project_Datahub_Connection_Validation
Last Updated 2026-02-09 17:00 GMT

Overview

Description

The datahub ingest run command is the primary CLI entry point for executing metadata ingestion pipelines. It supports several validation-oriented flags that allow users to test connectivity and verify configuration without committing metadata to the sink:

  • --dry-run: Runs the full pipeline (source extraction, transformation, work unit generation) but skips all sink writes. This exercises the entire data flow without persisting any metadata.
  • --test-source-connection: Bypasses the pipeline entirely and invokes the source's dedicated connection test via ConnectionManager.test_source_connection(). Returns a structured connection report.
  • --preview: Limits the number of work units extracted from the source to a configurable count (default 10 via --preview-workunits), enabling rapid feedback on the metadata being produced.
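The preview limit can be pictured as a cap on the source's work-unit stream. A minimal, self-contained sketch of that idea (itertools.islice stands in for the pipeline's internal limiting, which this page does not specify):

```python
from itertools import islice
from typing import Iterable, List

def preview_workunits(source: Iterable[str], limit: int = 10) -> List[str]:
    """Conceptual sketch of preview mode: draw at most `limit` work units
    from the source, mirroring --preview with --preview-workunits."""
    return list(islice(source, limit))

# Stand-in source that would otherwise yield 100 work units.
fake_source = (f"workunit-{i}" for i in range(100))
print(len(preview_workunits(fake_source, limit=5)))  # 5
```

Because the limit is applied at extraction time, a preview against a large source returns quickly regardless of how much metadata the source could produce.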

The command is implemented as a Click command function decorated with telemetry tracking and upgrade checking. It loads the recipe file, resolves environment variables, and delegates to Pipeline.create() and Pipeline.run() for execution.
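The environment-variable resolution step can be illustrated with a small stand-in. This regex-based expander is an assumption about the observable behaviour (${VAR} placeholders replaced from the process environment, as in Example 1 below), not the CLI's actual config-loader implementation:

```python
import os
import re

def expand_env_vars(recipe_text: str) -> str:
    """Replace ${VAR} placeholders in recipe text from the environment.

    Illustrative sketch only: the real CLI resolves variables inside its
    config loader before constructing the pipeline.
    """
    def _sub(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in os.environ:
            raise KeyError(f"environment variable {name!r} is not set")
        return os.environ[name]

    return re.sub(r"\$\{(\w+)\}", _sub, recipe_text)

os.environ["MYSQL_USER"] = "datahub"
print(expand_env_vars('username: "${MYSQL_USER}"'))  # username: "datahub"
```

Failing fast on an unset variable (rather than substituting an empty string) is useful with --dry-run, since a missing credential then surfaces as a clear error instead of a confusing connection failure.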

Usage

# Full dry run - exercises entire pipeline without writing to sink
datahub ingest run -c recipe.yml --dry-run

# Preview mode - extract only 5 work units, skip sink writes
datahub ingest run -c recipe.yml --dry-run --preview --preview-workunits 5

# Test source connection only - fastest validation
datahub ingest run -c recipe.yml --test-source-connection

# Test connection and write report to file
datahub ingest run -c recipe.yml --test-source-connection --report-to connection_report.json

# Strict warnings mode - treat warnings as errors (non-zero exit code)
datahub ingest run -c recipe.yml --strict-warnings

Code Reference

Source Location

File Lines Description
metadata-ingestion/src/datahub/cli/ingest_cli.py L45-258 run() click command definition with all options and execution logic
metadata-ingestion/src/datahub/cli/ingest_cli.py L481-495 _test_source_connection() helper function
metadata-ingestion/src/datahub/ingestion/run/connection.py — ConnectionManager.test_source_connection() implementation

Signature

@ingest.command()
@click.option("-c", "--config", type=click.Path(dir_okay=False), required=True,
              help="Config file in .toml or .yaml format.")
@click.option("-n", "--dry-run", type=bool, is_flag=True, default=False,
              help="Perform a dry run of the ingestion, essentially skipping writing to sink.")
@click.option("--preview", type=bool, is_flag=True, default=False,
              help="Perform limited ingestion from the source to the sink to get a quick preview.")
@click.option("--preview-workunits", type=int, default=10,
              help="The number of workunits to produce for preview.")
@click.option("--strict-warnings/--no-strict-warnings", default=False,
              help="If enabled, ingestion runs with warnings will yield a non-zero error code")
@click.option("--test-source-connection", type=bool, is_flag=True, default=False,
              help="When set, ingestion will only test the source connection details from the recipe")
@click.option("--report-to", type=str, default="datahub",
              help="Provide a destination to send a structured report from the run.")
@click.option("--no-default-report", type=bool, is_flag=True, default=False,
              help="Turn off default reporting of ingestion results to DataHub")
@click.option("--no-spinner", type=bool, is_flag=True, default=False,
              help="Turn off spinner")
@click.option("--no-progress", type=bool, is_flag=True, default=False,
              help="If enabled, mute intermediate progress ingestion reports")
def run(
    config: str,
    dry_run: bool,
    preview: bool,
    strict_warnings: bool,
    preview_workunits: int,
    test_source_connection: bool,
    report_to: Optional[str],
    no_default_report: bool,
    no_spinner: bool,
    no_progress: bool,
    record: bool,
    record_password: Optional[str],
    record_output_path: Optional[str],
    no_s3_upload: bool,
    no_secret_redaction: bool,
) -> None:

Import

from datahub.cli.ingest_cli import ingest

I/O Contract

Direction Type Description
Input -c/--config (str, required) Path to YAML or TOML recipe file
Input --dry-run (bool, flag) When set, skips all sink writes
Input --preview (bool, flag) When set, limits work unit extraction to --preview-workunits count
Input --preview-workunits (int, default 10) Number of work units to extract in preview mode
Input --test-source-connection (bool, flag) When set, tests only the source connection and exits
Input --strict-warnings (bool, flag) When set, treat warnings as failures (non-zero exit code)
Input --report-to (str, default "datahub") Destination for structured report; "datahub" sends to server, other values are treated as file paths
Output Exit code (int) 0 on success, 1 on failure or warnings (if strict mode)
Output Terminal output Color-coded pipeline summary with record counts, failures, and warnings
Output Connection report (JSON) When --test-source-connection is used with --report-to file.json

Usage Examples

Example 1: Dry run to validate a new MySQL recipe

# recipe.yml
# source:
#   type: mysql
#   config:
#     host_port: "db.example.com:3306"
#     database: "analytics"
#     username: "${MYSQL_USER}"
#     password: "${MYSQL_PASS}"

export MYSQL_USER=datahub
export MYSQL_PASS=secret

datahub ingest run -c recipe.yml --dry-run
# Output: Pipeline finished successfully; produced 0 events (sink writes skipped)

Example 2: Preview mode to sample metadata from Snowflake

datahub ingest run -c snowflake_recipe.yml --dry-run --preview --preview-workunits 5
# Extracts only 5 work units, prints source report showing sampled entities

Example 3: Test source connection and save report

datahub ingest run -c recipe.yml --test-source-connection --report-to /tmp/conn_report.json
# Exits immediately after testing connection
# Report written to /tmp/conn_report.json
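A script can consume that report file to gate further steps. The field names below (internal_failure, basic_connectivity) are assumptions modelled on DataHub's connection-test report and may differ by version and source type; a stand-in report is written first so the snippet is self-contained:

```python
import json
import tempfile
from pathlib import Path

# Stand-in for the file a real --test-source-connection run would write.
# Field names here are assumptions, not a documented schema.
report_path = Path(tempfile.gettempdir()) / "conn_report.json"
report_path.write_text(json.dumps({
    "internal_failure": False,
    "basic_connectivity": {"capable": True, "failure_reason": None},
}))

report = json.loads(report_path.read_text())
if report.get("internal_failure") or not report["basic_connectivity"]["capable"]:
    raise SystemExit("source connection test failed")
print("connection OK")
```

Inspect an actual report from your source type before relying on specific keys, since capability details vary per connector.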

Example 4: Programmatic dry run via Python

from datahub.ingestion.run.pipeline import Pipeline

recipe = {
    "source": {
        "type": "mysql",
        "config": {
            "host_port": "localhost:3306",
            "database": "test_db",
            "username": "root",
            "password": "root",
        },
    },
}

pipeline = Pipeline.create(recipe, dry_run=True, preview_mode=True, preview_workunits=5)
pipeline.run()
ret = pipeline.pretty_print_summary()  # returns an int exit code (0 on success)

Related Pages

Principle:Datahub_project_Datahub_Connection_Validation