Implementation:Datahub_project_Datahub_Ingest_Dry_Run
| Property | Value |
|---|---|
| Page Type | Implementation (API Doc) |
| Workflow | Metadata_Ingestion_Pipeline |
| API | CLI `datahub ingest run -c recipe.yml --dry-run` or `run()` click command |
| Source File | `metadata-ingestion/src/datahub/cli/ingest_cli.py` |
| Repository | https://github.com/datahub-project/datahub |
| Implements | Principle:Datahub_project_Datahub_Connection_Validation |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
Description
The datahub ingest run command is the primary CLI entry point for executing metadata ingestion pipelines. It supports several validation-oriented flags that allow users to test connectivity and verify configuration without committing metadata to the sink:
- `--dry-run`: Runs the full pipeline (source extraction, transformation, work unit generation) but skips all sink writes. This exercises the entire data flow without persisting any metadata.
- `--test-source-connection`: Bypasses the pipeline entirely and invokes the source's dedicated connection test via `ConnectionManager.test_source_connection()`. Returns a structured connection report.
- `--preview`: Limits the number of work units extracted from the source to a configurable count (default 10 via `--preview-workunits`), enabling rapid feedback on the metadata being produced.
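The dry-run and preview semantics above can be sketched in miniature. This mimics the observable behavior only, not DataHub's actual `Pipeline` internals; `run_pipeline` and its parameters are illustrative:

```python
# Illustrative sketch of dry-run/preview semantics (not DataHub's real code):
# preview caps how many work units are pulled from the source, and dry_run
# skips the sink write while still exercising extraction.
import itertools

def run_pipeline(source_workunits, sink, dry_run=False, preview=False,
                 preview_workunits=10):
    produced = 0
    units = source_workunits
    if preview:
        # Cap extraction at the preview count, mirroring --preview-workunits.
        units = itertools.islice(units, preview_workunits)
    for wu in units:
        produced += 1
        if not dry_run:
            sink.append(wu)  # real runs persist; dry runs only count
    return produced

sink = []
n = run_pipeline(iter(range(100)), sink, dry_run=True, preview=True,
                 preview_workunits=5)
print(n, len(sink))  # 5 work units produced, nothing written to the sink
```

The key property is that a dry run still walks the full source-to-sink data flow, so source-side errors surface even though nothing is persisted.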
The command is implemented as a Click command function decorated with telemetry tracking and upgrade checking. It loads the recipe file, resolves environment variables, and delegates to Pipeline.create() and Pipeline.run() for execution.
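The environment-variable resolution step can be approximated as follows. This is a minimal imitation of the behavior described above; `resolve_env_vars` is a hypothetical helper, not DataHub's actual config loader:

```python
# Hypothetical sketch of ${VAR} resolution in a parsed recipe dict
# (DataHub's real loader lives in its configuration package; this only
# mimics the observable substitution behavior).
import os
import re

def resolve_env_vars(obj):
    """Recursively expand ${VAR} placeholders; unknown vars are left as-is."""
    if isinstance(obj, dict):
        return {k: resolve_env_vars(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [resolve_env_vars(v) for v in obj]
    if isinstance(obj, str):
        return re.sub(r"\$\{(\w+)\}",
                      lambda m: os.environ.get(m.group(1), m.group(0)), obj)
    return obj

os.environ["MYSQL_USER"] = "datahub"
recipe = {"source": {"type": "mysql",
                     "config": {"username": "${MYSQL_USER}"}}}
resolved = resolve_env_vars(recipe)
print(resolved["source"]["config"]["username"])  # -> datahub
```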
Usage
```shell
# Full dry run - exercises entire pipeline without writing to sink
datahub ingest run -c recipe.yml --dry-run

# Preview mode - extract only 5 work units, skip sink writes
datahub ingest run -c recipe.yml --dry-run --preview --preview-workunits 5

# Test source connection only - fastest validation
datahub ingest run -c recipe.yml --test-source-connection

# Test connection and write report to file
datahub ingest run -c recipe.yml --test-source-connection --report-to connection_report.json

# Strict warnings mode - treat warnings as errors (non-zero exit code)
datahub ingest run -c recipe.yml --strict-warnings
```
Code Reference
Source Location
| File | Lines | Description |
|---|---|---|
| `metadata-ingestion/src/datahub/cli/ingest_cli.py` | L45-258 | `run()` click command definition with all options and execution logic |
| `metadata-ingestion/src/datahub/cli/ingest_cli.py` | L481-495 | `_test_source_connection()` helper function |
| `metadata-ingestion/src/datahub/ingestion/run/connection.py` | — | `ConnectionManager.test_source_connection()` implementation |
Signature
```python
@ingest.command()
@click.option("-c", "--config", type=click.Path(dir_okay=False), required=True,
              help="Config file in .toml or .yaml format.")
@click.option("-n", "--dry-run", type=bool, is_flag=True, default=False,
              help="Perform a dry run of the ingestion, essentially skipping writing to sink.")
@click.option("--preview", type=bool, is_flag=True, default=False,
              help="Perform limited ingestion from the source to the sink to get a quick preview.")
@click.option("--preview-workunits", type=int, default=10,
              help="The number of workunits to produce for preview.")
@click.option("--strict-warnings/--no-strict-warnings", default=False,
              help="If enabled, ingestion runs with warnings will yield a non-zero error code")
@click.option("--test-source-connection", type=bool, is_flag=True, default=False,
              help="When set, ingestion will only test the source connection details from the recipe")
@click.option("--report-to", type=str, default="datahub",
              help="Provide a destination to send a structured report from the run.")
@click.option("--no-default-report", type=bool, is_flag=True, default=False,
              help="Turn off default reporting of ingestion results to DataHub")
@click.option("--no-spinner", type=bool, is_flag=True, default=False,
              help="Turn off spinner")
@click.option("--no-progress", type=bool, is_flag=True, default=False,
              help="If enabled, mute intermediate progress ingestion reports")
# (@click.option decorators for the remaining record/redaction parameters
#  below are elided in this excerpt)
def run(
    config: str,
    dry_run: bool,
    preview: bool,
    strict_warnings: bool,
    preview_workunits: int,
    test_source_connection: bool,
    report_to: Optional[str],
    no_default_report: bool,
    no_spinner: bool,
    no_progress: bool,
    record: bool,
    record_password: Optional[str],
    record_output_path: Optional[str],
    no_s3_upload: bool,
    no_secret_redaction: bool,
) -> None:
    ...  # execution logic elided; see L45-258 in ingest_cli.py
```
Import
```python
from datahub.cli.ingest_cli import ingest
```
I/O Contract
| Direction | Type | Description |
|---|---|---|
| Input | `-c`/`--config` (str, required) | Path to YAML or TOML recipe file |
| Input | `--dry-run` (bool, flag) | When set, skips all sink writes |
| Input | `--preview` (bool, flag) | When set, limits work unit extraction to the `--preview-workunits` count |
| Input | `--preview-workunits` (int, default 10) | Number of work units to extract in preview mode |
| Input | `--test-source-connection` (bool, flag) | When set, tests only the source connection and exits |
| Input | `--strict-warnings` (bool, flag) | When set, treats warnings as failures (non-zero exit code) |
| Input | `--report-to` (str, default `"datahub"`) | Destination for the structured report; `"datahub"` sends to the server, other values are treated as file paths |
| Output | Exit code (int) | 0 on success, 1 on failure (or on warnings in strict mode) |
| Output | Terminal output | Color-coded pipeline summary with record counts, failures, and warnings |
| Output | Connection report (JSON) | Written when `--test-source-connection` is used with `--report-to file.json` |
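The exit-code rule in the contract (failures always fail the run; warnings fail it only under `--strict-warnings`) can be captured in a few lines. `exit_code` is an illustrative helper, not part of the actual CLI:

```python
# Illustrative restatement of the documented exit-code contract.
def exit_code(num_failures: int, num_warnings: int,
              strict_warnings: bool = False) -> int:
    if num_failures > 0:
        return 1  # failures always produce a non-zero exit
    if strict_warnings and num_warnings > 0:
        return 1  # warnings only fail the run in strict mode
    return 0

print(exit_code(0, 3))                        # warnings alone pass by default
print(exit_code(0, 3, strict_warnings=True))  # but fail under --strict-warnings
```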
Usage Examples
Example 1: Dry run to validate a new MySQL recipe
```shell
# recipe.yml
# source:
#   type: mysql
#   config:
#     host_port: "db.example.com:3306"
#     database: "analytics"
#     username: "${MYSQL_USER}"
#     password: "${MYSQL_PASS}"
export MYSQL_USER=datahub
export MYSQL_PASS=secret
datahub ingest run -c recipe.yml --dry-run
# Output: Pipeline finished successfully; produced 0 events (sink writes skipped)
```
Example 2: Preview mode to sample metadata from Snowflake
```shell
datahub ingest run -c snowflake_recipe.yml --dry-run --preview --preview-workunits 5
# Extracts only 5 work units, prints source report showing sampled entities
```
Example 3: Test source connection and save report
```shell
datahub ingest run -c recipe.yml --test-source-connection --report-to /tmp/conn_report.json
# Exits immediately after testing connection
# Report written to /tmp/conn_report.json
```
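A script consuming the saved report might look like the sketch below. The `basic_connectivity`/`capable`/`failure_reason` field names are assumptions for illustration; inspect an actual report for the exact schema:

```python
# Hypothetical consumer of the connection report JSON. Field names are
# assumed, not confirmed against the real TestConnectionReport schema.
import json

def connection_ok(report: dict) -> bool:
    """Return True when the (assumed) basic_connectivity check passed."""
    basic = report.get("basic_connectivity", {})
    return bool(basic.get("capable"))

# Stand-in for json.load(open("/tmp/conn_report.json")):
sample = {"basic_connectivity": {"capable": True, "failure_reason": None}}
print(connection_ok(sample))  # True
```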
Example 4: Programmatic dry run via Python
```python
from datahub.ingestion.run.pipeline import Pipeline

recipe = {
    "source": {
        "type": "mysql",
        "config": {
            "host_port": "localhost:3306",
            "database": "test_db",
            "username": "root",
            "password": "root",
        },
    },
}

pipeline = Pipeline.create(recipe, dry_run=True, preview_mode=True, preview_workunits=5)
pipeline.run()
ret = pipeline.pretty_print_summary()
```