
Implementation:Datahub project Datahub DatahubClientConfig

From Leeroopedia


Field | Value
Implementation Name | DatahubClientConfig
Overview | Concrete tool for configuring authenticated connections to the DataHub GMS server, including server URL, token, TLS settings, and retry behavior.
Type | API Doc
Implements | Datahub_project_Datahub_Client_Authentication
Status | Active
Domains | Data_Integration, Metadata_Management
Source | DataHub Repository -- metadata-ingestion/src/datahub/ingestion/graph/config.py (lines 19-36)
Last Updated | 2026-02-10
Knowledge Sources | DataHub Repository

Description

DatahubClientConfig is a pydantic ConfigModel that encapsulates all connection parameters needed to communicate with a DataHub GMS server. It is used by the DataHubGraph client, the DatahubRestSink, and the PipelineConfig.datahub_api field to configure authenticated HTTP connections.

The class supports token-based authentication (Bearer tokens), custom TLS/SSL settings (CA certificates, client certificates, SSL verification toggle), retry policies, and extra HTTP headers.
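
As a rough illustration of how the token and extra-header settings could combine, the sketch below assembles the headers a client might send. `build_request_headers` is a hypothetical helper, not part of DataHub, and letting the token override an Authorization entry in the extra headers is an assumption made for this example.

```python
from typing import Dict, Optional


def build_request_headers(
    token: Optional[str] = None,
    extra_headers: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: merge extra headers with a Bearer Authorization header."""
    # Copy so the caller's dictionary is never mutated.
    headers: Dict[str, str] = dict(extra_headers or {})
    if token:
        # Assumption for this sketch: the token wins over any Authorization
        # value supplied through extra_headers.
        headers["Authorization"] = f"Bearer {token}"
    return headers


headers = build_request_headers(
    token="my-token",
    extra_headers={"X-Team": "data-platform"},
)
print(headers)
```

With no token and no extra headers, the helper returns an empty dictionary, which corresponds to an unauthenticated request.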

Class Signature

from datahub.ingestion.graph.config import DatahubClientConfig

class DatahubClientConfig(ConfigModel):
    """Configuration class for holding connectivity to datahub gms"""

    server: str
    token: Optional[str] = None
    timeout_sec: Optional[float] = None
    retry_status_codes: Optional[List[int]] = None
    retry_max_times: Optional[int] = None
    extra_headers: Optional[Dict[str, str]] = None
    ca_certificate_path: Optional[str] = None
    client_certificate_path: Optional[str] = None
    disable_ssl_verification: bool = False
    openapi_ingestion: Optional[bool] = None
    client_mode: Optional[ClientMode] = None
    datahub_component: Optional[str] = None
    server_config_refresh_interval: Optional[int] = None

    model_config = ConfigDict(extra="ignore")

Source file: metadata-ingestion/src/datahub/ingestion/graph/config.py, lines 19-36.

Key Parameters

Parameter | Type | Default | Description
server | str | (required) | The URL of the DataHub GMS server (e.g., http://localhost:8080 or https://datahub.example.com:8080).
token | Optional[str] | None | Personal Access Token (PAT) for authentication. Sent as a Bearer token in the Authorization header.
timeout_sec | Optional[float] | None | HTTP request timeout in seconds.
retry_status_codes | Optional[List[int]] | None | HTTP status codes that should trigger a retry.
retry_max_times | Optional[int] | None | Maximum number of retry attempts for failed requests.
extra_headers | Optional[Dict[str, str]] | None | Additional HTTP headers to include in every request.
ca_certificate_path | Optional[str] | None | Path to a CA certificate bundle for TLS verification.
client_certificate_path | Optional[str] | None | Path to a client certificate for mutual TLS (mTLS) authentication.
disable_ssl_verification | bool | False | Disable SSL certificate verification (not recommended for production).
openapi_ingestion | Optional[bool] | None | Whether to use the OpenAPI ingestion endpoint instead of the default RestLI endpoint.
client_mode | Optional[ClientMode] | None | The client mode (INGESTION, CLI, or SDK) which affects request behavior.
datahub_component | Optional[str] | None | Identifies the DataHub component making the request.
server_config_refresh_interval | Optional[int] | None | Interval in seconds for refreshing server configuration.
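
The two retry parameters can be read together as a simple policy: retry while the response status is in retry_status_codes, up to retry_max_times extra attempts. The sketch below shows that reading with a hypothetical `call_with_retries` helper. It is not DataHub's implementation, it omits backoff between attempts, and the status codes used are illustrative rather than DataHub defaults.

```python
from typing import Callable, List


def call_with_retries(
    send: Callable[[], int],
    retry_status_codes: List[int],
    retry_max_times: int,
) -> int:
    """Hypothetical helper: call send() and retry while it returns a retryable status."""
    status = send()
    retries = 0
    # Stop as soon as the status is not retryable or the retry budget is spent.
    while status in retry_status_codes and retries < retry_max_times:
        retries += 1
        status = send()
    return status


# Simulated server: two transient 503 responses, then success.
responses = iter([503, 503, 200])
final_status = call_with_retries(
    send=lambda: next(responses),
    retry_status_codes=[429, 502, 503, 504],  # illustrative values only
    retry_max_times=3,
)
print(final_status)
```

If the budget runs out first, the helper returns the last retryable status, leaving the caller to decide how to report the failure.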

Import

from datahub.ingestion.graph.config import DatahubClientConfig

I/O Contract

Inputs

  • Server URL string (required)
  • Optional authentication token
  • Optional TLS/SSL configuration parameters
  • Optional retry and timeout parameters

Outputs

  • A validated DatahubClientConfig instance ready to be passed to DataHubGraph or used as the datahub_api field in PipelineConfig
  • Raises pydantic.ValidationError if required fields are missing or types are incorrect

Usage Examples

In a Recipe YAML

datahub_api:
  server: "https://datahub.example.com:8080"
  token: "${DATAHUB_TOKEN}"
  timeout_sec: 30
  retry_max_times: 3
  disable_ssl_verification: false
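
The `${DATAHUB_TOKEN}` placeholder keeps the token out of the recipe file and is resolved from the environment when the recipe is loaded. The sketch below only illustrates that idea with Python's standard `string.Template`. It is not DataHub's own expansion logic, and the token value is a made-up example.

```python
import os
from string import Template

# Illustrative value only; in practice the variable is set outside the program.
os.environ["DATAHUB_TOKEN"] = "example-token"

# The same placeholder syntax that appears in the recipe.
recipe_value = "${DATAHUB_TOKEN}"

# substitute() raises KeyError if the variable is missing, which surfaces
# a forgotten environment variable early.
resolved = Template(recipe_value).substitute(os.environ)
print(resolved)
```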

In Python Code

from datahub.ingestion.graph.config import DatahubClientConfig
from datahub.ingestion.graph.client import DataHubGraph

config = DatahubClientConfig(
    server="https://datahub.example.com:8080",
    token="my-personal-access-token",
    timeout_sec=30,
)

with DataHubGraph(config) as graph:
    graph.test_connection()

With Custom TLS Settings

datahub_api:
  server: "https://datahub.internal:8080"
  token: "${DATAHUB_TOKEN}"
  ca_certificate_path: "/etc/ssl/certs/internal-ca.pem"
  client_certificate_path: "/etc/ssl/certs/client.pem"
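
One way to picture how the three TLS fields interact is to map them onto the `verify` and `cert` options accepted by Python's `requests` library. `tls_request_options` is a hypothetical helper, and the precedence shown (the disable flag overriding a CA path) is an assumption for this sketch, not confirmed DataHub behavior.

```python
from typing import Any, Dict, Optional


def tls_request_options(
    ca_certificate_path: Optional[str] = None,
    client_certificate_path: Optional[str] = None,
    disable_ssl_verification: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: translate the TLS fields into requests-style options."""
    options: Dict[str, Any] = {}
    if disable_ssl_verification:
        # verify=False turns off server certificate checks entirely.
        options["verify"] = False
    elif ca_certificate_path:
        # A path tells requests which CA bundle to trust.
        options["verify"] = ca_certificate_path
    if client_certificate_path:
        # A client certificate enables mutual TLS.
        options["cert"] = client_certificate_path
    return options


options = tls_request_options(
    ca_certificate_path="/etc/ssl/certs/internal-ca.pem",
    client_certificate_path="/etc/ssl/certs/client.pem",
)
print(options)
```

When neither a CA path nor the disable flag is given, no `verify` key is set, so the HTTP library's default trust store applies.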

ClientMode Enum

The client_mode field uses the ClientMode enum defined in the same module:

from enum import Enum, auto

class ClientMode(Enum):
    INGESTION = auto()
    CLI = auto()
    SDK = auto()

This enum identifies the calling context, which may affect server-side behavior or logging.
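
For readers unfamiliar with `auto()`, the sketch below uses a local copy of the enum, so it runs without DataHub installed, to show how a mode can be looked up by name, for example when it arrives as a string from configuration. Whether DataHub itself parses the field this way is not stated in the source.

```python
from enum import Enum, auto


class ClientMode(Enum):
    # Local copy of the enum shown above, defined here only to keep the sketch self-contained.
    INGESTION = auto()
    CLI = auto()
    SDK = auto()


# Look up a member by its name; an unknown name raises KeyError.
mode = ClientMode["CLI"]
print(mode.name, mode.value)

# Iteration follows definition order.
print([member.name for member in ClientMode])
```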
