Implementation:Datahub project Datahub DatahubClientConfig
| Field | Value |
|---|---|
| Implementation Name | DatahubClientConfig |
| Overview | Concrete tool for configuring authenticated connections to the DataHub GMS server, including server URL, token, TLS settings, and retry behavior. |
| Type | API Doc |
| Implements | Datahub_project_Datahub_Client_Authentication |
| Status | Active |
| Domains | Data_Integration, Metadata_Management |
| Source | DataHub Repository -- metadata-ingestion/src/datahub/ingestion/graph/config.py (lines 19-36) |
| Last Updated | 2026-02-10 |
| Knowledge Sources | DataHub Repository |
Description
DatahubClientConfig is a pydantic ConfigModel that encapsulates all connection parameters needed to communicate with a DataHub GMS server. It is used by the DataHubGraph client, the DatahubRestSink, and the PipelineConfig.datahub_api field to configure authenticated HTTP connections.
The class supports token-based authentication (Bearer tokens), custom TLS/SSL settings (CA certificates, client certificates, SSL verification toggle), retry policies, and extra HTTP headers.
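To make the token and extra-header handling concrete, the following stdlib-only sketch shows one way such settings could be turned into request headers. `build_headers` is a hypothetical helper introduced here for illustration; it is not part of DataHub, and the real client may assemble its headers differently.

```python
from typing import Dict, Optional


def build_headers(
    token: Optional[str] = None,
    extra_headers: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: assemble HTTP headers from config-like values."""
    headers: Dict[str, str] = {}
    if token:
        # Bearer scheme, as described for the `token` field.
        headers["Authorization"] = f"Bearer {token}"
    # Assumption: extra headers are merged last and may override defaults.
    headers.update(extra_headers or {})
    return headers


headers = build_headers(token="my-token", extra_headers={"X-Team": "data-platform"})
```

Merging the extra headers last is a design choice of this sketch, not a documented guarantee of the library.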
Class Signature
from datahub.ingestion.graph.config import DatahubClientConfig
class DatahubClientConfig(ConfigModel):
    """Configuration class for holding connectivity to datahub gms"""
    server: str
    token: Optional[str] = None
    timeout_sec: Optional[float] = None
    retry_status_codes: Optional[List[int]] = None
    retry_max_times: Optional[int] = None
    extra_headers: Optional[Dict[str, str]] = None
    ca_certificate_path: Optional[str] = None
    client_certificate_path: Optional[str] = None
    disable_ssl_verification: bool = False
    openapi_ingestion: Optional[bool] = None
    client_mode: Optional[ClientMode] = None
    datahub_component: Optional[str] = None
    server_config_refresh_interval: Optional[int] = None
    model_config = ConfigDict(extra="ignore")
Source file: metadata-ingestion/src/datahub/ingestion/graph/config.py, lines 19-36.
Key Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| server | str | (required) | The URL of the DataHub GMS server (e.g., http://localhost:8080 or https://datahub.example.com:8080). |
| token | Optional[str] | None | Personal Access Token (PAT) for authentication. Sent as a Bearer token in the Authorization header. |
| timeout_sec | Optional[float] | None | HTTP request timeout in seconds. |
| retry_status_codes | Optional[List[int]] | None | HTTP status codes that should trigger a retry. |
| retry_max_times | Optional[int] | None | Maximum number of retry attempts for failed requests. |
| extra_headers | Optional[Dict[str, str]] | None | Additional HTTP headers to include in every request. |
| ca_certificate_path | Optional[str] | None | Path to a CA certificate bundle for TLS verification. |
| client_certificate_path | Optional[str] | None | Path to a client certificate for mutual TLS (mTLS) authentication. |
| disable_ssl_verification | bool | False | Disable SSL certificate verification (not recommended for production). |
| openapi_ingestion | Optional[bool] | None | Whether to use the OpenAPI ingestion endpoint instead of the default RestLI endpoint. |
| client_mode | Optional[ClientMode] | None | The client mode (INGESTION, CLI, or SDK) which affects request behavior. |
| datahub_component | Optional[str] | None | Identifies the DataHub component making the request. |
| server_config_refresh_interval | Optional[int] | None | Interval in seconds for refreshing server configuration. |
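The interaction between retry_status_codes and retry_max_times can be illustrated with a small stdlib-only loop. This is a sketch under stated assumptions, not DataHub's retry implementation: `send_with_retry` and the fake transport are hypothetical, and a real client would normally also wait (back off) between attempts.

```python
from typing import Callable, List


def send_with_retry(
    send: Callable[[], int],
    retry_status_codes: List[int],
    retry_max_times: int,
) -> int:
    """Hypothetical helper: re-send while the status code is retryable."""
    status = send()
    retries = 0
    # Retry only on the listed codes, and at most `retry_max_times` times.
    while status in retry_status_codes and retries < retry_max_times:
        retries += 1
        status = send()
    return status


# Fake transport that returns 503 twice and then 200.
responses = iter([503, 503, 200])
final_status = send_with_retry(lambda: next(responses), [429, 502, 503, 504], 3)
```

In this sketch the initial request does not count toward retry_max_times; whether the library counts attempts the same way is an assumption.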
Import
from datahub.ingestion.graph.config import DatahubClientConfig
I/O Contract
Inputs
- Server URL string (required)
- Optional authentication token
- Optional TLS/SSL configuration parameters
- Optional retry and timeout parameters
Outputs
- A validated DatahubClientConfig instance ready to be passed to DataHubGraph or used as the datahub_api field in PipelineConfig
- Raises pydantic.ValidationError if required fields are missing or types are incorrect
Usage Examples
In a Recipe YAML
datahub_api:
  server: "https://datahub.example.com:8080"
  token: "${DATAHUB_TOKEN}"
  timeout_sec: 30
  retry_max_times: 3
  disable_ssl_verification: false
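The `${DATAHUB_TOKEN}` placeholder keeps the secret out of the recipe file and implies that the value is resolved from the environment when the recipe is loaded. The sketch below illustrates that kind of substitution with the standard library only; `expand` is a hypothetical helper, and the recipe loader's actual expansion rules may differ.

```python
from string import Template
from typing import Mapping


def expand(value: str, env: Mapping[str, str]) -> str:
    """Hypothetical helper: replace ${NAME} placeholders using `env`."""
    # substitute() raises KeyError if a referenced variable is missing,
    # which surfaces a forgotten secret early instead of sending an empty token.
    return Template(value).substitute(env)


# In real use, `env` would typically be os.environ.
token = expand("${DATAHUB_TOKEN}", {"DATAHUB_TOKEN": "s3cr3t"})
```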
In Python Code
from datahub.ingestion.graph.config import DatahubClientConfig
from datahub.ingestion.graph.client import DataHubGraph
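# Build the connection settings; only `server` is required.
config = DatahubClientConfig(
    server="https://datahub.example.com:8080",
    token="my-personal-access-token",
    timeout_sec=30,
)
# The context manager closes the client when the block exits.
with DataHubGraph(config) as graph:
    graph.test_connection()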
With Custom TLS Settings
datahub_api:
  server: "https://datahub.internal:8080"
  token: "${DATAHUB_TOKEN}"
  ca_certificate_path: "/etc/ssl/certs/internal-ca.pem"
  client_certificate_path: "/etc/ssl/certs/client.pem"
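For readers wondering how these three TLS fields relate to one another, here is a sketch that maps them onto the `verify` and `cert` keyword arguments accepted by the `requests` library. `tls_kwargs` is a hypothetical helper, and the precedence shown (disabling verification wins over a CA path) is an assumption rather than documented DataHub behavior.

```python
from typing import Any, Dict, Optional


def tls_kwargs(
    ca_certificate_path: Optional[str] = None,
    client_certificate_path: Optional[str] = None,
    disable_ssl_verification: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: map TLS settings to requests-style kwargs."""
    kwargs: Dict[str, Any] = {}
    if disable_ssl_verification:
        # Assumption: an explicit opt-out takes precedence over a CA bundle.
        kwargs["verify"] = False
    elif ca_certificate_path:
        # `verify` may be a path to a CA bundle instead of a boolean.
        kwargs["verify"] = ca_certificate_path
    if client_certificate_path:
        # `cert` carries the client certificate used for mTLS.
        kwargs["cert"] = client_certificate_path
    return kwargs


kwargs = tls_kwargs(
    ca_certificate_path="/etc/ssl/certs/internal-ca.pem",
    client_certificate_path="/etc/ssl/certs/client.pem",
)
```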
ClientMode Enum
The client_mode field uses the ClientMode enum defined in the same module:
class ClientMode(Enum):
    INGESTION = auto()
    CLI = auto()
    SDK = auto()
This enum identifies the calling context, which may affect server-side behavior or logging.
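Because the members are declared with `auto()`, their values are plain integers assigned in declaration order, so the member name is the more meaningful identifier. The sketch below uses a local copy of the enum so that it runs without DataHub installed; how the library itself parses or reports the mode is not shown here.

```python
from enum import Enum, auto


class ClientMode(Enum):  # local copy for illustration only
    INGESTION = auto()
    CLI = auto()
    SDK = auto()


# Look a member up by name, e.g. from a string such as "CLI".
mode = ClientMode["CLI"]
# Derive a readable label from the member name rather than its integer value.
label = mode.name.lower()
```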
Related Pages
- Implements: Datahub_project_Datahub_Client_Authentication
- Related: Datahub_project_Datahub_PipelineConfig
- Related: Datahub_project_Datahub_Ingest_CLI_Run
- Environment: Environment:Datahub_project_Datahub_Python_3_10_Ingestion_Environment
- Heuristic: Heuristic:Datahub_project_Datahub_Secret_Handling_And_Deprecation_Patterns