Principle:Datahub project Datahub Client Authentication
| Field | Value |
|---|---|
| Principle Name | Client Authentication |
| Overview | The mechanism for authenticating client applications with the DataHub GMS server using personal access tokens. |
| Status | Active |
| Domains | Data_Integration, Metadata_Management |
| Related Implementations | Datahub_project_Datahub_DatahubClientConfig |
| Last Updated | 2026-02-10 |
| Knowledge Sources | DataHub Repository |
Description
Client authentication enables secure communication between ingestion pipelines and the DataHub backend. It uses token-based authentication where a Personal Access Token (PAT) is passed in HTTP headers. The configuration is centralized in the DatahubClientConfig class, which holds the server URL, authentication token, and connection parameters.
When authentication is enabled on the DataHub server (via METADATA_SERVICE_AUTH_ENABLED=true), all API requests must include a valid token. The token is included in the Authorization header as a Bearer token for each HTTP request sent to the GMS server.
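As a minimal sketch of what this looks like on the wire, the snippet below builds a request with the token in the Authorization header using only the Python standard library. The helper name, the `/config` path, and the token value are hypothetical and chosen for illustration; this is not how the DataHub client itself is implemented.

```python
import urllib.request


def build_authenticated_request(server: str, path: str, token: str) -> urllib.request.Request:
    """Build a GET request that carries the token as a Bearer credential."""
    url = server.rstrip("/") + "/" + path.lstrip("/")
    request = urllib.request.Request(url, method="GET")
    # The token travels in the Authorization header on every request.
    request.add_header("Authorization", f"Bearer {token}")
    return request


# Hypothetical values for illustration; nothing is sent over the network here.
req = build_authenticated_request("https://datahub.example.com:8080", "/config", "my-token")
print(req.get_header("Authorization"))
```

Passing `req` to `urllib.request.urlopen` would send it; the request object is only constructed here so the header can be inspected.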
The authentication configuration can be specified in multiple locations:
- Recipe file -- In the datahub_api section of a recipe YAML
- Sink configuration -- In the sink.config section when using datahub-rest or datahub-kafka sinks
- CLI configuration -- Via datahub init, which stores credentials in ~/.datahubenv
- Environment variables -- Token and server URL can be set via environment variables
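The sketch below shows one way a script might combine explicitly supplied values with environment variables. The variable names `DATAHUB_GMS_URL` and `DATAHUB_GMS_TOKEN`, the localhost fallback, and the precedence order (explicit value first, then environment) are assumptions made for illustration; the source does not specify how DataHub itself resolves these locations.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_connection(
    explicit_server: Optional[str] = None,
    explicit_token: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, Optional[str]]:
    """Pick the server URL and token, preferring explicit values over the environment."""
    # Assumed variable names and fallback URL; adjust to your deployment.
    server = explicit_server or env.get("DATAHUB_GMS_URL") or "http://localhost:8080"
    # The token may legitimately be absent when server-side auth is disabled.
    token = explicit_token or env.get("DATAHUB_GMS_TOKEN")
    return server, token


# Example with an in-memory environment instead of the real process environment.
server, token = resolve_connection(env={"DATAHUB_GMS_URL": "https://datahub.example.com:8080"})
```

Accepting the environment as a parameter keeps the function easy to test without mutating `os.environ`.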
Usage
Use client authentication when connecting to a DataHub instance that has authentication enabled. This is the standard configuration for production deployments where access control is required.
Example Recipe with Authentication
source:
  type: snowflake
  config:
    account_id: "myaccount"
    username: "datahub_user"
    password: "${SNOWFLAKE_PASSWORD}"
datahub_api:
  server: "https://datahub.example.com:8080"
  token: "${DATAHUB_TOKEN}"
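The recipe above keeps secrets out of the file by referencing environment variables such as `${DATAHUB_TOKEN}`. The sketch below illustrates the idea of that substitution with Python's `string.Template`; it is a simplified stand-in, not DataHub's own expansion logic, and the helper name and token value are hypothetical.

```python
from string import Template
from typing import Mapping


def expand_placeholders(value: str, env: Mapping[str, str]) -> str:
    """Replace ${NAME} placeholders in a config value with entries from env."""
    # substitute() raises KeyError when a referenced variable is missing,
    # which surfaces an unset secret early instead of sending an empty token.
    return Template(value).substitute(env)


# The datahub_api section from the recipe, as a plain dictionary.
datahub_api = {"server": "https://datahub.example.com:8080", "token": "${DATAHUB_TOKEN}"}
env = {"DATAHUB_TOKEN": "example-token"}  # stands in for os.environ
resolved = {key: expand_placeholders(val, env) for key, val in datahub_api.items()}
```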
Theoretical Basis
Token-based authentication follows the Bearer Token pattern from OAuth 2.0 (RFC 6750). The client includes a token in the Authorization header of each HTTP request:
Authorization: Bearer <token>
This pattern is stateless -- the server validates the token on each request without maintaining session state. Personal Access Tokens (PATs) are long-lived tokens generated by the DataHub UI or API, tied to a specific user identity and set of permissions.
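To make the stateless property concrete, the following sketch shows how a receiving service might extract the credential from the header on each request. It covers only the parsing step described in RFC 6750; verifying that the extracted token is valid is out of scope, and the function name is hypothetical rather than part of DataHub.

```python
from typing import Optional


def extract_bearer_token(authorization_header: Optional[str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' value, or None."""
    if not authorization_header:
        return None
    scheme, _, credentials = authorization_header.partition(" ")
    # The auth scheme name is case-insensitive; the token itself is not altered.
    if scheme.lower() != "bearer" or not credentials.strip():
        return None
    return credentials.strip()
```

The function depends only on the header it is given, so every request can be checked independently without looking up a session.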
Benefits of this approach:
- Stateless -- No server-side session management required
- Revocable -- Tokens can be revoked individually without affecting other tokens or users
- Auditable -- Each token is tied to a user identity for access logging
- Portable -- The same token works across CLI, SDK, and API clients
Constraints
- Authentication is optional -- if the server does not have METADATA_SERVICE_AUTH_ENABLED=true, tokens are not required
- Tokens are sensitive credentials and should be stored securely (e.g., environment variables, secrets managers)
- SSL/TLS is recommended for production deployments to protect tokens in transit
- The DatahubClientConfig supports custom CA certificates and client certificates for mTLS environments
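The TLS and mTLS constraints can be illustrated with Python's standard `ssl` module. This is a sketch of the general technique, not of DatahubClientConfig internals; the parameter names `ca_certificate_path` and `client_certificate_path` are assumptions chosen for readability.

```python
import ssl
from typing import Optional


def build_ssl_context(
    ca_certificate_path: Optional[str] = None,
    client_certificate_path: Optional[str] = None,
) -> ssl.SSLContext:
    """Create a verifying TLS context, optionally with a custom CA and client cert."""
    # With cafile=None the system trust store is used; otherwise only the given CA bundle.
    context = ssl.create_default_context(cafile=ca_certificate_path)
    if client_certificate_path:
        # For mTLS: the PEM file is expected to hold the certificate and its private key.
        context.load_cert_chain(certfile=client_certificate_path)
    return context


# Default context: server certificates and hostnames are verified.
ctx = build_ssl_context()
```

Certificate verification stays enabled in every branch, so the bearer token is only sent to a server whose identity has been checked.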
Related Pages
- Implemented by: Datahub_project_Datahub_DatahubClientConfig