Principle:Datahub project Datahub Client Authentication


Principle Name: Client Authentication
Overview: The mechanism for authenticating client applications with the DataHub GMS server using personal access tokens.
Status: Active
Domains: Data_Integration, Metadata_Management
Related Implementations: Datahub_project_Datahub_DatahubClientConfig
Last Updated: 2026-02-10
Knowledge Sources: DataHub Repository

Description

Client authentication secures communication between ingestion pipelines and the DataHub backend (GMS). It is token-based: a Personal Access Token (PAT) is sent in the HTTP Authorization header of each request. The configuration is centralized in the DatahubClientConfig class, which holds the server URL, the authentication token, and connection parameters.

When authentication is enabled on the DataHub server (via METADATA_SERVICE_AUTH_ENABLED=true), all API requests must include a valid token. The token is included in the Authorization header as a Bearer token for each HTTP request sent to the GMS server.
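As a rough illustration of what "included in the Authorization header" means on the wire, the sketch below builds (but does not send) an authenticated request using only the Python standard library. The function name and the default path are assumptions made for this example; they are not taken from the DataHub client.

```python
import urllib.request


def build_authenticated_request(
    server: str, token: str, path: str = "/config"
) -> urllib.request.Request:
    """Build a request carrying the token as a Bearer credential.

    `path` is a placeholder endpoint chosen for illustration only.
    """
    url = server.rstrip("/") + path
    # Every request repeats the header; no session or cookie is involved.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


request = build_authenticated_request("https://datahub.example.com:8080", "my-token")
print(request.get_header("Authorization"))
```

Because the header is attached per request, any HTTP client that can set headers can authenticate the same way.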

The authentication configuration can be specified in multiple locations:

  • Recipe file -- In the datahub_api section of a recipe YAML
  • Sink configuration -- In the sink.config section when using the datahub-rest sink (the datahub-kafka sink connects to Kafka and does not take a GMS token)
  • CLI configuration -- Via datahub init which stores credentials in ~/.datahubenv
  • Environment variables -- Token and server URL can be set via environment variables (the DataHub CLI reads DATAHUB_GMS_URL and DATAHUB_GMS_TOKEN)
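When several of these locations are available, some precedence has to apply. The sketch below shows one plausible ordering, with explicit values first and environment variables as the fallback. It is not DataHub's actual resolution logic: `ClientSettings`, `resolve_client_settings`, and the localhost default are hypothetical, and the variable names DATAHUB_GMS_URL and DATAHUB_GMS_TOKEN are assumed here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ClientSettings:
    """Hypothetical stand-in for the server/token fields of a client config."""

    server: str
    token: Optional[str] = None


def resolve_client_settings(
    server: Optional[str] = None,
    token: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> ClientSettings:
    # Explicit values (e.g. from a recipe) win; environment variables are the fallback.
    resolved_server = server or env.get("DATAHUB_GMS_URL") or "http://localhost:8080"
    # A missing token is allowed: servers without authentication do not need one.
    resolved_token = token or env.get("DATAHUB_GMS_TOKEN")
    return ClientSettings(server=resolved_server, token=resolved_token)


settings = resolve_client_settings(env={"DATAHUB_GMS_TOKEN": "from-env"})
print(settings.server, settings.token is not None)
```

Passing the environment as a parameter keeps the function easy to test without mutating the real process environment.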

Usage

Use client authentication when connecting to a DataHub instance that has authentication enabled. This is the standard configuration for production deployments where access control is required.

Example Recipe with Authentication

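# Source to ingest from; the password is read from an environment variable.
source:
  type: snowflake
  config:
    account_id: "myaccount"
    username: "datahub_user"
    password: "${SNOWFLAKE_PASSWORD}"

# DataHub connection: GMS URL plus a PAT injected from the environment.
datahub_api:
  server: "https://datahub.example.com:8080"
  token: "${DATAHUB_TOKEN}"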
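The `${...}` placeholders keep secrets out of the recipe file itself. The sketch below shows that kind of substitution with Python's `string.Template`; it approximates the idea only, and DataHub's own recipe loader may treat defaults, escaping, or missing variables differently. `expand_placeholders` is a hypothetical helper.

```python
import os
from string import Template
from typing import Mapping


def expand_placeholders(text: str, env: Mapping[str, str] = os.environ) -> str:
    """Replace ${NAME} placeholders with values from `env`.

    Template.substitute raises KeyError when a referenced variable is
    missing, so a forgotten secret fails loudly instead of sending an
    empty token.
    """
    return Template(text).substitute(env)


recipe_line = 'token: "${DATAHUB_TOKEN}"'
print(expand_placeholders(recipe_line, {"DATAHUB_TOKEN": "example-token"}))
```

Failing on a missing variable is a deliberate choice in this sketch: a silently empty token would surface later as a less obvious authentication error.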

Theoretical Basis

Token-based authentication follows the Bearer Token pattern from OAuth 2.0 (RFC 6750). The client includes a token in the Authorization header of each HTTP request:

Authorization: Bearer <token>

This pattern is stateless -- the server validates the token on each request without maintaining session state. Personal Access Tokens (PATs) are long-lived tokens generated by the DataHub UI or API, tied to a specific user identity and set of permissions.
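To make the stateless check concrete, the sketch below shows how a server-side handler might pull the credential out of the header and validate it on every request. It is a generic illustration of the Bearer scheme, not DataHub's server code; `extract_bearer_token`, `is_authorized`, and the `validate_token` callback are hypothetical names.

```python
from typing import Callable, Mapping, Optional


def extract_bearer_token(headers: Mapping[str, str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' header, else None.

    Assumes header names are already normalized to 'Authorization';
    real frameworks usually handle header-name case for you.
    """
    value = headers.get("Authorization", "")
    scheme, _, credentials = value.partition(" ")
    # The auth scheme name is compared case-insensitively.
    if scheme.lower() != "bearer" or not credentials.strip():
        return None
    return credentials.strip()


def is_authorized(headers: Mapping[str, str], validate_token: Callable[[str], bool]) -> bool:
    # Nothing is remembered between calls: each request is judged on its own header.
    token = extract_bearer_token(headers)
    return token is not None and validate_token(token)


print(is_authorized({"Authorization": "Bearer abc"}, lambda t: t == "abc"))
```

The validation step is passed in as a callback because how a token is verified (signature check, lookup, expiry) is independent of how it is carried in the request.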

Benefits of this approach:

  • Stateless -- No server-side session management required
  • Revocable -- Tokens can be individually revoked without affecting other tokens or users
  • Auditable -- Each token is tied to a user identity for access logging
  • Portable -- The same token works across CLI, SDK, and API clients

Constraints

  • Authentication is optional -- if the server does not have METADATA_SERVICE_AUTH_ENABLED=true, tokens are not required
  • Tokens are sensitive credentials and should be stored securely (e.g., environment variables, secrets managers)
  • TLS should be used for production deployments: a bearer token grants access to anyone who holds it, so it must be protected in transit
  • The DatahubClientConfig supports custom CA certificates and client certificates for mTLS environments
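For the TLS-related constraints, the sketch below shows how a custom CA bundle and a client certificate are typically wired into a TLS context with Python's standard `ssl` module. It illustrates the general mTLS setup rather than how DatahubClientConfig applies these settings internally; `build_tls_context` and its parameter names are hypothetical.

```python
import ssl
from typing import Optional


def build_tls_context(
    ca_bundle: Optional[str] = None,
    client_cert: Optional[str] = None,
    client_key: Optional[str] = None,
) -> ssl.SSLContext:
    """Create a verifying TLS context, optionally with a private CA and client cert."""
    # With cafile=None the system trust store is used; a path adds a private CA.
    context = ssl.create_default_context(cafile=ca_bundle)
    if client_cert:
        # Presenting a client certificate is what makes the connection mutual TLS.
        context.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return context


context = build_tls_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

Server certificate verification and hostname checking stay enabled in this sketch, so the token is only ever sent to a server whose identity has been verified.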

Related Pages

Implementation:Datahub_project_Datahub_DatahubClientConfig
