Implementation: Pip Install DataHub SDK (DataHub project)
Metadata
| Field | Value |
|---|---|
| implementation_name | Pip Install Datahub SDK |
| description | Installing the DataHub Python SDK via pip with transport-specific extras for REST or Kafka metadata emission. |
| type | implementation |
| category | External Tool Doc |
| status | active |
| last_updated | 2026-02-10 |
| version | 1.0 |
Overview
This implementation covers the pip installation commands and extras configuration for the DataHub Python SDK (acryl-datahub). The SDK is installed via pip with extras that provision the appropriate transport backend (REST or Kafka) for metadata emission.
Source Reference
| Field | Value |
|---|---|
| File | metadata-ingestion/setup.py |
| Lines | L479-743 |
| Repository | datahub-project/datahub |
Installation Commands
REST Emitter (HTTP to GMS)
```shell
pip install 'acryl-datahub[datahub-rest]'
```
Kafka Emitter (Kafka topics)
```shell
pip install 'acryl-datahub[datahub-kafka]'
```
Both Transports
```shell
pip install 'acryl-datahub[datahub-rest,datahub-kafka]'
```
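To check which transports an environment actually has after installation, one can probe for the emitter modules without importing them. A minimal sketch using only the standard library (the module paths are the ones shown in the verification commands later on this page; the helper name is illustrative):

```python
import importlib.util


def transport_available(module_name: str) -> bool:
    """Return True if the given emitter module can be found in this environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. datahub itself) is absent.
        return False


# Probe both transports; each one requires its corresponding extra.
for mod in ("datahub.emitter.rest_emitter", "datahub.emitter.kafka_emitter"):
    status = "available" if transport_available(mod) else "missing"
    print(f"{mod}: {status}")
```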
Extras Dependencies
| Extra | Dependencies | Purpose |
|---|---|---|
| datahub-rest | rest_common (includes requests) | HTTP transport to GMS |
| datahub-kafka | confluent_kafka[schemaregistry,avro]>=1.9.0,!=2.8.1,<3.0.0, fastavro>=1.2.0,<2.0.0 | Kafka transport for async emission |
| sync-file-emitter | filelock<4.0.0 | File-based emission with locking |
| datahub-lite | duckdb>=1.0.0,<2.0.0, fastapi<0.129.0, uvicorn<0.41.0 | Lightweight local metadata store |
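The bracketed extras syntax (`name[extra1,extra2]`) is standard pip requirement syntax. As a quick illustration of how a spec like `'acryl-datahub[datahub-rest,datahub-kafka]'` decomposes, here is a deliberately simplified splitter (not a PEP 508 parser; real code should use `packaging.requirements.Requirement`):

```python
def split_requirement(spec: str) -> tuple[str, list[str]]:
    """Split 'name[extra1,extra2]' into (name, [extras]).

    Simplified sketch: handles only the bare name-plus-extras form,
    not version specifiers, markers, or URLs.
    """
    if "[" not in spec:
        return spec, []
    name, _, rest = spec.partition("[")
    extras = [e.strip() for e in rest.rstrip("]").split(",") if e.strip()]
    return name, extras


print(split_requirement("acryl-datahub[datahub-rest,datahub-kafka]"))
# → ('acryl-datahub', ['datahub-rest', 'datahub-kafka'])
```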
Base Dependencies
The base package (installed regardless of extras) includes:
| Dependency | Version Constraint |
|---|---|
| typing_extensions | >=4.8.0,<5.0.0 |
| pydantic | >=2.4.0,<3.0.0 |
| avro | >=1.11.3,<1.13 |
| avro-gen3 | ==0.7.16 |
| sentry-sdk | >=1.33.1,<3.0.0 |
| setuptools | <82.0.0 |
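To see how the comma-separated constraint strings above combine (every clause must hold), here is a toy checker. It is a sketch only: it handles plain dotted versions and the `>=`, `<`, `==`, `!=` operators, and is not PEP 440 compliant; production code should use `packaging.specifiers.SpecifierSet`:

```python
import re


def version_tuple(v: str) -> tuple[int, ...]:
    """Parse a plain dotted version like '4.8.0' into a comparable tuple."""
    return tuple(int(p) for p in v.split("."))


def satisfies(installed: str, constraint: str) -> bool:
    """Check an installed version against a comma-separated constraint string.

    Simplified sketch: every clause must hold for the constraint to pass.
    """
    ops = {
        ">=": lambda a, b: a >= b,
        "==": lambda a, b: a == b,
        "!=": lambda a, b: a != b,
        "<": lambda a, b: a < b,
    }
    for clause in constraint.split(","):
        m = re.match(r"(>=|==|!=|<)\s*([\d.]+)$", clause.strip())
        if not m:
            return False  # unsupported clause form
        op, bound = m.groups()
        if not ops[op](version_tuple(installed), version_tuple(bound)):
            return False
    return True


# e.g. the avro constraint from the table above:
print(satisfies("1.12.0", ">=1.11.3,<1.13"))  # → True
```

Note how the `!=2.8.1` exclusion in the confluent_kafka constraint rejects exactly that release while accepting its neighbors.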
I/O Contract
| Field | Value |
|---|---|
| Input | pip install command with the desired extras specification |
| Output | Python environment with datahub package and transport dependencies installed |
| Side Effects | Installs Python packages into the active virtual environment; makes emitter classes importable |
Usage Example
```shell
# Create and activate a virtual environment
python3 -m venv datahub-env
source datahub-env/bin/activate

# Install with REST support
pip install 'acryl-datahub[datahub-rest]'

# Verify installation
python3 -c "from datahub.emitter.rest_emitter import DataHubRestEmitter; print('REST emitter available')"

# Install with Kafka support
pip install 'acryl-datahub[datahub-kafka]'

# Verify installation
python3 -c "from datahub.emitter.kafka_emitter import DatahubKafkaEmitter; print('Kafka emitter available')"
```
Post-Installation Imports
After installing the appropriate extra, the following imports become available:
```python
# REST emitter (requires datahub-rest extra)
from datahub.emitter.rest_emitter import DataHubRestEmitter

# Kafka emitter (requires datahub-kafka extra)
from datahub.emitter.kafka_emitter import DatahubKafkaEmitter

# Always available (core package)
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.mce_builder import make_dataset_urn
```
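Because each emitter import raises ImportError when its extra was not installed, code meant to run with either transport can guard the imports and branch on availability. A minimal sketch (the `HAVE_*` flag names are illustrative, not part of the SDK):

```python
# Guarded imports: each emitter is importable only when its extra was installed.
try:
    from datahub.emitter.rest_emitter import DataHubRestEmitter
    HAVE_REST = True
except ImportError:
    HAVE_REST = False  # datahub-rest extra (or datahub itself) not installed

try:
    from datahub.emitter.kafka_emitter import DatahubKafkaEmitter
    HAVE_KAFKA = True
except ImportError:
    HAVE_KAFKA = False  # datahub-kafka extra not installed

print(f"REST available: {HAVE_REST}, Kafka available: {HAVE_KAFKA}")
```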
Related
- Implements: Datahub_project_Datahub_Python_SDK_Installation
- Related to: Datahub_project_Datahub_DataHubRestEmitter_Init
- Environment: Environment:Datahub_project_Datahub_Python_3_10_Ingestion_Environment
- Heuristic: Heuristic:Datahub_project_Datahub_Gradle_Formatting_Over_Direct_Tools