
Implementation: Pip Install Datahub SDK

From Leeroopedia


Metadata

Field Value
implementation_name Pip Install Datahub SDK
description Installing the DataHub Python SDK via pip with transport-specific extras for REST or Kafka metadata emission.
type implementation
category External Tool Doc
status active
last_updated 2026-02-10
version 1.0

Overview

This implementation covers the pip installation commands and extras configuration for the DataHub Python SDK (acryl-datahub). Each extra pulls in the dependencies for a specific transport backend (REST or Kafka) used for metadata emission.

Source Reference

Field Value
File metadata-ingestion/setup.py
Lines 479-743
Repository datahub-project/datahub

Installation Commands

REST Emitter (HTTP to GMS)

pip install 'acryl-datahub[datahub-rest]'

Kafka Emitter (Kafka topics)

pip install 'acryl-datahub[datahub-kafka]'

Both Transports

pip install 'acryl-datahub[datahub-rest,datahub-kafka]'
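After installing, you can confirm which distributions an extra actually pulled in without importing datahub itself. The helper below is an illustrative sketch using only the standard library; it reports a distribution's installed version, or None if it is absent.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Check the SDK and the transport dependencies the extras provide.
for dist in ("acryl-datahub", "requests", "confluent-kafka"):
    print(dist, "->", installed_version(dist))
```

Note that this checks distribution names (as pip sees them), not import names, which is why it works even when the package cannot be imported.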

Extras Dependencies

datahub-rest
  Dependencies: rest_common (includes requests)
  Purpose: HTTP transport to GMS

datahub-kafka
  Dependencies: confluent_kafka[schemaregistry,avro]>=1.9.0,!=2.8.1,<3.0.0; fastavro>=1.2.0,<2.0.0
  Purpose: Kafka transport for async emission

sync-file-emitter
  Dependencies: filelock<4.0.0
  Purpose: File-based emission with locking

datahub-lite
  Dependencies: duckdb>=1.0.0,<2.0.0; fastapi<0.129.0; uvicorn<0.41.0
  Purpose: Lightweight local metadata store

Base Dependencies

The base package (installed regardless of extras) includes:

Dependency Version Constraint
typing_extensions >=4.8.0,<5.0.0
pydantic >=2.4.0,<3.0.0
avro >=1.11.3,<1.13
avro-gen3 ==0.7.16
sentry-sdk >=1.33.1,<3.0.0
setuptools <82.0.0
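The constraints above use standard pip specifier semantics: the lower bound is inclusive, the upper bound exclusive. As a rough illustration of how a range like pydantic>=2.4.0,<3.0.0 behaves, the sketch below compares dotted version tuples; real resolution is done by pip's packaging machinery, which additionally handles pre-releases, epochs, and exclusions like !=2.8.1.

```python
def parse(v):
    """Naive dotted-version parse; real tools use packaging.version."""
    return tuple(int(part) for part in v.split("."))


def satisfies(v, lower, upper):
    """True when lower <= v < upper, mirroring '>=lower,<upper'."""
    return parse(lower) <= parse(v) < parse(upper)


print(satisfies("2.4.0", "2.4.0", "3.0.0"))  # lower bound is inclusive
print(satisfies("3.0.0", "2.4.0", "3.0.0"))  # upper bound is exclusive
```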

I/O Contract

Field Value
Input pip install command with the desired extras specification
Output Python environment with datahub package and transport dependencies installed
Side Effects Installs Python packages into the active virtual environment; makes emitter classes importable

Usage Example

# Create and activate a virtual environment
python3 -m venv datahub-env
source datahub-env/bin/activate

# Install with REST support
pip install 'acryl-datahub[datahub-rest]'

# Verify installation
python3 -c "from datahub.emitter.rest_emitter import DataHubRestEmitter; print('REST emitter available')"

# Install with Kafka support
pip install 'acryl-datahub[datahub-kafka]'

# Verify installation
python3 -c "from datahub.emitter.kafka_emitter import DatahubKafkaEmitter; print('Kafka emitter available')"
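In application code it is often useful to detect a missing extra at import time rather than fail deep inside a call stack. One way to do this (a sketch, not part of the SDK) is to guard the emitter import and surface an actionable message:

```python
try:
    from datahub.emitter.rest_emitter import DataHubRestEmitter
except ImportError:
    # The datahub-rest extra (or the package itself) is not installed.
    DataHubRestEmitter = None

if DataHubRestEmitter is None:
    print("REST emitter unavailable; run: pip install 'acryl-datahub[datahub-rest]'")
else:
    print("REST emitter available")
```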

Post-Installation Imports

After installing the appropriate extra, the following imports become available:

# REST emitter (requires datahub-rest extra)
from datahub.emitter.rest_emitter import DataHubRestEmitter

# Kafka emitter (requires datahub-kafka extra)
from datahub.emitter.kafka_emitter import DatahubKafkaEmitter

# Always available (core package)
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.mce_builder import make_dataset_urn
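To show how these imports fit together, here is a minimal emission sketch. The GMS URL, platform, and dataset name are placeholder assumptions, and the imports are deferred into the function so the module loads even before the extra is installed.

```python
def emit_example(gms_url="http://localhost:8080"):
    # Deferred imports: the datahub-rest extra is required at call time.
    from datahub.emitter.mce_builder import make_dataset_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DataHubRestEmitter
    from datahub.metadata.schema_classes import DatasetPropertiesClass

    # Placeholder GMS endpoint; point this at your own server.
    emitter = DataHubRestEmitter(gms_server=gms_url)
    mcp = MetadataChangeProposalWrapper(
        entityUrn=make_dataset_urn(platform="hive", name="db.example_table", env="PROD"),
        aspect=DatasetPropertiesClass(description="Example dataset"),
    )
    emitter.emit(mcp)
```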

Related

Knowledge Sources

Domains

Data_Integration, Metadata_Management
