Principle:Datahub project Datahub Emitter Initialization

From Leeroopedia


Metadata

Field Value
principle_name Emitter Initialization
description The process of creating an authenticated transport client for emitting metadata to DataHub via REST or Kafka.
type principle
status active
last_updated 2026-02-10
version 1.0

Overview

Emitter Initialization is the process of creating an authenticated transport client for emitting metadata to DataHub via REST or Kafka. The emitter is the primary interface through which Python applications send metadata change proposals to the DataHub backend.

Description

Emitter initialization establishes a connection to the DataHub backend. The Emitter protocol (defined in datahub.emitter.generic_emitter) defines the contract that all emitter implementations must satisfy:

from typing import Callable, Optional, Union

from typing_extensions import Protocol

from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.metadata.com.linkedin.pegasus2avro.mxe import (
    MetadataChangeEvent,
    MetadataChangeProposal,
)


class Emitter(Protocol):
    def emit(
        self,
        item: Union[MetadataChangeEvent, MetadataChangeProposal, MetadataChangeProposalWrapper],
        callback: Optional[Callable[[Exception, str], None]] = None,
    ) -> None: ...

    def flush(self) -> None: ...

Two concrete implementations exist:

  • DataHubRestEmitter -- Communicates with DataHub GMS over HTTP. Handles session management, retry logic with exponential backoff, SSL configuration, authentication via bearer tokens, and payload size management. Supports both RestLI and OpenAPI endpoints.
  • DatahubKafkaEmitter -- Publishes metadata to Kafka topics using Avro serialization. Manages schema registry integration, topic routing for MCE and MCP message types, and optional OAuth authentication for managed Kafka services like AWS MSK.

Both implementations also implement the Closeable interface alongside the Emitter protocol, ensuring that underlying resources (HTTP sessions, Kafka producers) are released deterministically via close().

Usage

Use Emitter Initialization when starting a Python application that will emit metadata events to DataHub. The choice of emitter depends on infrastructure and requirements:

  • DataHubRestEmitter is appropriate when:
    • Direct, synchronous communication with GMS is needed
    • Request-response confirmation of each emission is required
    • The application needs to inspect server configuration (e.g., feature support)
    • OpenAPI-based batching and trace-based write verification are desired
  • DatahubKafkaEmitter is appropriate when:
    • High-throughput asynchronous emission is needed
    • Kafka infrastructure is already deployed
    • Decoupled, fire-and-forget semantics are acceptable
    • Schema registry-based serialization is preferred

Theoretical Basis

This principle follows the Strategy pattern. The Emitter Protocol defines the interface, with REST and Kafka as interchangeable strategies. Selection depends on infrastructure and latency requirements.

The Protocol-based approach (using typing_extensions.Protocol) enables structural subtyping, meaning any class that implements emit() and flush() with the correct signatures is considered an Emitter without explicit inheritance. This provides flexibility for custom emitter implementations.
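Structural subtyping can be illustrated in isolation with a simplified stand-in for the real protocol (this is not DataHub's actual class; on Python 3.8+, typing.Protocol is equivalent to the typing_extensions import the SDK uses for older-version compatibility):

```python
from typing import Any, Callable, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class Emitter(Protocol):
    """Simplified stand-in for datahub.emitter.generic_emitter.Emitter."""

    def emit(
        self,
        item: Any,
        callback: Optional[Callable[[Exception, str], None]] = None,
    ) -> None: ...

    def flush(self) -> None: ...


class InMemoryEmitter:
    """Satisfies Emitter structurally -- note: no inheritance from Emitter."""

    def __init__(self) -> None:
        self.items: List[Any] = []

    def emit(self, item: Any, callback: Optional[Callable] = None) -> None:
        self.items.append(item)

    def flush(self) -> None:
        pass


emitter = InMemoryEmitter()
# runtime_checkable lets isinstance() verify the methods are present.
print(isinstance(emitter, Emitter))  # True
```

This is why custom emitters (for testing, buffering, or fan-out) slot into SDK code paths that accept an Emitter without touching the class hierarchy.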

Key design aspects:

  • Authentication abstraction -- Both emitters handle authentication transparently (bearer tokens for REST, SASL/OAuth for Kafka)
  • Retry resilience -- The REST emitter includes weighted retry logic with special handling for HTTP 429 (rate limiting) responses
  • Configuration-driven behavior -- Both emitters accept configuration objects that control transport behavior

Related

Implementation:Datahub_project_Datahub_DataHubRestEmitter_Init

Knowledge Sources

Domains

Data_Integration, Metadata_Management
