Principle:Datahub project Datahub Emitter Instantiation

From Leeroopedia


Property | Value
Principle Name | Emitter_Instantiation
Category | Java_SDK_Metadata_Emission
Workflow | Java_SDK_Metadata_Emission
Repository | https://github.com/datahub-project/datahub
Last Updated | 2026-02-09 17:00 GMT

Overview

Description

Emitter Instantiation is the principle of creating metadata transport objects (emitters) through factory methods and builders rather than through direct constructor invocation. The DataHub Java SDK defines a common Emitter interface (datahub.client.Emitter) with four concrete implementations (RestEmitter, KafkaEmitter, FileEmitter, and S3Emitter), each targeting a different transport backend. The instantiation step configures the transport with connection parameters, authentication credentials, serialization settings, and retry policies, and produces a ready-to-use emitter that can accept metadata change proposals.

Usage

Emitter instantiation is the first step after dependency resolution in any metadata emission workflow. The choice of emitter implementation depends on the deployment topology:

  • RestEmitter -- Used when the application has direct HTTP access to the DataHub GMS server. This is the most common choice for CI/CD pipelines and microservices.
  • KafkaEmitter -- Used when the application should decouple metadata production from GMS availability by writing to a Kafka topic. This is preferred for high-throughput scenarios or when GMS may experience downtime.
  • FileEmitter -- Used when the application has no network access to DataHub and must write metadata to a local JSON file for later ingestion.
  • S3Emitter -- Used when the application should write metadata to an S3 bucket, combining the FileEmitter approach with cloud storage upload.
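The selection criteria in the list above can be condensed into a small decision function. This is a minimal sketch and not part of the DataHub SDK: the EmitterKind enum, the EmitterChooser class, and its three boolean inputs are hypothetical names introduced for illustration, and the precedence order (Kafka first, then REST, then S3, then local file) is an assumption rather than a documented rule.

```java
// Hypothetical labels for the four transports described above.
enum EmitterKind { REST, KAFKA, FILE, S3 }

// Hypothetical helper; the precedence order is an assumption of this sketch.
final class EmitterChooser {
    private EmitterChooser() { }

    static EmitterKind choose(boolean gmsReachable,
                              boolean kafkaPreferred,
                              boolean s3BucketConfigured) {
        if (kafkaPreferred) {
            // High throughput, or GMS may be down: decouple through a Kafka topic.
            return EmitterKind.KAFKA;
        }
        if (gmsReachable) {
            // Direct HTTP access to GMS, e.g. from CI/CD jobs or microservices.
            return EmitterKind.REST;
        }
        if (s3BucketConfigured) {
            // No path to DataHub, but a bucket is available for later ingestion.
            return EmitterKind.S3;
        }
        // No network path at all: write a local file for later ingestion.
        return EmitterKind.FILE;
    }
}
```

In practice the inputs would likely come from deployment configuration, so the same application binary can pick a different transport per environment.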

Theoretical Basis

Emitter Instantiation draws on three design patterns and one structural principle:

Factory Pattern -- The RestEmitter.create(Consumer<RestEmitterConfigBuilder>) static factory method encapsulates the construction logic behind a single entry point. The caller provides a lambda that configures the builder, and the factory method handles object assembly including HTTP client initialization, retry strategy setup, and connection pool configuration. This hides the complexity of constructing the underlying CloseableHttpAsyncClient from the caller. The RestEmitter.createWithDefaults() convenience method further simplifies the common case where default settings are acceptable.
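The shape of a lambda-configured static factory can be illustrated without the SDK on the classpath. The sketch below is a self-contained stand-in, not the SDK's code: SketchEmitter, Settings, describe(), and the default values are hypothetical, and the real RestEmitter additionally assembles an HTTP client inside its constructor.

```java
import java.util.function.Consumer;

// Hypothetical stand-in for an emitter; not the SDK's RestEmitter.
final class SketchEmitter {

    // Mutable settings holder that the caller's lambda configures.
    static final class Settings {
        String server = "http://localhost:8080"; // default assumed for this sketch
        int timeoutSec = 10;                     // default assumed for this sketch

        Settings server(String value) { this.server = value; return this; }
        Settings timeoutSec(int value) { this.timeoutSec = value; return this; }
    }

    private final String server;
    private final int timeoutSec;

    // Private constructor: callers must go through the factory methods.
    private SketchEmitter(Settings settings) {
        this.server = settings.server;
        this.timeoutSec = settings.timeoutSec;
        // A real emitter would build its HTTP client and retry strategy here.
    }

    // Single entry point: the caller only describes what should differ from the defaults.
    static SketchEmitter create(Consumer<Settings> configure) {
        Settings settings = new Settings();
        configure.accept(settings);
        return new SketchEmitter(settings);
    }

    // Convenience factory for the common case where defaults are acceptable.
    static SketchEmitter createWithDefaults() {
        return create(settings -> { });
    }

    String describe() {
        return server + " (timeout " + timeoutSec + "s)";
    }
}
```

A caller would write something like SketchEmitter.create(s -> s.server("http://gms.example:8080").timeoutSec(30)), where the host name is a placeholder. The constructor stays private, so construction details can change without affecting call sites.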

Strategy Pattern -- The Emitter interface acts as a strategy abstraction, allowing client code to program against the interface rather than a specific transport implementation. A metadata emission pipeline can accept any Emitter implementation and emit metadata without knowledge of whether the events travel over HTTP, Kafka, or are written to a file. This enables transport-agnostic code that can be reconfigured at deployment time by swapping the emitter implementation.
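The transport-agnostic style described above can be sketched with a deliberately simplified interface. Everything here is hypothetical: MetadataSink is a reduced stand-in for the SDK's Emitter interface (events are plain strings and there are no callbacks or futures), and InMemorySink, LineSink, and EmissionPipeline exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the SDK's Emitter interface.
interface MetadataSink {
    void emit(String event);
}

// One strategy: keep events in a list (stands in for a network transport).
final class InMemorySink implements MetadataSink {
    final List<String> received = new ArrayList<>();

    @Override
    public void emit(String event) {
        received.add(event);
    }
}

// Another strategy: append events as lines (stands in for a file transport).
final class LineSink implements MetadataSink {
    final StringBuilder out = new StringBuilder();

    @Override
    public void emit(String event) {
        out.append(event).append('\n');
    }
}

// The pipeline depends only on the interface, never on a concrete transport.
final class EmissionPipeline {
    private final MetadataSink sink;

    EmissionPipeline(MetadataSink sink) {
        this.sink = sink;
    }

    void run(List<String> events) {
        for (String event : events) {
            sink.emit(event);
        }
    }
}
```

Swapping new EmissionPipeline(new InMemorySink()) for new EmissionPipeline(new LineSink()) changes the transport without touching the pipeline code, which is the property the Strategy pattern is meant to provide.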

Builder Pattern -- The RestEmitterConfig.RestEmitterConfigBuilder (generated by Lombok @Builder) provides a fluent API for setting configuration properties step by step. The builder accumulates settings for server URL, authentication token, timeout, SSL verification, retry policy, extra headers, and HTTP client customizations, and then produces an immutable RestEmitterConfig value object. The with(Consumer) method on the builder enables the lambda-based configuration style used by RestEmitter.create().
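For readers unfamiliar with generated builders, the following hand-written sketch approximates the kind of code such a builder provides, plus a with(Consumer) hook. It is an illustration under assumptions: SketchConfig, its three fields, and their defaults are hypothetical and do not mirror the full property set of RestEmitterConfig.

```java
import java.util.function.Consumer;

// Hypothetical immutable config; field names and defaults are illustrative only.
final class SketchConfig {
    private final String server;
    private final String token;
    private final int timeoutSec;

    private SketchConfig(Builder builder) {
        this.server = builder.server;
        this.token = builder.token;
        this.timeoutSec = builder.timeoutSec;
    }

    static Builder builder() {
        return new Builder();
    }

    String server() { return server; }
    String token() { return token; }
    int timeoutSec() { return timeoutSec; }

    // Mutable accumulator; each setter returns this to allow chaining.
    static final class Builder {
        private String server = "http://localhost:8080"; // default assumed for this sketch
        private String token = null;
        private int timeoutSec = 10;                     // default assumed for this sketch

        Builder server(String value) { this.server = value; return this; }
        Builder token(String value) { this.token = value; return this; }
        Builder timeoutSec(int value) { this.timeoutSec = value; return this; }

        // Lets a caller-supplied lambda configure this builder in place.
        Builder with(Consumer<Builder> configure) {
            configure.accept(this);
            return this;
        }

        // Copies the accumulated values into an immutable config object.
        SketchConfig build() {
            return new SketchConfig(this);
        }
    }
}
```

Because build() copies the values into final fields, later changes to the builder do not affect a config that has already been produced.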

Separation of Configuration from Construction -- The config object (RestEmitterConfig, KafkaEmitterConfig, FileEmitterConfig, S3EmitterConfig) is separate from the emitter itself. This allows configuration to be built incrementally, validated independently, and potentially serialized or reused, while the emitter constructor performs the actual resource allocation (opening HTTP connections, creating Kafka producers, opening file handles).
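The split between a passive config object and a resource-owning emitter can be sketched as follows. The classes are hypothetical and self-contained: FileTargetConfig and BufferEmitter are not SDK types, and the "resource" is simulated with an in-memory buffer and a counter so the example runs without touching the file system.

```java
// Hypothetical config: a plain value object that allocates nothing.
final class FileTargetConfig {
    final String fileName;

    FileTargetConfig(String fileName) {
        this.fileName = fileName;
    }

    // Validation runs without opening any resource.
    void validate() {
        if (fileName == null || fileName.isEmpty()) {
            throw new IllegalArgumentException("fileName must be set");
        }
    }
}

// Hypothetical emitter: resource allocation happens in the constructor.
final class BufferEmitter implements AutoCloseable {
    static int openHandles = 0; // simulated count of open resources

    private final StringBuilder buffer;
    private boolean closed = false;

    BufferEmitter(FileTargetConfig config) {
        config.validate();
        // Stands in for opening a file handle, HTTP connection, or Kafka producer.
        this.buffer = new StringBuilder("# ").append(config.fileName).append('\n');
        openHandles++;
    }

    void emit(String event) {
        if (closed) {
            throw new IllegalStateException("emitter is closed");
        }
        buffer.append(event).append('\n');
    }

    String contents() {
        return buffer.toString();
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            openHandles--;
        }
    }
}
```

One config instance can be validated up front and then passed to several emitter constructors. Only the constructors change the simulated resource count, and close() releases it again.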
