Workflow:Datahub project Datahub Java SDK Metadata Emission

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, Metadata_Management, Java_SDK
Last Updated 2026-02-09 17:00 GMT

Overview

End-to-end process for programmatically emitting metadata to DataHub from Java applications using the V1 SDK emitter library.

Description

This workflow covers the low-level Java SDK (V1) approach for sending metadata to DataHub. It uses the MetadataChangeProposalWrapper pattern to construct metadata events and one of four emitter backends (REST, Kafka, File, S3) to deliver them. This approach is suitable for CI/CD pipelines, custom orchestrators, and applications that need fine-grained control over metadata emission.

Usage

Execute this workflow when you need to emit metadata to DataHub from a Java application, build tool, or CI/CD pipeline. This is the appropriate choice when you require low-level control over MetadataChangeProposal construction, need to integrate with existing Java infrastructure, or want to emit metadata via Kafka rather than REST.

Execution Steps

Step 1: Add SDK Dependency

Add the datahub-client library to your project's build configuration. The SDK is published to Maven Central and supports both Gradle and Maven build systems.

Key considerations:

  • The dependency groupId is io.acryl and artifactId is datahub-client
  • Include the appropriate version matching your DataHub deployment
  • The SDK bundles dependencies for REST, Kafka, File, and S3 emitters
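The dependency coordinates above translate to a one-line Gradle declaration (Maven users declare the equivalent `<dependency>` element). The version string is a placeholder; substitute the release that matches your DataHub deployment:

```groovy
// build.gradle — pick the version matching your DataHub deployment
dependencies {
    implementation 'io.acryl:datahub-client:<version>'
}
```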

Step 2: Create an Emitter Instance

Instantiate the appropriate emitter based on your transport preference. The REST emitter sends metadata over HTTP to the DataHub GMS endpoint. The Kafka emitter publishes directly to Kafka topics. File and S3 emitters write metadata to local files or cloud storage.

Key considerations:

  • RestEmitter requires a GMS server URL and optional auth token
  • KafkaEmitter requires bootstrap server and schema registry URLs
  • FileEmitter writes JSON to a specified local file path
  • S3Emitter writes JSON to an S3 bucket with configurable key prefix
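As a sketch of REST emitter construction, assuming the `datahub.client.rest.RestEmitter` builder API from the datahub-client library (the server URL and token below are placeholders):

```java
import datahub.client.rest.RestEmitter;

// REST emitter pointed at the GMS endpoint; token is optional
RestEmitter emitter = RestEmitter.create(b -> b
    .server("http://localhost:8080")
    .token("<personal-access-token>"));
```

The Kafka emitter is constructed analogously from a config object carrying the bootstrap server and schema registry URLs; consult the SDK documentation for the exact builder names.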

Step 3: Construct Metadata Change Proposals

Build MetadataChangeProposalWrapper objects that describe the metadata you want to emit. Each wrapper contains an entity URN, an aspect name, and the aspect value. Use the provided builder utilities and Pegasus-generated aspect classes.

Key considerations:

  • URNs follow the format urn:li:entityType:(key components)
  • Aspect classes are generated from Avro/PDL schemas in metadata-models
  • Common aspects include DatasetProperties, SchemaMetadata, Ownership, and UpstreamLineage
  • The wrapper infers the aspect name from the aspect class type
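A minimal construction sketch, assuming the `MetadataChangeProposalWrapper` builder and the Pegasus-generated `DatasetProperties` aspect class (the dataset URN and description are illustrative):

```java
import com.linkedin.dataset.DatasetProperties;
import datahub.event.MetadataChangeProposalWrapper;

// The aspect name ("datasetProperties") is inferred from the aspect class type
MetadataChangeProposalWrapper mcpw = MetadataChangeProposalWrapper.builder()
    .entityType("dataset")
    .entityUrn("urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)")
    .upsert()
    .aspect(new DatasetProperties().setDescription("Table of created users"))
    .build();
```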

Step 4: Emit Metadata Events

Send the constructed MCPs through the emitter. REST emission supports both blocking and non-blocking modes. Kafka emission is inherently asynchronous with callback support. Check the MetadataWriteResponse for success or error information.

Key considerations:

  • REST emission returns a Future that resolves to MetadataWriteResponse; call get() to block, or handle the result asynchronously
  • Kafka mode uses a Callback interface for async acknowledgement
  • Always call emitter.flush() for Kafka to ensure all pending events are delivered
  • Handle exceptions for network failures and server errors
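A blocking REST emission sketch, assuming the `emit(mcpw, callback)` signature and `MetadataWriteResponse` accessors from the datahub-client library (`emitter` and `mcpw` come from the previous steps):

```java
import java.util.concurrent.Future;
import datahub.client.MetadataWriteResponse;

// Passing null for the callback; call get() to block until the server responds
Future<MetadataWriteResponse> future = emitter.emit(mcpw, null);
MetadataWriteResponse response = future.get();
if (!response.isSuccess()) {
    System.err.println("Emission failed: " + response.getResponseContent());
}
```

For Kafka, pass a Callback implementation instead of blocking on the Future, since delivery is acknowledged asynchronously.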

Step 5: Close the Emitter

Close the emitter to release resources and ensure all pending metadata events are flushed. This is especially important for the Kafka and File emitters, which buffer events internally.

Key considerations:

  • Use try-with-resources or explicit close() calls
  • For Kafka, flush() before close() to ensure delivery
  • FileEmitter finalizes the JSON output on close
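Assuming the emitter implements Closeable (as the SDK documentation suggests), try-with-resources keeps cleanup automatic; the flush-before-close pattern for Kafka is shown in comments:

```java
import datahub.client.rest.RestEmitter;

// try-with-resources closes the emitter even if emit() throws
try (RestEmitter emitter = RestEmitter.create(b -> b.server("http://localhost:8080"))) {
    emitter.emit(mcpw, null).get();
}
// Kafka variant: call kafkaEmitter.flush() before close() to guarantee delivery
```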

Execution Diagram

GitHub URL

Workflow Repository