Workflow: DataHub Java SDK Metadata Emission
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Metadata_Management, Java_SDK |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
End-to-end process for programmatically emitting metadata to DataHub from Java applications using the V1 SDK emitter library.
Description
This workflow covers the low-level Java SDK (V1) approach for sending metadata to DataHub. It uses the MetadataChangeProposalWrapper pattern to construct metadata events and one of four emitter backends (REST, Kafka, File, S3) to deliver them. This approach is suitable for CI/CD pipelines, custom orchestrators, and applications that need fine-grained control over metadata emission.
Usage
Execute this workflow when you need to emit metadata to DataHub from a Java application, build tool, or CI/CD pipeline. This is the appropriate choice when you require low-level control over MetadataChangeProposal construction, need to integrate with existing Java infrastructure, or want to emit metadata via Kafka rather than REST.
Execution Steps
Step 1: Add SDK Dependency
Add the datahub-client library to your project's build configuration. The SDK is published to Maven Central and supports both Gradle and Maven build systems.
Key considerations:
- The dependency groupId is io.acryl and artifactId is datahub-client
- Include the appropriate version matching your DataHub deployment
- The SDK bundles dependencies for REST, Kafka, File, and S3 emitters
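A minimal Gradle declaration might look like the following; the version shown is a placeholder and should be pinned to match your DataHub server deployment.

```groovy
// build.gradle — version is illustrative; replace with the release
// matching your DataHub deployment.
dependencies {
    implementation 'io.acryl:datahub-client:0.12.0'
}
```

Maven users declare the same io.acryl:datahub-client coordinates in a dependency element of pom.xml.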
Step 2: Create an Emitter Instance
Instantiate the appropriate emitter based on your transport preference. The REST emitter sends metadata over HTTP to the DataHub GMS endpoint. The Kafka emitter publishes directly to Kafka topics. File and S3 emitters write metadata to local files or cloud storage.
Key considerations:
- RestEmitter requires a GMS server URL and optional auth token
- KafkaEmitter requires bootstrap server and schema registry URLs
- FileEmitter writes JSON to a specified local file path
- S3Emitter writes JSON to an S3 bucket with configurable key prefix
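As a sketch, REST and Kafka emitter construction might look like this. The server URL, token source, and Kafka endpoints are illustrative values, and the KafkaEmitterConfig builder field names are assumed from the SDK's configuration class; verify them against your SDK version.

```java
import datahub.client.kafka.KafkaEmitter;
import datahub.client.kafka.KafkaEmitterConfig;
import datahub.client.rest.RestEmitter;

public class EmitterSetup {
    public static void main(String[] args) throws Exception {
        // REST: point at your GMS endpoint (URL and token here are illustrative).
        RestEmitter restEmitter = RestEmitter.create(b -> b
            .server("http://localhost:8080")
            .token(System.getenv("DATAHUB_TOKEN"))); // omit token if auth is disabled

        // Kafka: bootstrap servers and schema registry URL (assumed builder fields).
        KafkaEmitterConfig kafkaConfig = KafkaEmitterConfig.builder()
            .bootstrap("localhost:9092")
            .schemaRegistryUrl("http://localhost:8081")
            .build();
        KafkaEmitter kafkaEmitter = new KafkaEmitter(kafkaConfig);
    }
}
```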
Step 3: Construct Metadata Change Proposals
Build MetadataChangeProposalWrapper objects that describe the metadata you want to emit. Each wrapper contains an entity URN, an aspect name, and the aspect value. Use the provided builder utilities and Pegasus-generated aspect classes.
Key considerations:
- URNs follow the format urn:li:entityType:(key components)
- Aspect classes are generated from Avro/PDL schemas in metadata-models
- Common aspects include DatasetProperties, SchemaMetadata, Ownership, and UpstreamLineage
- The wrapper infers the aspect name from the aspect class type
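A sketch of constructing one wrapper follows; the dataset URN and description are example values. Note that no aspect name is supplied: the wrapper derives it from the DatasetProperties class.

```java
import com.linkedin.dataset.DatasetProperties;
import datahub.event.MetadataChangeProposalWrapper;

// Build an MCP that upserts the datasetProperties aspect on a Hive table.
// The URN below is an example; substitute your own entity key components.
MetadataChangeProposalWrapper mcpw = MetadataChangeProposalWrapper.builder()
    .entityType("dataset")
    .entityUrn("urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)")
    .upsert()
    .aspect(new DatasetProperties().setDescription("Daily created-users fact table"))
    .build();
```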
Step 4: Emit Metadata Events
Send the constructed MCPs through the emitter. REST emission supports both blocking and non-blocking modes. Kafka emission is inherently asynchronous with callback support. Check the MetadataWriteResponse for success or error information.
Key considerations:
- REST emit() returns a Future that resolves to a MetadataWriteResponse; blocking callers invoke get() on the Future to wait for the server's acknowledgement
- Kafka mode uses a Callback interface for async acknowledgement
- Always call emitter.flush() for Kafka to ensure all pending events are delivered
- Handle exceptions for network failures and server errors
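Both emission modes can be sketched as below, assuming the `emitter` and `mcpw` variables from the previous steps are in scope. The response-handling logic is illustrative.

```java
import java.util.concurrent.Future;

import datahub.client.Callback;
import datahub.client.MetadataWriteResponse;

// Blocking REST emission: emit() returns a Future; get() waits for the ack.
Future<MetadataWriteResponse> future = emitter.emit(mcpw, null);
MetadataWriteResponse response = future.get();
if (!response.isSuccess()) {
    System.err.println("Write failed: " + response.getResponseContent());
}

// Non-blocking emission: pass a Callback for async acknowledgement instead.
emitter.emit(mcpw, new Callback() {
    @Override
    public void onCompletion(MetadataWriteResponse resp) {
        System.out.println("Acked: " + resp.isSuccess());
    }

    @Override
    public void onFailure(Throwable t) {
        t.printStackTrace();
    }
});
```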
Step 5: Close the Emitter
Close the emitter to release resources and ensure all pending metadata events are flushed. This is especially important for the Kafka and File emitters, which buffer events internally.
Key considerations:
- Use try-with-resources or explicit close() calls
- For Kafka, flush() before close() to ensure delivery
- FileEmitter finalizes the JSON output on close
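Since the emitters implement Closeable, try-with-resources is the simplest way to guarantee cleanup. A sketch, reusing the illustrative server URL and the `mcpw` wrapper from Step 3:

```java
import datahub.client.rest.RestEmitter;

// try-with-resources guarantees close() runs even if emission throws.
try (RestEmitter emitter = RestEmitter.create(b -> b.server("http://localhost:8080"))) {
    emitter.emit(mcpw, null).get();
}
// For KafkaEmitter, call emitter.flush() inside the try block before it
// exits, so buffered events are delivered before close().
```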