Implementation:Datahub project Datahub Java SDK V1 Examples
Implementation: Java SDK V1 Examples
| Knowledge Sources | |
|---|---|
| Domains | Java_SDK, Examples |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Description
The Java SDK V1 examples demonstrate how to programmatically emit metadata to DataHub using the original (V1) Java SDK. These examples cover creating and updating entities such as Datasets, DataFlows, DataJobs, Tags, Forms, and Structured Properties. The V1 SDK uses two primary patterns: the MetadataChangeProposalWrapper (MCP wrapper) pattern for full aspect upserts, and the PatchBuilder pattern for incremental, field-level modifications to existing metadata.
All V1 examples reside in the package io.datahubproject.examples under the metadata-integration/java/examples module.
Usage
To run these examples against a local DataHub instance:
# Ensure DataHub is running on localhost:8080
# Then compile and run any example:
cd metadata-integration/java/examples
../../../gradlew run -PmainClass=io.datahubproject.examples.DatasetAdd
Each example contains a main method and connects to DataHub via the RestEmitter client, typically configured with:
RestEmitter emitter = RestEmitter.create(b -> b.server("http://localhost:8080").token(token));
Some newer V1 examples (such as DataFlowCreateExample and DataFlowFullExample) use the DataHubClientV2 client but are classified as V1 because they reside in the top-level examples package.
Code Reference
Source Location
All V1 example files are located at:
metadata-integration/java/examples/src/main/java/io/datahubproject/examples/
Files:
- DataFlowCreateExample.java (128 lines)
- DataFlowFullExample.java (267 lines)
- DataJobLineageAdd.java (63 lines)
- DatasetAdd.java (93 lines)
- DatasetCustomPropertiesAdd.java (49 lines)
- DatasetCustomPropertiesAddRemove.java (49 lines)
- DatasetCustomPropertiesReplace.java (48 lines)
- DatasetStructuredPropertiesUpdate.java (79 lines)
- FormCreate.java (68 lines)
- FormUpdate.java (57 lines)
- StructuredPropertyUpsert.java (102 lines)
- TagCreate.java (35 lines)
Example Catalog
| Example | Entity Type | Operation | SDK Pattern | File |
|---|---|---|---|---|
| DataFlow Create | DataFlow | Create (basic) | DataHubClientV2 / Fluent Builder | DataFlowCreateExample.java |
| DataFlow Full | DataFlow | Create (comprehensive) | DataHubClientV2 / Fluent Builder | DataFlowFullExample.java |
| DataJob Lineage Add | DataJob | Patch (add lineage) | PatchBuilder | DataJobLineageAdd.java |
| Dataset Add | Dataset | Upsert (schema metadata) | MCP Wrapper | DatasetAdd.java |
| Dataset Custom Properties Add | Dataset | Patch (add properties) | PatchBuilder | DatasetCustomPropertiesAdd.java |
| Dataset Custom Properties Add/Remove | Dataset | Patch (add and remove) | PatchBuilder | DatasetCustomPropertiesAddRemove.java |
| Dataset Custom Properties Replace | Dataset | Patch (replace all) | PatchBuilder | DatasetCustomPropertiesReplace.java |
| Dataset Structured Properties Update | Dataset | Patch (structured props) | PatchBuilder | DatasetStructuredPropertiesUpdate.java |
| Form Create | Form | Upsert (create form) | MCP Wrapper | FormCreate.java |
| Form Update | Form | Patch (update form) | PatchBuilder | FormUpdate.java |
| Structured Property Upsert | Structured Property | Upsert (create/update) | PatchBuilder | StructuredPropertyUpsert.java |
| Tag Create | Tag | Upsert (create tag) | MCP Wrapper | TagCreate.java |
Common Patterns
Pattern 1: MetadataChangeProposalWrapper (Full Aspect Upsert)
The MCP Wrapper pattern is used when you want to set or replace an entire aspect on an entity. It constructs a MetadataChangeProposalWrapper with a builder, specifying the entity type, URN, and the aspect object. This is the simplest approach for creating new entities with initial metadata.
Key classes:
- datahub.event.MetadataChangeProposalWrapper
- datahub.client.rest.RestEmitter
- Various aspect classes from com.linkedin.*
Typical flow:
// 1. Construct the aspect object
SomeAspect aspect = new SomeAspect().setField1("value1").setField2("value2");
// 2. Build the MCP wrapper
MetadataChangeProposalWrapper mcpw =
MetadataChangeProposalWrapper.builder()
.entityType("dataset")
.entityUrn("urn:li:dataset:(urn:li:dataPlatform:hive,my_table,PROD)")
.upsert()
.aspect(aspect)
.build();
// 3. Emit via RestEmitter
RestEmitter emitter = RestEmitter.create(b -> b.server("http://localhost:8080").token(token));
Future<MetadataWriteResponse> response = emitter.emit(mcpw, null);
System.out.println(response.get().getResponseContent());
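The flow above calls response.get() with no timeout, which blocks until the write completes. The following is a minimal plain-Java sketch of a bounded wait, not part of the SDK examples: it uses a Future<String> as a stand-in for Future<MetadataWriteResponse>, and the class name FutureResultSketch, the helper await, and the returned status strings are all hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: waits on an emit-style Future with a timeout (stand-in types only). */
final class FutureResultSketch {

  /** Returns a short status string instead of blocking forever or propagating exceptions. */
  static String await(Future<String> response, long timeoutMillis) {
    try {
      return "ok: " + response.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (ExecutionException e) {
      // The asynchronous work itself failed; the cause carries the original error.
      return "failed: " + e.getCause().getMessage();
    } catch (TimeoutException e) {
      return "timed out";
    } catch (InterruptedException e) {
      // Restore the interrupt flag so callers can still observe it.
      Thread.currentThread().interrupt();
      return "interrupted";
    }
  }

  public static void main(String[] args) {
    CompletableFuture<String> failed = new CompletableFuture<>();
    failed.completeExceptionally(new IllegalStateException("connection refused"));

    System.out.println(await(CompletableFuture.completedFuture("200"), 100)); // ok: 200
    System.out.println(await(failed, 100)); // failed: connection refused
    System.out.println(await(new CompletableFuture<>(), 50)); // timed out
  }
}
```

With the real emitter, the same get(timeout, unit) call is available because emit returns a standard java.util.concurrent.Future.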
Pattern 2: PatchBuilder (Incremental Field-Level Updates)
The PatchBuilder pattern is used for incremental modifications that should not overwrite unrelated fields. Each entity/aspect type has a dedicated PatchBuilder class (e.g., DatasetPropertiesPatchBuilder, DataJobInputOutputPatchBuilder). The builder produces a MetadataChangeProposal that applies JSON Patch semantics.
Key classes:
- com.linkedin.metadata.aspect.patch.builder.DatasetPropertiesPatchBuilder
- com.linkedin.metadata.aspect.patch.builder.DataJobInputOutputPatchBuilder
- com.linkedin.metadata.aspect.patch.builder.StructuredPropertiesPatchBuilder
- com.linkedin.metadata.aspect.patch.builder.StructuredPropertyDefinitionPatchBuilder
- com.linkedin.metadata.aspect.patch.builder.FormInfoPatchBuilder
Typical flow:
// 1. Build the patch MCP via a dedicated PatchBuilder
MetadataChangeProposal mcp =
new DatasetPropertiesPatchBuilder()
.urn(UrnUtils.toDatasetUrn("hive", "fct_users_deleted", "PROD"))
.addCustomProperty("cluster_name", "datahubproject.acryl.io")
.addCustomProperty("retention_time", "2 years")
.build();
// 2. Emit via RestEmitter
RestEmitter emitter = RestEmitter.create(b -> b.server("http://localhost:8080").token(token));
Future<MetadataWriteResponse> response = emitter.emit(mcp);
System.out.println(response.get().getResponseContent());
Pattern 3: DataHubClientV2 with Entity Builders (Hybrid)
Some V1-level examples (DataFlowCreateExample, DataFlowFullExample) use the newer DataHubClientV2 client with fluent entity builders. These demonstrate the transition toward the V2 API style while still residing in the top-level examples package.
Key classes:
- datahub.client.v2.DataHubClientV2
- datahub.client.v2.entity.DataFlow
Typical flow:
try (DataHubClientV2 client =
DataHubClientV2.builder().server("http://localhost:8080").token(token).build()) {
DataFlow dataFlow = DataFlow.builder()
.orchestrator("airflow")
.flowId("my_pipeline")
.cluster("prod")
.displayName("My Pipeline")
.description("A sample pipeline")
.build();
dataFlow.addTag("etl").addTag("production");
dataFlow.addOwner("urn:li:corpuser:user1", OwnershipType.TECHNICAL_OWNER);
dataFlow.addCustomProperty("schedule", "0 2 * * *");
client.entities().upsert(dataFlow);
}
Usage Examples
Example 1: Creating a Tag (MCP Wrapper Pattern)
The simplest V1 example, demonstrating how to create a Tag entity with a name and description.
Source: TagCreate.java
package io.datahubproject.examples;
import com.linkedin.tag.TagProperties;
import datahub.client.MetadataWriteResponse;
import datahub.client.rest.RestEmitter;
import datahub.event.MetadataChangeProposalWrapper;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
public class TagCreate {
private TagCreate() {}
public static void main(String[] args)
throws IOException, ExecutionException, InterruptedException {
TagProperties tagProperties =
new TagProperties()
.setName("Deprecated")
.setDescription("Having this tag means this column or table is deprecated.");
MetadataChangeProposalWrapper mcpw =
MetadataChangeProposalWrapper.builder()
.entityType("tag")
.entityUrn("urn:li:tag:deprecated")
.upsert()
.aspect(tagProperties)
.build();
String token = "";
RestEmitter emitter = RestEmitter.create(b -> b.server("http://localhost:8080").token(token));
Future<MetadataWriteResponse> response = emitter.emit(mcpw, null);
System.out.println(response.get().getResponseContent());
}
}
Example 2: Adding Dataset Schema Metadata (MCP Wrapper Pattern)
Demonstrates creating a Dataset entity with schema metadata including multiple typed fields.
Source: DatasetAdd.java
package io.datahubproject.examples;
import com.linkedin.common.AuditStamp;
import com.linkedin.common.urn.CorpuserUrn;
import com.linkedin.common.urn.DataPlatformUrn;
import com.linkedin.common.urn.DatasetUrn;
import com.linkedin.common.urn.UrnUtils;
import com.linkedin.schema.*;
import datahub.client.MetadataWriteResponse;
import datahub.client.rest.RestEmitter;
import datahub.event.MetadataChangeProposalWrapper;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
public class DatasetAdd {
private DatasetAdd() {}
public static void main(String[] args)
throws IOException, ExecutionException, InterruptedException {
DatasetUrn datasetUrn = UrnUtils.toDatasetUrn("hive", "fct_users_deleted", "PROD");
CorpuserUrn userUrn = new CorpuserUrn("ingestion");
AuditStamp lastModified = new AuditStamp().setTime(1640692800000L).setActor(userUrn);
SchemaMetadata schemaMetadata =
new SchemaMetadata()
.setSchemaName("customer")
.setPlatform(new DataPlatformUrn("hive"))
.setVersion(0L)
.setHash("")
.setPlatformSchema(
SchemaMetadata.PlatformSchema.create(
new OtherSchema().setRawSchema("__insert raw schema here__")))
.setLastModified(lastModified);
SchemaFieldArray fields = new SchemaFieldArray();
SchemaField field1 =
new SchemaField()
.setFieldPath("address.zipcode")
.setType(
new SchemaFieldDataType()
.setType(SchemaFieldDataType.Type.create(new StringType())))
.setNativeDataType("VARCHAR(50)")
.setDescription("This is the zipcode of the address.")
.setLastModified(lastModified);
fields.add(field1);
schemaMetadata.setFields(fields);
MetadataChangeProposalWrapper mcpw =
MetadataChangeProposalWrapper.builder()
.entityType("dataset")
.entityUrn(datasetUrn)
.upsert()
.aspect(schemaMetadata)
.build();
String token = "";
RestEmitter emitter = RestEmitter.create(b -> b.server("http://localhost:8080").token(token));
Future<MetadataWriteResponse> response = emitter.emit(mcpw, null);
System.out.println(response.get().getResponseContent());
}
}
Example 3: Incremental Custom Property Updates (PatchBuilder Pattern)
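Demonstrates adding, removing, and replacing custom properties on a Dataset. The add and remove operations leave other existing properties untouched, while the replace operation overwrites the entire custom properties map.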
Source: DatasetCustomPropertiesAdd.java
// Adding properties without affecting existing ones
MetadataChangeProposal datasetPropertiesProposal =
new DatasetPropertiesPatchBuilder()
.urn(UrnUtils.toDatasetUrn("hive", "fct_users_deleted", "PROD"))
.addCustomProperty("cluster_name", "datahubproject.acryl.io")
.addCustomProperty("retention_time", "2 years")
.build();
Source: DatasetCustomPropertiesAddRemove.java
// Adding one property and removing another in a single patch
MetadataChangeProposal datasetPropertiesProposal =
new DatasetPropertiesPatchBuilder()
.urn(UrnUtils.toDatasetUrn("hive", "fct_users_deleted", "PROD"))
.addCustomProperty("cluster_name", "datahubproject.acryl.io")
.removeCustomProperty("retention_time")
.build();
Source: DatasetCustomPropertiesReplace.java
// Replacing the entire custom properties map
Map<String, String> customPropsMap = new HashMap<>();
customPropsMap.put("cluster_name", "datahubproject.acryl.io");
customPropsMap.put("retention_time", "2 years");
MetadataChangeProposal datasetPropertiesProposal =
new DatasetPropertiesPatchBuilder()
.urn(UrnUtils.toDatasetUrn("hive", "fct_users_deleted", "PROD"))
.setCustomProperties(customPropsMap)
.build();
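To make the difference between the three operations concrete, here is a rough plain-Java model of their effect on a stored property map. It is only an illustration of the semantics described above and does not call the DataHub API: the class CustomPropertyPatchSketch, its methods, and the owner_team property are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Plain-Java stand-in for the three custom-property operations (not the DataHub API). */
final class CustomPropertyPatchSketch {

  /** Models addCustomProperty: sets one key and leaves the others alone. */
  static Map<String, String> add(Map<String, String> current, String key, String value) {
    Map<String, String> result = new LinkedHashMap<>(current);
    result.put(key, value);
    return result;
  }

  /** Models removeCustomProperty: drops one key and leaves the others alone. */
  static Map<String, String> remove(Map<String, String> current, String key) {
    Map<String, String> result = new LinkedHashMap<>(current);
    result.remove(key);
    return result;
  }

  /** Models setCustomProperties: the current map is ignored and replaced entirely. */
  static Map<String, String> replace(Map<String, String> current, Map<String, String> replacement) {
    return new LinkedHashMap<>(replacement);
  }

  public static void main(String[] args) {
    // Hypothetical properties already stored on the dataset.
    Map<String, String> stored = new LinkedHashMap<>();
    stored.put("owner_team", "analytics");
    stored.put("retention_time", "1 year");

    Map<String, String> afterAdd = add(stored, "cluster_name", "datahubproject.acryl.io");
    Map<String, String> afterRemove = remove(afterAdd, "retention_time");
    Map<String, String> afterReplace = replace(afterRemove, Map.of("retention_time", "2 years"));

    System.out.println(afterAdd);     // {owner_team=analytics, retention_time=1 year, cluster_name=datahubproject.acryl.io}
    System.out.println(afterRemove);  // {owner_team=analytics, cluster_name=datahubproject.acryl.io}
    System.out.println(afterReplace); // {retention_time=2 years}
  }
}
```

Note that owner_team survives the add and remove steps but disappears after the replace step, which is the behavior to keep in mind when choosing between the three examples.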
Key Concepts
URN Construction
DataHub uses URN (Uniform Resource Name) identifiers for all entities. V1 examples demonstrate several URN construction methods:
| Method | Example | Description |
|---|---|---|
| UrnUtils.toDatasetUrn() | UrnUtils.toDatasetUrn("hive", "fct_users_deleted", "PROD") | Convenience method for Dataset URNs |
| UrnUtils.getUrn() | UrnUtils.getUrn("urn:li:structuredProperty:testString") | Parse any URN string |
| DatasetUrn.createFromString() | DatasetUrn.createFromString("urn:li:dataset:...") | Type-safe Dataset URN creation |
| String literal | "urn:li:tag:deprecated" | Direct URN string (used with MCP wrapper) |
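For readers who want to see the string shape behind these helpers, the sketch below assembles the same URN strings that appear elsewhere on this page using plain string concatenation. It is illustrative only: the class UrnStringSketch and its methods are hypothetical, and real code should prefer the SDK helpers listed in the table.

```java
/** Hypothetical helper showing the string layout of the URNs used on this page. */
final class UrnStringSketch {

  /** Builds a dataset URN string from platform, dataset name, and environment. */
  static String datasetUrn(String platform, String name, String env) {
    return "urn:li:dataset:(urn:li:dataPlatform:" + platform + "," + name + "," + env + ")";
  }

  /** Builds a tag URN string from a tag identifier. */
  static String tagUrn(String tagId) {
    return "urn:li:tag:" + tagId;
  }

  public static void main(String[] args) {
    // prints urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_deleted,PROD)
    System.out.println(datasetUrn("hive", "fct_users_deleted", "PROD"));
    // prints urn:li:tag:deprecated
    System.out.println(tagUrn("deprecated"));
  }
}
```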
RestEmitter Lifecycle
The RestEmitter should be properly closed after use. Some examples use try-with-resources, while others call emitter.close() in a finally block:
// Try-with-resources (preferred)
try (RestEmitter emitter = RestEmitter.create(b -> b.server("http://localhost:8080").token(token))) {
// ... emit metadata
}
// Manual close (also common in examples)
RestEmitter emitter = RestEmitter.create(b -> b.server("http://localhost:8080").token(token));
try {
// ... emit metadata
} finally {
emitter.close();
}
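The reason try-with-resources is preferred can be shown with a small stand-in that runs without a DataHub server. In this sketch, FakeEmitter is a hypothetical AutoCloseable that only records calls; it is not the RestEmitter API. The point is that close() runs even when emitting throws, and that it runs before the catch block.

```java
import java.util.ArrayList;
import java.util.List;

/** Demonstrates try-with-resources ordering using a hypothetical emitter stand-in. */
final class EmitterLifecycleSketch {

  /** Hypothetical stand-in for an emitter; it only records what was called. */
  static final class FakeEmitter implements AutoCloseable {
    private final List<String> log;

    FakeEmitter(List<String> log) {
      this.log = log;
    }

    void emit(String payload) {
      if (payload.isEmpty()) {
        throw new IllegalArgumentException("empty payload");
      }
      log.add("emit:" + payload);
    }

    @Override
    public void close() {
      log.add("close");
    }
  }

  /** Emits one payload and returns the ordered list of recorded events. */
  static List<String> run(String payload) {
    List<String> log = new ArrayList<>();
    try (FakeEmitter emitter = new FakeEmitter(log)) {
      emitter.emit(payload);
    } catch (IllegalArgumentException e) {
      // The resource has already been closed by the time this block runs.
      log.add("error:" + e.getMessage());
    }
    return log;
  }

  public static void main(String[] args) {
    System.out.println(run("tag")); // [emit:tag, close]
    System.out.println(run(""));    // [close, error:empty payload]
  }
}
```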
Related Pages
- Datahub_project_Datahub_Java_SDK_V2_Examples - V2 SDK examples with fluent builders and type-safe entity construction
- Datahub_project_Datahub - Main DataHub repository