Implementation:Datahub project Datahub Java SDK V1 Examples

From Leeroopedia


Implementation: Java SDK V1 Examples

Knowledge Sources
Domains Java_SDK, Examples
Last Updated 2026-02-10 00:00 GMT

Overview

Description

The Java SDK V1 examples demonstrate how to programmatically emit metadata to DataHub using the original (V1) Java SDK. These examples cover creating and updating entities such as Datasets, DataFlows, DataJobs, Tags, Forms, and Structured Properties. The V1 SDK uses two primary patterns: the MetadataChangeProposalWrapper (MCP wrapper) pattern for full aspect upserts, and the PatchBuilder pattern for incremental, field-level modifications to existing metadata.

All V1 examples reside in the package io.datahubproject.examples under the metadata-integration/java/examples module.

Usage

To run these examples against a local DataHub instance:

# Ensure DataHub is running on localhost:8080
# Then compile and run any example:
cd metadata-integration/java/examples
../../../gradlew run -PmainClass=io.datahubproject.examples.DatasetAdd

Each example contains a main method and connects to DataHub via the RestEmitter client, typically configured with:

RestEmitter emitter = RestEmitter.create(b -> b.server("http://localhost:8080").token(token));

Some newer V1 examples (such as DataFlowCreateExample and DataFlowFullExample) use the DataHubClientV2 client but are classified as V1 because they reside in the top-level examples package.

Code Reference

Source Location

All V1 example files are located at:

metadata-integration/java/examples/src/main/java/io/datahubproject/examples/

Files:

  • DataFlowCreateExample.java (128 lines)
  • DataFlowFullExample.java (267 lines)
  • DataJobLineageAdd.java (63 lines)
  • DatasetAdd.java (93 lines)
  • DatasetCustomPropertiesAdd.java (49 lines)
  • DatasetCustomPropertiesAddRemove.java (49 lines)
  • DatasetCustomPropertiesReplace.java (48 lines)
  • DatasetStructuredPropertiesUpdate.java (79 lines)
  • FormCreate.java (68 lines)
  • FormUpdate.java (57 lines)
  • StructuredPropertyUpsert.java (102 lines)
  • TagCreate.java (35 lines)

Example Catalog

Example | Entity Type | Operation | SDK Pattern | File
DataFlow Create | DataFlow | Create (basic) | DataHubClientV2 / Fluent Builder | DataFlowCreateExample.java
DataFlow Full | DataFlow | Create (comprehensive) | DataHubClientV2 / Fluent Builder | DataFlowFullExample.java
DataJob Lineage Add | DataJob | Patch (add lineage) | PatchBuilder | DataJobLineageAdd.java
Dataset Add | Dataset | Upsert (schema metadata) | MCP Wrapper | DatasetAdd.java
Dataset Custom Properties Add | Dataset | Patch (add properties) | PatchBuilder | DatasetCustomPropertiesAdd.java
Dataset Custom Properties Add/Remove | Dataset | Patch (add and remove) | PatchBuilder | DatasetCustomPropertiesAddRemove.java
Dataset Custom Properties Replace | Dataset | Patch (replace all) | PatchBuilder | DatasetCustomPropertiesReplace.java
Dataset Structured Properties Update | Dataset | Patch (structured props) | PatchBuilder | DatasetStructuredPropertiesUpdate.java
Form Create | Form | Upsert (create form) | MCP Wrapper | FormCreate.java
Form Update | Form | Patch (update form) | PatchBuilder | FormUpdate.java
Structured Property Upsert | Structured Property | Upsert (create/update) | PatchBuilder | StructuredPropertyUpsert.java
Tag Create | Tag | Upsert (create tag) | MCP Wrapper | TagCreate.java

Common Patterns

Pattern 1: MetadataChangeProposalWrapper (Full Aspect Upsert)

The MCP Wrapper pattern is used when you want to set or replace an entire aspect on an entity. It constructs a MetadataChangeProposalWrapper with a builder, specifying the entity type, URN, and the aspect object. This is the simplest approach for creating new entities with initial metadata.

Key classes:

  • datahub.event.MetadataChangeProposalWrapper
  • datahub.client.rest.RestEmitter
  • Various aspect classes from com.linkedin.*

Typical flow:

// 1. Construct the aspect object
SomeAspect aspect = new SomeAspect().setField1("value1").setField2("value2");

// 2. Build the MCP wrapper
MetadataChangeProposalWrapper mcpw =
    MetadataChangeProposalWrapper.builder()
        .entityType("dataset")
        .entityUrn("urn:li:dataset:(urn:li:dataPlatform:hive,my_table,PROD)")
        .upsert()
        .aspect(aspect)
        .build();

// 3. Emit via RestEmitter
RestEmitter emitter = RestEmitter.create(b -> b.server("http://localhost:8080").token(token));
Future<MetadataWriteResponse> response = emitter.emit(mcpw, null);
System.out.println(response.get().getResponseContent());
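Step 3 above calls response.get(), which blocks until the write completes. The following is a minimal sketch, using only java.util.concurrent, of how a caller might bound that wait instead. The class BoundedWaitSketch and the method awaitOrDescribe are hypothetical names introduced for illustration, and a Future<String> stands in for Future<MetadataWriteResponse>; nothing here is part of the DataHub SDK.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Illustrative only: Future<String> stands in for Future<MetadataWriteResponse>. */
final class BoundedWaitSketch {

  private BoundedWaitSketch() {}

  // Hypothetical helper: waits at most timeoutMillis and reports what happened as text.
  static String awaitOrDescribe(Future<String> response, long timeoutMillis) {
    try {
      return response.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      return "timed out after " + timeoutMillis + " ms";
    } catch (ExecutionException e) {
      // The original failure is wrapped; unwrap it for the message.
      return "failed: " + e.getCause().getMessage();
    } catch (InterruptedException e) {
      // Restore the interrupt flag so callers can still observe it.
      Thread.currentThread().interrupt();
      return "interrupted";
    }
  }

  public static void main(String[] args) {
    // A future that is already complete returns immediately.
    System.out.println(awaitOrDescribe(CompletableFuture.completedFuture("ok"), 100));
    // A future that never completes hits the timeout branch.
    System.out.println(awaitOrDescribe(new CompletableFuture<>(), 10));
  }
}
```

With the real SDK, the same get(timeout, unit) overload is available on the returned Future, since it is part of the standard java.util.concurrent.Future interface.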

Pattern 2: PatchBuilder (Incremental Field-Level Updates)

The PatchBuilder pattern is used for incremental modifications that should not overwrite unrelated fields. Each entity/aspect type has a dedicated PatchBuilder class (e.g., DatasetPropertiesPatchBuilder, DataJobInputOutputPatchBuilder). The builder produces a MetadataChangeProposal that applies JSON Patch semantics.

Key classes:

  • com.linkedin.metadata.aspect.patch.builder.DatasetPropertiesPatchBuilder
  • com.linkedin.metadata.aspect.patch.builder.DataJobInputOutputPatchBuilder
  • com.linkedin.metadata.aspect.patch.builder.StructuredPropertiesPatchBuilder
  • com.linkedin.metadata.aspect.patch.builder.StructuredPropertyDefinitionPatchBuilder
  • com.linkedin.metadata.aspect.patch.builder.FormInfoPatchBuilder

Typical flow:

// 1. Build the patch MCP via a dedicated PatchBuilder
MetadataChangeProposal mcp =
    new DatasetPropertiesPatchBuilder()
        .urn(UrnUtils.toDatasetUrn("hive", "fct_users_deleted", "PROD"))
        .addCustomProperty("cluster_name", "datahubproject.acryl.io")
        .addCustomProperty("retention_time", "2 years")
        .build();

// 2. Emit via RestEmitter
RestEmitter emitter = RestEmitter.create(b -> b.server("http://localhost:8080").token(token));
Future<MetadataWriteResponse> response = emitter.emit(mcp);
System.out.println(response.get().getResponseContent());

Pattern 3: DataHubClientV2 with Entity Builders (Hybrid)

Some V1-level examples (DataFlowCreateExample, DataFlowFullExample) use the newer DataHubClientV2 client with fluent entity builders. These demonstrate the transition toward the V2 API style while still residing in the top-level examples package.

Key classes:

  • datahub.client.v2.DataHubClientV2
  • datahub.client.v2.entity.DataFlow

Typical flow:

try (DataHubClientV2 client =
    DataHubClientV2.builder().server("http://localhost:8080").token(token).build()) {

    DataFlow dataFlow = DataFlow.builder()
        .orchestrator("airflow")
        .flowId("my_pipeline")
        .cluster("prod")
        .displayName("My Pipeline")
        .description("A sample pipeline")
        .build();

    dataFlow.addTag("etl").addTag("production");
    dataFlow.addOwner("urn:li:corpuser:user1", OwnershipType.TECHNICAL_OWNER);
    dataFlow.addCustomProperty("schedule", "0 2 * * *");

    client.entities().upsert(dataFlow);
}

Usage Examples

Example 1: Creating a Tag (MCP Wrapper Pattern)

The simplest V1 example, demonstrating how to create a Tag entity with a name and description.

Source: TagCreate.java

package io.datahubproject.examples;

import com.linkedin.tag.TagProperties;
import datahub.client.MetadataWriteResponse;
import datahub.client.rest.RestEmitter;
import datahub.event.MetadataChangeProposalWrapper;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class TagCreate {

  private TagCreate() {}

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    TagProperties tagProperties =
        new TagProperties()
            .setName("Deprecated")
            .setDescription("Having this tag means this column or table is deprecated.");

    MetadataChangeProposalWrapper mcpw =
        MetadataChangeProposalWrapper.builder()
            .entityType("tag")
            .entityUrn("urn:li:tag:deprecated")
            .upsert()
            .aspect(tagProperties)
            .build();

    String token = "";
    RestEmitter emitter = RestEmitter.create(b -> b.server("http://localhost:8080").token(token));
    Future<MetadataWriteResponse> response = emitter.emit(mcpw, null);
    System.out.println(response.get().getResponseContent());
  }
}

Example 2: Adding Dataset Schema Metadata (MCP Wrapper Pattern)

Demonstrates creating a Dataset entity with schema metadata made up of typed fields. The listing below shows one such field, a string-typed address.zipcode column.

Source: DatasetAdd.java

package io.datahubproject.examples;

import com.linkedin.common.AuditStamp;
import com.linkedin.common.urn.CorpuserUrn;
import com.linkedin.common.urn.DataPlatformUrn;
import com.linkedin.common.urn.DatasetUrn;
import com.linkedin.common.urn.UrnUtils;
import com.linkedin.schema.*;
import datahub.client.MetadataWriteResponse;
import datahub.client.rest.RestEmitter;
import datahub.event.MetadataChangeProposalWrapper;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class DatasetAdd {

  private DatasetAdd() {}

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    DatasetUrn datasetUrn = UrnUtils.toDatasetUrn("hive", "fct_users_deleted", "PROD");
    CorpuserUrn userUrn = new CorpuserUrn("ingestion");
    AuditStamp lastModified = new AuditStamp().setTime(1640692800000L).setActor(userUrn);

    SchemaMetadata schemaMetadata =
        new SchemaMetadata()
            .setSchemaName("customer")
            .setPlatform(new DataPlatformUrn("hive"))
            .setVersion(0L)
            .setHash("")
            .setPlatformSchema(
                SchemaMetadata.PlatformSchema.create(
                    new OtherSchema().setRawSchema("__insert raw schema here__")))
            .setLastModified(lastModified);

    SchemaFieldArray fields = new SchemaFieldArray();

    SchemaField field1 =
        new SchemaField()
            .setFieldPath("address.zipcode")
            .setType(
                new SchemaFieldDataType()
                    .setType(SchemaFieldDataType.Type.create(new StringType())))
            .setNativeDataType("VARCHAR(50)")
            .setDescription("This is the zipcode of the address.")
            .setLastModified(lastModified);
    fields.add(field1);

    schemaMetadata.setFields(fields);

    MetadataChangeProposalWrapper mcpw =
        MetadataChangeProposalWrapper.builder()
            .entityType("dataset")
            .entityUrn(datasetUrn)
            .upsert()
            .aspect(schemaMetadata)
            .build();

    String token = "";
    RestEmitter emitter = RestEmitter.create(b -> b.server("http://localhost:8080").token(token));
    Future<MetadataWriteResponse> response = emitter.emit(mcpw, null);
    System.out.println(response.get().getResponseContent());
  }
}

Example 3: Incremental Custom Property Updates (PatchBuilder Pattern)

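Demonstrates three ways to modify custom properties on a Dataset: adding properties and removing properties, both of which leave other existing properties untouched, and replacing the entire custom properties map, which discards any property not in the new map.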

Source: DatasetCustomPropertiesAdd.java

// Adding properties without affecting existing ones
MetadataChangeProposal datasetPropertiesProposal =
    new DatasetPropertiesPatchBuilder()
        .urn(UrnUtils.toDatasetUrn("hive", "fct_users_deleted", "PROD"))
        .addCustomProperty("cluster_name", "datahubproject.acryl.io")
        .addCustomProperty("retention_time", "2 years")
        .build();

Source: DatasetCustomPropertiesAddRemove.java

// Adding one property and removing another in a single patch
MetadataChangeProposal datasetPropertiesProposal =
    new DatasetPropertiesPatchBuilder()
        .urn(UrnUtils.toDatasetUrn("hive", "fct_users_deleted", "PROD"))
        .addCustomProperty("cluster_name", "datahubproject.acryl.io")
        .removeCustomProperty("retention_time")
        .build();

Source: DatasetCustomPropertiesReplace.java

// Replacing the entire custom properties map
Map<String, String> customPropsMap = new HashMap<>();
customPropsMap.put("cluster_name", "datahubproject.acryl.io");
customPropsMap.put("retention_time", "2 years");

MetadataChangeProposal datasetPropertiesProposal =
    new DatasetPropertiesPatchBuilder()
        .urn(UrnUtils.toDatasetUrn("hive", "fct_users_deleted", "PROD"))
        .setCustomProperties(customPropsMap)
        .build();
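The difference between the add/remove patches and the replace patch can be sketched with plain Java maps. This is a simulation under stated assumptions, not the SDK's implementation: the class CustomPropertiesPatchSketch, its methods, and the owner_team property are hypothetical, and the maps model only the net effect on the stored custom properties.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: models the net effect of the three patch styles on a property map. */
final class CustomPropertiesPatchSketch {

  private CustomPropertiesPatchSketch() {}

  // Add/remove semantics: only the named keys change, everything else is preserved.
  static Map<String, String> addAndRemove(
      Map<String, String> existing, Map<String, String> toAdd, List<String> toRemove) {
    Map<String, String> result = new LinkedHashMap<>(existing);
    result.putAll(toAdd);
    for (String key : toRemove) {
      result.remove(key);
    }
    return result;
  }

  // Replace semantics: the new map wins outright, prior keys are discarded.
  static Map<String, String> replaceAll(
      Map<String, String> existing, Map<String, String> replacement) {
    return new LinkedHashMap<>(replacement);
  }

  public static void main(String[] args) {
    Map<String, String> existing = new LinkedHashMap<>();
    existing.put("owner_team", "data-platform"); // hypothetical pre-existing property
    existing.put("retention_time", "1 year");

    // owner_team survives; retention_time is removed; cluster_name is added.
    System.out.println(
        addAndRemove(
            existing,
            Map.of("cluster_name", "datahubproject.acryl.io"),
            List.of("retention_time")));

    // owner_team is dropped because it is not in the replacement map.
    System.out.println(replaceAll(existing, Map.of("retention_time", "2 years")));
  }
}
```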

Key Concepts

URN Construction

DataHub uses URN (Uniform Resource Name) identifiers for all entities. V1 examples demonstrate several URN construction methods:

Method | Example | Description
UrnUtils.toDatasetUrn() | UrnUtils.toDatasetUrn("hive", "fct_users_deleted", "PROD") | Convenience method for Dataset URNs
UrnUtils.getUrn() | UrnUtils.getUrn("urn:li:structuredProperty:testString") | Parse any URN string
DatasetUrn.createFromString() | DatasetUrn.createFromString("urn:li:dataset:...") | Type-safe Dataset URN creation
String literal | "urn:li:tag:deprecated" | Direct URN string (used with MCP wrapper)
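To make the textual layout of a Dataset URN concrete, here is a small sketch that assembles the string by hand. It assumes the format seen in the entityUrn literal used in Pattern 1, urn:li:dataset:(urn:li:dataPlatform:hive,my_table,PROD). The class DatasetUrnSketch and its datasetUrn method are hypothetical and do not use or replace UrnUtils.

```java
/** Illustrative only: mirrors the textual layout of a Dataset URN without the DataHub SDK. */
final class DatasetUrnSketch {

  private DatasetUrnSketch() {}

  // Hypothetical helper: platform, dataset name, and environment joined in URN order.
  static String datasetUrn(String platform, String name, String env) {
    String platformUrn = "urn:li:dataPlatform:" + platform;
    return "urn:li:dataset:(" + platformUrn + "," + name + "," + env + ")";
  }

  public static void main(String[] args) {
    System.out.println(datasetUrn("hive", "fct_users_deleted", "PROD"));
  }
}
```

In real code the SDK helpers in the table are preferable, since they return typed URN objects rather than raw strings.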

RestEmitter Lifecycle

The RestEmitter should be properly closed after use. Some examples use try-with-resources, while others call emitter.close() in a finally block:

// Try-with-resources (preferred)
try (RestEmitter emitter = RestEmitter.create(b -> b.server("http://localhost:8080").token(token))) {
    // ... emit metadata
}

// Manual close (also common in examples)
RestEmitter emitter = RestEmitter.create(b -> b.server("http://localhost:8080").token(token));
try {
    // ... emit metadata
} finally {
    emitter.close();
}
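The reason try-with-resources is preferred can be shown without the SDK. In the sketch below, FakeEmitter is a hypothetical stand-in that models only the Closeable behaviour of an emitter; it records calls in a list so the order of emit, close, and error handling is visible.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: FakeEmitter is a stand-in, not the DataHub RestEmitter. */
final class EmitterLifecycleSketch {

  private EmitterLifecycleSketch() {}

  static final class FakeEmitter implements Closeable {
    private final List<String> log;

    FakeEmitter(List<String> log) {
      this.log = log;
    }

    void emit(String payload) {
      if (payload.isEmpty()) {
        throw new IllegalArgumentException("empty payload");
      }
      log.add("emit:" + payload);
    }

    @Override
    public void close() {
      log.add("close");
    }
  }

  // Runs one emit inside try-with-resources and returns the recorded call order.
  static List<String> run(String payload) {
    List<String> log = new ArrayList<>();
    try (FakeEmitter emitter = new FakeEmitter(log)) {
      emitter.emit(payload);
    } catch (IllegalArgumentException e) {
      // By the time this runs, the emitter has already been closed.
      log.add("error:" + e.getMessage());
    }
    return log;
  }

  public static void main(String[] args) {
    System.out.println(run("tag")); // successful emit, then close
    System.out.println(run(""));    // emit fails, close still runs before the catch
  }
}
```

The resource is closed on both the success and failure paths, and on failure the close happens before the catch block executes, which is the guarantee the manual finally variant has to reproduce by hand.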
