Implementation:Datahub project Datahub Java SDK V2 Examples

Implementation: Java SDK V2 Examples

Knowledge Sources
Domains Java_SDK, Examples
Last Updated 2026-02-10 00:00 GMT

Overview

Description

The Java SDK V2 examples demonstrate how to create and manage DataHub metadata entities using the second-generation Java SDK. The V2 SDK introduces a fluent builder pattern with type-safe entity construction, a unified DataHubClientV2 client, and a patch accumulation model that collects metadata changes in memory before emitting them in a single upsert operation.

The V2 examples cover seven entity types: Chart, Container, Dashboard, DataJob, Dataset, MLModel, and MLModelGroup. Each entity type has up to three example files: a Create example (basic usage), a Full example (comprehensive metadata), and a Lineage example (input/output dataset relationships). Dataset is the exception, with a single Patch example (incremental updates).

All V2 examples reside in the package io.datahubproject.examples.v2 under the metadata-integration/java/examples module.

Usage

To run these examples against a local DataHub instance:

# Set environment variables (optional; DATAHUB_SERVER defaults to http://localhost:8080)
export DATAHUB_SERVER="http://localhost:8080"
export DATAHUB_TOKEN="your-token-here"

# Run any example:
cd metadata-integration/java/examples
../../../gradlew run -PmainClass=io.datahubproject.examples.v2.ChartCreateExample

All V2 examples use the DataHubClientV2 client:

DataHubClientV2 client =
    DataHubClientV2.builder()
        .server(System.getenv().getOrDefault("DATAHUB_SERVER", "http://localhost:8080"))
        .token(System.getenv("DATAHUB_TOKEN"))
        .build();
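
The builder call above reads its settings directly from the environment. As a minimal sketch of the same defaulting logic in isolation, the helper below resolves the server URL and token from any map, so the fallback can be checked without a running DataHub instance. ConnectionSettings and fromEnv are hypothetical names introduced for illustration; they are not part of the SDK.

```java
import java.util.Map;

// Hypothetical helper, not part of the DataHub SDK: it only illustrates
// how the DATAHUB_SERVER default and the optional token are resolved.
final class ConnectionSettings {
  static final String DEFAULT_SERVER = "http://localhost:8080";

  final String server;
  final String token; // null when DATAHUB_TOKEN is not set

  private ConnectionSettings(String server, String token) {
    this.server = server;
    this.token = token;
  }

  // Accepts any map so that System.getenv() or a test map can be passed in.
  static ConnectionSettings fromEnv(Map<String, String> env) {
    String server = env.getOrDefault("DATAHUB_SERVER", DEFAULT_SERVER);
    String token = env.get("DATAHUB_TOKEN");
    return new ConnectionSettings(server, token);
  }
}
```

In an example program this would be called as ConnectionSettings.fromEnv(System.getenv()), with the resulting server and token values passed to the builder's server(...) and token(...) methods.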

Code Reference

Source Location

All V2 example files are located at:

metadata-integration/java/examples/src/main/java/io/datahubproject/examples/v2/

Files:

  • ChartCreateExample.java (76 lines)
  • ChartFullExample.java (153 lines)
  • ChartLineageExample.java (258 lines)
  • ContainerCreateExample.java (85 lines)
  • ContainerFullExample.java (197 lines)
  • DashboardCreateExample.java (88 lines)
  • DashboardFullExample.java (173 lines)
  • DashboardLineageExample.java (249 lines)
  • DataJobCreateExample.java (87 lines)
  • DataJobFullExample.java (153 lines)
  • DataJobLineageExample.java (296 lines)
  • DatasetPatchExample.java (72 lines)
  • MLModelCreateExample.java (106 lines)
  • MLModelFullExample.java (209 lines)
  • MLModelGroupCreateExample.java (91 lines)
  • MLModelGroupFullExample.java (165 lines)

Example Catalog

Example | Entity Type | Category | Key Operations | File
Chart Create | Chart | Create | Builder, custom properties, upsert | ChartCreateExample.java
Chart Full | Chart | Full | Tags, owners, terms, domain, chart type, access, lineage | ChartFullExample.java
Chart Lineage | Chart | Lineage | setInputDatasets, addInputDataset, removeInputDataset | ChartLineageExample.java
Container Create | Container | Create | Database container, tags, owners, domain | ContainerCreateExample.java
Container Full | Container | Full | Schema container, sub-containers, comprehensive metadata | ContainerFullExample.java
Dashboard Create | Dashboard | Create | Builder, tags, owners, custom properties | DashboardCreateExample.java
Dashboard Full | Dashboard | Full | Tags, owners, terms, domain, charts, access, URLs | DashboardFullExample.java
Dashboard Lineage | Dashboard | Lineage | Chart associations, input dataset lineage | DashboardLineageExample.java
DataJob Create | DataJob | Create | Builder, orchestrator, flow/job IDs | DataJobCreateExample.java
DataJob Full | DataJob | Full | Tags, owners, terms, domain, inlets, outlets, custom props | DataJobFullExample.java
DataJob Lineage | DataJob | Lineage | Input/output datasets, pipeline modeling, DatasetUrn types | DataJobLineageExample.java
Dataset Patch | Dataset | Patch | Patch accumulation, incremental updates | DatasetPatchExample.java
MLModel Create | MLModel | Create | Training metrics, hyperparameters, tags, owners | MLModelCreateExample.java
MLModel Full | MLModel | Full | Comprehensive metrics, hyperparams, terms, domain, lineage | MLModelFullExample.java
MLModelGroup Create | MLModelGroup | Create | Model group, training job references | MLModelGroupCreateExample.java
MLModelGroup Full | MLModelGroup | Full | Comprehensive metadata, model references, training jobs | MLModelGroupFullExample.java

V2 SDK Patterns

Pattern 1: Fluent Entity Builders

Every V2 entity type has a .builder() method that returns a type-safe builder. The builder enforces required fields (such as platform, IDs, and environment) and provides optional metadata setters. The .build() call constructs the entity object and generates its URN automatically.

Supported entity builders:

Entity Class | Required Builder Fields | URN Format
Chart | tool, id | urn:li:chart:(tool,id)
Container | platform, database (or schema), env | urn:li:container:...
Dashboard | tool, id | urn:li:dashboard:(tool,id)
DataJob | orchestrator, flowId, cluster, jobId | urn:li:dataJob:(urn:li:dataFlow:(orch,flow,cluster),jobId)
Dataset | platform, name, env | urn:li:dataset:(urn:li:dataPlatform:platform,name,ENV)
MLModel | platform, name, env | urn:li:mlModel:(urn:li:dataPlatform:platform,name,ENV)
MLModelGroup | platform, groupId, env | urn:li:mlModelGroup:(urn:li:dataPlatform:platform,groupId,ENV)
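
To make the URN shapes in the table above concrete, the following sketch assembles a few of them with plain string formatting. It is only an illustration of the listed formats: UrnFormats is a hypothetical class, not the SDK's URN generation, and the real builders may validate or normalise their inputs in ways this sketch does not.

```java
// Hypothetical helper, not part of the DataHub SDK: it rebuilds URN
// strings from the formats listed in the builder table.
final class UrnFormats {
  private UrnFormats() {}

  static String chart(String tool, String id) {
    return String.format("urn:li:chart:(%s,%s)", tool, id);
  }

  static String dataset(String platform, String name, String env) {
    return String.format(
        "urn:li:dataset:(urn:li:dataPlatform:%s,%s,%s)", platform, name, env);
  }

  // A DataJob URN nests the URN of the DataFlow it belongs to.
  static String dataJob(String orchestrator, String flowId, String cluster, String jobId) {
    return String.format(
        "urn:li:dataJob:(urn:li:dataFlow:(%s,%s,%s),%s)",
        orchestrator, flowId, cluster, jobId);
  }
}
```

For instance, UrnFormats.dataset("snowflake", "raw.orders", "PROD") yields the same dataset URN string that the lineage snippets on this page pass to addInputDataset.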

Pattern 2: Patch Accumulation

V2 entities accumulate metadata changes as pending patches in memory. Changes are not sent to DataHub until client.entities().upsert(entity) is called. This allows batching multiple metadata operations (tags, owners, properties, lineage) into a single network call.

// Patches accumulate in memory
entity.addTag("pii");
entity.addOwner("urn:li:corpuser:user1", OwnershipType.TECHNICAL_OWNER);
entity.addCustomProperty("team", "data-engineering");

// Check how many patches are pending
System.out.println("Pending patches: " + entity.getPendingPatches().size());

// Emit all patches in a single upsert
client.entities().upsert(entity);

// After upsert, pending patches are cleared
// New patches can be accumulated for further updates
entity.addTag("gdpr");
client.entities().upsert(entity);
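
The accumulate-then-flush behaviour can be modelled without the SDK. The sketch below is a simplified stand-in, assuming nothing about the real patch format: PatchBuffer and flush are hypothetical names, and pending changes are recorded as plain strings purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for an SDK entity: it models only the
// accumulate-then-flush behaviour, not the real patch format.
final class PatchBuffer {
  private final List<String> pending = new ArrayList<>();

  // Each mutator records a change locally and returns this for chaining.
  PatchBuffer addTag(String tag) {
    pending.add("tag:" + tag);
    return this;
  }

  PatchBuffer addCustomProperty(String key, String value) {
    pending.add("property:" + key + "=" + value);
    return this;
  }

  // Returns an immutable snapshot of the changes not yet sent.
  List<String> getPendingPatches() {
    return List.copyOf(pending);
  }

  // Hands every pending change to the sink in one call, then clears the buffer.
  void flush(Consumer<List<String>> sink) {
    sink.accept(List.copyOf(pending));
    pending.clear();
  }
}
```

In this model the sink plays the role of the single network call: however many changes were recorded, it is invoked once per flush, and the buffer starts empty again afterwards.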

Pattern 3: Common Metadata Operations

All V2 entity types share a consistent set of metadata operations through method chaining:

Operation | Method | Description
Add Tag | entity.addTag("tag-name") | Adds a tag to the entity
Add Owner | entity.addOwner("urn:li:corpuser:user", OwnershipType.TECHNICAL_OWNER) | Adds an owner with a specific role
Add Glossary Term | entity.addTerm("urn:li:glossaryTerm:TermName") | Associates a glossary term
Set Domain | entity.setDomain("urn:li:domain:DomainName") | Assigns the entity to a domain
Add Custom Property | entity.addCustomProperty("key", "value") | Adds a key-value custom property

Pattern 4: Lineage Management

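Entities that participate in data lineage (Chart, Dashboard, DataJob) provide methods for managing dataset relationships. The DataJob examples cover both input and output datasets, while the Chart and Dashboard examples cover input datasets: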

DataJob lineage (input datasets / output datasets):

// Add individual input/output datasets
dataJob.addInputDataset("urn:li:dataset:(urn:li:dataPlatform:snowflake,raw.orders,PROD)");
dataJob.addOutputDataset("urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.orders,PROD)");

// Set multiple input/output datasets at once
// ("urn1" through "urn5" are placeholders for full dataset URN strings)
dataJob.setInputDatasets(Arrays.asList("urn1", "urn2", "urn3"));
dataJob.setOutputDatasets(Arrays.asList("urn4", "urn5"));

Chart lineage (input datasets):

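// Using DatasetUrn objects for type safety
// (customerDataset, additionalDataset, and legacyDataset are DatasetUrn
// values created the same way as salesDataset; their definitions are omitted)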
DatasetUrn salesDataset = DatasetUrn.createFromString(
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,sales.transactions,PROD)");

chart.setInputDatasets(Arrays.asList(salesDataset, customerDataset));
chart.addInputDataset(additionalDataset);
chart.removeInputDataset(legacyDataset);
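
One benefit of a typed URN over a raw string is that a malformed value is rejected when the object is created rather than when it is sent. The sketch below illustrates that idea with a hypothetical ParsedDatasetUrn class; it is not the SDK's DatasetUrn, and its pattern is a simplification that assumes the platform, name, and environment contain no commas or parentheses.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value type, not the SDK's DatasetUrn: it shows the kind of
// check a typed URN can perform at creation time.
final class ParsedDatasetUrn {
  // Simplified shape: urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)
  private static final Pattern SHAPE =
      Pattern.compile(
          "urn:li:dataset:\\(urn:li:dataPlatform:([^,()]+),([^,()]+),([^,()]+)\\)");

  final String platform;
  final String name;
  final String env;

  private ParsedDatasetUrn(String platform, String name, String env) {
    this.platform = platform;
    this.name = name;
    this.env = env;
  }

  // Throws IllegalArgumentException when the string does not match the shape.
  static ParsedDatasetUrn parse(String urn) {
    Matcher m = SHAPE.matcher(urn);
    if (!m.matches()) {
      throw new IllegalArgumentException("Not a dataset URN: " + urn);
    }
    return new ParsedDatasetUrn(m.group(1), m.group(2), m.group(3));
  }
}
```

Parsing the sales.transactions URN shown above yields the platform snowflake and the environment PROD, whereas a shorthand such as "urn1" is rejected with an IllegalArgumentException.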

Pattern 5: Connection Testing

All V2 examples include a connection test before performing operations:

if (!client.testConnection()) {
    System.err.println("Failed to connect to DataHub server");
    return;
}
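
The check above stops at the first failure. Where the server may still be starting up, a bounded retry is one possible variation. The sketch below is an illustration under stated assumptions rather than SDK behaviour: ConnectionCheck and waitUntilReachable are hypothetical names, and the check is passed in as a BooleanSupplier so the logic can run without a DataHub server.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of the DataHub SDK.
final class ConnectionCheck {
  private ConnectionCheck() {}

  // Calls the check up to maxAttempts times, sleeping delayMillis between
  // failed attempts. Returns true as soon as one attempt succeeds.
  static boolean waitUntilReachable(BooleanSupplier check, int maxAttempts, long delayMillis) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (check.getAsBoolean()) {
        return true;
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve the interrupt for callers
          return false;
        }
      }
    }
    return false;
  }
}
```

A caller would supply a lambda that wraps client.testConnection(). If that method declares checked exceptions, the lambda has to catch them and return false, because BooleanSupplier cannot throw them.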

Usage Examples

Example 1: Creating a Chart (Basic)

Demonstrates the minimal V2 pattern for creating an entity with custom properties.

Source: ChartCreateExample.java

package io.datahubproject.examples.v2;

import datahub.client.v2.DataHubClientV2;
import datahub.client.v2.entity.Chart;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;

public class ChartCreateExample {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    DataHubClientV2 client =
        DataHubClientV2.builder()
            .server(System.getenv().getOrDefault("DATAHUB_SERVER", "http://localhost:8080"))
            .token(System.getenv("DATAHUB_TOKEN"))
            .build();

    try {
      if (!client.testConnection()) {
        System.err.println("Failed to connect to DataHub server");
        return;
      }

      Map<String, String> customProperties = new HashMap<>();
      customProperties.put("dashboard", "executive_summary");
      customProperties.put("refresh_rate", "daily");
      customProperties.put("data_source", "snowflake.analytics.revenue_summary");
      customProperties.put("chart_type", "stacked_bar");

      Chart chart =
          Chart.builder()
              .tool("looker")
              .id("customer_revenue_by_region")
              .title("Revenue by Region")
              .description("Monthly revenue breakdown by geographic region "
                  + "with year-over-year comparison")
              .customProperties(customProperties)
              .build();

      client.entities().upsert(chart);
      System.out.println("Successfully created chart: " + chart.getUrn());

    } finally {
      client.close();
    }
  }
}

Example 2: Creating an ML Model with Metrics and Hyperparameters

Demonstrates ML-specific metadata operations including training metrics and hyperparameters alongside standard tags, owners, and custom properties.

Source: MLModelCreateExample.java

package io.datahubproject.examples.v2;

import com.linkedin.common.OwnershipType;
import datahub.client.v2.DataHubClientV2;
import datahub.client.v2.entity.MLModel;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

public class MLModelCreateExample {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    DataHubClientV2 client =
        DataHubClientV2.builder()
            .server(System.getenv().getOrDefault("DATAHUB_SERVER", "http://localhost:8080"))
            .token(System.getenv("DATAHUB_TOKEN"))
            .build();

    try {
      if (!client.testConnection()) {
        System.err.println("Failed to connect to DataHub server");
        return;
      }

      // Build ML model with basic metadata
      MLModel model =
          MLModel.builder()
              .platform("tensorflow")
              .name("user_churn_predictor")
              .env("PROD")
              .displayName("User Churn Prediction Model")
              .description("XGBoost model predicting user churn probability "
                  + "based on user behavior")
              .build();

      // Add training metrics
      model
          .addTrainingMetric("accuracy", "0.94")
          .addTrainingMetric("precision", "0.93")
          .addTrainingMetric("recall", "0.91")
          .addTrainingMetric("f1_score", "0.92")
          .addTrainingMetric("auc_roc", "0.96");

      // Add hyperparameters
      model
          .addHyperParam("learning_rate", "0.01")
          .addHyperParam("max_depth", "6")
          .addHyperParam("n_estimators", "100")
          .addHyperParam("subsample", "0.8");

      // Add tags, owners, custom properties
      model.addTag("production").addTag("ml-model").addTag("classification");
      model
          .addOwner("urn:li:corpuser:ml_team", OwnershipType.TECHNICAL_OWNER)
          .addOwner("urn:li:corpuser:data_science", OwnershipType.DATA_STEWARD);
      model
          .addCustomProperty("framework", "XGBoost")
          .addCustomProperty("training_date", "2025-10-15")
          .addCustomProperty("model_version", "1.0.0");

      client.entities().upsert(model);
      System.out.println("Successfully created ML model: " + model.getUrn());

    } finally {
      client.close();
    }
  }
}

Example 3: Comprehensive DataJob with Lineage

Demonstrates creating a DataJob with all available metadata types including lineage relationships (input and output datasets).

Source: DataJobFullExample.java

package io.datahubproject.examples.v2;

import com.linkedin.common.OwnershipType;
import datahub.client.v2.DataHubClientV2;
import datahub.client.v2.entity.DataJob;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

public class DataJobFullExample {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    DataHubClientV2 client =
        DataHubClientV2.builder()
            .server(System.getenv().getOrDefault("DATAHUB_SERVER", "http://localhost:8080"))
            .token(System.getenv("DATAHUB_TOKEN"))
            .build();

    try {
      if (!client.testConnection()) {
        System.err.println("Failed to connect to DataHub server");
        return;
      }

      // Build DataJob with builder
      DataJob dataJob =
          DataJob.builder()
              .orchestrator("airflow")
              .flowId("financial_reporting_pipeline")
              .cluster("prod")
              .jobId("aggregate_customer_transactions")
              .description("Critical ETL job that aggregates daily customer "
                  + "transaction data from multiple sources")
              .name("Aggregate Customer Transactions")
              .type("BATCH")
              .build();

      // Add tags for categorization
      dataJob.addTag("critical").addTag("pii").addTag("financial")
             .addTag("etl").addTag("production");

      // Add owners with different roles
      dataJob
          .addOwner("urn:li:corpuser:data_engineering", OwnershipType.TECHNICAL_OWNER)
          .addOwner("urn:li:corpuser:finance_team", OwnershipType.BUSINESS_OWNER)
          .addOwner("urn:li:corpuser:compliance_team", OwnershipType.DATA_STEWARD);

      // Add glossary terms for business context
      dataJob
          .addTerm("urn:li:glossaryTerm:DataProcessing")
          .addTerm("urn:li:glossaryTerm:ETL")
          .addTerm("urn:li:glossaryTerm:FinancialReporting");

      // Set domain
      dataJob.setDomain("urn:li:domain:Finance");

      // Add custom properties
      dataJob
          .addCustomProperty("schedule", "0 2 * * *")
          .addCustomProperty("sla_hours", "4")
          .addCustomProperty("priority", "high");

      // Define lineage: input datasets (what this job reads)
      dataJob
          .addInputDataset(
              "urn:li:dataset:(urn:li:dataPlatform:snowflake,raw.transactions,PROD)")
          .addInputDataset(
              "urn:li:dataset:(urn:li:dataPlatform:snowflake,raw.customers,PROD)");

      // Define lineage: output datasets (what this job writes)
      dataJob
          .addOutputDataset(
              "urn:li:dataset:(urn:li:dataPlatform:snowflake,"
              + "analytics.customer_transactions,PROD)")
          .addOutputDataset(
              "urn:li:dataset:(urn:li:dataPlatform:snowflake,"
              + "analytics.daily_summary,PROD)");

      // All patches are emitted in a single upsert
      client.entities().upsert(dataJob);
      System.out.println("Successfully created DataJob: " + dataJob.getUrn());

    } finally {
      client.close();
    }
  }
}

Entity-Specific Features

Chart

The Chart entity supports chart-specific properties beyond the common metadata operations:

chart.setChartType("BAR");    // BAR, LINE, PIE, TABLE, TEXT, BOXPLOT
chart.setAccess("PUBLIC");    // PUBLIC or PRIVATE
chart.setExternalUrl("https://looker.company.com/charts/my_chart");
chart.setChartUrl("https://looker.company.com/embed/charts/my_chart");
chart.setLastRefreshed(System.currentTimeMillis());
chart.setInputDatasets(Arrays.asList(dataset1, dataset2));

Container

Containers represent logical groupings such as databases or schemas:

Container database = Container.builder()
    .platform("snowflake")
    .database("analytics_db")
    .env("PROD")
    .displayName("Analytics Database")
    .description("Production database for analytics")
    .qualifiedName("prod.snowflake.analytics_db")
    .build();

MLModel and MLModelGroup

ML entities support training metrics, hyperparameters, and model group associations:

// MLModel-specific operations
model.addTrainingMetric("accuracy", "0.94");
model.addHyperParam("learning_rate", "0.01");

// MLModelGroup-specific operations
modelGroup.addTrainingJob("urn:li:dataProcessInstance:training_pipeline");

V1 vs V2 Comparison

Feature | V1 SDK | V2 SDK
Client | RestEmitter | DataHubClientV2
Entity Construction | Manual aspect objects + MCP Wrapper | Fluent .builder() pattern
URN Management | Manual URN construction | Auto-generated from builder fields
Metadata Updates | Individual emit() calls per aspect | Patch accumulation + single upsert()
Incremental Updates | Dedicated PatchBuilder classes | Built-in addTag(), addOwner(), etc.
Connection Testing | Not built-in | client.testConnection()
Type Safety | Varies (string URNs or typed URNs) | Strongly typed builders
Resource Management | Manual emitter.close() | client.close() or try-with-resources
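
The resource-management row mentions try-with-resources, while the examples on this page use try/finally. As a sketch of the alternative, assuming the client implements AutoCloseable as that row implies, the code below uses a hypothetical DemoClient stand-in so that it runs without the SDK on the classpath.

```java
// Hypothetical stand-in for a closeable client; the real DataHubClientV2
// is not used here.
final class DemoClient implements AutoCloseable {
  private boolean closed = false;

  // Simulates an unreachable server so the early-return path is exercised.
  boolean testConnection() {
    return false;
  }

  boolean isClosed() {
    return closed;
  }

  @Override
  public void close() {
    closed = true;
  }
}

final class ResourceDemo {
  private ResourceDemo() {}

  // Reports whether the client was closed even though attempt() returned early.
  static boolean closedAfterEarlyReturn() {
    DemoClient client = new DemoClient();
    attempt(client);
    return client.isClosed();
  }

  private static void attempt(DemoClient client) {
    try (DemoClient c = client) {
      if (!c.testConnection()) {
        return; // early exit: close() still runs
      }
      // metadata operations would go here
    }
  }
}
```

Compared with try/finally, this form removes the explicit close() call while keeping the same guarantee on early return.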
