Implementation: Java SDK V2 Examples
| Knowledge Sources | |
|---|---|
| Domains | Java_SDK, Examples |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Description
The Java SDK V2 examples demonstrate how to create and manage DataHub metadata entities using the second-generation Java SDK. The V2 SDK introduces a fluent builder pattern with type-safe entity construction, a unified DataHubClientV2 client, and a patch accumulation model that collects metadata changes in memory before emitting them in a single upsert operation.
The V2 examples cover seven entity types: Chart, Container, Dashboard, DataJob, Dataset, MLModel, and MLModelGroup. Each entity type has up to three example files: a Create example (basic usage), a Full example (comprehensive metadata), and a Lineage example (input/output dataset relationships). Dataset is the exception: it is covered by a single Patch example that demonstrates incremental updates.
All V2 examples reside in the package io.datahubproject.examples.v2 under the metadata-integration/java/examples module.
Usage
To run these examples against a local DataHub instance:
# Set environment variables (optional, defaults to localhost:8080)
export DATAHUB_SERVER="http://localhost:8080"
export DATAHUB_TOKEN="your-token-here"
# Run any example:
cd metadata-integration/java/examples
../../../gradlew run -PmainClass=io.datahubproject.examples.v2.ChartCreateExample
All V2 examples use the DataHubClientV2 client:
DataHubClientV2 client =
    DataHubClientV2.builder()
        .server(System.getenv().getOrDefault("DATAHUB_SERVER", "http://localhost:8080"))
        .token(System.getenv("DATAHUB_TOKEN"))
        .build();
Code Reference
Source Location
All V2 example files are located at:
metadata-integration/java/examples/src/main/java/io/datahubproject/examples/v2/
Files:
- ChartCreateExample.java (76 lines)
- ChartFullExample.java (153 lines)
- ChartLineageExample.java (258 lines)
- ContainerCreateExample.java (85 lines)
- ContainerFullExample.java (197 lines)
- DashboardCreateExample.java (88 lines)
- DashboardFullExample.java (173 lines)
- DashboardLineageExample.java (249 lines)
- DataJobCreateExample.java (87 lines)
- DataJobFullExample.java (153 lines)
- DataJobLineageExample.java (296 lines)
- DatasetPatchExample.java (72 lines)
- MLModelCreateExample.java (106 lines)
- MLModelFullExample.java (209 lines)
- MLModelGroupCreateExample.java (91 lines)
- MLModelGroupFullExample.java (165 lines)
Example Catalog
| Example | Entity Type | Category | Key Operations | File |
|---|---|---|---|---|
| Chart Create | Chart | Create | Builder, custom properties, upsert | ChartCreateExample.java |
| Chart Full | Chart | Full | Tags, owners, terms, domain, chart type, access, lineage | ChartFullExample.java |
| Chart Lineage | Chart | Lineage | setInputDatasets, addInputDataset, removeInputDataset | ChartLineageExample.java |
| Container Create | Container | Create | Database container, tags, owners, domain | ContainerCreateExample.java |
| Container Full | Container | Full | Schema container, sub-containers, comprehensive metadata | ContainerFullExample.java |
| Dashboard Create | Dashboard | Create | Builder, tags, owners, custom properties | DashboardCreateExample.java |
| Dashboard Full | Dashboard | Full | Tags, owners, terms, domain, charts, access, URLs | DashboardFullExample.java |
| Dashboard Lineage | Dashboard | Lineage | Chart associations, input dataset lineage | DashboardLineageExample.java |
| DataJob Create | DataJob | Create | Builder, orchestrator, flow/job IDs | DataJobCreateExample.java |
| DataJob Full | DataJob | Full | Tags, owners, terms, domain, inlets, outlets, custom props | DataJobFullExample.java |
| DataJob Lineage | DataJob | Lineage | Input/output datasets, pipeline modeling, DatasetUrn types | DataJobLineageExample.java |
| Dataset Patch | Dataset | Patch | Patch accumulation, incremental updates | DatasetPatchExample.java |
| MLModel Create | MLModel | Create | Training metrics, hyperparameters, tags, owners | MLModelCreateExample.java |
| MLModel Full | MLModel | Full | Comprehensive metrics, hyperparams, terms, domain, lineage | MLModelFullExample.java |
| MLModelGroup Create | MLModelGroup | Create | Model group, training job references | MLModelGroupCreateExample.java |
| MLModelGroup Full | MLModelGroup | Full | Comprehensive metadata, model references, training jobs | MLModelGroupFullExample.java |
V2 SDK Patterns
Pattern 1: Fluent Entity Builders
Every V2 entity type has a .builder() method that returns a type-safe builder. The builder enforces required fields (such as platform, IDs, and environment) and provides optional metadata setters. The .build() call constructs the entity object and generates its URN automatically.
Supported entity builders:
| Entity Class | Required Builder Fields | URN Format |
|---|---|---|
| Chart | tool, id | urn:li:chart:(tool,id) |
| Container | platform, database (or schema), env | urn:li:container:... |
| Dashboard | tool, id | urn:li:dashboard:(tool,id) |
| DataJob | orchestrator, flowId, cluster, jobId | urn:li:dataJob:(urn:li:dataFlow:(orch,flow,cluster),jobId) |
| Dataset | platform, name, env | urn:li:dataset:(urn:li:dataPlatform:platform,name,ENV) |
| MLModel | platform, name, env | urn:li:mlModel:(urn:li:dataPlatform:platform,name,ENV) |
| MLModelGroup | platform, groupId, env | urn:li:mlModelGroup:(urn:li:dataPlatform:platform,groupId,ENV) |
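To make the URN formats in the table concrete, the following minimal sketch rebuilds three of them with plain string templates. It is an illustration only and does not use the SDK: the class `UrnFormatSketch` and its methods are hypothetical names, and the real builders may normalize or validate their inputs in ways this sketch does not.

```java
/** Illustrative only: string templates that mirror three URN formats from the table. */
final class UrnFormatSketch {
    private UrnFormatSketch() {}

    /** urn:li:chart:(tool,id) */
    static String chartUrn(String tool, String id) {
        return String.format("urn:li:chart:(%s,%s)", tool, id);
    }

    /** urn:li:dataset:(urn:li:dataPlatform:platform,name,ENV) */
    static String datasetUrn(String platform, String name, String env) {
        return String.format("urn:li:dataset:(urn:li:dataPlatform:%s,%s,%s)", platform, name, env);
    }

    /** urn:li:dataJob:(urn:li:dataFlow:(orch,flow,cluster),jobId) */
    static String dataJobUrn(String orchestrator, String flowId, String cluster, String jobId) {
        // The DataJob URN nests the URN of its parent DataFlow.
        String flowUrn = String.format("urn:li:dataFlow:(%s,%s,%s)", orchestrator, flowId, cluster);
        return String.format("urn:li:dataJob:(%s,%s)", flowUrn, jobId);
    }

    public static void main(String[] args) {
        System.out.println(chartUrn("looker", "customer_revenue_by_region"));
        System.out.println(datasetUrn("snowflake", "raw.orders", "PROD"));
        System.out.println(dataJobUrn("airflow", "financial_reporting_pipeline", "prod",
            "aggregate_customer_transactions"));
    }
}
```

In practice the URN should be read from the built entity with getUrn(), as the usage examples on this page do, rather than assembled by hand.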
Pattern 2: Patch Accumulation
V2 entities accumulate metadata changes as pending patches in memory. Changes are not sent to DataHub until client.entities().upsert(entity) is called. This allows batching multiple metadata operations (tags, owners, properties, lineage) into a single network call.
// Patches accumulate in memory
entity.addTag("pii");
entity.addOwner("urn:li:corpuser:user1", OwnershipType.TECHNICAL_OWNER);
entity.addCustomProperty("team", "data-engineering");
// Check how many patches are pending
System.out.println("Pending patches: " + entity.getPendingPatches().size());
// Emit all patches in a single upsert
client.entities().upsert(entity);
// After upsert, pending patches are cleared
// New patches can be accumulated for further updates
entity.addTag("gdpr");
client.entities().upsert(entity);
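The accumulate-then-flush behavior described above can be modeled in a few lines of plain Java. The sketch below is a toy model under stated assumptions, not the SDK's implementation: `ToyEntity`, `ToyClient`, and `drain()` are hypothetical names, and patches are represented as simple strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of patch accumulation; none of these types belong to the SDK. */
final class PatchAccumulationSketch {

    /** Hypothetical entity that records each change as a pending patch. */
    static final class ToyEntity {
        private final List<String> pending = new ArrayList<>();

        ToyEntity addTag(String tag) {
            pending.add("tag:" + tag);
            return this; // returning this enables method chaining
        }

        ToyEntity addCustomProperty(String key, String value) {
            pending.add("prop:" + key + "=" + value);
            return this;
        }

        /** Returns a snapshot of the patches that have not been sent yet. */
        List<String> getPendingPatches() {
            return new ArrayList<>(pending);
        }

        /** Hands over all pending patches and clears the buffer. */
        List<String> drain() {
            List<String> out = new ArrayList<>(pending);
            pending.clear();
            return out;
        }
    }

    /** Hypothetical client that records one "network call" per upsert. */
    static final class ToyClient {
        private final List<List<String>> calls = new ArrayList<>();

        void upsert(ToyEntity entity) {
            calls.add(entity.drain());
        }

        int callCount() {
            return calls.size();
        }

        List<String> call(int index) {
            return calls.get(index);
        }
    }

    public static void main(String[] args) {
        ToyEntity entity = new ToyEntity();
        ToyClient client = new ToyClient();
        entity.addTag("pii").addCustomProperty("team", "data-engineering");
        client.upsert(entity); // both patches travel in one call
        entity.addTag("gdpr");
        client.upsert(entity); // only the new patch travels in the second call
        System.out.println("calls: " + client.callCount());
    }
}
```

The point of the model is that the number of calls depends on how often upsert is invoked, not on how many metadata operations were applied in between.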
Pattern 3: Common Metadata Operations
All V2 entity types share a consistent set of metadata operations through method chaining:
| Operation | Method | Description |
|---|---|---|
| Add Tag | entity.addTag("tag-name") | Adds a tag to the entity |
| Add Owner | entity.addOwner("urn:li:corpuser:user", OwnershipType.TECHNICAL_OWNER) | Adds an owner with a specific role |
| Add Glossary Term | entity.addTerm("urn:li:glossaryTerm:TermName") | Associates a glossary term |
| Set Domain | entity.setDomain("urn:li:domain:DomainName") | Assigns the entity to a domain |
| Add Custom Property | entity.addCustomProperty("key", "value") | Adds a key-value custom property |
Pattern 4: Lineage Management
Entities that participate in data lineage (Chart, Dashboard, DataJob) provide methods for managing input and output dataset relationships:
DataJob lineage (input datasets / output datasets):
// Add individual input/output datasets
dataJob.addInputDataset("urn:li:dataset:(urn:li:dataPlatform:snowflake,raw.orders,PROD)");
dataJob.addOutputDataset("urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.orders,PROD)");
// Set multiple input/output datasets at once
dataJob.setInputDatasets(Arrays.asList("urn1", "urn2", "urn3"));
dataJob.setOutputDatasets(Arrays.asList("urn4", "urn5"));
Chart lineage (input datasets):
// Using DatasetUrn objects for type safety
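DatasetUrn salesDataset =
    DatasetUrn.createFromString(
        "urn:li:dataset:(urn:li:dataPlatform:snowflake,sales.transactions,PROD)");
// customerDataset, additionalDataset, and legacyDataset are not defined in this snippet;
// they are assumed to be DatasetUrn values created the same way.
chart.setInputDatasets(Arrays.asList(salesDataset, customerDataset));
chart.addInputDataset(additionalDataset);
chart.removeInputDataset(legacyDataset);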
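When lineage is passed as plain strings, a malformed URN is easy to miss. The following sketch checks that a string has the dataset URN shape used on this page and extracts its three parts. It is a simplified, hypothetical helper (`DatasetUrnCheckSketch` is not part of the SDK and does not replace DatasetUrn); it assumes that platform and dataset names contain no commas or parentheses.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-check for dataset URN strings; simplified, not the SDK's parser. */
final class DatasetUrnCheckSketch {
    // Shape: urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<ENV>)
    // Assumption: platform and name contain no commas or parentheses.
    private static final Pattern DATASET_URN =
        Pattern.compile("^urn:li:dataset:\\(urn:li:dataPlatform:([^,()]+),([^,()]+),([A-Z_]+)\\)$");

    private DatasetUrnCheckSketch() {}

    /** Returns {platform, name, env} or throws if the string does not match the shape. */
    static String[] parse(String urn) {
        Matcher matcher = DATASET_URN.matcher(urn);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Not a dataset URN: " + urn);
        }
        return new String[] {matcher.group(1), matcher.group(2), matcher.group(3)};
    }

    public static void main(String[] args) {
        String[] parts = parse("urn:li:dataset:(urn:li:dataPlatform:snowflake,raw.orders,PROD)");
        System.out.println(parts[0] + " / " + parts[1] + " / " + parts[2]);
    }
}
```

A placeholder such as "urn1" is rejected by this check, which is the kind of mistake it is meant to surface before an upsert.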
Pattern 5: Connection Testing
All V2 examples include a connection test before performing operations:
if (!client.testConnection()) {
  System.err.println("Failed to connect to DataHub server");
  return;
}
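A single connection test can fail while a local DataHub instance is still starting. One possible extension, sketched below, retries the check a fixed number of times before giving up. The helper `awaitConnection` and its parameters are hypothetical and not part of the SDK; the check is passed in as a `BooleanSupplier` so the sketch runs without a server.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical retry wrapper around a boolean connection check. */
final class ConnectionRetrySketch {
    private ConnectionRetrySketch() {}

    /**
     * Runs the check up to maxAttempts times, sleeping delayMillis between attempts.
     * Returns true as soon as one attempt succeeds, false otherwise.
     */
    static boolean awaitConnection(BooleanSupplier check, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (check.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in check that succeeds on the third attempt.
        int[] calls = {0};
        boolean connected = awaitConnection(() -> ++calls[0] >= 3, 5, 10);
        System.out.println("connected=" + connected + " after " + calls[0] + " attempts");
    }
}
```

With the SDK, the supplier might be something like `() -> client.testConnection()`, provided that call does not throw a checked exception; that wiring is an assumption and is not taken from the example sources.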
Usage Examples
Example 1: Creating a Chart (Basic)
Demonstrates the minimal V2 pattern for creating an entity with custom properties.
Source: ChartCreateExample.java
package io.datahubproject.examples.v2;

import datahub.client.v2.DataHubClientV2;
import datahub.client.v2.entity.Chart;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;

public class ChartCreateExample {
  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    DataHubClientV2 client =
        DataHubClientV2.builder()
            .server(System.getenv().getOrDefault("DATAHUB_SERVER", "http://localhost:8080"))
            .token(System.getenv("DATAHUB_TOKEN"))
            .build();
    try {
      if (!client.testConnection()) {
        System.err.println("Failed to connect to DataHub server");
        return;
      }
      Map<String, String> customProperties = new HashMap<>();
      customProperties.put("dashboard", "executive_summary");
      customProperties.put("refresh_rate", "daily");
      customProperties.put("data_source", "snowflake.analytics.revenue_summary");
      customProperties.put("chart_type", "stacked_bar");
      Chart chart =
          Chart.builder()
              .tool("looker")
              .id("customer_revenue_by_region")
              .title("Revenue by Region")
              .description("Monthly revenue breakdown by geographic region "
                  + "with year-over-year comparison")
              .customProperties(customProperties)
              .build();
      client.entities().upsert(chart);
      System.out.println("Successfully created chart: " + chart.getUrn());
    } finally {
      client.close();
    }
  }
}
Example 2: Creating an ML Model with Metrics and Hyperparameters
Demonstrates ML-specific metadata operations including training metrics and hyperparameters alongside standard tags, owners, and custom properties.
Source: MLModelCreateExample.java
package io.datahubproject.examples.v2;

import com.linkedin.common.OwnershipType;
import datahub.client.v2.DataHubClientV2;
import datahub.client.v2.entity.MLModel;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

public class MLModelCreateExample {
  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    DataHubClientV2 client =
        DataHubClientV2.builder()
            .server(System.getenv().getOrDefault("DATAHUB_SERVER", "http://localhost:8080"))
            .token(System.getenv("DATAHUB_TOKEN"))
            .build();
    try {
      if (!client.testConnection()) {
        System.err.println("Failed to connect to DataHub server");
        return;
      }
      // Build ML model with basic metadata
      MLModel model =
          MLModel.builder()
              .platform("tensorflow")
              .name("user_churn_predictor")
              .env("PROD")
              .displayName("User Churn Prediction Model")
              .description("XGBoost model predicting user churn probability "
                  + "based on user behavior")
              .build();
      // Add training metrics
      model
          .addTrainingMetric("accuracy", "0.94")
          .addTrainingMetric("precision", "0.93")
          .addTrainingMetric("recall", "0.91")
          .addTrainingMetric("f1_score", "0.92")
          .addTrainingMetric("auc_roc", "0.96");
      // Add hyperparameters
      model
          .addHyperParam("learning_rate", "0.01")
          .addHyperParam("max_depth", "6")
          .addHyperParam("n_estimators", "100")
          .addHyperParam("subsample", "0.8");
      // Add tags, owners, custom properties
      model.addTag("production").addTag("ml-model").addTag("classification");
      model
          .addOwner("urn:li:corpuser:ml_team", OwnershipType.TECHNICAL_OWNER)
          .addOwner("urn:li:corpuser:data_science", OwnershipType.DATA_STEWARD);
      model
          .addCustomProperty("framework", "XGBoost")
          .addCustomProperty("training_date", "2025-10-15")
          .addCustomProperty("model_version", "1.0.0");
      client.entities().upsert(model);
      System.out.println("Successfully created ML model: " + model.getUrn());
    } finally {
      client.close();
    }
  }
}
Example 3: Comprehensive DataJob with Lineage
Demonstrates creating a DataJob with all available metadata types including lineage relationships (input and output datasets).
Source: DataJobFullExample.java
package io.datahubproject.examples.v2;

import com.linkedin.common.OwnershipType;
import datahub.client.v2.DataHubClientV2;
import datahub.client.v2.entity.DataJob;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

public class DataJobFullExample {
  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    DataHubClientV2 client =
        DataHubClientV2.builder()
            .server(System.getenv().getOrDefault("DATAHUB_SERVER", "http://localhost:8080"))
            .token(System.getenv("DATAHUB_TOKEN"))
            .build();
    try {
      if (!client.testConnection()) {
        System.err.println("Failed to connect to DataHub server");
        return;
      }
      // Build DataJob with builder
      DataJob dataJob =
          DataJob.builder()
              .orchestrator("airflow")
              .flowId("financial_reporting_pipeline")
              .cluster("prod")
              .jobId("aggregate_customer_transactions")
              .description("Critical ETL job that aggregates daily customer "
                  + "transaction data from multiple sources")
              .name("Aggregate Customer Transactions")
              .type("BATCH")
              .build();
      // Add tags for categorization
      dataJob.addTag("critical").addTag("pii").addTag("financial")
          .addTag("etl").addTag("production");
      // Add owners with different roles
      dataJob
          .addOwner("urn:li:corpuser:data_engineering", OwnershipType.TECHNICAL_OWNER)
          .addOwner("urn:li:corpuser:finance_team", OwnershipType.BUSINESS_OWNER)
          .addOwner("urn:li:corpuser:compliance_team", OwnershipType.DATA_STEWARD);
      // Add glossary terms for business context
      dataJob
          .addTerm("urn:li:glossaryTerm:DataProcessing")
          .addTerm("urn:li:glossaryTerm:ETL")
          .addTerm("urn:li:glossaryTerm:FinancialReporting");
      // Set domain
      dataJob.setDomain("urn:li:domain:Finance");
      // Add custom properties
      dataJob
          .addCustomProperty("schedule", "0 2 * * *")
          .addCustomProperty("sla_hours", "4")
          .addCustomProperty("priority", "high");
      // Define lineage: input datasets (what this job reads)
      dataJob
          .addInputDataset(
              "urn:li:dataset:(urn:li:dataPlatform:snowflake,raw.transactions,PROD)")
          .addInputDataset(
              "urn:li:dataset:(urn:li:dataPlatform:snowflake,raw.customers,PROD)");
      // Define lineage: output datasets (what this job writes)
      dataJob
          .addOutputDataset(
              "urn:li:dataset:(urn:li:dataPlatform:snowflake,"
                  + "analytics.customer_transactions,PROD)")
          .addOutputDataset(
              "urn:li:dataset:(urn:li:dataPlatform:snowflake,"
                  + "analytics.daily_summary,PROD)");
      // All patches are emitted in a single upsert
      client.entities().upsert(dataJob);
      System.out.println("Successfully created DataJob: " + dataJob.getUrn());
    } finally {
      client.close();
    }
  }
}
Entity-Specific Features
Chart
The Chart entity supports chart-specific properties beyond the common metadata operations:
chart.setChartType("BAR"); // BAR, LINE, PIE, TABLE, TEXT, BOXPLOT
chart.setAccess("PUBLIC"); // PUBLIC or PRIVATE
chart.setExternalUrl("https://looker.company.com/charts/my_chart");
chart.setChartUrl("https://looker.company.com/embed/charts/my_chart");
chart.setLastRefreshed(System.currentTimeMillis());
chart.setInputDatasets(Arrays.asList(dataset1, dataset2));
Container
Containers represent logical groupings such as databases or schemas:
Container database =
    Container.builder()
        .platform("snowflake")
        .database("analytics_db")
        .env("PROD")
        .displayName("Analytics Database")
        .description("Production database for analytics")
        .qualifiedName("prod.snowflake.analytics_db")
        .build();
MLModel and MLModelGroup
ML entities support training metrics, hyperparameters, and model group associations:
// MLModel-specific operations
model.addTrainingMetric("accuracy", "0.94");
model.addHyperParam("learning_rate", "0.01");
// MLModelGroup-specific operations
modelGroup.addTrainingJob("urn:li:dataProcessInstance:training_pipeline");
V1 vs V2 Comparison
| Feature | V1 SDK | V2 SDK |
|---|---|---|
| Client | RestEmitter | DataHubClientV2 |
| Entity Construction | Manual aspect objects + MCP Wrapper | Fluent .builder() pattern |
| URN Management | Manual URN construction | Auto-generated from builder fields |
| Metadata Updates | Individual emit() calls per aspect | Patch accumulation + single upsert() |
| Incremental Updates | Dedicated PatchBuilder classes | Built-in addTag(), addOwner(), etc. |
| Connection Testing | Not built-in | client.testConnection() |
| Type Safety | Varies (string URNs or typed URNs) | Strongly typed builders |
| Resource Management | Manual emitter.close() | client.close() or try-with-resources |
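The Resource Management row mentions try-with-resources as an alternative to the try/finally blocks used in the examples on this page. The sketch below shows the mechanics with a stand-in `AutoCloseable`. `ToyClient` and `runAndClose` are hypothetical names, and nothing here is SDK code.

```java
/** Stand-in demonstration of try-with-resources; ToyClient is not an SDK type. */
final class ResourceManagementSketch {

    /** Hypothetical client that records whether close() was called. */
    static final class ToyClient implements AutoCloseable {
        private boolean closed;

        boolean isClosed() {
            return closed;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    private ResourceManagementSketch() {}

    /** Runs the work and closes the client afterwards, even if the work throws. */
    static void runAndClose(ToyClient client, Runnable work) {
        try (ToyClient c = client) {
            work.run();
        }
    }

    public static void main(String[] args) {
        ToyClient client = new ToyClient();
        runAndClose(client, () -> System.out.println("doing work"));
        System.out.println("closed=" + client.isClosed());
    }
}
```

Compared with try/finally, this form removes the explicit close() call and still closes the resource when the body throws.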
Related Pages
- Datahub_project_Datahub_Java_SDK_V1_Examples - Legacy V1 SDK examples using RestEmitter and PatchBuilder patterns
- Datahub_project_Datahub - Main DataHub repository