Principle:Datahub project Datahub Java SDK Dependency Setup

Field	Value
Principle Name	Java SDK Dependency Setup
Category	Dependency Management
Status	Active
Last Updated	2026-02-10
Repository	Datahub_project_Datahub

Overview

This principle covers adding the DataHub Java SDK V2 as a project dependency via Maven or Gradle. It governs how Java applications integrate with the DataHub metadata platform: declaring the correct build dependency provisions all type-safe entity management classes required for metadata operations.

Description

Java SDK dependency setup provisions the type-safe entity management classes needed to interact with DataHub programmatically. The SDK packages entity builders, aspect traits (HasTags, HasOwners, HasGlossaryTerms, HasDomains), and the RestEmitter-backed client into a single artifact published under the io.acryl group.
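
As a minimal sketch, the declaration can be rendered for both build tools from the coordinates named in this section (group io.acryl, artifact datahub-client). The class name SdkCoordinates and the VERSION placeholder are illustrative assumptions. No specific release is implied, so substitute the version your project pins.

```java
// Sketch: renders the dependency declaration for Gradle and Maven.
// The group and artifact IDs come from this page; the version is a
// caller-supplied placeholder, not a real release number.
final class SdkCoordinates {
    static final String GROUP_ID = "io.acryl";
    static final String ARTIFACT_ID = "datahub-client";

    // Gradle (Groovy DSL) single-line notation: group:artifact:version
    static String gradle(String version) {
        return "implementation '" + GROUP_ID + ":" + ARTIFACT_ID + ":" + version + "'";
    }

    // Maven <dependency> element for the <dependencies> section of pom.xml
    static String maven(String version) {
        return String.join("\n",
            "<dependency>",
            "  <groupId>" + GROUP_ID + "</groupId>",
            "  <artifactId>" + ARTIFACT_ID + "</artifactId>",
            "  <version>" + version + "</version>",
            "</dependency>");
    }

    public static void main(String[] args) {
        // Pass the pinned version as the first argument; defaults to a placeholder.
        String version = args.length > 0 ? args[0] : "VERSION";
        System.out.println(gradle(version));
        System.out.println(maven(version));
    }
}
```

The Gradle line belongs in the dependencies block of build.gradle, and the Maven element belongs inside the dependencies section of pom.xml.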

The datahub-client artifact is built as a shadow (fat) JAR that relocates third-party namespaces to avoid classpath conflicts. Key relocated packages include Spring Framework, Jackson, Apache HTTP Client, Kafka, and Guava. This ensures the SDK can be embedded in diverse Java environments without dependency version collisions.
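
The effect of relocation on class names can be sketched as a simple prefix rewrite. The target prefix datahub.shaded and the exact source package names below are assumptions chosen for illustration. The authoritative rules live in the SDK's build script and may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: models how a shadow JAR relocation rule rewrites class names.
final class RelocationSketch {
    // Hypothetical vendor prefix; check the SDK build script for the real one.
    static final String SHADED_PREFIX = "datahub.shaded.";

    // Assumed root packages for the libraries this page lists as relocated.
    static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        String[] roots = {
            "org.springframework",   // Spring Framework
            "com.fasterxml.jackson", // Jackson
            "org.apache.http",       // Apache HTTP Client
            "org.apache.kafka",      // Kafka
            "com.google.common"      // Guava
        };
        for (String root : roots) {
            RULES.put(root, SHADED_PREFIX + root);
        }
    }

    // Returns the relocated name, or the input unchanged if no rule matches.
    static String relocate(String className) {
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            String from = rule.getKey();
            if (className.equals(from) || className.startsWith(from + ".")) {
                return rule.getValue() + className.substring(from.length());
            }
        }
        return className;
    }

    public static void main(String[] args) {
        // prints datahub.shaded.com.fasterxml.jackson.databind.ObjectMapper
        System.out.println(relocate("com.fasterxml.jackson.databind.ObjectMapper"));
        // prints java.util.List (no rule applies)
        System.out.println(relocate("java.util.List"));
    }
}
```

Because the bundled copies live under a different package root, an application can depend on its own Jackson or Spring version without clashing with the copy inside the SDK.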

The artifact is published to Maven Central via Sonatype's OSSRH staging repository, supporting both release and snapshot versions. The build configuration includes:

API dependencies exposed transitively: entity-registry, datahub-event, slf4j-api
Implementation dependencies shaded into the JAR: datahub-schematron, kafka-avro-serializer, avro, httpClient, jackson, aws-s3
Compile-only dependencies not included in the artifact: lombok, guava

Usage

Apply this principle when building Java applications that need to create, read, update, or delete metadata entities in DataHub. It is the entry point for any Java-based integration, whether a data pipeline, a microservice, or a standalone tool that registers datasets, dashboards, charts, or other entities in the DataHub catalog.

Typical scenarios include:

Custom ingestion pipelines that push metadata from proprietary systems
CI/CD pipelines that register or update dataset lineage
Microservices that programmatically manage tags, owners, and glossary terms
Data governance tools that enforce metadata policies
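
Before wiring up any of these scenarios, a quick startup check can confirm that the dependency actually resolved onto the classpath. The sketch below probes for a class by name without initializing it. The probe class datahub.client.rest.RestEmitter is an assumption about the artifact's package layout, so adjust it to a class you know ships in the version you use.

```java
// Sketch: smoke test that a class from the SDK is visible at runtime.
final class SdkClasspathCheck {
    // Assumed fully qualified name of the SDK's REST emitter class.
    static final String PROBE_CLASS = "datahub.client.rest.RestEmitter";

    // Returns true if the named class can be located by this class's loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: locate the class without running static initializers
            Class.forName(className, false, SdkClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isOnClasspath(PROBE_CLASS)) {
            System.out.println("DataHub SDK found on classpath");
        } else {
            System.out.println("DataHub SDK missing: check the build dependency");
        }
    }
}
```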

Theoretical Basis

Dependency management via build tools (Maven/Gradle) ensures consistent versioning and transitive dependency resolution. The shadow JAR pattern (also known as fat JAR or uber JAR) bundles the SDK's third-party dependencies into the artifact and prevents classpath conflicts by relocating their namespaces into a vendor-specific package hierarchy. This follows the principle of encapsulation: the SDK's internal dependencies are isolated from the consuming application's dependency tree.

The Maven Central publication model follows the standard Java artifact distribution pattern, enabling reproducible builds and version pinning through standard dependency management mechanisms.
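
Version pinning can be enforced mechanically. The following sketch, under the usual Maven and Gradle conventions, treats a version as pinned only if it is neither a snapshot (suffix -SNAPSHOT) nor a dynamic selector such as 1.+, latest.release, or a bracketed range. The class and method names are hypothetical, and the sample version strings are placeholders rather than real DataHub releases.

```java
// Sketch: classifies a dependency version string as pinned or not.
final class VersionPinCheck {
    // Maven convention: snapshot versions end with -SNAPSHOT.
    static boolean isSnapshot(String version) {
        return version.endsWith("-SNAPSHOT");
    }

    // Dynamic selectors resolve to different artifacts over time.
    static boolean isDynamic(String version) {
        return version.endsWith("+")           // Gradle prefix range, e.g. 1.+
            || version.startsWith("latest.")   // Gradle latest.release / latest.integration
            || version.startsWith("[")         // Maven-style version range
            || version.startsWith("(")
            || version.equals("LATEST")        // legacy Maven meta-versions
            || version.equals("RELEASE");
    }

    // A pinned version is a non-empty, fixed, non-snapshot version.
    static boolean isPinned(String version) {
        return !version.trim().isEmpty() && !isSnapshot(version) && !isDynamic(version);
    }

    public static void main(String[] args) {
        // Placeholder version strings for illustration only.
        for (String v : new String[] {"1.2.3", "1.2.3-SNAPSHOT", "1.+", "[1.0,2.0)"}) {
            System.out.println(v + " pinned=" + isPinned(v));
        }
    }
}
```

A check like this fits naturally into a CI step that fails the build when an unpinned version of the SDK is declared.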

Knowledge Sources

Domains

Data_Integration, Metadata_Management, Java_SDK
