Principle:Datahub project Datahub Java SDK Dependency Setup

From Leeroopedia


Field | Value
Principle Name | Java SDK Dependency Setup
Category | Dependency Management
Status | Active
Last Updated | 2026-02-10
Repository | Datahub_project_Datahub

Overview

This principle covers adding the DataHub Java SDK V2 as a project dependency via Maven or Gradle. It governs how Java applications integrate with the DataHub metadata platform: declaring the correct build dependency provisions the type-safe entity management classes required for metadata operations.

Description

Java SDK dependency setup provisions the type-safe entity management classes needed to interact with DataHub programmatically. The SDK packages entity builders, aspect traits (HasTags, HasOwners, HasGlossaryTerms, HasDomains), and the RestEmitter-backed client into a single artifact published under the io.acryl group.
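Once the dependency is declared, a quick runtime check can confirm that the SDK classes actually resolve. The following is a minimal sketch, not part of the SDK: the class name `ClasspathCheck` is hypothetical, and the fully qualified emitter name `datahub.client.rest.RestEmitter` is an assumption that should be verified against the SDK version in use.

```java
/** Checks whether a class can be resolved on the classpath, without initializing it. */
final class ClasspathCheck {

    // Assumed fully qualified name of the SDK's REST emitter; verify for your SDK version.
    static final String ASSUMED_EMITTER_CLASS = "datahub.client.rest.RestEmitter";

    /** Returns true if the named class can be loaded, false otherwise. */
    static boolean isOnClasspath(String className) {
        try {
            // 'false' skips static initialization; we only care whether the class resolves.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("SDK emitter resolvable: " + isOnClasspath(ASSUMED_EMITTER_CLASS));
    }
}
```

A check like this could run in a startup hook or a smoke test, so that a missing or misdeclared dependency fails early rather than at the first metadata call.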

The datahub-client artifact is built as a shadow (fat) JAR that relocates third-party namespaces to avoid classpath conflicts. Key relocated packages include Spring Framework, Jackson, Apache HTTP Client, Kafka, and Guava. This ensures the SDK can be embedded in diverse Java environments without dependency version collisions.
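The effect of relocation described above can be sketched as a prefix rewrite on class names. This is an illustration only: the `datahub.shaded.` target prefix and the three package roots are assumptions chosen for the example, and the authoritative rules are those in the SDK's build script.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrates how a shadow-JAR relocation rule rewrites package names. */
final class RelocationSketch {

    // Hypothetical target prefix; the real one is defined by the SDK's shadow configuration.
    static final String SHADED_PREFIX = "datahub.shaded.";

    /** Returns the relocated class name, or the input unchanged if no rule matches. */
    static String relocate(String className, Map<String, String> rules) {
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            String from = rule.getKey();
            // Match on a package boundary so "com.foo" does not capture "com.foobar".
            if (className.startsWith(from + ".")) {
                return rule.getValue() + className.substring(from.length());
            }
        }
        return className;
    }

    /** Example rules for a few third-party package roots (illustrative, not exhaustive). */
    static Map<String, String> exampleRules() {
        Map<String, String> rules = new LinkedHashMap<>();
        String[] roots = {"org.springframework", "com.fasterxml.jackson", "org.apache.http"};
        for (String root : roots) {
            rules.put(root, SHADED_PREFIX + root);
        }
        return rules;
    }

    public static void main(String[] args) {
        Map<String, String> rules = exampleRules();
        System.out.println(relocate("com.fasterxml.jackson.databind.ObjectMapper", rules));
        System.out.println(relocate("java.util.List", rules));
    }
}
```

Because the shaded copies live under a different package root, an application can keep its own Jackson or Spring version on the classpath without the two colliding.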

The artifact is published to Maven Central via Sonatype's OSSRH staging repository, supporting both release and snapshot versions. The build configuration includes:

  • API dependencies exposed transitively: entity-registry, datahub-event, slf4j-api
  • Implementation dependencies shaded into the JAR: datahub-schematron, kafka-avro-serializer, avro, httpClient, jackson, aws-s3
  • Compile-only dependencies not included in the artifact: lombok, guava

Usage

Apply this principle when building Java applications that need to create, read, update, or delete metadata entities in DataHub. It is the entry point for any Java-based integration, whether a data pipeline, a microservice, or a standalone tool that registers datasets, dashboards, charts, or other entities in the DataHub catalog.

Typical scenarios include:

  • Custom ingestion pipelines that push metadata from proprietary systems
  • CI/CD pipelines that register or update dataset lineage
  • Microservices that programmatically manage tags, owners, and glossary terms
  • Data governance tools that enforce metadata policies

Theoretical Basis

Dependency management via build tools (Maven/Gradle) ensures consistent versioning and transitive dependency resolution. The shadow JAR pattern (also known as fat JAR or uber JAR) prevents classpath conflicts by relocating third-party namespaces into a vendor-specific package hierarchy. This follows the principle of encapsulation: the SDK's internal dependencies are isolated from the consuming application's dependency tree.

The Maven Central publication model follows the standard Java artifact distribution pattern, enabling reproducible builds and version pinning through standard dependency management mechanisms.
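Version pinning comes down to fixing all three Maven coordinates. The sketch below models them in plain Java and renders both the Gradle short notation and the Maven `<dependency>` element. The group `io.acryl` and artifact `datahub-client` come from the description above; the class `ArtifactCoordinates` is hypothetical and the version `x.y.z` is a placeholder, not a real release number.

```java
/** Minimal holder for Maven coordinates (group, artifact, version). */
final class ArtifactCoordinates {
    final String group;
    final String artifact;
    final String version;

    ArtifactCoordinates(String group, String artifact, String version) {
        this.group = group;
        this.artifact = artifact;
        this.version = version;
    }

    /** Gradle's short dependency notation: group:artifact:version. */
    String toGradleNotation() {
        return group + ":" + artifact + ":" + version;
    }

    /** The equivalent Maven dependency element. */
    String toMavenXml() {
        return "<dependency>\n"
            + "  <groupId>" + group + "</groupId>\n"
            + "  <artifactId>" + artifact + "</artifactId>\n"
            + "  <version>" + version + "</version>\n"
            + "</dependency>";
    }

    /** Maven convention: a version ending in -SNAPSHOT is a mutable pre-release. */
    boolean isSnapshot() {
        return version.endsWith("-SNAPSHOT");
    }

    public static void main(String[] args) {
        // "x.y.z" is a placeholder; substitute a version actually published for the SDK.
        ArtifactCoordinates sdk = new ArtifactCoordinates("io.acryl", "datahub-client", "x.y.z");
        System.out.println(sdk.toGradleNotation());
        System.out.println(sdk.toMavenXml());
        System.out.println("snapshot: " + sdk.isSnapshot());
    }
}
```

In a Gradle build the short notation would typically appear inside an `implementation(...)` declaration, while the XML form goes under `<dependencies>` in a `pom.xml`. Pinning a release version rather than a snapshot keeps builds reproducible, since snapshot artifacts can change between resolutions.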

Related

Implementation:Datahub_project_Datahub_Datahub_Client_Dependency

Knowledge Sources

Domains

Data_Integration, Metadata_Management, Java_SDK
