Principle:Datahub project Datahub Entity Metadata Enrichment

From Leeroopedia


Field           Value
Principle Name  Entity Metadata Enrichment
Category        Metadata Management
Status          Active
Last Updated    2026-02-10
Repository      Datahub_project_Datahub

Overview

Entity metadata enrichment is the process of adding metadata facets (tags, owners, glossary terms, domains, custom properties) to entity objects through trait interfaces. Mixin interfaces (HasTags, HasOwners, HasGlossaryTerms, HasDomains) provide a uniform API for adding metadata to any entity type that supports those facets.

Description

Metadata enrichment in the DataHub Java SDK V2 is implemented through a Mixin/Trait pattern using Java default interface methods. Each metadata capability is defined as a separate interface:

  • HasTags<T> -- Provides addTag(), removeTag(), setTags(), getTags()
  • HasOwners<T> -- Provides addOwner(), removeOwner(), setOwners(), getOwners()
  • HasGlossaryTerms<T> -- Provides addTerm(), removeTerm(), setTerms(), getTerms()
  • HasDomains<T> -- Provides setDomain(), removeDomain(), clearDomains(), getDomain(), getDomains()

Entity classes compose these interfaces to declare which metadata capabilities they support. For example, Dataset implements all four:

public class Dataset extends Entity
    implements HasTags<Dataset>, HasGlossaryTerms<Dataset>,
               HasOwners<Dataset>, HasDomains<Dataset>, ...

Mutations follow an Accumulate-then-Flush pattern:

  1. Accumulation: Each addTag(), addOwner(), etc. call accumulates a patch operation in an internal AbstractMultiFieldPatchBuilder (e.g., GlobalTagsPatchBuilder, OwnershipPatchBuilder)
  2. Flushing: When upsert() is called, all accumulated patches are built into Metadata Change Proposals (MCPs) and emitted to the server
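The two steps above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the SDK's implementation: PatchOp, RecordingEmitter, and SketchEntity are hypothetical stand-ins for the patch builders, the client, and an entity class, and the "/tags/" path format is illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one accumulated patch operation (not an SDK type).
final class PatchOp {
    final String op;   // "add" or "remove"
    final String path; // illustrative path; the real patch format may differ
    PatchOp(String op, String path) { this.op = op; this.path = path; }
}

// Hypothetical stand-in for the client: records each batch it is asked to emit.
final class RecordingEmitter {
    final List<List<PatchOp>> batches = new ArrayList<>();
    void emit(List<PatchOp> batch) { batches.add(new ArrayList<>(batch)); }
}

// Simplified entity: mutation methods only accumulate; upsert() flushes.
final class SketchEntity {
    private final List<PatchOp> pending = new ArrayList<>();

    SketchEntity addTag(String tag) {
        pending.add(new PatchOp("add", "/tags/" + tag));
        return this;
    }

    SketchEntity removeTag(String tag) {
        pending.add(new PatchOp("remove", "/tags/" + tag));
        return this;
    }

    int pendingCount() { return pending.size(); }

    // Flush: hand everything accumulated so far to the emitter as one batch.
    void upsert(RecordingEmitter emitter) {
        if (pending.isEmpty()) {
            return; // nothing to send
        }
        emitter.emit(pending);
        pending.clear();
    }
}

final class AccumulateFlushDemo {
    public static void main(String[] args) {
        RecordingEmitter emitter = new RecordingEmitter();
        SketchEntity entity = new SketchEntity()
            .addTag("pii")
            .addTag("sensitive")
            .removeTag("legacy");
        System.out.println("pending before upsert: " + entity.pendingCount());
        entity.upsert(emitter);
        System.out.println("batches emitted: " + emitter.batches.size());
        System.out.println("pending after upsert: " + entity.pendingCount());
    }
}
```

In this sketch no call reaches the emitter until upsert() runs, which is the property the pattern relies on.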

Two operation modes affect which aspects are written:

  • SDK mode (default): Writes to editable aspects (e.g., editableDatasetProperties for descriptions)
  • INGESTION mode: Writes to system aspects (e.g., datasetProperties for descriptions)
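As a rough illustration of how a mode could select the target aspect for a description write, consider the sketch below. The aspect names come from the text above; SketchMode and DescriptionAspectResolver are hypothetical names, and the SDK's actual mode type and routing logic may look different.

```java
// Hypothetical enum mirroring the two modes described above (not an SDK type).
enum SketchMode { SDK, INGESTION }

final class DescriptionAspectResolver {
    // Returns the dataset aspect that a description write would target.
    static String aspectFor(SketchMode mode) {
        switch (mode) {
            case INGESTION:
                return "datasetProperties";         // system aspect
            case SDK:
            default:
                return "editableDatasetProperties"; // editable aspect (default mode)
        }
    }

    public static void main(String[] args) {
        for (SketchMode mode : SketchMode.values()) {
            System.out.println(mode + " -> " + aspectFor(mode));
        }
    }
}
```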

All mutation methods enforce read-only protection -- entities fetched from the server must be converted via .mutable() before they can be modified.
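The read-only guard can be pictured with the following sketch. It assumes a simplified entity that holds only tags; GuardedEntity, fetched(), and the use of IllegalStateException are assumptions made for illustration, and the SDK may signal the violation with a different exception type.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical entity with a read-only flag (not the SDK's Entity class).
final class GuardedEntity {
    private final Set<String> tags;
    private final boolean readOnly;

    private GuardedEntity(Set<String> tags, boolean readOnly) {
        this.tags = tags;
        this.readOnly = readOnly;
    }

    // Simulates an entity returned by a server fetch: always read-only.
    static GuardedEntity fetched(String... tags) {
        return new GuardedEntity(new LinkedHashSet<>(Arrays.asList(tags)), true);
    }

    // Returns a writable copy; the fetched original stays read-only.
    GuardedEntity mutable() {
        return new GuardedEntity(new LinkedHashSet<>(tags), false);
    }

    GuardedEntity addTag(String tag) {
        checkNotReadOnly("addTag");
        tags.add(tag);
        return this;
    }

    Set<String> getTags() {
        return Collections.unmodifiableSet(tags);
    }

    // Every mutation method calls this guard first.
    private void checkNotReadOnly(String operation) {
        if (readOnly) {
            throw new IllegalStateException(
                "Cannot call " + operation + " on a read-only entity; call mutable() first");
        }
    }
}

final class ReadOnlyGuardDemo {
    public static void main(String[] args) {
        GuardedEntity fetched = GuardedEntity.fetched("pii");
        try {
            fetched.addTag("sensitive");
        } catch (IllegalStateException ex) {
            System.out.println("rejected: " + ex.getMessage());
        }
        GuardedEntity writable = fetched.mutable().addTag("sensitive");
        System.out.println("writable copy tags: " + writable.getTags());
    }
}
```

Copying on mutable() rather than flipping a flag keeps the fetched snapshot intact, which is one plausible design; the source does not say which approach the SDK takes.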

Usage

Apply this principle when adding tags, owners, glossary terms, domains, or custom properties to existing or new entities. It is the primary way to enrich metadata after entity construction:

  • Tagging datasets with classification labels (e.g., "pii", "sensitive")
  • Assigning ownership with typed ownership roles (DATA_OWNER, TECHNICAL_OWNER)
  • Linking entities to glossary terms for semantic governance
  • Placing entities within organizational domains
  • Attaching arbitrary key-value custom properties

Theoretical Basis

Mixin/Trait pattern -- Metadata capabilities are composed via interfaces rather than class inheritance. Because Java interfaces carry no instance state, this sidesteps the diamond problem of multiple class inheritance and lets each entity type implement only the metadata facets it supports. The self-bounded generic pattern <T extends Entity & HasTags<T>> makes each mutation method return the concrete entity type, which keeps fluent method chaining type-safe.
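A compact sketch of the self-bounded mixin idea follows. BaseEntity, TagMixin, OwnerMixin, SketchDataset, and SketchTagOnly are hypothetical names standing in for Entity, HasTags, HasOwners, and concrete entity classes; the real interfaces accumulate patches rather than writing to lists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class holding the state the mixins operate on.
abstract class BaseEntity {
    final List<String> tags = new ArrayList<>();
    final List<String> owners = new ArrayList<>();
}

// Mixin: the default method supplies behavior; T is the implementing entity type.
interface TagMixin<T extends BaseEntity & TagMixin<T>> {
    @SuppressWarnings("unchecked")
    default T addTag(String tag) {
        T self = (T) this; // safe as long as implementors pass their own type as T
        self.tags.add(tag);
        return self;       // returning T keeps the chain on the concrete type
    }
}

interface OwnerMixin<T extends BaseEntity & OwnerMixin<T>> {
    @SuppressWarnings("unchecked")
    default T addOwner(String owner) {
        T self = (T) this;
        self.owners.add(owner);
        return self;
    }
}

// Supports both facets.
final class SketchDataset extends BaseEntity
    implements TagMixin<SketchDataset>, OwnerMixin<SketchDataset> {}

// Supports tags only; calling addOwner(...) on it would not compile.
final class SketchTagOnly extends BaseEntity implements TagMixin<SketchTagOnly> {}

final class MixinDemo {
    public static void main(String[] args) {
        SketchDataset dataset = new SketchDataset()
            .addTag("pii")
            .addOwner("urn:li:corpuser:jdoe") // available because addTag returned SketchDataset
            .addTag("sensitive");
        System.out.println(dataset.tags + " " + dataset.owners);
    }
}
```

If addTag() returned the interface type instead of T, the addOwner() call in the chain would need a cast, which is the problem the self-bound removes.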

Accumulate-then-Flush pattern -- Mutations are batched for efficient emission. Rather than making an HTTP call for each addTag() invocation, operations are accumulated in patch builders and emitted as a single batch during upsert(). This reduces network overhead and provides transactional semantics within a single entity's mutations.

Patch Semantics -- Incremental operations (addTag, addOwner) use JSON Patch-based MCPs with PATCH change type, while replacement operations (setTags, setOwners) use full aspect MCPs with UPSERT change type.
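The practical difference between the two change types can be shown with a simplified model. This sketch is an assumption-laden illustration: SketchChangeType, SketchProposal, and SketchServer are hypothetical, tags are modeled as a plain set, and the real JSON Patch payload format is not reproduced here.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical mirror of the two change types named above.
enum SketchChangeType { PATCH, UPSERT }

// Hypothetical proposal: for PATCH, tags to add; for UPSERT, the full replacement set.
final class SketchProposal {
    final SketchChangeType changeType;
    final Set<String> tags;

    SketchProposal(SketchChangeType changeType, String... tags) {
        this.changeType = changeType;
        this.tags = new LinkedHashSet<>(Arrays.asList(tags));
    }
}

final class SketchServer {
    // Applies a proposal to the stored tag set and returns the new stored value.
    static Set<String> apply(Set<String> stored, SketchProposal proposal) {
        Set<String> result = new LinkedHashSet<>();
        if (proposal.changeType == SketchChangeType.PATCH) {
            result.addAll(stored); // incremental: keep what is already there
        }
        result.addAll(proposal.tags); // UPSERT starts from empty, so this replaces
        return result;
    }

    public static void main(String[] args) {
        Set<String> stored = new LinkedHashSet<>(Arrays.asList("pii", "finance"));
        System.out.println("after addTag-style PATCH:  "
            + apply(stored, new SketchProposal(SketchChangeType.PATCH, "sensitive")));
        System.out.println("after setTags-style UPSERT: "
            + apply(stored, new SketchProposal(SketchChangeType.UPSERT, "sensitive")));
    }
}
```

In this model an incremental call preserves tags written by other producers, while a replacement call discards them, which is why the choice between add and set methods matters.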

Related

Implementation:Datahub_project_Datahub_Entity_Metadata_Mutations

Knowledge Sources

Domains

Data_Integration, Metadata_Management, Java_SDK
