Principle:Datahub project Datahub Entity Metadata Enrichment

Field	Value
Principle Name	Entity Metadata Enrichment
Category	Metadata Management
Status	Active
Last Updated	2026-02-10
Repository	Datahub_project_Datahub

Overview

Entity metadata enrichment is the process of adding metadata facets (tags, owners, glossary terms, domains, custom properties) to entity objects. It relies on mixin interfaces (HasTags, HasOwners, HasGlossaryTerms, HasDomains) that provide a uniform API for any entity type supporting those facets.

Description

Metadata enrichment in the DataHub Java SDK V2 is implemented through a Mixin/Trait pattern using Java default interface methods. Each metadata capability is defined as a separate interface:

HasTags<T> -- Provides addTag(), removeTag(), setTags(), getTags()
HasOwners<T> -- Provides addOwner(), removeOwner(), setOwners(), getOwners()
HasGlossaryTerms<T> -- Provides addTerm(), removeTerm(), setTerms(), getTerms()
HasDomains<T> -- Provides setDomain(), removeDomain(), clearDomains(), getDomain(), getDomains()

Entity classes compose these interfaces to declare which metadata capabilities they support. For example, Dataset implements all four:

public class Dataset extends Entity
    implements HasTags<Dataset>, HasGlossaryTerms<Dataset>,
               HasOwners<Dataset>, HasDomains<Dataset>, ...
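
The composition above can be illustrated with a minimal, self-contained sketch of a capability interface built on default methods. None of the types below come from the SDK: SketchEntity, SketchHasTags, SketchDataset, self() and tagStore() are hypothetical stand-ins, and the in-memory list only takes the place of whatever the real entity classes do internally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the SDK's entity base class (not the real Entity).
abstract class SketchEntity {
    private final List<String> tagUrns = new ArrayList<>();

    // Package-private accessor used by the capability interface below.
    List<String> tagStore() {
        return tagUrns;
    }
}

// Capability interface: default methods carry the behaviour,
// self() hands back the concrete type so calls can be chained.
interface SketchHasTags<T extends SketchEntity & SketchHasTags<T>> {
    T self();

    default T addTag(String tagUrn) {
        self().tagStore().add(tagUrn);
        return self();
    }

    default T removeTag(String tagUrn) {
        self().tagStore().remove(tagUrn);
        return self();
    }

    default List<String> getTags() {
        return List.copyOf(self().tagStore());
    }
}

// An entity opts in to the capability simply by implementing the interface.
final class SketchDataset extends SketchEntity implements SketchHasTags<SketchDataset> {
    @Override
    public SketchDataset self() {
        return this;
    }
}

class MixinSketch {
    public static void main(String[] args) {
        SketchDataset dataset = new SketchDataset()
            .addTag("urn:li:tag:pii")
            .addTag("urn:li:tag:sensitive")
            .removeTag("urn:li:tag:sensitive");
        System.out.println(dataset.getTags()); // prints [urn:li:tag:pii]
    }
}
```

Because every method returns T rather than the interface type, the chain keeps the concrete SketchDataset type, so methods from other capability interfaces could follow in the same chain.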

Mutations follow an Accumulate-then-Flush pattern:

Accumulation: Each addTag(), addOwner(), etc. call accumulates a patch operation in an internal AbstractMultiFieldPatchBuilder (e.g., GlobalTagsPatchBuilder, OwnershipPatchBuilder)
Flushing: When upsert() is called, all accumulated patches are built into Metadata Change Proposals (MCPs) and emitted to the server
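
As a rough illustration of the two phases, the sketch below queues operations in memory and hands them to an emitter only when flushed. It is not the SDK's implementation: RecordingEmitter, AccumulatingEntity and flush() are hypothetical names, and the queued strings merely stand in for real patch operations.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the server client; it only records what it receives.
class RecordingEmitter {
    final List<List<String>> batches = new ArrayList<>();

    void emit(List<String> batch) {
        batches.add(List.copyOf(batch));
    }
}

// Hypothetical entity that queues operations instead of sending them immediately.
class AccumulatingEntity {
    private final List<String> pending = new ArrayList<>();

    AccumulatingEntity addTag(String tagUrn) {
        pending.add("globalTags:add:" + tagUrn); // accumulate only, no I/O
        return this;
    }

    AccumulatingEntity addOwner(String ownerUrn) {
        pending.add("ownership:add:" + ownerUrn); // accumulate only, no I/O
        return this;
    }

    int pendingCount() {
        return pending.size();
    }

    // Plays the role of upsert(): emit everything queued so far, then reset.
    void flush(RecordingEmitter emitter) {
        if (pending.isEmpty()) {
            return;
        }
        emitter.emit(pending);
        pending.clear();
    }
}

class AccumulateFlushSketch {
    public static void main(String[] args) {
        RecordingEmitter emitter = new RecordingEmitter();
        AccumulatingEntity entity = new AccumulatingEntity()
            .addTag("urn:li:tag:pii")
            .addOwner("urn:li:corpuser:jdoe");

        System.out.println(emitter.batches.size()); // prints 0, nothing sent yet
        entity.flush(emitter);
        System.out.println(emitter.batches.size()); // prints 1, one batch for both operations
    }
}
```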

Two operation modes affect which aspects are written:

SDK mode (default): Writes to editable aspects (e.g., editableDatasetProperties for descriptions)
INGESTION mode: Writes to system aspects (e.g., datasetProperties for descriptions)
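
The mode-to-aspect mapping for descriptions can be written down as a small lookup. This is only a sketch of the rule stated above: SketchMode and DescriptionAspectResolver are hypothetical names, and only the two aspect names are taken from the text.

```java
// Hypothetical enum mirroring the two operation modes described above.
enum SketchMode { SDK, INGESTION }

final class DescriptionAspectResolver {
    private DescriptionAspectResolver() {}

    // Returns the dataset aspect that a description write would target in each mode.
    static String descriptionAspectFor(SketchMode mode) {
        switch (mode) {
            case SDK:
                return "editableDatasetProperties"; // editable aspect
            case INGESTION:
                return "datasetProperties";         // system aspect
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }
}

class OperationModeSketch {
    public static void main(String[] args) {
        for (SketchMode mode : SketchMode.values()) {
            System.out.println(mode + " -> " + DescriptionAspectResolver.descriptionAspectFor(mode));
        }
    }
}
```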

All mutation methods enforce read-only protection -- entities fetched from the server must be converted via .mutable() before they can be modified.
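
One way such a guard could work is sketched below. GuardedEntity, fetchedFromServer() and the choice of IllegalStateException are assumptions made for illustration, as is the decision to have mutable() return a writable copy; the SDK's actual exception type and copy behaviour are not stated here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical entity with a read-only flag checked by every mutation method.
class GuardedEntity {
    private final boolean readOnly;
    private final List<String> tags;

    private GuardedEntity(boolean readOnly, List<String> tags) {
        this.readOnly = readOnly;
        this.tags = new ArrayList<>(tags); // defensive copy
    }

    // Simulates an entity returned by a server read: read-only by default.
    static GuardedEntity fetchedFromServer(List<String> tags) {
        return new GuardedEntity(true, tags);
    }

    // Assumption: mutable() yields a writable copy and leaves the original untouched.
    GuardedEntity mutable() {
        return new GuardedEntity(false, tags);
    }

    GuardedEntity addTag(String tagUrn) {
        if (readOnly) {
            throw new IllegalStateException("Entity is read-only; call mutable() first");
        }
        tags.add(tagUrn);
        return this;
    }

    List<String> getTags() {
        return List.copyOf(tags);
    }
}

class ReadOnlySketch {
    public static void main(String[] args) {
        GuardedEntity fetched = GuardedEntity.fetchedFromServer(List.of("urn:li:tag:pii"));
        try {
            fetched.addTag("urn:li:tag:sensitive");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        GuardedEntity editable = fetched.mutable().addTag("urn:li:tag:sensitive");
        System.out.println(editable.getTags().size()); // prints 2
    }
}
```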

Usage

Apply this principle when adding tags, owners, glossary terms, domains, or custom properties to new or existing entities. It is the primary way to enrich metadata after entity construction:

Tagging datasets with classification labels (e.g., "pii", "sensitive")
Assigning ownership with typed ownership roles (DATA_OWNER, TECHNICAL_OWNER)
Linking entities to glossary terms for semantic governance
Placing entities within organizational domains
Attaching arbitrary key-value custom properties

Theoretical Basis

Mixin/Trait pattern -- Metadata capabilities are composed via interfaces rather than through a class hierarchy. Because Java interfaces carry no instance state, this sidesteps the diamond problem of multiple class inheritance and lets each entity type implement only the metadata facets it supports. The self-bounded generic pattern <T extends Entity & HasTags<T>> ensures type-safe fluent method chaining.

Accumulate-then-Flush pattern -- Mutations are batched for efficient emission. Rather than making an HTTP call for each addTag() invocation, operations are accumulated in patch builders and emitted as a single batch during upsert(). This reduces network overhead and provides transactional semantics within a single entity's mutations.

Patch Semantics -- Incremental operations (addTag, addOwner) use JSON Patch-based MCPs with PATCH change type, while replacement operations (setTags, setOwners) use full aspect MCPs with UPSERT change type.
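
The contrast between the two change types can be sketched as follows. SketchChangeType, SketchProposal and TagProposals are hypothetical stand-ins rather than SDK classes, and the JSON payloads are simplified illustrations of a JSON Patch document and a full aspect body, not the exact wire format.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical mirror of the two change types named above.
enum SketchChangeType { PATCH, UPSERT }

// Hypothetical, minimal stand-in for a Metadata Change Proposal.
final class SketchProposal {
    final String aspectName;
    final SketchChangeType changeType;
    final String payload;

    SketchProposal(String aspectName, SketchChangeType changeType, String payload) {
        this.aspectName = aspectName;
        this.changeType = changeType;
        this.payload = payload;
    }
}

final class TagProposals {
    private TagProposals() {}

    // Incremental: a JSON Patch document with a single "add" operation.
    static SketchProposal addTag(String tagUrn) {
        String payload = "[{\"op\":\"add\",\"path\":\"/tags/" + tagUrn
            + "\",\"value\":{\"tag\":\"" + tagUrn + "\"}}]";
        return new SketchProposal("globalTags", SketchChangeType.PATCH, payload);
    }

    // Replacement: the full aspect body, overwriting whatever was stored before.
    static SketchProposal setTags(List<String> tagUrns) {
        String entries = tagUrns.stream()
            .map(urn -> "{\"tag\":\"" + urn + "\"}")
            .collect(Collectors.joining(","));
        return new SketchProposal("globalTags", SketchChangeType.UPSERT,
            "{\"tags\":[" + entries + "]}");
    }
}

class PatchSemanticsSketch {
    public static void main(String[] args) {
        SketchProposal incremental = TagProposals.addTag("urn:li:tag:pii");
        SketchProposal replacement = TagProposals.setTags(
            List.of("urn:li:tag:pii", "urn:li:tag:sensitive"));
        System.out.println(incremental.changeType + " " + incremental.payload);
        System.out.println(replacement.changeType + " " + replacement.payload);
    }
}
```

In this sketch the incremental proposal describes only the one tag being added, while the replacement proposal carries the complete tag list, which is why a set-style call would overwrite tags that are not included in it.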

Knowledge Sources

Domains

Data_Integration, Metadata_Management, Java_SDK
