Principle:Datahub project Datahub Entity Metadata Enrichment
| Field | Value |
|---|---|
| Principle Name | Entity Metadata Enrichment |
| Category | Metadata Management |
| Status | Active |
| Last Updated | 2026-02-10 |
| Repository | Datahub_project_Datahub |
Overview
Entity metadata enrichment is the process of adding metadata facets (tags, owners, glossary terms, domains, custom properties) to entity objects via trait interfaces. It uses mixin interfaces (HasTags, HasOwners, HasGlossaryTerms, HasDomains) to provide a uniform API for adding metadata to any entity type that supports those facets.
Description
Metadata enrichment in the DataHub Java SDK V2 is implemented through a Mixin/Trait pattern using Java default interface methods. Each metadata capability is defined as a separate interface:
- HasTags<T> -- Provides addTag(), removeTag(), setTags(), getTags()
- HasOwners<T> -- Provides addOwner(), removeOwner(), setOwners(), getOwners()
- HasGlossaryTerms<T> -- Provides addTerm(), removeTerm(), setTerms(), getTerms()
- HasDomains<T> -- Provides setDomain(), removeDomain(), clearDomains(), getDomain(), getDomains()
Entity classes compose these interfaces to declare which metadata capabilities they support. For example, Dataset implements all four:
    public class Dataset extends Entity
        implements HasTags<Dataset>, HasGlossaryTerms<Dataset>,
                   HasOwners<Dataset>, HasDomains<Dataset>, ...
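The composition described above can be sketched with plain Java default methods. This is a minimal illustration, not the SDK source: every type here (ToyEntity, ToyHasTags, ToyHasOwners, ToyDataset) is a hypothetical stand-in, and tags and owners are simplified to strings. It shows how a self-bounded type parameter lets a default method return the concrete entity type so that calls chain.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical demo; none of these types are the real DataHub SDK classes.
class MixinSketch {
    public static void main(String[] args) {
        ToyDataset dataset = new ToyDataset("urn:li:dataset:example")
                .addTag("pii")        // default method from ToyHasTags, returns ToyDataset
                .addOwner("jdoe")     // default method from ToyHasOwners, returns ToyDataset
                .addTag("sensitive");
        System.out.println(dataset.getTags());   // prints [pii, sensitive]
        System.out.println(dataset.getOwners()); // prints [jdoe]
    }
}

// Stand-in base class that owns the backing storage for each facet.
abstract class ToyEntity {
    private final String urn;
    private final Set<String> tags = new LinkedHashSet<>();
    private final Set<String> owners = new LinkedHashSet<>();

    ToyEntity(String urn) { this.urn = urn; }

    public String getUrn() { return urn; }
    public Set<String> tagStore() { return tags; }
    public Set<String> ownerStore() { return owners; }
}

// Capability mixin: T is the concrete entity type, so addTag() can return it.
interface ToyHasTags<T extends ToyEntity & ToyHasTags<T>> {
    Set<String> tagStore(); // supplied by ToyEntity

    @SuppressWarnings("unchecked")
    default T addTag(String tag) {
        tagStore().add(tag);
        return (T) this;
    }

    default Set<String> getTags() {
        return Collections.unmodifiableSet(tagStore());
    }
}

// A second, independent capability following the same shape.
interface ToyHasOwners<T extends ToyEntity & ToyHasOwners<T>> {
    Set<String> ownerStore(); // supplied by ToyEntity

    @SuppressWarnings("unchecked")
    default T addOwner(String owner) {
        ownerStore().add(owner);
        return (T) this;
    }

    default Set<String> getOwners() {
        return Collections.unmodifiableSet(ownerStore());
    }
}

// The entity opts in to exactly the capabilities it supports.
final class ToyDataset extends ToyEntity
        implements ToyHasTags<ToyDataset>, ToyHasOwners<ToyDataset> {
    ToyDataset(String urn) { super(urn); }
}
```

An entity type that does not support ownership would simply omit ToyHasOwners from its implements clause, and addOwner() would then not compile for it.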
Mutations follow an Accumulate-then-Flush pattern:
- Accumulation: Each addTag(), addOwner(), etc. call accumulates a patch operation in an internal AbstractMultiFieldPatchBuilder (e.g., GlobalTagsPatchBuilder, OwnershipPatchBuilder)
- Flushing: When upsert() is called, all accumulated patches are built into Metadata Change Proposals (MCPs) and emitted to the server
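A rough sketch of the accumulate-then-flush idea follows. It assumes a flat list of pending operations and a hypothetical ToyEmitter that counts calls; the real SDK uses per-aspect patch builders and emits MCPs, which this toy does not reproduce. The path strings are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo; the Toy* types are stand-ins, not DataHub SDK classes.
class AccumulateFlushSketch {
    public static void main(String[] args) {
        ToyEmitter emitter = new ToyEmitter();
        ToyPatchingEntity entity = new ToyPatchingEntity("urn:li:dataset:example");

        entity.addTag("urn:li:tag:pii")
              .addTag("urn:li:tag:sensitive")
              .addOwner("urn:li:corpuser:jdoe");
        System.out.println(emitter.emitCalls);   // prints 0: nothing has been sent yet

        entity.upsert(emitter);
        System.out.println(emitter.emitCalls);   // prints 1: one batch for three mutations
        System.out.println(emitter.sent.size()); // prints 3
    }
}

// One pending change against a named aspect.
final class ToyPatchOp {
    final String aspectName;
    final String op;
    final String path;

    ToyPatchOp(String aspectName, String op, String path) {
        this.aspectName = aspectName;
        this.op = op;
        this.path = path;
    }
}

// Stand-in for the network client; records what it was asked to send.
final class ToyEmitter {
    int emitCalls = 0;
    final List<ToyPatchOp> sent = new ArrayList<>();

    void emitBatch(List<ToyPatchOp> ops) {
        emitCalls++;
        sent.addAll(ops);
    }
}

final class ToyPatchingEntity {
    private final String urn;
    private final List<ToyPatchOp> pending = new ArrayList<>();

    ToyPatchingEntity(String urn) { this.urn = urn; }

    String getUrn() { return urn; }

    // Accumulation: record the operation locally, no I/O.
    ToyPatchingEntity addTag(String tagUrn) {
        pending.add(new ToyPatchOp("globalTags", "add", "/tags/" + tagUrn));
        return this;
    }

    ToyPatchingEntity addOwner(String ownerUrn) {
        pending.add(new ToyPatchOp("ownership", "add", "/owners/" + ownerUrn));
        return this;
    }

    int pendingCount() { return pending.size(); }

    // Flushing: hand everything to the emitter in one call, then reset.
    void upsert(ToyEmitter emitter) {
        if (pending.isEmpty()) {
            return;
        }
        emitter.emitBatch(new ArrayList<>(pending));
        pending.clear();
    }
}
```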
Two operation modes affect which aspects are written:
- SDK mode (default): Writes to editable aspects (e.g., editableDatasetProperties for descriptions)
- INGESTION mode: Writes to system aspects (e.g., datasetProperties for descriptions)
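The mode-to-aspect mapping for descriptions can be pictured as a small lookup. The enum and router below are hypothetical names introduced for illustration; only the two aspect names and the fact that SDK mode is the default come from the text above.

```java
// Hypothetical demo; ToyOperationMode and ToyAspectRouter are not SDK types.
class OperationModeSketch {
    public static void main(String[] args) {
        for (ToyOperationMode mode : ToyOperationMode.values()) {
            System.out.println(mode + " -> " + ToyAspectRouter.descriptionAspect(mode));
        }
        // prints:
        // SDK -> editableDatasetProperties
        // INGESTION -> datasetProperties
    }
}

enum ToyOperationMode { SDK, INGESTION }

final class ToyAspectRouter {
    private ToyAspectRouter() { }

    // Picks the aspect that a dataset description would be written to.
    static String descriptionAspect(ToyOperationMode mode) {
        switch (mode) {
            case INGESTION:
                return "datasetProperties";          // system aspect
            case SDK:
            default:
                return "editableDatasetProperties";  // editable aspect (default mode)
        }
    }
}
```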
All mutation methods enforce read-only protection -- entities fetched from the server must be converted via .mutable() before they can be modified.
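The read-only guard can be sketched as a flag checked by every mutator. This is an assumption-laden toy: ToyFetchedEntity and fromServer() are hypothetical, IllegalStateException stands in for whatever exception the SDK actually throws, and the sketch makes mutable() return a writable copy, which the text above does not confirm.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo; ToyFetchedEntity is not a DataHub SDK class.
class ReadOnlySketch {
    public static void main(String[] args) {
        ToyFetchedEntity fetched = ToyFetchedEntity.fromServer("urn:li:dataset:example");
        try {
            fetched.addTag("pii");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        ToyFetchedEntity writable = fetched.mutable();
        writable.addTag("pii");
        System.out.println(writable.getTags()); // prints [pii]
        System.out.println(fetched.getTags());  // prints []
    }
}

final class ToyFetchedEntity {
    private final String urn;
    private final boolean readOnly;
    private final List<String> tags;

    private ToyFetchedEntity(String urn, boolean readOnly, List<String> tags) {
        this.urn = urn;
        this.readOnly = readOnly;
        this.tags = tags;
    }

    // Simulates an entity returned by a server read: it starts read-only.
    static ToyFetchedEntity fromServer(String urn) {
        return new ToyFetchedEntity(urn, true, new ArrayList<>());
    }

    // Assumption: conversion yields a writable copy and leaves the original untouched.
    ToyFetchedEntity mutable() {
        return new ToyFetchedEntity(urn, false, new ArrayList<>(tags));
    }

    boolean isReadOnly() { return readOnly; }

    ToyFetchedEntity addTag(String tag) {
        if (readOnly) {
            throw new IllegalStateException("entity is read-only; call mutable() first");
        }
        tags.add(tag);
        return this;
    }

    List<String> getTags() {
        return Collections.unmodifiableList(new ArrayList<>(tags));
    }
}
```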
Usage
Apply this principle when adding tags, owners, glossary terms, domains, or custom properties to existing or new entities. It is the primary way to enrich metadata after entity construction:
- Tagging datasets with classification labels (e.g., "pii", "sensitive")
- Assigning ownership with typed ownership roles (DATA_OWNER, TECHNICAL_OWNER)
- Linking entities to glossary terms for semantic governance
- Placing entities within organizational domains
- Attaching arbitrary key-value custom properties
Theoretical Basis
Mixin/Trait pattern -- Metadata capabilities are composed via interfaces rather than inheritance. This avoids the diamond inheritance problem and allows any entity type to selectively implement the metadata facets it supports. The self-bounded generic pattern <T extends Entity & HasTags<T>> ensures type-safe fluent method chaining.
Accumulate-then-Flush pattern -- Mutations are batched for efficient emission. Rather than making an HTTP call for each addTag() invocation, operations are accumulated in patch builders and emitted as a single batch during upsert(). This reduces network overhead and provides transactional semantics within a single entity's mutations.
Patch Semantics -- Incremental operations (addTag, addOwner) use JSON Patch-based MCPs with PATCH change type, while replacement operations (setTags, setOwners) use full aspect MCPs with UPSERT change type.
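The difference between the two change types can be illustrated with a toy writer that records what each call would produce. ToyTagWriter, ToyProposal, and ToyChangeType are hypothetical, and the payload strings only approximate a JSON Patch operation and a full aspect value; the exact shapes the SDK emits are not shown in this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo; the Toy* types are stand-ins, not DataHub SDK classes.
class PatchSemanticsSketch {
    public static void main(String[] args) {
        ToyTagWriter writer = new ToyTagWriter();
        writer.addTag("urn:li:tag:pii");                                   // incremental
        writer.setTags(List.of("urn:li:tag:gold", "urn:li:tag:curated")); // replacement
        for (ToyProposal proposal : writer.proposals()) {
            System.out.println(proposal);
        }
    }
}

enum ToyChangeType { PATCH, UPSERT }

final class ToyProposal {
    final String aspectName;
    final ToyChangeType changeType;
    final String payload;

    ToyProposal(String aspectName, ToyChangeType changeType, String payload) {
        this.aspectName = aspectName;
        this.changeType = changeType;
        this.payload = payload;
    }

    @Override
    public String toString() {
        return changeType + " " + aspectName + " " + payload;
    }
}

final class ToyTagWriter {
    private final List<ToyProposal> proposals = new ArrayList<>();

    // Incremental: a single "add" operation that touches only this tag.
    void addTag(String tagUrn) {
        String patch = "[{\"op\":\"add\",\"path\":\"/tags/" + tagUrn
                + "\",\"value\":{\"tag\":\"" + tagUrn + "\"}}]";
        proposals.add(new ToyProposal("globalTags", ToyChangeType.PATCH, patch));
    }

    // Replacement: the whole aspect value is sent, overwriting what was there.
    void setTags(List<String> tagUrns) {
        String fullAspect = tagUrns.stream()
                .map(urn -> "{\"tag\":\"" + urn + "\"}")
                .collect(Collectors.joining(",", "{\"tags\":[", "]}"));
        proposals.add(new ToyProposal("globalTags", ToyChangeType.UPSERT, fullAspect));
    }

    List<ToyProposal> proposals() {
        return new ArrayList<>(proposals);
    }
}
```

The practical consequence is that a PATCH leaves tags written by other producers in place, while an UPSERT replaces the full set.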
Related
- Implemented by: Datahub_project_Datahub_Entity_Metadata_Mutations
- Depends on: Datahub_project_Datahub_Entity_Construction
- Related Principle: Datahub_project_Datahub_Entity_Upsert
- Heuristic: Datahub_project_Datahub_Validation_Across_All_APIs