Principle:Datahub project Datahub Patch Based Updates
| Property | Value |
|---|---|
| Principle Name | Patch_Based_Updates |
| Workflow | Java_SDK_V2_Entity_Management |
| Scope | Partial modification of metadata entities |
| Implementation | Implementation:Datahub_project_Datahub_Entity_Mutable_Patch |
| Repository | https://github.com/datahub-project/datahub |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
Description
Patch-Based Updates is the principle of partially modifying metadata entities using JSON Patch operations rather than full aspect replacement. Instead of reading an entire aspect, modifying it locally, and writing the full aspect back, the SDK accumulates incremental patch operations (add tag, remove owner, set description) and emits them as compact Metadata Change Proposals with ChangeType.PATCH. This approach minimizes data transfer, reduces write conflicts, and enables concurrent modifications to different fields of the same aspect.
The SDK V2 provides a fluent API for patch operations through entity mutation methods (addTag(), addOwner(), setDescription(), etc.) that accumulate patch builders internally. When upsert() is called, all accumulated patches are built into MCPs, transformed for server version compatibility, and emitted with retry logic.
Usage
Patch-based updates are used in these scenarios:
- Incremental metadata enrichment where individual tags, owners, or terms are added to existing entities without overwriting existing metadata
- Concurrent metadata management where multiple clients or pipelines update different aspects of the same entity simultaneously
- Read-modify-write workflows where an entity is fetched from the server, made mutable via .mutable(), modified with patch methods, and upserted back
- Removing specific metadata (a single tag or owner) without affecting other entries in the same aspect
Theoretical Basis
Patch/Delta Pattern
The Patch/Delta pattern represents changes as a set of operations to apply to an existing state, rather than transmitting the new full state. This is analogous to:
- JSON Patch (RFC 6902): Defines operations (add, remove, replace, move, copy, test) on JSON documents
- Git diffs: Represent changes as additions and deletions relative to a base version
- Database UPDATE statements: Modify specific columns without rewriting entire rows
In the DataHub SDK V2, patches are represented as MetadataChangeProposal objects with ChangeType.PATCH, containing JSON Patch operations that the server applies to the current aspect state.
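The delta idea can be sketched without any DataHub dependency. The class below is a hypothetical illustration, not SDK code and not a full RFC 6902 implementation: `PatchDeltaSketch`, `Op`, and the flat path-keyed map are all assumptions made for this example. It shows only that a short list of operations is applied to the current state, rather than the new full state being transmitted.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the Patch/Delta pattern; not part of the DataHub SDK.
class PatchDeltaSketch {

    /** One operation on a flat, path-keyed document (a simplification of JSON Patch). */
    static final class Op {
        final String op;    // "add" or "remove"
        final String path;  // e.g. "/tags/urn:li:tag:pii"
        final String value; // null for "remove"

        Op(String op, String path, String value) {
            this.op = op;
            this.path = path;
            this.value = value;
        }
    }

    /**
     * Applies the operations in order to a copy of the current state.
     * Unlike RFC 6902, removing a missing path is silently ignored here.
     */
    static Map<String, String> apply(Map<String, String> current, List<Op> ops) {
        Map<String, String> next = new LinkedHashMap<>(current);
        for (Op o : ops) {
            switch (o.op) {
                case "add":
                    next.put(o.path, o.value);
                    break;
                case "remove":
                    next.remove(o.path);
                    break;
                default:
                    throw new IllegalArgumentException("Unsupported op: " + o.op);
            }
        }
        return next;
    }
}
```

Only the operations travel to the side that holds the state, so two writers touching different paths do not overwrite each other's entries.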
Accumulated Patch Builders
The SDK uses accumulated patch builders to combine multiple operations on the same aspect into a single patch MCP. For example, calling dataset.addCustomProperty("key1", "val1") followed by dataset.addCustomProperty("key2", "val2") accumulates both operations in a single DatasetPropertiesPatchBuilder, which produces one patch MCP containing both additions. This reduces the number of API calls and server-side operations.
Patch builders are registered per aspect name via registerPatchBuilder(aspectName, builder) and retrieved via getPatchBuilder(aspectName, builderClass). When upsert() is called, buildAccumulatedPatches() builds all registered builders into MCPs.
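A minimal, self-contained sketch of this accumulation follows. It borrows the method names mentioned above (`registerPatchBuilder`, `getPatchBuilder`, `buildAccumulatedPatches`), but everything else is an assumption: the classes are stand-ins, patches are modelled as plain strings instead of MCPs, the `builderClass` argument is omitted, and the aspect name `datasetProperties` is used for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of accumulated patch builders; not the DataHub SDK classes.
class AccumulatedPatchSketch {

    /** Stand-in for a per-aspect patch builder that collects operations. */
    static final class PropertiesPatchBuilder {
        private final List<String> ops = new ArrayList<>();

        PropertiesPatchBuilder addCustomProperty(String key, String value) {
            ops.add("add /customProperties/" + key + " = " + value);
            return this;
        }

        /** Produces one patch payload containing every accumulated operation. */
        String build() {
            return String.join("; ", ops);
        }
    }

    // One builder per aspect name, kept in registration order.
    private final Map<String, PropertiesPatchBuilder> builders = new LinkedHashMap<>();

    void registerPatchBuilder(String aspectName, PropertiesPatchBuilder builder) {
        builders.put(aspectName, builder);
    }

    PropertiesPatchBuilder getPatchBuilder(String aspectName) {
        return builders.get(aspectName);
    }

    /** Fluent mutation method: reuses the existing builder for the aspect if there is one. */
    AccumulatedPatchSketch addCustomProperty(String key, String value) {
        PropertiesPatchBuilder builder = getPatchBuilder("datasetProperties");
        if (builder == null) {
            builder = new PropertiesPatchBuilder();
            registerPatchBuilder("datasetProperties", builder);
        }
        builder.addCustomProperty(key, value);
        return this;
    }

    /** Builds one patch per registered aspect, however many operations it holds. */
    List<String> buildAccumulatedPatches() {
        List<String> patches = new ArrayList<>();
        for (Map.Entry<String, PropertiesPatchBuilder> e : builders.entrySet()) {
            patches.add(e.getKey() + ": " + e.getValue().build());
        }
        return patches;
    }
}
```

In this model, two `addCustomProperty` calls yield a list with a single entry, mirroring the one-patch-per-aspect behaviour described above.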
Version-Aware Transformation
Not all DataHub server versions support all patch types. The VersionAwarePatchTransformer handles backward compatibility by:
- Checking server version via the cached ServerConfig
- Transforming incompatible patches to full aspect replacements using a read-modify-write approach
- Passing compatible patches through unchanged
| Aspect | Patch Supported Since | Transformation for Older Servers |
|---|---|---|
| globalTags | All versions | None needed (always supported) |
| ownership | All versions | None needed (always supported) |
| glossaryTerms | All versions | None needed (always supported) |
| editableDatasetProperties | DataHub Core > v1.3.0 | Read-modify-write via Dataset.transformPatchToFullAspect() |
| editableContainerProperties | DataHub Core > v1.3.0 | Read-modify-write via Container.transformPatchToFullAspect() |
| editableMLModelGroupProperties | DataHub Core > v1.3.0 | Read-modify-write via MLModelGroup.transformPatchToFullAspect() |
| mlModelProperties | Not yet supported | Always transformed with retry function |
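The decision in the table can be sketched as a small routing function. This is a hypothetical model, not the actual VersionAwarePatchTransformer: the class name, the `Decision` enum, and the version parsing are assumptions. It expects plain dotted numeric versions such as "1.3.0" (an optional leading "v" is stripped), and it passes aspects not listed in the table through unchanged, which is also an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical routing logic modelled on the table above; not the SDK's transformer.
class VersionAwareSketch {

    enum Decision { PASS_THROUGH_PATCH, FULL_REPLACEMENT }

    // Aspects whose patches need a server newer than v1.3.0, per the table.
    static final Set<String> NEEDS_NEWER_SERVER = new HashSet<>(Arrays.asList(
            "editableDatasetProperties",
            "editableContainerProperties",
            "editableMLModelGroupProperties"));

    /** Compares dotted numeric versions; missing components count as 0. */
    static int compareVersions(String a, String b) {
        String[] x = stripV(a).split("\\.");
        String[] y = stripV(b).split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    private static String stripV(String version) {
        return version.startsWith("v") ? version.substring(1) : version;
    }

    /** Decides whether a patch for the aspect can be sent as-is to the given server. */
    static Decision decide(String aspectName, String serverVersion) {
        if ("mlModelProperties".equals(aspectName)) {
            return Decision.FULL_REPLACEMENT; // patch not yet supported on any version
        }
        if (NEEDS_NEWER_SERVER.contains(aspectName)
                && compareVersions(serverVersion, "1.3.0") <= 0) {
            return Decision.FULL_REPLACEMENT; // older server: read-modify-write instead
        }
        return Decision.PASS_THROUGH_PATCH; // globalTags, ownership, glossaryTerms, others
    }
}
```

Because the table says "> v1.3.0", a server at exactly 1.3.0 is treated here as needing the full replacement.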
Mutable Entity Pattern
Entities fetched from the server are read-only by default. To apply patches, the caller must explicitly create a mutable copy via entity.mutable(). The mutable copy shares the aspect cache with the original (for efficient reads) but has independent mutation tracking (pending patches, dirty flag). This design:
- Prevents accidental mutations of server data
- Makes the read-modify-write intent explicit
- Allows the original entity to remain usable as an immutable reference
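The sharing and isolation described above can be modelled in a few lines. The sketch is hypothetical and does not reproduce the SDK's entity classes: `MutableEntitySketch`, its string-based aspects and patches, and the use of IllegalStateException for mutations on a read-only entity are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the mutable entity pattern; not the DataHub SDK entity classes.
class MutableEntitySketch {

    private final Map<String, String> aspectCache;                 // shared with mutable copies
    private final List<String> pendingPatches = new ArrayList<>(); // per instance
    private final boolean readOnly;
    private boolean dirty;

    private MutableEntitySketch(Map<String, String> aspectCache, boolean readOnly) {
        this.aspectCache = aspectCache;
        this.readOnly = readOnly;
    }

    /** Models an entity fetched from the server: read-only by default. */
    static MutableEntitySketch fetched(Map<String, String> aspects) {
        return new MutableEntitySketch(new HashMap<>(aspects), true);
    }

    /** Returns a writable copy that reuses the same aspect cache instance. */
    MutableEntitySketch mutable() {
        return new MutableEntitySketch(aspectCache, false);
    }

    MutableEntitySketch addTag(String tag) {
        if (readOnly) {
            // Exception type is an assumption of this sketch.
            throw new IllegalStateException("Entity is read-only; call mutable() first");
        }
        pendingPatches.add("add /tags/" + tag);
        dirty = true;
        return this;
    }

    String getAspect(String name) {
        return aspectCache.get(name);
    }

    boolean isDirty() {
        return dirty;
    }

    int pendingCount() {
        return pendingPatches.size();
    }
}
```

Calling `addTag` on the fetched instance fails, while the same call on `mutable()` records a pending patch and leaves the original clean, so the original can still serve as an immutable reference.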
Related Pages
- Implementation:Datahub_project_Datahub_Entity_Mutable_Patch
- Principle:Datahub_project_Datahub_Entity_Upsert -- Patches are emitted during the upsert operation
- Principle:Datahub_project_Datahub_Entity_Retrieval -- Fetching entities before patching
- Principle:Datahub_project_Datahub_Entity_Construction -- Entities built from scratch can also accumulate patches