

Principle:Datahub project Datahub Patch Based Updates

From Leeroopedia


| Property | Value |
|---|---|
| Principle Name | Patch_Based_Updates |
| Workflow | Java_SDK_V2_Entity_Management |
| Scope | Partial modification of metadata entities |
| Implementation | Implementation:Datahub_project_Datahub_Entity_Mutable_Patch |
| Repository | https://github.com/datahub-project/datahub |
| Last Updated | 2026-02-09 17:00 GMT |

Overview

Description

Patch-Based Updates is the principle of partially modifying metadata entities using JSON Patch operations rather than full aspect replacement. Instead of reading an entire aspect, modifying it locally, and writing the full aspect back, the SDK accumulates incremental patch operations (add tag, remove owner, set description) and emits them as compact Metadata Change Proposals (MCPs) with ChangeType.PATCH. This approach minimizes data transfer, reduces write conflicts, and enables concurrent modifications to different fields of the same aspect.
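To make the write-conflict claim concrete, here is a minimal Java sketch (not DataHub code). A hypothetical `TagServer` holds a single tags aspect, and two clients change it starting from the same snapshot. With full replacement, the second write silently drops the first client's tag; expressed as independent add operations, both changes survive.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical in-memory stand-in for a server that stores one tags aspect.
final class TagServer {
    private final Set<String> tags = new LinkedHashSet<>();

    // Read the full aspect as a copy, as a client would receive it.
    Set<String> read() {
        return new LinkedHashSet<>(tags);
    }

    // Full aspect replacement: whatever the client sends becomes the new state.
    void replace(Set<String> newTags) {
        tags.clear();
        tags.addAll(newTags);
    }

    // Patch-style "add": applied against the server's current state, not a client snapshot.
    void patchAdd(String tag) {
        tags.add(tag);
    }
}

final class LostUpdateDemo {
    // Two clients read the same snapshot, then each writes back a full aspect.
    static Set<String> withFullReplacement() {
        TagServer server = new TagServer();
        server.patchAdd("legacy");
        Set<String> clientA = server.read();
        Set<String> clientB = server.read();
        clientA.add("pii");
        clientB.add("finance");
        server.replace(clientA);
        server.replace(clientB); // overwrites client A's "pii" tag
        return server.read();
    }

    // The same two changes expressed as independent add operations.
    static Set<String> withPatches() {
        TagServer server = new TagServer();
        server.patchAdd("legacy");
        server.patchAdd("pii");
        server.patchAdd("finance");
        return server.read();
    }

    public static void main(String[] args) {
        System.out.println(withFullReplacement()); // [legacy, finance]
        System.out.println(withPatches());         // [legacy, pii, finance]
    }
}
```

The same reasoning is why two pipelines can safely touch one aspect without coordinating, as long as their operations target different entries.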

The SDK V2 provides a fluent API for patch operations through entity mutation methods (addTag(), addOwner(), setDescription(), etc.) that accumulate patch builders internally. When upsert() is called, all accumulated patches are built into MCPs, transformed for server version compatibility, and emitted with retry logic.
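The upsert flow can be sketched as follows, under stated assumptions: `SketchEntity`, `PatchOp`, `Mcp`, `Emitter`, the aspect chosen for each method, and the fixed attempt count are hypothetical stand-ins, not the SDK V2 API. The version-compatibility transformer is passed in as a plain function, and the retry loop has no backoff, which a real client would usually add.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-ins for illustration only; not the DataHub SDK V2 types.
record PatchOp(String op, String path, String value) {}
record Mcp(String entityUrn, String aspectName, String changeType, List<PatchOp> ops) {}

// Hypothetical emitter: sends one MCP and may throw on a transient failure.
interface Emitter {
    void emit(Mcp mcp);
}

final class SketchEntity {
    private final String urn;
    // Pending operations grouped by aspect name, so each aspect yields one MCP.
    private final Map<String, List<PatchOp>> pending = new LinkedHashMap<>();

    SketchEntity(String urn) {
        this.urn = urn;
    }

    SketchEntity addTag(String tagUrn) {
        return queue("globalTags", new PatchOp("add", "/tags/" + tagUrn, tagUrn));
    }

    SketchEntity setDescription(String text) {
        return queue("editableDatasetProperties", new PatchOp("add", "/description", text));
    }

    private SketchEntity queue(String aspectName, PatchOp op) {
        pending.computeIfAbsent(aspectName, k -> new ArrayList<>()).add(op);
        return this; // return this for fluent chaining
    }

    // Build one PATCH MCP per aspect, pass it through the transformer, then emit with retries.
    List<Mcp> upsert(UnaryOperator<Mcp> transformer, Emitter emitter, int maxAttempts) {
        List<Mcp> sent = new ArrayList<>();
        for (Map.Entry<String, List<PatchOp>> e : pending.entrySet()) {
            Mcp built = new Mcp(urn, e.getKey(), "PATCH", List.copyOf(e.getValue()));
            Mcp transformed = transformer.apply(built);
            emitWithRetry(transformed, emitter, maxAttempts);
            sent.add(transformed);
        }
        pending.clear(); // everything queued so far has been emitted
        return sent;
    }

    private static void emitWithRetry(Mcp mcp, Emitter emitter, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            try {
                emitter.emit(mcp);
                return;
            } catch (RuntimeException ex) {
                if (attempt >= maxAttempts) {
                    throw ex; // out of attempts: surface the last failure
                }
            }
        }
    }
}

final class UpsertDemo {
    public static void main(String[] args) {
        SketchEntity dataset = new SketchEntity("urn:li:dataset:(urn:li:dataPlatform:hive,db.orders,PROD)")
            .addTag("urn:li:tag:pii")
            .addTag("urn:li:tag:gold")
            .setDescription("Orders fact table");
        List<Mcp> sent = dataset.upsert(mcp -> mcp, mcp -> System.out.println("emit " + mcp), 3);
        System.out.println(sent.size() + " MCPs emitted"); // two aspects -> two MCPs
    }
}
```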

Usage

Patch-based updates are used in these scenarios:

  • Incremental metadata enrichment where individual tags, owners, or terms are added to existing entities without overwriting existing metadata
  • Concurrent metadata management where multiple clients or pipelines update the same entity at the same time (for example, one adding tags while another adds owners) without overwriting each other's changes
  • Read-modify-write workflows where an entity is fetched from the server, made mutable via .mutable(), modified with patch methods, and upserted back
  • Removing specific metadata (a single tag or owner) without affecting other entries in the same aspect

Theoretical Basis

Patch/Delta Pattern

The Patch/Delta pattern represents changes as a set of operations to apply to an existing state, rather than transmitting the new full state. This is analogous to:

  • JSON Patch (RFC 6902): Defines operations (add, remove, replace, move, copy, test) on JSON documents
  • Git diffs: Represent changes as additions and deletions relative to a base version
  • Database UPDATE statements: Modify specific columns without rewriting entire rows
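For readers new to JSON Patch, the following self-contained Java sketch applies three RFC 6902 operations (add, remove, replace) to a document modeled as nested `Map`s, addressing members with JSON Pointer paths (RFC 6901). It deliberately handles object members only; array indices, `move`, `copy`, and `test` are omitted, so it illustrates the semantics rather than being a complete RFC 6902 implementation. `MiniJsonPatch` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative subset of RFC 6902: add/remove/replace on object members only.
final class MiniJsonPatch {
    // Split a JSON Pointer (RFC 6901) such as "/customProperties/team" into unescaped tokens.
    static String[] tokens(String pointer) {
        if (!pointer.startsWith("/")) {
            throw new IllegalArgumentException("JSON Pointer must start with '/': " + pointer);
        }
        String[] raw = pointer.substring(1).split("/", -1);
        for (int i = 0; i < raw.length; i++) {
            // Decode "~1" to "/" first, then "~0" to "~", as RFC 6901 requires.
            raw[i] = raw[i].replace("~1", "/").replace("~0", "~");
        }
        return raw;
    }

    @SuppressWarnings("unchecked")
    static void apply(Map<String, Object> doc, String op, String path, Object value) {
        String[] t = tokens(path);
        Map<String, Object> parent = doc;
        // Walk down to the object that owns the target member.
        for (int i = 0; i < t.length - 1; i++) {
            Object next = parent.get(t[i]);
            if (!(next instanceof Map)) {
                throw new IllegalStateException("path does not exist: " + path);
            }
            parent = (Map<String, Object>) next;
        }
        String key = t[t.length - 1];
        switch (op) {
            case "add" -> parent.put(key, value); // adds the member or overwrites an existing one
            case "remove", "replace" -> {
                if (!parent.containsKey(key)) {
                    throw new IllegalStateException(op + " target does not exist: " + path);
                }
                if (op.equals("remove")) parent.remove(key); else parent.put(key, value);
            }
            default -> throw new UnsupportedOperationException("not covered by this sketch: " + op);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> props = new LinkedHashMap<>(Map.of("team", "core"));
        Map<String, Object> doc = new LinkedHashMap<>(Map.of("customProperties", props));
        apply(doc, "add", "/customProperties/owner", "alice");
        apply(doc, "replace", "/customProperties/team", "platform");
        apply(doc, "remove", "/customProperties/owner", null);
        System.out.println(doc); // {customProperties={team=platform}}
    }
}
```

Note that `replace` and `remove` fail when the target is absent, while `add` on an object member behaves like an upsert; that asymmetry comes from RFC 6902 itself.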

In the DataHub SDK V2, patches are represented as MetadataChangeProposal objects with ChangeType.PATCH, containing JSON Patch operations that the server applies to the current aspect state.
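A PATCH proposal can be pictured as an ordinary MCP whose aspect payload is a JSON Patch array instead of a full aspect. The sketch below renders such a payload by hand to show its shape. The `PatchProposal` and `JsonPatchOp` records, their field names, and the `/tags/<urn>` path layout are assumptions for illustration, not the exact wire format of DataHub's `MetadataChangeProposal`.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical operation and proposal types; field names are illustrative only.
record JsonPatchOp(String op, String path, String value) {}

record PatchProposal(String entityUrn, String aspectName, String changeType, List<JsonPatchOp> operations) {

    // Minimal JSON string quoting (backslash and double quote only; enough for this sketch).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Render the operations as a JSON Patch array; "value" is omitted when null (e.g. remove).
    String patchJson() {
        return operations.stream()
            .map(o -> "{\"op\":" + quote(o.op()) + ",\"path\":" + quote(o.path())
                + (o.value() == null ? "" : ",\"value\":" + quote(o.value())) + "}")
            .collect(Collectors.joining(",", "[", "]"));
    }

    public static void main(String[] args) {
        PatchProposal p = new PatchProposal(
            "urn:li:dataset:(urn:li:dataPlatform:hive,db.orders,PROD)", "globalTags", "PATCH",
            List.of(new JsonPatchOp("add", "/tags/urn:li:tag:pii", "urn:li:tag:pii"),
                    new JsonPatchOp("remove", "/tags/urn:li:tag:legacy", null)));
        System.out.println(p.changeType() + " " + p.aspectName() + " " + p.patchJson());
    }
}
```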

Accumulated Patch Builders

The SDK uses accumulated patch builders to combine multiple operations on the same aspect into a single patch MCP. For example, calling dataset.addCustomProperty("key1", "val1") followed by dataset.addCustomProperty("key2", "val2") accumulates both operations in a single DatasetPropertiesPatchBuilder, which produces one patch MCP containing both additions. This reduces the number of API calls and server-side operations.
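A toy version of that accumulation follows, assuming simplified types: `PropertiesPatchSketch`, its nested records, the `datasetProperties` aspect name, and the `/customProperties/<key>` path layout are hypothetical and only mirror the behavior described for `DatasetPropertiesPatchBuilder`, not its source. Two `addCustomProperty` calls produce one proposal carrying two operations.

```java
import java.util.ArrayList;
import java.util.List;

// Toy builder that mimics the accumulation behavior described above (names are hypothetical).
final class PropertiesPatchSketch {
    record Op(String op, String path, String value) {}
    record Proposal(String aspectName, String changeType, List<Op> ops) {}

    private final List<Op> ops = new ArrayList<>();

    // RFC 6901 escaping so a key containing '/' or '~' stays a single path segment.
    private static String escape(String key) {
        return key.replace("~", "~0").replace("/", "~1");
    }

    PropertiesPatchSketch addCustomProperty(String key, String value) {
        ops.add(new Op("add", "/customProperties/" + escape(key), value));
        return this;
    }

    PropertiesPatchSketch removeCustomProperty(String key) {
        ops.add(new Op("remove", "/customProperties/" + escape(key), null));
        return this;
    }

    // All accumulated calls end up in a single PATCH proposal for the aspect.
    Proposal build() {
        return new Proposal("datasetProperties", "PATCH", List.copyOf(ops));
    }

    public static void main(String[] args) {
        Proposal p = new PropertiesPatchSketch()
            .addCustomProperty("key1", "val1")
            .addCustomProperty("key2", "val2")
            .build();
        System.out.println(p.ops().size() + " operations in one proposal"); // 2 operations in one proposal
    }
}
```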

Patch builders are registered per aspect name via registerPatchBuilder(aspectName, builder) and retrieved via getPatchBuilder(aspectName, builderClass). When upsert() is called, buildAccumulatedPatches() builds all registered builders into MCPs.
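The per-aspect registry can be modeled as a map from aspect name to builder plus a type-checked lookup. The method names follow the ones mentioned above, but their signatures, the `AspectPatchBuilder` interface, the `TagsBuilder` class, and the use of `Class.cast` are assumptions made for this sketch. A mutation method such as `addTag()` would typically fetch the existing builder or register a new one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical common interface: every builder can turn its accumulated ops into one proposal.
interface AspectPatchBuilder {
    String build(); // simplified: returns a textual stand-in for a PATCH MCP
}

final class PatchRegistry {
    private final Map<String, AspectPatchBuilder> builders = new LinkedHashMap<>();

    void registerPatchBuilder(String aspectName, AspectPatchBuilder builder) {
        builders.put(aspectName, builder);
    }

    // Returns the registered builder typed as the caller expects, or null if none exists.
    <T extends AspectPatchBuilder> T getPatchBuilder(String aspectName, Class<T> builderClass) {
        AspectPatchBuilder b = builders.get(aspectName);
        return b == null ? null : builderClass.cast(b); // ClassCastException on a mismatched type
    }

    // Build every registered builder once, in registration order.
    List<String> buildAccumulatedPatches() {
        List<String> out = new ArrayList<>();
        for (AspectPatchBuilder b : builders.values()) {
            out.add(b.build());
        }
        return out;
    }
}

// Example builder: accumulates tag URNs for the globalTags aspect.
final class TagsBuilder implements AspectPatchBuilder {
    private final List<String> added = new ArrayList<>();

    TagsBuilder add(String tagUrn) {
        added.add(tagUrn);
        return this;
    }

    public String build() {
        return "PATCH globalTags add " + added;
    }
}

final class RegistryDemo {
    // "Get or create" usage, as a mutation method like addTag() might do it.
    static void addTag(PatchRegistry registry, String tagUrn) {
        TagsBuilder b = registry.getPatchBuilder("globalTags", TagsBuilder.class);
        if (b == null) {
            b = new TagsBuilder();
            registry.registerPatchBuilder("globalTags", b);
        }
        b.add(tagUrn);
    }

    public static void main(String[] args) {
        PatchRegistry registry = new PatchRegistry();
        addTag(registry, "urn:li:tag:pii");
        addTag(registry, "urn:li:tag:gold"); // reuses the builder registered by the first call
        System.out.println(registry.buildAccumulatedPatches()); // [PATCH globalTags add [urn:li:tag:pii, urn:li:tag:gold]]
    }
}
```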

Version-Aware Transformation

Not all DataHub server versions support all patch types. The VersionAwarePatchTransformer handles backward compatibility by:

  1. Checking server version via the cached ServerConfig
  2. Transforming incompatible patches to full aspect replacements using a read-modify-write approach
  3. Passing compatible patches through unchanged

| Aspect | Patch Supported Since | Transformation for Older Servers |
|---|---|---|
| globalTags | All versions | None needed (always supported) |
| ownership | All versions | None needed (always supported) |
| glossaryTerms | All versions | None needed (always supported) |
| editableDatasetProperties | DataHub Core > v1.3.0 | Read-modify-write via Dataset.transformPatchToFullAspect() |
| editableContainerProperties | DataHub Core > v1.3.0 | Read-modify-write via Container.transformPatchToFullAspect() |
| editableMLModelGroupProperties | DataHub Core > v1.3.0 | Read-modify-write via MLModelGroup.transformPatchToFullAspect() |
| mlModelProperties | Not yet supported | Always transformed with retry function |
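The routing decision from the steps and table above can be sketched as a small version check. This is a hypothetical model: `PatchRouter`, its three-part version parsing, and the treatment of aspects absent from the table as pass-through are assumptions, not the actual `VersionAwarePatchTransformer` code. The table's "> v1.3.0" is read here as strictly newer than 1.3.0.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical router modeled on the compatibility table; not the real VersionAwarePatchTransformer.
final class PatchRouter {
    enum Route { PASS_THROUGH, READ_MODIFY_WRITE }

    // Patch support for these aspects requires a server strictly newer than v1.3.0 (per the table).
    private static final Map<String, int[]> NEEDS_NEWER_THAN = Map.of(
        "editableDatasetProperties", new int[] {1, 3, 0},
        "editableContainerProperties", new int[] {1, 3, 0},
        "editableMLModelGroupProperties", new int[] {1, 3, 0});

    // Patches for these aspects are never sent as-is (per the table).
    private static final Set<String> ALWAYS_TRANSFORMED = Set.of("mlModelProperties");

    // Parse "v1.3.0" or "1.3.1" into {major, minor, patch}; missing or non-numeric parts count as 0.
    static int[] parse(String version) {
        String[] parts = version.replaceFirst("^v", "").split("\\.");
        int[] out = new int[3];
        for (int i = 0; i < 3 && i < parts.length; i++) {
            String digits = parts[i].replaceAll("\\D.*", ""); // keep leading digits, e.g. "0rc1" -> "0"
            out[i] = digits.isEmpty() ? 0 : Integer.parseInt(digits);
        }
        return out;
    }

    static int compare(int[] a, int[] b) {
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) return Integer.compare(a[i], b[i]);
        }
        return 0;
    }

    // globalTags, ownership, glossaryTerms (and, by assumption, any unlisted aspect) pass through.
    static Route route(String aspectName, String serverVersion) {
        if (ALWAYS_TRANSFORMED.contains(aspectName)) return Route.READ_MODIFY_WRITE;
        int[] minExclusive = NEEDS_NEWER_THAN.get(aspectName);
        if (minExclusive != null && compare(parse(serverVersion), minExclusive) <= 0) {
            return Route.READ_MODIFY_WRITE; // older server: fetch, apply the patch locally, write the full aspect
        }
        return Route.PASS_THROUGH;
    }

    public static void main(String[] args) {
        System.out.println(route("globalTags", "v1.0.0"));                // PASS_THROUGH
        System.out.println(route("editableDatasetProperties", "v1.3.0")); // READ_MODIFY_WRITE
        System.out.println(route("editableDatasetProperties", "v1.3.1")); // PASS_THROUGH
    }
}
```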

Mutable Entity Pattern

Entities fetched from the server are read-only by default. To apply patches, the caller must explicitly create a mutable copy via entity.mutable(). The mutable copy shares the aspect cache with the original (for efficient reads) but has independent mutation tracking (pending patches, dirty flag). This design:

  • Prevents accidental mutations of server data
  • Makes the read-modify-write intent explicit
  • Allows the original entity to remain usable as an immutable reference
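The read-only/mutable split can be illustrated with a toy entity. The original rejects mutations; `mutable()` returns a copy that shares the same aspect cache object but starts with its own empty list of pending patches and a clean dirty flag. `SketchDataset` and its members are hypothetical, and whether the real SDK throws on mutation of a read-only entity, and with which exception type, is an assumption here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical entity illustrating the read-only / mutable-copy pattern.
final class SketchDataset {
    private final Map<String, Object> aspectCache;                 // shared by original and mutable copy
    private final List<String> pendingPatches = new ArrayList<>(); // per-instance mutation tracking
    private final boolean readOnly;
    private boolean dirty;

    private SketchDataset(Map<String, Object> aspectCache, boolean readOnly) {
        this.aspectCache = aspectCache;
        this.readOnly = readOnly;
    }

    // Stand-in for an entity returned by a server fetch: read-only by default.
    static SketchDataset fetched(Map<String, Object> aspects) {
        return new SketchDataset(new HashMap<>(aspects), true);
    }

    // Mutable copy: same cache reference (cheap reads), fresh pending list and dirty flag.
    SketchDataset mutable() {
        return new SketchDataset(aspectCache, false);
    }

    Object aspect(String name) {
        return aspectCache.get(name);
    }

    SketchDataset addTag(String tagUrn) {
        if (readOnly) {
            throw new IllegalStateException("read-only entity; call mutable() first");
        }
        pendingPatches.add("add /tags/" + tagUrn);
        dirty = true;
        return this;
    }

    boolean isDirty() {
        return dirty;
    }

    List<String> pendingPatches() {
        return List.copyOf(pendingPatches);
    }

    public static void main(String[] args) {
        SketchDataset original = fetched(Map.of("datasetProperties", "Orders fact table"));
        SketchDataset copy = original.mutable().addTag("urn:li:tag:pii");
        System.out.println(copy.aspect("datasetProperties")); // Orders fact table (read via shared cache)
        System.out.println(original.isDirty() + " " + copy.isDirty()); // false true
    }
}
```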
