Principle:Datahub project Datahub Entity Read Modify
| Field | Value |
|---|---|
| Principle Name | Entity Read and Modify |
| Category | Metadata Management |
| Status | Active |
| Last Updated | 2026-02-10 |
| Repository | Datahub_project_Datahub |
Overview
The pattern of fetching an existing entity from DataHub, creating a mutable copy, applying modifications, and persisting the changes. Read-modify-write enables updating existing entities by separating the concerns of reading (immutable), copying (mutable), modifying (tracked), and persisting (emission).
Description
The read-modify-write pattern in the DataHub Java SDK V2 follows a three-phase approach:
Phase 1: Read (Immutable)
The get() method on EntityClient fetches an entity from the DataHub server with its default aspects. The returned entity is read-only by design -- all mutation methods will throw ReadOnlyEntityException if called on a fetched entity.
This immutability prevents accidental mutations of server data and makes the developer's intent explicit. Read operations on the fetched entity (e.g., getDescription(), getTags()) work normally, reading from the pre-loaded aspect cache.
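A minimal sketch of the read-only guard, assuming a simplified model. ReadOnlyEntityException is named after the exception described above. SketchEntity, SketchClient and the map-based aspect cache are hypothetical stand-ins, not the SDK's actual classes or signatures. Reads are served from the cache, and mutations check the read-only flag first.

```java
import java.util.Map;

// Hypothetical exception type, named after the one described above.
class ReadOnlyEntityException extends RuntimeException {
    ReadOnlyEntityException(String message) {
        super(message);
    }
}

// Hypothetical fetched entity: reads come from a pre-loaded aspect cache,
// while every mutation method checks the read-only flag first.
class SketchEntity {
    private final Map<String, Object> aspectCache;
    private final boolean readOnly;
    private String pendingDescription; // local change, never written into the cache

    SketchEntity(Map<String, Object> aspectCache, boolean readOnly) {
        this.aspectCache = aspectCache;
        this.readOnly = readOnly;
    }

    // Read path: served from the cache regardless of the read-only flag.
    String getDescription() {
        return (String) aspectCache.get("description");
    }

    // Write path: rejected on fetched (read-only) entities.
    void setDescription(String description) {
        if (readOnly) {
            throw new ReadOnlyEntityException("Entity is read-only; call mutable() first");
        }
        this.pendingDescription = description;
    }
}

// Hypothetical client: get() always hands back a read-only entity.
// The map argument stands in for the aspects the server would return for a URN.
class SketchClient {
    static SketchEntity get(Map<String, Object> serverAspects) {
        return new SketchEntity(serverAspects, true);
    }
}
```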
Phase 2: Copy (Mutable)
Calling .mutable() on a read-only entity creates a writable copy. The mutable copy:
- Shares the aspect cache with the original (both see the same pre-loaded data)
- Has independent mutation tracking (pending patches, pending MCPs, patch builders)
- Is marked as mutable (readOnly = false, dirty = false)
- Is idempotent: calling .mutable() on an already mutable entity returns the same instance
Entity subclasses (e.g., Dataset) override mutable() via a copy constructor that preserves the correct type while sharing the cache.
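The copy semantics above can be modelled in a few lines. This is a hypothetical sketch: BaseEntity, SketchDataset and the string-based pending patches are illustrative names, not SDK types. The copy constructor reuses the cache reference and allocates fresh mutation tracking. The subclass overrides mutable() with a covariant return type, so the concrete type survives the copy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical base entity modelling the copy semantics described above.
class BaseEntity {
    final Map<String, Object> aspectCache; // shared between original and copy
    final List<String> pendingPatches;     // owned by each instance
    final boolean readOnly;
    boolean dirty;

    BaseEntity(Map<String, Object> aspectCache, boolean readOnly) {
        this.aspectCache = aspectCache;
        this.pendingPatches = new ArrayList<>();
        this.readOnly = readOnly;
        this.dirty = false;
    }

    // Copy constructor: reuse the cache reference, start fresh mutation tracking.
    BaseEntity(BaseEntity other) {
        this.aspectCache = other.aspectCache;
        this.pendingPatches = new ArrayList<>();
        this.readOnly = false;
        this.dirty = false;
    }

    // Idempotent: an entity that is already mutable returns itself.
    BaseEntity mutable() {
        return readOnly ? new BaseEntity(this) : this;
    }
}

// Hypothetical subclass: the override keeps the concrete type via a covariant return.
class SketchDataset extends BaseEntity {
    SketchDataset(Map<String, Object> aspectCache, boolean readOnly) {
        super(aspectCache, readOnly);
    }

    SketchDataset(SketchDataset other) {
        super(other);
    }

    @Override
    SketchDataset mutable() {
        return readOnly ? new SketchDataset(this) : this;
    }
}
```

In this sketch, sharing the map by reference is what keeps mutable() cheap for large aspect payloads. Only the small per-instance tracking state is newly allocated.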
Phase 3: Modify and Persist
The mutable copy can be modified using the standard mutation methods (addTag(), setDescription(), etc.) and then persisted via upsert(). Only the modifications are emitted -- the shared aspect cache is not re-emitted.
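A minimal sketch of "emit only the modifications", assuming a simplified model. TrackedEntity and SketchEmitter are hypothetical, and pending changes are plain strings instead of real patches or MCPs. Mutation methods append to a pending list and never touch the cache. The stand-in for upsert() drains only that list. Clearing the dirty flag after emission is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical mutable entity: changes are recorded as pending entries,
// separate from the aspect cache that was loaded from the server.
class TrackedEntity {
    final Map<String, Object> aspectCache;
    private final List<String> pending = new ArrayList<>();
    boolean dirty = false;

    TrackedEntity(Map<String, Object> aspectCache) {
        this.aspectCache = aspectCache;
    }

    void addTag(String tag) {
        pending.add("addTag:" + tag);
        dirty = true;
    }

    void setDescription(String description) {
        pending.add("setDescription:" + description);
        dirty = true;
    }

    // Hands off the pending changes and resets tracking (assumed behavior).
    List<String> drainPending() {
        List<String> out = new ArrayList<>(pending);
        pending.clear();
        dirty = false;
        return out;
    }
}

// Hypothetical emitter standing in for upsert(): it sends only the pending
// changes and never re-sends the cached aspects.
class SketchEmitter {
    final List<String> emitted = new ArrayList<>();

    void upsert(TrackedEntity entity) {
        emitted.addAll(entity.drainPending());
    }
}
```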
Lazy Loading
Entities bound to a client support lazy loading via getAspectLazy(). If an aspect is not in the cache (either not fetched or expired based on TTL), it is transparently fetched from the server on first access. Lazy loading is disabled when the entity has pending mutations (dirty flag set) to prevent mixing stale server data with local changes.
Usage
Use this pattern when modifying metadata on entities that already exist in DataHub. Common scenarios include:
- Adding tags to an existing dataset
- Changing ownership of a dashboard
- Updating the description of a chart
- Adding glossary terms to a data flow
The pattern is also used in test suites to verify that metadata was correctly persisted, and in CI/CD pipelines that update entity metadata based on code changes.
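The first scenario, adding a tag to an existing dataset and then re-reading it to verify persistence as a test suite would, can be walked through end to end against an in-memory stand-in for the server. FakeServer, DatasetSketch, ClientSketch and ReadModifyWriteDemo are hypothetical names. IllegalStateException stands in for ReadOnlyEntityException, and the tag-list model is a simplification of real aspects. Only the dataset and tag URN formats follow DataHub conventions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the server: entity URN -> tag URNs.
class FakeServer {
    final Map<String, List<String>> tagsByUrn = new HashMap<>();
}

// Hypothetical dataset with the read-only / mutable split.
class DatasetSketch {
    final String urn;
    private final List<String> cachedTags;              // snapshot taken at read time
    final List<String> pendingTags = new ArrayList<>(); // local additions only
    private final boolean readOnly;

    DatasetSketch(String urn, List<String> cachedTags, boolean readOnly) {
        this.urn = urn;
        this.cachedTags = cachedTags;
        this.readOnly = readOnly;
    }

    List<String> getTags() {
        return cachedTags;
    }

    // The copy shares the cached snapshot but starts with no pending changes.
    DatasetSketch mutable() {
        return readOnly ? new DatasetSketch(urn, cachedTags, false) : this;
    }

    void addTag(String tagUrn) {
        if (readOnly) {
            // Stands in for ReadOnlyEntityException in this sketch.
            throw new IllegalStateException("read-only entity; call mutable() first");
        }
        pendingTags.add(tagUrn);
    }
}

// Hypothetical client: get() returns read-only snapshots, and upsert() applies
// only the pending additions, like a patch.
class ClientSketch {
    private final FakeServer server;

    ClientSketch(FakeServer server) {
        this.server = server;
    }

    DatasetSketch get(String urn) {
        List<String> tags = server.tagsByUrn.getOrDefault(urn, List.of());
        return new DatasetSketch(urn, List.copyOf(tags), true);
    }

    void upsert(DatasetSketch entity) {
        server.tagsByUrn
            .computeIfAbsent(entity.urn, k -> new ArrayList<>())
            .addAll(entity.pendingTags);
        entity.pendingTags.clear();
    }
}

class ReadModifyWriteDemo {
    public static void main(String[] args) {
        FakeServer server = new FakeServer();
        String urn = "urn:li:dataset:(urn:li:dataPlatform:hive,db.orders,PROD)";
        server.tagsByUrn.put(urn, new ArrayList<>(List.of("urn:li:tag:raw")));
        ClientSketch client = new ClientSketch(server);

        DatasetSketch fetched = client.get(urn);    // phase 1: read-only
        DatasetSketch editable = fetched.mutable(); // phase 2: mutable copy
        editable.addTag("urn:li:tag:pii");          // phase 3: modify...
        client.upsert(editable);                    // ...and persist

        // Re-read to verify that the change was persisted.
        System.out.println(client.get(urn).getTags());
    }
}
```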
Theoretical Basis
Read-Modify-Write pattern with copy-on-write semantics. The read-only initial state prevents accidental mutations, while the mutable copy enables tracked changes. This is a deliberate trade-off between safety and convenience:
- Safety: Fetched entities are immutable by default, preventing unintended side effects
- Explicitness: The developer must call .mutable() to opt in to modifications
- Efficiency: The shared cache avoids redundant data copying for large aspect payloads
The pattern also supports the Principle of Least Surprise -- a get() call returns an object that reflects the server state, and mutations require an explicit conversion step.
Related
- Implemented by: Datahub_project_Datahub_EntityClient_Get_Mutable
- Depends on: Datahub_project_Datahub_Entity_Upsert
- Depends on: Datahub_project_Datahub_Entity_Construction
- Related Principle: Datahub_project_Datahub_Entity_Metadata_Enrichment