Principle:Datahub project Datahub Entity Read Modify

From Leeroopedia
Revision as of 18:08, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Datahub_project_Datahub_Entity_Read_Modify.md)


Principle Name: Entity Read and Modify
Category: Metadata Management
Status: Active
Last Updated: 2026-02-10
Repository: Datahub_project_Datahub

Overview

Entity Read and Modify is the pattern of fetching an existing entity from DataHub, creating a mutable copy, applying modifications, and persisting the changes. This read-modify-write flow updates existing entities while keeping four concerns separate: reading (immutable), copying (mutable), modifying (tracked), and persisting (emission).

Description

The read-modify-write pattern in the DataHub Java SDK V2 follows a three-phase approach:

Phase 1: Read (Immutable)

The get() method on EntityClient fetches an entity from the DataHub server with its default aspects. The returned entity is read-only by design -- all mutation methods will throw ReadOnlyEntityException if called on a fetched entity.

This immutability prevents accidental mutations of server data and makes the developer's intent explicit. Read operations on the fetched entity (e.g., getDescription(), getTags()) work normally, reading from the pre-loaded aspect cache.
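The read-only guard described above can be sketched in plain Java. This is a simplified model, not the SDK's code: FetchedEntitySketch, its fields, and the locally defined ReadOnlyEntityException (modelled here as an unchecked exception) are hypothetical stand-ins, and the real classes may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the SDK exception of the same name (assumed unchecked here).
class ReadOnlyEntityException extends RuntimeException {
    ReadOnlyEntityException(String operation) {
        super("Cannot call " + operation + " on a read-only entity; call mutable() first");
    }
}

// Simplified model of a fetched entity: reads come from a pre-loaded cache,
// every mutation method checks the read-only flag first.
class FetchedEntitySketch {
    private final Map<String, String> aspectCache;
    private final List<String> pendingChanges = new ArrayList<>();
    private final boolean readOnly;

    FetchedEntitySketch(Map<String, String> aspectCache, boolean readOnly) {
        this.aspectCache = aspectCache;
        this.readOnly = readOnly;
    }

    // Reads work on read-only entities and never touch the server.
    String getDescription() {
        return aspectCache.get("description");
    }

    void setDescription(String description) {
        checkWritable("setDescription");
        pendingChanges.add("description=" + description);
    }

    void addTag(String tag) {
        checkWritable("addTag");
        pendingChanges.add("tag+=" + tag);
    }

    private void checkWritable(String operation) {
        if (readOnly) {
            throw new ReadOnlyEntityException(operation);
        }
    }
}

class ReadOnlyDemo {
    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        cache.put("description", "Daily orders table");
        FetchedEntitySketch fetched = new FetchedEntitySketch(cache, true);

        System.out.println(fetched.getDescription());
        try {
            fetched.addTag("pii");
        } catch (ReadOnlyEntityException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Routing every mutation method through a single guard keeps the failure mode uniform: a write on a fetched entity fails immediately instead of silently changing local state.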

Phase 2: Copy (Mutable)

Calling .mutable() on a read-only entity creates a writable copy. The mutable copy:

  • Shares the aspect cache with the original (both see the same pre-loaded data)
  • Has independent mutation tracking (pending patches, pending MCPs, patch builders)
  • Is marked as mutable (readOnly = false, dirty = false)
  • Is idempotent -- calling .mutable() on an already mutable entity returns the same instance

Entity subclasses (e.g., Dataset) override mutable() via a copy constructor that preserves the correct type while sharing the cache.
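The copy semantics listed above can be illustrated with a small model. EntitySketch and DatasetSketch are hypothetical classes written for this example; they only mirror the described behaviour (shared cache, fresh mutation tracking, idempotent mutable(), type-preserving copy constructor) and are not the SDK's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical base class modelling the mutable-copy behaviour.
class EntitySketch {
    protected final Map<String, Object> aspectCache;                  // shared by original and copy
    protected final List<String> pendingPatches = new ArrayList<>();  // independent per instance
    protected final boolean readOnly;
    protected boolean dirty = false;

    EntitySketch(Map<String, Object> aspectCache, boolean readOnly) {
        this.aspectCache = aspectCache;
        this.readOnly = readOnly;
    }

    // Copy constructor: reuses the cache reference, starts with empty mutation tracking.
    protected EntitySketch(EntitySketch other) {
        this.aspectCache = other.aspectCache;
        this.readOnly = false;
    }

    // Idempotent: an already mutable entity returns itself.
    EntitySketch mutable() {
        return readOnly ? new EntitySketch(this) : this;
    }
}

// Subclass overrides mutable() with a covariant return type so callers keep the concrete type.
class DatasetSketch extends EntitySketch {
    DatasetSketch(Map<String, Object> aspectCache, boolean readOnly) {
        super(aspectCache, readOnly);
    }

    private DatasetSketch(DatasetSketch other) {
        super(other);
    }

    @Override
    DatasetSketch mutable() {
        return readOnly ? new DatasetSketch(this) : this;
    }
}

class MutableCopyDemo {
    public static void main(String[] args) {
        Map<String, Object> cache = new HashMap<>();
        cache.put("datasetProperties", "pre-loaded payload");

        DatasetSketch fetched = new DatasetSketch(cache, true);
        DatasetSketch copy = fetched.mutable();

        System.out.println("distinct instance: " + (copy != fetched));
        System.out.println("shared cache: " + (copy.aspectCache == fetched.aspectCache));
        System.out.println("idempotent: " + (copy.mutable() == copy));
    }
}
```

Sharing the cache by reference is what makes the copy cheap; only the small tracking structures are allocated anew.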

Phase 3: Modify and Persist

The mutable copy can be modified using the standard mutation methods (addTag(), setDescription(), etc.) and then persisted via upsert(). Only the modifications are emitted -- the shared aspect cache is not re-emitted.
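To make the "only the modifications are emitted" point concrete, here is a minimal sketch under stated assumptions. MutableDatasetSketch, PendingChange, and RecordingClientSketch are hypothetical; the aspect labels are illustrative; and clearing the dirty flag after upsert() is an assumption of this model, not something the description above confirms.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One tracked modification (hypothetical shape).
class PendingChange {
    final String aspectName;
    final String operation;
    final String value;

    PendingChange(String aspectName, String operation, String value) {
        this.aspectName = aspectName;
        this.operation = operation;
        this.value = value;
    }

    @Override
    public String toString() {
        return operation + " " + aspectName + " -> " + value;
    }
}

// Mutable copy: holds the pre-loaded cache plus a separate list of tracked changes.
class MutableDatasetSketch {
    private final Map<String, String> aspectCache; // server state; never re-emitted
    private final List<PendingChange> pending = new ArrayList<>();
    private boolean dirty = false;

    MutableDatasetSketch(Map<String, String> aspectCache) {
        this.aspectCache = aspectCache;
    }

    void addTag(String tag) {
        record("globalTags", "add", tag); // illustrative aspect label
    }

    void setDescription(String description) {
        record("editableDatasetProperties", "set", description); // illustrative aspect label
    }

    boolean isDirty() {
        return dirty;
    }

    int cachedAspectCount() {
        return aspectCache.size();
    }

    private void record(String aspectName, String operation, String value) {
        pending.add(new PendingChange(aspectName, operation, value));
        dirty = true;
    }

    // Hands over the tracked changes and resets tracking (assumed behaviour).
    List<PendingChange> drainPending() {
        List<PendingChange> drained = new ArrayList<>(pending);
        pending.clear();
        dirty = false;
        return drained;
    }
}

// Stand-in for a client: records what would be sent to the server.
class RecordingClientSketch {
    final List<PendingChange> emitted = new ArrayList<>();

    void upsert(MutableDatasetSketch entity) {
        emitted.addAll(entity.drainPending());
    }
}

class PersistDemo {
    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        cache.put("schemaMetadata", "large pre-loaded payload");
        cache.put("ownership", "pre-loaded owners");

        MutableDatasetSketch dataset = new MutableDatasetSketch(cache);
        dataset.addTag("pii");
        dataset.setDescription("Daily orders table");

        RecordingClientSketch client = new RecordingClientSketch();
        client.upsert(dataset);
        client.emitted.forEach(System.out::println);
    }
}
```

In this model the two cached aspects never reach the client; only the two recorded changes do, which is the property the phase description relies on.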

Lazy Loading

Entities bound to a client support lazy loading via getAspectLazy(). If an aspect is not in the cache (either not fetched or expired based on TTL), it is transparently fetched from the server on first access. Lazy loading is disabled when the entity has pending mutations (dirty flag set) to prevent mixing stale server data with local changes.
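The lazy-loading rules above (fetch on miss or TTL expiry, no fetch while dirty) can be sketched as follows. LazyEntitySketch is hypothetical: the method name getAspectLazy comes from the description, but its signature here, the explicit clock argument, the Function-based fetcher, and the choice to return the cached value (or null) while dirty are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class LazyEntitySketch {
    // Cached value plus the time it was fetched, used for TTL checks.
    private static final class CachedAspect {
        final String value;
        final long fetchedAtMillis;

        CachedAspect(String value, long fetchedAtMillis) {
            this.value = value;
            this.fetchedAtMillis = fetchedAtMillis;
        }
    }

    private final Map<String, CachedAspect> cache = new HashMap<>();
    private final Function<String, String> fetcher; // stand-in for a server call
    private final long ttlMillis;
    private boolean dirty = false;
    private int fetchCount = 0;

    LazyEntitySketch(Function<String, String> fetcher, long ttlMillis) {
        this.fetcher = fetcher;
        this.ttlMillis = ttlMillis;
    }

    // The clock is passed in explicitly so the behaviour is deterministic in this sketch.
    String getAspectLazy(String aspectName, long nowMillis) {
        CachedAspect cached = cache.get(aspectName);
        boolean fresh = cached != null && nowMillis - cached.fetchedAtMillis < ttlMillis;
        if (fresh || dirty) {
            // While dirty, never refetch: serve what is cached, or null if nothing is.
            return cached == null ? null : cached.value;
        }
        String value = fetcher.apply(aspectName);
        fetchCount++;
        cache.put(aspectName, new CachedAspect(value, nowMillis));
        return value;
    }

    void markDirty() {
        dirty = true;
    }

    int fetchCount() {
        return fetchCount;
    }
}

class LazyLoadingDemo {
    public static void main(String[] args) {
        LazyEntitySketch entity = new LazyEntitySketch(name -> "server:" + name, 1_000L);

        entity.getAspectLazy("ownership", 0L);          // miss: fetched
        entity.getAspectLazy("ownership", 500L);        // within TTL: served from cache
        entity.getAspectLazy("ownership", 1_500L);      // expired: fetched again
        entity.markDirty();
        entity.getAspectLazy("schemaMetadata", 1_600L); // dirty: no fetch

        System.out.println("fetches: " + entity.fetchCount()); // prints "fetches: 2"
    }
}
```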

Usage

Use this pattern when modifying metadata on entities that already exist in DataHub. Common scenarios include:

  • Adding tags to an existing dataset
  • Changing ownership of a dashboard
  • Updating the description of a chart
  • Adding glossary terms to a data flow

The pattern is also used in test suites to verify that metadata was correctly persisted, and in CI/CD pipelines that update entity metadata based on code changes.

Theoretical Basis

The principle applies the read-modify-write pattern with copy-on-write semantics. The initial read-only state prevents accidental mutations, while the mutable copy enables tracked changes. This is a deliberate trade-off between safety and convenience:

  • Safety: Fetched entities are immutable by default, preventing unintended side effects
  • Explicitness: The developer must call .mutable() to opt in to modifications
  • Efficiency: The shared cache avoids redundant data copying for large aspect payloads

The pattern also supports the Principle of Least Surprise -- a get() call returns an object that reflects the server state, and mutations require an explicit conversion step.

Related

Implementation:Datahub_project_Datahub_EntityClient_Get_Mutable

Knowledge Sources

Domains

Data_Integration, Metadata_Management, Java_SDK
