Principle:Datahub project Datahub Entity Retrieval

From Leeroopedia


Property | Value
Principle Name | Entity_Retrieval
Workflow | Java_SDK_V2_Entity_Management
Scope | Fetching metadata entities by URN
Implementation | Implementation:Datahub_project_Datahub_EntityClient_Get
Repository | https://github.com/datahub-project/datahub
Last Updated | 2026-02-09 17:00 GMT

Overview

Description

Entity Retrieval is the principle of fetching metadata entities and their aspects from the DataHub platform by URN (Uniform Resource Name). The retrieval operation provides type-safe deserialization of server responses into strongly-typed entity objects, with support for lazy-loading of aspects that were not included in the initial fetch. Retrieved entities are returned in a read-only state to prevent accidental mutations of server data.

The retrieval API supports two access patterns: fetching an entity with its default set of aspects, or fetching with a specific list of aspects. Both patterns use a single batched API call to the GMS REST endpoint, minimizing network round-trips.

Usage

Entity retrieval is used whenever metadata needs to be read from DataHub. Common scenarios include:

  • Inspecting entity metadata to display ownership, tags, descriptions, or schema
  • Read-modify-write workflows where an entity is fetched, made mutable, modified, and then upserted back
  • Validation checks that verify entity metadata matches expected state
  • Metadata migration that reads entities from one DataHub instance for replication

Theoretical Basis

Entity Lookup Pattern

The retrieval API follows the Repository pattern, providing a collection-like interface for accessing domain objects. The get() method acts as a finder: it takes a URN (the entity's unique identifier) and returns an entity object hydrated with the requested aspects. This hides the HTTP transport, JSON deserialization, and Pegasus record template conversion that happen internally.
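
The finder shape described above can be illustrated with a minimal, self-contained sketch. It does not use the DataHub SDK: FakeMetadataStore, SketchEntity, SketchEntityRepository, and the contents of DEFAULT_ASPECTS are hypothetical names chosen for illustration, and aspects are modeled as plain strings rather than Pegasus records. The sketch only shows how a get(urn) overload can delegate to a get(urn, aspects) overload so that both access patterns cost one batched call.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the server: urn -> (aspect name -> payload).
class FakeMetadataStore {
    private final Map<String, Map<String, String>> rows = new HashMap<>();
    int calls = 0; // number of simulated round-trips

    void put(String urn, String aspect, String payload) {
        rows.computeIfAbsent(urn, k -> new HashMap<>()).put(aspect, payload);
    }

    // One batched call returns every requested aspect that exists for the URN.
    Map<String, String> fetch(String urn, List<String> aspects) {
        calls++;
        Map<String, String> stored = rows.getOrDefault(urn, Map.of());
        Map<String, String> out = new LinkedHashMap<>();
        for (String aspect : aspects) {
            if (stored.containsKey(aspect)) {
                out.put(aspect, stored.get(aspect));
            }
        }
        return out;
    }
}

// Hypothetical entity holding only the aspects that were fetched.
class SketchEntity {
    final String urn;
    final Map<String, String> aspects;

    SketchEntity(String urn, Map<String, String> aspects) {
        this.urn = urn;
        this.aspects = Map.copyOf(aspects);
    }
}

// Repository-style finder: callers pass a URN and never see the transport.
class SketchEntityRepository {
    // Assumed default set for illustration; the real defaults are entity-specific.
    static final List<String> DEFAULT_ASPECTS = List.of("ownership", "globalTags");

    private final FakeMetadataStore store;

    SketchEntityRepository(FakeMetadataStore store) {
        this.store = store;
    }

    // Access pattern 1: default aspects.
    SketchEntity get(String urn) {
        return get(urn, DEFAULT_ASPECTS);
    }

    // Access pattern 2: an explicit aspect list, still a single batched call.
    SketchEntity get(String urn, List<String> aspects) {
        return new SketchEntity(urn, store.fetch(urn, aspects));
    }
}
```

In this sketch, calling repo.get(urn) and repo.get(urn, List.of("schemaMetadata")) each increments the store's call counter exactly once, mirroring the one-round-trip property of both access patterns.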

Lazy-Loading of Aspects

Retrieved entities support lazy-loading of aspects not included in the initial fetch. When getAspectLazy() is called for an aspect that is not in the cache, the entity transparently fetches the aspect from the server via its bound EntityClient. This pattern provides:

  1. Efficient initial load: Only requested aspects are fetched in the initial call
  2. On-demand access: Additional aspects are loaded when first accessed
  3. Cache-backed reads: Once loaded, aspects are cached with TTL-based expiration (default 60 seconds)

The lazy-loading mechanism respects the entity's dirty state: if the entity has pending mutations, lazy-loading is suppressed to prevent mixing stale server data with local modifications.
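
A minimal sketch of the lazy-loading behavior described above follows. It is not the SDK's implementation: LazyAspectCache, markDirty(), the injected loader function, and the injected clock are hypothetical, and aspects are modeled as strings. Returning whatever is already cached while the entity is dirty is an assumption of this sketch; the text above only states that the server fetch is suppressed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical cache illustrating on-demand loading with TTL expiry.
class LazyAspectCache {
    private static final class Entry {
        final String value;
        final long loadedAtMillis;

        Entry(String value, long loadedAtMillis) {
            this.value = value;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private final Function<String, String> loader; // stands in for the bound client
    private final LongSupplier clock;              // injectable so expiry is testable
    private final long ttlMillis;
    private boolean dirty = false;

    LazyAspectCache(Function<String, String> loader, LongSupplier clock, long ttlMillis) {
        this.loader = loader;
        this.clock = clock;
        this.ttlMillis = ttlMillis;
    }

    // Records that the entity has pending local mutations.
    void markDirty() {
        dirty = true;
    }

    String getAspectLazy(String aspectName) {
        Entry entry = cache.get(aspectName);
        long now = clock.getAsLong();
        // Fresh cache hit: no server call.
        if (entry != null && now - entry.loadedAtMillis < ttlMillis) {
            return entry.value;
        }
        // Pending mutations: skip the server fetch (assumption: serve what is cached, if anything).
        if (dirty) {
            return entry == null ? null : entry.value;
        }
        // Cache miss or expired entry: load on demand and remember the load time.
        String fresh = loader.apply(aspectName);
        if (fresh != null) {
            cache.put(aspectName, new Entry(fresh, now));
        }
        return fresh;
    }
}
```

Constructing the cache with a TTL of 60_000 milliseconds matches the 60-second default mentioned above; injecting the clock keeps expiry deterministic in tests.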

Type-Safe Deserialization

Server responses are deserialized into Pegasus RecordTemplate objects using DataHub's code-generated type system. This ensures:

  • Compile-time type safety: Aspect fields are accessed through typed getter methods, not string-based lookups
  • Schema compliance: Deserialized objects conform to the PDL schema definitions in metadata-models/, from which the Avro schemas are generated
  • Null safety: The SDK uses @Nonnull and @Nullable annotations to indicate which fields may be absent
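
The difference between typed getters and string-based lookups can be shown with a small sketch. The classes here are hypothetical stand-ins: SketchOwnership and SketchDescription take the place of code-generated RecordTemplate aspect classes, and TypedAspectBag is an illustrative container keyed by Class. Using Optional to signal an absent aspect is a choice made for this sketch, whereas the SDK is described above as using @Nonnull and @Nullable annotations.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical aspect types; real aspects are generated Pegasus record classes.
final class SketchOwnership {
    private final String owner;

    SketchOwnership(String owner) {
        this.owner = owner;
    }

    String getOwner() {
        return owner;
    }
}

final class SketchDescription {
    private final String text;

    SketchDescription(String text) {
        this.text = text;
    }

    String getText() {
        return text;
    }
}

// Container keyed by aspect class, so lookups return the right static type.
class TypedAspectBag {
    private final Map<Class<?>, Object> aspects = new HashMap<>();

    <T> void put(Class<T> type, T aspect) {
        aspects.put(type, type.cast(aspect));
    }

    // Class.cast returns null for a null argument, so a missing aspect maps to Optional.empty().
    <T> Optional<T> get(Class<T> type) {
        return Optional.ofNullable(type.cast(aspects.get(type)));
    }
}
```

With this shape, bag.get(SketchOwnership.class) yields an Optional<SketchOwnership>, so a call such as getOwner() is checked by the compiler, and a misspelled field name fails at compile time instead of at runtime.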

Read-Only by Default

Entities fetched from the server are marked read-only (readOnly = true). Mutation operations (addTag(), setDescription(), etc.) will throw ReadOnlyEntityException on read-only entities. To modify a fetched entity, the caller must explicitly call entity.mutable() to obtain a writable copy. This design:

  • Prevents accidental mutations of server-fetched data
  • Makes intent explicit when modifying existing entities
  • Separates read and write concerns in the entity lifecycle

Aspect Fetch Strategy | Description | Use Case
Default aspects | Fetches the entity's predefined set of default aspects | General-purpose entity inspection
Specified aspects | Fetches only the explicitly listed aspect classes | Targeted reads where specific aspects are needed
Single aspect | Fetches a single aspect via getAspect(Urn, Class) | Lightweight read of one facet with system metadata
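
The read-only default and the explicit mutable() step from this section can be sketched as follows. This is an illustrative model, not the SDK's code: SketchDataset, its fetched() factory, and SketchReadOnlyException are hypothetical names, with the exception standing in for the ReadOnlyEntityException named above. The sketch assumes that mutable() returns an independent copy, consistent with the "writable copy" wording.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the exception thrown on mutation of a read-only entity.
class SketchReadOnlyException extends RuntimeException {
    SketchReadOnlyException(String message) {
        super(message);
    }
}

// Hypothetical entity that is read-only when it comes from a fetch.
class SketchDataset {
    private final String urn;
    private final List<String> tags;
    private final boolean readOnly;

    private SketchDataset(String urn, List<String> tags, boolean readOnly) {
        this.urn = urn;
        this.tags = new ArrayList<>(tags); // defensive copy keeps instances independent
        this.readOnly = readOnly;
    }

    // Models an entity returned by a server fetch: read-only by default.
    static SketchDataset fetched(String urn, List<String> tags) {
        return new SketchDataset(urn, tags, true);
    }

    // Explicit opt-in to mutation: returns a writable copy, leaving the original untouched.
    SketchDataset mutable() {
        return new SketchDataset(urn, tags, false);
    }

    SketchDataset addTag(String tag) {
        if (readOnly) {
            throw new SketchReadOnlyException(
                "entity " + urn + " is read-only; call mutable() first");
        }
        tags.add(tag);
        return this;
    }

    boolean isReadOnly() {
        return readOnly;
    }

    List<String> getTags() {
        return List.copyOf(tags);
    }
}
```

In a read-modify-write workflow under this model, the caller would fetch, call mutable(), apply changes such as addTag("gold"), and then upsert the writable copy; calling addTag directly on the fetched instance throws instead of silently changing server-derived state.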
