Principle:Datahub project Datahub Entity Retrieval
| Property | Value |
|---|---|
| Principle Name | Entity_Retrieval |
| Workflow | Java_SDK_V2_Entity_Management |
| Scope | Fetching metadata entities by URN |
| Implementation | Implementation:Datahub_project_Datahub_EntityClient_Get |
| Repository | https://github.com/datahub-project/datahub |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
Description
Entity Retrieval is the principle of fetching metadata entities and their aspects from the DataHub platform by URN (Uniform Resource Name). The retrieval operation provides type-safe deserialization of server responses into strongly-typed entity objects, with support for lazy-loading of aspects that were not included in the initial fetch. Retrieved entities are returned in a read-only state to prevent accidental mutations of server data.
The retrieval API supports two access patterns: fetching an entity with its default set of aspects, or fetching with a specific list of aspects. Both patterns use a single batched API call to the GMS REST endpoint, minimizing network round-trips.
Usage
Entity retrieval is used whenever metadata needs to be read from DataHub. Common scenarios include:
- Inspecting entity metadata to display ownership, tags, descriptions, or schema
- Read-modify-write workflows where an entity is fetched, made mutable, modified, and then upserted back
- Validation checks that verify entity metadata matches expected state
- Metadata migration that reads entities from one DataHub instance for replication
Theoretical Basis
Entity Lookup Pattern
The retrieval API follows the Repository pattern, providing a collection-like interface for accessing domain objects. The get() method acts as a finder that takes a URN (the unique identifier) and returns a typed entity object hydrated with the fetched aspects. This abstracts away the HTTP transport, JSON deserialization, and Pegasus record template conversion that happen internally.
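The finder shape described above can be sketched with plain Java. This is a minimal illustration, not the SDK's code: EntityLookupSketch, InMemoryEntityRepository, FetchedEntity, the DEFAULT_ASPECTS list, and the sample aspect values are all hypothetical, and an in-memory map stands in for the GMS REST endpoint.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EntityLookupSketch {

    /** Hypothetical hydrated entity: a URN plus the aspects returned by the fetch. */
    static final class FetchedEntity {
        final String urn;
        final Map<String, String> aspects;

        FetchedEntity(String urn, Map<String, String> aspects) {
            this.urn = urn;
            this.aspects = aspects;
        }
    }

    /** Hypothetical repository; an in-memory map stands in for the remote endpoint. */
    static final class InMemoryEntityRepository {
        // Assumed default aspect set, chosen for illustration only.
        static final List<String> DEFAULT_ASPECTS = List.of("ownership", "globalTags");

        private final Map<String, Map<String, String>> store;
        int calls = 0; // counts simulated round-trips

        InMemoryEntityRepository(Map<String, Map<String, String>> store) {
            this.store = store;
        }

        /** Finder that uses the default aspect set. */
        FetchedEntity get(String urn) {
            return get(urn, DEFAULT_ASPECTS);
        }

        /** Finder with an explicit aspect list; one simulated call covers every requested aspect. */
        FetchedEntity get(String urn, List<String> aspectNames) {
            calls++;
            Map<String, String> all = store.getOrDefault(urn, Map.of());
            Map<String, String> selected = new LinkedHashMap<>();
            for (String name : aspectNames) {
                if (all.containsKey(name)) {
                    selected.put(name, all.get(name));
                }
            }
            return new FetchedEntity(urn, selected);
        }
    }

    static final String URN = "urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD)";

    static InMemoryEntityRepository sampleRepository() {
        return new InMemoryEntityRepository(Map.of(URN, Map.of(
            "ownership", "owner=alice",
            "globalTags", "tags=pii",
            "schemaMetadata", "fields=3")));
    }

    public static void main(String[] args) {
        InMemoryEntityRepository repo = sampleRepository();
        System.out.println(repo.get(URN).aspects.keySet());
        System.out.println(repo.get(URN, List.of("schemaMetadata")).aspects.keySet());
    }
}
```

The caller only ever supplies a URN and, optionally, a list of aspect names; how the data is transported and decoded stays behind the repository interface.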
Lazy-Loading of Aspects
Retrieved entities support lazy-loading of aspects not included in the initial fetch. When getAspectLazy() is called for an aspect that is not in the cache, the entity transparently fetches the aspect from the server via its bound EntityClient. This pattern provides:
- Efficient initial load: Only requested aspects are fetched in the initial call
- On-demand access: Additional aspects are loaded when first accessed
- Cache-backed reads: Once loaded, aspects are cached with TTL-based expiration (default 60 seconds)
The lazy-loading mechanism respects the entity's dirty state: if the entity has pending mutations, lazy-loading is suppressed to prevent mixing stale server data with local modifications.
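A rough sketch of this behaviour follows, assuming a simple per-aspect timestamp cache. LazyAspectSketch, LazyEntity, markDirty(), fetchCount, and the injected fetcher and clock are hypothetical names used for illustration; only getAspectLazy() and the 60-second default come from the text. Returning whatever is already cached (possibly null) while dirty is likewise an assumption about what "suppressed" means.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

public class LazyAspectSketch {

    /** Hypothetical entity that loads aspects on demand and caches them with a TTL. */
    static final class LazyEntity {
        private static final class Cached {
            final String value;
            final long loadedAtMillis;

            Cached(String value, long loadedAtMillis) {
                this.value = value;
                this.loadedAtMillis = loadedAtMillis;
            }
        }

        private final Function<String, String> fetcher; // stands in for the bound client
        private final LongSupplier clock;               // injectable so expiry can be tested
        private final long ttlMillis;
        private final Map<String, Cached> cache = new HashMap<>();
        private boolean dirty = false;
        int fetchCount = 0;

        LazyEntity(Function<String, String> fetcher, LongSupplier clock, long ttlMillis) {
            this.fetcher = fetcher;
            this.clock = clock;
            this.ttlMillis = ttlMillis;
        }

        /** Simulates a pending local mutation. */
        void markDirty() {
            dirty = true;
        }

        /** Returns the cached aspect, fetching it first if it is absent or expired. */
        String getAspectLazy(String aspectName) {
            long now = clock.getAsLong();
            Cached hit = cache.get(aspectName);
            if (hit != null && now - hit.loadedAtMillis < ttlMillis) {
                return hit.value; // fresh cache hit, no fetch
            }
            if (dirty) {
                // Assumption: with pending mutations, skip the fetch and return what is cached.
                return hit == null ? null : hit.value;
            }
            fetchCount++;
            String value = fetcher.apply(aspectName);
            cache.put(aspectName, new Cached(value, now));
            return value;
        }
    }

    public static void main(String[] args) {
        LazyEntity entity =
            new LazyEntity(name -> "server:" + name, System::currentTimeMillis, 60_000L);
        System.out.println(entity.getAspectLazy("ownership")); // triggers a fetch
        System.out.println(entity.getAspectLazy("ownership")); // served from cache
        System.out.println("fetches: " + entity.fetchCount);
    }
}
```

Injecting the clock keeps the expiry logic deterministic under test; a production implementation would more likely read the system clock directly.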
Type-Safe Deserialization
Server responses are deserialized into Pegasus RecordTemplate objects using DataHub's code-generated type system. This ensures:
- Compile-time type safety: Aspect fields are accessed through typed getter methods, not string-based lookups
- Schema compliance: Deserialized objects conform to the Avro/PDL schema definitions in metadata-models/
- Null safety: The SDK uses @Nonnull and @Nullable annotations to indicate which fields may be absent
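To illustrate the difference between typed access and string-based lookups, here is a minimal sketch of a container keyed by aspect class. TypedAspectSketch, AspectBag, OwnershipAspect, and DescriptionAspect are hypothetical stand-ins; the actual SDK works with generated RecordTemplate classes, which this sketch does not reproduce.

```java
import java.util.HashMap;
import java.util.Map;

public class TypedAspectSketch {

    /** Hypothetical aspect type standing in for a generated record class. */
    static final class OwnershipAspect {
        final String owner;

        OwnershipAspect(String owner) {
            this.owner = owner;
        }
    }

    /** Second hypothetical aspect type, used to show an absent aspect. */
    static final class DescriptionAspect {
        final String text;

        DescriptionAspect(String text) {
            this.text = text;
        }
    }

    /** Heterogeneous container: the class token fixes the static type of each lookup. */
    static final class AspectBag {
        private final Map<Class<?>, Object> aspects = new HashMap<>();

        <T> void put(Class<T> type, T value) {
            aspects.put(type, value);
        }

        /** Returns the aspect of the given type, or null if it was not fetched. */
        <T> T get(Class<T> type) {
            return type.cast(aspects.get(type)); // Class.cast(null) returns null
        }
    }

    public static void main(String[] args) {
        AspectBag bag = new AspectBag();
        bag.put(OwnershipAspect.class, new OwnershipAspect("alice"));

        // The compiler knows the result type; no cast or string key at the call site.
        OwnershipAspect ownership = bag.get(OwnershipAspect.class);
        System.out.println(ownership.owner);

        // A nullable result signals an aspect that was not loaded.
        System.out.println(bag.get(DescriptionAspect.class) == null);
    }
}
```

Because the lookup key is a Class<T>, requesting the wrong type is a compile-time error rather than a runtime surprise, and the nullable return value makes the "may be absent" case explicit.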
Read-Only by Default
Entities fetched from the server are marked read-only (readOnly = true). Mutation operations (addTag(), setDescription(), etc.) will throw ReadOnlyEntityException on read-only entities. To modify a fetched entity, the caller must explicitly call entity.mutable() to obtain a writable copy. This design:
- Prevents accidental mutations of server-fetched data
- Makes intent explicit when modifying existing entities
- Separates read and write concerns in the entity lifecycle
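The read-only guard can be sketched as follows. This is an illustrative model, not the SDK source: ReadOnlySketch, TaggedEntity, and the fetched() factory are hypothetical, and the ReadOnlyEntityException declared here is a local stand-in that only borrows the name used in the text. The method names mutable() and addTag() follow the text; their real signatures are not confirmed here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ReadOnlySketch {

    /** Local stand-in for the exception named in the text. */
    static final class ReadOnlyEntityException extends RuntimeException {
        ReadOnlyEntityException(String message) {
            super(message);
        }
    }

    /** Hypothetical entity holding only a URN and a tag list. */
    static final class TaggedEntity {
        private final String urn;
        private final List<String> tags;
        private final boolean readOnly;

        private TaggedEntity(String urn, List<String> tags, boolean readOnly) {
            this.urn = urn;
            this.tags = new ArrayList<>(tags); // defensive copy
            this.readOnly = readOnly;
        }

        /** Simulates an entity as returned from a fetch: read-only. */
        static TaggedEntity fetched(String urn, List<String> tags) {
            return new TaggedEntity(urn, tags, true);
        }

        /** Returns a writable copy; the original stays read-only and unchanged. */
        TaggedEntity mutable() {
            return new TaggedEntity(urn, tags, false);
        }

        TaggedEntity addTag(String tag) {
            if (readOnly) {
                throw new ReadOnlyEntityException("Entity " + urn + " is read-only; call mutable() first");
            }
            tags.add(tag);
            return this;
        }

        List<String> tags() {
            return Collections.unmodifiableList(tags);
        }
    }

    public static void main(String[] args) {
        TaggedEntity fetched = TaggedEntity.fetched(
            "urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD)", List.of("pii"));
        try {
            fetched.addTag("gold");
        } catch (ReadOnlyEntityException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        TaggedEntity writable = fetched.mutable().addTag("gold");
        System.out.println(writable.tags());
        System.out.println(fetched.tags());
    }
}
```

Returning a copy from mutable() means the fetched object continues to reflect what the server returned, while all local changes accumulate on the writable instance.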
| Aspect Fetch Strategy | Description | Use Case |
|---|---|---|
| Default aspects | Fetches the entity's predefined set of default aspects | General-purpose entity inspection |
| Specified aspects | Fetches only the explicitly listed aspect classes | Targeted reads where specific aspects are needed |
| Single aspect | Fetches a single aspect via getAspect(Urn, Class) | Lightweight read of one facet with system metadata |
Related Pages
- Implementation:Datahub_project_Datahub_EntityClient_Get
- Principle:Datahub_project_Datahub_Patch_Based_Updates -- Modifying retrieved entities via patches
- Principle:Datahub_project_Datahub_Entity_Upsert -- Persisting modified entities back to the platform
- Principle:Datahub_project_Datahub_V2_Client_Setup -- Client initialization required for retrieval