

Principle:Datahub project Datahub Entity Construction

From Leeroopedia


Principle Name: Entity Construction
Category: Entity Management
Status: Active
Last Updated: 2026-02-10
Repository: Datahub_project_Datahub

Overview

Entity construction is the process of building type-safe metadata entity objects with fluent builders that perform schema-aware validation. It uses the Builder pattern to create strongly typed entity objects (Dataset, Dashboard, Chart, etc.) with pre-populated aspects. For a Dataset, the builder enforces the required fields (platform and name) and auto-generates a URN that follows DataHub's URN specification.

Description

Entity construction in the DataHub Java SDK V2 follows a layered design:

  1. Abstract base class (Entity): Provides the core infrastructure for all entities, including aspect caching (WriteTrackingAspectCache), patch accumulation, dirty tracking, read-only protection, and conversion to Metadata Change Proposals (MCPs).
  2. Concrete entity classes (Dataset, Chart, Dashboard, etc.): Extend Entity and implement trait interfaces (HasTags, HasOwners, HasGlossaryTerms, HasDomains). Each entity type defines its own Builder and default aspect list.
  3. Entity Builder: A static inner class that collects required and optional parameters, validates them, generates the correct URN, and returns a configured entity instance.
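The three layers above can be sketched in plain Java. This is a minimal, hypothetical illustration rather than the SDK's code: `BaseEntity`, `Taggable`, `ChartEntity`, the `tool`/`chartId` builder fields, and modeling the `globalTags` aspect as a plain list of tag URNs are all assumptions made for the sketch. The URN shape `urn:li:chart:(tool,chartId)` follows DataHub's chart URN format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Layer 1 (hypothetical): shared infrastructure inherited by every entity type.
abstract class BaseEntity {
    private final String urn;
    private final Map<String, Object> aspects = new HashMap<>();

    protected BaseEntity(String urn) { this.urn = urn; }

    String urn() { return urn; }

    // Public so trait interfaces (whose methods are implicitly public) can build on it.
    public Map<String, Object> aspects() { return aspects; }

    abstract String entityType();
}

// Layer 2 (hypothetical): a trait interface whose default method works on the aspect map.
interface Taggable {
    Map<String, Object> aspects();

    @SuppressWarnings("unchecked")
    default void addTag(String tagUrn) {
        // "globalTags" is modeled here as a plain list of tag URNs.
        List<String> tags =
            (List<String>) aspects().computeIfAbsent("globalTags", key -> new ArrayList<String>());
        if (!tags.contains(tagUrn)) {
            tags.add(tagUrn);
        }
    }
}

// Layer 3 (hypothetical): a concrete entity with its own static nested Builder.
final class ChartEntity extends BaseEntity implements Taggable {
    private ChartEntity(String urn) { super(urn); }

    @Override
    String entityType() { return "chart"; }

    static Builder builder() { return new Builder(); }

    static final class Builder {
        private String tool;
        private String chartId;

        Builder tool(String tool) { this.tool = tool; return this; }

        Builder chartId(String chartId) { this.chartId = chartId; return this; }

        ChartEntity build() {
            // Required parameters are validated before the URN is generated.
            if (tool == null || chartId == null) {
                throw new IllegalStateException("tool and chartId are required");
            }
            return new ChartEntity("urn:li:chart:(" + tool + "," + chartId + ")");
        }
    }
}
```

Putting tag handling in an interface default method lets unrelated entity types share it without a deeper class hierarchy, which is the role the trait interfaces (HasTags, HasOwners, etc.) play in the layered design.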

The construction process for a Dataset entity involves:

  • Mandatory fields: platform (e.g., "snowflake", "bigquery") and name (e.g., "my_database.my_schema.my_table")
  • Optional fields: env (defaults to "PROD"), platformInstance, description, displayName, schemaFields, customProperties
  • URN generation: Automatically constructs urn:li:dataset:(urn:li:dataPlatform:PLATFORM,NAME,ENV)
  • Aspect caching: Builder-provided properties (description, displayName, customProperties) are cached as dirty aspects for emission during upsert
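A minimal sketch of the Dataset construction rules listed above, assuming a hypothetical `DatasetDraft` class in place of the SDK's `Dataset`. It validates `platform` and `name` at build time, defaults `env` to `"PROD"`, builds the URN in the `urn:li:dataset:(urn:li:dataPlatform:PLATFORM,NAME,ENV)` shape, and caches only the optional properties that were supplied as dirty aspects. Grouping them under a single `datasetProperties` key with these field names is an assumption, and `platformInstance` and `schemaFields` are left out for brevity.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a Dataset builder; not the SDK's actual Dataset class.
final class DatasetDraft {
    private final String urn;
    // Aspect name -> aspect fields; everything set through the builder starts out dirty.
    private final Map<String, Map<String, Object>> dirtyAspects;

    private DatasetDraft(String urn, Map<String, Map<String, Object>> dirtyAspects) {
        this.urn = urn;
        this.dirtyAspects = dirtyAspects;
    }

    String urn() { return urn; }

    Map<String, Map<String, Object>> dirtyAspects() { return Collections.unmodifiableMap(dirtyAspects); }

    static Builder builder() { return new Builder(); }

    static final class Builder {
        private String platform;
        private String name;
        private String env = "PROD"; // default when env() is never called
        private String description;
        private String displayName;
        private final Map<String, String> customProperties = new LinkedHashMap<>();

        Builder platform(String platform) { this.platform = platform; return this; }
        Builder name(String name) { this.name = name; return this; }
        Builder env(String env) { this.env = env; return this; }
        Builder description(String description) { this.description = description; return this; }
        Builder displayName(String displayName) { this.displayName = displayName; return this; }
        Builder customProperty(String key, String value) { customProperties.put(key, value); return this; }

        DatasetDraft build() {
            // Mandatory fields are checked here, so an invalid entity is never created.
            requireNonBlank(platform, "platform");
            requireNonBlank(name, "name");
            requireNonBlank(env, "env");

            String urn = "urn:li:dataset:(urn:li:dataPlatform:" + platform + "," + name + "," + env + ")";

            // Only optional properties that were actually supplied are cached.
            Map<String, Object> props = new LinkedHashMap<>();
            if (description != null) props.put("description", description);
            if (displayName != null) props.put("displayName", displayName);
            if (!customProperties.isEmpty()) props.put("customProperties", new LinkedHashMap<>(customProperties));

            Map<String, Map<String, Object>> aspects = new LinkedHashMap<>();
            if (!props.isEmpty()) aspects.put("datasetProperties", props);
            return new DatasetDraft(urn, aspects);
        }

        private static void requireNonBlank(String value, String field) {
            if (value == null || value.isBlank()) {
                throw new IllegalStateException(field + " is required");
            }
        }
    }
}
```

For example, `DatasetDraft.builder().platform("snowflake").name("my_database.my_schema.my_table").build().urn()` yields `urn:li:dataset:(urn:li:dataPlatform:snowflake,my_database.my_schema.my_table,PROD)`.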

Entities carry cached aspects for batch emission, following the Unit of Work pattern: mutations are accumulated locally and flushed to the server in a single upsert operation.
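The Unit of Work behavior can be sketched as follows. `PendingEntity` and its string-encoded "proposals" are hypothetical stand-ins for the SDK's aspect cache and MCPs, and the `Consumer` sink stands in for whatever client call performs the upsert. Mutations only change local state; `flushTo` sends every pending change in one call and then clears the dirty set and the patch list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical Unit of Work sketch; not the SDK's WriteTrackingAspectCache.
final class PendingEntity {
    private final String urn;
    private final Map<String, String> aspects = new LinkedHashMap<>();
    private final Set<String> dirty = new LinkedHashSet<>();
    private final List<String> pendingPatches = new ArrayList<>();

    PendingEntity(String urn) { this.urn = urn; }

    // Mutations only touch local state; writing the same aspect twice keeps the last value.
    void setAspect(String name, String value) {
        aspects.put(name, value);
        dirty.add(name);
    }

    void addPatch(String patch) { pendingPatches.add(patch); }

    boolean hasPendingChanges() { return !dirty.isEmpty() || !pendingPatches.isEmpty(); }

    // Collects every pending change into one batch and hands it to the sink in a single call.
    // State is cleared only after the sink returns normally.
    void flushTo(Consumer<List<String>> sink) {
        List<String> batch = new ArrayList<>();
        for (String name : dirty) {
            batch.add("UPSERT " + urn + " " + name + "=" + aspects.get(name));
        }
        for (String patch : pendingPatches) {
            batch.add("PATCH " + urn + " " + patch);
        }
        if (batch.isEmpty()) {
            return; // nothing to send
        }
        sink.accept(batch);
        dirty.clear();
        pendingPatches.clear();
    }
}
```

Because local state is cleared only after the sink returns, a failed send in this sketch leaves the pending changes in place for a retry; whether the SDK behaves the same way on failure is not stated here.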

Usage

Use entity construction when programmatically creating new metadata entities to register in DataHub. The builder ensures correctness by:

  • Validating required fields at build time
  • Generating well-formed URNs automatically
  • Pre-populating aspects from builder parameters
  • Providing type safety through generics
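One common way generics provide the type safety mentioned in the last point is a cache keyed by `Class` tokens, so reading an aspect returns the right type without a cast at the call site. The sketch below is a hypothetical illustration of that technique (`TypedAspectCache`, `DescriptionAspect`, and `OwnershipAspect` are invented names); it is not a claim about how the SDK applies generics internally.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical aspect types used only for this sketch.
final class DescriptionAspect {
    private final String text;
    DescriptionAspect(String text) { this.text = text; }
    String text() { return text; }
}

final class OwnershipAspect {
    private final String ownerUrn;
    OwnershipAspect(String ownerUrn) { this.ownerUrn = ownerUrn; }
    String ownerUrn() { return ownerUrn; }
}

// Hypothetical typed cache: the Class token is the key, so callers never cast.
final class TypedAspectCache {
    private final Map<Class<?>, Object> byType = new HashMap<>();

    // The compiler rejects put(DescriptionAspect.class, new OwnershipAspect(...)).
    <T> void put(Class<T> type, T aspect) {
        byType.put(type, Objects.requireNonNull(aspect));
    }

    // Class.cast keeps the lookup checked at runtime as well.
    <T> Optional<T> get(Class<T> type) {
        return Optional.ofNullable(byType.get(type)).map(type::cast);
    }
}
```

A lookup such as `cache.get(DescriptionAspect.class).map(DescriptionAspect::text)` is checked by the compiler end to end, whereas a `Map<String, Object>` would force an unchecked cast at every read.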

Theoretical Basis

The design combines several patterns:

  • Type-safe Builder pattern: mandatory field enforcement ensures entities are always in a valid state
  • Unit of Work pattern: entities accumulate changes locally (cached aspects, pending patches, pending MCPs) and flush them all during upsert
  • Template Method pattern: the abstract Entity class defines the skeleton of entity behavior, while concrete subclasses fill in entity-specific details (entity type, default aspects, mutable copy creation)
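The Template Method idea can be illustrated with a hypothetical `TemplateEntity` base class: its `final` methods fix the shared steps, and the abstract hooks `entityType()`, `defaultAspects()`, and `mutableCopy()` supply the entity-specific details named above. The class names and the dashboard aspect list are illustrative and may not match the SDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical skeleton: the base class fixes the algorithm, subclasses supply the details.
abstract class TemplateEntity {
    protected final String urn;
    private final boolean readOnly;

    protected TemplateEntity(String urn, boolean readOnly) {
        this.urn = urn;
        this.readOnly = readOnly;
    }

    // Hooks each concrete entity type must implement.
    protected abstract String entityType();
    protected abstract List<String> defaultAspects();
    protected abstract TemplateEntity mutableCopy();

    boolean isReadOnly() { return readOnly; }

    // Template method: the read-only check is shared; how to copy is entity-specific.
    final TemplateEntity mutable() {
        return readOnly ? mutableCopy() : this;
    }

    // Template method: keys are built identically for every entity type.
    final List<String> defaultAspectKeys() {
        List<String> keys = new ArrayList<>();
        for (String aspect : defaultAspects()) {
            keys.add(entityType() + "/" + urn + "/" + aspect);
        }
        return keys;
    }
}

final class DashboardEntity extends TemplateEntity {
    DashboardEntity(String urn, boolean readOnly) { super(urn, readOnly); }

    @Override protected String entityType() { return "dashboard"; }

    // Illustrative list; the SDK's real default aspects for dashboards may differ.
    @Override protected List<String> defaultAspects() {
        return List.of("dashboardInfo", "ownership", "globalTags");
    }

    @Override protected TemplateEntity mutableCopy() {
        return new DashboardEntity(urn, false);
    }
}
```

Making `mutable()` and `defaultAspectKeys()` `final` keeps subclasses from altering the shared flow; they can only change the steps the hooks expose.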

Related

Implementation:Datahub_project_Datahub_Dataset_Builder

Knowledge Sources

Domains

Data_Integration, Metadata_Management, Java_SDK
