Principle:Datahub project Datahub Entity Construction
| Field | Value |
|---|---|
| Principle Name | Entity Construction |
| Category | Entity Management |
| Status | Active |
| Last Updated | 2026-02-10 |
| Repository | Datahub_project_Datahub |
Overview
Entity construction is the process of building type-safe metadata entity objects using fluent builders with schema-aware validation. It uses the Builder pattern to create strongly-typed entity objects (Dataset, Dashboard, Chart, etc.) with pre-populated aspects. The builder enforces required fields (platform, name) and auto-generates URNs following DataHub's URN specification.
Description
Entity construction in the DataHub Java SDK V2 follows a layered design:
- Abstract base class (Entity): Provides the core infrastructure for all entities, including aspect caching (WriteTrackingAspectCache), patch accumulation, dirty tracking, read-only protection, and conversion to Metadata Change Proposals (MCPs).
- Concrete entity classes (Dataset, Chart, Dashboard, etc.): Extend Entity and implement trait interfaces (HasTags, HasOwners, HasGlossaryTerms, HasDomains). Each entity type defines its own Builder and default aspect list.
- Entity Builder: A static inner class that collects required and optional parameters, validates them, generates the correct URN, and returns a configured entity instance.
The construction process for a Dataset entity involves:
- Mandatory fields: platform (e.g., "snowflake", "bigquery") and name (e.g., "my_database.my_schema.my_table")
- Optional fields: env (defaults to "PROD"), platformInstance, description, displayName, schemaFields, customProperties
- URN generation: Automatically constructs urn:li:dataset:(urn:li:dataPlatform:PLATFORM,NAME,ENV)
- Aspect caching: Builder-provided properties (description, displayName, customProperties) are cached as dirty aspects for emission during upsert
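The construction steps listed above can be illustrated with a small, self-contained sketch. It does not use the DataHub SDK: SketchDataset, its Builder, and the "datasetProperties" map key are hypothetical stand-ins chosen for illustration, and the real Dataset builder may differ in names and behavior. Only the URN layout and the "PROD" default are taken from the text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a dataset entity; not the SDK's Dataset class. */
final class SketchDataset {
    private final String urn;
    // Aspects supplied through the builder, held until an upsert would emit them.
    private final Map<String, Object> dirtyAspects;

    private SketchDataset(String urn, Map<String, Object> dirtyAspects) {
        this.urn = urn;
        this.dirtyAspects = dirtyAspects;
    }

    String urn() { return urn; }
    Map<String, Object> dirtyAspects() { return dirtyAspects; }

    static Builder builder() { return new Builder(); }

    static final class Builder {
        private String platform;      // mandatory
        private String name;          // mandatory
        private String env = "PROD";  // optional, default stated in the text
        private String description;   // optional

        Builder platform(String platform) { this.platform = platform; return this; }
        Builder name(String name) { this.name = name; return this; }
        Builder env(String env) { this.env = env; return this; }
        Builder description(String description) { this.description = description; return this; }

        SketchDataset build() {
            // Mandatory fields are checked at build time, before any URN exists.
            require(platform, "platform");
            require(name, "name");
            String urn = String.format(
                "urn:li:dataset:(urn:li:dataPlatform:%s,%s,%s)", platform, name, env);
            // Builder-provided properties become dirty aspects (illustrative key name).
            Map<String, Object> aspects = new LinkedHashMap<>();
            if (description != null) {
                aspects.put("datasetProperties", Map.of("description", description));
            }
            return new SketchDataset(urn, aspects);
        }

        private static void require(String value, String field) {
            if (value == null || value.trim().isEmpty()) {
                throw new IllegalArgumentException(field + " is required");
            }
        }
    }
}

class BuilderSketchDemo {
    public static void main(String[] args) {
        SketchDataset ds = SketchDataset.builder()
            .platform("snowflake")
            .name("my_database.my_schema.my_table")
            .description("Orders table")
            .build();
        System.out.println(ds.urn());
    }
}
```

Because validation happens inside build(), a call such as SketchDataset.builder().name("x").build() fails with an IllegalArgumentException instead of producing an entity with a malformed URN.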
Entities carry cached aspects for batch emission, following the Unit of Work pattern -- mutations are accumulated locally and flushed to the server in a single upsert operation.
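As a rough illustration of the Unit of Work idea, the sketch below tracks which cached aspects were written locally and hands them over in one batch. SketchAspectCache and its methods are hypothetical; the sketch is not the SDK's WriteTrackingAspectCache and makes no claim about how that class is implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical aspect cache with dirty tracking; not the SDK's implementation. */
final class SketchAspectCache {
    private final Map<String, String> aspects = new LinkedHashMap<>();
    private final Set<String> dirty = new LinkedHashSet<>();

    /** Caches a value as if it had been read from the server: not marked dirty. */
    void load(String aspectName, String value) {
        aspects.put(aspectName, value);
    }

    /** Records a local mutation; the aspect is marked dirty until the next flush. */
    void put(String aspectName, String value) {
        aspects.put(aspectName, value);
        dirty.add(aspectName);
    }

    /** Returns all pending writes as one batch and clears the dirty set. */
    List<String> flush() {
        List<String> batch = new ArrayList<>();
        for (String aspectName : dirty) {
            batch.add(aspectName + "=" + aspects.get(aspectName));
        }
        dirty.clear();
        return batch;
    }
}

class UnitOfWorkSketchDemo {
    public static void main(String[] args) {
        SketchAspectCache cache = new SketchAspectCache();
        cache.load("status", "active");                   // server state, stays clean
        cache.put("description", "Orders table");         // local mutation
        cache.put("tags", "pii");                         // local mutation
        cache.put("description", "Orders fact table");    // overwrites the pending value
        System.out.println(cache.flush());                // one batch with the latest values
    }
}
```

Only the two locally written aspects appear in the batch, and a repeated write to the same aspect replaces the pending value rather than queuing a second entry.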
Usage
Apply this principle when programmatically creating new metadata entities to register in DataHub. The builder ensures correctness by:
- Validating required fields at build time
- Generating well-formed URNs automatically
- Pre-populating aspects from builder parameters
- Providing type safety through generics
Theoretical Basis
The design combines several patterns:
- Type-safe Builder pattern with mandatory field enforcement ensures entities are always in a valid state
- Unit of Work pattern -- entities accumulate changes locally (cached aspects, pending patches, pending MCPs) and flush them all during upsert
- Template Method pattern -- the abstract Entity class defines the skeleton of entity behavior while concrete subclasses fill in entity-specific details (entity type, default aspects, mutable copy creation)
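The Template Method relationship between the abstract base and its subclasses can be sketched as follows. All class names, method names, and aspect names here are hypothetical and chosen for illustration; the actual Entity class exposes a different and much larger surface.

```java
import java.util.List;

/** Hypothetical base class: owns the fixed skeleton, delegates the details. */
abstract class SketchEntity {
    private final String urn;

    protected SketchEntity(String urn) {
        this.urn = urn;
    }

    // Hooks that each concrete entity type fills in.
    abstract String entityType();
    abstract List<String> defaultAspects();

    // Template method: the sequence is fixed here and cannot be overridden.
    final String describe() {
        return entityType() + " " + urn + " aspects=" + String.join(",", defaultAspects());
    }
}

final class SketchChart extends SketchEntity {
    SketchChart(String urn) { super(urn); }
    @Override String entityType() { return "chart"; }
    @Override List<String> defaultAspects() { return List.of("chartInfo", "ownership"); }
}

final class SketchDashboard extends SketchEntity {
    SketchDashboard(String urn) { super(urn); }
    @Override String entityType() { return "dashboard"; }
    @Override List<String> defaultAspects() { return List.of("dashboardInfo", "ownership"); }
}

class TemplateMethodSketchDemo {
    public static void main(String[] args) {
        System.out.println(new SketchChart("urn:li:chart:(looker,1)").describe());
        System.out.println(new SketchDashboard("urn:li:dashboard:(looker,2)").describe());
    }
}
```

Marking describe() as final keeps the shared behavior in one place, so a new entity type only supplies its type name and default aspect list.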
Related
- Implemented by: Datahub_project_Datahub_Dataset_Builder
- Depends on: Datahub_project_Datahub_Java_Client_Initialization
- Related Principle: Datahub_project_Datahub_Entity_Metadata_Enrichment
- Related Principle: Datahub_project_Datahub_Entity_Upsert