Principle:Datahub project Datahub Metadata Object Construction
Metadata
| Field | Value |
|---|---|
| principle_name | Metadata Object Construction |
| description | The process of building URN identifiers and aspect instances that represent metadata entities and their properties. |
| type | principle |
| status | active |
| last_updated | 2026-02-10 |
| version | 1.0 |
Overview
Metadata Object Construction is the process of building URN identifiers and aspect instances that represent metadata entities and their properties in DataHub. It relies on builder functions to create properly formatted URNs (Uniform Resource Names) and typed aspect classes that carry metadata facets such as ownership, schema, tags, and lineage.
Description
Metadata object construction uses builder functions provided by the datahub.emitter.mce_builder module. The construction process involves two fundamental building blocks:
URN Identifiers
URNs uniquely identify entities in DataHub. They follow the pattern urn:li:entityType:(params). The builder module provides convenience functions that:
- Ensure correct URN formatting and encoding of special characters
- Validate input parameters
- Handle platform instance namespacing
- Normalize entity names (e.g., optional lowercasing for datasets)
Key entity types with URN builders include:
- Datasets -- urn:li:dataset:(urn:li:dataPlatform:platform,name,env)
- Users -- urn:li:corpuser:username
- Groups -- urn:li:corpGroup:groupname
- Tags -- urn:li:tag:tagname
- Charts -- urn:li:chart:(platform,chartId)
- Dashboards -- urn:li:dashboard:(platform,dashboardId)
- Data Flows -- urn:li:dataFlow:(orchestrator,flowId,cluster)
- Data Jobs -- urn:li:dataJob:(dataFlowUrn,jobId)
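The patterns above can be sketched with plain string formatting. This is a minimal illustration only: the function names below are hypothetical stand-ins, not the datahub.emitter.mce_builder API, and they skip the encoding and validation that the real helpers (for example make_dataset_urn) perform.

```python
# Hypothetical stand-ins that mirror the URN patterns listed above.
# They are NOT the mce_builder functions and do no encoding or validation.

def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    # The platform is itself a URN nested inside the dataset URN.
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"

def user_urn(username: str) -> str:
    return f"urn:li:corpuser:{username}"

def tag_urn(tag: str) -> str:
    return f"urn:li:tag:{tag}"

def data_flow_urn(orchestrator: str, flow_id: str, cluster: str) -> str:
    return f"urn:li:dataFlow:({orchestrator},{flow_id},{cluster})"

def data_job_urn(flow_urn: str, job_id: str) -> str:
    # A data job URN embeds the full URN of its parent data flow.
    return f"urn:li:dataJob:({flow_urn},{job_id})"

orders = dataset_urn("mysql", "shop.orders", "PROD")
flow = data_flow_urn("airflow", "daily_etl", "prod")
job = data_job_urn(flow, "load_orders")
```

Here the data job URN is built from the data flow URN rather than from raw strings, which is the same composition the pattern list describes for Data Jobs.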
Aspect Classes
Aspects are typed metadata facets attached to entities. Builder functions construct aspect instances from simpler inputs:
- OwnershipClass -- Built from a list of owner URNs with ownership type and source classification
- GlobalTagsClass -- Built from a list of tag name strings
- GlossaryTermsClass -- Built from a list of glossary term URNs
- UpstreamLineageClass -- Built from upstream dataset URNs with lineage type classification
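To make the idea of building an aspect from simpler inputs concrete, the sketch below models two aspects as plain dataclasses. All class and function names here are hypothetical simplifications, not the generated DataHub classes; the real OwnershipClass and GlobalTagsClass carry more fields, and the "DATAOWNER" default is an assumed placeholder value.

```python
from dataclasses import dataclass
from typing import List

# Simplified, hypothetical models of two aspects (not the DataHub classes).

@dataclass(frozen=True)
class Owner:
    owner: str               # URN of the owning user or group
    type: str = "DATAOWNER"  # assumed placeholder for the ownership type

@dataclass(frozen=True)
class Ownership:
    owners: List[Owner]

@dataclass(frozen=True)
class TagAssociation:
    tag: str                 # tag URN

@dataclass(frozen=True)
class GlobalTags:
    tags: List[TagAssociation]

def ownership_from_urns(owner_urns: List[str],
                        ownership_type: str = "DATAOWNER") -> Ownership:
    # Wrap each owner URN in an Owner record with a shared ownership type.
    return Ownership([Owner(urn, ownership_type) for urn in owner_urns])

def global_tags_from_names(tag_names: List[str]) -> GlobalTags:
    # Tag names are turned into tag URNs before being attached.
    return GlobalTags([TagAssociation(f"urn:li:tag:{name}") for name in tag_names])

ownership = ownership_from_urns(["urn:li:corpuser:jdoe", "urn:li:corpGroup:data-eng"])
tags = global_tags_from_names(["pii", "finance"])
```

The caller supplies only lists of URNs or names; the builder fills in the typed wrapper structure, which is the role the aspect builder functions play.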
Usage
Use Metadata Object Construction when programmatically defining metadata about data assets before emitting to DataHub. Common scenarios include:
- Building dataset URNs to reference tables, views, or files across data platforms
- Creating ownership aspects to assign data stewardship
- Constructing tag aspects for classification and discovery
- Building lineage relationships between upstream and downstream datasets
The builder functions ensure that URNs are correctly formatted and encoded, preventing common errors such as missing platform prefixes or improperly escaped special characters.
Theoretical Basis
This principle is based on a URN-based identity system where each entity has a globally unique identifier following the pattern urn:li:entityType:(params). Aspects are typed metadata facets attached to entities via their URNs.
Key design principles:
- Composability -- URNs for complex entities (e.g., data jobs) compose simpler URNs (e.g., data flow URNs)
- Idempotency -- The same inputs always produce the same URN, enabling deduplication
- Platform abstraction -- The urn:li:dataPlatform: prefix abstracts across data systems (MySQL, Snowflake, BigQuery, etc.)
- Environment scoping -- The FabricType parameter (PROD, DEV, etc.) scopes entities to environments
- Encoding safety -- Special characters in entity names are URL-encoded via UrnEncoder to ensure URN validity
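The idempotency, environment scoping, and encoding principles can be illustrated with a small sketch. It assumes that the characters needing escaping are the comma and parentheses, since those delimit URN parameters; the exact reserved set handled by UrnEncoder may differ, and encode_urn_part and safe_dataset_urn are hypothetical names.

```python
from urllib.parse import quote

# Assumed reserved set: the characters that delimit URN parameters.
RESERVED = ",()"

def encode_urn_part(value: str) -> str:
    # Percent-encode only the reserved characters; leave everything else as-is.
    return "".join(quote(ch, safe="") if ch in RESERVED else ch for ch in value)

def safe_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    # Hypothetical helper: encode the name, then apply the dataset URN pattern.
    encoded = encode_urn_part(name)
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{encoded},{env})"

first = safe_dataset_urn("snowflake", "sales,eu(raw)", "PROD")
second = safe_dataset_urn("snowflake", "sales,eu(raw)", "PROD")
dev = safe_dataset_urn("snowflake", "sales,eu(raw)", "DEV")
```

Because the function is pure, first and second are identical strings, so repeated ingestion runs refer to the same entity, while dev identifies a separate entity scoped to another environment.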
Related
- Implemented by: Datahub_project_Datahub_Mce_Builder_URN_Helpers