

Principle:Datahub project Datahub Metadata Object Construction

From Leeroopedia


Metadata

Field           Value
principle_name  Metadata Object Construction
description     The process of building URN identifiers and aspect instances that represent metadata entities and their properties.
type            principle
status          active
last_updated    2026-02-10
version         1.0

Overview

Metadata Object Construction is the process of building URN identifiers and aspect instances that represent metadata entities and their properties in DataHub. It relies on builder functions to create properly formatted URNs (Uniform Resource Names) and typed aspect classes that carry metadata facets such as ownership, schema, tags, and lineage.

Description

Metadata object construction uses builder functions from the datahub.emitter.mce_builder module to create properly formatted URNs and typed aspect classes. The process rests on two fundamental building blocks:

URN Identifiers

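URNs uniquely identify entities in DataHub. They follow the pattern urn:li:entityType:key, where composite keys are wrapped in parentheses, e.g. urn:li:dataset:(urn:li:dataPlatform:mysql,db.table,PROD). The builder module provides convenience functions that: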

  • Ensure correct URN formatting and encoding of special characters
  • Validate input parameters
  • Handle platform instance namespacing
  • Normalize entity names (e.g., optional lowercasing for datasets)
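The four builder responsibilities above can be illustrated with a small, standard-library-only sketch. The helper build_dataset_urn and the reserved-character set are hypothetical and chosen to mirror the behavior described here; this is not the datahub.emitter.mce_builder implementation, whose real entry point for datasets is make_dataset_urn.

```python
from typing import Optional
from urllib.parse import quote

# Assumption for this sketch: these characters delimit a composite URN key,
# so they must be encoded when they appear inside an entity name.
RESERVED_CHARS = {",", "(", ")"}


def encode_urn_part(value: str) -> str:
    """Percent-encode only reserved characters, keeping the rest readable."""
    return "".join(quote(ch, safe="") if ch in RESERVED_CHARS else ch for ch in value)


def build_dataset_urn(
    platform: str,
    name: str,
    env: str = "PROD",
    platform_instance: Optional[str] = None,
    lowercase: bool = False,
) -> str:
    """Hypothetical helper that mimics the shape of a dataset URN."""
    # Basic input validation: both parts are required.
    if not platform or not name:
        raise ValueError("platform and name must be non-empty")
    # Optional normalization so "DB.Sales" and "db.sales" map to one entity.
    if lowercase:
        name = name.lower()
    # A platform instance namespaces the name, e.g. "eu.db.sales".
    if platform_instance:
        name = f"{platform_instance}.{name}"
    platform_urn = f"urn:li:dataPlatform:{platform}"
    return f"urn:li:dataset:({platform_urn},{encode_urn_part(name)},{env})"


print(build_dataset_urn("mysql", "shop.orders"))
# urn:li:dataset:(urn:li:dataPlatform:mysql,shop.orders,PROD)
```

Encoding only the delimiter characters keeps ordinary names such as bucket/path readable while guaranteeing that a comma or parenthesis in a name cannot be mistaken for key structure.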

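Key entity types with URN builders include datasets, data platforms, users and groups (as owners), tags, glossary terms, data flows, and data jobs.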

Aspect Classes

Aspects are typed metadata facets attached to entities. Builder functions construct aspect instances from simpler inputs:

  • OwnershipClass -- Built from a list of owner URNs with ownership type and source classification
  • GlobalTagsClass -- Built from a list of tag name strings
  • GlossaryTermsClass -- Built from a list of glossary term URNs
  • UpstreamLineageClass -- Built from upstream dataset URNs with lineage type classification
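The pattern behind these aspect builders (simple inputs in, a typed object out) can be sketched with plain dataclasses. The classes and functions below are hypothetical stand-ins for OwnershipClass, GlobalTagsClass, and GlossaryTermsClass; their field names and the default ownership type are assumptions for illustration, not the real DataHub schema.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical stand-ins for DataHub aspect classes; fields are illustrative.


@dataclass
class Owner:
    owner: str        # owner URN, e.g. "urn:li:corpuser:jdoe"
    type: str         # ownership type, e.g. "DATAOWNER"
    source_type: str  # where the ownership claim came from, e.g. "MANUAL"


@dataclass
class Ownership:
    owners: List[Owner]


@dataclass
class GlobalTags:
    tags: List[str]   # tag URNs


@dataclass
class GlossaryTerms:
    terms: List[str]  # glossary term URNs


def make_ownership(owner_urns: Iterable[str], source_type: str,
                   ownership_type: str = "DATAOWNER") -> Ownership:
    # Every owner in the list shares the same type and source classification.
    return Ownership([Owner(urn, ownership_type, source_type) for urn in owner_urns])


def make_global_tags(tag_names: Iterable[str]) -> GlobalTags:
    # Plain tag names are lifted into tag URNs.
    return GlobalTags([f"urn:li:tag:{name}" for name in tag_names])


def make_glossary_terms(term_urns: Iterable[str]) -> GlossaryTerms:
    terms = list(term_urns)
    # Terms are expected as URNs already; reject bare names early.
    for urn in terms:
        if not urn.startswith("urn:li:glossaryTerm:"):
            raise ValueError(f"expected a glossary term URN, got {urn!r}")
    return GlossaryTerms(terms)
```

Note the asymmetry the list above describes: tags are built from bare names, while glossary terms are built from URNs, so the sketch validates only the latter.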

Usage

Use Metadata Object Construction when programmatically defining metadata about data assets before emitting to DataHub. Common scenarios include:

  • Building dataset URNs to reference tables, views, or files across data platforms
  • Creating ownership aspects to assign data stewardship
  • Constructing tag aspects for classification and discovery
  • Building lineage relationships between upstream and downstream datasets

The builder functions ensure that URNs are correctly formatted and encoded, preventing common errors such as missing platform prefixes or improperly escaped special characters.
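To make the lineage scenario and the "missing platform prefix" failure concrete, here is a standard-library sketch. The platform_urn helper follows the same idea as DataHub's make_data_platform_urn (accept either a bare platform name or a full platform URN), but it and the other helpers here are hypothetical; the lineage type labels TRANSFORMED, COPY, and VIEW are assumed to mirror DataHub's dataset lineage types.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

PLATFORM_PREFIX = "urn:li:dataPlatform:"
DATASET_PREFIX = "urn:li:dataset:"


def platform_urn(platform: str) -> str:
    # Accept "mysql" or a full platform URN; never add the prefix twice.
    return platform if platform.startswith(PLATFORM_PREFIX) else PLATFORM_PREFIX + platform


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    return f"{DATASET_PREFIX}({platform_urn(platform)},{name},{env})"


@dataclass
class Upstream:
    dataset: str
    type: str  # lineage classification, e.g. "TRANSFORMED", "COPY" or "VIEW"


@dataclass
class UpstreamLineage:
    upstreams: List[Upstream]


def make_upstream_lineage(
    downstream_urn: str, upstream_urns: Iterable[str], lineage_type: str = "TRANSFORMED"
) -> Tuple[str, UpstreamLineage]:
    upstreams = []
    for urn in upstream_urns:
        if not urn.startswith(DATASET_PREFIX):
            raise ValueError(f"upstream is not a dataset URN: {urn!r}")
        if urn == downstream_urn:
            raise ValueError("a dataset cannot be its own upstream")
        upstreams.append(Upstream(urn, lineage_type))
    # The aspect is attached to the downstream entity, so return both together.
    return downstream_urn, UpstreamLineage(upstreams)


orders = dataset_urn("mysql", "shop.orders")
daily = dataset_urn("urn:li:dataPlatform:snowflake", "analytics.daily_orders")
target, lineage = make_upstream_lineage(daily, [orders])
```

Because the upstream lineage aspect lives on the downstream dataset, the sketch returns the downstream URN alongside the aspect so the pair can be emitted together.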

Theoretical Basis

This principle is based on a URN-based identity system where each entity has a globally unique identifier following the pattern urn:li:entityType:key, with the key written as a parenthesized tuple when the identifier is composite. Aspects are typed metadata facets attached to entities via their URNs.
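The structure of this identity scheme can be shown by parsing a URN back into its entity type and key parts. The parse_urn function below is a hypothetical sketch, not a DataHub API: it splits a composite key only at top-level commas, so a nested URN such as a data flow inside a data job key stays intact. It assumes reserved characters inside names have already been encoded, which is exactly why encoding matters.

```python
from typing import List, Tuple


def parse_urn(urn: str) -> Tuple[str, List[str]]:
    """Split 'urn:li:<type>:<key>' into (type, key parts). Hypothetical helper."""
    prefix = "urn:li:"
    if not urn.startswith(prefix):
        raise ValueError(f"not a urn:li URN: {urn!r}")
    entity_type, sep, key = urn[len(prefix):].partition(":")
    if not sep or not entity_type or not key:
        raise ValueError(f"missing entity type or key: {urn!r}")
    # Simple key, e.g. urn:li:corpuser:jdoe
    if not (key.startswith("(") and key.endswith(")")):
        return entity_type, [key]
    # Composite key: split on commas that are not inside nested parentheses.
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in key[1:-1]:
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        current.append(ch)
    parts.append("".join(current))
    return entity_type, parts
```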

Key design principles:

  • Composability -- URNs for complex entities (e.g., data jobs) compose simpler URNs (e.g., data flow URNs)
  • Idempotency -- The same inputs always produce the same URN, so re-emitting metadata for an entity updates it in place instead of creating a duplicate
  • Platform abstraction -- The urn:li:dataPlatform: prefix abstracts across data systems (MySQL, Snowflake, BigQuery, etc.)
  • Environment scoping -- The env parameter, which takes a FabricType value (PROD, DEV, etc.), scopes entities to environments
  • Encoding safety -- Special characters in entity names are URL-encoded via UrnEncoder to ensure URN validity
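Composability, idempotency, and environment scoping can be checked together in a short sketch. The data flow and data job URN shapes below follow the nested pattern described above, but the helper names, the "prod" default cluster, the environment subset, and the in-memory upsert store are all hypothetical stand-ins rather than DataHub code.

```python
from typing import Any, Dict

# Illustrative subset of environment names; the real FabricType enum has more.
FABRIC_TYPES = {"PROD", "DEV", "TEST", "QA", "STG"}


def build_flow_urn(orchestrator: str, flow_id: str, cluster: str = "prod") -> str:
    return f"urn:li:dataFlow:({orchestrator},{flow_id},{cluster})"


def build_job_urn(flow_urn: str, job_id: str) -> str:
    # Composability: the job key embeds the complete flow URN.
    if not flow_urn.startswith("urn:li:dataFlow:("):
        raise ValueError(f"not a data flow URN: {flow_urn!r}")
    return f"urn:li:dataJob:({flow_urn},{job_id})"


def build_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    # Environment scoping: the same table in PROD and DEV gets distinct URNs.
    if env not in FABRIC_TYPES:
        raise ValueError(f"unknown environment: {env!r}")
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def upsert(store: Dict[str, Dict[str, Any]], urn: str, aspect: str, value: Any) -> None:
    # Idempotency: re-emitting for the same URN overwrites the aspect in place.
    store.setdefault(urn, {})[aspect] = value


flow = build_flow_urn("airflow", "daily_etl")
job = build_job_urn(flow, "load_orders")
store: Dict[str, Dict[str, Any]] = {}
upsert(store, build_dataset_urn("mysql", "shop.orders"), "tags", ["pii"])
upsert(store, build_dataset_urn("mysql", "shop.orders"), "tags", ["pii", "finance"])
```

After the two upserts the store still holds a single entity, because both calls derived the same URN from the same inputs.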

Related

Implementation:Datahub_project_Datahub_Mce_Builder_URN_Helpers

Knowledge Sources

Domains

Data_Integration, Metadata_Management
