Implementation:Datahub project Datahub Mce Builder URN Helpers

From Leeroopedia


Metadata

| Field | Value |
| --- | --- |
| implementation_name | Mce Builder URN Helpers |
| description | Builder functions for constructing URN identifiers and aspect instances for DataHub metadata entities. |
| type | implementation |
| category | API Doc |
| status | active |
| last_updated | 2026-02-10 |
| version | 1.0 |

Overview

The mce_builder module provides convenience functions for creating URN identifiers and metadata aspect instances. These helper functions ensure correct formatting, encoding, and validation of the metadata objects used throughout the DataHub Python SDK.

Source Reference

| Field | Value |
| --- | --- |
| File | metadata-ingestion/src/datahub/emitter/mce_builder.py |
| Lines | L126-574 |
| Repository | datahub-project/datahub |

Import

from datahub.emitter.mce_builder import (
    make_dataset_urn,
    make_dataset_urn_with_platform_instance,
    make_user_urn,
    make_group_urn,
    make_tag_urn,
    make_term_urn,
    make_domain_urn,
    make_data_flow_urn,
    make_data_job_urn,
    make_dashboard_urn,
    make_chart_urn,
    make_schema_field_urn,
    make_container_urn,
    make_ownership_aspect_from_urn_list,
    make_global_tag_aspect_with_tag_list,
    make_glossary_terms_aspect_from_urn_list,
    make_lineage_mce,
)

URN Builder Functions

make_dataset_urn

def make_dataset_urn(platform: str, name: str, env: str = DEFAULT_ENV) -> str:

Creates a dataset URN. Delegates to make_dataset_urn_with_platform_instance with platform_instance=None. The env parameter defaults to FabricTypeClass.PROD. If the global DATASET_URN_TO_LOWER flag is set, the name is lowercased.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| platform | str | (required) | Data platform identifier (e.g., "mysql", "snowflake", "bigquery") |
| name | str | (required) | Fully qualified dataset name (e.g., "db.schema.table") |
| env | str | FabricTypeClass.PROD | Environment/fabric type (e.g., "PROD", "DEV") |
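To make the resulting string concrete, the following standard-library sketch rebuilds the URN layout that appears in the usage example on this page. `sketch_dataset_urn` and its `lower` argument are hypothetical stand-ins (the latter for the `DATASET_URN_TO_LOWER` flag); they are not part of `mce_builder`, and the real helper may apply further encoding.

```python
def sketch_dataset_urn(
    platform: str, name: str, env: str = "PROD", lower: bool = False
) -> str:
    """Hypothetical stand-in that mimics the documented dataset URN layout."""
    if lower:
        # Stand-in for the DATASET_URN_TO_LOWER flag: only the name is lowercased.
        name = name.lower()
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


print(sketch_dataset_urn("mysql", "prod_db.users"))
print(sketch_dataset_urn("snowflake", "DB.SCHEMA.TABLE", env="DEV", lower=True))
```

Note that in this sketch the platform and environment are left untouched when lowercasing is enabled, matching the description above that only the name is affected.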

make_dataset_urn_with_platform_instance

def make_dataset_urn_with_platform_instance(
    platform: str, name: str, platform_instance: Optional[str], env: str = DEFAULT_ENV
) -> str:

Creates a dataset URN with optional platform instance for multi-instance deployments.
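A rough picture of what the platform instance changes is sketched below. It assumes the instance is prepended to the dataset name with a dot separator; this page does not state that detail, so verify it against your installed DataHub version. `sketch_dataset_urn_with_instance` is a hypothetical name used only for illustration.

```python
from typing import Optional


def sketch_dataset_urn_with_instance(
    platform: str, name: str, platform_instance: Optional[str], env: str = "PROD"
) -> str:
    """Hypothetical stand-in; not the library implementation."""
    if platform_instance:
        # Assumption: the instance is folded into the name as "<instance>.<name>".
        name = f"{platform_instance}.{name}"
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# Two MySQL deployments holding a table with the same name get distinct URNs.
eu = sketch_dataset_urn_with_instance("mysql", "prod_db.users", "warehouse_eu")
us = sketch_dataset_urn_with_instance("mysql", "prod_db.users", "warehouse_us")
print(eu)
print(us)
```

Passing `None` in this sketch yields the same string as the plain dataset URN, which mirrors the delegation described for make_dataset_urn.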

make_user_urn

def make_user_urn(username: str) -> str:

Creates a corp user URN. If the input already starts with urn:li:corpuser: or urn:li:corpGroup:, it is returned as-is. Otherwise, special characters in the username are URL-encoded via UrnEncoder before the urn:li:corpuser: prefix is added.
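The pass-through and encoding behavior can be sketched with the standard library alone. The reserved-character set below (comma and parentheses) is an assumption chosen because those characters act as delimiters inside URNs; the real `UrnEncoder` may cover more. `encode_reserved` and `sketch_user_urn` are hypothetical names.

```python
from urllib.parse import quote

# Assumption: these delimiter characters are the ones that need escaping.
RESERVED = {",", "(", ")"}


def encode_reserved(value: str) -> str:
    """Percent-encode only the assumed reserved characters."""
    return "".join(quote(ch, safe="") if ch in RESERVED else ch for ch in value)


def sketch_user_urn(username: str) -> str:
    """Hypothetical stand-in for the documented behavior."""
    if username.startswith(("urn:li:corpuser:", "urn:li:corpGroup:")):
        return username  # already a user or group URN: returned unchanged
    return f"urn:li:corpuser:{encode_reserved(username)}"


print(sketch_user_urn("jdoe"))
print(sketch_user_urn("urn:li:corpGroup:eng"))
print(sketch_user_urn("doe,jane"))
```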

make_group_urn

def make_group_urn(groupname: str) -> str:

Creates a corp group URN. Passes through existing user or group URNs unchanged.

make_tag_urn

def make_tag_urn(tag: str) -> str:

Creates a tag URN. Returns existing tag URNs as-is.

make_term_urn

def make_term_urn(term: str) -> str:

Creates a glossary term URN. Passes through existing term URNs unchanged.

make_domain_urn

def make_domain_urn(domain: str) -> str:

Creates a domain URN. Passes through existing domain URNs unchanged.
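A practical consequence of the pass-through rule shared by the tag, term, and domain helpers is that they are idempotent, so callers can mix raw names and ready-made URNs in the same list. The sketch below illustrates that property with a hypothetical `ensure_urn` helper; the `urn:li:domain:` prefix is an assumption, and the real helpers may also encode the value.

```python
TAG_PREFIX = "urn:li:tag:"
TERM_PREFIX = "urn:li:glossaryTerm:"
DOMAIN_PREFIX = "urn:li:domain:"  # assumed prefix, not stated on this page


def ensure_urn(prefix: str, value: str) -> str:
    """Hypothetical helper: add the prefix only when it is missing."""
    return value if value.startswith(prefix) else f"{prefix}{value}"


# Applying the helper twice gives the same result as applying it once.
once = ensure_urn(TAG_PREFIX, "pii")
twice = ensure_urn(TAG_PREFIX, once)
print(once, twice)

# A mixed list of names and URNs normalizes to URNs only.
mixed = ["Finance", "urn:li:domain:marketing"]
normalized = [ensure_urn(DOMAIN_PREFIX, item) for item in mixed]
print(normalized)
```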

make_data_flow_urn

def make_data_flow_urn(
    orchestrator: str,
    flow_id: str,
    cluster: str = DEFAULT_FLOW_CLUSTER,
    platform_instance: Optional[str] = None,
) -> str:

Creates a data flow URN for orchestration pipelines. The cluster parameter defaults to "prod".

make_data_job_urn

def make_data_job_urn(
    orchestrator: str,
    flow_id: str,
    job_id: str,
    cluster: str = DEFAULT_FLOW_CLUSTER,
    platform_instance: Optional[str] = None,
) -> str:

Creates a data job URN by composing a data flow URN with a job identifier.
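The composition described above can be pictured as nesting one URN inside another. The sketch below assumes the usual DataHub layouts `urn:li:dataFlow:(orchestrator,flow_id,cluster)` and `urn:li:dataJob:(<flow URN>,job_id)`; neither layout is spelled out on this page, the function names are hypothetical, and `platform_instance` handling is omitted.

```python
def sketch_data_flow_urn(orchestrator: str, flow_id: str, cluster: str = "prod") -> str:
    """Hypothetical stand-in for the assumed data flow URN layout."""
    return f"urn:li:dataFlow:({orchestrator},{flow_id},{cluster})"


def sketch_data_job_urn(
    orchestrator: str, flow_id: str, job_id: str, cluster: str = "prod"
) -> str:
    """Hypothetical stand-in: a job URN wraps its parent flow URN."""
    flow_urn = sketch_data_flow_urn(orchestrator, flow_id, cluster)
    return f"urn:li:dataJob:({flow_urn},{job_id})"


flow = sketch_data_flow_urn("airflow", "daily_etl")
job = sketch_data_job_urn("airflow", "daily_etl", "load_users")
print(flow)
print(job)
```

Because the job URN embeds the flow URN, two jobs with the same `job_id` in different flows or clusters remain distinct in this sketch.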

make_dashboard_urn

def make_dashboard_urn(
    platform: str, name: str, platform_instance: Optional[str] = None
) -> str:

Creates a dashboard URN for BI tool dashboards.

make_chart_urn

def make_chart_urn(
    platform: str, name: str, platform_instance: Optional[str] = None
) -> str:

Creates a chart URN for BI tool charts.
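Dashboard and chart URNs share the same shape, so one sketch can cover both. It assumes the layout `urn:li:<entity>:(platform,name)` and that a platform instance, when given, is prepended to the name with a dot; both points are assumptions rather than statements from this page, and `sketch_bi_urn` is a hypothetical name.

```python
from typing import Optional


def sketch_bi_urn(
    entity: str, platform: str, name: str, platform_instance: Optional[str] = None
) -> str:
    """Hypothetical stand-in; entity is "dashboard" or "chart"."""
    if platform_instance:
        # Assumption: the instance is folded into the name.
        name = f"{platform_instance}.{name}"
    return f"urn:li:{entity}:({platform},{name})"


print(sketch_bi_urn("dashboard", "looker", "sales_overview"))
print(sketch_bi_urn("chart", "looker", "revenue_by_region", platform_instance="emea"))
```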

make_schema_field_urn

def make_schema_field_urn(parent_urn: str, field_path: str) -> str:

Creates a schema field URN. URL-encodes reserved characters in the field path using UrnEncoder.
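Encoding matters here because the field path sits inside a parenthesized, comma-separated tuple next to the parent URN. The sketch below assumes the layout `urn:li:schemaField:(parent_urn,field_path)` and that commas and parentheses are the reserved characters; the real `UrnEncoder` may handle more, and `sketch_schema_field_urn` is a hypothetical name.

```python
from urllib.parse import quote


def sketch_schema_field_urn(parent_urn: str, field_path: str) -> str:
    """Hypothetical stand-in for the assumed schema field URN layout."""
    # Assumption: only these delimiter characters are percent-encoded.
    encoded = "".join(
        quote(ch, safe="") if ch in ",()" else ch for ch in field_path
    )
    return f"urn:li:schemaField:({parent_urn},{encoded})"


parent = "urn:li:dataset:(urn:li:dataPlatform:mysql,prod_db.users,PROD)"
print(sketch_schema_field_urn(parent, "address.zip"))
print(sketch_schema_field_urn(parent, "amount(usd)"))
```

Without the encoding step, a field path such as `amount(usd)` would introduce unbalanced delimiters and make the URN ambiguous to parse.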

Aspect Builder Functions

make_ownership_aspect_from_urn_list

def make_ownership_aspect_from_urn_list(
    owner_urns: List[str],
    source_type: Optional[Union[str, OwnershipSourceTypeClass]],
    owner_type: Union[str, OwnershipTypeClass] = OwnershipTypeClass.DATAOWNER,
) -> OwnershipClass:

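Builds an OwnershipClass aspect from a list of owner URNs. Each URN must start with urn:li:corpuser: or urn:li:corpGroup:; otherwise an AssertionError is raised. Raw usernames should therefore be converted with make_user_urn or make_group_urn first.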
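The prefix check can be sketched as a small validation function. This is an illustration of the documented rule using a hypothetical `check_owner_urns` helper, not the library code.

```python
from typing import List

OWNER_PREFIXES = ("urn:li:corpuser:", "urn:li:corpGroup:")


def check_owner_urns(owner_urns: List[str]) -> None:
    """Hypothetical helper: reject anything that is not a user or group URN."""
    for urn in owner_urns:
        assert urn.startswith(OWNER_PREFIXES), f"not a user or group URN: {urn}"


# Well-formed URNs pass silently.
check_owner_urns(["urn:li:corpuser:jdoe", "urn:li:corpGroup:data-eng"])

# A bare username is rejected.
try:
    check_owner_urns(["jdoe"])
except AssertionError as exc:
    print(f"rejected: {exc}")
```

Since this sketch relies on `assert`, running Python with the `-O` flag would skip the check; code that needs a guaranteed failure should raise an explicit exception instead.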

make_global_tag_aspect_with_tag_list

def make_global_tag_aspect_with_tag_list(tags: List[str]) -> GlobalTagsClass:

Builds a GlobalTagsClass aspect from a list of tag name strings. Automatically converts each tag to a URN via make_tag_urn.

make_glossary_terms_aspect_from_urn_list

def make_glossary_terms_aspect_from_urn_list(term_urns: List[str]) -> GlossaryTerms:

Builds a GlossaryTerms aspect from a list of glossary term URNs. Validates that each URN starts with urn:li:glossaryTerm:. Automatically adds an audit stamp with the current timestamp.
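The two steps described above, prefix validation followed by stamping with the current time, can be sketched with plain dataclasses. The class names, the epoch-millisecond time unit, and the default actor URN are assumptions made for illustration; the real aspect classes come from the DataHub schema.

```python
import time
from dataclasses import dataclass
from typing import List


@dataclass
class SketchAuditStamp:
    time_ms: int  # assumption: epoch milliseconds
    actor: str


@dataclass
class SketchGlossaryTerms:
    terms: List[str]
    audit_stamp: SketchAuditStamp


def sketch_glossary_terms(
    term_urns: List[str], actor: str = "urn:li:corpuser:datahub"
) -> SketchGlossaryTerms:
    """Hypothetical stand-in: validate prefixes, then attach an audit stamp."""
    for urn in term_urns:
        assert urn.startswith("urn:li:glossaryTerm:"), f"not a term URN: {urn}"
    stamp = SketchAuditStamp(time_ms=int(time.time() * 1000), actor=actor)
    return SketchGlossaryTerms(terms=list(term_urns), audit_stamp=stamp)


aspect = sketch_glossary_terms(["urn:li:glossaryTerm:Revenue"])
print(aspect.terms, aspect.audit_stamp.actor)
```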

make_lineage_mce

def make_lineage_mce(
    upstream_urns: List[str],
    downstream_urn: str,
    lineage_type: str = DatasetLineageTypeClass.TRANSFORMED,
) -> MetadataChangeEventClass:

Builds a complete MetadataChangeEventClass representing lineage between upstream and downstream datasets.
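Conceptually, the event describes one downstream dataset with a list of typed upstream edges. The plain-dict sketch below shows that fan-in shape only; the key names are hypothetical and do not reflect the actual MetadataChangeEventClass structure.

```python
from typing import Any, Dict, List


def sketch_lineage(
    upstream_urns: List[str], downstream_urn: str, lineage_type: str = "TRANSFORMED"
) -> Dict[str, Any]:
    """Hypothetical plain-dict picture of one downstream with typed upstreams."""
    return {
        "downstream": downstream_urn,
        # Every upstream edge carries the same lineage type.
        "upstreams": [
            {"dataset": urn, "type": lineage_type} for urn in upstream_urns
        ],
    }


lineage = sketch_lineage(
    upstream_urns=[
        "urn:li:dataset:(urn:li:dataPlatform:mysql,prod_db.users,PROD)",
        "urn:li:dataset:(urn:li:dataPlatform:mysql,prod_db.orders,PROD)",
    ],
    downstream_urn="urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.user_orders,PROD)",
)
print(len(lineage["upstreams"]), "upstreams feed", lineage["downstream"])
```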

I/O Contract

| Field | Value |
| --- | --- |
| Input | Entity identifiers (platform names, dataset names, usernames, tags) as strings |
| Output | Formatted URN strings or typed aspect class instances |
| Validation | Asserts that owner URNs match expected prefixes; validates ownership types; URL-encodes special characters |
| Exceptions | AssertionError if owner URN format is invalid; ValueError for unrecognized ownership types |

Usage Examples

from datahub.emitter.mce_builder import (
    make_dataset_urn,
    make_user_urn,
    make_tag_urn,
    make_ownership_aspect_from_urn_list,
    make_global_tag_aspect_with_tag_list,
)

# Create a dataset URN
dataset_urn = make_dataset_urn(
    platform="mysql",
    name="prod_db.users",
    env="PROD",
)
# Result: "urn:li:dataset:(urn:li:dataPlatform:mysql,prod_db.users,PROD)"

# Create a user URN
user_urn = make_user_urn("jdoe")
# Result: "urn:li:corpuser:jdoe"

# Create a tag URN
tag_urn = make_tag_urn("pii")
# Result: "urn:li:tag:pii"

# Build an ownership aspect
ownership = make_ownership_aspect_from_urn_list(
    owner_urns=[make_user_urn("jdoe"), make_user_urn("asmith")],
    source_type=None,
)

# Build a tags aspect
tags = make_global_tag_aspect_with_tag_list(["pii", "sensitive", "tier1"])

Related

Knowledge Sources

Domains

Data_Integration, Metadata_Management
