Implementation:Datahub project Datahub Mce Builder URN Helpers
Metadata
| Field | Value |
|---|---|
| implementation_name | Mce Builder URN Helpers |
| description | Builder functions for constructing URN identifiers and aspect instances for DataHub metadata entities. |
| type | implementation |
| category | API Doc |
| status | active |
| last_updated | 2026-02-10 |
| version | 1.0 |
Overview
The mce_builder module provides convenience functions for creating URN identifiers and metadata aspect instances. These helper functions ensure correct formatting, encoding, and validation of the metadata objects used throughout the DataHub Python SDK.
Source Reference
| Field | Value |
|---|---|
| File | metadata-ingestion/src/datahub/emitter/mce_builder.py |
| Lines | L126-574 |
| Repository | datahub-project/datahub |
Import
from datahub.emitter.mce_builder import (
    make_dataset_urn,
    make_dataset_urn_with_platform_instance,
    make_user_urn,
    make_group_urn,
    make_tag_urn,
    make_term_urn,
    make_domain_urn,
    make_data_flow_urn,
    make_data_job_urn,
    make_dashboard_urn,
    make_chart_urn,
    make_schema_field_urn,
    make_container_urn,
    make_ownership_aspect_from_urn_list,
    make_global_tag_aspect_with_tag_list,
    make_glossary_terms_aspect_from_urn_list,
    make_lineage_mce,
)
URN Builder Functions
make_dataset_urn
def make_dataset_urn(platform: str, name: str, env: str = DEFAULT_ENV) -> str:
Creates a dataset URN. Delegates to make_dataset_urn_with_platform_instance with platform_instance=None. The env parameter defaults to FabricTypeClass.PROD. If the global DATASET_URN_TO_LOWER flag is set, the name is lowercased.
| Parameter | Type | Default | Description |
|---|---|---|---|
| platform | str | (required) | Data platform identifier (e.g., "mysql", "snowflake", "bigquery") |
| name | str | (required) | Fully qualified dataset name (e.g., "db.schema.table") |
| env | str | FabricTypeClass.PROD | Environment/fabric type (e.g., "PROD", "DEV") |
make_dataset_urn_with_platform_instance
def make_dataset_urn_with_platform_instance(
    platform: str, name: str, platform_instance: Optional[str], env: str = DEFAULT_ENV
) -> str:
Creates a dataset URN with an optional platform instance, for deployments that run more than one instance of the same platform.
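How the two dataset helpers relate can be illustrated with a standalone sketch that does not import datahub. The `sketch_` function names and the `LOWERCASE_NAMES` flag are hypothetical stand-ins, and prefixing the platform instance to the dataset name with a dot is an assumption about the URN layout, not something stated above.

```python
from typing import Optional

DEFAULT_ENV = "PROD"     # stand-in for FabricTypeClass.PROD
LOWERCASE_NAMES = False  # hypothetical stand-in for DATASET_URN_TO_LOWER


def sketch_dataset_urn_with_platform_instance(
    platform: str,
    name: str,
    platform_instance: Optional[str],
    env: str = DEFAULT_ENV,
) -> str:
    if LOWERCASE_NAMES:
        name = name.lower()
    # Assumption: a platform instance is prepended to the name with a dot.
    if platform_instance:
        name = f"{platform_instance}.{name}"
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def sketch_dataset_urn(platform: str, name: str, env: str = DEFAULT_ENV) -> str:
    # Mirrors the delegation described above: no platform instance is passed.
    return sketch_dataset_urn_with_platform_instance(platform, name, None, env)


print(sketch_dataset_urn("mysql", "prod_db.users"))
print(
    sketch_dataset_urn_with_platform_instance(
        "mysql", "prod_db.users", "cluster_a", "DEV"
    )
)
```

For real ingestion code, call the library functions rather than formatting URN strings by hand.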
make_user_urn
def make_user_urn(username: str) -> str:
Creates a corp user URN. If the input already starts with urn:li:corpuser: or urn:li:corpGroup:, it is returned as-is. Special characters in the username are URL-encoded via UrnEncoder.
make_group_urn
def make_group_urn(groupname: str) -> str:
Creates a corp group URN. Passes through existing user or group URNs unchanged.
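The pass-through and encoding behavior described for make_user_urn and make_group_urn can be sketched without the library. The `sketch_` names and `encode_reserved` helper are hypothetical, and the reserved-character set used here (comma and parentheses) is an assumption chosen for illustration; the set UrnEncoder actually uses may differ.

```python
from urllib.parse import quote

# Assumption: illustrative reserved characters, not necessarily UrnEncoder's set.
RESERVED = {",", "(", ")"}
ACTOR_PREFIXES = ("urn:li:corpuser:", "urn:li:corpGroup:")


def encode_reserved(value: str) -> str:
    # Percent-encode only the reserved characters; leave everything else as-is.
    return "".join(quote(ch) if ch in RESERVED else ch for ch in value)


def sketch_user_urn(username: str) -> str:
    # Existing user or group URNs pass through unchanged.
    if username.startswith(ACTOR_PREFIXES):
        return username
    return f"urn:li:corpuser:{encode_reserved(username)}"


def sketch_group_urn(groupname: str) -> str:
    # Same pass-through rule, but new URNs get the group prefix.
    if groupname.startswith(ACTOR_PREFIXES):
        return groupname
    return f"urn:li:corpGroup:{encode_reserved(groupname)}"


print(sketch_user_urn("jdoe"))
print(sketch_user_urn("urn:li:corpGroup:eng"))
print(sketch_group_urn("eng"))
print(sketch_user_urn("doe,j"))
```

Because both helpers accept either prefix, a mixed list of user and group identifiers can be normalized with one of them without rewriting URNs that are already well-formed.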
make_tag_urn
def make_tag_urn(tag: str) -> str:
Creates a tag URN. Returns existing tag URNs as-is.
make_term_urn
def make_term_urn(term: str) -> str:
Creates a glossary term URN. Passes through existing term URNs unchanged.
make_domain_urn
def make_domain_urn(domain: str) -> str:
Creates a domain URN. Passes through existing domain URNs unchanged.
make_data_flow_urn
def make_data_flow_urn(
    orchestrator: str,
    flow_id: str,
    cluster: str = DEFAULT_FLOW_CLUSTER,
    platform_instance: Optional[str] = None,
) -> str:
Creates a data flow URN for orchestration pipelines. The cluster parameter defaults to "prod".
make_data_job_urn
def make_data_job_urn(
    orchestrator: str,
    flow_id: str,
    job_id: str,
    cluster: str = DEFAULT_FLOW_CLUSTER,
    platform_instance: Optional[str] = None,
) -> str:
Creates a data job URN by composing a data flow URN with a job identifier.
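The composition of a data job URN from a data flow URN can be sketched as follows. This is a library-free illustration: the `sketch_` names are hypothetical, and the exact string layouts (including the dot-prefixed platform instance) are assumptions about the URN format rather than details given above.

```python
from typing import Optional

DEFAULT_FLOW_CLUSTER = "prod"


def sketch_data_flow_urn(
    orchestrator: str,
    flow_id: str,
    cluster: str = DEFAULT_FLOW_CLUSTER,
    platform_instance: Optional[str] = None,
) -> str:
    # Assumption: a platform instance is prepended to the flow id with a dot.
    if platform_instance:
        flow_id = f"{platform_instance}.{flow_id}"
    return f"urn:li:dataFlow:({orchestrator},{flow_id},{cluster})"


def sketch_data_job_urn(
    orchestrator: str,
    flow_id: str,
    job_id: str,
    cluster: str = DEFAULT_FLOW_CLUSTER,
    platform_instance: Optional[str] = None,
) -> str:
    # The job URN nests the full flow URN and appends the job identifier.
    flow_urn = sketch_data_flow_urn(orchestrator, flow_id, cluster, platform_instance)
    return f"urn:li:dataJob:({flow_urn},{job_id})"


print(sketch_data_flow_urn("airflow", "daily_etl"))
print(sketch_data_job_urn("airflow", "daily_etl", "load_users"))
```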
make_dashboard_urn
def make_dashboard_urn(
    platform: str, name: str, platform_instance: Optional[str] = None
) -> str:
Creates a dashboard URN for BI tool dashboards.
make_chart_urn
def make_chart_urn(
    platform: str, name: str, platform_instance: Optional[str] = None
) -> str:
Creates a chart URN for BI tool charts.
make_schema_field_urn
def make_schema_field_urn(parent_urn: str, field_path: str) -> str:
Creates a schema field URN. URL-encodes reserved characters in the field path using UrnEncoder.
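A schema field URN nests the parent URN and an encoded field path. The sketch below shows why encoding matters: an unencoded comma or parenthesis in the field path would be indistinguishable from the URN's own delimiters. The `sketch_schema_field_urn` name is hypothetical, and both the reserved-character set and the output layout are assumptions made for illustration.

```python
from urllib.parse import quote

# Assumption: illustrative reserved characters, not necessarily UrnEncoder's set.
RESERVED = {",", "(", ")"}


def sketch_schema_field_urn(parent_urn: str, field_path: str) -> str:
    # Encode only the field path; the parent URN is embedded verbatim.
    encoded = "".join(quote(ch) if ch in RESERVED else ch for ch in field_path)
    return f"urn:li:schemaField:({parent_urn},{encoded})"


parent = "urn:li:dataset:(urn:li:dataPlatform:mysql,prod_db.users,PROD)"
print(sketch_schema_field_urn(parent, "address.zip"))
print(sketch_schema_field_urn(parent, "tags(0)"))
```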
Aspect Builder Functions
make_ownership_aspect_from_urn_list
def make_ownership_aspect_from_urn_list(
    owner_urns: List[str],
    source_type: Optional[Union[str, OwnershipSourceTypeClass]],
    owner_type: Union[str, OwnershipTypeClass] = OwnershipTypeClass.DATAOWNER,
) -> OwnershipClass:
Builds an OwnershipClass aspect from a list of owner URNs. Each URN must start with urn:li:corpuser: or urn:li:corpGroup:.
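The prefix check can be sketched with plain dataclasses standing in for the generated aspect classes. `OwnerSketch`, `OwnershipSketch`, and `sketch_ownership_aspect` are hypothetical names, and their field layout is an assumption; only the prefix assertion reflects the behavior described above.

```python
from dataclasses import dataclass
from typing import List, Optional

OWNER_PREFIXES = ("urn:li:corpuser:", "urn:li:corpGroup:")


@dataclass
class OwnerSketch:
    owner: str
    type: str
    source: Optional[str]


@dataclass
class OwnershipSketch:
    owners: List[OwnerSketch]


def sketch_ownership_aspect(
    owner_urns: List[str],
    source_type: Optional[str],
    owner_type: str = "DATAOWNER",
) -> OwnershipSketch:
    # Reject anything that is not a corp user or corp group URN.
    for urn in owner_urns:
        assert urn.startswith(OWNER_PREFIXES), f"unexpected owner URN: {urn}"
    return OwnershipSketch(
        owners=[OwnerSketch(urn, owner_type, source_type) for urn in owner_urns]
    )


aspect = sketch_ownership_aspect(
    ["urn:li:corpuser:jdoe", "urn:li:corpGroup:eng"], source_type=None
)
print([owner.owner for owner in aspect.owners])
```

Passing bare usernames fails the assertion, so run identifiers through make_user_urn or make_group_urn first.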
make_global_tag_aspect_with_tag_list
def make_global_tag_aspect_with_tag_list(tags: List[str]) -> GlobalTagsClass:
Builds a GlobalTagsClass aspect from a list of tag name strings. Automatically converts each tag to a URN via make_tag_urn.
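The automatic name-to-URN conversion means callers may mix bare tag names with existing tag URNs. A minimal library-free sketch, with `GlobalTagsSketch` and the `sketch_` functions as hypothetical stand-ins for the real classes and helpers:

```python
from dataclasses import dataclass
from typing import List

TAG_PREFIX = "urn:li:tag:"


def sketch_tag_urn(tag: str) -> str:
    # Existing tag URNs are returned as-is; bare names get the prefix.
    return tag if tag.startswith(TAG_PREFIX) else f"{TAG_PREFIX}{tag}"


@dataclass
class GlobalTagsSketch:
    tags: List[str]


def sketch_global_tags(tags: List[str]) -> GlobalTagsSketch:
    return GlobalTagsSketch(tags=[sketch_tag_urn(tag) for tag in tags])


print(sketch_global_tags(["pii", "urn:li:tag:tier1"]).tags)
```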
make_glossary_terms_aspect_from_urn_list
def make_glossary_terms_aspect_from_urn_list(term_urns: List[str]) -> GlossaryTerms:
Builds a GlossaryTerms aspect from a list of glossary term URNs. Validates that each URN starts with urn:li:glossaryTerm:. Automatically adds an audit stamp with the current timestamp.
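Unlike the tag helper, this function expects full URNs and does not convert bare names. The sketch below uses a plain dictionary in place of the aspect class; the dictionary keys, the millisecond timestamp unit, and the actor value are all assumptions made for illustration.

```python
import time
from typing import Any, Dict, List

TERM_PREFIX = "urn:li:glossaryTerm:"


def sketch_glossary_terms(term_urns: List[str]) -> Dict[str, Any]:
    # Every entry must already be a glossary term URN.
    for urn in term_urns:
        assert urn.startswith(TERM_PREFIX), f"not a glossary term URN: {urn}"
    return {
        "terms": [{"urn": urn} for urn in term_urns],
        "auditStamp": {
            # Assumption: epoch milliseconds and a fixed system actor.
            "time": int(time.time() * 1000),
            "actor": "urn:li:corpuser:datahub",
        },
    }


aspect = sketch_glossary_terms(["urn:li:glossaryTerm:Classification.PII"])
print(aspect["terms"])
```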
make_lineage_mce
def make_lineage_mce(
    upstream_urns: List[str],
    downstream_urn: str,
    lineage_type: str = DatasetLineageTypeClass.TRANSFORMED,
) -> MetadataChangeEventClass:
Builds a complete MetadataChangeEventClass representing lineage between upstream and downstream datasets.
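The rough shape of the resulting event can be sketched with nested dictionaries: one downstream dataset carrying a list of upstream entries that share a lineage type. The key names and nesting are assumptions that approximate the structure; they are not the generated classes' actual schema, and `sketch_lineage_event` is a hypothetical name.

```python
from typing import Any, Dict, List


def sketch_lineage_event(
    upstream_urns: List[str],
    downstream_urn: str,
    lineage_type: str = "TRANSFORMED",
) -> Dict[str, Any]:
    # One upstream entry per URN, all tagged with the same lineage type.
    upstreams = [{"dataset": urn, "type": lineage_type} for urn in upstream_urns]
    return {
        "proposedSnapshot": {
            "urn": downstream_urn,
            "aspects": [{"upstreams": upstreams}],
        }
    }


event = sketch_lineage_event(
    upstream_urns=[
        "urn:li:dataset:(urn:li:dataPlatform:mysql,prod_db.users,PROD)",
        "urn:li:dataset:(urn:li:dataPlatform:mysql,prod_db.orders,PROD)",
    ],
    downstream_urn="urn:li:dataset:(urn:li:dataPlatform:snowflake,dw.user_orders,PROD)",
)
print(len(event["proposedSnapshot"]["aspects"][0]["upstreams"]))
```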
I/O Contract
| Field | Value |
|---|---|
| Input | Entity identifiers (platform names, dataset names, usernames, tags) as strings |
| Output | Formatted URN strings or typed aspect class instances |
| Validation | Asserts that owner URNs match expected prefixes; validates ownership types; URL-encodes special characters |
| Exceptions | AssertionError if owner URN format is invalid; ValueError for unrecognized ownership types |
Usage Examples
from datahub.emitter.mce_builder import (
    make_dataset_urn,
    make_user_urn,
    make_tag_urn,
    make_ownership_aspect_from_urn_list,
    make_global_tag_aspect_with_tag_list,
)
# Create a dataset URN
dataset_urn = make_dataset_urn(
    platform="mysql",
    name="prod_db.users",
    env="PROD",
)
# Result: "urn:li:dataset:(urn:li:dataPlatform:mysql,prod_db.users,PROD)"
# Create user URNs
user_urn = make_user_urn("jdoe")
# Result: "urn:li:corpuser:jdoe"
# Create a tag URN
tag_urn = make_tag_urn("pii")
# Result: "urn:li:tag:pii"
# Build an ownership aspect
ownership = make_ownership_aspect_from_urn_list(
    owner_urns=[make_user_urn("jdoe"), make_user_urn("asmith")],
    source_type=None,
)
# Build a tags aspect
tags = make_global_tag_aspect_with_tag_list(["pii", "sensitive", "tier1"])
Related
- Implements: Datahub_project_Datahub_Metadata_Object_Construction
- Used by: Datahub_project_Datahub_MetadataChangeProposalWrapper_Init
- Environment: Environment:Datahub_project_Datahub_Python_3_10_Ingestion_Environment