

Principle:Datahub project Datahub Metadata Change Proposal

From Leeroopedia


Metadata

| Field | Value |
|---|---|
| principle_name | Metadata Change Proposal |
| description | A standardized envelope for packaging metadata changes into atomic, type-safe proposals for emission to DataHub. |
| type | principle |
| status | active |
| last_updated | 2026-02-10 |
| version | 1.0 |

Overview

The Metadata Change Proposal (MCP) is DataHub's fundamental unit of metadata mutation. It wraps an entity URN and an aspect instance into a standardized change proposal that can be validated, serialized, and emitted through any transport backend.

Description

The Metadata Change Proposal is the envelope that packages constructed metadata objects (URNs and aspects) into a form that the DataHub backend can process. The Python SDK provides MetadataChangeProposalWrapper as a high-level, type-safe wrapper around the lower-level MetadataChangeProposalClass.

An MCP contains the following key components:

  • entityUrn -- The URN identifying the target entity (e.g., a dataset, user, or tag)
  • entityType -- The type of the entity (automatically inferred from the URN if not provided)
  • aspect -- The typed aspect instance carrying the metadata to be applied
  • aspectName -- The name of the aspect (automatically derived from the aspect class if not provided)
  • changeType -- The type of change, typically UPSERT (create or update)
  • systemMetadata -- Optional system-level metadata (e.g., source pipeline information)
  • auditHeader -- Optional Kafka audit header for traceability
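The following is a minimal model of this envelope written with the standard library only. It is a hypothetical stand-in, not DataHub's MetadataChangeProposalWrapper. The class name `McpEnvelopeSketch` and the use of a plain dict for the aspect are assumptions. Only the field names follow the list above, and the URN uses DataHub's dataset URN format.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


# Hypothetical stand-in for the MCP envelope. Field names mirror the list
# above, but this is NOT DataHub's MetadataChangeProposalWrapper.
@dataclass
class McpEnvelopeSketch:
    entityUrn: str                                   # target entity
    aspect: Dict[str, Any]                           # payload (a typed class in the SDK)
    aspectName: str                                  # e.g. "globalTags"
    entityType: Optional[str] = None                 # the SDK infers this; left as given here
    changeType: str = "UPSERT"                       # create-or-update
    systemMetadata: Optional[Dict[str, Any]] = None  # e.g. source pipeline info
    auditHeader: Optional[Dict[str, Any]] = None     # optional Kafka audit header


mcp = McpEnvelopeSketch(
    entityUrn="urn:li:dataset:(urn:li:dataPlatform:hive,db.orders,PROD)",
    aspect={"tags": [{"tag": "urn:li:tag:pii"}]},
    aspectName="globalTags",
    systemMetadata={"runId": "example-run"},
)
```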

The wrapper provides automatic inference and validation:

  • entityType is inferred from the URN via guess_entity_type()
  • aspectName is derived from the aspect class via get_aspect_name()
  • Cross-validation ensures that an explicitly provided entityType matches the type inferred from the URN
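The inference and cross-validation rules can be sketched as follows. The sketch assumes DataHub URNs have the form `urn:li:<entityType>:<key>`, so the entity type is the third colon-separated token. The names `guess_entity_type_sketch`, `GlobalTagsSketch` and `resolve_header` are hypothetical. They stand in for guess_entity_type(), a generated aspect class, and the wrapper's constructor logic respectively.

```python
from typing import Any, ClassVar, List, Optional, Tuple


def guess_entity_type_sketch(urn: str) -> str:
    # Assumed URN shape "urn:li:<entityType>:<key>": the third token is the type.
    if not urn.startswith("urn:li:"):
        raise ValueError(f"not a DataHub URN: {urn!r}")
    return urn.split(":")[2]


class GlobalTagsSketch:
    # Hypothetical aspect class that carries its own name, mirroring the idea
    # behind get_aspect_name() on generated aspect classes.
    ASPECT_NAME: ClassVar[str] = "globalTags"

    def __init__(self, tags: List[str]) -> None:
        self.tags = tags

    @classmethod
    def get_aspect_name(cls) -> str:
        return cls.ASPECT_NAME


def resolve_header(entity_urn: str, aspect: Any,
                   entity_type: Optional[str] = None) -> Tuple[str, str]:
    inferred = guess_entity_type_sketch(entity_urn)
    if entity_type is None:
        entity_type = inferred          # inference: fill in from the URN
    elif entity_type != inferred:       # cross-validation: explicit value must agree
        raise ValueError(f"entityType {entity_type!r} != URN type {inferred!r}")
    return entity_type, aspect.get_aspect_name()
```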

The wrapper also provides conversion methods:

  • make_mcp() -- Serializes the wrapper into a MetadataChangeProposalClass with generic (JSON-serialized) aspects
  • validate() -- Checks that the MCP is well-formed and all fields are consistent
  • as_workunit() -- Wraps the MCP into a MetadataWorkUnit for use in ingestion pipelines
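The sketch below imitates what make_mcp() and validate() do conceptually. It first checks that the header fields agree. It then replaces the typed aspect with a generic payload made of a JSON string and a content type. Several details are assumptions for illustration: the dict layout, the `application/json` content type, and all `*_sketch` names. The SDK itself returns a MetadataChangeProposalClass rather than a dict, and as_workunit() is not modelled here.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class TagAssociationSketch:
    tag: str  # a tag URN such as "urn:li:tag:pii"


@dataclass
class GlobalTagsSketch:
    tags: List[TagAssociationSketch] = field(default_factory=list)


def validate_sketch(entity_urn: str, entity_type: str, aspect_name: str,
                    change_type: str = "UPSERT") -> bool:
    """Return True if the header fields are mutually consistent."""
    return (
        entity_urn.startswith("urn:li:")
        and entity_urn.split(":")[2] == entity_type  # URN type matches entityType
        and bool(aspect_name)                        # aspect name is present
        and change_type == "UPSERT"                  # only UPSERT is modelled here
    )


def make_mcp_sketch(entity_urn: str, entity_type: str, aspect_name: str,
                    aspect: Any) -> Dict[str, Any]:
    """Serialize a typed aspect into a generic, JSON-encoded envelope."""
    if not validate_sketch(entity_urn, entity_type, aspect_name):
        raise ValueError("inconsistent MCP header")
    return {
        "entityUrn": entity_urn,
        "entityType": entity_type,
        "aspectName": aspect_name,
        "changeType": "UPSERT",
        # Generic aspect: an opaque JSON payload plus its content type.
        "aspect": {
            "contentType": "application/json",
            "value": json.dumps(asdict(aspect), sort_keys=True),
        },
    }


urn = "urn:li:dataset:(urn:li:dataPlatform:hive,db.orders,PROD)"
generic = make_mcp_sketch(
    urn, "dataset", "globalTags",
    GlobalTagsSketch([TagAssociationSketch("urn:li:tag:pii")]),
)
```

Serializing to a string at this boundary means the transport never needs to know the aspect's Python type. This is the "generic wire format" idea the text describes.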

Usage

Wrap metadata in a Metadata Change Proposal whenever constructed metadata objects are packaged for emission to DataHub, whichever transport is used. This step sits between constructing metadata objects (URNs and aspects) and emitting them to the backend.

Typical workflow:

  1. Construct URNs using builder functions (e.g., make_dataset_urn)
  2. Create aspect instances (e.g., OwnershipClass, GlobalTagsClass)
  3. Wrap into an MCP using MetadataChangeProposalWrapper
  4. Emit via an emitter (DataHubRestEmitter or DatahubKafkaEmitter)
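The four steps can be traced end to end with stand-ins, all of which are hypothetical. `dataset_urn_sketch` only reproduces the string shape of a DataHub dataset URN, which is the role make_dataset_urn plays. `OwnershipSketch` and `McpSketch` are simplified placeholders for OwnershipClass and MetadataChangeProposalWrapper. `InMemoryEmitterSketch` records proposals in a list instead of sending them over REST or Kafka.

```python
from dataclasses import dataclass
from typing import Any, ClassVar, Dict, List


def dataset_urn_sketch(platform: str, name: str, env: str = "PROD") -> str:
    # Step 1 (hypothetical stand-in): build a dataset URN string.
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


@dataclass
class OwnershipSketch:
    # Step 2 (hypothetical stand-in): an aspect that knows its own name.
    ASPECT_NAME: ClassVar[str] = "ownership"
    owners: List[Dict[str, str]]


@dataclass
class McpSketch:
    # Step 3 (hypothetical stand-in): the envelope, with inferred header fields.
    entityUrn: str
    aspect: Any
    changeType: str = "UPSERT"

    @property
    def entityType(self) -> str:
        return self.entityUrn.split(":")[2]

    @property
    def aspectName(self) -> str:
        return self.aspect.ASPECT_NAME


class InMemoryEmitterSketch:
    # Step 4 (hypothetical stand-in): collects proposals instead of sending them.
    def __init__(self) -> None:
        self.sent: List[McpSketch] = []

    def emit(self, mcp: McpSketch) -> None:
        self.sent.append(mcp)


urn = dataset_urn_sketch("hive", "db.orders")
ownership = OwnershipSketch(owners=[{"owner": "urn:li:corpuser:jdoe", "type": "TECHNICAL_OWNER"}])
mcp = McpSketch(entityUrn=urn, aspect=ownership)
emitter = InMemoryEmitterSketch()
emitter.emit(mcp)
```

With the real SDK, step 4 would go through one of the emitters named above. The in-memory list here just makes the result easy to inspect.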

The construct_many class method builds the MCPs for several aspects of the same entity in a single call, producing one MCP per aspect.
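Here is a sketch of such a batch constructor. It assumes the constructor takes one entity URN and a sequence of aspects and returns one proposal per aspect. Skipping `None` entries is a choice made in this sketch. `McpSketch` and `AspectSketch` are hypothetical classes, not SDK types.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Sequence


@dataclass
class AspectSketch:
    name: str               # e.g. "globalTags", "ownership"
    payload: Dict[str, Any]


@dataclass
class McpSketch:
    entityUrn: str
    aspectName: str
    aspect: AspectSketch
    changeType: str = "UPSERT"

    @classmethod
    def construct_many(cls, entityUrn: str,
                       aspects: Sequence[Optional[AspectSketch]]) -> List["McpSketch"]:
        # One single-aspect MCP per aspect, all targeting the same entity.
        # Skipping None lets callers pass optional aspects without pre-filtering.
        return [
            cls(entityUrn=entityUrn, aspectName=a.name, aspect=a)
            for a in aspects
            if a is not None
        ]


urn = "urn:li:dataset:(urn:li:dataPlatform:hive,db.orders,PROD)"
mcps = McpSketch.construct_many(urn, [
    AspectSketch("globalTags", {"tags": [{"tag": "urn:li:tag:pii"}]}),
    None,
    AspectSketch("ownership", {"owners": [{"owner": "urn:li:corpuser:jdoe"}]}),
])
```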

Theoretical Basis

This principle follows the event sourcing pattern. Metadata changes are captured as discrete proposals (events) rather than direct mutations. Each MCP is an atomic, self-describing change that can be validated independently before emission.
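Viewed as event sourcing, current metadata is a fold over the proposal log. The toy replay below keys state by (entity URN, aspect name) and lets the latest UPSERT win. It is a minimal illustration under that assumption. It ignores persistence, versioning, and everything else a real backend does. `Proposal`, `apply` and `replay` are hypothetical names.

```python
from typing import Any, Dict, Iterable, NamedTuple, Tuple


class Proposal(NamedTuple):
    entity_urn: str
    aspect_name: str
    aspect: Any
    change_type: str = "UPSERT"


State = Dict[Tuple[str, str], Any]


def apply(state: State, p: Proposal) -> State:
    # Each proposal is a discrete event; the input state is never mutated.
    if p.change_type != "UPSERT":
        raise ValueError("this sketch only models UPSERT")
    new_state = dict(state)
    new_state[(p.entity_urn, p.aspect_name)] = p.aspect
    return new_state


def replay(log: Iterable[Proposal]) -> State:
    # Rebuild current state by folding the event log from empty.
    state: State = {}
    for p in log:
        state = apply(state, p)
    return state


urn = "urn:li:dataset:(urn:li:dataPlatform:hive,db.orders,PROD)"
log = [
    Proposal(urn, "globalTags", {"tags": ["pii"]}),
    Proposal(urn, "ownership", {"owners": ["jdoe"]}),
    Proposal(urn, "globalTags", {"tags": ["pii", "gold"]}),
]
state = replay(log)
```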

Key design principles:

  • Atomicity -- Each MCP represents a single aspect change on a single entity
  • Self-description -- The MCP contains all information needed to apply the change (entity type, URN, aspect name, aspect value)
  • Idempotency -- UPSERT semantics mean the same MCP can be applied multiple times with the same result
  • Validation -- The wrapper ensures type consistency between URN, entity type, and aspect name before emission
  • Serialization transparency -- The wrapper handles JSON serialization of aspects into the generic wire format
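Idempotency and validation before emission can be checked directly on a toy store. The MCP here is a plain dict, and `validate_mcp_sketch` and `upsert_sketch` are hypothetical helpers. The sketch only shows two things. Re-applying the same UPSERT leaves the store unchanged, and an MCP whose entityType disagrees with its URN is rejected before it touches the store.

```python
import json
from typing import Any, Dict, Tuple

Store = Dict[Tuple[str, str], str]


def validate_mcp_sketch(mcp: Dict[str, Any]) -> bool:
    urn = mcp.get("entityUrn", "")
    return (
        urn.startswith("urn:li:")
        and urn.split(":")[2] == mcp.get("entityType")  # type consistency
        and bool(mcp.get("aspectName"))
        and mcp.get("changeType") == "UPSERT"
    )


def upsert_sketch(store: Store, mcp: Dict[str, Any]) -> None:
    if not validate_mcp_sketch(mcp):
        raise ValueError("rejected malformed MCP")
    # Atomicity: one MCP writes exactly one (entity, aspect) slot.
    store[(mcp["entityUrn"], mcp["aspectName"])] = json.dumps(mcp["aspect"], sort_keys=True)


mcp = {
    "entityUrn": "urn:li:dataset:(urn:li:dataPlatform:hive,db.orders,PROD)",
    "entityType": "dataset",
    "aspectName": "globalTags",
    "changeType": "UPSERT",
    "aspect": {"tags": [{"tag": "urn:li:tag:pii"}]},
}
store: Store = {}
upsert_sketch(store, mcp)
snapshot = dict(store)
upsert_sketch(store, mcp)  # applying the same UPSERT again...
assert store == snapshot   # ...yields the same state (idempotency)
```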

Related

Implementation:Datahub_project_Datahub_MetadataChangeProposalWrapper_Init

Knowledge Sources

Domains

Data_Integration, Metadata_Management
