Implementation:Datahub project Datahub Meta Proto Custom Options

Field                  Value
Implementation Name    Meta_Proto_Custom_Options
Type                   Pattern Doc
Workflow               Protobuf_Schema_Ingestion
Repository             https://github.com/datahub-project/datahub
Implements             Principle:Datahub_project_Datahub_Protobuf_Annotation
Last Updated           2026-02-09 17:00 GMT

Overview

Description

Meta Proto Custom Options defines the pattern for embedding DataHub governance metadata directly within protobuf schema files using custom protobuf options. By importing a shared meta.proto file, schema authors can annotate their messages and fields with ownership, tags, domains, deprecation, and primary key metadata. These annotations are extracted during ingestion by the DatasetVisitor and its sub-visitors and mapped to corresponding DataHub aspects.

The pattern relies on protobuf's native custom option (extension) mechanism. Because each option is declared as an extension in meta.proto, protoc checks option names and value types when the schema is compiled, and the option values are preserved in the compiled binary descriptor set.

Usage

Schema authors import meta.proto at the top of their .proto files and then apply the custom options it defines, either at the message level with option statements (option (meta.option_name) = value;) or at the field level with the bracketed field-option syntax [(meta.option_name) = value].
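To make the two scopes concrete, here is a minimal Java sketch under simplifying assumptions: FieldDef, MessageDef, and OptionScopes are hypothetical stand-ins (not DataHub or protobuf-java classes), and each option value is kept as a plain string keyed by its option name. Repeated options are not modeled, and a real implementation would read these values from the compiled descriptors rather than from maps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a field descriptor.
// Options map an option name (e.g. "meta.is_primary_key") to its literal value.
record FieldDef(String name, int number, Map<String, String> options) {}

// Hypothetical, simplified stand-in for a message descriptor.
record MessageDef(String name, Map<String, String> options, List<FieldDef> fields) {}

final class OptionScopes {
    private OptionScopes() {}

    // Message-level lookup: a value set with `option (meta.x) = ...;`,
    // or null if the message does not set that option.
    static String messageOption(MessageDef message, String optionName) {
        return message.options().get(optionName);
    }

    // Field-level lookup: names of fields annotated with
    // [(meta.is_primary_key) = true], in declaration order.
    static List<String> primaryKeyFields(MessageDef message) {
        List<String> keys = new ArrayList<>();
        for (FieldDef field : message.fields()) {
            if ("true".equals(field.options().get("meta.is_primary_key"))) {
                keys.add(field.name());
            }
        }
        return keys;
    }
}
```

For the MyEvent schema in the Signature section, messageOption(msg, "meta.domain") would return the domain URN, while primaryKeyFields(msg) would return only the id field. Keeping the two lookups separate mirrors the split between the dataset-level and field-level visitors listed under Source Location.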

Code Reference

Source Location

Visitor extraction logic:

  • metadata-integration/java/datahub-protobuf/src/main/java/datahub/protobuf/visitors/dataset/DatasetVisitor.java, lines 1-180
  • metadata-integration/java/datahub-protobuf/src/main/java/datahub/protobuf/visitors/field/SchemaFieldVisitor.java, lines 1-27

Sub-visitor classes that extract specific annotation types:

  • datahub.protobuf.visitors.dataset.OwnershipVisitor -- extracts meta.ownership annotations
  • datahub.protobuf.visitors.dataset.TagAssociationVisitor -- extracts meta.tag annotations
  • datahub.protobuf.visitors.dataset.DomainVisitor -- extracts meta.domain annotations
  • datahub.protobuf.visitors.dataset.DeprecationVisitor -- extracts meta.deprecation annotations
  • datahub.protobuf.visitors.dataset.TermAssociationVisitor -- extracts glossary term annotations
  • datahub.protobuf.visitors.dataset.InstitutionalMemoryVisitor -- extracts documentation link annotations from comments
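The delegation from a parent visitor to its sub-visitors can be pictured with the sketch below. AnnotationVisitor and CompositeDatasetVisitor are hypothetical names, option values are plain strings, and each aspect is reduced to an "AspectName:value" string, whereas the classes listed above map annotations to DataHub aspects.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sub-visitor contract: inspect a message's options
// (option name -> all values set for it) and emit aspect entries.
interface AnnotationVisitor {
    List<String> visit(Map<String, List<String>> options);
}

// Hypothetical parent visitor that delegates to registered sub-visitors.
final class CompositeDatasetVisitor {
    // LinkedHashMap keeps sub-visitors in registration order.
    private final Map<String, AnnotationVisitor> subVisitors = new LinkedHashMap<>();

    CompositeDatasetVisitor register(String name, AnnotationVisitor visitor) {
        subVisitors.put(name, visitor);
        return this;
    }

    // Runs every sub-visitor in registration order and concatenates the results.
    List<String> visit(Map<String, List<String>> options) {
        List<String> aspects = new ArrayList<>();
        for (AnnotationVisitor visitor : subVisitors.values()) {
            aspects.addAll(visitor.visit(options));
        }
        return aspects;
    }

    // Builds a sub-visitor that turns every value of one option into an aspect entry.
    static AnnotationVisitor forOption(String optionName, String aspectName) {
        return options -> options.getOrDefault(optionName, List.of()).stream()
                .map(value -> aspectName + ":" + value)
                .toList();
    }
}
```

A design point worth noting: because each sub-visitor only reads the options it cares about, adding support for a new annotation means registering one more visitor rather than changing the traversal.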

Signature

The pattern is applied at the proto schema level (not a Java API signature):

syntax = "proto3";

import "meta.proto";

message MyEvent {
  option (meta.ownership) = { type: DATAOWNER, value: "urn:li:corpuser:jdoe" };
  option (meta.tag) = "EventSchema";
  option (meta.domain) = "urn:li:domain:engineering";
  option (meta.deprecation) = "Use MyEventV2 instead";

  string id = 1 [(meta.is_primary_key) = true];
  string name = 2;
}

Import

In proto schemas:

import "meta.proto";

In Java visitor code:

import datahub.protobuf.visitors.dataset.DatasetVisitor;
import datahub.protobuf.visitors.dataset.OwnershipVisitor;
import datahub.protobuf.visitors.dataset.TagAssociationVisitor;
import datahub.protobuf.visitors.dataset.DomainVisitor;
import datahub.protobuf.visitors.field.SchemaFieldVisitor;

I/O Contract

Direction   Type                     Description
Input       Annotated .proto files   Proto schema files that import meta.proto and use custom option annotations on messages and fields.
Output      DataHub aspects          Annotations are mapped to DataHub aspect types during visitor traversal.

Available Annotations

Annotation             Type                 Scope          DataHub Aspect   Description
meta.ownership.type    OwnershipType enum   Message/File   Ownership        The type of ownership (e.g., DATAOWNER, PRODUCER, DEVELOPER).
meta.ownership.value   URN string           Message/File   Ownership        The URN of the owner (e.g., urn:li:corpuser:jdoe or urn:li:corpGroup:data-team).
meta.tag               String               Message/Field  GlobalTags       A tag string to attach to the dataset or field (e.g., "PII", "Confidential").
meta.domain            String               Message/File   Domains          A domain URN to associate the dataset with (e.g., "urn:li:domain:engineering").
meta.deprecation       String               Message        Deprecation      A deprecation message stating that the schema is deprecated and what to use instead.
meta.is_primary_key    bool                 Field          SchemaField      Marks a field as the primary key for the schema.
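One way to encode the table in code is sketched below, assuming a hypothetical MetaAnnotation enum that is not part of DataHub. It records each option's target aspect and allowed scopes, so an annotation placed in an unsupported scope (for example, meta.deprecation on a field) can be flagged. The meta.ownership.type and meta.ownership.value rows are folded into one meta.ownership entry because they are set together in a single option statement.

```java
import java.util.EnumSet;
import java.util.Optional;
import java.util.Set;

// Where an annotation may appear, following the Scope column of the table.
enum Scope { FILE, MESSAGE, FIELD }

// Hypothetical encoding of the annotation table (not a DataHub type).
enum MetaAnnotation {
    OWNERSHIP("meta.ownership", "Ownership", EnumSet.of(Scope.MESSAGE, Scope.FILE)),
    TAG("meta.tag", "GlobalTags", EnumSet.of(Scope.MESSAGE, Scope.FIELD)),
    DOMAIN("meta.domain", "Domains", EnumSet.of(Scope.MESSAGE, Scope.FILE)),
    DEPRECATION("meta.deprecation", "Deprecation", EnumSet.of(Scope.MESSAGE)),
    IS_PRIMARY_KEY("meta.is_primary_key", "SchemaField", EnumSet.of(Scope.FIELD));

    final String optionName;
    final String aspect;
    private final Set<Scope> scopes;

    MetaAnnotation(String optionName, String aspect, Set<Scope> scopes) {
        this.optionName = optionName;
        this.aspect = aspect;
        this.scopes = scopes;
    }

    // Looks up an annotation by its option name, e.g. "meta.tag".
    static Optional<MetaAnnotation> byOptionName(String name) {
        for (MetaAnnotation annotation : values()) {
            if (annotation.optionName.equals(name)) {
                return Optional.of(annotation);
            }
        }
        return Optional.empty();
    }

    // True if the table allows this annotation in the given scope.
    boolean allowedIn(Scope scope) {
        return scopes.contains(scope);
    }
}
```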

Usage Examples

Message-Level Ownership and Tags

syntax = "proto3";

import "meta.proto";

message UserEvent {
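  option (meta.ownership) = { type: DATAOWNER, value: "urn:li:corpGroup:user-platform" };
  // Setting (meta.tag) more than once requires meta.tag to be declared as a repeated extension in meta.proto.
  option (meta.tag) = "PII";
  option (meta.tag) = "UserData";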
  option (meta.domain) = "urn:li:domain:identity";

  string user_id = 1 [(meta.is_primary_key) = true];
  string email = 2;
  int64 created_at = 3;
}

This produces the following DataHub aspects:

  • Ownership: CorpGroup user-platform as DATAOWNER
  • GlobalTags: PII, UserData
  • Domains: urn:li:domain:identity
  • SchemaMetadata: Field user_id marked as primary key
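The Ownership entry above is derived from the owner URN. The sketch below splits a URN of the form urn:li:<entityType>:<id> into its entity type and id. SimpleUrn is a hypothetical helper, and URNs with nested or parenthesized keys (such as dataset URNs) are deliberately out of scope.

```java
import java.util.Optional;

// Hypothetical helper for simple URNs of the form urn:li:<entityType>:<id>.
record SimpleUrn(String entityType, String id) {

    // Returns empty if the string does not have the four expected parts.
    static Optional<SimpleUrn> parse(String urn) {
        // Limit 4 keeps any further ':' characters inside the id.
        String[] parts = urn.split(":", 4);
        if (parts.length != 4 || !parts[0].equals("urn") || !parts[1].equals("li")
                || parts[2].isEmpty() || parts[3].isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new SimpleUrn(parts[2], parts[3]));
    }

    // True for the owner entity types used in this page's examples.
    boolean isOwnerType() {
        return entityType.equals("corpuser") || entityType.equals("corpGroup");
    }
}
```

Applied to urn:li:corpGroup:user-platform, this yields entity type corpGroup and id user-platform, which matches the "CorpGroup user-platform" owner listed above.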

Deprecation Annotation

syntax = "proto3";

import "meta.proto";

message LegacyOrderEvent {
  option (meta.deprecation) = "Deprecated since 2025-01. Use OrderEventV2 instead.";

  string order_id = 1;
  string product_id = 2;
}

Comment-Based Annotations (GitHub and Slack References)

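In addition to custom options, the ingestion pipeline extracts governance metadata from proto file comments when --github_org or --slack_id is configured. Note that protoc records comments in a compiled descriptor set only when it is run with --include_source_info. For example: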

// @datahub-project/data-team
// #data-engineering
// See also: https://wiki.example.com/schemas/payment
message PaymentEvent {
  string payment_id = 1;
  int64 amount_cents = 2;
}

With --github_org=datahub-project and --slack_id=T1234, this produces:
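The three comment conventions in PaymentEvent (an @team mention, a #channel mention, and a URL) can be told apart with simple pattern matching, as in the hypothetical CommentMarkers sketch below. The regular expressions are assumptions chosen to fit this example, not DataHub's parsing rules, and the sketch stops at extracting the raw team, channel, and link text; how those are turned into DataHub metadata using --github_org and --slack_id is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical result of scanning a comment block for governance markers.
record CommentMarkers(List<String> teams, List<String> channels, List<String> links) {

    // Assumed patterns: "@org/team", "#channel" not preceded by a word char or '/', and http(s) URLs.
    private static final Pattern TEAM = Pattern.compile("@([\\w.-]+/[\\w.-]+)");
    private static final Pattern CHANNEL = Pattern.compile("(?<![\\w/])#([\\w-]+)");
    private static final Pattern LINK = Pattern.compile("https?://\\S+");

    static CommentMarkers extract(String comment) {
        return new CommentMarkers(
                findAll(TEAM, comment, 1),
                findAll(CHANNEL, comment, 1),
                findAll(LINK, comment, 0));
    }

    // Collects the given capture group of every match, in order of appearance.
    private static List<String> findAll(Pattern pattern, String text, int group) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group(group));
        }
        return matches;
    }
}
```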
