Implementation:Datahub project Datahub Meta Proto Custom Options
| Field | Value |
|---|---|
| Implementation Name | Meta_Proto_Custom_Options |
| Type | Pattern Doc |
| Workflow | Protobuf_Schema_Ingestion |
| Repository | https://github.com/datahub-project/datahub |
| Implements | Principle:Datahub_project_Datahub_Protobuf_Annotation |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
Description
Meta Proto Custom Options defines the pattern for embedding DataHub governance metadata directly within protobuf schema files using custom protobuf options. By importing a shared meta.proto file, schema authors can annotate their messages and fields with ownership, tags, domains, deprecation, and primary key metadata. These annotations are extracted during ingestion by the DatasetVisitor and its sub-visitors and mapped to corresponding DataHub aspects.
The pattern leverages protobuf's native custom option extension mechanism, ensuring that annotations are type-safe, validated at compile time, and preserved through the compilation process into the binary descriptor set.
Usage
Schema authors import meta.proto at the top of their .proto files and then apply the defined custom options at the message level (using option statements) or at the field level (using field option syntax [(meta.option_name) = value]).
Code Reference
Source Location
Visitor extraction logic:
- metadata-integration/java/datahub-protobuf/src/main/java/datahub/protobuf/visitors/dataset/DatasetVisitor.java, lines 1-180
- metadata-integration/java/datahub-protobuf/src/main/java/datahub/protobuf/visitors/field/SchemaFieldVisitor.java, lines 1-27
Sub-visitor classes that extract specific annotation types:
- datahub.protobuf.visitors.dataset.OwnershipVisitor: extracts meta.ownership annotations
- datahub.protobuf.visitors.dataset.TagAssociationVisitor: extracts meta.tag annotations
- datahub.protobuf.visitors.dataset.DomainVisitor: extracts meta.domain annotations
- datahub.protobuf.visitors.dataset.DeprecationVisitor: extracts meta.deprecation annotations
- datahub.protobuf.visitors.dataset.TermAssociationVisitor: extracts glossary term annotations
- datahub.protobuf.visitors.dataset.InstitutionalMemoryVisitor: extracts documentation link annotations from comments
Signature
The pattern is applied at the proto schema level (not a Java API signature):
import "meta.proto";
message MyEvent {
  option (meta.ownership) = { type: "DATAOWNER", value: "urn:li:corpuser:jdoe" };
  option (meta.tag) = "EventSchema";
  option (meta.domain) = "urn:li:domain:engineering";
  option (meta.deprecation) = "Use MyEventV2 instead";
  string id = 1 [(meta.is_primary_key) = true];
  string name = 2;
}
Import
In proto schemas:
import "meta.proto";
In Java visitor code:
import datahub.protobuf.visitors.dataset.DatasetVisitor;
import datahub.protobuf.visitors.dataset.OwnershipVisitor;
import datahub.protobuf.visitors.dataset.TagAssociationVisitor;
import datahub.protobuf.visitors.dataset.DomainVisitor;
import datahub.protobuf.visitors.field.SchemaFieldVisitor;
I/O Contract
| Direction | Type | Description |
|---|---|---|
| Input | Annotated .proto files | Proto schema files that import meta.proto and use custom option annotations on messages and fields. |
| Output | DataHub aspects | Annotations are mapped to DataHub aspect types during visitor traversal. |
Available Annotations
| Annotation | Type | Scope | DataHub Aspect | Description |
|---|---|---|---|---|
| meta.ownership.type | OwnershipType enum | Message/File | Ownership | The type of ownership (e.g., DATAOWNER, PRODUCER, DEVELOPER). |
| meta.ownership.value | URN string | Message/File | Ownership | The URN of the owner (e.g., urn:li:corpuser:jdoe or urn:li:corpGroup:data-team). |
| meta.tag | String | Message/Field | GlobalTags | A tag string to attach to the dataset or field (e.g., "PII", "Confidential"). |
| meta.domain | String | Message/File | Domains | A domain URN to associate the dataset with (e.g., "urn:li:domain:engineering"). |
| meta.deprecation | String | Message | Deprecation | A deprecation message indicating the schema is deprecated and what to use instead. |
| meta.is_primary_key | bool | Field | SchemaField | Marks a field as the primary key for the schema. |
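The option-to-aspect mapping in the table above can be sketched as a plain lookup. This is only an illustration of the table, not the DataHub visitor code: the class AnnotationAspects and the method aspectFor are hypothetical names, and resolving sub-fields such as meta.ownership.type to their parent option is an assumption made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative only: AnnotationAspects is a hypothetical helper, not a DataHub class.
// It mirrors the "Annotation" and "DataHub Aspect" columns of the table above.
final class AnnotationAspects {
    private static final Map<String, String> ASPECT_BY_OPTION = new LinkedHashMap<>();

    static {
        ASPECT_BY_OPTION.put("meta.ownership", "Ownership");
        ASPECT_BY_OPTION.put("meta.tag", "GlobalTags");
        ASPECT_BY_OPTION.put("meta.domain", "Domains");
        ASPECT_BY_OPTION.put("meta.deprecation", "Deprecation");
        ASPECT_BY_OPTION.put("meta.is_primary_key", "SchemaField");
    }

    private AnnotationAspects() {}

    /** Returns the aspect name for an option, or empty if the table does not list it. */
    static Optional<String> aspectFor(String optionName) {
        for (Map.Entry<String, String> entry : ASPECT_BY_OPTION.entrySet()) {
            String option = entry.getKey();
            // Assumption: sub-fields (e.g. meta.ownership.type) resolve to the parent option.
            if (optionName.equals(option) || optionName.startsWith(option + ".")) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }
}
```

For instance, AnnotationAspects.aspectFor("meta.ownership.value") resolves to Ownership, while an option that is not in the table yields an empty result.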
Usage Examples
Message-Level Ownership and Tags
syntax = "proto3";
import "meta.proto";
message UserEvent {
  option (meta.ownership) = { type: "DATAOWNER", value: "urn:li:corpGroup:user-platform" };
  option (meta.tag) = "PII";
  option (meta.tag) = "UserData";
  option (meta.domain) = "urn:li:domain:identity";
  string user_id = 1 [(meta.is_primary_key) = true];
  string email = 2;
  int64 created_at = 3;
}
This produces the following DataHub aspects:
- Ownership: CorpGroup user-platform as DATAOWNER
- GlobalTags: PII, UserData
- Domains: urn:li:domain:identity
- SchemaMetadata: Field user_id marked as primary key
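To make the Ownership line concrete, the following sketch shows how an owner URN such as urn:li:corpGroup:user-platform splits into an entity type (corpGroup) and an identifier (user-platform). It is a minimal illustration using only the Java standard library; OwnerUrn and parse are hypothetical names and do not represent how OwnershipVisitor is implemented.

```java
// Illustrative only: OwnerUrn is a hypothetical value type, not part of DataHub.
final class OwnerUrn {
    final String entityType; // e.g. "corpuser" or "corpGroup"
    final String id;         // e.g. "jdoe" or "user-platform"

    private OwnerUrn(String entityType, String id) {
        this.entityType = entityType;
        this.id = id;
    }

    /** Splits a URN of the form urn:li:<entityType>:<id> into its two variable parts. */
    static OwnerUrn parse(String urn) {
        // Limit the split to 4 parts so that any further colons stay inside the id.
        String[] parts = urn.split(":", 4);
        boolean wellFormed = parts.length == 4
                && parts[0].equals("urn")
                && parts[1].equals("li")
                && !parts[2].isEmpty()
                && !parts[3].isEmpty();
        if (!wellFormed) {
            throw new IllegalArgumentException("Not a urn:li URN: " + urn);
        }
        return new OwnerUrn(parts[2], parts[3]);
    }
}
```

Rejecting malformed values early, as this sketch does, is one possible design choice; a bare string such as "jdoe" would otherwise pass through without a recognizable owner type.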
Deprecation Annotation
syntax = "proto3";
import "meta.proto";
message LegacyOrderEvent {
  option (meta.deprecation) = "Deprecated since 2025-01. Use OrderEventV2 instead.";
  string order_id = 1;
  string product_id = 2;
}
Comment-Based Annotations (GitHub and Slack References)
In addition to custom options, the ingestion pipeline extracts governance metadata from proto file comments when --github_org or --slack_id are configured:
// @datahub-project/data-team
// #data-engineering
// See also: https://wiki.example.com/schemas/payment
message PaymentEvent {
  string payment_id = 1;
  int64 amount_cents = 2;
}
With --github_org=datahub-project and --slack_id=T1234, this produces:
- Ownership: GitHub team URL https://github.com/orgs/datahub-project/teams/data-team
- InstitutionalMemory: Slack channel link and wiki URL as documentation links
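The GitHub team mapping in this example can be sketched as a small text transformation: find @org/team mentions for the configured organization and build the team URL shown above. This is a rough approximation written for illustration, not the logic DataHub uses; GithubTeamLinks and teamUrls are hypothetical names, and both the team-name character set and the choice to ignore mentions of other organizations are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: GithubTeamLinks is a hypothetical helper, not a DataHub class.
final class GithubTeamLinks {
    private GithubTeamLinks() {}

    /** Returns one team URL per @githubOrg/team mention found in the comment text. */
    static List<String> teamUrls(String comment, String githubOrg) {
        // Assumption: team slugs consist of letters, digits, underscores, and hyphens.
        Pattern mention = Pattern.compile("@" + Pattern.quote(githubOrg) + "/([A-Za-z0-9_-]+)");
        Matcher matcher = mention.matcher(comment);
        List<String> urls = new ArrayList<>();
        while (matcher.find()) {
            urls.add("https://github.com/orgs/" + githubOrg + "/teams/" + matcher.group(1));
        }
        return urls;
    }
}
```

Called with the PaymentEvent comment block and the organization datahub-project, this returns the single team URL listed above; mentions that name a different organization produce no entries.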