Implementation:Datahub project Datahub ProtobufExtensionUtil
| Knowledge Sources | |
|---|---|
| Domains | Protobuf_Integration, Metadata_Extraction |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Description
ProtobufExtensionUtil is a utility class that extracts DataHub-specific metadata (tags, glossary terms, properties) from Protocol Buffer extension options. It defines the DataHubMetadataType enum representing the types of metadata that can be annotated on protobuf fields and messages: PROPERTY, TAG, TAG_LIST, TERM, OWNER, DOMAIN, and DEPRECATION.
Key capabilities include:
- Extension re-parsing -- Re-parses FieldDescriptorProto objects with a populated ExtensionRegistry to resolve custom extensions.
- Type filtering -- Filters option pairs by their annotated DataHubMetadataType enum value. Fields without explicit annotation default to PROPERTY.
- Tag extraction -- Extracts TagProperties from options annotated as TAG, TAG_LIST, or standard protobuf deprecated flags. Supports STRING, BOOLEAN, and ENUM Java types.
- Term extraction -- Extracts GlossaryTermAssociation objects from options annotated as TERM, supporting STRING and ENUM types.
- Property extraction -- Extracts key-value property pairs from message-type extension options.
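The type-filtering rule (match on the annotated type, and treat unannotated options as PROPERTY) can be illustrated with a minimal sketch. It uses only the Java standard library: the `Option` class and `filterByType` method are hypothetical stand-ins for the real `Pair<Descriptors.FieldDescriptor, Object>` values and `filterByDataHubType`, and it is not the actual DataHub implementation.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class TypeFilterSketch {
    // Mirrors the enum values listed in the description above.
    enum MetadataType { PROPERTY, TAG, TAG_LIST, TERM, OWNER, DOMAIN, DEPRECATION }

    // Hypothetical stand-in for a (FieldDescriptor, value) option pair.
    static final class Option {
        final String name;
        final Optional<MetadataType> annotatedType; // empty when no annotation is present
        final Object value;

        Option(String name, Optional<MetadataType> annotatedType, Object value) {
            this.name = name;
            this.annotatedType = annotatedType;
            this.value = value;
        }
    }

    // Keep options whose annotated type equals filterType;
    // options without an annotation are treated as PROPERTY.
    static List<Option> filterByType(List<Option> options, MetadataType filterType) {
        return options.stream()
            .filter(o -> o.annotatedType.orElse(MetadataType.PROPERTY) == filterType)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Option> options = List.of(
            new Option("team", Optional.empty(), "platform"),
            new Option("pii", Optional.of(MetadataType.TAG), true),
            new Option("classification", Optional.of(MetadataType.TERM), "Sensitive"));

        // Print the names of the options that fall into the PROPERTY bucket.
        for (Option o : filterByType(options, MetadataType.PROPERTY)) {
            System.out.println(o.name);
        }
    }
}
```

In this sketch the unannotated `team` option is the only one selected by a PROPERTY filter, while `pii` and `classification` are selected only by TAG and TERM filters respectively.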
Usage
Called by visitor classes (e.g., TagVisitor, ProtobufExtensionFieldVisitor, DatasetVisitor) to extract metadata from protobuf field and message options for conversion into DataHub aspects.
Code Reference
Source Location
metadata-integration/java/datahub-protobuf/src/main/java/datahub/protobuf/visitors/ProtobufExtensionUtil.java
Signature
public class ProtobufExtensionUtil {

    public static DescriptorProtos.FieldDescriptorProto extendProto(
        DescriptorProtos.FieldDescriptorProto proto, ExtensionRegistry registry)

    public static List<Pair<Descriptors.FieldDescriptor, Object>> filterByDataHubType(
        List<Pair<Descriptors.FieldDescriptor, Object>> options,
        ExtensionRegistry registry,
        DataHubMetadataType filterType)

    public static Stream<Map.Entry<String, String>> getProperties(
        Descriptors.FieldDescriptor field, DescriptorProtos.DescriptorProto value)

    public static Stream<TagProperties> extractTagPropertiesFromOptions(
        List<Pair<Descriptors.FieldDescriptor, Object>> options, ExtensionRegistry registry)

    public static Stream<GlossaryTermAssociation> extractTermAssociationsFromOptions(
        List<Pair<Descriptors.FieldDescriptor, Object>> fieldOptions, ExtensionRegistry registry)

    public enum DataHubMetadataType {
        PROPERTY, TAG, TAG_LIST, TERM, OWNER, DOMAIN, DEPRECATION;

        public static final String PROTOBUF_TYPE = "DataHubMetadataType";
    }
}
Import
import datahub.protobuf.visitors.ProtobufExtensionUtil;
import datahub.protobuf.visitors.ProtobufExtensionUtil.DataHubMetadataType;
I/O Contract
Inputs
| Method | Parameter | Type | Description |
|---|---|---|---|
| extendProto | proto | FieldDescriptorProto | The field descriptor to re-parse with extensions |
| extendProto | registry | ExtensionRegistry | Registry containing known extensions |
| filterByDataHubType | options | List<Pair<FieldDescriptor, Object>> | Option pairs to filter |
| filterByDataHubType | filterType | DataHubMetadataType | The metadata type to filter by |
| extractTagPropertiesFromOptions | options | List<Pair<FieldDescriptor, Object>> | Options to extract tags from |
| extractTermAssociationsFromOptions | fieldOptions | List<Pair<FieldDescriptor, Object>> | Options to extract glossary terms from |
Outputs
| Method | Return Type | Description |
|---|---|---|
| extendProto | FieldDescriptorProto | Re-parsed proto with resolved extensions |
| filterByDataHubType | List<Pair<FieldDescriptor, Object>> | Filtered list matching the specified DataHub metadata type |
| getProperties | Stream<Map.Entry<String, String>> | Key-value property pairs from unknown fields |
| extractTagPropertiesFromOptions | Stream<TagProperties> | Tag properties derived from TAG, TAG_LIST, and deprecated options |
| extractTermAssociationsFromOptions | Stream<GlossaryTermAssociation> | Glossary term associations derived from TERM options |
Usage Examples
// Extract tags from message options
List<Pair<Descriptors.FieldDescriptor, Object>> options =
    ProtobufUtils.getMessageOptions(messageProto);
Stream<TagProperties> tags =
    ProtobufExtensionUtil.extractTagPropertiesFromOptions(options, registry);

// Extract glossary terms from field options
List<Pair<Descriptors.FieldDescriptor, Object>> fieldOptions =
    ProtobufUtils.getFieldOptions(fieldProto);
Stream<GlossaryTermAssociation> terms =
    ProtobufExtensionUtil.extractTermAssociationsFromOptions(fieldOptions, registry);

// Filter options by DataHub metadata type
List<Pair<Descriptors.FieldDescriptor, Object>> tagOptions =
    ProtobufExtensionUtil.filterByDataHubType(options, registry, DataHubMetadataType.TAG);
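Because the extraction methods return a `Stream`, and a Java stream can only be traversed once, callers that need the results more than once may want to collect them first. The sketch below shows that pattern with standard-library types only; `extractTagNames` is a hypothetical stand-in that returns plain strings, not the real `extractTagPropertiesFromOptions`.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamReuseSketch {
    // Hypothetical stand-in for an extraction method that returns a Stream:
    // drops empty entries and trims surrounding whitespace.
    static Stream<String> extractTagNames(List<String> rawOptions) {
        return rawOptions.stream()
            .filter(s -> !s.isEmpty())
            .map(String::trim);
    }

    public static void main(String[] args) {
        // Collect once so the result can be read as often as needed.
        List<String> tags = extractTagNames(List.of(" pii ", "", "deprecated"))
            .collect(Collectors.toList());

        System.out.println(tags.size());
        System.out.println(tags);
    }
}
```

Calling a second terminal operation on the same `Stream` instance would throw `IllegalStateException`, which is why the list is materialized before it is read twice.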
Related Pages
- Datahub_project_Datahub_ProtobufDescriptorUtils -- Provides the option extraction methods consumed by this utility
- Datahub_project_Datahub_TagVisitor -- Uses extractTagPropertiesFromOptions
- Datahub_project_Datahub_ProtobufExtensionFieldVisitor -- Uses both tag and term extraction methods
- Datahub_project_Datahub_ProtobufDatasetVisitor -- Aggregates results from visitors that use this utility