Implementation:Datahub project Datahub ProtobufDatasetVisitor
| Knowledge Sources | |
|---|---|
| Domains | Protobuf_Integration, Dataset_Metadata |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Description
DatasetVisitor is the top-level visitor in the protobuf-to-DataHub conversion pipeline. It implements ProtobufModelVisitor and orchestrates multiple sub-visitors to produce a complete set of MetadataChangeProposalWrapper events for a single dataset derived from a protobuf schema.
The visitGraph method generates up to seven metadata change proposals (MCPs), one per aspect:
- datasetProperties -- Name, qualified name, description, and custom properties (optionally including a base64-encoded protoc output)
- institutionalMemory -- Links and references extracted from comments, deduplicated by URL
- globalTags -- Tags derived from protobuf extensions and annotations
- glossaryTerms -- Glossary term associations from protobuf extensions
- ownership -- Owners extracted from protobuf metadata
- domains -- Domain URNs associated with the dataset
- deprecation -- Deprecation status (emitted only if present, otherwise filtered out)
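The institutionalMemory entry above mentions deduplication by URL. A minimal sketch of one way to do that with a stream collector is shown here. `Link` is a hypothetical stand-in for InstitutionalMemoryMetadata, and the first-occurrence-wins merge rule is an assumption for illustration, not something this page confirms about DatasetVisitor.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical stand-in for InstitutionalMemoryMetadata: a URL plus a label.
record Link(String url, String description) {}

final class DedupeByUrlSketch {

  // Keeps the first Link seen for each URL; LinkedHashMap preserves encounter order.
  static List<Link> dedupeByUrl(List<Link> links) {
    return new ArrayList<>(
        links.stream()
            .collect(
                Collectors.toMap(
                    Link::url,
                    Function.identity(),
                    (first, second) -> first, // assumption: first occurrence wins
                    LinkedHashMap::new))
            .values());
  }

  public static void main(String[] args) {
    List<Link> links =
        List.of(
            new Link("https://example.com/wiki", "Team wiki"),
            new Link("https://example.com/repo", "Source repo"),
            new Link("https://example.com/wiki", "Duplicate wiki link"));
    // Only one entry per URL remains, in the original order.
    dedupeByUrl(links).forEach(link -> System.out.println(link.url() + " " + link.description()));
  }
}
```

Collecting into a LinkedHashMap rather than a plain HashMap keeps the links in the order the sub-visitors produced them, which makes the emitted aspect stable between runs.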
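Each list-valued aspect (institutional memory, dataset properties, tags, glossary terms, ownership, and domains) takes a configurable list of sub-visitors through the builder, defaulting to an empty list. The description and deprecation aspects each take a single visitor, defaulting to DescriptionVisitor and DeprecationVisitor respectively.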
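To illustrate the empty-list default, the following sketch uses a hand-written builder. The real class is annotated with Lombok's @Builder, so this is only an approximation of the behaviour; `BuilderDefaultsSketch` and the use of plain strings in place of sub-visitors are hypothetical.

```java
import java.util.List;

// Hand-written builder standing in for the Lombok-generated one.
final class BuilderDefaultsSketch {
  private final List<String> tagVisitors;
  private final List<String> ownershipVisitors;

  private BuilderDefaultsSketch(Builder builder) {
    this.tagVisitors = builder.tagVisitors;
    this.ownershipVisitors = builder.ownershipVisitors;
  }

  static Builder builder() {
    return new Builder();
  }

  List<String> tagVisitors() {
    return tagVisitors;
  }

  List<String> ownershipVisitors() {
    return ownershipVisitors;
  }

  static final class Builder {
    // Lists that are never set stay empty, so the matching aspect has no entries.
    private List<String> tagVisitors = List.of();
    private List<String> ownershipVisitors = List.of();

    Builder tagVisitors(List<String> visitors) {
      this.tagVisitors = List.copyOf(visitors);
      return this;
    }

    Builder ownershipVisitors(List<String> visitors) {
      this.ownershipVisitors = List.copyOf(visitors);
      return this;
    }

    BuilderDefaultsSketch build() {
      return new BuilderDefaultsSketch(this);
    }
  }

  public static void main(String[] args) {
    BuilderDefaultsSketch visitor =
        BuilderDefaultsSketch.builder().tagVisitors(List.of("tagVisitor")).build();
    System.out.println(visitor.tagVisitors()); // the configured list
    System.out.println(visitor.ownershipVisitors()); // the empty default
  }
}
```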
Usage
DatasetVisitor is instantiated and configured with the appropriate sub-visitors during protobuf schema ingestion. The ProtobufGraph calls visitGraph to produce the metadata change proposals, which are then emitted to DataHub.
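The orchestration described here can be sketched without any DataHub dependencies. In this simplified model, `Aspect` is a hypothetical stand-in for MetadataChangeProposalWrapper, sub-visitors are plain functions over a schema name rather than a ProtobufGraph, and only three of the seven aspects are modelled. It shows the general shape (fan out to sub-visitors, emit one item per aspect, drop deprecation when absent), not the actual implementation.

```java
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Stream;

// Hypothetical stand-in for MetadataChangeProposalWrapper: an aspect name plus its values.
record Aspect(String name, List<String> values) {}

final class OrchestrationSketch {
  // A sub-visitor maps a context (here just a schema name) to zero or more values.
  private final List<Function<String, Stream<String>>> tagVisitors;
  private final List<Function<String, Stream<String>>> ownershipVisitors;
  private final Function<String, Optional<String>> deprecationVisitor;

  OrchestrationSketch(
      List<Function<String, Stream<String>>> tagVisitors,
      List<Function<String, Stream<String>>> ownershipVisitors,
      Function<String, Optional<String>> deprecationVisitor) {
    this.tagVisitors = tagVisitors;
    this.ownershipVisitors = ownershipVisitors;
    this.deprecationVisitor = deprecationVisitor;
  }

  // Runs every sub-visitor in the list and gathers their results into one list.
  private static List<String> run(
      List<Function<String, Stream<String>>> visitors, String context) {
    return visitors.stream().flatMap(visitor -> visitor.apply(context)).toList();
  }

  Stream<Aspect> visitGraph(String context) {
    Aspect deprecation =
        deprecationVisitor
            .apply(context)
            .map(note -> new Aspect("deprecation", List.of(note)))
            .orElse(null);
    return Stream.of(
            new Aspect("globalTags", run(tagVisitors, context)),
            new Aspect("ownership", run(ownershipVisitors, context)),
            deprecation)
        .filter(Objects::nonNull); // drops the deprecation slot when nothing was found
  }

  // Builds a demo instance; these sub-visitors are made up for illustration.
  static OrchestrationSketch demo(boolean deprecated) {
    return new OrchestrationSketch(
        List.of(ctx -> Stream.of("tag:" + ctx)),
        List.of(),
        ctx -> deprecated ? Optional.of("use v2 instead") : Optional.empty());
  }

  public static void main(String[] args) {
    demo(true)
        .visitGraph("Person")
        .forEach(aspect -> System.out.println(aspect.name() + " " + aspect.values()));
  }
}
```

Note that the always-present aspects are still emitted when their sub-visitor list is empty; only the optional one is filtered out.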
Code Reference
Source Location
metadata-integration/java/datahub-protobuf/src/main/java/datahub/protobuf/visitors/dataset/DatasetVisitor.java
Signature
@Builder
@AllArgsConstructor
public class DatasetVisitor
    implements ProtobufModelVisitor<MetadataChangeProposalWrapper<? extends RecordTemplate>> {
  private final List<ProtobufModelVisitor<InstitutionalMemoryMetadata>>
      institutionalMemoryMetadataVisitors;
  private final List<ProtobufModelVisitor<DatasetProperties>> datasetPropertyVisitors;
  private final List<ProtobufModelVisitor<TagAssociation>> tagAssociationVisitors;
  private final List<ProtobufModelVisitor<GlossaryTermAssociation>> termAssociationVisitors;
  private final List<ProtobufModelVisitor<Owner>> ownershipVisitors;
  private final List<ProtobufModelVisitor<com.linkedin.common.urn.Urn>> domainVisitors;
  private final String protocBase64;
  private final ProtobufModelVisitor<String> descriptionVisitor;
  private final ProtobufModelVisitor<Deprecation> deprecationVisitor;
  private final boolean enableProtocCustomProperty;

  @Override
  public Stream<MetadataChangeProposalWrapper<? extends RecordTemplate>> visitGraph(
      VisitContext context)
}
Import
import datahub.protobuf.visitors.dataset.DatasetVisitor;
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| institutionalMemoryMetadataVisitors | List<ProtobufModelVisitor<InstitutionalMemoryMetadata>> | Visitors that extract institutional memory entries |
| datasetPropertyVisitors | List<ProtobufModelVisitor<DatasetProperties>> | Visitors that extract custom dataset properties |
| tagAssociationVisitors | List<ProtobufModelVisitor<TagAssociation>> | Visitors that extract tag associations |
| termAssociationVisitors | List<ProtobufModelVisitor<GlossaryTermAssociation>> | Visitors that extract glossary term associations |
| ownershipVisitors | List<ProtobufModelVisitor<Owner>> | Visitors that extract ownership information |
| domainVisitors | List<ProtobufModelVisitor<Urn>> | Visitors that extract domain URNs |
| descriptionVisitor | ProtobufModelVisitor<String> | Visitor that generates a description (defaults to DescriptionVisitor) |
| deprecationVisitor | ProtobufModelVisitor<Deprecation> | Visitor that extracts deprecation info (defaults to DeprecationVisitor) |
| enableProtocCustomProperty | boolean | Whether to include the base64-encoded protoc output as a custom property |
| protocBase64 | String | Base64-encoded protoc output (only used if enableProtocCustomProperty is true) |
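The interaction between enableProtocCustomProperty and protocBase64 can be sketched as a small merge step. This is an illustration under assumptions: the method name `customProperties`, the "protoc" key, and the null check are hypothetical and not taken from the DatasetVisitor source.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CustomPropertiesSketch {

  // Merges properties from the sub-visitors and, when enabled, adds the encoded protoc output.
  static Map<String, String> customProperties(
      boolean enableProtocCustomProperty,
      String protocBase64,
      List<Map<String, String>> visitorProperties) {
    Map<String, String> merged = new LinkedHashMap<>();
    if (enableProtocCustomProperty && protocBase64 != null) {
      merged.put("protoc", protocBase64); // assumption: the key name is illustrative
    }
    // Later maps overwrite earlier ones when they share a key.
    visitorProperties.forEach(merged::putAll);
    return merged;
  }

  public static void main(String[] args) {
    // Placeholder bytes; a real caller would encode the protoc descriptor output.
    String encoded =
        Base64.getEncoder().encodeToString("descriptor-bytes".getBytes(StandardCharsets.UTF_8));
    List<Map<String, String>> fromVisitors = List.of(Map.of("team", "data"));
    System.out.println(customProperties(true, encoded, fromVisitors).keySet());
    System.out.println(customProperties(false, encoded, fromVisitors).keySet());
  }
}
```

Keeping the flag separate from the payload lets a caller always pass the encoded descriptor while deciding per deployment whether the potentially large value is stored on the dataset.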
Outputs
| Return Type | Description |
|---|---|
| Stream<MetadataChangeProposalWrapper<? extends RecordTemplate>> | A stream of up to 7 MCPs covering datasetProperties, institutionalMemory, globalTags, glossaryTerms, ownership, domains, and optionally deprecation |
Usage Examples
// slackId, githubOrg, the sub-visitor instances, encodedProtoc, and visitContext
// are assumed to be defined by the caller.
DatasetVisitor visitor =
    DatasetVisitor.builder()
        .institutionalMemoryMetadataVisitors(
            List.of(new InstitutionalMemoryVisitor(slackId, githubOrg)))
        .tagAssociationVisitors(List.of(tagVisitor))
        .termAssociationVisitors(List.of(termVisitor))
        .ownershipVisitors(List.of(ownerVisitor))
        .domainVisitors(List.of(domainVisitor))
        .enableProtocCustomProperty(true)
        .protocBase64(encodedProtoc)
        .build();

// Produces up to seven MCPs for the dataset described by visitContext.
Stream<MetadataChangeProposalWrapper<? extends RecordTemplate>> mcps =
    visitor.visitGraph(visitContext);
Related Pages
- Datahub_project_Datahub_InstitutionalMemoryVisitor -- Sub-visitor for institutional memory
- Datahub_project_Datahub_TagVisitor -- Sub-visitor for tags
- Datahub_project_Datahub_ProtobufExtensionFieldVisitor -- Visitor for field-level schema metadata
- Datahub_project_Datahub_MetadataChangeProposalWrapper_Java -- The output wrapper type