Implementation:Datahub project Datahub ProtobufDatasetVisitor

From Leeroopedia


Knowledge Sources
Domains Protobuf_Integration, Dataset_Metadata
Last Updated 2026-02-10 00:00 GMT

Overview

Description

DatasetVisitor is the top-level visitor in the protobuf-to-DataHub conversion pipeline. It implements ProtobufModelVisitor and orchestrates multiple sub-visitors to produce a complete set of MetadataChangeProposalWrapper events for a single dataset derived from a protobuf schema.

The visitGraph method emits up to seven MetadataChangeProposalWrapper events (MCPs), one for each of the following aspects:

  1. datasetProperties -- Name, qualified name, description, and custom properties (optionally including a base64-encoded protoc output)
  2. institutionalMemory -- Links and references extracted from comments, deduplicated by URL
  3. globalTags -- Tags derived from protobuf extensions and annotations
  4. glossaryTerms -- Glossary term associations from protobuf extensions
  5. ownership -- Owners extracted from protobuf metadata
  6. domains -- Domain URNs associated with the dataset
  7. deprecation -- Deprecation status (emitted only if present, otherwise filtered out)
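
The "deduplicated by URL" step for institutionalMemory can be pictured with a minimal sketch. The Link class and the dedupByUrl helper are hypothetical stand-ins, not DataHub types, and keeping the first entry seen for each URL is an assumption; the page does not say which duplicate wins.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DedupByUrlSketch {

    /** Hypothetical stand-in for an institutional-memory entry: a URL plus a label. */
    static final class Link {
        final String url;
        final String description;

        Link(String url, String description) {
            this.url = url;
            this.description = description;
        }
    }

    /** Keeps the first link seen for each URL (an assumption), preserving encounter order. */
    static List<Link> dedupByUrl(List<Link> links) {
        Map<String, Link> byUrl = new LinkedHashMap<>();
        for (Link link : links) {
            byUrl.putIfAbsent(link.url, link);
        }
        return new ArrayList<>(byUrl.values());
    }

    public static void main(String[] args) {
        List<Link> links = List.of(
            new Link("https://example.com/a", "Team channel"),
            new Link("https://example.com/b", "Repository"),
            new Link("https://example.com/a", "Team channel (duplicate)"));
        // Two entries remain: one per distinct URL.
        for (Link link : dedupByUrl(links)) {
            System.out.println(link.url + " -> " + link.description);
        }
    }
}
```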

Each aspect type has a configurable list of sub-visitors provided through the builder pattern, with empty lists as defaults.
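
To illustrate what empty-list defaults mean for a caller, here is a hand-written builder sketch. It is not the DatasetVisitor source (which relies on Lombok's @Builder); the Visitor and Builder classes are hypothetical, and the sub-visitors are reduced to plain strings to keep the example self-contained.

```java
import java.util.List;

final class BuilderDefaultsSketch {

    /** Hypothetical, simplified holder with two of the sub-visitor lists. */
    static final class Visitor {
        final List<String> tagVisitors;
        final List<String> ownerVisitors;

        private Visitor(Builder b) {
            this.tagVisitors = b.tagVisitors;
            this.ownerVisitors = b.ownerVisitors;
        }

        static Builder builder() {
            return new Builder();
        }
    }

    static final class Builder {
        // Defaults: an unset list means "no sub-visitors" rather than null.
        private List<String> tagVisitors = List.of();
        private List<String> ownerVisitors = List.of();

        Builder tagVisitors(List<String> v) { this.tagVisitors = v; return this; }
        Builder ownerVisitors(List<String> v) { this.ownerVisitors = v; return this; }
        Visitor build() { return new Visitor(this); }
    }

    public static void main(String[] args) {
        // Only tagVisitors is configured; ownerVisitors falls back to the empty default.
        Visitor v = Visitor.builder().tagVisitors(List.of("tagVisitor")).build();
        System.out.println(v.tagVisitors.size());      // prints 1
        System.out.println(v.ownerVisitors.isEmpty()); // prints true
    }
}
```

With this shape, code that iterates over an unconfigured list simply does nothing, so callers only need to set the visitors they care about.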

Usage

DatasetVisitor is instantiated and configured with the appropriate sub-visitors during protobuf schema ingestion. The ProtobufGraph calls visitGraph to produce the metadata change proposals, which are then emitted to DataHub.

Code Reference

Source Location

metadata-integration/java/datahub-protobuf/src/main/java/datahub/protobuf/visitors/dataset/DatasetVisitor.java

Signature

@Builder
@AllArgsConstructor
public class DatasetVisitor
    implements ProtobufModelVisitor<MetadataChangeProposalWrapper<? extends RecordTemplate>> {

    private final List<ProtobufModelVisitor<InstitutionalMemoryMetadata>>
        institutionalMemoryMetadataVisitors;
    private final List<ProtobufModelVisitor<DatasetProperties>> datasetPropertyVisitors;
    private final List<ProtobufModelVisitor<TagAssociation>> tagAssociationVisitors;
    private final List<ProtobufModelVisitor<GlossaryTermAssociation>> termAssociationVisitors;
    private final List<ProtobufModelVisitor<Owner>> ownershipVisitors;
    private final List<ProtobufModelVisitor<com.linkedin.common.urn.Urn>> domainVisitors;
    private final String protocBase64;
    private final ProtobufModelVisitor<String> descriptionVisitor;
    private final ProtobufModelVisitor<Deprecation> deprecationVisitor;
    private final boolean enableProtocCustomProperty;

    @Override
    public Stream<MetadataChangeProposalWrapper<? extends RecordTemplate>> visitGraph(
        VisitContext context);  // signature only; method body omitted
}

Import

import datahub.protobuf.visitors.dataset.DatasetVisitor;

I/O Contract

Inputs

Parameter | Type | Description
institutionalMemoryMetadataVisitors | List<ProtobufModelVisitor<InstitutionalMemoryMetadata>> | Visitors that extract institutional memory entries
datasetPropertyVisitors | List<ProtobufModelVisitor<DatasetProperties>> | Visitors that extract custom dataset properties
tagAssociationVisitors | List<ProtobufModelVisitor<TagAssociation>> | Visitors that extract tag associations
termAssociationVisitors | List<ProtobufModelVisitor<GlossaryTermAssociation>> | Visitors that extract glossary term associations
ownershipVisitors | List<ProtobufModelVisitor<Owner>> | Visitors that extract ownership information
domainVisitors | List<ProtobufModelVisitor<Urn>> | Visitors that extract domain URNs
descriptionVisitor | ProtobufModelVisitor<String> | Visitor that generates a description (defaults to DescriptionVisitor)
deprecationVisitor | ProtobufModelVisitor<Deprecation> | Visitor that extracts deprecation info (defaults to DeprecationVisitor)
enableProtocCustomProperty | boolean | Whether to include the base64-encoded protoc output as a custom property
protocBase64 | String | Base64-encoded protoc output (used only if enableProtocCustomProperty is true)

Outputs

Return Type | Description
Stream<MetadataChangeProposalWrapper<? extends RecordTemplate>> | A stream of up to 7 MCPs covering datasetProperties, institutionalMemory, globalTags, glossaryTerms, ownership, domains, and optionally deprecation
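
The "up to 7" bound comes from the optional deprecation aspect. The sketch below shows one common way to express "emit only if present" with a stream: build all candidates, then filter out the absent one. It uses aspect names as plain strings instead of MetadataChangeProposalWrapper objects, and the aspectNames helper is hypothetical; the real visitGraph may be structured differently.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class OptionalAspectSketch {

    /** Returns the aspect names that would be emitted; deprecation only when present. */
    static List<String> aspectNames(boolean deprecated) {
        // A null candidate models "no deprecation information found".
        String deprecation = deprecated ? "deprecation" : null;
        return Stream.of(
                "datasetProperties",
                "institutionalMemory",
                "globalTags",
                "glossaryTerms",
                "ownership",
                "domains",
                deprecation)
            .filter(Objects::nonNull) // drop the absent optional aspect
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(aspectNames(false).size()); // prints 6
        System.out.println(aspectNames(true).size());  // prints 7
    }
}
```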

Usage Examples

// slackId, githubOrg, tagVisitor, termVisitor, ownerVisitor, domainVisitor,
// encodedProtoc, and visitContext are assumed to be defined by the caller.
DatasetVisitor visitor = DatasetVisitor.builder()
    .institutionalMemoryMetadataVisitors(List.of(new InstitutionalMemoryVisitor(slackId, githubOrg)))
    .tagAssociationVisitors(List.of(tagVisitor))
    .termAssociationVisitors(List.of(termVisitor))
    .ownershipVisitors(List.of(ownerVisitor))
    .domainVisitors(List.of(domainVisitor))
    .enableProtocCustomProperty(true)
    .protocBase64(encodedProtoc)
    .build();

// Sub-visitor lists that are not set (here, datasetPropertyVisitors) default to empty.
Stream<MetadataChangeProposalWrapper<? extends RecordTemplate>> mcps =
    visitor.visitGraph(visitContext);
