Implementation:Mlflow Mlflow Proto Plugin
| Knowledge Sources | |
|---|---|
| Domains | CodeGeneration, Protobuf |
| Last Updated | 2026-02-13 20:00 GMT |
Overview
A custom protoc plugin that implements the protoc plugin protocol to generate a JSON documentation file (protos.json). The file describes protobuf services, messages, enums, and RPC methods.
Description
This script implements the protoc plugin protocol: it reads a CodeGeneratorRequest from stdin, processes each proto file descriptor, and writes a CodeGeneratorResponse containing the generated protos.json file to stdout.
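Below is a minimal sketch of that stdin/stdout protocol. It uses `plugin_pb2` from the `protobuf` package. The JSON it emits is a per-file list of message and service names. That layout is a simplified, hypothetical stand-in for the real protos.json schema. `build_response` is an illustrative name, not a function from dev/proto_plugin.py.

```python
import json
import sys

from google.protobuf.compiler import plugin_pb2


def build_response(request: plugin_pb2.CodeGeneratorRequest) -> plugin_pb2.CodeGeneratorResponse:
    # Only files named in file_to_generate were requested on the command line;
    # proto_file also contains their transitive dependencies.
    requested = set(request.file_to_generate)
    summary = {
        f.name: {
            "messages": [m.name for m in f.message_type],
            "services": [s.name for s in f.service],
        }
        for f in request.proto_file
        if f.name in requested
    }
    response = plugin_pb2.CodeGeneratorResponse()
    out = response.file.add()  # protoc writes this file under --<name>_out
    out.name = "protos.json"
    out.content = json.dumps(summary, indent=2)
    return response


def main() -> None:
    # protoc pipes a serialized CodeGeneratorRequest to stdin and expects a
    # serialized CodeGeneratorResponse on stdout.
    request = plugin_pb2.CodeGeneratorRequest.FromString(sys.stdin.buffer.read())
    sys.stdout.buffer.write(build_response(request).SerializeToString())
```

The request-to-response logic lives in a pure function, so it can be exercised without protoc. A plugin script built this way would end with `if __name__ == "__main__": main()`.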
The script defines the following dataclasses to represent the protobuf schema:
- ProtoMessageField -- A message field with description, type, visibility, deprecation status, and oneof support.
- ProtoMessage -- A message containing fields, nested enums, and nested messages.
- ProtoEnumValue / ProtoEnum -- Enum values and their parent enum types.
- ProtoServiceMethod -- An RPC method with request/response paths and RPC options.
- ProtoService -- A service containing methods.
- DatabricksRpcOptionsDescription -- Holds the Databricks-specific RPC options read from custom extensions: endpoint path, HTTP method, visibility, API version (since_major/since_minor), error codes, and documentation title.
- ProtoFile / ProtoAllContent -- Top-level containers for the output structure.
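The sketch below shows how two of these dataclasses might be declared and serialized to JSON. It is meant to make the shape of the output concrete. The `Visibility` values come from the Signature section. Every field name on `ProtoMessageField` and `ProtoMessage`, and the `to_json` helper, are assumptions for illustration.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Optional


class Visibility(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    PUBLIC_UNDOCUMENTED = "public_undocumented"
    PUBLIC_UNDOCUMENTED_READ_ONLY = "public_undocumented_read_only"


# Hypothetical field names; the real dataclasses may differ.
@dataclass
class ProtoMessageField:
    name: str
    type: str
    description: str = ""
    visibility: Visibility = Visibility.PUBLIC
    deprecated: bool = False
    oneof_name: Optional[str] = None


@dataclass
class ProtoMessage:
    name: str
    fields: list[ProtoMessageField] = field(default_factory=list)
    nested_messages: list[ProtoMessage] = field(default_factory=list)


def to_json(obj) -> str:
    # asdict() recurses into nested dataclasses and lists; json cannot encode
    # Enum members, so the default= hook replaces them with their string value.
    return json.dumps(asdict(obj), default=lambda o: o.value if isinstance(o, Enum) else str(o))
```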
The ProtobufDocGenerator class handles the core processing logic, with methods for extracting field types, visibility, documentation from source code comments, and RPC options from custom Databricks extensions. It uses descriptor_pb2 for protobuf introspection and databricks_pb2 for custom MLflow extensions.
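protoc passes source comments in `SourceCodeInfo`. Each location is keyed by a path of alternating descriptor field numbers and element indices. The sketch below shows one way to look up field and method comments by path. The constants are field numbers from descriptor.proto. `comment_index`, `field_comment`, and `method_comment` are hypothetical helpers, not the plugin's actual methods.

```python
from __future__ import annotations

from google.protobuf import descriptor_pb2

# Field numbers from descriptor.proto used as path components.
FILE_MESSAGE_TYPE = 4  # FileDescriptorProto.message_type
FILE_SERVICE = 6       # FileDescriptorProto.service
MESSAGE_FIELD = 2      # DescriptorProto.field
SERVICE_METHOD = 2     # ServiceDescriptorProto.method


def comment_index(file: descriptor_pb2.FileDescriptorProto) -> dict[tuple[int, ...], str]:
    """Map each SourceCodeInfo path to its stripped leading comment."""
    return {
        tuple(loc.path): loc.leading_comments.strip()
        for loc in file.source_code_info.location
        if loc.leading_comments
    }


def field_comment(index: dict, msg_index: int, field_index: int) -> str:
    # Path of file.message_type[msg_index].field[field_index]
    return index.get((FILE_MESSAGE_TYPE, msg_index, MESSAGE_FIELD, field_index), "")


def method_comment(index: dict, service_index: int, method_index: int) -> str:
    # Path of file.service[service_index].method[method_index]
    return index.get((FILE_SERVICE, service_index, SERVICE_METHOD, method_index), "")
```

Nested messages extend the same path, for example with DescriptorProto.nested_type (field number 3) and an index.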
The ProtocPlugin class orchestrates the request processing, iterating through requested files and producing the JSON output.
Usage
protoc invokes this script as a plugin, through the dev/proto-plugin.sh wrapper, when dev/generate_protos.py runs the proto generation pipeline. It is not normally run directly.
Code Reference
Source Location
- Repository: Mlflow_Mlflow
- File: dev/proto_plugin.py
- Lines: 1-558
Signature
class Visibility(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    PUBLIC_UNDOCUMENTED = "public_undocumented"
    PUBLIC_UNDOCUMENTED_READ_ONLY = "public_undocumented_read_only"

class ProtobufDocGenerator:
    def get_field_type_name(self, field: descriptor_pb2.FieldDescriptorProto) -> str: ...
    def get_visibility(self, options) -> Visibility: ...
    def extract_rpc_options(self, options) -> DatabricksRpcOptionsDescription | None: ...
    def process_field(self, field, parent_path, field_index, source_info, message_path) -> ProtoMessageField: ...
    def process_enum(self, enum, parent_path, enum_index, source_info, parent_path_numbers, is_nested=False) -> ProtoEnum: ...
    def process_message(self, msg, parent_path, msg_index, source_info, parent_path_numbers=None, is_nested=False) -> ProtoMessage: ...
    def process_service(self, service, parent_path, service_index, source_info) -> ProtoService: ...
    def process_file(self, file, requested_vis) -> ProtoFile: ...

class ProtocPlugin:
    def process_request(self, request: plugin_pb2.CodeGeneratorRequest) -> plugin_pb2.CodeGeneratorResponse: ...

def main(): ...
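The body of `get_field_type_name` is not reproduced here. One plausible approach is sketched below. Scalar types are named via the `FieldDescriptorProto.Type` enum, and the leading dot is stripped from the fully qualified `type_name` of message and enum fields. The function name `field_type_name` and the `array<...>` rendering of repeated fields are assumptions, not necessarily what protos.json uses.

```python
from google.protobuf import descriptor_pb2

FDP = descriptor_pb2.FieldDescriptorProto


def field_type_name(field: descriptor_pb2.FieldDescriptorProto) -> str:
    if field.type in (FDP.TYPE_MESSAGE, FDP.TYPE_ENUM):
        # Message/enum fields carry a fully qualified name like ".mlflow.Run".
        base = field.type_name.lstrip(".")
    else:
        # Scalars: TYPE_STRING -> "string", TYPE_INT64 -> "int64", ...
        base = FDP.Type.Name(field.type)[len("TYPE_"):].lower()
    if field.label == FDP.LABEL_REPEATED:
        # Hypothetical rendering for repeated fields.
        return f"array<{base}>"
    return base
```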
Import
# This script is invoked by protoc as a plugin, not run directly.
# protoc calls it through the dev/proto-plugin.sh wrapper when you run:
python dev/generate_protos.py
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| stdin | binary | Yes | Serialized protobuf CodeGeneratorRequest containing file descriptors from protoc |
Outputs
| Name | Type | Description |
|---|---|---|
| stdout | binary | Serialized protobuf CodeGeneratorResponse containing the generated protos.json file |
| protos.json | JSON (embedded) | Complete documentation of all services, messages, enums, methods, and RPC options |
Usage Examples
Basic Usage
# Typically invoked via the generate_protos.py pipeline:
python dev/generate_protos.py
# The plugin is called by protoc internally:
# protoc --plugin=protoc-gen-doc=dev/proto-plugin.sh --doc_out=mlflow/protos ...