
Implementation:Ucbepic Docetl BaseSchemas

From Leeroopedia


Knowledge Sources
Domains: Data_Processing, Schema_Validation
Last Updated: 2026-02-08 00:00 GMT

Overview

Concrete tool, provided by DocETL, consisting of the foundational Pydantic data models that define the structure of DocETL pipelines.

Description

The base_schemas module defines the core Pydantic models used to represent and validate DocETL pipeline configurations. It includes ToolFunction and Tool for LLM tool definitions, ParsingTool for custom data parsing functions, PipelineStep for individual processing steps with their operations, PipelineOutput for output configuration (type, path, intermediate directory), and PipelineSpec that composes steps and output into a complete pipeline specification. These models are used throughout the codebase to ensure pipeline configurations conform to expected structures.
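As a rough illustration of the tool-related models, the sketch below builds a Tool (wrapping a ToolFunction) and a ParsingTool. It redefines the three classes locally, mirroring the signatures listed on this page, so that it runs without DocETL installed; in a real project they would be imported from docetl.base_schemas. The tool name, the code strings, and the JSON-Schema-style layout of parameters are assumptions made for illustration, and the models only check that the fields have the declared types.

```python
from typing import Any

from pydantic import BaseModel


# Local stand-ins that mirror the signatures on this page (assumption:
# the real classes in docetl.base_schemas have the same fields).
class ToolFunction(BaseModel):
    name: str
    description: str
    parameters: dict[str, Any]


class Tool(BaseModel):
    code: str
    function: ToolFunction


class ParsingTool(BaseModel):
    name: str
    function_code: str


# Hypothetical LLM tool: the code string and the parameter layout are
# illustrative, not taken from DocETL.
word_count_tool = Tool(
    code="def word_count(text):\n    return {'count': len(text.split())}",
    function=ToolFunction(
        name="word_count",
        description="Count the words in a piece of text.",
        parameters={
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
    ),
)

# Hypothetical parsing tool: function_code is stored as a plain string.
upper_parser = ParsingTool(
    name="uppercase_parser",
    function_code="def uppercase_parser(item):\n    return [item['text'].upper()]",
)

# A nested dict is also accepted for the `function` field and is
# converted into a ToolFunction instance during validation.
tool_from_dict = Tool(
    code="def noop():\n    return {}",
    function={"name": "noop", "description": "Do nothing.", "parameters": {}},
)

print(word_count_tool.function.name)
print(type(tool_from_dict.function).__name__)
```

Because Tool nests ToolFunction, the function metadata is validated together with the tool itself, which is the same composition pattern PipelineSpec uses for steps and output.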

Usage

Use these schemas when parsing, validating, or constructing DocETL pipeline configurations programmatically. They are the canonical type definitions for pipeline structure elements.
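One way to picture the parsing and validation use case is to feed a plain dictionary, such as one loaded from a YAML or JSON file, into PipelineSpec. The sketch below assumes the field definitions shown in the Signature section and redefines the models locally so it runs without DocETL installed. The step names, operation names, and the shape of the dict-style operation entry are hypothetical.

```python
from typing import Any, Optional, Union

from pydantic import BaseModel


# Local stand-ins mirroring the signatures on this page.
# Optional[str] is equivalent to `str | None`.
class PipelineStep(BaseModel):
    name: str
    operations: list[Union[dict[str, Any], str]]
    input: Optional[str] = None


class PipelineOutput(BaseModel):
    type: str
    path: str
    intermediate_dir: Optional[str] = None


class PipelineSpec(BaseModel):
    steps: list[PipelineStep]
    output: PipelineOutput


# Hypothetical configuration, e.g. the result of parsing a config file.
config = {
    "steps": [
        {
            "name": "extract_step",
            "input": "raw_documents",
            "operations": ["extract_entities"],
        },
        {
            "name": "summarize_step",
            # Operations may mix plain names and config dicts; the dict
            # layout used here is illustrative only.
            "operations": ["summarize", {"classify_entities": {"some_option": True}}],
        },
    ],
    "output": {"type": "file", "path": "results.json"},
}

# Nested dicts are validated and converted into model instances.
spec = PipelineSpec(**config)

for step in spec.steps:
    print(step.name, step.input, step.operations)
print(spec.output.type, spec.output.path, spec.output.intermediate_dir)
```

After validation, the rest of the code can rely on attribute access (spec.steps, spec.output.path) instead of checking dictionary keys by hand.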

Code Reference

Source Location

Signature

class ToolFunction(BaseModel):
    name: str
    description: str
    parameters: dict[str, Any]

class Tool(BaseModel):
    code: str
    function: ToolFunction

class ParsingTool(BaseModel):
    name: str
    function_code: str

class PipelineStep(BaseModel):
    name: str
    operations: list[dict[str, Any] | str]
    input: str | None = None

class PipelineOutput(BaseModel):
    type: str
    path: str
    intermediate_dir: str | None = None

class PipelineSpec(BaseModel):
    steps: list[PipelineStep]
    output: PipelineOutput

Import

from docetl.base_schemas import (
    ToolFunction,
    Tool,
    ParsingTool,
    PipelineStep,
    PipelineOutput,
    PipelineSpec,
)

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| name | str | Yes | Name of the tool, parsing tool, or pipeline step |
| operations | list[dict or str] | Yes | List of operation names or operation config dicts for a step |
| input | str or None | No | Input dataset name or previous step name (None uses previous step output) |
| type | str | Yes | Output type (e.g., "file") |
| path | str | Yes | Output file path |
| intermediate_dir | str or None | No | Directory for intermediate results |
| function_code | str | Yes | Python code defining a parsing function (for ParsingTool) |
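The Required column can be checked directly against the models. The following sketch, again using local stand-ins that mirror the signatures on this page, shows that optional fields default to None and that omitting a required field raises a Pydantic ValidationError. The helper invalid_fields is a hypothetical convenience written for this example, not part of DocETL.

```python
from typing import Any, Optional, Union

from pydantic import BaseModel, ValidationError


# Local stand-ins mirroring the signatures on this page.
class PipelineStep(BaseModel):
    name: str
    operations: list[Union[dict[str, Any], str]]
    input: Optional[str] = None


class PipelineOutput(BaseModel):
    type: str
    path: str
    intermediate_dir: Optional[str] = None


def invalid_fields(model: type[BaseModel], data: dict[str, Any]) -> list[str]:
    """Return the sorted top-level field names that failed validation."""
    try:
        model(**data)
    except ValidationError as exc:
        return sorted(str(err["loc"][0]) for err in exc.errors())
    return []


# Optional fields may be omitted and fall back to None.
step = PipelineStep(name="extract_step", operations=["extract_entities"])
output = PipelineOutput(type="file", path="results.json")
print(step.input, output.intermediate_dir)

# Required fields must be present, otherwise validation fails.
step_errors = invalid_fields(PipelineStep, {"name": "extract_step"})
output_errors = invalid_fields(PipelineOutput, {"intermediate_dir": "intermediates"})
print(step_errors)
print(output_errors)
```

Collecting the failing field names this way can make configuration errors easier to report than surfacing the raw exception.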

Outputs

| Name | Type | Description |
|------|------|-------------|
| validated_model | BaseModel | A validated Pydantic model instance representing the pipeline element |

Usage Examples

from docetl.base_schemas import PipelineStep, PipelineOutput, PipelineSpec

# Define a pipeline step
step = PipelineStep(
    name="extract_step",
    input="raw_documents",
    operations=["extract_entities", "classify_entities"]
)

# Define pipeline output
output = PipelineOutput(
    type="file",
    path="/output/results.json",
    intermediate_dir="/output/intermediates"
)

# Compose into a full pipeline spec
spec = PipelineSpec(steps=[step], output=output)
