Implementation:Ucbepic Docetl BaseSchemas

Knowledge Sources	Ucbepic_Docetl
Domains	Data_Processing, Schema_Validation
Last Updated	2026-02-08 00:00 GMT

Overview

Foundational Pydantic data models, provided by DocETL, that define the structure of DocETL pipelines.

Description

The base_schemas module defines the core Pydantic models used to represent and validate DocETL pipeline configurations. It includes ToolFunction and Tool for LLM tool definitions, ParsingTool for custom data parsing functions, PipelineStep for individual processing steps with their operations, PipelineOutput for output configuration (type, path, intermediate directory), and PipelineSpec that composes steps and output into a complete pipeline specification. These models are used throughout the codebase to ensure pipeline configurations conform to expected structures.

Usage

Use these schemas when parsing, validating, or constructing DocETL pipeline configurations programmatically. They are the canonical type definitions for pipeline structure elements.

Code Reference

Source Location

Repository: Ucbepic_Docetl
File: docetl/base_schemas.py
Lines: 1-130

Signature

from typing import Any

from pydantic import BaseModel

class ToolFunction(BaseModel):
    name: str
    description: str
    parameters: dict[str, Any]

class Tool(BaseModel):
    code: str
    function: ToolFunction

class ParsingTool(BaseModel):
    name: str
    function_code: str

class PipelineStep(BaseModel):
    name: str
    operations: list[dict[str, Any] | str]
    input: str | None = None

class PipelineOutput(BaseModel):
    type: str
    path: str
    intermediate_dir: str | None = None

class PipelineSpec(BaseModel):
    steps: list[PipelineStep]
    output: PipelineOutput

Import

from docetl.base_schemas import (
    ToolFunction,
    Tool,
    ParsingTool,
    PipelineStep,
    PipelineOutput,
    PipelineSpec,
)

I/O Contract

Inputs

Name	Type	Required	Description
name	str	Yes	Name of the tool function, parsing tool, or pipeline step
operations	list[dict or str]	Yes	List of operation names or operation config dicts for a step
input	str or None	No	Input dataset name or previous step name (None uses previous step output)
type	str	Yes	Output type (e.g., "file")
path	str	Yes	Output file path
intermediate_dir	str or None	No	Directory for intermediate results
function_code	str	Yes	Python code defining a parsing function (for ParsingTool)
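
The operations and input rows above can be illustrated with a small standard-library sketch that works on plain dicts carrying the same field names as PipelineStep. Both helpers are hypothetical and not part of DocETL. The sketch assumes that a dict entry in operations has a single key naming the operation, and it leaves the first step's source as None when no input is given, since the table only describes the fallback to a previous step.

```python
from __future__ import annotations

from typing import Any


def operation_name(entry: dict[str, Any] | str) -> str:
    """Return the operation name for one `operations` entry.

    Assumption: a dict entry has exactly one key (the operation name)
    whose value is that operation's configuration.
    """
    if isinstance(entry, str):
        return entry
    if len(entry) != 1:
        raise ValueError(f"expected a single-key dict, got keys {sorted(entry)}")
    return next(iter(entry))


def resolve_inputs(steps: list[dict[str, Any]]) -> dict[str, str | None]:
    """Map each step name to the source it reads from.

    A step whose `input` is None falls back to the previous step's name.
    The first step keeps None if it does not name a dataset.
    """
    resolved: dict[str, str | None] = {}
    previous: str | None = None
    for step in steps:
        explicit = step.get("input")
        resolved[step["name"]] = explicit if explicit is not None else previous
        previous = step["name"]
    return resolved


# Hypothetical step definitions, for illustration only.
steps = [
    {"name": "extract_step", "input": "raw_documents",
     "operations": ["extract_entities"]},
    {"name": "join_step",
     "operations": [{"join_entities": {"left": "a", "right": "b"}}]},
]
sources = resolve_inputs(steps)
names = [operation_name(op) for step in steps for op in step["operations"]]
```

Here join_step omits input, so it is resolved to read from extract_step, while extract_step reads from the named dataset raw_documents.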

Outputs

Name	Type	Description
validated_model	BaseModel	A validated Pydantic model instance representing the pipeline element

Usage Examples

from docetl.base_schemas import PipelineStep, PipelineOutput, PipelineSpec

# Define a pipeline step
step = PipelineStep(
    name="extract_step",
    input="raw_documents",
    operations=["extract_entities", "classify_entities"]
)

# Define pipeline output
output = PipelineOutput(
    type="file",
    path="/output/results.json",
    intermediate_dir="/output/intermediates"
)

# Compose into a full pipeline spec
spec = PipelineSpec(steps=[step], output=output)
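
The example above does not cover the tool models, so here is a minimal sketch of constructing a Tool and a ParsingTool. The classes are local mirrors of the Signature section so that the snippet runs without DocETL installed. The function sources, names, and the JSON-Schema-style layout of parameters are assumptions chosen for illustration; the schemas themselves only require that code and function_code are strings and that parameters is a dict.

```python
from typing import Any

from pydantic import BaseModel


# Local mirrors of the models listed under Signature. In real code,
# import these from docetl.base_schemas.
class ToolFunction(BaseModel):
    name: str
    description: str
    parameters: dict[str, Any]


class Tool(BaseModel):
    code: str
    function: ToolFunction


class ParsingTool(BaseModel):
    name: str
    function_code: str


# Hypothetical LLM tool: the Python source is stored as a string in `code`,
# and `function` describes it. The parameters layout is an assumption.
word_count_tool = Tool(
    code="def count_words(text):\n    return {'count': len(text.split())}\n",
    function=ToolFunction(
        name="count_words",
        description="Count the words in a text.",
        parameters={
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
    ),
)

# Hypothetical parsing tool: the function body is illustrative only.
parser = ParsingTool(
    name="upper_case_parser",
    function_code="def upper_case_parser(text):\n    return [text.upper()]\n",
)
```

These models only validate field presence and types. They do not execute or syntax-check the code strings they carry.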

Related Pages

Environment:Ucbepic_Docetl_Python_Runtime
