Implementation:Ucbepic Docetl Directive DocSummarization

From Leeroopedia


Knowledge Sources
Domains Pipeline_Optimization, LLM_Operations
Last Updated 2026-02-08 00:00 GMT

Overview

Concrete tool, provided by the DocETL reasoning optimizer, for adding a Map summarization operator at the beginning of a pipeline so that documents are shortened before any downstream operations run.

Description

The DocSummarizationDirective class adds a Map summarization operator at the very beginning of the pipeline to shorten the document before any downstream operations. This reduces the number of tokens processed in later steps, saving cost and improving efficiency. The summary is constructed to include all information required by any downstream operator that references the document key being summarized.
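
The transformation described above can be sketched on plain operation dicts. This is a minimal illustration, not the directive's actual implementation: the helper `prepend_summary_op`, the `<key>_summary` output key, the prompt wording, the placeholder model name, and the `{{ input.<key> }}` prompt-reference convention are all assumptions made for the example.

```python
import re
from copy import deepcopy
from typing import Dict, List


def prepend_summary_op(pipeline_ops: List[Dict], doc_key: str, summary_model: str) -> List[Dict]:
    """Return a new ops list with a summarization map op placed first (illustrative only)."""
    summary_key = f"{doc_key}_summary"  # assumed naming scheme, not taken from DocETL

    summary_op = {
        "name": f"summarize_{doc_key}",
        "type": "map",
        "model": summary_model,
        # Assumed prompt wording; the real directive tailors the prompt to downstream needs.
        "prompt": (
            "Summarize the document below, keeping every detail that later steps need.\n\n"
            "{{ input." + doc_key + " }}"
        ),
        "output": {"schema": {summary_key: "string"}},
    }

    # Match `input.<doc_key>` as a whole word so `input.text` does not hit `input.text_id`.
    reference = re.compile(rf"\binput\.{re.escape(doc_key)}\b")

    new_ops = [summary_op]
    for op in pipeline_ops:
        op = deepcopy(op)  # leave the caller's configs untouched
        if isinstance(op.get("prompt"), str):
            # Point downstream prompts at the summary instead of the full document.
            op["prompt"] = reference.sub(f"input.{summary_key}", op["prompt"])
        new_ops.append(op)
    return new_ops
```

Called as `prepend_summary_op(ops, "text", "gpt-4o-mini")`, the sketch returns one more operation than it received, with the summarization map first and later prompts reading the summary key. Whether the real directive writes to a new key or overwrites the original one is not stated on this page.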

Usage

The MOAR agent applies this directive when documents are too long or too detailed for the downstream pipeline. The target ops should include every operator that uses the document key being summarized, and the summarization model should be an inexpensive one, since it runs over each full-length document.
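
One way to collect the target ops is to scan each operation's prompt for a reference to the document key. The sketch below is only an illustration: `find_target_ops` is a hypothetical helper, and it assumes that operations carry a string `prompt` field and refer to document fields as `input.<key>`.

```python
import re
from typing import Dict, List


def find_target_ops(pipeline_ops: List[Dict], doc_key: str) -> List[str]:
    """Return the names of operations whose prompt references `input.<doc_key>`."""
    # Whole-word match, so a key named `text` does not match `input.text_id`.
    reference = re.compile(rf"\binput\.{re.escape(doc_key)}\b")
    targets = []
    for op in pipeline_ops:
        prompt = op.get("prompt")
        if isinstance(prompt, str) and reference.search(prompt):
            targets.append(op["name"])
    return targets
```

Operators without a prompt (or with a prompt that never mentions the key) are skipped, which matches the requirement that only operators using the summarized key be listed.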

Code Reference

Source Location

Signature

class DocSummarizationDirective(Directive):
    name = "doc_summarization"
    description = "Adds a Map summarization operator at the beginning of the pipeline to shorten documents before downstream operations."

    def check_applicability(self, ...) -> Tuple[bool, str]: ...
    def apply(self, ...) -> Tuple[List[Dict], List[Dict], str, dict]: ...

Import

from docetl.reasoning_optimizer.directives.doc_summarization import DocSummarizationDirective

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| op_config | Dict | Yes | Operation configuration to transform |
| pipeline_ops | List[Dict] | Yes | Full pipeline operations list |
| op_idx | int | Yes | Index of target operation |
| dataset_descriptions | Dict | Yes | Dataset schema descriptions |

Outputs

| Name | Type | Description |
|------|------|-------------|
| new_ops | List[Dict] | Transformed operation configs |
| new_steps | List[Dict] | Updated pipeline steps |
| explanation | str | Human-readable description of changes |
| metadata | dict | Additional metadata about the transformation |

Usage Examples

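# Directives are typically invoked by the MOAR agent automatically.
# Example of manual invocation; op_config, pipeline_ops, op_idx, and
# dataset_descriptions are assumed to be defined already (see Inputs).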
from docetl.reasoning_optimizer.directives.doc_summarization import DocSummarizationDirective

directive = DocSummarizationDirective()
applicable, reason = directive.check_applicability(op_config, pipeline_ops, op_idx, dataset_descriptions)
if applicable:
    new_ops, new_steps, explanation, metadata = directive.apply(op_config, pipeline_ops, op_idx, dataset_descriptions)
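
The check-then-apply pattern above can be wrapped in a small helper that falls back to the original operations when the directive does not apply. This is a sketch under stated assumptions: `apply_if_applicable` and `StubDirective` are hypothetical, the `avg_tokens` field is invented for the stub, and the positional argument order simply mirrors the example on this page rather than a confirmed DocETL signature.

```python
from typing import Dict, List, Tuple


class StubDirective:
    """Hypothetical stand-in exposing the same two methods as the page's directive."""

    def check_applicability(self, op_config, pipeline_ops, op_idx, dataset_descriptions):
        # Invented rule for the stub: treat documents over 1000 tokens as "long".
        is_long = dataset_descriptions.get("avg_tokens", 0) > 1000
        reason = "documents are long" if is_long else "documents are already short"
        return is_long, reason

    def apply(self, op_config, pipeline_ops, op_idx, dataset_descriptions):
        summary_op = {"name": "summarize", "type": "map"}
        return [summary_op] + pipeline_ops, [], "prepended a summarization map", {}


def apply_if_applicable(
    directive, op_config: Dict, pipeline_ops: List[Dict], op_idx: int, dataset_descriptions: Dict
) -> Tuple[List[Dict], str]:
    """Return (ops, note): transformed ops if applicable, else the original ops."""
    applicable, reason = directive.check_applicability(
        op_config, pipeline_ops, op_idx, dataset_descriptions
    )
    if not applicable:
        return pipeline_ops, reason  # unchanged pipeline plus the reason it was skipped
    new_ops, _new_steps, explanation, _metadata = directive.apply(
        op_config, pipeline_ops, op_idx, dataset_descriptions
    )
    return new_ops, explanation
```

Returning the skip reason alongside the untouched ops keeps the caller's logging simple: the second element is always a human-readable note, whichever branch ran.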

Related Pages
