Implementation:Ucbepic Docetl Directive DocSummarization
| Knowledge Sources | |
|---|---|
| Domains | Pipeline_Optimization, LLM_Operations |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool, provided by the DocETL reasoning optimizer, that adds a Map summarization operator at the beginning of a pipeline so that documents are shortened before any downstream operation runs.
Description
The DocSummarizationDirective class adds a Map summarization operator at the very beginning of the pipeline to shorten the document before any downstream operations. This reduces the number of tokens processed in later steps, saving cost and improving efficiency. The summary is constructed to include all information required by any downstream operator that references the document key being summarized.
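The transformation described above can be sketched as follows. This is a minimal illustration, not the directive's actual implementation: the helper name `prepend_summarizer`, the prompt wording, and the default model name are hypothetical, and writing the summary back under the same document key is an assumption made so that downstream prompts keep working unchanged.

```python
from typing import Dict, List


def prepend_summarizer(
    pipeline_ops: List[Dict], doc_key: str, model: str = "gpt-4o-mini"
) -> List[Dict]:
    """Return a new ops list with a Map summarization op placed first.

    Hypothetical helper for illustration only; the real directive builds
    its operator with an LLM-generated prompt.
    """
    summarizer = {
        "name": f"summarize_{doc_key}",
        "type": "map",
        # Placeholder model name; any inexpensive model would fit here.
        "model": model,
        "prompt": (
            "Summarize the following document, keeping every detail that "
            "later steps need.\n\n{{ input." + doc_key + " }}"
        ),
        # Assumption: the summary overwrites doc_key so that downstream
        # operators referencing that key read the shorter text.
        "output": {"schema": {doc_key: "string"}},
    }
    # Build a new list so the caller's pipeline is left untouched.
    return [summarizer] + list(pipeline_ops)
```

Calling `prepend_summarizer(ops, "text")` on an existing operations list returns a list one element longer, with the summarizer first and the original operators following in their original order.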
Usage
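The MOAR agent applies this directive when documents are too long or too detailed for the downstream pipeline. The target ops should include every operator that uses the document key being summarized, so that the summary retains what each of them needs. The summarization model should be an inexpensive one, since the purpose of the directive is to reduce token cost.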
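One way to collect the operators that use a given document key is to scan their prompts for references to it. The sketch below is an assumption about how such a check could look, not code from the repository: `ops_referencing_key` is a hypothetical helper, and it only recognizes the `input.<key>` form used in Map-style prompts, so operators that access fields another way would need separate handling.

```python
import re
from typing import Dict, List


def ops_referencing_key(pipeline_ops: List[Dict], doc_key: str) -> List[str]:
    """Return names of ops whose prompt mentions input.<doc_key>.

    Hypothetical helper for illustration; ops without a prompt are skipped.
    """
    # Word boundaries prevent matching longer keys such as "text_long".
    pattern = re.compile(r"\binput\." + re.escape(doc_key) + r"\b")
    return [
        op["name"]
        for op in pipeline_ops
        if pattern.search(op.get("prompt", ""))
    ]
```

The resulting names are candidates for the target ops; a manual check remains advisable for operators whose prompts reference the document indirectly.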
Code Reference
Source Location
- Repository: Ucbepic_Docetl
- File: docetl/reasoning_optimizer/directives/doc_summarization.py
- Lines: 1-345
Signature
class DocSummarizationDirective(Directive):
    name = "doc_summarization"
    description = "Adds a Map summarization operator at the beginning of the pipeline to shorten documents before downstream operations."

    def check_applicability(self, ...) -> Tuple[bool, str]: ...

    def apply(self, ...) -> Tuple[List[Dict], List[Dict], str, dict]: ...
Import
from docetl.reasoning_optimizer.directives.doc_summarization import DocSummarizationDirective
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| op_config | Dict | Yes | Operation configuration to transform |
| pipeline_ops | List[Dict] | Yes | Full pipeline operations list |
| op_idx | int | Yes | Index of target operation |
| dataset_descriptions | Dict | Yes | Dataset schema descriptions |
Outputs
| Name | Type | Description |
|---|---|---|
| new_ops | List[Dict] | Transformed operation configs |
| new_steps | List[Dict] | Updated pipeline steps |
| explanation | str | Human-readable description of changes |
| metadata | dict | Additional metadata about the transformation |
Usage Examples
# Directives are typically invoked by the MOAR agent automatically.
# Example of manual invocation:
from docetl.reasoning_optimizer.directives.doc_summarization import DocSummarizationDirective

# op_config, pipeline_ops, op_idx, and dataset_descriptions are the inputs
# listed in the I/O Contract and must be prepared by the caller.
directive = DocSummarizationDirective()

# Check whether the directive applies before transforming the pipeline.
applicable, reason = directive.check_applicability(op_config, pipeline_ops, op_idx, dataset_descriptions)
if applicable:
    new_ops, new_steps, explanation, metadata = directive.apply(op_config, pipeline_ops, op_idx, dataset_descriptions)