Implementation:Ucbepic Docetl Directive ChunkHeaderSummary

From Leeroopedia


Knowledge Sources
Domains Pipeline_Optimization, LLM_Operations
Last Updated 2026-02-08 00:00 GMT

Overview

Concrete tool for enhancing a Split-Gather pipeline with header extraction and chunk summarization provided by the DocETL reasoning optimizer.

Description

The ChunkHeaderSummaryDirective class rewrites an existing Split -> Gather pipeline as Split -> Map -> Gather. The inserted Map operation extracts headers from each chunk and writes a summary of it. The Gather operation is then modified to use the summaries for middle chunks and the headers for document structure. The directive is intended for chunking pipelines that process hierarchically structured documents.
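The rewrite described above can be sketched on plain operation dictionaries. This is a minimal illustration under stated assumptions, not the directive's actual implementation: the helper name `insert_header_summary_map`, the Map prompt, and the output keys `headers` and `summary` are hypothetical, and the Gather fields `peripheral_chunks` and `doc_header_key` are assumed to follow DocETL's Gather configuration.

```python
from copy import deepcopy
from typing import Dict, List


def insert_header_summary_map(ops: List[Dict]) -> List[Dict]:
    """Return a copy of `ops` with a Map inserted into each adjacent Split -> Gather pair."""
    new_ops: List[Dict] = []
    i = 0
    while i < len(ops):
        op = deepcopy(ops[i])
        nxt = ops[i + 1] if i + 1 < len(ops) else None
        if op.get("type") == "split" and nxt is not None and nxt.get("type") == "gather":
            gather = deepcopy(nxt)
            chunk_key = gather["content_key"]  # field holding the chunk text

            # Hypothetical Map op: one LLM call per chunk producing headers and a summary.
            map_op = {
                "name": op["name"] + "_header_summary",
                "type": "map",
                "prompt": (
                    "Extract any section headers and write a brief summary "
                    "of this chunk:\n{{ input." + chunk_key + " }}"
                ),
                "output": {
                    "schema": {
                        "headers": "list[{header: string, level: integer}]",
                        "summary": "string",
                    }
                },
            }

            # Assumed Gather changes: headers convey document structure,
            # summaries stand in for the middle chunks of the preceding context.
            gather["doc_header_key"] = "headers"
            previous = gather.setdefault("peripheral_chunks", {}).setdefault("previous", {})
            previous["middle"] = {"content_key": "summary"}

            new_ops.extend([op, map_op, gather])
            i += 2
        else:
            new_ops.append(op)
            i += 1
    return new_ops


pipeline = [
    {"name": "split_doc", "type": "split", "split_key": "text"},
    {"name": "gather_doc", "type": "gather", "content_key": "text_chunk"},
]
rewritten = insert_header_summary_map(pipeline)
print([op["type"] for op in rewritten])  # ['split', 'map', 'gather']
```

The sketch copies each operation before modifying it, so the input pipeline is left unchanged and the rewritten list can be compared against the original.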

Usage

The MOAR agent applies this directive when an existing chunking pipeline (Split -> Gather) processes documents with a clear hierarchical structure (legal contracts, technical manuals, research papers) and chunk-level analysis is inaccurate because each chunk needs headers and summaries from other chunks to make sense.
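The structural half of that condition (a Split directly followed by a Gather) can be checked mechanically; whether chunk-level analysis is actually inaccurate remains a judgment made by the agent. The following is a rough sketch of such a structural check only. The function name `has_split_gather_pair` is hypothetical and is not the directive's `check_applicability` method, although it mirrors that method's `Tuple[bool, str]` return shape.

```python
from typing import Dict, List, Tuple


def has_split_gather_pair(pipeline_ops: List[Dict], op_idx: int) -> Tuple[bool, str]:
    """Structural check only: is the op at `op_idx` a Split directly followed by a Gather?"""
    if not 0 <= op_idx < len(pipeline_ops):
        return False, f"op_idx {op_idx} is out of range"
    if pipeline_ops[op_idx].get("type") != "split":
        return False, "target operation is not a split"
    if op_idx + 1 >= len(pipeline_ops):
        return False, "split is the last operation; no gather follows"
    if pipeline_ops[op_idx + 1].get("type") != "gather":
        return False, "split is not immediately followed by a gather"
    return True, "found split -> gather pair"


ops = [
    {"name": "split_doc", "type": "split"},
    {"name": "gather_doc", "type": "gather"},
    {"name": "analyze", "type": "map"},
]
ok, reason = has_split_gather_pair(ops, 0)
print(ok, reason)
```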

Code Reference

Source Location

Signature

class ChunkHeaderSummaryDirective(Directive):
    name = "chunk_header_summary"
    description = "Transforms Split -> Gather into Split -> Map -> Gather with header extraction and chunk summarization."

    def check_applicability(self, ...) -> Tuple[bool, str]: ...
    def apply(self, ...) -> Tuple[List[Dict], List[Dict], str, dict]: ...

Import

from docetl.reasoning_optimizer.directives.chunk_header_summary import ChunkHeaderSummaryDirective

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| op_config | Dict | Yes | Operation configuration to transform |
| pipeline_ops | List[Dict] | Yes | Full pipeline operations list |
| op_idx | int | Yes | Index of target operation |
| dataset_descriptions | Dict | Yes | Dataset schema descriptions |

Outputs

| Name | Type | Description |
|------|------|-------------|
| new_ops | List[Dict] | Transformed operation configs |
| new_steps | List[Dict] | Updated pipeline steps |
| explanation | str | Human-readable description of changes |
| metadata | dict | Additional metadata about the transformation |

Usage Examples

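# Directives are typically invoked by the MOAR agent automatically.
# Example of manual invocation. op_config, pipeline_ops, op_idx, and
# dataset_descriptions must already be defined (see Inputs above).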
from docetl.reasoning_optimizer.directives.chunk_header_summary import ChunkHeaderSummaryDirective

directive = ChunkHeaderSummaryDirective()
applicable, reason = directive.check_applicability(op_config, pipeline_ops, op_idx, dataset_descriptions)
if applicable:
    new_ops, new_steps, explanation, metadata = directive.apply(op_config, pipeline_ops, op_idx, dataset_descriptions)

Related Pages
