Implementation:Ucbepic Docetl ReduceOperation Execute
| Knowledge Sources | |
|---|---|
| Domains | NLP, Data_Aggregation |
| Last Updated | 2026-02-08 01:40 GMT |
Overview
Concrete operation in DocETL's operations module that groups document records and reduces each group into a single record via LLM-powered synthesis.
Description
ReduceOperation groups input documents by one or more reduce keys, then synthesizes each group into a single output record using an LLM prompt. It supports multiple reduction strategies: batch reduce (all items in one call), incremental fold (process items in batches with a fold prompt), and parallel fold with merge. The operation also supports value sampling for large groups and gleaning for quality refinement.
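The two structural steps described above, grouping by reduce keys and batching for incremental fold, can be sketched in plain Python. The helper names (`group_by_keys`, `fold_batches`) are illustrative, not part of the DocETL API:

```python
from itertools import islice

def group_by_keys(records: list[dict], reduce_keys: list[str]) -> dict:
    """Group records by the tuple of their reduce-key values,
    mirroring the grouping step described above (hypothetical helper)."""
    groups: dict[tuple, list[dict]] = {}
    for rec in records:
        key = tuple(rec[k] for k in reduce_keys)
        groups.setdefault(key, []).append(rec)
    return groups

def fold_batches(items: list, batch_size: int):
    """Yield successive batches of items, as incremental fold would
    process them one fold-prompt call at a time."""
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch
```

Batch reduce is the degenerate case where the whole group fits in one call; parallel fold runs the per-batch calls concurrently and then combines partial results with a merge prompt.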
Usage
Use ReduceOperation to merge chunk-level results back into per-document summaries. Set reduce_key to the document ID from SplitOperation. For very large groups, configure fold_prompt and fold_batch_size.
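For intuition on why `fold_batch_size` matters for large groups: incremental fold carries an accumulator across batches, so no single call has to see the whole group. A minimal sketch, with a plain function standing in for the LLM fold call:

```python
def incremental_fold(items: list, batch_size: int, fold_fn, initial):
    """Sketch of incremental fold reduction: combine an accumulator with
    each successive batch. fold_fn stands in for the fold-prompt LLM call;
    this is an illustration, not DocETL's implementation."""
    acc = initial
    for i in range(0, len(items), batch_size):
        acc = fold_fn(acc, items[i:i + batch_size])
    return acc
```

With `batch_size=2` and a summing `fold_fn`, `incremental_fold([1, 2, 3, 4, 5], 2, lambda acc, b: acc + sum(b), 0)` walks the group in three calls instead of one large one.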
Code Reference
Source Location
- Repository: docetl
- File: docetl/operations/reduce.py
- Lines: L42-1047
Signature
```python
class ReduceOperation(BaseOperation):
    class schema(BaseOperation.schema):
        type: str = "reduce"
        reduce_key: str | list[str]
        output: dict[str, Any]
        prompt: str
        model: str | None = None
        fold_prompt: str | None = None
        fold_batch_size: int | None = None
        merge_prompt: str | None = None
        merge_batch_size: int | None = None
        pass_through: bool | None = None

    def execute(self, input_data: list[dict]) -> tuple[list[dict], float]:
        """Group and reduce documents. Returns (reduced_results, total_cost)."""
```
Import
```python
from docetl.operations.reduce import ReduceOperation
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| reduce_key | str or list[str] | Yes | Field(s) to group by (typically document ID) |
| prompt | str | Yes | Jinja2 template; the group's items are available as `inputs` |
| output.schema | dict | Yes | Expected output fields and types |
| fold_prompt | str | No | Template for incremental fold reduction |
| fold_batch_size | int | No | Items per fold batch |
| input_data | list[dict] | Yes | Per-chunk results from MapOperation |
Outputs
| Name | Type | Description |
|---|---|---|
| results | list[dict] | One merged result per group (per document) |
| cost | float | Total LLM API cost |
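The output contract above, one merged record per group plus an aggregate cost, can be sketched as follows. `synthesize` is a stand-in for the LLM synthesis call, assumed to return `(record_dict, call_cost)`; this is an illustration of the shape, not the library's code:

```python
def run_reduce(grouped: list[tuple[dict, list[dict]]], synthesize):
    """Produce one output record per group and the summed API cost.
    grouped pairs each group's reduce-key fields with its items."""
    results, total_cost = [], 0.0
    for key_fields, items in grouped:
        record, cost = synthesize(items)   # stand-in for the LLM call
        record.update(key_fields)          # reduce-key values carried into the result
        results.append(record)
        total_cost += cost
    return results, total_cost
```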
Usage Examples
```yaml
operations:
  - name: merge_chunks
    type: reduce
    reduce_key: split_docs_id
    prompt: |
      Combine the following chunk analyses into a single document summary:
      {% for item in inputs %}
      Chunk {{ item.split_docs_chunk_num }}:
      Findings: {{ item.key_findings }}
      {% endfor %}
    output:
      schema:
        combined_findings: "list[str]"
        document_summary: "string"
    model: gpt-4o
```
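For groups too large for one call, the Usage section suggests adding `fold_prompt` and `fold_batch_size`. A hedged sketch of how that might extend the operation above, assuming the fold template sees the running result as `output` and the current batch as `inputs` (field names follow the example and are otherwise assumptions):

```yaml
    fold_prompt: |
      Update the running document summary with these additional chunks:
      Current summary: {{ output.document_summary }}
      {% for item in inputs %}
      Chunk {{ item.split_docs_chunk_num }}: {{ item.key_findings }}
      {% endfor %}
    fold_batch_size: 10
```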