
Implementation:Ucbepic Docetl ReduceOperation Execute

From Leeroopedia


Knowledge Sources
Domains NLP, Data_Aggregation
Last Updated 2026-02-08 01:40 GMT

Overview

ReduceOperation is the concrete operation in DocETL's operations module for grouping document records and reducing each group into a single record via LLM-powered synthesis.

Description

ReduceOperation groups input documents by one or more reduce keys, then synthesizes each group into a single output record using an LLM prompt. It supports multiple reduction strategies: batch reduce (all items in one call), incremental fold (process items in batches with a fold prompt), and parallel fold with merge. The operation also supports value sampling for large groups and gleaning for quality refinement.
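The batch-reduce strategy described above can be sketched in pure Python. This is a minimal illustration, not DocETL's implementation: the `synthesize` callable stands in for the LLM call, and the function name `reduce_groups` is hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable

def reduce_groups(
    input_data: list[dict[str, Any]],
    reduce_key: str,
    synthesize: Callable[[list[dict[str, Any]]], dict[str, Any]],
) -> list[dict[str, Any]]:
    """Group records by reduce_key, then synthesize each group into one record."""
    groups: dict[Any, list[dict[str, Any]]] = defaultdict(list)
    for item in input_data:
        groups[item[reduce_key]].append(item)
    results = []
    for key, items in groups.items():
        merged = synthesize(items)   # in DocETL, one LLM call per group
        merged[reduce_key] = key     # carry the group key into the output record
        results.append(merged)
    return results

# Stand-in for LLM synthesis: concatenate each group's findings.
data = [
    {"doc_id": "a", "finding": "x"},
    {"doc_id": "a", "finding": "y"},
    {"doc_id": "b", "finding": "z"},
]
out = reduce_groups(
    data, "doc_id",
    lambda items: {"summary": ", ".join(i["finding"] for i in items)},
)
print(out)
```

With the incremental-fold and parallel-fold-with-merge strategies, the single `synthesize` call per group is replaced by batched calls, as configured via `fold_prompt` and `merge_prompt`.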

Usage

Use ReduceOperation to merge chunk-level results back into per-document summaries. Set reduce_key to the document ID from SplitOperation. For very large groups, configure fold_prompt and fold_batch_size.
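The incremental fold that `fold_prompt` and `fold_batch_size` configure amounts to folding batches into a running accumulator. A minimal sketch, with a plain function in place of the per-batch LLM call (the name `fold_reduce` is an assumption for illustration):

```python
from typing import Any, Callable

def fold_reduce(
    items: list[dict[str, Any]],
    batch_size: int,
    fold: Callable[[Any, list[dict[str, Any]]], Any],
) -> Any:
    """Fold items batch by batch into an accumulator, mirroring the
    fold_prompt/fold_batch_size strategy (one fold call per batch)."""
    acc = None
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        acc = fold(acc, batch)  # in DocETL, one LLM call rendering fold_prompt
    return acc

# Toy fold: running sum over batches of 3.
nums = [{"v": i} for i in range(7)]
total = fold_reduce(nums, 3, lambda acc, batch: (acc or 0) + sum(b["v"] for b in batch))
print(total)
```

This keeps each LLM call's context bounded by `fold_batch_size` rather than by the full group size.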

Code Reference

Source Location

  • Repository: docetl
  • File: docetl/operations/reduce.py
  • Lines: L42-1047

Signature

class ReduceOperation(BaseOperation):
    class schema(BaseOperation.schema):
        type: str = "reduce"
        reduce_key: str | list[str]
        output: dict[str, Any]
        prompt: str
        model: str | None = None
        fold_prompt: str | None = None
        fold_batch_size: int | None = None
        merge_prompt: str | None = None
        merge_batch_size: int | None = None
        pass_through: bool | None = None

    def execute(self, input_data: list[dict]) -> tuple[list[dict], float]:
        """Group and reduce documents. Returns (reduced_results, total_cost)."""

Import

from docetl.operations.reduce import ReduceOperation

I/O Contract

Inputs

Name Type Required Description
reduce_key str or list[str] Yes Field(s) to group by (typically document ID)
prompt str Yes Jinja2 template with {{ inputs }} bound to the group's items
output.schema dict Yes Expected output fields and types
fold_prompt str No Template for incremental fold reduction
fold_batch_size int No Items per fold batch
input_data list[dict] Yes Per-chunk results from MapOperation

Outputs

Name Type Description
results list[dict] One merged result per group (per document)
cost float Total LLM API cost
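The I/O contract above can be illustrated with a hypothetical stub that honors the (results, cost) return shape of execute. The function name `fake_execute` and the flat per-group cost are assumptions for illustration, not part of DocETL's API:

```python
from collections import defaultdict

def fake_execute(
    input_data: list[dict],
    reduce_key: str,
    cost_per_call: float = 0.001,  # assumed flat cost per group's LLM call
) -> tuple[list[dict], float]:
    """Return one record per group plus a total cost, matching the contract
    of ReduceOperation.execute: (reduced_results, total_cost)."""
    groups: dict = defaultdict(list)
    for row in input_data:
        groups[row[reduce_key]].append(row)
    results = [{reduce_key: key, "n_items": len(items)} for key, items in groups.items()]
    return results, cost_per_call * len(groups)

chunks = [
    {"split_docs_id": "d1"},
    {"split_docs_id": "d1"},
    {"split_docs_id": "d2"},
]
results, cost = fake_execute(chunks, "split_docs_id")
print(results, cost)
```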

Usage Examples

operations:
  - name: merge_chunks
    type: reduce
    reduce_key: split_docs_id
    prompt: |
      Combine the following chunk analyses into a single document summary:
      {% for item in inputs %}
      Chunk {{ item.split_docs_chunk_num }}:
      Findings: {{ item.key_findings }}
      {% endfor %}
    output:
      schema:
        combined_findings: "list[str]"
        document_summary: "string"
    model: gpt-4o

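For one group, the reduce prompt in the example above renders to a flat string. The loop below is a pure-Python stand-in for that rendering (the real operation uses Jinja2 to expand the `{% for item in inputs %}` block); `render_prompt` is a hypothetical helper, not a DocETL function:

```python
def render_prompt(inputs: list[dict]) -> str:
    """Mimic the Jinja2 reduce prompt above for one group of chunk results."""
    lines = ["Combine the following chunk analyses into a single document summary:"]
    for item in inputs:
        lines.append(f"Chunk {item['split_docs_chunk_num']}:")
        lines.append(f"Findings: {item['key_findings']}")
    return "\n".join(lines)

group = [
    {"split_docs_chunk_num": 1, "key_findings": "A"},
    {"split_docs_chunk_num": 2, "key_findings": "B"},
]
rendered = render_prompt(group)
print(rendered)
```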
Related Pages

Implements Principle

Requires Environment

Uses Heuristic
