Principle:Ucbepic Docetl Chunk Result Reduction

Knowledge Sources	DocETL Docs DocETL
Domains	NLP, Data_Aggregation
Last Updated	2026-02-08 01:40 GMT

Overview

An aggregation principle that merges per-chunk LLM results back into per-document summaries using group-by reduction with LLM-powered synthesis.

Description

Chunk Result Reduction reassembles chunk-level analysis results into coherent per-document outputs. After MapOperation has processed each chunk independently, the reduce operation groups the chunk results by their original document ID and uses an LLM prompt to synthesize one unified result per document.

Strategies for handling large groups include:

Batch Reduce: Process all chunks in a single LLM call (for small groups)
Fold and Merge: Incrementally fold chunks into a running summary (for large groups)
Parallel Fold: Process fold batches in parallel with a final merge step
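
The three strategies above differ only in how chunk results are fed to the model. The following is a minimal sketch, not DocETL's actual implementation: `fake_llm`, the function names, and the `batch_size` and `max_workers` parameters are all hypothetical, and the stand-in "LLM" just brackets its inputs so the call structure stays visible.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical stand-in for an LLM call: wraps its inputs in brackets
# so the nesting of calls can be read off the result.
def fake_llm(parts: List[str]) -> str:
    return "[" + " | ".join(parts) + "]"

def batch_reduce(chunks: List[str], llm: Callable = fake_llm) -> str:
    """Batch reduce: send every chunk result to the model in one call."""
    if not chunks:
        raise ValueError("cannot reduce an empty group")
    return llm(chunks)

def fold_reduce(chunks: List[str], batch_size: int, llm: Callable = fake_llm) -> str:
    """Fold: carry a running summary forward, one batch at a time."""
    if not chunks:
        raise ValueError("cannot reduce an empty group")
    summary = None
    for start in range(0, len(chunks), batch_size):
        batch = chunks[start:start + batch_size]
        # The running summary (if any) is folded in with the next batch.
        inputs = batch if summary is None else [summary] + batch
        summary = llm(inputs)
    return summary

def parallel_fold_reduce(chunks: List[str], batch_size: int,
                         llm: Callable = fake_llm, max_workers: int = 4) -> str:
    """Parallel fold: summarize batches concurrently, then merge the partials."""
    if not chunks:
        raise ValueError("cannot reduce an empty group")
    batches = [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so partials stay in document order.
        partials = list(pool.map(llm, batches))
    return partials[0] if len(partials) == 1 else llm(partials)
```

The sequential fold keeps every call within a bounded input size but serializes the work; the parallel variant trades the running context for throughput and relies on the final merge call to reconcile the partial summaries.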

Usage

Apply this principle after chunk-level processing to produce per-document results. The reduce key should be the document ID generated by the split operation, so that all chunks originating from the same document fall into the same group.

Theoretical Basis

Group-by reduction with LLM synthesis:

Grouping: Group chunk results by reduce_key (document ID)
Sorting: Order chunks within each group (for example, by their position in the source document)
Strategy Selection: Choose batch, fold, or parallel fold based on group size
LLM Synthesis: Use prompt template to merge chunk results into a unified output
Result Assembly: Produce one output record per document
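
The steps above can be sketched end to end in plain Python. This is an illustration under stated assumptions rather than the ReduceOperation code: the record fields (`doc_id`, `chunk_num`, `result`), the `batch_threshold` cutoff, and the `synthesize` callable standing in for the LLM prompt are all hypothetical. Parallel fold is omitted to keep the sketch short.

```python
from collections import defaultdict
from typing import Callable, Dict, List

def reduce_chunk_results(
    records: List[Dict],
    reduce_key: str,
    order_key: str,
    synthesize: Callable[[List[str]], str],
    batch_threshold: int = 3,
) -> List[Dict]:
    # 1. Grouping: bucket chunk results by the reduce key (document ID).
    groups = defaultdict(list)
    for record in records:
        groups[record[reduce_key]].append(record)

    outputs = []
    for key, group in groups.items():
        # 2. Sorting: restore the order of chunks within the group.
        group.sort(key=lambda r: r[order_key])
        texts = [r["result"] for r in group]

        # 3. Strategy selection: one call for small groups, fold otherwise.
        if len(texts) <= batch_threshold:
            # 4. Synthesis in a single call (batch reduce).
            merged = synthesize(texts)
        else:
            # 4. Synthesis by folding batches into a running summary.
            merged = synthesize(texts[:batch_threshold])
            for start in range(batch_threshold, len(texts), batch_threshold):
                batch = texts[start:start + batch_threshold]
                merged = synthesize([merged] + batch)

        # 5. Result assembly: exactly one output record per document.
        outputs.append({reduce_key: key, "summary": merged, "num_chunks": len(group)})
    return outputs
```

In a real pipeline `synthesize` would render a prompt template over the grouped inputs and call the model; here any function from a list of strings to a string works, which makes the grouping and ordering logic easy to check in isolation.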

Related Pages

Implemented By

Implementation:Ucbepic_Docetl_ReduceOperation_Execute
