Principle:Ucbepic Docetl Chunk Result Reduction
| Knowledge Sources | |
|---|---|
| Domains | NLP, Data_Aggregation |
| Last Updated | 2026-02-08 01:40 GMT |
Overview
An aggregation principle that merges per-chunk LLM results back into per-document summaries using group-by reduction with LLM-powered synthesis.
Description
Chunk Result Reduction reassembles chunk-level analysis results into coherent per-document outputs. After each chunk has been processed independently by a MapOperation, the reduce operation groups the chunk results by their original document ID and uses an LLM prompt to synthesize one unified result per document.
Strategies for handling groups of different sizes include:
- Batch Reduce: Process all chunks in a single LLM call (for small groups)
- Fold and Merge: Incrementally fold chunks into a running summary (for large groups)
- Parallel Fold: Process fold batches in parallel with a final merge step
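The three strategies can be sketched in plain Python. This is a minimal illustration under stated assumptions, not DocETL's implementation. The `llm` callable (a list of texts in, one merged text out), the function names, and the `batch_size` and `workers` parameters are all hypothetical. `stub_llm` simply concatenates its inputs in place of a real model call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical LLM interface: takes the texts to merge, returns one merged text.
LLM = Callable[[List[str]], str]


def batch_reduce(chunks: List[str], llm: LLM) -> str:
    """Batch reduce: send every chunk result to the LLM in one call."""
    return llm(chunks)


def fold_and_merge(chunks: List[str], llm: LLM, batch_size: int) -> str:
    """Fold: seed a summary from the first batch, then fold in one batch per call."""
    summary = llm(chunks[:batch_size])
    for start in range(batch_size, len(chunks), batch_size):
        summary = llm([summary] + chunks[start:start + batch_size])
    return summary


def parallel_fold(chunks: List[str], llm: LLM, batch_size: int, workers: int = 2) -> str:
    """Parallel fold: fold contiguous partitions concurrently, then merge them."""
    size = max(1, -(-len(chunks) // workers))  # ceiling division
    partitions = [chunks[i:i + size] for i in range(0, len(chunks), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so partials keep document order.
        partials = list(pool.map(lambda part: fold_and_merge(part, llm, batch_size), partitions))
    return llm(partials)  # final merge step


def stub_llm(texts: List[str]) -> str:
    """Hypothetical stand-in for a model call: concatenates its inputs."""
    return " ".join(texts)


if __name__ == "__main__":
    chunks = ["c1", "c2", "c3", "c4", "c5"]
    print(batch_reduce(chunks, stub_llm))                  # c1 c2 c3 c4 c5
    print(fold_and_merge(chunks, stub_llm, batch_size=2))  # c1 c2 c3 c4 c5
    print(parallel_fold(chunks, stub_llm, batch_size=2))   # c1 c2 c3 c4 c5
```

All three strategies return the same string here only because the stub's concatenation is associative. With a real LLM they generally differ. Fold and merge makes one sequential call per batch. Parallel fold reduces latency, but its final merge sees only partial summaries, never the raw chunks.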
Usage
Apply this principle after chunk-level processing to produce per-document results. The reduce key should be the document ID generated by the split operation.
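The following minimal sketch shows why the split step should supply the reduce key. It assumes hypothetical field names `doc_id`, `chunk_num`, and `text`; the fields DocETL's split operation actually emits may be named differently. It also uses naive fixed-size character chunking. Each document receives a freshly generated ID, so grouping on that ID reassembles exactly the chunks of one document.

```python
import uuid
from collections import defaultdict
from typing import Dict, List


def split_documents(docs: List[str], chunk_size: int) -> List[dict]:
    """Split each document into fixed-size character chunks tagged with a shared doc ID."""
    records = []
    for text in docs:
        doc_id = uuid.uuid4().hex  # hypothetical ID scheme: unique per document
        for chunk_num, start in enumerate(range(0, len(text), chunk_size)):
            records.append({
                "doc_id": doc_id,
                "chunk_num": chunk_num,
                "text": text[start:start + chunk_size],
            })
    return records


def group_by_reduce_key(records: List[dict], reduce_key: str = "doc_id") -> Dict[str, List[dict]]:
    """Collect chunk records that share the same reduce key."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        groups[record[reduce_key]].append(record)
    return dict(groups)


if __name__ == "__main__":
    records = split_documents(["abcdefgh", "xyz"], chunk_size=3)
    # Prints two groups under distinct IDs: ['abc', 'def', 'gh'] and ['xyz'].
    for doc_id, chunks in group_by_reduce_key(records).items():
        print(doc_id[:8], [c["text"] for c in chunks])
```

Grouping on a generated ID rather than an existing field such as a title prevents chunks from two different documents that share that field's value from being merged into one output.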
Theoretical Basis
Group-by reduction with LLM synthesis:
- Grouping: Group chunk results by reduce_key (document ID)
- Sorting: Order chunks within each group (for example, by chunk number) so the LLM sees them in their original document order
- Strategy Selection: Choose batch, fold, or parallel fold based on group size
- LLM Synthesis: Use prompt template to merge chunk results into a unified output
- Result Assembly: Produce one output record per document
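The five steps above can be combined into one function. This is a hedged sketch, not DocETL's code. The record fields (`doc_id`, `chunk_num`, `summary`), `PROMPT_TEMPLATE`, the `batch_threshold` and `fold_batch_size` parameters, and the string-in, string-out `llm` callable are all assumptions. `stub_llm` echoes the bulleted items from the prompt so the result can be checked without a model. For brevity, the sketch chooses only between batch reduce and sequential fold and omits parallel fold.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical prompt template; each chunk result becomes one "- " bullet.
PROMPT_TEMPLATE = "Combine the following chunk results into one summary:\n{items}"


def render_prompt(texts: List[str]) -> str:
    """Fill the template with one bullet per text."""
    return PROMPT_TEMPLATE.format(items="\n".join(f"- {t}" for t in texts))


def reduce_chunks(
    records: List[dict],
    llm: Callable[[str], str],
    reduce_key: str = "doc_id",
    sort_key: str = "chunk_num",
    value_key: str = "summary",
    batch_threshold: int = 4,
    fold_batch_size: int = 2,
) -> List[dict]:
    # 1. Grouping: collect chunk results that share a reduce key.
    groups: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        groups[record[reduce_key]].append(record)

    outputs = []
    for key, group in groups.items():
        # 2. Sorting: restore original chunk order within the document.
        group.sort(key=lambda r: r[sort_key])
        texts = [r[value_key] for r in group]

        # 3. Strategy selection and 4. LLM synthesis.
        if len(texts) <= batch_threshold:
            result = llm(render_prompt(texts))  # batch reduce: one call
        else:
            result = llm(render_prompt(texts[:fold_batch_size]))  # fold: seed summary
            for start in range(fold_batch_size, len(texts), fold_batch_size):
                batch = texts[start:start + fold_batch_size]
                result = llm(render_prompt([result] + batch))

        # 5. Result assembly: one output record per document.
        outputs.append({reduce_key: key, value_key: result, "num_chunks": len(texts)})
    return outputs


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model: joins the bulleted items in order."""
    return " ".join(line[2:] for line in prompt.splitlines() if line.startswith("- "))


if __name__ == "__main__":
    chunk_results = [
        {"doc_id": "a", "chunk_num": 1, "summary": "a1"},
        {"doc_id": "a", "chunk_num": 0, "summary": "a0"},
    ]
    print(reduce_chunks(chunk_results, stub_llm))
    # [{'doc_id': 'a', 'summary': 'a0 a1', 'num_chunks': 2}]
```

Sorting before synthesis matters most for fold. The running summary is built from left to right, so out-of-order chunks would be merged in the wrong sequence.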