Implementation:Ucbepic Docetl GatherOperation Execute
| Knowledge Sources | |
|---|---|
| Domains | NLP, Context_Management |
| Last Updated | 2026-02-08 01:40 GMT |
Overview
Concrete operation, provided by DocETL's operations module, that enriches document chunks with peripheral context.
Description
GatherOperation groups chunks by document ID, sorts each group by the order key, then renders each chunk with configurable peripheral context (previous/next chunks) and optional hierarchy headers. Each output chunk gains a {content_key}_rendered field containing the contextually enriched chunk text.
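The group, sort, and render steps described above can be sketched in plain Python. This is a minimal illustration, not DocETL's implementation: the function name `gather_sketch` and the `n_prev`/`n_next` parameters are hypothetical, neighbors are simply joined with newlines, and hierarchy headers and the full `peripheral_chunks` options are left out.

```python
from collections import defaultdict


def gather_sketch(chunks, content_key, doc_id_key, order_key, n_prev=1, n_next=1):
    """Illustrative stand-in for the gather step (hypothetical helper)."""
    # Group chunks by document ID.
    by_doc = defaultdict(list)
    for chunk in chunks:
        by_doc[chunk[doc_id_key]].append(chunk)

    results = []
    for doc_chunks in by_doc.values():
        # Sort each document's chunks by the order key.
        doc_chunks.sort(key=lambda c: c[order_key])
        for i, chunk in enumerate(doc_chunks):
            # Collect up to n_prev preceding and n_next following chunks.
            before = [c[content_key] for c in doc_chunks[max(0, i - n_prev):i]]
            after = [c[content_key] for c in doc_chunks[i + 1:i + 1 + n_next]]
            rendered = "\n".join(before + [chunk[content_key]] + after)
            # Copy the chunk so the input records are not mutated.
            results.append({**chunk, f"{content_key}_rendered": rendered})
    # No LLM calls are involved, so the cost is 0.0.
    return results, 0.0
```

Calling `gather_sketch(chunks, "text", "id", "n")` on a list of chunk dicts returns the enriched chunks together with a cost of 0.0, mirroring the `(results, cost)` shape of `execute`.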
Usage
Use GatherOperation after SplitOperation and before MapOperation in a chunking pipeline. Configure peripheral_chunks to control how much surrounding context each chunk receives.
Code Reference
Source Location
- Repository: docetl
- File: docetl/operations/gather.py
- Lines: L8-328
Signature
class GatherOperation(BaseOperation):
    class schema(BaseOperation.schema):
        type: str = "gather"
        content_key: str
        doc_id_key: str
        order_key: str
        peripheral_chunks: dict[str, Any] | None = None
        doc_header_key: str | None = None
        main_chunk_start: str | None = None
        main_chunk_end: str | None = None

    def execute(self, input_data: list[dict]) -> tuple[list[dict], float]:
        """Add peripheral context to chunks. Returns (enriched_chunks, cost=0.0)."""
Import
from docetl.operations.gather import GatherOperation
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| content_key | str | Yes | Chunk content field name |
| doc_id_key | str | Yes | Document ID field name |
| order_key | str | Yes | Chunk order field name |
| peripheral_chunks | dict | No | Config for previous/next chunk inclusion |
| doc_header_key | str | No | Field containing document section headers |
| input_data | list[dict] | Yes | Chunked documents from SplitOperation |
Outputs
| Name | Type | Description |
|---|---|---|
| results | list[dict] | Chunks with added {content_key}_rendered field |
| cost | float | Always 0.0 (no LLM calls) |
Usage Examples
operations:
  - name: gather_context
    type: gather
    content_key: content_chunk
    doc_id_key: split_docs_id
    order_key: split_docs_chunk_num
    peripheral_chunks:
      previous:
        head:
          count: 1
      next:
        tail:
          count: 1
    doc_header_key: headers
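For readers who assemble pipelines programmatically, the same step can be written as a Python dictionary. The sketch below mirrors the YAML above key for key; the `missing_required` helper is hypothetical (not part of DocETL) and only checks the three fields marked as required in the Inputs table.

```python
# Python mirror of the YAML gather step shown above.
gather_config = {
    "name": "gather_context",
    "type": "gather",
    "content_key": "content_chunk",
    "doc_id_key": "split_docs_id",
    "order_key": "split_docs_chunk_num",
    "peripheral_chunks": {
        "previous": {"head": {"count": 1}},
        "next": {"tail": {"count": 1}},
    },
    "doc_header_key": "headers",
}

# Fields listed as required in the Inputs table.
REQUIRED_KEYS = ("content_key", "doc_id_key", "order_key")


def missing_required(config):
    """Hypothetical helper: list required gather keys absent from a config."""
    return [key for key in REQUIRED_KEYS if key not in config]
```

A quick check such as `missing_required(gather_config)` returns an empty list when all three required keys are present.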
Related Pages
Implements Principle
Requires Environment