Implementation:Ucbepic Docetl GatherOperation Execute

Knowledge Sources
Domains NLP, Context_Management
Last Updated 2026-02-08 01:40 GMT

Overview

A concrete operation from DocETL's operations module that enriches document chunks with peripheral context.

Description

GatherOperation groups chunked documents by document ID, sorts each group by the order key, then renders each chunk with configurable peripheral context (previous and next chunks) and optional hierarchy headers. Each output record gains a {content_key}_rendered field containing the contextually enriched chunk text.
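
The group, sort, and render steps described above can be sketched in plain Python. This is a minimal illustration, not DocETL's implementation: the function name gather_sketch, the n_prev and n_next parameters, and the marker strings are assumptions made for the example, and hierarchy headers are left out.

```python
from collections import defaultdict


def gather_sketch(chunks, content_key, doc_id_key, order_key, n_prev=1, n_next=1):
    """Illustrative stand-in for the gather step; not DocETL's implementation."""
    # 1. Group chunks by their document ID.
    by_doc = defaultdict(list)
    for chunk in chunks:
        by_doc[chunk[doc_id_key]].append(chunk)

    results = []
    for doc_chunks in by_doc.values():
        # 2. Sort each document's chunks by the order key.
        ordered = sorted(doc_chunks, key=lambda c: c[order_key])
        for i, chunk in enumerate(ordered):
            # 3. Collect neighbouring chunks as peripheral context.
            before = [c[content_key] for c in ordered[max(0, i - n_prev):i]]
            after = [c[content_key] for c in ordered[i + 1:i + 1 + n_next]]
            # Marker strings here are placeholders chosen for the example.
            rendered = "\n".join(
                before
                + ["--- Begin Main Chunk ---", chunk[content_key], "--- End Main Chunk ---"]
                + after
            )
            # 4. Emit a copy of the chunk with the rendered field added.
            results.append({**chunk, f"{content_key}_rendered": rendered})
    return results, 0.0  # no LLM calls, so the cost is zero


chunks = [
    {"doc": "a", "num": 2, "text": "second"},
    {"doc": "a", "num": 1, "text": "first"},
    {"doc": "a", "num": 3, "text": "third"},
]
enriched, cost = gather_sketch(chunks, "text", "doc", "num")
```

In this sketch the middle chunk's text_rendered field holds the previous chunk, the marked main chunk, and the next chunk, in that order. The original fields are preserved on every record.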

Usage

Use GatherOperation after SplitOperation and before MapOperation in a chunking pipeline. Configure peripheral_chunks to control how much surrounding context each chunk receives.

Code Reference

Source Location

  • Repository: docetl
  • File: docetl/operations/gather.py
  • Lines: L8-328

Signature

class GatherOperation(BaseOperation):
    class schema(BaseOperation.schema):
        type: str = "gather"
        content_key: str
        doc_id_key: str
        order_key: str
        peripheral_chunks: dict[str, Any] | None = None
        doc_header_key: str | None = None
        main_chunk_start: str | None = None
        main_chunk_end: str | None = None

    def execute(self, input_data: list[dict]) -> tuple[list[dict], float]:
        """Add peripheral context to chunks. Returns (enriched_chunks, cost=0.0)."""

Import

from docetl.operations.gather import GatherOperation

I/O Contract

Inputs

Name               Type        Required  Description
content_key        str         Yes       Chunk content field name
doc_id_key         str         Yes       Document ID field name
order_key          str         Yes       Chunk order field name
peripheral_chunks  dict        No        Config for previous/next chunk inclusion
doc_header_key     str         No        Field containing document section headers
main_chunk_start   str         No        Marker string for the start of the main chunk
main_chunk_end     str         No        Marker string for the end of the main chunk
input_data         list[dict]  Yes       Chunked documents from SplitOperation

Outputs

Name     Type        Description
results  list[dict]  Chunks with added {content_key}_rendered field
cost     float       Always 0.0 (no LLM calls)

Usage Examples

operations:
  - name: gather_context
    type: gather
    content_key: content_chunk
    doc_id_key: split_docs_id
    order_key: split_docs_chunk_num
    peripheral_chunks:
      previous:
        head:
          count: 1
      next:
        tail:
          count: 1
    doc_header_key: headers
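
To make the head and tail settings in the configuration above more concrete, the following sketch shows one way such a selection could work. It rests on an assumption not stated on this page: that head.count takes chunks from the start of a section (all previous chunks, or all next chunks) and tail.count takes chunks from its end. The helper select_peripheral is hypothetical and is not part of DocETL.

```python
def select_peripheral(section, config):
    """Pick chunks from `section` (an ordered list) using a head/tail config.

    Assumption: `head.count` takes chunks from the start of the section and
    `tail.count` takes chunks from its end. Hypothetical helper, not DocETL code.
    """
    head_n = config.get("head", {}).get("count", 0)
    tail_n = config.get("tail", {}).get("count", 0)
    head = section[:head_n]
    # Avoid returning a chunk twice when head and tail overlap.
    tail_start = max(head_n, len(section) - tail_n)
    tail = section[tail_start:] if tail_n else []
    return head + tail


# A five-chunk document, viewed from the third chunk.
ordered = ["c1", "c2", "c3", "c4", "c5"]
previous, following = ordered[:2], ordered[3:]

# Mirrors the YAML above: previous.head.count = 1, next.tail.count = 1.
prev_context = select_peripheral(previous, {"head": {"count": 1}})
next_context = select_peripheral(following, {"tail": {"count": 1}})
```

Under that assumption, the configuration shown selects the document's first chunk and last chunk rather than the immediate neighbours of the main chunk. Immediate neighbours would correspond to previous.tail and next.head.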

Related Pages

Implements Principle

Requires Environment
