Principle:Ucbepic Docetl Peripheral Context Gathering
| Knowledge Sources | |
|---|---|
| Domains | NLP, Context_Management |
| Last Updated | 2026-02-08 01:40 GMT |
Overview
A context enrichment principle that adds surrounding chunk content and document hierarchy headers to each chunk, providing the LLM with broader document context.
Description
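Peripheral Context Gathering addresses the information lost when a document is split into chunks: each chunk is enriched with content from its neighboring chunks and with the document's section headers. The enrichment consists of:
- Previous/Next Chunks: Configurable head, middle, and tail sections drawn from the chunks before and after the main chunk, each of which can show full chunk content or a condensed version such as a summary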
- Hierarchy Headers: Document section headers providing structural context
- Rendered Output: A formatted chunk with delimiters marking the main chunk boundaries
This enrichment helps the LLM understand each chunk in the context of the broader document structure.
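To make the head/middle/tail idea concrete, here is a minimal Python sketch of how one side of the window (for example, all chunks preceding the main chunk, in document order) might be partitioned. `SectionSpec`, `split_window`, and the `text`/`summary` field names are hypothetical. The sketch assumes the head and tail each show a fixed number of chunks from a chosen field (often the full text), while the middle covers everything in between, typically through a condensed field such as a precomputed summary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SectionSpec:
    """Hypothetical per-section config: which chunk field to show, and how many chunks."""
    content_key: str
    count: int = 0  # ignored for the middle section, which spans whatever is left


def split_window(
    chunks: List[Dict[str, str]],
    head: SectionSpec,
    middle: Optional[SectionSpec],
    tail: SectionSpec,
) -> Dict[str, List[str]]:
    """Partition an ordered list of neighbor chunks into head, middle, and tail.

    Head takes the first `head.count` chunks, tail the last `tail.count`
    (never overlapping the head), and middle covers the chunks between them.
    If no middle spec is given, those in-between chunks are omitted.
    """
    n = len(chunks)
    h = min(head.count, n)
    t = min(tail.count, n - h)  # cap the tail so it cannot overlap the head
    head_part = chunks[:h]
    middle_part = chunks[h:n - t] if middle is not None else []
    tail_part = chunks[n - t:] if t else []
    return {
        "head": [c[head.content_key] for c in head_part],
        "middle": [c[middle.content_key] for c in middle_part] if middle is not None else [],
        "tail": [c[tail.content_key] for c in tail_part],
    }


# Usage: five preceding chunks, showing the first in full, the last two in full,
# and summaries for the two in between.
previous = [{"text": f"t{i}", "summary": f"s{i}"} for i in range(5)]
window = split_window(previous, SectionSpec("text", 1), SectionSpec("summary"), SectionSpec("text", 2))
```

For preceding chunks, the tail is the part closest to the main chunk, so giving it full text while summarizing the middle keeps the most relevant context verbatim at a lower token cost.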
Usage
Apply this principle after document splitting and before per-chunk LLM processing. It is especially valuable for documents with strong sequential dependencies (narratives, legal filings, meeting transcripts).
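Gathering relies on each chunk carrying its document ID and its sequence number from the splitting step. As a minimal sketch, assuming a naive fixed-size character splitter and the hypothetical field names `doc_id`, `chunk_num`, and `text` (a real pipeline would more likely split on tokens or on document structure), the split output might look like this:

```python
from typing import Dict, List


def split_document(doc_id: str, text: str, chunk_size: int) -> List[Dict[str, object]]:
    """Split text into fixed-size character chunks.

    Each chunk is tagged with its document ID and its position in the
    document, which a later gathering step uses to regroup and reorder chunks.
    Field names are hypothetical.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        {"doc_id": doc_id, "chunk_num": i, "text": text[start:start + chunk_size]}
        for i, start in enumerate(range(0, len(text), chunk_size))
    ]


chunks = split_document("d1", "abcdefg", 3)
```

Only after these identifiers exist can chunks be shuffled, batched, or processed in parallel and still be reassembled into their original order for context gathering.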
Theoretical Basis
Context gathering applies a sliding window over each document's ordered chunks and adds the document's section hierarchy:
- Grouping: Group chunks by document ID
- Ordering: Sort chunks within each document by sequence number
- Context Assembly: For each chunk, gather configurable amounts of previous and next chunks
- Header Injection: Prepend document section headers for structural context
- Rendering: Format with delimiters marking main chunk boundaries
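The five steps above can be sketched end to end in Python. This is an illustrative sketch, not DocETL's implementation. It assumes hypothetical chunk fields `doc_id`, `chunk_num`, `text`, and an optional `headers` list of `(level, title)` pairs for section headers that start inside a chunk. It uses a fixed window of full neighbor text on each side rather than configurable head/middle/tail sections, and the delimiter strings are made up for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

BEGIN = "--- Begin Main Chunk ---"  # hypothetical delimiter text
END = "--- End Main Chunk ---"


def gather(chunks: List[dict], prev_count: int = 1, next_count: int = 1) -> List[dict]:
    """Return copies of the chunks, each with a context-enriched `rendered` field."""
    # Grouping: bucket chunks by document ID.
    by_doc: Dict[str, List[dict]] = defaultdict(list)
    for c in chunks:
        by_doc[c["doc_id"]].append(c)

    out: List[dict] = []
    for doc_chunks in by_doc.values():
        # Ordering: restore the original sequence within the document.
        doc_chunks = sorted(doc_chunks, key=lambda c: c["chunk_num"])
        stack: List[Tuple[int, str]] = []  # header path in effect, outermost first
        for i, chunk in enumerate(doc_chunks):
            # Header injection: headers in effect when this chunk begins.
            header_lines = ["#" * lvl + " " + title for lvl, title in stack]
            # Context assembly: up to prev_count / next_count neighbors per side.
            before = [c["text"] for c in doc_chunks[max(0, i - prev_count):i]]
            after = [c["text"] for c in doc_chunks[i + 1:i + 1 + next_count]]
            # Rendering: delimiters mark where the main chunk starts and ends.
            parts = header_lines + before + [BEGIN, chunk["text"], END] + after
            out.append({**chunk, "rendered": "\n".join(parts)})
            # Update the header path: a level-L header replaces any header at level >= L.
            for lvl, title in chunk.get("headers", []):
                stack = [h for h in stack if h[0] < lvl] + [(lvl, title)]
    return out


chunks = [
    {"doc_id": "a", "chunk_num": 1, "text": "B", "headers": []},
    {"doc_id": "a", "chunk_num": 0, "text": "A", "headers": [(1, "Intro")]},
    {"doc_id": "a", "chunk_num": 2, "text": "C", "headers": [(2, "Detail")]},
    {"doc_id": "b", "chunk_num": 0, "text": "X"},
]
result = gather(chunks)
```

Here chunk `B` is rendered with the `# Intro` header it inherits from chunk `A`, followed by `A` as preceding context, `B` between the delimiters, and `C` as following context. Tracking headers as a stack keeps the hierarchy correct when a new top-level section closes deeper subsections.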