Implementation:Run llama Llama index DocumentContextExtractor

Knowledge Sources
Domains: Metadata Extraction, RAG, Contextual Retrieval
Last Updated: 2026-02-11 19:00 GMT

Overview

An LLM-based context extractor that generates contextual metadata for document chunks by analyzing each chunk within its parent document. It implements Anthropic's "Contextual Retrieval" approach to improve RAG accuracy.

Description

DocumentContextExtractor extends BaseExtractor to enrich document nodes with contextual metadata. For each node, it retrieves the parent document from a docstore, sends both the full document and the individual chunk to an LLM, and stores the generated context as node metadata.

The extractor implements several key design decisions:

  • Prompt strategies: Two built-in prompt constants are provided as class variables:
    • ORIGINAL_CONTEXT_PROMPT -- Generates a short, succinct context that situates a chunk within the overall document (from the Anthropic cookbook)
    • SUCCINCT_CONTEXT_PROMPT -- Generates keyword-laden descriptions for better search matching
  • Rate limit handling: Retry logic with exponential backoff and jitter, allowing up to 5 retries from a 60-second base delay
  • Oversized document strategies: Configurable handling via OversizeStrategy literal type ("warn", "error", or "ignore")
  • Token counting: Uses @lru_cache(maxsize=1000) for cached token counting to avoid redundant computation on repeated documents
  • Prompt caching: Sends the document text with cache_control: ephemeral headers to leverage Anthropic API prompt caching
  • Sorting optimization: Sorts nodes by source document ID before processing to maximize prompt cache hits, which can save significant API costs
  • Skip logic: Nodes that already have the metadata key set are skipped to avoid reprocessing
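
The sorting and skip steps in the list above can be illustrated without the library. What follows is a minimal sketch under stated assumptions, not the extractor's actual code: Chunk, source_id, and plan_work are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Chunk:
    """Hypothetical stand-in for a node; not a llama_index class."""

    node_id: str
    source_id: str  # ID of the parent document
    metadata: Dict[str, str] = field(default_factory=dict)


def plan_work(chunks: List[Chunk], key: str = "context") -> List[Chunk]:
    """Return the chunks that still need context, grouped by parent document."""
    # Skip logic: a chunk that already carries the metadata key is left alone.
    pending = [c for c in chunks if key not in c.metadata]
    # Sorting: sorted() is stable, so chunks of one document end up adjacent
    # and keep their original relative order. Adjacent requests then share
    # the same document prefix, which is what a prompt cache can reuse.
    return sorted(pending, key=lambda c: c.source_id)


chunks = [
    Chunk("n1", "doc-b"),
    Chunk("n2", "doc-a"),
    Chunk("n3", "doc-b", {"context": "already done"}),
    Chunk("n4", "doc-a"),
]
order = [c.node_id for c in plan_work(chunks)]
print(order)  # ['n2', 'n4', 'n1']
```

Grouping by parent document matters because a cached document prefix is only useful while consecutive requests keep sending the same document.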

Usage

Use this extractor when you want to improve retrieval accuracy for RAG pipelines where individual chunks may lack standalone semantic meaning. It is particularly effective for documents where chunk boundaries split related content across nodes. It requires an LLM with async chat support (the achat method) and a configured docstore containing the parent documents.

Code Reference

Source Location

  • Repository: Run_llama_Llama_index
  • File: llama-index-core/llama_index/core/extractors/document_context.py
  • Lines: 1-351

Signature

class DocumentContextExtractor(BaseExtractor):
    # Pydantic fields
    llm: LLM
    docstore: BaseDocumentStore
    key: str
    prompt: str
    doc_ids: Set[str]
    max_context_length: int
    max_output_tokens: int
    oversized_document_strategy: OversizeStrategy
    num_workers: int = DEFAULT_NUM_WORKERS

    def __init__(
        self,
        docstore: BaseDocumentStore,
        llm: Optional[LLM] = None,
        max_context_length: int = 1000,
        key: str = DEFAULT_KEY,
        prompt: str = ORIGINAL_CONTEXT_PROMPT,
        num_workers: int = DEFAULT_NUM_WORKERS,
        max_output_tokens: int = 512,
        oversized_document_strategy: OversizeStrategy = "warn",
        **kwargs: Any,
    ) -> None

Import

from llama_index.core.extractors.document_context import DocumentContextExtractor

I/O Contract

Inputs

Name | Type | Required | Description
docstore | BaseDocumentStore | Yes | Storage for parent documents; used to retrieve full document text for each node
llm | Optional[LLM] | No | Language model instance with achat method; defaults to Settings.llm
max_context_length | int | No | Maximum allowed document context length in tokens (default: 1000)
key | str | No | Metadata key for storing extracted context (default: "context")
prompt | str | No | Prompt template for context generation (default: ORIGINAL_CONTEXT_PROMPT)
num_workers | int | No | Number of parallel workers for async processing (default: DEFAULT_NUM_WORKERS)
max_output_tokens | int | No | Maximum tokens in generated context (default: 512)
oversized_document_strategy | OversizeStrategy | No | Strategy for handling documents exceeding max_context_length: "warn", "error", or "ignore" (default: "warn")

Outputs

Name | Type | Description
metadata_list | List[Dict] | List of metadata dictionaries, one per input node, containing the generated context under the configured key
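
Because the output holds one dictionary per input node, in input order, it can be merged back position by position. The sketch below uses plain dictionaries in place of real node objects, and the sample values are invented for illustration.

```python
from typing import Dict, List

# Existing metadata of two nodes (plain dicts standing in for node.metadata).
node_metadata: List[Dict[str, str]] = [
    {"file_name": "report.txt"},
    {"file_name": "report.txt"},
]

# Shape of the extractor output: one dict per node, under the configured key.
metadata_list: List[Dict[str, str]] = [
    {"context": "Introduces the quarterly revenue figures."},
    {"context": "Explains the methodology behind the figures."},
]

# Merge by position; zip pairs the i-th node with the i-th result.
for existing, extracted in zip(node_metadata, metadata_list):
    existing.update(extracted)

print(node_metadata[0])
```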

Key Methods

aextract

async def aextract(self, nodes: Sequence[BaseNode]) -> List[Dict]

Main entry point. Sorts nodes by source document ID for prompt cache optimization, retrieves parent documents, and dispatches parallel context generation jobs via run_jobs.
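
The parallel dispatch step can be pictured as bounded concurrency over a list of coroutines. The sketch below is written in the spirit of run_jobs but is not its implementation: run_bounded and fake_context are hypothetical helpers, and using asyncio.Semaphore to cap the number of in-flight jobs is an assumption made for illustration.

```python
import asyncio
from typing import Awaitable, Dict, List, TypeVar

T = TypeVar("T")


async def run_bounded(jobs: List[Awaitable[T]], workers: int) -> List[T]:
    """Await all jobs with at most `workers` running at once."""
    semaphore = asyncio.Semaphore(workers)

    async def guarded(job: Awaitable[T]) -> T:
        async with semaphore:
            return await job

    # gather returns results in the order the jobs were passed in.
    return list(await asyncio.gather(*(guarded(job) for job in jobs)))


async def fake_context(chunk: str) -> Dict[str, str]:
    await asyncio.sleep(0)  # stands in for an LLM call
    return {"context": f"summary of {chunk}"}


results = asyncio.run(
    run_bounded([fake_context(c) for c in ["a", "b", "c"]], workers=2)
)
print(results)
```

Keeping results in input order is what allows the returned list to line up with the input nodes.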

_agenerate_node_context

async def _agenerate_node_context(
    self,
    node: Union[Node, TextNode],
    metadata: Dict,
    document: Union[Node, TextNode],
    prompt: str,
    key: str,
) -> Dict

Generates context for a single node by sending the parent document and chunk to the LLM. Implements exponential backoff retry (5 retries, 60s base delay) for rate limit handling. Uses Anthropic prompt caching headers.
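
The retry behavior described above can be sketched as follows. This is a minimal illustration, not the library's code: call_with_retries and backoff_delay are hypothetical names, the half-second jitter range and the message-based rate limit detection are assumptions, and the sleep function is injectable so the example runs without actually waiting.

```python
import asyncio
import random
from typing import Awaitable, Callable, List, Optional, Tuple


def backoff_delay(attempt: int, base_delay: float = 60.0, jitter: float = 0.5) -> float:
    """Delay before the next try: base * 2**attempt plus a small random jitter."""
    return base_delay * (2 ** attempt) + random.random() * jitter


async def call_with_retries(
    call: Callable[[], Awaitable[str]],
    max_retries: int = 5,
    base_delay: float = 60.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> Optional[str]:
    """Run `call`, backing off on rate-limit errors; return None on failure."""
    for attempt in range(max_retries):
        try:
            return await call()
        except RuntimeError as exc:
            # Assumption: rate limits are recognised from the error text.
            is_rate_limit = "rate limit" in str(exc).lower()
            if not is_rate_limit or attempt == max_retries - 1:
                return None
            await sleep(backoff_delay(attempt, base_delay))
    return None


async def _demo() -> Tuple[Optional[str], List[float]]:
    delays: List[float] = []
    calls = {"n": 0}

    async def fake_sleep(seconds: float) -> None:
        delays.append(seconds)  # record the delay instead of waiting

    async def flaky() -> str:
        calls["n"] += 1
        if calls["n"] < 3:
            raise RuntimeError("Rate limit exceeded")
        return "context text"

    return await call_with_retries(flaky, sleep=fake_sleep), delays


result, delays = asyncio.run(_demo())
print(result, [round(d) for d in delays])
```

With a 60-second base, the waits grow as roughly 60, 120, 240, and 480 seconds, so a run that exhausts its retries can take many minutes per node.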

_get_document

async def _get_document(self, doc_id: str) -> Optional[Union[Node, TextNode]]

Retrieves a document from the docstore by ID. Validates that the document is a text node and applies the oversized document strategy if the token count exceeds max_context_length.
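
The three strategy values suggest a simple dispatch on the token count. The sketch below assumes that "warn" emits a warning and continues, "error" raises, and "ignore" proceeds silently. The page does not spell out these behaviors, so treat them as assumptions, and apply_oversize_strategy as a hypothetical helper.

```python
import warnings
from typing import Literal

OversizeStrategy = Literal["warn", "error", "ignore"]


def apply_oversize_strategy(
    token_count: int,
    max_context_length: int,
    strategy: OversizeStrategy,
) -> bool:
    """Return True if the document is oversized, acting according to `strategy`."""
    if token_count <= max_context_length:
        return False
    message = f"Document has {token_count} tokens; the limit is {max_context_length}."
    if strategy == "warn":
        warnings.warn(message)  # assumed: report and carry on
    elif strategy == "error":
        raise ValueError(message)  # assumed: stop processing
    elif strategy != "ignore":
        raise ValueError(f"Unknown oversized document strategy: {strategy}")
    return True


oversized = apply_oversize_strategy(1500, 1000, "ignore")
print(oversized)  # True
```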

_count_tokens

@staticmethod
@lru_cache(maxsize=1000)
def _count_tokens(text: str) -> int

Cached token counting using Settings.tokenizer. The LRU cache avoids redundant tokenization on repeated documents.
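
The effect of the cache can be seen with a toy tokenizer. In this sketch whitespace_tokenizer is a hypothetical stand-in for Settings.tokenizer, and count_tokens is a module-level function rather than a static method.

```python
from functools import lru_cache
from typing import List


def whitespace_tokenizer(text: str) -> List[str]:
    """Stand-in tokenizer for illustration; real tokenizers are model-specific."""
    return text.split()


@lru_cache(maxsize=1000)
def count_tokens(text: str) -> int:
    # Strings are hashable, so the text itself serves as the cache key.
    return len(whitespace_tokenizer(text))


doc = "the same parent document is counted once per chunk"
first = count_tokens(doc)   # computed
second = count_tokens(doc)  # served from the cache
info = count_tokens.cache_info()
print(first, info.hits, info.misses)  # 9 1 1
```

Since many chunks share one parent document, the same document text is counted repeatedly, which is the case the cache targets.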

Helper Functions

is_text_node

def is_text_node(node: BaseNode) -> TypeGuard[Union[Node, TextNode]]

Module-level type guard function that checks whether a node is an instance of Node or TextNode. Used throughout the extractor for type narrowing.

Constants

Name | Description
ORIGINAL_CONTEXT_PROMPT | Prompts the LLM to generate a short, succinct context situating a chunk within its document, designed for improving search retrieval
SUCCINCT_CONTEXT_PROMPT | Generates keyword-laden phrases describing main topics, entities, and actions; replaces pronouns with specific referents for better search matching
OversizeStrategy | Literal type alias for "warn", "error", or "ignore"

Usage Examples

Basic Usage

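import asyncio

from llama_index.core.extractors.document_context import DocumentContextExtractor
from llama_index.core.storage.docstore import SimpleDocumentStore

docstore = SimpleDocumentStore()
# Add documents to docstore...

extractor = DocumentContextExtractor(
    docstore=docstore,
    llm=my_llm,  # placeholder: any LLM instance that provides achat
    max_context_length=64000,
    max_output_tokens=256,
)


async def main():
    # Extract context for nodes asynchronously.
    # `nodes` is a placeholder for chunks parsed from the stored documents.
    return await extractor.aextract(nodes)


# In a notebook with a running event loop, await extractor.aextract(nodes) directly.
metadata_list = asyncio.run(main())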

Using Succinct Prompt

extractor = DocumentContextExtractor(
    docstore=docstore,
    llm=my_llm,
    prompt=DocumentContextExtractor.SUCCINCT_CONTEXT_PROMPT,
    max_context_length=128000,
    oversized_document_strategy="ignore",
)
