Implementation:Confident ai Deepeval ContextGenerator

From Leeroopedia
Sources: DeepEval
Domains: Synthetic_Data, LLM_Evaluation
Last Updated: 2026-02-14 09:00 GMT

Overview

The ContextGenerator class generates diverse evaluation contexts from source documents by combining document chunking, embedding, and vector similarity search.

Description

ContextGenerator orchestrates the full pipeline from document loading through context extraction. It accepts document file paths and an embedding model, chunks the documents according to configurable size and overlap parameters, embeds the chunks into a vector store, and retrieves groups of semantically related chunks as contexts for downstream golden generation. Two thresholds tune the result: filter_threshold sets the minimum quality a context must reach to be kept, and similarity_threshold sets the minimum similarity score for grouping chunks into the same context.
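The chunking step described above can be illustrated with a minimal sketch. It is not DeepEval's implementation: the function name chunk_tokens is hypothetical, and whitespace splitting stands in for a real tokenizer. It only shows how chunk_size and chunk_overlap interact in a sliding window.

```python
from typing import List


def chunk_tokens(
    tokens: List[str], chunk_size: int = 1024, chunk_overlap: int = 0
) -> List[List[str]]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens.

    Hypothetical helper for illustration; not part of the deepeval package.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window already reaches the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Whitespace splitting is an assumption standing in for a real tokenizer.
tokens = "a b c d e f g h i j".split()
chunks = chunk_tokens(tokens, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

With chunk_overlap set to 1, the last token of each chunk reappears as the first token of the next, which helps preserve context across chunk boundaries at the cost of producing more chunks.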

Usage

The Synthesizer uses ContextGenerator internally during generate_goldens_from_docs to produce context groups from raw documents. It can also be instantiated directly for custom context extraction workflows.

Code Reference

Source Location: Repository: confident-ai/deepeval, File: deepeval/synthesizer/chunking/context_generator.py (L102-147)

Signature:

class ContextGenerator:
    def __init__(
        self,
        embedder: DeepEvalBaseEmbeddingModel,
        document_paths: Optional[List[str]] = None,
        encoding: Optional[str] = None,
        model: Optional[Union[str, DeepEvalBaseLLM]] = None,
        chunk_size: int = 1024,
        chunk_overlap: int = 0,
        max_retries: int = 3,
        filter_threshold: float = 0.5,
        similarity_threshold: float = 0.5,
    ):
        ...

Import:

from deepeval.synthesizer.chunking.context_generator import ContextGenerator

I/O Contract

Inputs:

Parameter | Type | Required | Description
embedder | DeepEvalBaseEmbeddingModel | Yes | Embedding model for vectorizing document chunks
document_paths | Optional[List[str]] | No | File paths to source documents (PDF, TXT, DOCX, MD)
encoding | Optional[str] | No | Encoding scheme for tokenization
model | Optional[Union[str, DeepEvalBaseLLM]] | No | LLM model for context processing
chunk_size | int | No | Number of tokens per chunk (default: 1024)
chunk_overlap | int | No | Number of overlapping tokens between adjacent chunks (default: 0)
max_retries | int | No | Maximum retry attempts for generation failures (default: 3)
filter_threshold | float | No | Minimum quality threshold for filtering contexts (default: 0.5)
similarity_threshold | float | No | Minimum similarity score for grouping chunks into contexts (default: 0.5)
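To make the role of filter_threshold concrete, here is a minimal sketch of threshold-based filtering. How ContextGenerator actually scores quality is not described on this page, so the scorer below (toy_quality_score) is a purely hypothetical stand-in, as is the filter_chunks helper.

```python
from typing import Callable, List


def filter_chunks(
    chunks: List[str],
    score_fn: Callable[[str], float],
    filter_threshold: float = 0.5,
) -> List[str]:
    """Keep only chunks whose quality score meets the threshold.

    Hypothetical helper for illustration; not part of the deepeval package.
    """
    return [chunk for chunk in chunks if score_fn(chunk) >= filter_threshold]


def toy_quality_score(chunk: str) -> float:
    # Assumption: a toy score in [0, 1] based on word count relative to
    # a 10-word target. A real scorer would judge content, not length.
    return min(len(chunk.split()) / 10.0, 1.0)


chunks = [
    "ok",
    "Reset the router by holding the power button for ten seconds",
    "see above",
]
kept = filter_chunks(chunks, toy_quality_score, filter_threshold=0.5)
print(kept)
```

Raising the threshold discards more candidates, lowering it admits shorter or weaker ones. The same trade-off applies whatever scoring function is plugged in.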

Outputs:

  • ContextGenerator instance -- configured generator that produces context groups (List[List[str]]) from loaded documents
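The List[List[str]] output shape and the effect of similarity_threshold can be sketched as follows. This is an illustration under stated assumptions, not the library's code: build_context, max_context_size, and the hand-written 2D vectors are hypothetical, and a real run would take its vectors from the embedder.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_context(
    seed_index: int,
    chunks: List[str],
    vectors: List[List[float]],
    similarity_threshold: float = 0.5,
    max_context_size: int = 3,  # hypothetical cap, not a documented parameter
) -> List[str]:
    """Group a seed chunk with its most similar neighbours above the threshold."""
    seed_vector = vectors[seed_index]
    scored = [
        (cosine_similarity(seed_vector, vectors[i]), i)
        for i in range(len(chunks))
        if i != seed_index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first

    context = [chunks[seed_index]]
    for similarity, index in scored:
        if similarity < similarity_threshold or len(context) >= max_context_size:
            break
        context.append(chunks[index])
    return context


# Toy data: the vectors are made up for illustration.
chunks = ["reset router", "reboot modem", "refund policy"]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

contexts: List[List[str]] = [
    build_context(0, chunks, vectors),
    build_context(2, chunks, vectors),
]
print(contexts)
```

Each inner list is one context: a seed chunk plus any neighbours similar enough to it. A chunk with no sufficiently similar neighbours yields a single-element context.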

Usage Examples

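from deepeval.synthesizer.chunking.context_generator import ContextGenerator

# my_embedding_model is a placeholder: define it beforehand as an instance
# of a DeepEvalBaseEmbeddingModel subclass.
context_gen = ContextGenerator(
    embedder=my_embedding_model,
    document_paths=["data/manual.pdf", "data/faq.txt"],
    chunk_size=1024,            # tokens per chunk
    chunk_overlap=0,            # tokens shared between adjacent chunks
    filter_threshold=0.5,       # minimum quality for a context to be kept
    similarity_threshold=0.5,   # minimum similarity for grouping chunks
)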
