Implementation:Confident ai Deepeval ContextGenerator
| Sources | Domains | Last Updated |
|---|---|---|
| DeepEval | Synthetic_Data, LLM_Evaluation | 2026-02-14 09:00 GMT |
Overview
The ContextGenerator class generates diverse evaluation contexts from source documents by combining document chunking, embedding, and vector similarity search.
Description
ContextGenerator orchestrates the full pipeline from document loading through context extraction. It accepts document file paths and an embedding model, chunks the documents according to configurable size and overlap parameters, embeds the chunks into a vector store, and retrieves semantically related chunk groups as contexts for downstream golden generation. Configurable thresholds control both the quality filtering and similarity matching of generated contexts.
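The similarity-matching step can be illustrated with a minimal, standard-library sketch. It is not DeepEval's implementation: `embed`, `cosine`, `group_context`, and the `max_size` parameter are hypothetical names introduced here, and a bag-of-words count vector stands in for a real embedding model and vector store. The sketch only shows how a similarity threshold decides which chunks join a seed chunk to form one context.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_context(
    chunks: List[str],
    seed_index: int,
    similarity_threshold: float = 0.5,
    max_size: int = 3,  # hypothetical cap on chunks per context
) -> List[str]:
    """Return the seed chunk plus its most similar chunks above the threshold."""
    seed_vec = embed(chunks[seed_index])
    scored = [
        (cosine(seed_vec, embed(chunk)), i)
        for i, chunk in enumerate(chunks)
        if i != seed_index
    ]
    # Highest similarity first; ties broken by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    neighbours = [chunks[i] for score, i in scored if score >= similarity_threshold]
    return [chunks[seed_index]] + neighbours[: max_size - 1]


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the cat sat on the rug",
        "stock prices fell sharply today",
    ]
    print(group_context(docs, seed_index=0))
```

Raising `similarity_threshold` in this sketch yields smaller, more tightly related groups; lowering it admits loosely related chunks into the same context.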
Usage
The Synthesizer uses ContextGenerator internally during generate_goldens_from_docs to produce context groups from raw documents. It can also be instantiated directly for custom context-extraction workflows.
Code Reference
Source Location: Repository: confident-ai/deepeval, File: deepeval/synthesizer/chunking/context_generator.py (L102-147)
Signature:
class ContextGenerator:
    def __init__(
        self,
        embedder: DeepEvalBaseEmbeddingModel,
        document_paths: Optional[List[str]] = None,
        encoding: Optional[str] = None,
        model: Optional[Union[str, DeepEvalBaseLLM]] = None,
        chunk_size: int = 1024,
        chunk_overlap: int = 0,
        max_retries: int = 3,
        filter_threshold: float = 0.5,
        similarity_threshold: float = 0.5,
    ):
        ...
Import:
from deepeval.synthesizer.chunking.context_generator import ContextGenerator
I/O Contract
Inputs:
| Parameter | Type | Required | Description |
|---|---|---|---|
| embedder | DeepEvalBaseEmbeddingModel | Yes | Embedding model for vectorizing document chunks |
| document_paths | Optional[List[str]] | No | File paths to source documents (PDF, TXT, DOCX, MD) |
| encoding | Optional[str] | No | Text encoding used when reading text-based source documents (default: None) |
| model | Optional[Union[str, DeepEvalBaseLLM]] | No | LLM model for context processing |
| chunk_size | int | No | Number of tokens per chunk (default: 1024) |
| chunk_overlap | int | No | Number of overlapping tokens between adjacent chunks (default: 0) |
| max_retries | int | No | Maximum retry attempts for generation failures (default: 3) |
| filter_threshold | float | No | Minimum quality threshold for filtering contexts (default: 0.5) |
| similarity_threshold | float | No | Minimum similarity score for grouping chunks into contexts (default: 0.5) |
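The interaction between chunk_size and chunk_overlap can be sketched as a sliding window over a token list. This is an illustrative sketch under stated assumptions, not the library's chunker: `chunk_tokens` is a hypothetical helper, and whitespace-split words stand in for real tokenizer tokens.

```python
from typing import List


def chunk_tokens(
    tokens: List[str],
    chunk_size: int = 1024,
    chunk_overlap: int = 0,
) -> List[List[str]]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    words = "a b c d e f g h i j".split()
    for chunk in chunk_tokens(words, chunk_size=4, chunk_overlap=1):
        print(chunk)
```

With a non-zero overlap, each chunk repeats the last `chunk_overlap` tokens of the previous one, which keeps sentences that straddle a boundary visible in both chunks at the cost of producing more chunks.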
Outputs:
- ContextGenerator instance -- configured generator that produces context groups (List[List[str]]) from loaded documents
Usage Examples
from deepeval.synthesizer.chunking.context_generator import ContextGenerator

# my_embedding_model is assumed to be a DeepEvalBaseEmbeddingModel
# instance created elsewhere.
context_gen = ContextGenerator(
    embedder=my_embedding_model,
    document_paths=["data/manual.pdf", "data/faq.txt"],
    chunk_size=1024,
    chunk_overlap=0,
    filter_threshold=0.5,
    similarity_threshold=0.5,
)