Implementation:Confident ai Deepeval DocumentChunker
| Sources | Domains | Last Updated |
|---|---|---|
| DeepEval | Synthetic_Data, LLM_Evaluation | 2026-02-14 09:00 GMT |
Overview
The DocumentChunker class is responsible for splitting documents into manageable text chunks for synthetic evaluation data generation in the DeepEval framework.
Description
DocumentChunker takes a DeepEvalBaseEmbeddingModel and uses it to process documents into chunks suitable for context extraction and golden generation. It serves as the foundational text segmentation component in the synthetic data generation pipeline, handling document loading and token-based splitting.
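To make the idea of token-based splitting concrete, here is a minimal, self-contained sketch. It is not DeepEval's implementation: the function name `split_by_tokens`, the `chunk_size` and `chunk_overlap` parameters, and the use of whitespace-delimited words as "tokens" are all assumptions made for illustration. A real chunker would typically count tokens with a model tokenizer.

```python
from typing import List


def split_by_tokens(text: str, chunk_size: int = 8, chunk_overlap: int = 2) -> List[str]:
    """Split text into overlapping chunks of whitespace-delimited tokens.

    Hypothetical helper for illustration only; whitespace words stand in
    for tokenizer tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    tokens = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(" ".join(window))
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk made purely of overlap is emitted.
        if start + chunk_size >= len(tokens):
            break
    return chunks


sample = "one two three four five six seven eight nine ten"
chunks = split_by_tokens(sample, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

With a chunk size of 4 and an overlap of 1, each chunk repeats the last token of the previous one, which helps preserve context across chunk boundaries.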
Usage
The Synthesizer uses DocumentChunker internally when generating goldens from documents. It can also be instantiated directly for custom chunking workflows.
Code Reference
Source Location: Repository: confident-ai/deepeval, File: deepeval/synthesizer/chunking/doc_chunker.py (L73-88)
Signature:
class DocumentChunker:
    def __init__(self, embedder: DeepEvalBaseEmbeddingModel):
        ...
Import:
from deepeval.synthesizer.chunking.doc_chunker import DocumentChunker
I/O Contract
Inputs:
| Parameter | Type | Required | Description |
|---|---|---|---|
| embedder | DeepEvalBaseEmbeddingModel | Yes | Embedding model used for chunk processing and semantic-aware splitting |
Outputs:
- DocumentChunker instance -- configured chunker ready to split documents into text segments for downstream golden generation
Usage Examples
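from deepeval.synthesizer.chunking.doc_chunker import DocumentChunker
from deepeval.models import DeepEvalBaseEmbeddingModel

# my_embedding_model is assumed to be an existing instance of a
# DeepEvalBaseEmbeddingModel subclass; it is not defined in this snippet.
# Initialize with an embedding model
chunker = DocumentChunker(embedder=my_embedding_model)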