Implementation:Confident ai Deepeval DocumentChunker

Sources	Domains	Last Updated
DeepEval	Synthetic_Data, LLM_Evaluation	2026-02-14 09:00 GMT

Overview

The DocumentChunker class is responsible for splitting documents into manageable text chunks for synthetic evaluation data generation in the DeepEval framework.

Description

DocumentChunker takes a DeepEvalBaseEmbeddingModel and uses it to process documents into chunks suitable for context extraction and golden generation. It serves as the foundational text segmentation component in the synthetic data generation pipeline, handling document loading and token-based splitting.
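As a rough illustration of token-based splitting, the sketch below cuts text into fixed-size windows with an optional overlap. It is not DeepEval's implementation: the function name `split_by_tokens`, the parameters `chunk_size` and `chunk_overlap`, and the use of whitespace-separated words in place of a real tokenizer are all assumptions made for this example.

```python
from typing import List


def split_by_tokens(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Split text into chunks of at most chunk_size tokens.

    Assumption: tokens are whitespace-separated words, a stand-in for a
    real tokenizer. Consecutive chunks share chunk_overlap tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    tokens = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(" ".join(window))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in split_by_tokens("a b c d e f g h i j", chunk_size=4, chunk_overlap=1):
        print(chunk)
```

The overlap keeps some shared context between neighbouring chunks, which can help when a relevant passage straddles a chunk boundary.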

Usage

DocumentChunker is used internally by the Synthesizer when it generates goldens from documents. It can also be instantiated directly for custom chunking workflows.

Code Reference

Source Location: Repository: confident-ai/deepeval, File: deepeval/synthesizer/chunking/doc_chunker.py (L73-88)

Signature:

class DocumentChunker:
    def __init__(self, embedder: DeepEvalBaseEmbeddingModel):
        ...

Import:

from deepeval.synthesizer.chunking.doc_chunker import DocumentChunker

I/O Contract

Inputs:

Parameter	Type	Required	Description
embedder	DeepEvalBaseEmbeddingModel	Yes	Embedding model used when processing chunks; the splitting itself is token-based, as noted in the Description

Outputs:

DocumentChunker instance -- configured chunker ready to split documents into text segments for downstream golden generation
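The constructor contract above amounts to dependency injection: the chunker receives an embedder once and reuses it for every document. The sketch below shows that pattern with stand-in classes only. `Embedder`, `ToyHashEmbedder`, `SketchChunker`, and the `embed_texts` method are hypothetical names chosen for illustration and are not taken from DeepEval's source; the hash-based vectors carry no semantic meaning.

```python
import hashlib
from typing import Dict, List, Protocol


class Embedder(Protocol):
    """Hypothetical minimal interface; not DeepEval's actual base class."""

    def embed_texts(self, texts: List[str]) -> List[List[float]]: ...


class ToyHashEmbedder:
    """Deterministic toy embedder built on a SHA-256 digest (dim <= 32)."""

    def __init__(self, dim: int = 8):
        self.dim = dim

    def embed_texts(self, texts: List[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Scale the first `dim` bytes of the digest into [0, 1].
            vectors.append([b / 255 for b in digest[: self.dim]])
        return vectors


class SketchChunker:
    """Illustrative chunker that receives its embedder through the constructor."""

    def __init__(self, embedder: Embedder):
        self.embedder = embedder

    def chunk(self, text: str, chunk_size: int) -> List[Dict[str, object]]:
        # Split on whitespace into groups of chunk_size words (no overlap).
        words = text.split()
        chunks = [
            " ".join(words[i : i + chunk_size])
            for i in range(0, len(words), chunk_size)
        ]
        # Embed all chunks in one call and pair each chunk with its vector.
        embeddings = self.embedder.embed_texts(chunks)
        return [{"text": c, "embedding": e} for c, e in zip(chunks, embeddings)]


if __name__ == "__main__":
    chunker = SketchChunker(embedder=ToyHashEmbedder(dim=4))
    for item in chunker.chunk("one two three four five", chunk_size=2):
        print(item["text"], item["embedding"])
```

Passing the embedder in, rather than constructing it inside the chunker, makes it straightforward to swap a real model for a cheap deterministic one in tests.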

Usage Examples

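from deepeval.synthesizer.chunking.doc_chunker import DocumentChunker
from deepeval.models import DeepEvalBaseEmbeddingModel

# my_embedding_model is a placeholder: define it beforehand as an instance
# of a concrete DeepEvalBaseEmbeddingModel subclass.
assert isinstance(my_embedding_model, DeepEvalBaseEmbeddingModel)

# Initialize with an embedding model
chunker = DocumentChunker(embedder=my_embedding_model)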
