

Implementation:Confident ai Deepeval DocumentChunker

From Leeroopedia
Sources: DeepEval
Domains: Synthetic_Data, LLM_Evaluation
Last Updated: 2026-02-14 09:00 GMT

Overview

The DocumentChunker class is responsible for splitting documents into manageable text chunks for synthetic evaluation data generation in the DeepEval framework.

Description

DocumentChunker takes a DeepEvalBaseEmbeddingModel and uses it to process documents into chunks suitable for context extraction and golden generation. It serves as the foundational text segmentation component in the synthetic data generation pipeline, handling document loading and token-based splitting.
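The token-based splitting described above can be illustrated with a plain sliding window over tokens. This is a minimal sketch, not DeepEval's implementation: `split_by_tokens`, `chunk_size`, and `chunk_overlap` are hypothetical names chosen for this example, and whitespace splitting stands in for a real tokenizer.

```python
from typing import List


def split_by_tokens(text: str, chunk_size: int = 8, chunk_overlap: int = 2) -> List[str]:
    """Split text into windows of at most chunk_size tokens (hypothetical helper).

    Consecutive windows share chunk_overlap tokens. Whitespace splitting
    stands in for a real tokenizer, which is an assumption of this sketch.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    tokens = text.split()
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(" ".join(window))
        # Stop once this window already reaches the last token
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

For example, `split_by_tokens("a b c d e", chunk_size=3, chunk_overlap=1)` returns `["a b c", "c d e"]`. The overlap keeps text that straddles a chunk boundary present in both neighboring chunks.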

Usage

Used internally by the Synthesizer when generating goldens from documents. Can also be instantiated directly for custom chunking workflows.

Code Reference

Source Location: confident-ai/deepeval repository, file deepeval/synthesizer/chunking/doc_chunker.py (L73-88)

Signature:

class DocumentChunker:
    def __init__(self, embedder: DeepEvalBaseEmbeddingModel):
        ...

Import:

from deepeval.synthesizer.chunking.doc_chunker import DocumentChunker

I/O Contract

Inputs:

Parameter: embedder
Type: DeepEvalBaseEmbeddingModel
Required: Yes
Description: Embedding model used to process the text chunks for downstream context extraction (chunk boundaries themselves come from token-based splitting)

Outputs:

  • DocumentChunker instance -- configured chunker ready to split documents into text segments for downstream golden generation
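The constructor contract (an embedder goes in, a configured chunker comes out) is a form of dependency injection. The sketch below mirrors that shape with hypothetical stand-ins: `EmbeddingModel`, `LetterCountEmbedder`, `SketchChunker`, and `chunk_and_embed` are not DeepEval APIs, and the `embed_texts` method name is an assumption made for illustration. The toy embedder only counts letters so the example runs without any real model.

```python
from typing import List, Protocol, Tuple


class EmbeddingModel(Protocol):
    """Hypothetical minimal embedder interface assumed for this sketch."""

    def embed_texts(self, texts: List[str]) -> List[List[float]]: ...


class LetterCountEmbedder:
    """Toy deterministic embedder: one dimension per letter a-z (hypothetical)."""

    def embed_texts(self, texts: List[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * 26
            for ch in text.lower():
                if "a" <= ch <= "z":
                    vec[ord(ch) - ord("a")] += 1.0
            vectors.append(vec)
        return vectors


class SketchChunker:
    """Hypothetical stand-in mirroring the DocumentChunker(embedder=...) shape."""

    def __init__(self, embedder: EmbeddingModel, chunk_size: int = 4):
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self.embedder = embedder  # injected dependency, as in the real constructor
        self.chunk_size = chunk_size

    def chunk_and_embed(self, text: str) -> List[Tuple[str, List[float]]]:
        # Fixed-size, non-overlapping token windows
        tokens = text.split()
        chunks = [
            " ".join(tokens[i:i + self.chunk_size])
            for i in range(0, len(tokens), self.chunk_size)
        ]
        # Embed all chunks in one batch and pair each chunk with its vector
        vectors = self.embedder.embed_texts(chunks) if chunks else []
        return list(zip(chunks, vectors))
```

Because the embedder is passed in rather than constructed inside the class, a different model can be swapped in without changing the chunking code.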

Usage Examples

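from deepeval.synthesizer.chunking.doc_chunker import DocumentChunker
from deepeval.models import OpenAIEmbeddingModel

# Any DeepEvalBaseEmbeddingModel subclass can be passed here; OpenAIEmbeddingModel
# is one built-in option (assumes OPENAI_API_KEY is set in the environment).
embedder = OpenAIEmbeddingModel()

# Initialize the chunker with the embedding model
chunker = DocumentChunker(embedder=embedder)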
