Implementation:Confident ai Deepeval Synthesizer Generate Goldens From Docs

Sources	Domains	Last Updated
DeepEval	Synthetic_Data, LLM_Evaluation, Data_Management	2026-02-14 09:00 GMT

Overview

The generate_goldens_from_docs method on the Synthesizer class generates evaluation goldens directly from source document files, automating the full pipeline from document loading through golden generation.

Description

This method accepts a list of file paths pointing to source documents (PDF, TXT, DOCX, MD) and produces a list of Golden objects. Internally, it loads the documents, chunks them according to the context construction configuration, extracts context groups using embedding similarity, and generates evaluation queries and expected answers using the configured LLM. The method supports configuring the number of goldens per context, whether to include expected outputs, and custom context construction parameters.
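The chunking and grouping steps described above can be illustrated with a toy sketch. This is not DeepEval's implementation: chunk_text, embed, cosine, and build_context are hypothetical helpers, the chunks are character windows rather than token windows, and a bag-of-words count stands in for a real embedding model.

```python
import math
from collections import Counter
from typing import List


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into character windows that overlap by chunk_overlap."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def embed(chunk: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a dense vector."""
    return Counter(chunk.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_context(chunks: List[str], seed_index: int,
                  threshold: float, max_len: int) -> List[str]:
    """Return the seed chunk plus its most similar neighbours above threshold."""
    seed = embed(chunks[seed_index])
    scored = [
        (cosine(seed, embed(c)), i)
        for i, c in enumerate(chunks)
        if i != seed_index
    ]
    # Highest similarity first; ties broken by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    picked = [i for score, i in scored if score >= threshold][: max_len - 1]
    return [chunks[seed_index]] + [chunks[i] for i in picked]
```

In this sketch each context group is a seed chunk plus related chunks, and a generator LLM would then be prompted with that group to write a query and, optionally, an expected answer.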

Usage

Call this method on an instantiated Synthesizer to generate evaluation data from document files on disk.
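Because the method reads files from disk, it can help to screen the path list before calling it. The following is a minimal sketch, assuming the four extensions listed on this page are the ones to accept; SUPPORTED_SUFFIXES and split_by_support are hypothetical names, not part of DeepEval.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Assumption: mirrors the formats named on this page (PDF, TXT, DOCX, MD).
SUPPORTED_SUFFIXES = {".pdf", ".txt", ".docx", ".md"}


def split_by_support(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition paths into (supported, rejected) by file extension.

    Only the extension is inspected; the files are not opened or checked
    for existence.
    """
    supported: List[str] = []
    rejected: List[str] = []
    for path in paths:
        if Path(path).suffix.lower() in SUPPORTED_SUFFIXES:
            supported.append(path)
        else:
            rejected.append(path)
    return supported, rejected
```

The supported list can then be passed as document_paths, while the rejected list can be logged or converted to a supported format first.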

Code Reference

Source Location: Repository: confident-ai/deepeval, File: deepeval/synthesizer/synthesizer.py (L158-357)

Signature:

def generate_goldens_from_docs(
    self,
    document_paths: List[str],
    include_expected_output: bool = True,
    max_goldens_per_context: int = 2,
    context_construction_config: Optional[ContextConstructionConfig] = None,
) -> List[Golden]:
    ...

Import:

from deepeval.synthesizer import Synthesizer

I/O Contract

Inputs:

Parameter	Type	Required	Description
document_paths	List[str]	Yes	File paths to source documents (supported formats: PDF, TXT, DOCX, MD)
include_expected_output	bool	No	Whether to generate expected answers for each golden (default: True)
max_goldens_per_context	int	No	Maximum number of goldens to generate per context group (default: 2)
context_construction_config	Optional[ContextConstructionConfig]	No	Configuration for chunking and context extraction (chunk size, overlap, embedder, etc.)
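The parameters above bound the size of the result. As a rough sketch, assuming the context construction configuration caps the number of context groups per document (the argument name max_contexts_per_document below is illustrative and should be checked against the installed version), the largest possible output is the product of three numbers. max_goldens_upper_bound is a hypothetical helper.

```python
def max_goldens_upper_bound(
    num_documents: int,
    max_contexts_per_document: int,  # assumed per-document cap on context groups
    max_goldens_per_context: int = 2,  # default stated on this page
) -> int:
    """Upper bound on goldens: documents x contexts per document x goldens per context."""
    if min(num_documents, max_contexts_per_document, max_goldens_per_context) < 0:
        raise ValueError("all arguments must be non-negative")
    return num_documents * max_contexts_per_document * max_goldens_per_context
```

The actual count can be lower, since max_goldens_per_context is a maximum and short documents may yield fewer context groups.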

Outputs:

List[Golden] -- list of generated evaluation goldens, each containing input (query), expected_output (answer, generated only when include_expected_output is True), context (source passages), and source_file (origin document path)
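A common next step is to inspect the returned list per source document. The sketch below uses a stand-in dataclass, GoldenLike, that carries only the four fields named above so that it runs without DeepEval installed; GoldenLike, group_by_source, and missing_expected_output are hypothetical names, and real Golden objects may expose additional attributes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class GoldenLike:
    """Stand-in (assumption) with the fields this page lists for a golden."""
    input: str
    expected_output: Optional[str] = None
    context: Optional[List[str]] = None
    source_file: Optional[str] = None


def group_by_source(goldens: Iterable[GoldenLike]) -> Dict[str, List[GoldenLike]]:
    """Group goldens by the document they were generated from."""
    grouped: Dict[str, List[GoldenLike]] = defaultdict(list)
    for golden in goldens:
        grouped[golden.source_file or "<unknown>"].append(golden)
    return dict(grouped)


def missing_expected_output(goldens: Iterable[GoldenLike]) -> List[GoldenLike]:
    """Return goldens that have no expected answer attached."""
    return [g for g in goldens if not g.expected_output]
```

Since the helpers only read attributes, the same functions should work on any objects that expose source_file and expected_output.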

Usage Examples

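from deepeval.synthesizer import Synthesizer

# The model name selects the LLM used to write queries and expected answers.
synthesizer = Synthesizer(model="gpt-4o")

# Generate up to 3 goldens for each context group extracted from the two files.
goldens = synthesizer.generate_goldens_from_docs(
    document_paths=["data/manual.pdf", "data/faq.txt"],
    max_goldens_per_context=3,
)

# Each golden records its query and the document it was generated from.
for golden in goldens:
    print(golden.source_file, golden.input)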

Related Pages

Principle:Confident_ai_Deepeval_Golden_Generation_from_Documents
