Implementation:Confident ai Deepeval Synthesizer Generate Goldens From Docs

From Leeroopedia
Sources: DeepEval
Domains: Synthetic_Data, LLM_Evaluation, Data_Management
Last Updated: 2026-02-14 09:00 GMT

Overview

The generate_goldens_from_docs method on the Synthesizer class generates evaluation goldens directly from source document files, automating the full pipeline from document loading through golden generation.

Description

This method accepts a list of file paths pointing to source documents (PDF, TXT, DOCX, MD) and produces a list of Golden objects. Internally, it loads the documents, chunks them according to the context construction configuration, extracts context groups using embedding similarity, and generates evaluation queries and expected answers using the configured LLM. The method supports configuring the number of goldens per context, whether to include expected outputs, and custom context construction parameters.
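The chunking and similarity-grouping steps described above can be illustrated with a small standard-library sketch. This is not DeepEval's implementation: the helper names (chunk_text, embed, cosine, build_context) are hypothetical, the "embedding" is a plain word-count vector standing in for a real embedding model, and chunk sizes are counted in words purely for readability.

```python
import math
import re
from collections import Counter
from typing import List


def chunk_text(text: str, chunk_size: int = 12, chunk_overlap: int = 4) -> List[str]:
    """Split text into windows of chunk_size words that overlap by chunk_overlap words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(chunk: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts instead of a dense vector."""
    return Counter(re.findall(r"[a-z0-9]+", chunk.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_context(chunks: List[str], seed_index: int, size: int = 2) -> List[str]:
    """Return the seed chunk followed by its (size - 1) most similar other chunks."""
    vectors = [embed(c) for c in chunks]
    others = [i for i in range(len(chunks)) if i != seed_index]
    others.sort(key=lambda i: cosine(vectors[seed_index], vectors[i]), reverse=True)
    return [chunks[seed_index]] + [chunks[i] for i in others[: size - 1]]
```

In this sketch, each list returned by build_context plays the role of one context group; in the real method, such a group would then be passed to the configured LLM to produce queries and expected answers.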

Usage

Call this method on an instantiated Synthesizer to generate evaluation data from document files on disk.

Code Reference

Source Location: Repository: confident-ai/deepeval, File: deepeval/synthesizer/synthesizer.py (L158-357)

Signature:

def generate_goldens_from_docs(
    self,
    document_paths: List[str],
    include_expected_output: bool = True,
    max_goldens_per_context: int = 2,
    context_construction_config: Optional[ContextConstructionConfig] = None,
) -> List[Golden]:
    ...

Import:

from deepeval.synthesizer import Synthesizer

I/O Contract

Inputs:

  • document_paths (List[str], required) -- file paths to source documents (supported formats: PDF, TXT, DOCX, MD)
  • include_expected_output (bool, optional, default: True) -- whether to generate expected answers for each golden
  • max_goldens_per_context (int, optional, default: 2) -- maximum number of goldens to generate per context group
  • context_construction_config (Optional[ContextConstructionConfig], optional, default: None) -- configuration for chunking and context extraction (chunk size, overlap, embedder, etc.)
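Because document_paths is expected to contain only supported formats, it can help to filter candidate paths before calling the method. The following is a minimal sketch using only the standard library; select_supported_paths and max_total_goldens are hypothetical helpers, not part of DeepEval, and the suffix set simply mirrors the formats listed for document_paths, so it should be checked against the installed version.

```python
from pathlib import Path
from typing import Iterable, List

# Assumption: mirrors the formats listed for document_paths on this page.
SUPPORTED_SUFFIXES = {".pdf", ".txt", ".docx", ".md"}


def select_supported_paths(paths: Iterable[str]) -> List[str]:
    """Keep only paths whose extension is in SUPPORTED_SUFFIXES (case-insensitive)."""
    return [p for p in paths if Path(p).suffix.lower() in SUPPORTED_SUFFIXES]


def max_total_goldens(num_contexts: int, max_goldens_per_context: int = 2) -> int:
    """Upper bound on output size if each context yields at most max_goldens_per_context goldens."""
    return num_contexts * max_goldens_per_context
```

The filtered list can be passed directly as document_paths. The second helper only restates the per-context cap as arithmetic: with 5 context groups and max_goldens_per_context=3, at most 15 goldens would be expected.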

Outputs:

  • List[Golden] -- list of generated evaluation goldens, each containing input (query), expected_output (answer, generated when include_expected_output is True), context (source passages), and source_file (origin document path)
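A common follow-up is to inspect the returned list per source document. The sketch below groups goldens by source_file and flags entries without an expected answer. GoldenRecord is a hypothetical stand-in that mirrors only the four fields listed above so the example runs without DeepEval installed; the helpers rely on attribute access alone, so they should work on any object exposing those attributes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class GoldenRecord:
    """Hypothetical stand-in mirroring the fields described for Golden on this page."""
    input: str
    expected_output: Optional[str] = None
    context: Optional[List[str]] = None
    source_file: Optional[str] = None


def group_by_source(goldens) -> Dict[str, list]:
    """Group goldens by their source_file; entries without one go under '<unknown>'."""
    grouped = defaultdict(list)
    for golden in goldens:
        grouped[golden.source_file or "<unknown>"].append(golden)
    return dict(grouped)


def missing_expected_output(goldens) -> list:
    """Return goldens whose expected_output is empty or None."""
    return [g for g in goldens if not g.expected_output]
```

Grouping by source_file makes it easy to spot documents that produced few or no goldens, and the second helper is a quick check when include_expected_output was left at its default.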

Usage Examples

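from deepeval.synthesizer import Synthesizer

# Use gpt-4o as the generation model
synthesizer = Synthesizer(model="gpt-4o")

# Generate at most 3 goldens per context group from two local documents
goldens = synthesizer.generate_goldens_from_docs(
    document_paths=["data/manual.pdf", "data/faq.txt"],
    max_goldens_per_context=3,
)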
