Implementation:Confident ai Deepeval Synthesizer Generate Goldens From Docs
| Sources | Domains | Last Updated |
|---|---|---|
| DeepEval | Synthetic_Data, LLM_Evaluation, Data_Management | 2026-02-14 09:00 GMT |
Overview
The generate_goldens_from_docs method on the Synthesizer class generates evaluation goldens directly from source document files, automating the full pipeline from document loading through golden generation.
Description
This method accepts a list of file paths pointing to source documents (PDF, TXT, DOCX, MD) and produces a list of Golden objects. Internally, it loads the documents, chunks them according to the context construction configuration, extracts context groups using embedding similarity, and generates evaluation queries and expected answers using the configured LLM. The method supports configuring the number of goldens per context, whether to include expected outputs, and custom context construction parameters.
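The chunking and context-grouping steps described above can be pictured with a small standalone sketch. It is not DeepEval's implementation. The helper names (`chunk_text`, `cosine_similarity`, `build_context_group`), the character-based windows, and the toy embedding vectors are all assumptions made for illustration; the library may measure chunk size differently and uses a real embedding model.

```python
import math
from typing import List, Sequence


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into character windows; consecutive windows share chunk_overlap characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
        start += step
    return chunks


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_context_group(
    embeddings: Sequence[Sequence[float]], seed: int, size: int, threshold: float
) -> List[int]:
    """Return the seed chunk index plus up to size - 1 most similar chunk indices."""
    scored = [
        (cosine_similarity(embeddings[seed], vector), index)
        for index, vector in enumerate(embeddings)
        if index != seed
    ]
    scored.sort(reverse=True)  # most similar chunks first
    similar = [index for score, index in scored if score >= threshold]
    return [seed] + similar[: size - 1]


# Hypothetical usage: three chunks with hand-written two-dimensional "embeddings".
chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
group = build_context_group(
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], seed=0, size=2, threshold=0.5
)
```

In this picture, each context group would then be handed to the LLM, which writes up to `max_goldens_per_context` queries for it.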
Usage
Call this method on an instantiated Synthesizer to generate evaluation data from document files on disk.
Code Reference
Source Location: Repository: confident-ai/deepeval, File: deepeval/synthesizer/synthesizer.py (L158-357)
Signature:
def generate_goldens_from_docs(
    self,
    document_paths: List[str],
    include_expected_output: bool = True,
    max_goldens_per_context: int = 2,
    context_construction_config: Optional[ContextConstructionConfig] = None,
) -> List[Golden]:
    ...
Import:
from deepeval.synthesizer import Synthesizer
I/O Contract
Inputs:
| Parameter | Type | Required | Description |
|---|---|---|---|
| document_paths | List[str] | Yes | File paths to source documents (supported formats: PDF, TXT, DOCX, MD) |
| include_expected_output | bool | No | Whether to generate expected answers for each golden (default: True) |
| max_goldens_per_context | int | No | Maximum number of goldens to generate per context group (default: 2) |
| context_construction_config | Optional[ContextConstructionConfig] | No | Configuration for chunking and context extraction (chunk size, overlap, embedder, etc.) |
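Because `document_paths` is expected to contain only the listed formats, a caller may want to filter a mixed list before invoking the method. The following is a minimal sketch using only the standard library. `SUPPORTED_SUFFIXES` and `select_supported_documents` are hypothetical names, not part of DeepEval, and the suffix set simply mirrors the formats named in the table above.

```python
from pathlib import Path
from typing import Iterable, List

# Assumption: mirrors the formats listed for document_paths (PDF, TXT, DOCX, MD).
SUPPORTED_SUFFIXES = {".pdf", ".txt", ".docx", ".md"}


def select_supported_documents(paths: Iterable[str]) -> List[str]:
    """Keep only paths whose file extension matches a supported format (case-insensitive)."""
    return [raw for raw in paths if Path(raw).suffix.lower() in SUPPORTED_SUFFIXES]


# Hypothetical usage: the PNG is dropped, the other three paths are kept in order.
candidates = ["data/manual.PDF", "notes.md", "image.png", "faq.txt"]
document_paths = select_supported_documents(candidates)
```

The resulting list can be passed as `document_paths`. The sketch checks extensions only; it does not verify that the files exist or can be parsed.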
Outputs:
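- List[Golden] -- list of generated evaluation goldens, each containing input (the generated query), expected_output (the generated answer, when include_expected_output is True), context (the source passages the query was built from), and source_file (the path of the origin document)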
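One common follow-up is to organise the returned goldens by their origin document. The sketch below shows this with a stand-in record type so that it runs without DeepEval installed. `GoldenRecord` is a hypothetical dataclass that only mirrors the four fields named above; it is not the library's `Golden` class, and `group_by_source` is likewise an illustrative helper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class GoldenRecord:
    """Stand-in carrying the four fields described in the output contract."""
    input: str
    expected_output: Optional[str] = None
    context: Optional[List[str]] = None
    source_file: Optional[str] = None


def group_by_source(goldens: Iterable[GoldenRecord]) -> Dict[str, List[GoldenRecord]]:
    """Bucket goldens by source_file; records without one go under '<unknown>'."""
    grouped: Dict[str, List[GoldenRecord]] = defaultdict(list)
    for golden in goldens:
        grouped[golden.source_file or "<unknown>"].append(golden)
    return dict(grouped)


# Hypothetical data shaped like the method's output.
sample = [
    GoldenRecord("How do I reset the device?", "Hold the button.", ["..."], "data/manual.pdf"),
    GoldenRecord("What is the warranty period?", None, ["..."], "data/faq.txt"),
    GoldenRecord("How do I pair it?", "Open settings.", ["..."], "data/manual.pdf"),
]
by_file = group_by_source(sample)
```

Since the helper only reads the `source_file` attribute, it should work unchanged on any objects that expose that attribute.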
Usage Examples
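from deepeval.synthesizer import Synthesizer

# The model argument selects the LLM that writes the queries and expected answers.
synthesizer = Synthesizer(model="gpt-4o")

# Generate up to three goldens for each context group extracted from the two files.
goldens = synthesizer.generate_goldens_from_docs(
    document_paths=["data/manual.pdf", "data/faq.txt"],
    max_goldens_per_context=3,
)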