Principle:Confident ai Deepeval Golden Generation from Documents
| Knowledge Sources | |
|---|---|
| Domains | |
| Last Updated | 2026-02-14 09:00 GMT |
Overview
Golden generation from documents is the end-to-end pipeline for automatically producing evaluation goldens directly from source document files. It chains document loading, chunking, context extraction, and LLM-based golden generation into a single operation.
Description
This principle captures the complete workflow for transforming raw documents into structured evaluation data. The pipeline proceeds through four stages:
- Document loading -- reading source files in various formats (PDF, TXT, DOCX, MD) into raw text.
- Chunking -- splitting loaded text into token-bounded segments using configurable chunk size and overlap parameters.
- Context extraction -- grouping related chunks into context sets using embedding similarity, producing the grounding material for question generation.
- Golden generation -- using an LLM to synthesize evaluation queries and expected answers conditioned on each context group.
The output is a list of Golden objects, each containing an input (the evaluation query), an expected_output (the reference answer synthesized from the context), and a context (the source passages that ground the golden), along with a reference to the source file. This makes the generated data immediately usable for evaluating RAG systems, question-answering applications, and other LLM-based tools.
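The chunking and context extraction stages can be illustrated with a small, dependency-free sketch. It is not the library's implementation: whitespace-separated words stand in for model tokens, a bag-of-words count vector stands in for a real embedding, and the function names (chunk_tokens, embed, cosine, build_context) are hypothetical.

```python
import math
from collections import Counter


def chunk_tokens(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into chunks of at most chunk_size tokens that overlap by chunk_overlap."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    tokens = text.split()  # assumption: whitespace words stand in for model tokens
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


def embed(chunk: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (assumption, not a real embedder)."""
    return Counter(chunk.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_context(chunks: list[str], seed: int, size: int) -> list[str]:
    """Group the seed chunk with its (size - 1) most similar chunks."""
    vectors = [embed(chunk) for chunk in chunks]
    others = [i for i in range(len(chunks)) if i != seed]
    others.sort(key=lambda i: cosine(vectors[seed], vectors[i]), reverse=True)
    return [chunks[seed]] + [chunks[i] for i in others[: size - 1]]
```

Overlap repeats the tail of one chunk at the head of the next, so a sentence that straddles a boundary still appears whole in at least one chunk. Grouping by similarity then gives the generator several related passages to draw on rather than a single isolated chunk.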
Usage
Golden generation from documents is the primary entry point for teams that have existing documentation, knowledge bases, or manuals and want to generate evaluation datasets without manual authoring. It is particularly suited for:
- Bootstrapping evaluation suites from product documentation
- Generating regression tests from updated technical manuals
- Creating evaluation data from FAQ collections or support articles
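In practice, a call into the library might look like the sketch below. The keyword names document_paths, max_goldens_per_context, and include_expected_output come from this page; the import path, the Synthesizer class, and the generate_goldens_from_docs method name are assumptions about the installed DeepEval release and should be checked against its documentation. The helper names and default values are hypothetical, and a real run also needs a configured model provider.

```python
def generation_kwargs(document_paths, max_goldens_per_context=2, include_expected_output=True):
    """Collect the arguments this page names for the documents-to-goldens call.

    The default values here are illustrative, not the library's documented defaults.
    """
    if not document_paths:
        raise ValueError("at least one document path is required")
    return {
        "document_paths": list(document_paths),
        "max_goldens_per_context": max_goldens_per_context,
        "include_expected_output": include_expected_output,
    }


def generate_from_docs(document_paths, **options):
    """Run the pipeline through DeepEval (assumed entry point; verify for your version)."""
    # Imported lazily so generation_kwargs works without the package installed.
    from deepeval.synthesizer import Synthesizer  # assumption: import path

    synthesizer = Synthesizer()
    # Assumption: the method name and keyword names match the installed release.
    return synthesizer.generate_goldens_from_docs(
        **generation_kwargs(document_paths, **options)
    )


# Example call (hypothetical file path):
# goldens = generate_from_docs(["docs/manual.pdf"], max_goldens_per_context=3)
```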
Theoretical Basis
This pipeline combines techniques from information retrieval and natural language generation:
- Document-grounded question generation -- generating questions that are answerable from specific document passages, which keeps each golden relevant to the source material and tied to text that can be checked.
- Context-aware answer synthesis -- producing expected answers by conditioning the LLM on the extracted context, rather than relying on the model's parametric knowledge alone.
The abstract pipeline follows this pattern:
GOLDEN_GENERATION_FROM_DOCS(document_paths, config):
    1. LOAD documents from file paths (PDF, TXT, DOCX, MD)
    2. CHUNK documents into token-bounded segments
    3. EMBED chunks and CONSTRUCT context groups via similarity search
    4. FOR each context group:
        a. GENERATE up to max_goldens_per_context evaluation queries
        b. IF include_expected_output: SYNTHESIZE expected answers from context
        c. CREATE Golden(input, expected_output, context, source_file)
    5. RETURN List[Golden]
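Steps 4 and 5 of the pattern above can be sketched in Python as follows. This is an illustration under stated assumptions, not DeepEval's code: the Golden dataclass is a stand-in for the library's type, goldens_from_contexts is a hypothetical name, and the two stub functions replace the LLM calls so the example runs deterministically.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Golden:
    """Illustrative stand-in for the library's Golden type."""
    input: str
    expected_output: Optional[str]
    context: list[str]
    source_file: str


def goldens_from_contexts(
    contexts: list[tuple[str, list[str]]],
    generate_queries: Callable[[list[str], int], list[str]],
    synthesize_answer: Callable[[str, list[str]], str],
    max_goldens_per_context: int = 2,
    include_expected_output: bool = True,
) -> list[Golden]:
    """Turn (source_file, context group) pairs into goldens."""
    goldens = []
    for source_file, context in contexts:
        queries = generate_queries(context, max_goldens_per_context)
        # Enforce the cap even if the generator returns more than requested.
        for query in queries[:max_goldens_per_context]:
            expected = synthesize_answer(query, context) if include_expected_output else None
            goldens.append(Golden(query, expected, context, source_file))
    return goldens


# Deterministic stubs standing in for the LLM calls (assumptions for illustration).
def stub_queries(context: list[str], limit: int) -> list[str]:
    return [f"What does passage {i + 1} say?" for i in range(min(limit, len(context)))]


def stub_answer(query: str, context: list[str]) -> str:
    return context[0]
```

Passing the generator and the answer synthesizer as callables keeps the loop independent of any particular model, and it makes the cap and the include_expected_output switch easy to test without network access.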
Key properties:
- End-to-end automation -- from raw files to structured evaluation data in a single call.
- Source traceability -- each golden retains a reference to its source file and context passages.
- Configurable density -- the max_goldens_per_context parameter controls how many evaluation items are generated per context group.
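Because each context group yields up to max_goldens_per_context goldens, the size of a generated dataset has a simple upper bound. The sketch below works through that arithmetic; the function name and the per-document group counts are hypothetical inputs chosen for illustration.

```python
def golden_upper_bound(
    context_groups_per_document: dict[str, int],
    max_goldens_per_context: int,
) -> dict[str, int]:
    """Return the maximum number of goldens each document can yield."""
    if max_goldens_per_context < 1:
        raise ValueError("max_goldens_per_context must be at least 1")
    return {
        path: groups * max_goldens_per_context
        for path, groups in context_groups_per_document.items()
    }


# Hypothetical inputs: three context groups from one file, one from another.
bounds = golden_upper_bound({"manual.pdf": 3, "faq.md": 1}, max_goldens_per_context=2)
total = sum(bounds.values())
```

The actual count can be lower, since the generator may return fewer queries than the cap for a given context group.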