# Principle: Confident AI DeepEval Synthetic Data Synthesis
| Knowledge Sources | |
|---|---|
| Domains | |
| Last Updated | 2026-02-14 09:00 GMT |
## Overview
Synthetic data synthesis is the process of using large language models to automatically generate evaluation test data from source material. It addresses the fundamental bottleneck in LLM evaluation: the cost and effort of manually crafting diverse, high-quality evaluation datasets.
## Description
Creating evaluation datasets manually is time-consuming, expensive, and often results in limited coverage of edge cases and failure modes. Synthetic data synthesis automates this by leveraging LLMs to:
- Generate diverse evaluation queries -- produce a wide range of questions and prompts from source documents or contexts, covering various difficulty levels and question types.
- Synthesize expected outputs -- automatically generate ground-truth answers grounded in the source material, providing the expected_output field for evaluation test cases.
- Apply query evolution -- transform simple, straightforward queries into more complex variants through reasoning, multi-hop, and conditional evolution strategies.
- Filter for quality -- automatically discard low-quality or malformed generated data using configurable filtration criteria.
- Apply output styling -- control the format and tone of generated data to match specific evaluation requirements.
In the DeepEval framework, synthetic data synthesis is orchestrated by the Synthesizer class, which provides a unified interface for generating evaluation goldens from documents, pre-prepared contexts, or existing golden templates.
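The capabilities above can be sketched with a minimal, offline stand-in for that interface. The `Golden` dataclass and `generate_goldens_from_contexts` helper below are simplified illustrations, not DeepEval's actual API, and a deterministic stub replaces the LLM call so the sketch runs without a model:

```python
from dataclasses import dataclass, field

@dataclass
class Golden:
    """Minimal stand-in for an evaluation golden: query, reference answer, source context."""
    input: str
    expected_output: str
    context: list[str] = field(default_factory=list)

def generate_goldens_from_contexts(contexts, generate):
    """Toy synthesizer: one golden per context group, using a caller-supplied LLM stub."""
    goldens = []
    for context in contexts:
        query = generate(f"Write a question answerable from: {context}")
        answer = generate(f"Answer '{query}' using only: {context}")
        goldens.append(Golden(input=query, expected_output=answer, context=context))
    return goldens

# Deterministic stub in place of a real LLM call: echoes the text after the prompt prefix.
stub = lambda prompt: prompt.split(": ", 1)[1]

goldens = generate_goldens_from_contexts(
    [["Paris is the capital of France."]], stub
)
print(len(goldens))  # 1
```

A real synthesizer would swap the stub for a model call; the shape of the output (input, expected_output, context triples) is what matters for downstream evaluation.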
## Usage
Synthetic data synthesis is used whenever evaluation datasets need to be created or expanded without manual authoring effort. It is especially valuable for:
- Bootstrapping evaluation suites for new LLM applications
- Expanding test coverage to include diverse question types and difficulty levels
- Generating regression test data from updated documentation or knowledge bases
- Creating adversarial or edge-case evaluation scenarios through query evolution
## Theoretical Basis
Synthetic data synthesis for LLM evaluation draws from several established techniques in NLP and machine learning:
- LLM-based data augmentation -- using language models as generators to produce training or evaluation data, a technique that has been shown to improve coverage and diversity compared to manual authoring alone.
- Query evolution -- systematically transforming simple queries into more complex forms (multi-hop reasoning, conditional logic, comparative analysis) to test deeper model capabilities. This draws from curriculum learning and adversarial example generation.
- Quality filtering -- applying automated quality checks to generated data using LLM-as-judge techniques, ensuring that only coherent, answerable, and well-formed goldens are retained.
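Query evolution can be illustrated with a toy, template-based sketch. In practice (and in Evol-Instruct-style systems) an LLM performs the rewrite; the fixed templates and strategy names below are hypothetical stand-ins for that step:

```python
# Hypothetical rewrite templates standing in for LLM-driven query evolution.
EVOLUTIONS = {
    "reasoning": "Explain step by step: {q}",
    "multi_hop": "Combining both passages, {q}",
    "conditional": "Assuming the documentation is current, {q}",
}

def evolve(query: str, strategies: list[str]) -> str:
    """Apply a chain of evolution strategies to a base query, innermost first."""
    for strategy in strategies:
        query = EVOLUTIONS[strategy].format(q=query)
    return query

print(evolve("what does the retry policy do?", ["conditional", "reasoning"]))
# Explain step by step: Assuming the documentation is current, what does the retry policy do?
```

Chaining strategies is what turns a single simple query into progressively harder variants of itself.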
Reference: Synthetic data generation techniques for NLP evaluation, including approaches inspired by the Evol-Instruct method for instruction-following data generation.
The abstract synthesis process follows this pattern:
```
SYNTHETIC_DATA_SYNTHESIS(source_material, llm, config):
  1. EXTRACT contexts from source material (documents or pre-prepared)
  2. FOR each context group:
     a. GENERATE base queries conditioned on context
     b. IF evolution_config: EVOLVE queries through complexity transformations
     c. GENERATE expected outputs grounded in context
     d. IF filtration_config: FILTER low-quality query-answer pairs
     e. IF styling_config: APPLY output formatting
  3. RETURN List[Golden] with input, expected_output, context fields
```
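The abstract pipeline can be turned into a runnable sketch. Pure Python, with a caller-supplied `llm` callable standing in for a real model and plain dicts standing in for goldens; the names here are illustrative, not DeepEval's API:

```python
def synthesize(contexts, llm, evolve=None, keep=None, style=None):
    """Runnable sketch of the abstract pipeline; `llm` is any str -> str callable."""
    goldens = []
    for context in contexts:                                         # step 2: per context group
        query = llm(f"question about: {' '.join(context)}")          # 2a: base query
        if evolve:                                                   # 2b: optional evolution
            query = evolve(query)
        answer = llm(f"answer '{query}' from: {' '.join(context)}")  # 2c: grounded answer
        if keep and not keep(query, answer):                         # 2d: optional filtration
            continue
        if style:                                                    # 2e: optional styling
            answer = style(answer)
        goldens.append(
            {"input": query, "expected_output": answer, "context": context}
        )
    return goldens                                                   # step 3

# Deterministic stub LLM so the sketch runs offline.
stub = lambda prompt: prompt.upper()
out = synthesize(
    [["alpha"], ["beta"]],
    stub,
    keep=lambda q, a: "ALPHA" in q or "BETA" in q,
    style=str.title,
)
print(len(out))  # 2
```

Each optional hook (evolve, keep, style) maps to one of the config-gated branches in the pseudocode, which is why they default to no-ops.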
Key properties:
- Groundedness -- all generated data is anchored to source material, so every expected output can be traced back to and verified against a specific context.
- Diversity -- evolution strategies produce varied question types from the same source.
- Scalability -- async generation with configurable concurrency enables large-scale dataset creation.
- Quality control -- filtration ensures that generated goldens meet minimum quality standards.
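The scalability property can be sketched with `asyncio`: bounded-concurrency generation over many contexts, where a semaphore caps in-flight requests. The stub coroutine stands in for a real model call, and `max_concurrent` is an illustrative knob, not DeepEval's actual configuration name:

```python
import asyncio

async def generate_golden(context, sem, llm):
    """Generate one query for a context while holding a concurrency slot."""
    async with sem:
        return await llm(f"question about {context}")

async def generate_all(contexts, llm, max_concurrent=4):
    """Fan out generation across all contexts, at most max_concurrent at a time."""
    sem = asyncio.Semaphore(max_concurrent)
    return await asyncio.gather(*(generate_golden(c, sem, llm) for c in contexts))

async def stub_llm(prompt):
    await asyncio.sleep(0)  # stands in for a network round trip to a model
    return prompt

results = asyncio.run(generate_all([f"ctx-{i}" for i in range(8)], stub_llm))
print(len(results))  # 8
```

`asyncio.gather` preserves input order, so generated goldens stay aligned with their source contexts even though requests complete out of order.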