
Principle:Confident AI DeepEval Synthetic Data Synthesis

From Leeroopedia
Metadata
Last Updated 2026-02-14 09:00 GMT

Overview

Synthetic data synthesis is the process of using large language models to automatically generate evaluation test data from source material. It addresses the fundamental bottleneck in LLM evaluation: the cost and effort of manually crafting diverse, high-quality evaluation datasets.

Description

Creating evaluation datasets manually is time-consuming and expensive, and it often yields limited coverage of edge cases and failure modes. Synthetic data synthesis automates the process by leveraging LLMs to:

  • Generate diverse evaluation queries -- produce a wide range of questions and prompts from source documents or contexts, covering various difficulty levels and question types.
  • Synthesize expected outputs -- automatically generate ground-truth answers grounded in the source material, providing the expected_output field for evaluation test cases.
  • Apply query evolution -- transform simple, straightforward queries into more complex variants through reasoning, multi-hop, and conditional evolution strategies.
  • Filter for quality -- automatically discard low-quality or malformed generated data using configurable filtration criteria.
  • Style outputs -- control the format and tone of generated data to match specific evaluation requirements.

In the DeepEval framework, synthetic data synthesis is orchestrated by the Synthesizer class, which provides a unified interface for generating evaluation goldens from documents, pre-prepared contexts, or existing golden templates.
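
The context-to-golden flow can be sketched in a few lines. The sketch below is self-contained: the function name echoes DeepEval's `Synthesizer.generate_goldens_from_contexts`, but the `Golden` dataclass, the prompts, and the stub LLM here are illustrative assumptions, not the library's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Golden:
    """Mirrors the fields a DeepEval golden carries: input, expected_output, context."""
    input: str
    expected_output: str
    context: List[str] = field(default_factory=list)

def generate_goldens_from_contexts(
    contexts: List[List[str]],
    ask_llm: Callable[[str], str],  # stand-in for a real LLM call
    queries_per_context: int = 1,
) -> List[Golden]:
    goldens: List[Golden] = []
    for context in contexts:
        joined = "\n".join(context)
        for _ in range(queries_per_context):
            # Generate a query conditioned on the context, then a grounded answer.
            query = ask_llm(f"Write a question answerable from:\n{joined}")
            answer = ask_llm(f"Answer '{query}' using only:\n{joined}")
            goldens.append(Golden(input=query, expected_output=answer, context=context))
    return goldens

# Stub LLM for demonstration; a real setup would call a hosted model.
def stub_llm(prompt: str) -> str:
    return "What does the passage describe?" if "question" in prompt else "A synthesis pipeline."

goldens = generate_goldens_from_contexts([["DeepEval synthesizes goldens."]], stub_llm)
```

In the real library, the Synthesizer additionally chunks documents into contexts before this step; the sketch assumes contexts are already prepared.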

Usage

Synthetic data synthesis is used whenever evaluation datasets need to be created or expanded without manual authoring effort. It is especially valuable for:

  • Bootstrapping evaluation suites for new LLM applications
  • Expanding test coverage to include diverse question types and difficulty levels
  • Generating regression test data from updated documentation or knowledge bases
  • Creating adversarial or edge-case evaluation scenarios through query evolution

Theoretical Basis

Synthetic data synthesis for LLM evaluation draws from several established techniques in NLP and machine learning:

  • LLM-based data augmentation -- using language models as generators to produce training or evaluation data, a technique that has been shown to improve coverage and diversity compared to manual authoring alone.
  • Query evolution -- systematically transforming simple queries into more complex forms (multi-hop reasoning, conditional logic, comparative analysis) to test deeper model capabilities. This draws from curriculum learning and adversarial example generation.
  • Quality filtering -- applying automated quality checks to generated data using LLM-as-judge techniques, ensuring that only coherent, answerable, and well-formed goldens are retained.
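
The evolution and filtering ideas above can be sketched as follows. The prompt templates, strategy names, and threshold are illustrative assumptions, not DeepEval's actual prompts; the LLM and judge are stub callables.

```python
from typing import Callable, Dict, List

# Illustrative evolution templates (not DeepEval's exact prompts).
EVOLUTION_TEMPLATES: Dict[str, str] = {
    "reasoning": "Rewrite the question so it requires step-by-step reasoning: {q}",
    "multi_hop": "Rewrite the question so answering needs two linked facts: {q}",
    "conditional": "Rewrite the question to include an if/then condition: {q}",
}

def evolve(query: str, strategies: List[str], ask_llm: Callable[[str], str]) -> str:
    """Apply each evolution strategy in turn, compounding complexity."""
    for strategy in strategies:
        query = ask_llm(EVOLUTION_TEMPLATES[strategy].format(q=query))
    return query

def passes_filter(query: str, judge: Callable[[str], float], threshold: float = 0.5) -> bool:
    """LLM-as-judge check: keep only queries scoring at or above the threshold."""
    return judge(query) >= threshold

# Demo with stubs: the "LLM" extracts the query and tags it; the judge scores it.
tagged = evolve("What is X?", ["reasoning", "multi_hop"],
                lambda p: p.rsplit(": ", 1)[-1] + " (+)")
kept = passes_filter(tagged, judge=lambda q: 0.9)
```

Applying strategies sequentially is one design choice; an alternative is sampling a single strategy per query to keep difficulty varied across the dataset.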

Reference: synthetic data generation techniques for NLP evaluation, including approaches inspired by the Evol-Instruct method (introduced with WizardLM) for instruction-following data generation.

The abstract synthesis process follows this pattern:

SYNTHETIC_DATA_SYNTHESIS(source_material, llm, config):
    1. EXTRACT contexts from source material (documents or pre-prepared)
    2. FOR each context group:
        a. GENERATE base queries conditioned on context
        b. IF evolution_config: EVOLVE queries through complexity transformations
        c. GENERATE expected outputs grounded in context
        d. IF filtration_config: FILTER low-quality query-answer pairs
        e. IF styling_config: APPLY output formatting
    3. RETURN List[Golden] with input, expected_output, context fields
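
The pseudocode above can be rendered as a minimal Python sketch, with the three optional configs modeled as plain callables. This is a sketch under stub assumptions, not DeepEval's implementation; the `Golden` dataclass and prompts are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Golden:
    input: str
    expected_output: str
    context: List[str]

def synthesize(
    context_groups: List[List[str]],
    ask_llm: Callable[[str], str],
    evolve: Optional[Callable[[str], str]] = None,      # evolution_config
    keep: Optional[Callable[[str, str], bool]] = None,  # filtration_config
    style: Optional[Callable[[str], str]] = None,       # styling_config
) -> List[Golden]:
    goldens: List[Golden] = []
    for context in context_groups:                        # step 2
        source = "\n".join(context)
        query = ask_llm(f"Generate a question from: {source}")  # 2a
        if evolve:
            query = evolve(query)                         # 2b
        answer = ask_llm(f"Answer '{query}' using: {source}")   # 2c
        if keep and not keep(query, answer):              # 2d
            continue
        if style:
            answer = style(answer)                        # 2e
        goldens.append(Golden(query, answer, context))
    return goldens                                        # step 3

# Demo with stubs: the "LLM" echoes, the filter rejects empty answers,
# and styling uppercases the expected output.
demo_goldens = synthesize(
    [["Paris is the capital of France."]],
    ask_llm=lambda p: "stub response",
    keep=lambda q, a: bool(a),
    style=str.upper,
)
```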

Key properties:

  • Groundedness -- all generated data is anchored to source material, keeping expected outputs consistent with the documents they were derived from.
  • Diversity -- evolution strategies produce varied question types from the same source.
  • Scalability -- async generation with configurable concurrency enables large-scale dataset creation.
  • Quality control -- filtration ensures that generated goldens meet minimum quality standards.
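
The scalability property, bounded-concurrency async generation, can be illustrated with `asyncio`. The semaphore here is a stand-in for a max-concurrency setting, and the async "LLM call" is a stub; the function names are hypothetical, not DeepEval's API.

```python
import asyncio
from typing import List

async def generate_golden(context: str, sem: asyncio.Semaphore) -> str:
    # Bound concurrent "LLM calls" with a semaphore so at most
    # max_concurrent requests are in flight at once.
    async with sem:
        await asyncio.sleep(0)  # stands in for an async LLM request
        return f"Q for: {context}"

async def generate_all(contexts: List[str], max_concurrent: int = 4) -> List[str]:
    sem = asyncio.Semaphore(max_concurrent)
    # gather preserves input order even though calls interleave.
    return await asyncio.gather(*(generate_golden(c, sem) for c in contexts))

results = asyncio.run(generate_all(["c1", "c2", "c3"]))
```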
