Workflow: Confident AI DeepEval Synthetic Dataset Generation
| Metadata | Value |
|---|---|
| Domains | LLM_Evaluation, Data_Engineering, Synthetic_Data |
| Last Updated | 2026-02-14 09:00 GMT |
Overview
End-to-end process for generating synthetic evaluation datasets from documents, contexts, or existing goldens using DeepEval's Synthesizer and ConversationSimulator.
Description
This workflow covers automated generation of evaluation test data for LLM applications. The Synthesizer class generates Golden objects (query-answer pairs with context) from various sources: raw documents, text contexts, or existing goldens. It chunks documents, generates relevant questions, and produces expected answers. For conversational agents, the ConversationSimulator extends goldens into multi-turn conversation scenarios by simulating user interactions. The generated datasets can be saved in JSON, CSV, or JSONL formats and optionally pushed to Confident AI for cloud management.
Usage
Execute this workflow when you need evaluation data but lack manually curated test cases. This applies when bootstrapping a new evaluation suite from existing documentation, generating diverse test scenarios from a knowledge base, augmenting a small hand-curated dataset, or creating multi-turn conversational test scenarios for chatbot evaluation.
Execution Steps
Step 1: Prepare Source Material
Gather the source documents or contexts from which synthetic test data will be generated. Sources can be text files (TXT, Markdown, PDF, DOCX), raw text strings, or existing Golden objects that will be used as seeds for generating variations.
Source types:
- Documents: Text files that will be chunked and used as context for question generation
- Contexts: Pre-extracted text passages provided as lists of strings
- Existing goldens: Golden objects used as seeds for generating similar but varied test cases
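Whatever the source type, the material ends up as text passages that seed question generation. A minimal, self-contained sketch of splitting raw text into fixed-size passages (illustrative only; DeepEval performs its own loading and chunking internally, and the function name here is hypothetical):

```python
def load_contexts_from_text(raw_text: str, passage_length: int = 200) -> list:
    """Split raw text into word-bounded passages usable as generation contexts.

    passage_length is measured in words; the last passage may be shorter.
    """
    words = raw_text.split()
    return [
        " ".join(words[i:i + passage_length])
        for i in range(0, len(words), passage_length)
    ]

# 450 words split into 200-word passages yields 3 contexts (200 + 200 + 50).
contexts = load_contexts_from_text("word " * 450, passage_length=200)
```

Pre-extracted passages like these correspond to the "Contexts" source type; documents would additionally need a file-format-specific extraction step first.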
Step 2: Configure the Synthesizer
Initialize the Synthesizer with configuration options controlling the generation model, chunking strategy, and output format. The synthesizer uses an LLM to generate questions and answers from the provided context.
Configuration options:
- Model selection for generation (defaults to OpenAI)
- Chunk size and overlap for document processing
- Number of goldens to generate per context
- Embedding model for context generation (uses ChromaDB internally)
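The options above can be thought of as one configuration bundle. A sketch of such a bundle as a plain dataclass (the field names here are hypothetical and do not match DeepEval's actual parameter names; consult the DeepEval docs for the real `Synthesizer` signature):

```python
from dataclasses import dataclass

@dataclass
class SynthesizerConfig:
    """Illustrative bundle of the generation knobs described above."""
    model: str = "gpt-4o-mini"        # generation model (assumption: an OpenAI-style name)
    chunk_size: int = 1024            # characters per document chunk
    chunk_overlap: int = 128          # characters shared by adjacent chunks
    max_goldens_per_context: int = 2  # goldens generated per context

    def __post_init__(self):
        # An overlap as large as the chunk itself would make chunking loop forever.
        if self.chunk_overlap >= self.chunk_size:
            raise ValueError("chunk_overlap must be smaller than chunk_size")

cfg = SynthesizerConfig(chunk_size=512, chunk_overlap=64)
```

Validating chunk parameters up front avoids confusing downstream failures when documents are processed in Step 3.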
Step 3: Generate Goldens
Invoke the appropriate generation method based on your source material. The Synthesizer processes the input, creates contextually relevant questions, and generates expected answers. Each generated golden includes the input question, expected output, and the source context.
Generation methods:
- From documents: Chunks documents, generates contexts via embeddings, then creates goldens
- From contexts: Directly uses provided text passages to generate question-answer pairs
- From goldens: Uses existing goldens as templates to generate variations and augmentations
- From scratch: Generates goldens without any source material using the LLM's knowledge
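The "from contexts" path is the simplest to picture: one or more question-answer pairs are generated per passage. A self-contained sketch with a stub in place of the real LLM (the `Golden` dataclass and function below are simplified stand-ins, not DeepEval's implementation):

```python
from dataclasses import dataclass, field

@dataclass
class Golden:
    """Simplified stand-in for a generated golden: question, answer, source context."""
    input: str
    expected_output: str
    context: list = field(default_factory=list)

def fake_llm(prompt: str) -> str:
    """Stub for the generation model (illustrative only)."""
    return f"Answer derived from: {prompt[:40]}"

def generate_goldens_from_contexts(contexts, goldens_per_context=1):
    """Mirror of the 'from contexts' path: N question-answer pairs per passage."""
    goldens = []
    for ctx in contexts:
        for i in range(goldens_per_context):
            question = f"Q{i + 1}: what does this passage state?"
            goldens.append(Golden(input=question,
                                  expected_output=fake_llm(ctx),
                                  context=[ctx]))
    return goldens

# 2 contexts x 2 goldens each -> 4 goldens
goldens = generate_goldens_from_contexts(
    ["Refunds take 5 days.", "Plans renew monthly."], goldens_per_context=2
)
```

The other three paths differ only in how the contexts are obtained (chunked from documents, copied from seed goldens, or generated from the model's own knowledge).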
Step 4: Simulate Conversations (Optional)
For conversational agent evaluation, use the ConversationSimulator to extend single-turn goldens into multi-turn conversation scenarios. The simulator takes a chatbot callback function and conversational goldens, then simulates realistic user interactions.
Simulation features:
- Extends existing turns or starts fresh conversations
- Supports early stopping when conversation goals are met
- Configurable maximum number of turns
- Both sync and async chatbot callback support
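At its core, simulation is a loop that alternates simulated user turns with chatbot replies until a turn limit or a stopping condition is reached. A minimal sketch of that loop (illustrative only; the function and parameter names are hypothetical, not the `ConversationSimulator` API):

```python
def simulate_conversation(chatbot_callback, opening_input, max_turns=6,
                          stop_when=lambda reply: False):
    """Alternate simulated-user messages and chatbot replies.

    Stops after max_turns, or earlier if stop_when(reply) is True
    (the 'early stopping when conversation goals are met' feature).
    """
    turns = []
    user_message = opening_input
    for _ in range(max_turns):
        reply = chatbot_callback(user_message)
        turns.append({"user": user_message, "assistant": reply})
        if stop_when(reply):
            break
        # A real simulator would generate the follow-up with an LLM persona.
        user_message = f"Follow-up about: {reply[:30]}"
    return turns

# Stub chatbot that resolves the request on its third reply.
replies = iter(["Hi, how can I help?", "Checking that for you.", "Done! RESOLVED"])
turns = simulate_conversation(lambda m: next(replies), "I need a refund",
                              max_turns=10, stop_when=lambda r: "RESOLVED" in r)
```

Here the goal condition fires on the third reply, so the simulation stops well before the ten-turn limit.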
Step 5: Review and Export Dataset
Review the generated goldens for quality and export the dataset in the desired format. Optionally push the dataset to Confident AI for cloud-based management, annotation, and sharing with team members.
Export options:
- Local formats: JSON, CSV, JSONL via dataset.save_as()
- Cloud: Push to Confident AI via dataset.push()
- Review: Interactive review via dataset.review()
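The local formats are standard serializations of the same golden records. A self-contained sketch of what each looks like (illustrative only, not DeepEval's own save implementation; CSV requires flattening the context list into one column):

```python
import csv
import io
import json

goldens = [
    {"input": "What is the refund window?", "expected_output": "30 days",
     "context": ["Refunds accepted within 30 days."]},
]

# JSON: one array containing every golden object.
json_blob = json.dumps(goldens, indent=2)

# JSONL: one golden object per line, convenient for streaming.
jsonl_blob = "\n".join(json.dumps(g) for g in goldens)

# CSV: tabular, so the context list is joined into a single delimited column.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["input", "expected_output", "context"])
writer.writeheader()
for g in goldens:
    writer.writerow({**g, "context": "|".join(g["context"])})
csv_blob = buf.getvalue()
```

JSON and JSONL round-trip the nested context list losslessly; CSV is easier to inspect in a spreadsheet but needs the delimiter convention documented for re-import.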