Workflow:Vibrantlabsai Ragas Testset Generation

Knowledge Sources	Ragas Ragas Docs Testset Gen Guide
Domains	LLM_Ops, Data_Engineering, Test_Generation
Last Updated	2026-02-12 10:00 GMT

Overview

End-to-end process for generating synthetic evaluation test datasets from source documents using knowledge graphs, query synthesizers, and persona-based diversity.

Description

This workflow covers the generation of synthetic test data for evaluating RAG pipelines. Rather than manually creating test questions and answers, Ragas builds a knowledge graph from source documents, extracts entities and relationships, generates diverse user personas, and synthesizes query-context-answer triplets of varying complexity (single-hop, multi-hop) and styles (specific, abstract). The output is a Testset containing realistic evaluation samples that can be directly used with the evaluation pipeline.

Key outputs:

A Testset with synthetic queries, reference contexts, and reference answers
Support for query types along two axes: single-hop vs multi-hop reasoning, and specific vs abstract phrasing
Persona-driven diversity in query generation

Usage

Execute this workflow when you need to create evaluation datasets for a RAG system but lack manually curated test data. This is appropriate when you have source documents that your RAG system indexes and want to automatically generate diverse, realistic test questions with ground truth answers covering different levels of reasoning complexity.

Execution Steps

Step 1: Load_Source_Documents

Load the source documents that your RAG system indexes. Documents can be loaded using LangChain document loaders, LlamaIndex readers, or any method that produces text content with metadata. Each document becomes a node in the knowledge graph.

Key considerations:

Documents should represent the knowledge base of your RAG system
Supported loaders include LangChain DirectoryLoader and LlamaIndex SimpleDirectoryReader
Pre-chunked documents can be used to bypass internal splitting
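
As a minimal sketch of the shape this step produces, the following reads plain-text files from a directory into records that carry text plus metadata. It uses only the standard library. SourceDocument and load_text_documents are hypothetical names chosen for illustration and are not part of Ragas, LangChain, or LlamaIndex; in practice one of the loaders listed above would fill this role.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SourceDocument:
    """Hypothetical stand-in for a loader's document object: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text_documents(directory: str, pattern: str = "*.txt") -> list:
    """Read every file matching `pattern` under `directory` into a SourceDocument."""
    docs = []
    for path in sorted(Path(directory).rglob(pattern)):
        text = path.read_text(encoding="utf-8").strip()
        if not text:
            continue  # skip empty files; they would only add empty graph nodes
        docs.append(SourceDocument(page_content=text, metadata={"source": str(path)}))
    return docs
```

Keeping the file path in the metadata makes it possible to trace a generated question back to the document it came from.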

Step 2: Build_Knowledge_Graph

Create a KnowledgeGraph from the loaded documents. The graph construction pipeline applies a sequence of transformations: document splitting into hierarchical chunks, embedding generation for semantic similarity, entity extraction (NER, keyphrases), and relationship building between chunks based on shared entities or semantic overlap.

What happens:

Documents are split into hierarchical nodes (parent-child relationships)
Embeddings are computed for each chunk
Named entities and keyphrases are extracted using LLM-based extractors
Relationships are built using Jaccard similarity, cosine similarity, or overlap scores
The result is a graph with nodes (chunks) connected by semantic relationships
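
To make the relationship-building step concrete, here is a small sketch of linking chunks by Jaccard similarity over their extracted entities. It is a plain-Python illustration of the idea, not the Ragas implementation; build_relationships, the example chunk ids, and the 0.3 threshold are assumptions made for this example.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_relationships(node_entities: dict, threshold: float = 0.3) -> list:
    """Return (source, target, score) edges for node pairs whose entity
    overlap meets the threshold. `node_entities` maps node id -> entity set."""
    edges = []
    for (id_a, ents_a), (id_b, ents_b) in combinations(node_entities.items(), 2):
        score = jaccard(ents_a, ents_b)
        if score >= threshold:
            edges.append((id_a, id_b, round(score, 3)))
    return edges


# Toy entity sets, as an extractor might produce them (values are made up).
chunks = {
    "chunk_1": {"ragas", "knowledge graph", "embedding"},
    "chunk_2": {"ragas", "knowledge graph", "persona"},
    "chunk_3": {"pandas", "dataframe"},
}
edges = build_relationships(chunks)
```

Here chunk_1 and chunk_2 share two of four distinct entities, so they are connected, while chunk_3 stays isolated. Raising the threshold yields a sparser graph and fewer candidate node sets for multi-hop queries.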

Step 3: Generate_Personas

Generate diverse user personas from the knowledge graph to ensure test questions represent different user perspectives. The persona generator clusters document summaries by embedding similarity, selects representative summaries from each cluster, and uses an LLM to generate personas with names and role descriptions.

Key considerations:

Personas drive query diversity by representing different user types
The number of personas can be configured (default determined by cluster count)
Custom personas can be provided instead of auto-generation
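
The clustering idea behind persona generation can be sketched as follows: group summaries whose embeddings are close, then keep one representative per group. This is a simplified greedy scheme with toy two-dimensional vectors, offered as an assumption-laden illustration rather than the algorithm Ragas uses; cluster_summaries and the 0.9 threshold are hypothetical. The LLM call that turns each representative into a named persona is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def cluster_summaries(summaries, embeddings, threshold=0.9):
    """Greedy clustering: each summary joins the first cluster whose seed
    embedding is at least `threshold` similar, else it starts a new cluster."""
    clusters = []  # each cluster is a list of indices; index 0 is the seed
    for i, emb in enumerate(embeddings):
        for cluster in clusters:
            if cosine(emb, embeddings[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    # Use each cluster's seed summary as its representative.
    return [summaries[c[0]] for c in clusters]


# Made-up summaries and embeddings for illustration only.
summaries = ["Billing API guide", "Invoice endpoints", "Cluster upgrade runbook"]
embeddings = [[1.0, 0.0], [0.95, 0.1], [0.0, 1.0]]
representatives = cluster_summaries(summaries, embeddings)
```

The two billing-related summaries collapse into one cluster, so two representatives remain, and each would seed one persona.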

Step 4: Configure_Query_Distribution

Define the distribution of query types to generate. The distribution specifies what fraction of queries should be single-hop vs multi-hop, and specific vs abstract. Each query type uses a different QuerySynthesizer that finds appropriate node sets from the knowledge graph.

Query type categories:

SingleHopSpecific: Fact-based questions answerable from one document
SingleHopAbstract: Interpretive questions requiring reasoning from one document
MultiHopSpecific: Questions requiring facts from multiple documents
MultiHopAbstract: Questions requiring synthesis across multiple documents
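
One way to picture the distribution is as a list of (query type, weight) pairs that is turned into per-type sample counts. The sketch below does that with largest-remainder rounding so the counts always sum to the requested size. The weights and the allocate_samples helper are assumptions for illustration; the library's own allocation logic may differ.

```python
def allocate_samples(distribution, testset_size):
    """Split `testset_size` across query types in proportion to their weights,
    using largest-remainder rounding so the counts sum exactly to the total."""
    total_weight = sum(weight for _, weight in distribution)
    if total_weight <= 0:
        raise ValueError("distribution weights must sum to a positive value")
    exact = [(name, testset_size * weight / total_weight) for name, weight in distribution]
    counts = {name: int(share) for name, share in exact}
    leftover = testset_size - sum(counts.values())
    # Hand the remaining samples to the types with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda item: item[1] - int(item[1]), reverse=True)
    for name, _ in by_remainder[:leftover]:
        counts[name] += 1
    return counts


# Example weights (assumed values, not library defaults).
query_distribution = [
    ("SingleHopSpecific", 0.4),
    ("SingleHopAbstract", 0.1),
    ("MultiHopSpecific", 0.25),
    ("MultiHopAbstract", 0.25),
]
counts = allocate_samples(query_distribution, testset_size=20)
```

Weights do not need to sum to 1 in this sketch, because they are normalized by their total before allocation.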

Step 5: Synthesize_Test_Samples

Run the TestsetGenerator to produce test samples according to the configured distribution. For each query type, the synthesizer finds qualified node sets from the graph, creates scenario combinations of (nodes, persona, query_style, query_length), and uses an LLM to generate the actual query, reference context, and reference answer.

What happens:

QuerySynthesizers generate scenarios by querying the knowledge graph
For multi-hop queries, the graph's find_two_nodes_single_rel() or find_indirect_clusters() methods find connected node sets
Each scenario is transformed into a (query, reference_contexts, reference_answer) triplet
The LLM generates natural language queries and answers from the document context
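
The scenario-building part of this step can be sketched as a Cartesian product over node sets, personas, styles, and lengths, followed by sampling. The Scenario class, build_scenarios, and the style and length labels below are hypothetical; the sketch shows the combination logic only and leaves out the LLM call that turns a scenario into a query and answer.

```python
import random
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Scenario:
    """Hypothetical container for one generation scenario."""
    nodes: tuple
    persona: str
    query_style: str
    query_length: str


def build_scenarios(node_sets, personas, styles, lengths, n, seed=0):
    """Enumerate every (nodes, persona, style, length) combination and
    sample `n` of them without replacement (or all, if fewer exist)."""
    combos = [
        Scenario(tuple(ns), p, s, l)
        for ns, p, s, l in product(node_sets, personas, styles, lengths)
    ]
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return rng.sample(combos, min(n, len(combos)))


# Made-up inputs; node ids stand in for connected chunks found in the graph.
scenarios = build_scenarios(
    node_sets=[("chunk_1", "chunk_2"), ("chunk_2", "chunk_4")],
    personas=["Support engineer", "New customer"],
    styles=["formal", "casual"],
    lengths=["short", "medium", "long"],
    n=5,
)
```

Sampling from the full product, rather than pairing inputs one-to-one, is what spreads personas and styles across different parts of the graph.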

Step 6: Export_Testset

Export the generated Testset for use in evaluation. The Testset can be converted to an EvaluationDataset for direct use with the evaluate() function, exported as a pandas DataFrame, or saved to various formats. Each sample carries the query, its reference contexts, the reference answer, and metadata about the query type. In recent Ragas releases these appear as the fields user_input, reference_contexts, reference, and synthesizer_name.

Key considerations:

Testset can be directly converted to EvaluationDataset via to_evaluation_dataset()
Results can be saved to HuggingFace datasets or local files
The knowledge graph can be saved and reused for future generation runs
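
For local files, a line-delimited JSON export is one simple option. The sketch below writes samples to JSONL and reads them back using only the standard library. The helper names and the sample record are hypothetical; the record's keys follow the field names used in this section and may not match the column names of a given Ragas version.

```python
import json
import tempfile
from pathlib import Path


def save_samples_jsonl(samples, path):
    """Write one JSON object per line so the file can be streamed or appended to."""
    with Path(path).open("w", encoding="utf-8") as fh:
        for sample in samples:
            fh.write(json.dumps(sample, ensure_ascii=False) + "\n")


def load_samples_jsonl(path):
    """Read the samples back, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# A made-up sample in the shape described above.
samples = [
    {
        "query": "Which similarity measures link chunks in the graph?",
        "reference_contexts": ["Relationships are built using Jaccard similarity..."],
        "reference_answer": "Jaccard similarity, cosine similarity, or overlap scores.",
        "query_type": "SingleHopSpecific",
    }
]

# Round-trip through a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "testset.jsonl"
    save_samples_jsonl(samples, out_path)
    restored = load_samples_jsonl(out_path)
```

JSONL keeps list-valued fields such as the reference contexts intact, which a flat CSV export would have to serialize as strings.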
