Workflow:Vibrantlabsai Ragas Testset Generation
| Field | Value |
|---|---|
| Knowledge Sources | |
| Domains | LLM_Ops, Data_Engineering, Test_Generation |
| Last Updated | 2026-02-12 10:00 GMT |
Overview
End-to-end process for generating synthetic evaluation test datasets from source documents using knowledge graphs, query synthesizers, and persona-based diversity.
Description
This workflow covers the generation of synthetic test data for evaluating RAG pipelines. Rather than relying on manually written test questions and answers, Ragas builds a knowledge graph from the source documents, extracts entities and relationships, generates diverse user personas, and synthesizes query-context-answer triplets that vary in complexity (single-hop, multi-hop) and style (specific, abstract). The output is a Testset of realistic evaluation samples that can be used directly with the Ragas evaluation pipeline.
Key outputs:
- A Testset with synthetic queries, reference contexts, and reference answers
- Support for query types along two axes, single-hop vs. multi-hop and specific vs. abstract (e.g. SingleHopSpecific, MultiHopAbstract)
- Persona-driven diversity in query generation
Usage
Execute this workflow when you need to create evaluation datasets for a RAG system but lack manually curated test data. This is appropriate when you have source documents that your RAG system indexes and want to automatically generate diverse, realistic test questions with ground truth answers covering different levels of reasoning complexity.
Execution Steps
Step 1: Load_Source_Documents
Load the source documents that your RAG system indexes. Documents can be loaded using LangChain document loaders, LlamaIndex readers, or any method that produces text content with metadata. Each document becomes a node in the knowledge graph.
Key considerations:
- Documents should represent the knowledge base of your RAG system
- Supported loaders include LangChain's DirectoryLoader and LlamaIndex's SimpleDirectoryReader
- Pre-chunked documents can be used to bypass internal splitting
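As a minimal illustration of the text-plus-metadata shape this step needs, the sketch below reads plain-text files from a directory using only the Python standard library. `Document` and `load_documents` are hypothetical names (the field names mirror LangChain's `page_content`/`metadata` convention); in practice you would use one of the loaders listed above.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Document:
    """Hypothetical stand-in for a loader's output: raw text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_documents(directory, suffixes=(".md", ".txt")):
    """Read every matching, non-empty file under `directory` into a Document.

    Paths are visited in sorted order so repeated runs yield the same order.
    """
    docs = []
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            text = path.read_text(encoding="utf-8").strip()
            if text:  # skip empty files, which would become empty graph nodes
                docs.append(Document(page_content=text, metadata={"source": str(path)}))
    return docs


# Usage: build a throwaway corpus and load it.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "guide.md").write_text("Ragas builds a knowledge graph.", encoding="utf-8")
    Path(tmp, "empty.txt").write_text("", encoding="utf-8")
    Path(tmp, "scan.pdf").write_text("not plain text", encoding="utf-8")
    docs = load_documents(tmp)
```

Only `guide.md` is loaded: the empty file is skipped and the `.pdf` suffix is not in the accepted list.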
Step 2: Build_Knowledge_Graph
Create a KnowledgeGraph from the loaded documents. The graph construction pipeline applies a sequence of transformations: document splitting into hierarchical chunks, embedding generation for semantic similarity, entity extraction (NER, keyphrases), and relationship building between chunks based on shared entities or semantic overlap.
What happens:
- Documents are split into hierarchical nodes (parent-child relationships)
- Embeddings are computed for each chunk
- Named entities and keyphrases are extracted using LLM-based extractors
- Relationships are built using Jaccard similarity (e.g. over shared entities), cosine similarity (over embeddings), or overlap scores
- The result is a graph with nodes (chunks) connected by semantic relationships
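The splitting and relationship-building steps can be sketched without any LLM. This is a simplified stand-in, not the Ragas implementation: `Node`, `split_into_chunks`, and `build_relationships` are hypothetical, chunks are fixed-size word windows rather than the library's own splitter, and entities and embeddings are set by hand where the real pipeline would call LLM-based extractors and an embedding model. The thresholds are arbitrary.

```python
import math
from dataclasses import dataclass, field
from itertools import combinations
from typing import Optional


@dataclass
class Node:
    """Hypothetical graph node: a whole document or one of its chunks."""
    id: str
    text: str
    parent: Optional[str] = None  # id of the parent document node, if any
    entities: set = field(default_factory=set)
    embedding: list = field(default_factory=list)


def split_into_chunks(doc_id, text, max_words=50):
    """Return a parent node plus child chunk nodes of at most max_words words."""
    parent = Node(id=doc_id, text=text)
    words = text.split()
    children = [
        Node(id=f"{doc_id}#{i}", text=" ".join(words[start:start + max_words]), parent=doc_id)
        for i, start in enumerate(range(0, len(words), max_words))
    ]
    return parent, children


def jaccard(a, b):
    """|A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def build_relationships(chunks, entity_threshold=0.2, cosine_threshold=0.8):
    """Return (source_id, target_id, kind, score) edges for every qualifying chunk pair."""
    edges = []
    for a, b in combinations(chunks, 2):
        j = jaccard(a.entities, b.entities)
        if j >= entity_threshold:
            edges.append((a.id, b.id, "entity_overlap", j))
        c = cosine(a.embedding, b.embedding)
        if c >= cosine_threshold:
            edges.append((a.id, b.id, "cosine_similarity", c))
    return edges


# Usage: three chunks with hand-set entities and embeddings.
parent, chunks = split_into_chunks("doc1", "one two three four five", max_words=2)
chunks[0].entities, chunks[0].embedding = {"ragas", "knowledge graph"}, [1.0, 0.0]
chunks[1].entities, chunks[1].embedding = {"ragas", "persona"}, [0.9, 0.1]
chunks[2].entities, chunks[2].embedding = {"pandas"}, [0.0, 1.0]
edges = build_relationships(chunks)
```

Only the first two chunks get connected: they share one of three distinct entities (Jaccard 1/3) and have nearly parallel embeddings, so both an entity edge and a similarity edge are created.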
Step 3: Generate_Personas
Generate diverse user personas from the knowledge graph to ensure test questions represent different user perspectives. The persona generator clusters document summaries by embedding similarity, selects representative summaries from each cluster, and uses an LLM to generate personas with names and role descriptions.
Key considerations:
- Personas drive query diversity by representing different user types
- The number of personas can be configured (default determined by cluster count)
- Custom personas can be provided instead of auto-generation
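The cluster-then-select idea behind persona generation can be sketched as follows, assuming summary embeddings are already computed. The greedy threshold clustering used here is a simple substitute chosen for brevity; the generator's actual clustering method may differ. `persona_prompt` is hypothetical, and the LLM call that would turn each prompt into a named persona is deliberately left out.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def cluster_summaries(embeddings, threshold=0.75):
    """Greedy clustering: each item joins the first cluster whose seed is similar enough."""
    clusters = []  # lists of indices; index 0 of each list is the cluster's seed
    for i, emb in enumerate(embeddings):
        for cluster in clusters:
            if cosine(embeddings[cluster[0]], emb) >= threshold:
                cluster.append(i)
                break
        else:  # no cluster matched, so this item seeds a new one
            clusters.append([i])
    return clusters


def representative(cluster, embeddings):
    """Pick the member with the highest mean similarity to its whole cluster."""
    def mean_sim(i):
        return sum(cosine(embeddings[i], embeddings[j]) for j in cluster) / len(cluster)
    return max(cluster, key=mean_sim)


def persona_prompt(summary):
    """Hypothetical prompt text; the real generator's prompt will differ."""
    return (
        "Based on the summary below, describe one user who would read it. "
        "Reply with a name and a one-sentence role description.\n\n"
        f"Summary: {summary}"
    )


# Usage: three related summaries and one unrelated one.
summaries = [
    "RAG metrics overview",
    "Faithfulness scoring details",
    "Context recall in practice",
    "HR onboarding checklist",
]
embeddings = [[1.0, 0.0], [0.96, 0.28], [0.8, 0.6], [0.0, 1.0]]
clusters = cluster_summaries(embeddings)
reps = [representative(c, embeddings) for c in clusters]
prompts = [persona_prompt(summaries[i]) for i in reps]
```

Here the clusters become `[[0, 1, 2], [3]]`, so two personas would be requested, which is one way the persona count can follow the cluster count.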
Step 4: Configure_Query_Distribution
Define the distribution of query types to generate. The distribution specifies what fraction of queries should be single-hop vs multi-hop, and specific vs abstract. Each query type uses a different QuerySynthesizer that finds appropriate node sets from the knowledge graph.
Query type categories:
- SingleHopSpecific: Fact-based questions answerable from one document
- SingleHopAbstract: Interpretive questions requiring reasoning from one document
- MultiHopSpecific: Questions requiring facts from multiple documents
- MultiHopAbstract: Questions requiring synthesis across multiple documents
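Turning fractions into whole sample counts needs a rounding rule. The sketch below assumes the largest-remainder method, which guarantees the counts add up to the requested test set size; how the library rounds internally is not stated here. Plain strings stand in for QuerySynthesizer objects, and `allocate_counts` is a hypothetical helper.

```python
def allocate_counts(distribution, testset_size):
    """Turn {query_type: fraction} into integer counts that sum to testset_size.

    Uses the largest-remainder method: floor every share, then hand the
    leftover samples to the types with the largest fractional parts.
    """
    total = sum(distribution.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"fractions must sum to 1.0, got {total}")
    raw = {name: frac * testset_size for name, frac in distribution.items()}
    counts = {name: int(value) for name, value in raw.items()}  # floor (values are >= 0)
    leftover = testset_size - sum(counts.values())
    by_remainder = sorted(raw, key=lambda name: raw[name] - counts[name], reverse=True)
    for name in by_remainder[:max(leftover, 0)]:
        counts[name] += 1
    return counts


# Usage: the four query types from the list above.
distribution = {
    "SingleHopSpecific": 0.5,
    "SingleHopAbstract": 0.25,
    "MultiHopSpecific": 0.15,
    "MultiHopAbstract": 0.1,
}
counts = allocate_counts(distribution, testset_size=12)
```

With 12 samples the MultiHopSpecific share is 1.8; flooring alone would produce only 11 samples in total, so the leftover sample goes to that type.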
Step 5: Synthesize_Test_Samples
Run the TestsetGenerator to produce test samples according to the configured distribution. For each query type, the synthesizer finds qualified node sets from the graph, creates scenario combinations of (nodes, persona, query_style, query_length), and uses an LLM to generate the actual query, reference context, and reference answer.
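Scenario creation is essentially a Cartesian product followed by sampling. In this sketch `Scenario` and `make_scenarios` are hypothetical, and the style and length labels are illustrative strings, not the library's own values.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical record mirroring (nodes, persona, query_style, query_length)."""
    nodes: tuple
    persona: str
    query_style: str
    query_length: str


def make_scenarios(node_sets, personas, styles, lengths, n, seed=0):
    """Enumerate every combination, then sample up to n of them without replacement."""
    all_combos = [
        Scenario(tuple(nodes), persona, style, length)
        for nodes, persona, style, length in itertools.product(node_sets, personas, styles, lengths)
    ]
    rng = random.Random(seed)  # seeded so the same scenarios are drawn on every run
    return rng.sample(all_combos, k=min(n, len(all_combos)))


# Usage: 2 node sets x 2 personas x 2 styles x 2 lengths = 16 combinations.
node_sets = [["chunk-1"], ["chunk-2"]]
personas = ["Data engineer", "Compliance officer"]
styles = ["perfect grammar", "web-search-like"]
lengths = ["short", "long"]
scenarios = make_scenarios(node_sets, personas, styles, lengths, n=5)
```

Enumerating every combination is fine for small inputs; for large graphs, drawing each field at random per scenario avoids materializing the full product.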
What happens:
- QuerySynthesizers generate scenarios by querying the knowledge graph
- For multi-hop queries, the graph's find_two_nodes_single_rel() or find_indirect_clusters() methods find connected node sets
- Each scenario is transformed into a (query, reference_contexts, reference_answer) triplet
- The LLM generates natural language queries and answers from the document context
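`find_two_nodes_single_rel()` and `find_indirect_clusters()` are the knowledge graph's own methods, and their internals are not described here. As a rough illustration over a plain `(source, target, relationship)` edge list, the sketch below uses two simplified stand-ins: filtering direct edges by relationship type to get two-node sets, and breadth-first connected components to get larger clusters. Both helpers are hypothetical, and the real methods likely apply additional constraints.

```python
from collections import defaultdict, deque


def pairs_with_relation(edges, kind):
    """Node pairs joined directly by one edge of the given relationship type."""
    return [(src, dst) for src, dst, rel in edges if rel == kind]


def connected_clusters(edges, min_size=3):
    """Connected components (any path, ignoring direction) with at least min_size nodes."""
    graph = defaultdict(set)
    for src, dst, _ in edges:
        graph[src].add(dst)
        graph[dst].add(src)
    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, queue = [], deque([start])
        seen.add(start)
        while queue:  # breadth-first traversal from `start`
            node = queue.popleft()
            component.append(node)
            for neighbour in sorted(graph[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        if len(component) >= min_size:
            clusters.append(sorted(component))
    return clusters


# Usage: A-B-C form a chain; D-E are a separate pair.
edges = [("A", "B", "entity_overlap"), ("B", "C", "cosine_similarity"), ("D", "E", "entity_overlap")]
chunk_text = {n: f"text of chunk {n}" for n in "ABCDE"}
pairs = pairs_with_relation(edges, "entity_overlap")
clusters = connected_clusters(edges)
# Each connected set becomes the reference_contexts for one multi-hop sample.
contexts = [[chunk_text[n] for n in pair] for pair in pairs]
```

A and C are never directly linked, but they land in the same cluster through B, which is the kind of indirect connection a multi-hop question can exploit.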
Step 6: Export_Testset
Export the generated Testset for use in evaluation. The Testset can be converted to an EvaluationDataset for direct use with the evaluate() function, exported as a pandas DataFrame, or saved for later reuse. Each sample in the Testset holds the query, its reference_contexts, the reference_answer, and metadata recording the query type.
Key considerations:
- Testset can be directly converted to EvaluationDataset via to_evaluation_dataset()
- Results can be saved to HuggingFace datasets or local files
- The knowledge graph can be saved and reused for future generation runs
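For local files, JSON Lines is a convenient format because each sample, including its list of reference contexts, fits on one line. `SyntheticSample`, `to_jsonl`, `from_jsonl`, and `to_columns` are hypothetical, standard-library-only helpers using the field names described above; they are not the Testset's own export methods.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SyntheticSample:
    """Hypothetical sample record with the fields named above."""
    query: str
    reference_contexts: list
    reference_answer: str
    metadata: dict = field(default_factory=dict)  # e.g. {"query_type": "SingleHopSpecific"}


def to_jsonl(samples):
    """Serialize samples as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(asdict(s), ensure_ascii=False) + "\n" for s in samples)


def from_jsonl(text):
    """Parse JSON Lines back into SyntheticSample objects, skipping blank lines."""
    return [SyntheticSample(**json.loads(line)) for line in text.splitlines() if line.strip()]


def to_columns(samples):
    """Column-oriented dict (field -> list of values), a layout pandas.DataFrame accepts."""
    rows = [asdict(s) for s in samples]
    return {key: [row[key] for row in rows] for key in rows[0]} if rows else {}


# Usage: round-trip one sample through JSON Lines.
samples = [
    SyntheticSample(
        query="What does Ragas build from the source documents?",
        reference_contexts=["Ragas builds a knowledge graph from source documents."],
        reference_answer="A knowledge graph.",
        metadata={"query_type": "SingleHopSpecific"},
    )
]
text = to_jsonl(samples)
restored = from_jsonl(text)
columns = to_columns(samples)
```

Keeping the query type in each record's metadata lets you later break evaluation scores down by query type.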