| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| explodinggradients/ragas | LLM Evaluation, Test Data Generation, Query Synthesis | 2026-02-10 |
Overview
Description
The TestsetGenerator class is the main entry point for generating evaluation test sets in Ragas. It orchestrates the entire pipeline: loading documents into a knowledge graph, applying enrichment transforms, generating personas, creating query scenarios across multiple synthesizer types, and assembling the final Testset. The class supports both LangChain and LlamaIndex document formats through dedicated factory methods and convenience methods.
Usage
Users typically instantiate TestsetGenerator with an LLM and embedding model, then call one of the generate_with_* methods with their documents. For advanced workflows, users can construct and enrich a knowledge graph separately and pass it directly, then call .generate().
Code Reference
Source Location
| Component | File | Lines |
|---|---|---|
| TestsetGenerator | src/ragas/testset/synthesizers/generate.py | L54-627 |
Signature
```python
@dataclass
class TestsetGenerator:
    llm: BaseRagasLLM
    embedding_model: BaseRagasEmbeddings
    knowledge_graph: KnowledgeGraph = field(default_factory=KnowledgeGraph)
    persona_list: Optional[List[Persona]] = None
    llm_context: Optional[str] = None
```
Import
```python
from ragas.testset import TestsetGenerator
# or
from ragas.testset.synthesizers.generate import TestsetGenerator
```
Key Methods
| Method | Description |
|---|---|
| `generate(testset_size, query_distribution=None, num_personas=3, run_config=None, batch_size=None, callbacks=None, token_usage_parser=None, with_debugging_logs=False, raise_exceptions=True, return_executor=False) -> Union[Testset, Executor]` | Core generation method. Generates scenarios and samples from the knowledge graph using the query distribution. |
| `generate_with_langchain_docs(documents, testset_size, transforms=None, transforms_llm=None, transforms_embedding_model=None, query_distribution=None, run_config=None, callbacks=None, token_usage_parser=None, with_debugging_logs=False, raise_exceptions=True, return_executor=False) -> Union[Testset, Executor]` | End-to-end generation from LangChain documents. Converts documents to nodes, applies transforms, builds the knowledge graph, then calls generate(). |
| `generate_with_llamaindex_docs(documents, testset_size, transforms=None, transforms_llm=None, transforms_embedding_model=None, query_distribution=None, run_config=None, callbacks=None, token_usage_parser=None, with_debugging_logs=False, raise_exceptions=True) -> Testset` | End-to-end generation from LlamaIndex documents. |
| `generate_with_chunks(chunks, testset_size, transforms=None, ...) -> Union[Testset, Executor]` | Generation from pre-chunked documents (strings or LangChain Documents treated as CHUNK nodes). |
| `from_langchain(llm, embedding_model, knowledge_graph=None, llm_context=None) -> TestsetGenerator` | Class method factory that wraps a LangChain LLM and embedding model. |
| `from_llama_index(llm, embedding_model, knowledge_graph=None, llm_context=None) -> TestsetGenerator` | Class method factory that wraps a LlamaIndex LLM and embedding model. |
I/O Contract
Constructor Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| llm | BaseRagasLLM | (required) | The language model used for transforms, persona generation, and query synthesis |
| embedding_model | BaseRagasEmbeddings | (required) | The embedding model used for transforms (summary embeddings, similarity computations) |
| knowledge_graph | KnowledgeGraph | KnowledgeGraph() | Pre-built knowledge graph; if empty, it is populated by the generate_with_* methods |
| persona_list | Optional[List[Persona]] | None | Pre-defined personas; if None, personas are generated automatically from the knowledge graph |
| llm_context | Optional[str] | None | Additional context string to guide LLM query and answer generation |
generate() Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| testset_size | int | (required) | Number of test samples to generate |
| query_distribution | Optional[QueryDistribution] | None | List of (synthesizer, probability) tuples; defaults to an equal split of SingleHopSpecific, MultiHopAbstract, and MultiHopSpecific |
| num_personas | int | 3 | Number of personas to generate or use |
| run_config | Optional[RunConfig] | None | Execution configuration (timeouts, retries) |
| batch_size | Optional[int] | None | Batch size for parallel execution; None disables batching |
| callbacks | Optional[Callbacks] | None | LangChain-style callbacks for monitoring |
| token_usage_parser | Optional[TokenUsageParser] | None | Parser for computing token costs |
| with_debugging_logs | bool | False | Enable debug logging |
| raise_exceptions | bool | True | Whether to raise exceptions during generation |
| return_executor | bool | False | If True, returns the Executor for cancellable execution instead of the Testset |
Output
| Return Type | Description |
|---|---|
| Testset | Contains a list of TestsetSample objects, each with an eval_sample (SingleTurnSample or MultiTurnSample) and a synthesizer_name |
| Executor | Returned when return_executor=True; call .results() to get the test set or .cancel() to abort |
Usage Examples
Basic Generation With LangChain Documents
```python
from ragas.testset import TestsetGenerator
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.document_loaders import DirectoryLoader

# Load documents
loader = DirectoryLoader("./docs/", glob="**/*.md")
documents = loader.load()

# Create generator from LangChain models
generator = TestsetGenerator.from_langchain(
    llm=ChatOpenAI(model="gpt-4o"),
    embedding_model=OpenAIEmbeddings(),
)

# Generate 20 test samples
testset = generator.generate_with_langchain_docs(
    documents=documents,
    testset_size=20,
)

# Convert to pandas for inspection
df = testset.to_pandas()
print(df.head())
```
Custom Query Distribution
```python
from ragas.testset import TestsetGenerator
from ragas.testset.synthesizers.single_hop.specific import SingleHopSpecificQuerySynthesizer
from ragas.testset.synthesizers.multi_hop import MultiHopSpecificQuerySynthesizer
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))
embedding_model = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

# generate() reads from the generator's knowledge graph, so this
# assumes the graph has already been populated (e.g. by a prior
# generate_with_* call or by passing knowledge_graph= here).
generator = TestsetGenerator(llm=llm, embedding_model=embedding_model)

# Define a custom distribution: 70% single-hop, 30% multi-hop
query_distribution = [
    (SingleHopSpecificQuerySynthesizer(llm=llm), 0.7),
    (MultiHopSpecificQuerySynthesizer(llm=llm), 0.3),
]

testset = generator.generate(
    testset_size=50,
    query_distribution=query_distribution,
    num_personas=5,
)
```
Generation With Pre-Chunked Documents
```python
from ragas.testset import TestsetGenerator
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

generator = TestsetGenerator.from_langchain(
    llm=ChatOpenAI(model="gpt-4o"),
    embedding_model=OpenAIEmbeddings(),
)

# Pre-chunked text strings
chunks = [
    "Machine learning models require training data.",
    "Neural networks are a type of machine learning model.",
    "Gradient descent is used to optimize neural networks.",
    "Backpropagation computes gradients for each layer.",
]

testset = generator.generate_with_chunks(
    chunks=chunks,
    testset_size=10,
)
```
LlamaIndex Integration
```python
from ragas.testset import TestsetGenerator
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import SimpleDirectoryReader

# Load documents with LlamaIndex
documents = SimpleDirectoryReader("./data/").load_data()

# Create generator from LlamaIndex models
generator = TestsetGenerator.from_llama_index(
    llm=OpenAI(model="gpt-4o"),
    embedding_model=OpenAIEmbedding(),
)

testset = generator.generate_with_llamaindex_docs(
    documents=documents,
    testset_size=20,
)
```
Related Pages