Implementation:Explodinggradients Ragas MultiTurnSample Class
MultiTurnSample Class
MultiTurnSample is a Pydantic model in the Ragas library that represents a single evaluation sample for multi-turn agent conversations. It holds the conversation messages along with optional reference data (expected outcomes, tool calls, topics, rubrics) used by evaluation metrics.
Source Location
- File: src/ragas/dataset_schema.py (lines 100-179)
- Repository: explodinggradients/ragas
Import
from ragas.dataset_schema import MultiTurnSample
Class Definition
class MultiTurnSample(BaseSample):
    user_input: t.List[t.Union[HumanMessage, AIMessage, ToolMessage]]
    reference: t.Optional[str] = None
    reference_tool_calls: t.Optional[t.List[ToolCall]] = None
    rubrics: t.Optional[t.Dict[str, str]] = None
    reference_topics: t.Optional[t.List[str]] = None
MultiTurnSample inherits from BaseSample, which in turn inherits from pydantic.BaseModel.
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| user_input | List[Union[HumanMessage, AIMessage, ToolMessage]] | (required) | The list of conversation messages representing the multi-turn interaction. Must pass structural validation. |
| reference | Optional[str] | None | The reference answer or expected outcome for the conversation. Used by AgentGoalAccuracyWithReference. |
| reference_tool_calls | Optional[List[ToolCall]] | None | A list of expected tool calls. Used by ToolCallAccuracy and ToolCallF1. |
| reference_topics | Optional[List[str]] | None | A list of allowed reference topics. Used by TopicAdherenceScore. |
| rubrics | Optional[Dict[str, str]] | None | Evaluation rubrics for rubric-based metrics. |
Validation: validate_user_input
The user_input field has a Pydantic field validator that enforces structural constraints on the message sequence:
@field_validator("user_input")
@classmethod
def validate_user_input(
    cls,
    messages: t.List[t.Union[HumanMessage, AIMessage, ToolMessage]],
) -> t.List[t.Union[HumanMessage, AIMessage, ToolMessage]]:
The validator enforces three rules:
- All messages must be typed: Every element must be an instance of HumanMessage, AIMessage, or ToolMessage. A ValueError is raised otherwise.
- ToolMessage must be preceded by an AIMessage: A ToolMessage cannot appear in the conversation before any AIMessage has appeared.
- ToolMessage must follow an AIMessage with tool_calls or another ToolMessage: A ToolMessage must immediately follow either:
  - An AIMessage that has a non-empty tool_calls field, OR
  - Another ToolMessage (allowing contiguous blocks of tool responses)
These rules ensure that tool responses are structurally valid -- they only appear after an AI message has invoked tools.
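The three rules can be sketched without Ragas or Pydantic installed. The block below is a minimal illustration, not the library's implementation: the dataclasses are stand-ins for the real classes in ragas.messages, check_message_sequence is a hypothetical helper name, and the error strings are only approximations of the library's messages.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union


# Stand-in message types (assumption: simplified versions of ragas.messages classes).
@dataclass
class HumanMessage:
    content: str


@dataclass
class AIMessage:
    content: str
    tool_calls: List[Dict[str, Any]] = field(default_factory=list)


@dataclass
class ToolMessage:
    content: str


Message = Union[HumanMessage, AIMessage, ToolMessage]


def check_message_sequence(messages: List[Message]) -> List[Message]:
    """Hypothetical helper: apply the three rules, return the list if valid."""
    # Rule 1: every element must be one of the three message types.
    if not all(isinstance(m, (HumanMessage, AIMessage, ToolMessage)) for m in messages):
        raise ValueError("All inputs must be HumanMessage, AIMessage, or ToolMessage.")

    seen_ai_message = False
    for index, message in enumerate(messages):
        if isinstance(message, AIMessage):
            seen_ai_message = True
        elif isinstance(message, ToolMessage):
            # Rule 2: some AIMessage must already have appeared.
            if not seen_ai_message:
                raise ValueError("ToolMessage must be preceded by an AIMessage.")
            # Rule 3: the immediately preceding message must be an AIMessage
            # with tool calls, or another ToolMessage.
            previous = messages[index - 1]
            if isinstance(previous, AIMessage):
                if not previous.tool_calls:
                    raise ValueError("ToolMessage must follow an AIMessage where tools were called.")
            elif not isinstance(previous, ToolMessage):
                raise ValueError("ToolMessage must follow an AIMessage or another ToolMessage.")
    return messages
```

Note that a HumanMessage placed between a tool-calling AIMessage and its ToolMessage also fails the third rule, because the check looks only at the immediately preceding message.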
Methods
to_messages
def to_messages(self) -> List[Dict]:
Converts the user_input message list to a list of plain dictionaries using each message's model_dump() method.
pretty_repr
def pretty_repr(self) -> str:
Returns a human-readable string representation of the conversation. Each message is formatted using its own pretty_repr() method:
- HumanMessage: "Human: {content}"
- AIMessage: "AI: {content}" followed by tool calls if present (formatted as "Tools:\n {name}: {args}")
- ToolMessage: "ToolOutput: {content}"
Messages are joined with newlines. This representation is used by LLM-based metrics when passing conversations to evaluator LLMs.
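The delegation pattern described above, where each message formats itself and the sample joins the results, can be sketched as follows. This is an illustration under stated assumptions rather than the Ragas source: the dataclasses are stand-ins for the real message classes, render_conversation is a hypothetical name, and the two-space indent before each tool call is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union


# Stand-in types (assumption: simplified versions of the ragas.messages classes).
@dataclass
class ToolCall:
    name: str
    args: Dict[str, Any]


@dataclass
class HumanMessage:
    content: str

    def pretty_repr(self) -> str:
        return f"Human: {self.content}"


@dataclass
class AIMessage:
    content: str
    tool_calls: List[ToolCall] = field(default_factory=list)

    def pretty_repr(self) -> str:
        lines = [f"AI: {self.content}"]
        if self.tool_calls:
            # Tool calls are listed under a "Tools:" header, one per line.
            lines.append("Tools:")
            lines.extend(f"  {call.name}: {call.args}" for call in self.tool_calls)
        return "\n".join(lines)


@dataclass
class ToolMessage:
    content: str

    def pretty_repr(self) -> str:
        return f"ToolOutput: {self.content}"


def render_conversation(
    messages: List[Union[HumanMessage, AIMessage, ToolMessage]],
) -> str:
    """Hypothetical helper: join each message's own representation with newlines."""
    return "\n".join(message.pretty_repr() for message in messages)


conversation = [
    HumanMessage("Hi"),
    AIMessage("Searching.", [ToolCall("search", {"q": "x"})]),
    ToolMessage("Found"),
]
rendered = render_conversation(conversation)
```

Keeping the formatting on each message class means the sample-level method needs no knowledge of message types.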
Inherited Methods (from BaseSample)
| Method | Return Type | Description |
|---|---|---|
| to_dict() | Dict | Dictionary representation excluding None fields (via model_dump(exclude_none=True)) |
| get_features() | List[str] | List of non-None field names |
| to_string() | str | Formatted string representation of all non-None fields |
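The effect of excluding None fields can be illustrated with a small stand-in. SampleSketch below is a hypothetical dataclass, not BaseSample, and it filters only top-level fields, whereas Pydantic's model_dump(exclude_none=True) also applies to nested models. Deriving get_features from the keys of to_dict is likewise an assumption made for this sketch.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class SampleSketch:
    """Hypothetical stand-in for a sample with optional reference fields."""

    user_input: str
    reference: Optional[str] = None
    rubrics: Optional[Dict[str, str]] = None

    def to_dict(self) -> Dict[str, Any]:
        # Keep only the fields that were actually set (top level only).
        return {key: value for key, value in asdict(self).items() if value is not None}

    def get_features(self) -> List[str]:
        # Assumption: feature names are the keys of the filtered dictionary.
        return list(self.to_dict().keys())


sketch = SampleSketch(user_input="Book a table", reference="A table is booked.")
sketch_dict = sketch.to_dict()
sketch_features = sketch.get_features()
```

Because unset fields are dropped, the feature list reflects which reference data a sample actually carries, which is what determines the metrics that can be applied to it.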
Usage Example
from ragas.dataset_schema import MultiTurnSample
from ragas.messages import HumanMessage, AIMessage, ToolCall, ToolMessage
# Create a multi-turn sample with reference data
sample = MultiTurnSample(
    user_input=[
        HumanMessage(content="Book a table at the best Chinese restaurant for 8pm"),
        AIMessage(
            content="Let me search for Chinese restaurants.",
            tool_calls=[
                ToolCall(name="restaurant_search", args={"cuisine": "Chinese", "time": "8pm"})
            ],
        ),
        ToolMessage(content="Found: Golden Dragon, Jade Palace"),
        AIMessage(
            content="I'll book Golden Dragon.",
            tool_calls=[
                ToolCall(name="restaurant_book", args={"name": "Golden Dragon", "time": "8pm"})
            ],
        ),
        ToolMessage(content="Table booked at Golden Dragon for 8pm."),
        AIMessage(content="Your table at Golden Dragon is booked for 8pm!"),
    ],
    reference="A table is booked at a Chinese restaurant for 8:00pm.",
    reference_tool_calls=[
        ToolCall(name="restaurant_search", args={"cuisine": "Chinese", "time": "8pm"}),
        ToolCall(name="restaurant_book", args={"name": "Golden Dragon", "time": "8pm"}),
    ],
    reference_topics=["Restaurant Booking", "Chinese Cuisine"],
)
# Access conversation as text
print(sample.pretty_repr())
# Output:
# Human: Book a table at the best Chinese restaurant for 8pm
# AI: Let me search for Chinese restaurants.
# Tools:
# restaurant_search: {'cuisine': 'Chinese', 'time': '8pm'}
# ToolOutput: Found: Golden Dragon, Jade Palace
# ...
# Convert to dictionary (for serialization)
sample_dict = sample.to_dict()
# Get list of available features
features = sample.get_features()
# ['user_input', 'reference', 'reference_tool_calls', 'reference_topics']
Validation Example
from ragas.dataset_schema import MultiTurnSample
from ragas.messages import HumanMessage, AIMessage, ToolMessage
# This will raise a ValueError because ToolMessage appears
# before any AIMessage in the conversation
try:
    sample = MultiTurnSample(
        user_input=[
            HumanMessage(content="Hello"),
            ToolMessage(content="Some tool output"),  # Invalid: no preceding AIMessage
        ]
    )
except ValueError as e:  # Pydantic's ValidationError is a subclass of ValueError
    print(e)  # Error text includes "ToolMessage must be preceded by an AIMessage..."
# This will raise a ValueError because the AIMessage before
# the ToolMessage has no tool_calls
try:
    sample = MultiTurnSample(
        user_input=[
            HumanMessage(content="Hello"),
            AIMessage(content="Hi there!"),  # No tool_calls
            ToolMessage(content="Some tool output"),  # Invalid: preceding AI has no tool_calls
        ]
    )
except ValueError as e:  # Pydantic's ValidationError is a subclass of ValueError
    print(e)  # Error text includes "ToolMessage must follow an AIMessage where tools were called."
Integration with EvaluationDataset
MultiTurnSample instances are typically collected into an EvaluationDataset:
from ragas.dataset_schema import EvaluationDataset
dataset = EvaluationDataset(samples=[sample1, sample2, sample3])
# Check if dataset contains multi-turn samples
assert dataset.is_multi_turn()
# Convert to pandas DataFrame
df = dataset.to_pandas()
# Export to JSONL
dataset.to_jsonl("evaluation_data.jsonl")
Internal Dependencies
- ragas.dataset_schema.BaseSample -- parent class providing common methods (to_dict, get_features, to_string)
- ragas.messages.HumanMessage, ragas.messages.AIMessage, ragas.messages.ToolMessage -- typed message classes
- ragas.messages.ToolCall -- tool call data type used in reference_tool_calls and within AIMessage
- pydantic.BaseModel -- provides validation, serialization, and schema generation
See Also
- ToolCallAccuracy Metric -- consumes user_input and reference_tool_calls
- ToolCallF1 Metric -- consumes user_input and reference_tool_calls
- AgentGoalAccuracy Metric -- consumes user_input and reference
- TopicAdherenceScore Metric -- consumes user_input and reference_topics
- LangGraph Convert Messages -- produces messages for user_input
- Swarm Convert Messages -- produces messages for user_input