Implementation:Explodinggradients Ragas MultiTurnSample Class

MultiTurnSample Class

MultiTurnSample is a Pydantic model in the Ragas library that represents a single evaluation sample for multi-turn agent conversations. It holds the conversation messages along with optional reference data (expected outcomes, tool calls, topics, rubrics) used by evaluation metrics.

Source Location

File: src/ragas/dataset_schema.py (lines 100-179)
Repository: explodinggradients/ragas

Import

from ragas.dataset_schema import MultiTurnSample

Class Definition

class MultiTurnSample(BaseSample):
    user_input: t.List[t.Union[HumanMessage, AIMessage, ToolMessage]]
    reference: t.Optional[str] = None
    reference_tool_calls: t.Optional[t.List[ToolCall]] = None
    rubrics: t.Optional[t.Dict[str, str]] = None
    reference_topics: t.Optional[t.List[str]] = None

MultiTurnSample inherits from BaseSample, which in turn inherits from pydantic.BaseModel.

Fields

Field	Type	Default	Description
`user_input`	`List[Union[HumanMessage, AIMessage, ToolMessage]]`	(required)	The list of conversation messages representing the multi-turn interaction. Must satisfy the message-ordering rules enforced by `validate_user_input`.
`reference`	`Optional[str]`	`None`	The reference answer or expected outcome for the conversation. Used by `AgentGoalAccuracyWithReference`.
`reference_tool_calls`	`Optional[List[ToolCall]]`	`None`	A list of expected tool calls. Used by `ToolCallAccuracy` and `ToolCallF1`.
`reference_topics`	`Optional[List[str]]`	`None`	A list of allowed reference topics. Used by `TopicAdherenceScore`.
`rubrics`	`Optional[Dict[str, str]]`	`None`	Evaluation rubrics for rubric-based metrics.

Validation: validate_user_input

The user_input field has a Pydantic field validator that enforces structural constraints on the message sequence:

@field_validator("user_input")
@classmethod
def validate_user_input(
    cls,
    messages: t.List[t.Union[HumanMessage, AIMessage, ToolMessage]],
) -> t.List[t.Union[HumanMessage, AIMessage, ToolMessage]]:

The validator enforces three rules:

1. All messages must be typed: Every element must be an instance of HumanMessage, AIMessage, or ToolMessage. A ValueError is raised otherwise.
2. ToolMessage must be preceded by an AIMessage: A ToolMessage cannot appear in the conversation before any AIMessage has appeared.
3. ToolMessage must follow an AIMessage with tool_calls or another ToolMessage: A ToolMessage must immediately follow either:

- An AIMessage that has a non-empty tool_calls field, OR
- Another ToolMessage (allowing contiguous blocks of tool responses)

These rules ensure that tool responses are structurally valid -- they only appear after an AI message has invoked tools.
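
The three rules above can be sketched with the standard library alone. This is a minimal illustration, not the Ragas source: the dataclasses below are stand-ins for the classes in ragas.messages, the function name check_message_order is hypothetical, and the error messages are paraphrased.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


# Stand-in message types for illustration only; the real classes live in ragas.messages.
@dataclass
class HumanMessage:
    content: str


@dataclass
class AIMessage:
    content: str
    tool_calls: Optional[List[dict]] = None


@dataclass
class ToolMessage:
    content: str


Message = Union[HumanMessage, AIMessage, ToolMessage]


def check_message_order(messages: List[Message]) -> List[Message]:
    """Apply the three structural rules to a list of stand-in messages."""
    # Rule 1: every element must be one of the three message types.
    if not all(isinstance(m, (HumanMessage, AIMessage, ToolMessage)) for m in messages):
        raise ValueError("All inputs must be HumanMessage, AIMessage, or ToolMessage.")

    seen_ai = False
    for i, m in enumerate(messages):
        if isinstance(m, AIMessage):
            seen_ai = True
        elif isinstance(m, ToolMessage):
            # Rule 2: some AIMessage must already have appeared.
            if not seen_ai:
                raise ValueError("ToolMessage must be preceded by an AIMessage.")
            # Rule 3: the immediate predecessor must be a tool-calling
            # AIMessage or another ToolMessage.
            prev = messages[i - 1]
            if isinstance(prev, AIMessage):
                if not prev.tool_calls:
                    raise ValueError("ToolMessage must follow an AIMessage where tools were called.")
            elif not isinstance(prev, ToolMessage):
                raise ValueError("ToolMessage must follow an AIMessage or another ToolMessage.")
    return messages
```

Under these assumptions, a sequence such as Human, AI (with tool_calls), Tool, Tool, AI passes, while Human, Tool fails rule 2 and Human, AI (no tool_calls), Tool fails rule 3.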

Methods

to_messages

def to_messages(self) -> List[Dict]:

Converts the user_input message list to a list of plain dictionaries using each message's model_dump() method.

pretty_repr

def pretty_repr(self) -> str:

Returns a human-readable string representation of the conversation. Each message is formatted using its own pretty_repr() method:

- HumanMessage: "Human: {content}"
- AIMessage: "AI: {content}" followed by tool calls if present (formatted as "Tools:\n {name}: {args}")
- ToolMessage: "ToolOutput: {content}"

Messages are joined with newlines. This representation is used by LLM-based metrics when passing conversations to evaluator LLMs.
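
The formatting described above can be approximated with plain functions. This is a hedged sketch rather than the library code: format_human, format_ai, and format_tool are hypothetical helper names, tool calls are modelled as plain dictionaries, and the two-space indent before each tool call is an assumption taken from the sample output in the usage example.

```python
from typing import Dict, List, Optional


def format_human(content: str) -> str:
    return f"Human: {content}"


def format_tool(content: str) -> str:
    return f"ToolOutput: {content}"


def format_ai(content: str, tool_calls: Optional[List[Dict]] = None) -> str:
    # The AI line comes first, followed by one line per tool call if any exist.
    lines = [f"AI: {content}"]
    if tool_calls:
        lines.append("Tools:")
        for call in tool_calls:
            lines.append(f"  {call['name']}: {call['args']}")
    return "\n".join(lines)


# Join the per-message strings with newlines to form the transcript.
transcript = "\n".join(
    [
        format_human("Book a table for 8pm"),
        format_ai(
            "Let me search.",
            tool_calls=[{"name": "restaurant_search", "args": {"cuisine": "Chinese"}}],
        ),
        format_tool("Found: Golden Dragon"),
    ]
)
```

The resulting transcript is a single string with one line per message and per tool call, which is the kind of text an evaluator prompt can embed directly.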

Inherited Methods (from BaseSample)

Method	Return Type	Description
`to_dict()`	`Dict`	Dictionary representation excluding None fields (via `model_dump(exclude_none=True)`)
`get_features()`	`List[str]`	List of non-None field names
`to_string()`	`str`	Formatted string representation of all non-None fields
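
To make the "excluding None fields" behaviour concrete, here is a small standard-library sketch. SampleSketch is a hypothetical dataclass, not the Ragas BaseSample, and deriving get_features from to_dict is an assumption that happens to match the descriptions in the table.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass
class SampleSketch:
    # Hypothetical stand-in for a sample; the real class is a Pydantic model.
    user_input: List[str]
    reference: Optional[str] = None
    rubrics: Optional[Dict[str, str]] = None

    def to_dict(self) -> Dict[str, Any]:
        # Keep only the fields that are set, mirroring exclude_none=True.
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        }

    def get_features(self) -> List[str]:
        # Feature names are the keys that survive the None filter.
        return list(self.to_dict())


sketch = SampleSketch(user_input=["Hello"], reference="A greeting")
```

Here rubrics is left unset, so it appears in neither the dictionary nor the feature list, and the remaining keys keep their declaration order.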

Usage Example

from ragas.dataset_schema import MultiTurnSample
from ragas.messages import HumanMessage, AIMessage, ToolCall, ToolMessage

# Create a multi-turn sample with reference data
sample = MultiTurnSample(
    user_input=[
        HumanMessage(content="Book a table at the best Chinese restaurant for 8pm"),
        AIMessage(
            content="Let me search for Chinese restaurants.",
            tool_calls=[
                ToolCall(name="restaurant_search", args={"cuisine": "Chinese", "time": "8pm"})
            ]
        ),
        ToolMessage(content="Found: Golden Dragon, Jade Palace"),
        AIMessage(
            content="I'll book Golden Dragon.",
            tool_calls=[
                ToolCall(name="restaurant_book", args={"name": "Golden Dragon", "time": "8pm"})
            ]
        ),
        ToolMessage(content="Table booked at Golden Dragon for 8pm."),
        AIMessage(content="Your table at Golden Dragon is booked for 8pm!")
    ],
    reference="A table is booked at a Chinese restaurant for 8:00pm.",
    reference_tool_calls=[
        ToolCall(name="restaurant_search", args={"cuisine": "Chinese", "time": "8pm"}),
        ToolCall(name="restaurant_book", args={"name": "Golden Dragon", "time": "8pm"})
    ],
    reference_topics=["Restaurant Booking", "Chinese Cuisine"]
)

# Access conversation as text
print(sample.pretty_repr())
# Output:
# Human: Book a table at the best Chinese restaurant for 8pm
# AI: Let me search for Chinese restaurants.
# Tools:
#   restaurant_search: {'cuisine': 'Chinese', 'time': '8pm'}
# ToolOutput: Found: Golden Dragon, Jade Palace
# ...

# Convert to dictionary (for serialization)
sample_dict = sample.to_dict()

# Get list of available features
features = sample.get_features()
# ['user_input', 'reference', 'reference_tool_calls', 'reference_topics']

Validation Example

from ragas.dataset_schema import MultiTurnSample
from ragas.messages import HumanMessage, AIMessage, ToolMessage

# This will raise a ValueError because ToolMessage appears
# before any AIMessage in the conversation
try:
    sample = MultiTurnSample(
        user_input=[
            HumanMessage(content="Hello"),
            ToolMessage(content="Some tool output"),  # Invalid: no preceding AIMessage
        ]
    )
except ValueError as e:  # Pydantic's ValidationError is a ValueError subclass
    print(e)  # Output includes "ToolMessage must be preceded by an AIMessage..."

# This will raise a ValueError because the AIMessage before
# the ToolMessage has no tool_calls
try:
    sample = MultiTurnSample(
        user_input=[
            HumanMessage(content="Hello"),
            AIMessage(content="Hi there!"),  # No tool_calls
            ToolMessage(content="Some tool output"),  # Invalid: preceding AI has no tool_calls
        ]
    )
except ValueError as e:  # Pydantic's ValidationError is a ValueError subclass
    print(e)  # Output includes "ToolMessage must follow an AIMessage where tools were called."

Integration with EvaluationDataset

MultiTurnSample instances are typically collected into an EvaluationDataset:

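from ragas.dataset_schema import EvaluationDataset

# sample1, sample2, and sample3 are placeholder names for MultiTurnSample
# instances built as in the usage example above
dataset = EvaluationDataset(samples=[sample1, sample2, sample3])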

# Check if dataset contains multi-turn samples
assert dataset.is_multi_turn()

# Convert to pandas DataFrame
df = dataset.to_pandas()

# Export to JSONL
dataset.to_jsonl("evaluation_data.jsonl")

Internal Dependencies

ragas.dataset_schema.BaseSample -- parent class providing common methods (to_dict, get_features, to_string)
ragas.messages.HumanMessage, ragas.messages.AIMessage, ragas.messages.ToolMessage -- typed message classes
ragas.messages.ToolCall -- tool call data type used in reference_tool_calls and within AIMessage
pydantic.BaseModel -- provides validation, serialization, and schema generation

Implements

Principle:Explodinggradients_Ragas_Multi_Turn_Evaluation_Schema
