Implementation:Explodinggradients Ragas MultiTurnSample Class

From Leeroopedia


MultiTurnSample Class

MultiTurnSample is a Pydantic model in the Ragas library that represents a single evaluation sample for multi-turn agent conversations. It holds the conversation messages along with optional reference data (expected outcomes, tool calls, topics, rubrics) used by evaluation metrics.

Import

from ragas.dataset_schema import MultiTurnSample

Class Definition

class MultiTurnSample(BaseSample):
    user_input: t.List[t.Union[HumanMessage, AIMessage, ToolMessage]]
    reference: t.Optional[str] = None
    reference_tool_calls: t.Optional[t.List[ToolCall]] = None
    rubrics: t.Optional[t.Dict[str, str]] = None
    reference_topics: t.Optional[t.List[str]] = None

MultiTurnSample inherits from BaseSample, which in turn inherits from pydantic.BaseModel.

Fields

| Field | Type | Default | Description |
|---|---|---|---|
| user_input | List[Union[HumanMessage, AIMessage, ToolMessage]] | (required) | The list of conversation messages representing the multi-turn interaction. Must pass structural validation. |
| reference | Optional[str] | None | The reference answer or expected outcome for the conversation. Used by AgentGoalAccuracyWithReference. |
| reference_tool_calls | Optional[List[ToolCall]] | None | A list of expected tool calls. Used by ToolCallAccuracy and ToolCallF1. |
| reference_topics | Optional[List[str]] | None | A list of allowed reference topics. Used by TopicAdherenceScore. |
| rubrics | Optional[Dict[str, str]] | None | Evaluation rubrics for rubric-based metrics. |

Validation: validate_user_input

The user_input field has a Pydantic field validator that enforces structural constraints on the message sequence:

@field_validator("user_input")
@classmethod
def validate_user_input(
    cls,
    messages: t.List[t.Union[HumanMessage, AIMessage, ToolMessage]],
) -> t.List[t.Union[HumanMessage, AIMessage, ToolMessage]]:

The validator enforces three rules:

  1. All messages must be typed: Every element must be an instance of HumanMessage, AIMessage, or ToolMessage. A ValueError is raised otherwise.
  2. ToolMessage must be preceded by an AIMessage: A ToolMessage cannot appear in the conversation before any AIMessage has appeared.
  3. ToolMessage must follow an AIMessage with tool_calls or another ToolMessage: A ToolMessage must immediately follow either:
    • An AIMessage that has a non-empty tool_calls field, OR
    • Another ToolMessage (allowing contiguous blocks of tool responses)

These rules ensure that tool responses are structurally valid -- they only appear after an AI message has invoked tools.
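
The three rules above can be sketched in plain Python. This is a minimal illustration that uses only the standard library: the dataclasses below are hypothetical stand-ins for the ragas.messages types, check_message_sequence is an invented name, and the error messages are placeholders rather than the ones Ragas raises. It is not the library's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


# Hypothetical stand-ins for the ragas.messages classes (not the real ones).
@dataclass
class ToolCall:
    name: str
    args: Dict[str, object] = field(default_factory=dict)


@dataclass
class HumanMessage:
    content: str


@dataclass
class AIMessage:
    content: str
    tool_calls: Optional[List[ToolCall]] = None


@dataclass
class ToolMessage:
    content: str


Message = Union[HumanMessage, AIMessage, ToolMessage]


def check_message_sequence(messages: List[Message]) -> List[Message]:
    """Apply the three structural rules and return the list unchanged."""
    # Rule 1: every element must be one of the three message types.
    if not all(isinstance(m, (HumanMessage, AIMessage, ToolMessage)) for m in messages):
        raise ValueError("all messages must be HumanMessage, AIMessage, or ToolMessage")

    seen_ai = False
    previous: Optional[Message] = None
    for message in messages:
        if isinstance(message, AIMessage):
            seen_ai = True
        elif isinstance(message, ToolMessage):
            # Rule 2: some AIMessage must already have appeared.
            if not seen_ai:
                raise ValueError("ToolMessage appears before any AIMessage")
            # Rule 3: the immediate predecessor must be a tool-calling
            # AIMessage or another ToolMessage.
            after_tool_call = isinstance(previous, AIMessage) and bool(previous.tool_calls)
            if not (after_tool_call or isinstance(previous, ToolMessage)):
                raise ValueError("ToolMessage must follow a tool-calling AIMessage or a ToolMessage")
        previous = message
    return messages


# A contiguous block of tool responses after a tool-calling AIMessage is accepted.
valid = [
    HumanMessage("Find a restaurant"),
    AIMessage("Searching.", [ToolCall("restaurant_search")]),
    ToolMessage("Golden Dragon"),
    ToolMessage("Jade Palace"),
    AIMessage("I found two options."),
]
check_message_sequence(valid)
```

Tracking both a "seen any AIMessage" flag and the immediately preceding message keeps rules 2 and 3 distinguishable, so each failure can report its own reason.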

Methods

to_messages

def to_messages(self) -> List[Dict]:

Converts the user_input message list to a list of plain dictionaries using each message's model_dump() method.

pretty_repr

def pretty_repr(self) -> str:

Returns a human-readable string representation of the conversation. Each message is formatted using its own pretty_repr() method:

  • HumanMessage: "Human: {content}"
  • AIMessage: "AI: {content}" followed by tool calls if present (formatted as "Tools:\n {name}: {args}")
  • ToolMessage: "ToolOutput: {content}"

Messages are joined with newlines. This representation is used by LLM-based metrics when passing conversations to evaluator LLMs.
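
The formatting described above can be illustrated with a small standard-library sketch. The classes are hypothetical stand-ins for the Ragas message types, conversation_repr is an invented helper name, and the two-space indent under "Tools:" is an assumption taken from the sample output in the usage example on this page.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


# Hypothetical stand-ins for the ragas.messages classes (not the real ones).
@dataclass
class ToolCall:
    name: str
    args: Dict[str, object]


@dataclass
class HumanMessage:
    content: str

    def pretty_repr(self) -> str:
        return f"Human: {self.content}"


@dataclass
class AIMessage:
    content: str
    tool_calls: Optional[List[ToolCall]] = None

    def pretty_repr(self) -> str:
        lines = [f"AI: {self.content}"]
        if self.tool_calls:
            # Tool calls are listed under a "Tools:" header, one per line.
            lines.append("Tools:")
            lines.extend(f"  {call.name}: {call.args}" for call in self.tool_calls)
        return "\n".join(lines)


@dataclass
class ToolMessage:
    content: str

    def pretty_repr(self) -> str:
        return f"ToolOutput: {self.content}"


def conversation_repr(messages) -> str:
    """Join each message's own representation with newlines."""
    return "\n".join(m.pretty_repr() for m in messages)


conversation = [
    HumanMessage("Book a table for 8pm"),
    AIMessage("Let me search.", [ToolCall("restaurant_search", {"cuisine": "Chinese"})]),
    ToolMessage("Found: Golden Dragon"),
]
print(conversation_repr(conversation))
```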

Inherited Methods (from BaseSample)

| Method | Return Type | Description |
|---|---|---|
| to_dict() | Dict | Dictionary representation excluding None fields (via model_dump(exclude_none=True)) |
| get_features() | List[str] | List of non-None field names |
| to_string() | str | Formatted string representation of all non-None fields |
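
To make the "excluding None fields" behavior concrete, here is a standard-library sketch. SampleSketch is a hypothetical stand-in with a simplified field set, and deriving get_features from to_dict is an assumption about how the two methods relate, not a statement about the Ragas source.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


# Hypothetical stand-in for a sample class (not the Ragas BaseSample).
@dataclass
class SampleSketch:
    user_input: List[str]
    reference: Optional[str] = None
    rubrics: Optional[Dict[str, str]] = None
    reference_topics: Optional[List[str]] = None

    def to_dict(self) -> Dict[str, Any]:
        # Drop every field whose value is None, keeping declaration order.
        return {key: value for key, value in asdict(self).items() if value is not None}

    def get_features(self) -> List[str]:
        # Assumption: features are the keys that survive to_dict().
        return list(self.to_dict().keys())


sample = SampleSketch(user_input=["Hello"], reference="A greeting is returned.")
print(sample.to_dict())
print(sample.get_features())
```

Fields left at None (here rubrics and reference_topics) do not appear in either result, which is why the feature list for a sample depends on which optional references were supplied.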

Usage Example

from ragas.dataset_schema import MultiTurnSample
from ragas.messages import HumanMessage, AIMessage, ToolCall, ToolMessage

# Create a multi-turn sample with reference data
sample = MultiTurnSample(
    user_input=[
        HumanMessage(content="Book a table at the best Chinese restaurant for 8pm"),
        AIMessage(
            content="Let me search for Chinese restaurants.",
            tool_calls=[
                ToolCall(name="restaurant_search", args={"cuisine": "Chinese", "time": "8pm"})
            ]
        ),
        ToolMessage(content="Found: Golden Dragon, Jade Palace"),
        AIMessage(
            content="I'll book Golden Dragon.",
            tool_calls=[
                ToolCall(name="restaurant_book", args={"name": "Golden Dragon", "time": "8pm"})
            ]
        ),
        ToolMessage(content="Table booked at Golden Dragon for 8pm."),
        AIMessage(content="Your table at Golden Dragon is booked for 8pm!")
    ],
    reference="A table is booked at a Chinese restaurant for 8:00pm.",
    reference_tool_calls=[
        ToolCall(name="restaurant_search", args={"cuisine": "Chinese", "time": "8pm"}),
        ToolCall(name="restaurant_book", args={"name": "Golden Dragon", "time": "8pm"})
    ],
    reference_topics=["Restaurant Booking", "Chinese Cuisine"]
)

# Access conversation as text
print(sample.pretty_repr())
# Output:
# Human: Book a table at the best Chinese restaurant for 8pm
# AI: Let me search for Chinese restaurants.
# Tools:
#   restaurant_search: {'cuisine': 'Chinese', 'time': '8pm'}
# ToolOutput: Found: Golden Dragon, Jade Palace
# ...

# Convert to dictionary (for serialization)
sample_dict = sample.to_dict()

# Get list of available features
features = sample.get_features()
# ['user_input', 'reference', 'reference_tool_calls', 'reference_topics']

Validation Example

from ragas.dataset_schema import MultiTurnSample
from ragas.messages import HumanMessage, AIMessage, ToolMessage

# This will raise a ValueError because ToolMessage appears
# before any AIMessage in the conversation
try:
    sample = MultiTurnSample(
        user_input=[
            HumanMessage(content="Hello"),
            ToolMessage(content="Some tool output"),  # Invalid: no preceding AIMessage
        ]
    )
except ValueError as e:  # Pydantic's ValidationError subclasses ValueError
    print(e)  # report includes "ToolMessage must be preceded by an AIMessage..."

# This will raise a ValueError because the AIMessage before
# the ToolMessage has no tool_calls
try:
    sample = MultiTurnSample(
        user_input=[
            HumanMessage(content="Hello"),
            AIMessage(content="Hi there!"),  # No tool_calls
            ToolMessage(content="Some tool output"),  # Invalid: preceding AI has no tool_calls
        ]
    )
except ValueError as e:  # Pydantic's ValidationError subclasses ValueError
    print(e)  # report includes "ToolMessage must follow an AIMessage where tools were called."

Integration with EvaluationDataset

MultiTurnSample instances are typically collected into an EvaluationDataset:

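from ragas.dataset_schema import EvaluationDataset

# sample1, sample2, and sample3 are MultiTurnSample instances,
# built as in the usage example above
dataset = EvaluationDataset(samples=[sample1, sample2, sample3])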

# Check if dataset contains multi-turn samples
assert dataset.is_multi_turn()

# Convert to pandas DataFrame
df = dataset.to_pandas()

# Export to JSONL
dataset.to_jsonl("evaluation_data.jsonl")

Internal Dependencies

  • ragas.dataset_schema.BaseSample -- parent class providing common methods (to_dict, get_features, to_string)
  • ragas.messages.HumanMessage, ragas.messages.AIMessage, ragas.messages.ToolMessage -- typed message classes
  • ragas.messages.ToolCall -- tool call data type used in reference_tool_calls and within AIMessage
  • pydantic.BaseModel -- provides validation, serialization, and schema generation
