Implementation:Vibrantlabsai Ragas InstanceRubrics

From Leeroopedia
Knowledge Sources
Domains Evaluation, Metrics
Last Updated 2026-02-12 00:00 GMT

Overview

InstanceRubrics evaluates LLM responses against scoring rubrics supplied with each evaluation sample, so the grading criteria can differ from one sample to the next.

Description

Unlike RubricsScore, which applies a single set of rubrics across all samples, InstanceRubrics expects each evaluation sample to carry its own rubrics dictionary. This allows different scoring criteria for different questions or interaction types within the same evaluation run.

The metric works by:

  1. Extracting the rubrics from the sample's data (the "rubrics" field), raising a ValueError if rubrics are not provided for a sample.
  2. Constructing a prompt input that includes the rubrics alongside the user input, response, reference, and optionally retrieved contexts. When retrieved contexts are present, they are concatenated and appended to the user input.
  3. Generating a score using an LLM judge via a PydanticPrompt that maps the input (with rubrics) to a ScoreFeedback output containing both a feedback string and an integer score.
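The first two steps can be sketched in plain Python. This is a minimal illustration, not the Ragas implementation: build_single_turn_input is a hypothetical helper, and the newline separator and "Context:" label used when appending contexts are assumptions, since this page does not state the exact format.

```python
from typing import Dict, List, Optional


def build_single_turn_input(
    rubrics: Optional[Dict[str, str]],
    user_input: Optional[str] = None,
    response: Optional[str] = None,
    reference: Optional[str] = None,
    retrieved_contexts: Optional[List[str]] = None,
) -> Dict[str, object]:
    """Hypothetical helper mirroring steps 1 and 2; not a Ragas function."""
    # Step 1: each sample must carry its own rubrics.
    if rubrics is None:
        raise ValueError("rubrics must be provided with every sample")

    # Step 2: concatenate retrieved contexts and append them to the user input.
    # The separator and the "Context:" label are illustrative assumptions.
    if retrieved_contexts:
        joined = "\n".join(retrieved_contexts)
        user_input = f"{user_input or ''}\nContext:\n{joined}"

    return {
        "user_input": user_input,
        "response": response,
        "reference": reference,
        "rubrics": rubrics,
    }


payload = build_single_turn_input(
    rubrics={"score1_description": "Wrong.", "score5_description": "Correct and clear."},
    user_input="What is the capital of France?",
    response="Paris.",
    retrieved_contexts=["Paris is the capital of France.", "France is in Europe."],
)
```

Step 3 would pass a payload like this to the LLM judge; that call is omitted here because it needs a configured model.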

The metric supports both single-turn and multi-turn evaluation:

  • Single-turn: Uses SingleTurnInputWithRubric, which extends SingleTurnInputWithoutRubric (the single-turn input model from the domain-specific rubrics module) by adding a required rubrics field.
  • Multi-turn: Uses MultiTurnInputWithRubric, which extends MultiTurnInputWithoutRubric in the same way. The full conversation is formatted using sample.pretty_repr() and passed together with the reference and rubrics.

The key distinction from RubricsScore is that rubrics are passed as part of the prompt input data rather than being embedded in the prompt instruction. This means the LLM sees different rubrics for each sample.
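To make the distinction concrete, the sketch below keeps the instruction fixed and varies only the input payload. The instruction text is the one quoted on this page; render_prompt and its JSON layout are hypothetical and do not reproduce how PydanticPrompt actually serializes its input.

```python
import json
from typing import Dict

# Instruction shared by every sample (text quoted from this page).
INSTRUCTION = (
    "Your task is to assign an appropriate score and provide feedback to the "
    "inputs based solely on the scoring criteria passed in the input."
)


def render_prompt(payload: Dict[str, object]) -> str:
    """Hypothetical renderer: fixed instruction followed by per-sample data."""
    body = json.dumps(payload, indent=2, sort_keys=True)
    return f"{INSTRUCTION}\n\nInput:\n{body}"


factual_prompt = render_prompt({
    "user_input": "When did Apollo 11 land on the Moon?",
    "response": "July 20, 1969.",
    "rubrics": {"score1_description": "Date is wrong.",
                "score5_description": "Names the correct date."},
})

creative_prompt = render_prompt({
    "user_input": "Write a two-line poem about rain.",
    "response": "Soft rain taps the glass, the street begins to shine.",
    "rubrics": {"score1_description": "Not a poem.",
                "score5_description": "Vivid imagery in two lines."},
})
```

Both prompts open with the same instruction, while the rubrics travel inside the input section and differ per sample.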

Usage

Use this metric when evaluation criteria vary across samples. For example, in a dataset where some questions require factual precision while others require creative writing quality, each sample can specify its own rubric. It is also useful when subject-matter experts define per-question grading criteria.

Code Reference

Source Location

Signature

class InstanceRubrics(MetricWithLLM, SingleTurnMetric, MultiTurnMetric):
    def __init__(
        self,
        name: str = "instance_rubrics",
        llm: t.Optional[BaseRagasLLM] = None,
        required_columns: t.Optional[t.Dict[MetricType, t.Set[str]]] = None,
        output_type: t.Optional[MetricOutputType] = MetricOutputType.DISCRETE,
        single_turn_prompt: t.Optional[PydanticPrompt] = None,
        multi_turn_prompt: t.Optional[PydanticPrompt] = None,
        max_retries: int = 1,
    ):

Import

from ragas.metrics import InstanceRubrics

I/O Contract

Inputs (Single-Turn)

Name | Type | Required | Description
rubrics | Dict[str, str] | Yes | The per-instance scoring rubric, mapping score keys to descriptions
user_input | str | No | The user's question or query
response | str | No | The LLM-generated response to evaluate
retrieved_contexts | List[str] | No | The retrieved contexts; when present, concatenated and appended to user_input
reference | str | No | The ground truth reference answer
reference_contexts | List[str] | No | The reference contexts for evaluation

Inputs (Multi-Turn)

Name | Type | Required | Description
rubrics | Dict[str, str] | Yes | The per-instance scoring rubric
user_input | str | No | The full multi-turn interaction (formatted via pretty_repr)
reference | str | Yes | The reference answer for evaluation (asserted not None)
Outputs

Name | Type | Description
score | int | A discrete integer score based on the instance-specific rubric criteria

Key Components

Input Models

Class | Parent | Description
SingleTurnInputWithRubric | SingleTurnInputWithoutRubric | Extends the single-turn input model from the domain-specific rubrics module by adding a required rubrics dictionary field
MultiTurnInputWithRubric | MultiTurnInputWithoutRubric | Extends the multi-turn input model by adding a required rubrics dictionary field

Prompt Classes

Class | Description
SingleTurnPrompt | PydanticPrompt mapping SingleTurnInputWithRubric to ScoreFeedback; the instruction directs the LLM to score based on the rubric passed in the input
MultiTurnPrompt | PydanticPrompt mapping MultiTurnInputWithRubric to ScoreFeedback; uses the same instruction pattern

Both prompt classes use the instruction: "Your task is to assign an appropriate score and provide feedback to the inputs based solely on the scoring criteria passed in the input." This distinguishes them from the domain-specific rubrics prompts where criteria are embedded in the instruction itself.

Reused Components

The module imports and extends models from _domain_specific_rubrics:

  • SingleTurnInputWithoutRubric: Base input model for single-turn evaluation
  • MultiTurnInputWithoutRubric: Base input model for multi-turn evaluation
  • ScoreFeedback: Output model containing feedback text and integer score
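As an illustration of the output side, the sketch below parses a judge reply into a feedback string and an integer score. The ScoreFeedback class here is a plain dataclass stand-in for the Pydantic model, the JSON reply is made up for the example, and parse_judge_output is a hypothetical helper rather than a Ragas function.

```python
import json
from dataclasses import dataclass


@dataclass
class ScoreFeedback:
    # Stand-in for the output model: feedback text plus an integer score.
    feedback: str
    score: int


def parse_judge_output(raw: str) -> ScoreFeedback:
    """Hypothetical parser for a JSON reply from the LLM judge."""
    data = json.loads(raw)
    score = data["score"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, int):
        raise ValueError("score must be an integer")
    return ScoreFeedback(feedback=str(data["feedback"]), score=score)


result = parse_judge_output('{"feedback": "Correct and mostly clear.", "score": 4}')
```

Only the integer score is returned by the metric; the feedback text stays with the judge output.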

Usage Examples

Basic Usage with Per-Instance Rubrics

from ragas.metrics import InstanceRubrics
from ragas.dataset_schema import SingleTurnSample

# The metric needs an LLM judge; assign one before scoring.
metric = InstanceRubrics()
# metric.llm = your_llm_instance

sample = SingleTurnSample(
    user_input="Explain quantum entanglement in simple terms.",
    response="Quantum entanglement is when two particles become linked and instantly affect each other regardless of distance.",
    rubrics={
        "score1_description": "Explanation is incorrect or incomprehensible.",
        "score2_description": "Explanation has major inaccuracies.",
        "score3_description": "Explanation is roughly correct but unclear.",
        "score4_description": "Explanation is correct and mostly clear.",
        "score5_description": "Explanation is correct, clear, and uses effective analogies.",
    }
)

# Inside an async function, once metric.llm is set:
# score = await metric.single_turn_ascore(sample)

Multi-Turn Evaluation

from ragas.metrics import InstanceRubrics
from ragas.dataset_schema import MultiTurnSample

metric = InstanceRubrics()
# metric.llm = your_llm_instance

# multi_turn_sample = MultiTurnSample(
#     ...,
#     reference="expected outcome",
#     rubrics={
#         "score1_description": "Agent failed to complete the task.",
#         "score2_description": "Agent partially completed the task with errors.",
#         "score3_description": "Agent completed the task but inefficiently.",
#         "score4_description": "Agent completed the task well.",
#         "score5_description": "Agent completed the task perfectly and efficiently.",
#     }
# )
# Inside an async function, once metric.llm is set:
# score = await metric.multi_turn_ascore(multi_turn_sample)
