Implementation:Vibrantlabsai Ragas InstanceRubrics

Knowledge Sources	Vibrantlabsai_Ragas
Domains	Evaluation, Metrics
Last Updated	2026-02-12 00:00 GMT

Overview

InstanceRubrics evaluates LLM responses against a scoring rubric supplied with each evaluation sample, so the grading criteria can differ from one sample to the next.

Description

Unlike RubricsScore, which applies a single set of rubrics across all samples, InstanceRubrics expects each evaluation sample to carry its own rubrics dictionary. This allows different scoring criteria for different questions or interaction types within the same evaluation run.

The metric works by:

Extracting the rubrics from the sample's data (the "rubrics" field), raising a ValueError if rubrics are not provided for a sample.
Constructing a prompt input that includes the rubrics alongside the user input, response, reference, and optionally retrieved contexts. When retrieved contexts are present, they are concatenated and appended to the user input.
Generating a score using an LLM judge via a PydanticPrompt that maps the input (with rubrics) to a ScoreFeedback output containing both a feedback string and an integer score.
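
The first two steps can be sketched in plain Python. This is a minimal illustration rather than the library's code: build_prompt_input is a hypothetical helper, and the separator used when appending contexts is an assumption, since the description only says the contexts are concatenated and appended to the user input.

```python
from typing import Dict, List, Optional


def build_prompt_input(
    user_input: Optional[str],
    response: Optional[str],
    rubrics: Optional[Dict[str, str]],
    reference: Optional[str] = None,
    retrieved_contexts: Optional[List[str]] = None,
) -> Dict[str, object]:
    """Hypothetical helper: validate rubrics and assemble the judge's input."""
    # Step 1: each sample must carry its own rubrics.
    if not rubrics:
        raise ValueError("rubrics must be provided for every sample")

    # Step 2: concatenate retrieved contexts and append them to the user input.
    # The "\n\nContext:\n" separator is an assumption made for this sketch.
    if retrieved_contexts:
        joined = "\n".join(retrieved_contexts)
        user_input = f"{user_input or ''}\n\nContext:\n{joined}"

    return {
        "user_input": user_input,
        "response": response,
        "reference": reference,
        "rubrics": rubrics,
    }
```

The returned dictionary stands in for the prompt input that the third step would hand to the LLM judge.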

The metric supports both single-turn and multi-turn evaluation:

Single-turn: Uses SingleTurnInputWithRubric, which extends SingleTurnInputWithoutRubric (the single-turn input model from the domain-specific rubrics module) by adding a required rubrics field.
Multi-turn: Uses MultiTurnInputWithRubric, which extends MultiTurnInputWithoutRubric in the same way. The full conversation is formatted using sample.pretty_repr() and passed together with the reference and rubrics.

The key distinction from RubricsScore is that rubrics are passed as part of the prompt input data rather than being embedded in the prompt instruction. This means the LLM sees different rubrics for each sample.
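
To make the contrast concrete, the sketch below renders prompts both ways. The function names and the prompt layout are hypothetical and do not reflect how PydanticPrompt serializes its input; only the idea (a fixed instruction, with rubrics travelling inside the input) comes from the description above.

```python
import json
from typing import Dict

# Instruction text as quoted in this page's "Prompt Classes" section.
INSTRUCTION = (
    "Your task is to assign an appropriate score and provide feedback to the "
    "inputs based solely on the scoring criteria passed in the input."
)


def render_with_embedded_rubrics(rubrics: Dict[str, str], sample: Dict[str, str]) -> str:
    """RubricsScore-style sketch: the criteria are baked into the instruction."""
    criteria = "\n".join(f"{key}: {text}" for key, text in rubrics.items())
    return f"Score the input using these criteria:\n{criteria}\n\nInput:\n{json.dumps(sample)}"


def render_with_instance_rubrics(sample: Dict[str, object]) -> str:
    """InstanceRubrics-style sketch: fixed instruction, rubrics inside the input."""
    return f"{INSTRUCTION}\n\nInput:\n{json.dumps(sample)}"


factual = {"response": "Paris.", "rubrics": {"score1_description": "Factually wrong."}}
creative = {"response": "A poem.", "rubrics": {"score1_description": "Flat and unoriginal."}}

# Same instruction for both prompts; only the input payload differs.
prompt_a = render_with_instance_rubrics(factual)
prompt_b = render_with_instance_rubrics(creative)
```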

Usage

Use this metric when evaluation criteria vary across samples. For example, in a dataset where some questions require factual precision while others require creative writing quality, each sample can specify its own rubric. It is also useful when subject-matter experts define per-question grading criteria.
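
One way to prepare such a mixed dataset is to keep a rubric per question type and attach it to each row before building samples. The sketch below uses plain dictionaries; the RUBRICS_BY_KIND table, the kind field, and attach_rubrics are hypothetical names chosen for illustration.

```python
from typing import Dict, List

# Hypothetical rubric templates, one per question type.
RUBRICS_BY_KIND: Dict[str, Dict[str, str]] = {
    "factual": {
        "score1_description": "Answer contains factual errors.",
        "score5_description": "Answer is fully accurate and precise.",
    },
    "creative": {
        "score1_description": "Writing is flat and unoriginal.",
        "score5_description": "Writing is vivid, original, and engaging.",
    },
}


def attach_rubrics(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Return copies of the rows with a per-row 'rubrics' dictionary added."""
    enriched = []
    for row in rows:
        # Drop the bookkeeping 'kind' key and add the matching rubric.
        fields = {key: value for key, value in row.items() if key != "kind"}
        enriched.append({**fields, "rubrics": RUBRICS_BY_KIND[row["kind"]]})
    return enriched


rows = [
    {"kind": "factual", "user_input": "When did Apollo 11 land?", "response": "July 1969."},
    {"kind": "creative", "user_input": "Write a haiku about rain.", "response": "Soft rain on the roof..."},
]
samples = attach_rubrics(rows)
```

Each resulting dictionary carries user_input, response, and rubrics, the same fields that the single-turn usage example on this page passes to SingleTurnSample.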

Code Reference

Source Location

Repository: Vibrantlabsai_Ragas
File: src/ragas/metrics/_instance_specific_rubrics.py

Signature

class InstanceRubrics(MetricWithLLM, SingleTurnMetric, MultiTurnMetric):
    def __init__(
        self,
        name: str = "instance_rubrics",
        llm: t.Optional[BaseRagasLLM] = None,
        required_columns: t.Optional[t.Dict[MetricType, t.Set[str]]] = None,
        output_type: t.Optional[MetricOutputType] = MetricOutputType.DISCRETE,
        single_turn_prompt: t.Optional[PydanticPrompt] = None,
        multi_turn_prompt: t.Optional[PydanticPrompt] = None,
        max_retries: int = 1,
    ):

Import

from ragas.metrics import InstanceRubrics

I/O Contract

Inputs (Single-Turn)

Name	Type	Required	Description
rubrics	Dict[str, str]	Yes	The per-instance scoring rubric mapping score keys to descriptions
user_input	str	No (optional)	The user's question or query
response	str	No (optional)	The LLM-generated response to evaluate
retrieved_contexts	List[str]	No (optional)	The retrieved contexts; when present, concatenated and appended to user_input
reference	str	No (optional)	The ground truth reference answer
reference_contexts	List[str]	No (optional)	The reference contexts for evaluation

Inputs (Multi-Turn)

Name	Type	Required	Description
rubrics	Dict[str, str]	Yes	The per-instance scoring rubric
user_input	str	No (optional)	The full multi-turn interaction (formatted via pretty_repr)
reference	str	Yes	The reference answer for evaluation (asserted not None)

Outputs

Name	Type	Description
score	int	A discrete integer score based on the instance-specific rubric criteria
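
The judge's structured output carries both feedback and a score, while the metric reports the integer. A minimal stand-in for that shape is sketched below, assuming the judge returns JSON with feedback and score keys; the dataclass and parse_judge_output are illustrative and are not the library's ScoreFeedback model or its parsing logic.

```python
import json
from dataclasses import dataclass


@dataclass
class ScoreFeedbackSketch:
    """Stand-in for the ScoreFeedback output model described on this page."""

    feedback: str
    score: int


def parse_judge_output(raw: str) -> ScoreFeedbackSketch:
    """Parse a JSON string into the stand-in model, coercing score to int."""
    data = json.loads(raw)
    return ScoreFeedbackSketch(feedback=str(data["feedback"]), score=int(data["score"]))


result = parse_judge_output('{"feedback": "Correct and mostly clear.", "score": 4}')
metric_value = result.score  # the integer score is what the metric reports
```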

Key Components

Input Models

Class	Parent	Description
SingleTurnInputWithRubric	SingleTurnInputWithoutRubric	Extends the domain-specific rubrics input model by adding a required rubrics dictionary field
MultiTurnInputWithRubric	MultiTurnInputWithoutRubric	Extends the multi-turn input model by adding a required rubrics dictionary field
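
The inheritance pattern in the table can be illustrated without the library. The sketch below uses TypedDict stand-ins rather than the actual Pydantic models; the optional field list is taken from the single-turn I/O contract above, and the Sketch-suffixed class names and missing_required helper are hypothetical.

```python
from typing import Dict, List, TypedDict


class SingleTurnInputWithoutRubricSketch(TypedDict, total=False):
    """Stand-in base model: every field is optional."""

    user_input: str
    response: str
    retrieved_contexts: List[str]
    reference_contexts: List[str]
    reference: str


class SingleTurnInputWithRubricSketch(SingleTurnInputWithoutRubricSketch):
    """Stand-in child model: adds the one required field."""

    rubrics: Dict[str, str]


def missing_required(payload: Dict[str, object]) -> List[str]:
    """List required keys that are absent from a candidate input payload."""
    required = SingleTurnInputWithRubricSketch.__required_keys__
    return sorted(key for key in required if key not in payload)
```

With this layout, a payload that omits rubrics is the only one flagged as incomplete, which mirrors the "required rubrics field" described in the table.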

Prompt Classes

Class	Description
SingleTurnPrompt	PydanticPrompt mapping SingleTurnInputWithRubric to ScoreFeedback; instruction directs the LLM to score based on the rubric passed in the input
MultiTurnPrompt	PydanticPrompt mapping MultiTurnInputWithRubric to ScoreFeedback; uses the same instruction pattern

Both prompt classes use the instruction: "Your task is to assign an appropriate score and provide feedback to the inputs based solely on the scoring criteria passed in the input." This distinguishes them from the domain-specific rubrics prompts where criteria are embedded in the instruction itself.

Reused Components

The module imports and extends models from _domain_specific_rubrics:

SingleTurnInputWithoutRubric: Base input model for single-turn evaluation
MultiTurnInputWithoutRubric: Base input model for multi-turn evaluation
ScoreFeedback: Output model containing feedback text and integer score

Usage Examples

Basic Usage with Per-Instance Rubrics

from ragas.metrics import InstanceRubrics
from ragas.dataset_schema import SingleTurnSample

metric = InstanceRubrics()
# metric.llm = your_llm_instance

sample = SingleTurnSample(
    user_input="Explain quantum entanglement in simple terms.",
    response="Quantum entanglement is when two particles become linked and instantly affect each other regardless of distance.",
    rubrics={
        "score1_description": "Explanation is incorrect or incomprehensible.",
        "score2_description": "Explanation has major inaccuracies.",
        "score3_description": "Explanation is roughly correct but unclear.",
        "score4_description": "Explanation is correct and mostly clear.",
        "score5_description": "Explanation is correct, clear, and uses effective analogies.",
    }
)

# Scoring is asynchronous and needs an LLM judge; once metric.llm is set, call from async code:
# score = await metric.single_turn_ascore(sample)

Multi-Turn Evaluation

from ragas.metrics import InstanceRubrics
from ragas.dataset_schema import MultiTurnSample

metric = InstanceRubrics()
# metric.llm = your_llm_instance

# multi_turn_sample = MultiTurnSample(
#     ...,
#     reference="expected outcome",
#     rubrics={
#         "score1_description": "Agent failed to complete the task.",
#         "score2_description": "Agent partially completed the task with errors.",
#         "score3_description": "Agent completed the task but inefficiently.",
#         "score4_description": "Agent completed the task well.",
#         "score5_description": "Agent completed the task perfectly and efficiently.",
#     }
# )
# score = await metric.multi_turn_ascore(multi_turn_sample)

Related Pages

Environment:Vibrantlabsai_Ragas_Python_3_9_Core_Environment
