
Implementation:Vibrantlabsai Ragas RubricsScore

From Leeroopedia
Knowledge Sources
Domains Evaluation, Metrics
Last Updated 2026-02-12 00:00 GMT

Overview

RubricsScore evaluates LLM responses against domain-specific scoring rubrics using an LLM judge, supporting both single-turn and multi-turn conversations with configurable 1-to-5 scoring criteria.

Description

This metric implements rubric-based evaluation where an LLM acts as a judge, scoring responses on a 1-to-5 scale according to predefined scoring criteria. The rubrics are embedded directly into the LLM prompt instruction so the judge is aware of the scoring criteria before evaluating each sample.

The module provides two sets of default rubrics:

DEFAULT_REFERENCE_FREE_RUBRICS: Evaluates responses based solely on accuracy, clarity, and thoroughness relative to the user input, without a reference answer. Scores range from 1 (entirely incorrect) to 5 (completely accurate, clear, and thorough).

DEFAULT_WITH_REFERENCE_RUBRICS: Evaluates responses based on alignment with a provided reference answer. Scores range from 1 (entirely incorrect or irrelevant relative to reference) to 5 (fully accurate and completely aligned with reference).

The metric supports both single-turn and multi-turn evaluation. For single-turn evaluation, the prompt receives user input, response, retrieved contexts, reference, and reference contexts (all optional). For multi-turn evaluation, the full interaction is formatted using sample.pretty_repr() and passed as the user input.
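The flattening step for multi-turn samples can be sketched as follows. This is a hypothetical illustration of the idea behind sample.pretty_repr(), not the library's actual implementation: each turn is rendered as a "role: content" line so the judge sees the whole exchange as one string.

```python
def flatten_turns(turns):
    # Render each (role, content) turn as one "role: content" line,
    # mirroring the role that sample.pretty_repr() plays above.
    return "\n".join(f"{role}: {content}" for role, content in turns)

conversation = [
    ("user", "What is the capital of France?"),
    ("ai", "The capital of France is Paris."),
    ("user", "And roughly how many people live there?"),
]

flattened = flatten_turns(conversation)
# The flattened string is then passed as the user_input field
# of the multi-turn scoring prompt.
```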

Custom rubrics can be supplied as a dictionary mapping score descriptions (e.g., "score1_description" through "score5_description") to their criteria text.
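A custom rubric is just such a dictionary. The example below is a hypothetical rubric for judging code-review comments; the key names follow the "score1_description" through "score5_description" convention described above:

```python
# Hypothetical custom rubric: keys follow the "score{n}_description"
# convention; values are the criteria text the judge applies.
code_review_rubrics = {
    "score1_description": "The comment is irrelevant or misleading.",
    "score2_description": "The comment is loosely related but offers no actionable advice.",
    "score3_description": "The comment identifies a real issue but the suggested fix is incomplete.",
    "score4_description": "The comment identifies the issue and suggests a workable fix.",
    "score5_description": "The comment precisely identifies the issue and gives a clear, correct fix.",
}
```

This dictionary would be passed as the rubrics argument of RubricsScore in place of the defaults.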

Usage

Use this metric when you need flexible, criteria-driven evaluation of LLM responses. It is ideal for domain-specific evaluations where standard metrics do not capture the nuances of response quality. Provide custom rubrics tailored to your use case, or use the built-in defaults for general-purpose assessment.

Code Reference

Source Location

Signature

class RubricsScore(MetricWithLLM, SingleTurnMetric, MultiTurnMetric):
    def __init__(
        self,
        name: str = "domain_specific_rubrics",
        rubrics: t.Dict[str, str] = DEFAULT_REFERENCE_FREE_RUBRICS,
        llm: t.Optional[BaseRagasLLM] = None,
        required_columns: t.Optional[t.Dict[MetricType, t.Set[str]]] = None,
        output_type: t.Optional[MetricOutputType] = MetricOutputType.DISCRETE,
        single_turn_prompt: t.Optional[PydanticPrompt] = None,
        multi_turn_prompt: t.Optional[PydanticPrompt] = None,
        max_retries: int = 1,
    ):

Import

from ragas.metrics import RubricsScore

I/O Contract

Inputs (Single-Turn)

Name Type Required Description
user_input str No The user's question or query
response str No The LLM-generated response to evaluate
retrieved_contexts List[str] No The retrieved contexts from the RAG pipeline
reference str No The ground truth reference answer
reference_contexts List[str] No The reference contexts for evaluation

Inputs (Multi-Turn)

Name Type Required Description
user_input str No The full multi-turn interaction (formatted via pretty_repr)
reference str No The reference answer for evaluation

Configuration

Name Type Default Description
rubrics Dict[str, str] DEFAULT_REFERENCE_FREE_RUBRICS Scoring criteria mapping score keys to descriptions
output_type MetricOutputType DISCRETE The output type of the metric (discrete integer scores)
max_retries int 1 Maximum number of retries for LLM generation

Outputs

Name Type Description
score int A discrete integer score (typically 1-5) based on the rubric criteria

Key Components

Prompt Models

Class Description
ScoreFeedback Pydantic model with feedback (str) and score (int) fields for LLM output
SingleTurnInputWithoutRubric Input model for single-turn evaluation with optional user_input, response, retrieved_contexts, reference_contexts, and reference fields
MultiTurnInputWithoutRubric Input model for multi-turn evaluation with optional user_input and reference fields
SingleTurnPrompt PydanticPrompt mapping SingleTurnInputWithoutRubric to ScoreFeedback
MultiTurnPrompt PydanticPrompt mapping MultiTurnInputWithoutRubric to ScoreFeedback

Rubric Injection

During initialization, the rubrics dictionary is formatted as a newline-separated list of "key: value" pairs and appended to the prompt instruction of both the single-turn and multi-turn scoring prompts. This ensures the LLM judge always has access to the scoring criteria.
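The injection step can be sketched as below. This is a simplified illustration of the mechanism described above, not the library's exact code; the function name inject_rubrics is hypothetical:

```python
def inject_rubrics(instruction: str, rubrics: dict) -> str:
    # Format each rubric entry as a "key: value" line and append the
    # block to the base prompt instruction, so the judge always sees
    # the scoring criteria alongside its task description.
    rubric_text = "\n".join(f"{key}: {value}" for key, value in rubrics.items())
    return f"{instruction}\n{rubric_text}"

base_instruction = "Score the response using the rubric below."
prompt = inject_rubrics(
    base_instruction,
    {"score1_description": "Entirely incorrect.",
     "score5_description": "Completely accurate, clear, and thorough."},
)
```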

Usage Examples

Basic Usage with Default Rubrics

from ragas.metrics import RubricsScore
from ragas.dataset_schema import SingleTurnSample

metric = RubricsScore()
# metric.llm = your_llm_instance

sample = SingleTurnSample(
    user_input="What is photosynthesis?",
    response="Photosynthesis is the process by which plants convert light energy into chemical energy."
)

# score = await metric.single_turn_ascore(sample)
# Returns an integer score from 1-5

Custom Rubrics with Reference

from ragas.metrics import RubricsScore
from ragas.metrics._domain_specific_rubrics import DEFAULT_WITH_REFERENCE_RUBRICS

metric = RubricsScore(
    name="reference_rubric_eval",
    rubrics=DEFAULT_WITH_REFERENCE_RUBRICS
)
# metric.llm = your_llm_instance

Multi-Turn Evaluation

from ragas.metrics import RubricsScore
from ragas.dataset_schema import MultiTurnSample

metric = RubricsScore()
# metric.llm = your_llm_instance

# multi_turn_sample = MultiTurnSample(...)
# score = await metric.multi_turn_ascore(multi_turn_sample)
