
Implementation:Vibrantlabsai Ragas RubricsScore

From Leeroopedia
Knowledge Sources
Domains Evaluation, Metrics
Last Updated 2026-02-12 00:00 GMT

Overview

RubricsScore evaluates LLM responses against domain-specific scoring rubrics using an LLM judge, supporting both single-turn and multi-turn conversations with configurable 1-to-5 scoring criteria.

Description

This metric implements rubric-based evaluation where an LLM acts as a judge, scoring responses on a 1-to-5 scale according to predefined scoring criteria. The rubrics are embedded directly into the LLM prompt instruction so the judge is aware of the scoring criteria before evaluating each sample.

The module provides two sets of default rubrics:

DEFAULT_REFERENCE_FREE_RUBRICS: Evaluates responses based solely on accuracy, clarity, and thoroughness relative to the user input, without a reference answer. Scores range from 1 (entirely incorrect) to 5 (completely accurate, clear, and thorough).

DEFAULT_WITH_REFERENCE_RUBRICS: Evaluates responses based on alignment with a provided reference answer. Scores range from 1 (entirely incorrect or irrelevant relative to reference) to 5 (fully accurate and completely aligned with reference).

The metric supports both single-turn and multi-turn evaluation. For single-turn evaluation, the prompt receives user input, response, retrieved contexts, reference, and reference contexts (all optional). For multi-turn evaluation, the full interaction is formatted using sample.pretty_repr() and passed as the user input.
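The flattening step for multi-turn samples can be sketched as follows. This is a hypothetical illustration of the idea behind sample.pretty_repr(), not the library's actual implementation: each turn is rendered as a "role: content" line so the judge sees the whole exchange as one string.

```python
def flatten_turns(turns):
    # Render each (role, content) turn as one "role: content" line,
    # mirroring the role that sample.pretty_repr() plays above.
    return "\n".join(f"{role}: {content}" for role, content in turns)

conversation = [
    ("user", "What is the capital of France?"),
    ("ai", "The capital of France is Paris."),
    ("user", "And roughly how many people live there?"),
]

flattened = flatten_turns(conversation)
# The flattened string is then passed as the user_input field
# of the multi-turn scoring prompt.
```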

Custom rubrics can be supplied as a dictionary mapping score descriptions (e.g., "score1_description" through "score5_description") to their criteria text.
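A custom rubric is just such a dictionary. The example below is a hypothetical rubric for judging code-review comments; the key names follow the "score1_description" through "score5_description" convention described above:

```python
# Hypothetical custom rubric: keys follow the "score{n}_description"
# convention; values are the criteria text the judge applies.
code_review_rubrics = {
    "score1_description": "The comment is irrelevant or misleading.",
    "score2_description": "The comment is loosely related but offers no actionable advice.",
    "score3_description": "The comment identifies a real issue but the suggested fix is incomplete.",
    "score4_description": "The comment identifies the issue and suggests a workable fix.",
    "score5_description": "The comment precisely identifies the issue and gives a clear, correct fix.",
}
```

This dictionary would be passed as the rubrics argument of RubricsScore in place of the defaults.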

Usage

Use this metric when you need flexible, criteria-driven evaluation of LLM responses. It is ideal for domain-specific evaluations where standard metrics do not capture the nuances of response quality. Provide custom rubrics tailored to your use case, or use the built-in defaults for general-purpose assessment.

Code Reference

Source Location

Signature

class RubricsScore(MetricWithLLM, SingleTurnMetric, MultiTurnMetric):
    def __init__(
        self,
        name: str = "domain_specific_rubrics",
        rubrics: t.Dict[str, str] = DEFAULT_REFERENCE_FREE_RUBRICS,
        llm: t.Optional[BaseRagasLLM] = None,
        required_columns: t.Optional[t.Dict[MetricType, t.Set[str]]] = None,
        output_type: t.Optional[MetricOutputType] = MetricOutputType.DISCRETE,
        single_turn_prompt: t.Optional[PydanticPrompt] = None,
        multi_turn_prompt: t.Optional[PydanticPrompt] = None,
        max_retries: int = 1,
    ):

Import

from ragas.metrics import RubricsScore

I/O Contract

Inputs (Single-Turn)

Name Type Required Description
user_input str No The user's question or query
response str No The LLM-generated response to evaluate
retrieved_contexts List[str] No The retrieved contexts from the RAG pipeline
reference str No The ground truth reference answer
reference_contexts List[str] No The reference contexts for evaluation

Inputs (Multi-Turn)

Name Type Required Description
user_input str No The full multi-turn interaction (formatted via pretty_repr)
reference str No The reference answer for evaluation

Configuration

Name Type Default Description
rubrics Dict[str, str] DEFAULT_REFERENCE_FREE_RUBRICS Scoring criteria mapping score keys to descriptions
output_type MetricOutputType DISCRETE The output type of the metric (discrete integer scores)
max_retries int 1 Maximum number of retries for LLM generation

Outputs

Name Type Description
score int A discrete integer score (typically 1-5) based on the rubric criteria

Key Components

Prompt Models

Class Description
ScoreFeedback Pydantic model with feedback (str) and score (int) fields for LLM output
SingleTurnInputWithoutRubric Input model for single-turn evaluation with optional user_input, response, retrieved_contexts, reference_contexts, and reference fields
MultiTurnInputWithoutRubric Input model for multi-turn evaluation with optional user_input and reference fields
SingleTurnPrompt PydanticPrompt mapping SingleTurnInputWithoutRubric to ScoreFeedback
MultiTurnPrompt PydanticPrompt mapping MultiTurnInputWithoutRubric to ScoreFeedback

Rubric Injection

During initialization, the rubrics dictionary is formatted as a newline-separated list of "key: value" pairs and appended to the prompt instruction of both the single-turn and multi-turn scoring prompts. This ensures the LLM judge always has access to the scoring criteria.
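The injection step can be sketched as below. This is a simplified illustration of the mechanism described above, not the library's exact code; the function name inject_rubrics is hypothetical:

```python
def inject_rubrics(instruction: str, rubrics: dict) -> str:
    # Format each rubric entry as a "key: value" line and append the
    # block to the base prompt instruction, so the judge always sees
    # the scoring criteria alongside its task description.
    rubric_text = "\n".join(f"{key}: {value}" for key, value in rubrics.items())
    return f"{instruction}\n{rubric_text}"

base_instruction = "Score the response using the rubric below."
prompt = inject_rubrics(
    base_instruction,
    {"score1_description": "Entirely incorrect.",
     "score5_description": "Completely accurate, clear, and thorough."},
)
```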

Usage Examples

Basic Usage with Default Rubrics

from ragas.metrics import RubricsScore
from ragas.dataset_schema import SingleTurnSample

metric = RubricsScore()
# metric.llm = your_llm_instance

sample = SingleTurnSample(
    user_input="What is photosynthesis?",
    response="Photosynthesis is the process by which plants convert light energy into chemical energy."
)

# score = await metric.single_turn_ascore(sample)
# Returns an integer score from 1-5

Custom Rubrics with Reference

from ragas.metrics import RubricsScore
from ragas.metrics._domain_specific_rubrics import DEFAULT_WITH_REFERENCE_RUBRICS

metric = RubricsScore(
    name="reference_rubric_eval",
    rubrics=DEFAULT_WITH_REFERENCE_RUBRICS
)
# metric.llm = your_llm_instance

Multi-Turn Evaluation

from ragas.metrics import RubricsScore
from ragas.dataset_schema import MultiTurnSample

metric = RubricsScore()
# metric.llm = your_llm_instance

# multi_turn_sample = MultiTurnSample(...)
# score = await metric.multi_turn_ascore(multi_turn_sample)
