Implementation:Vibrantlabsai Ragas RubricsScore
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Metrics |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
RubricsScore evaluates LLM responses against domain-specific scoring rubrics using an LLM judge, supporting both single-turn and multi-turn conversations with configurable 1-to-5 scoring criteria.
Description
This metric implements rubric-based evaluation where an LLM acts as a judge, scoring responses on a 1-to-5 scale according to predefined scoring criteria. The rubrics are embedded directly into the LLM prompt instruction so the judge is aware of the scoring criteria before evaluating each sample.
The module provides two sets of default rubrics:
- DEFAULT_REFERENCE_FREE_RUBRICS: evaluates responses based solely on accuracy, clarity, and thoroughness relative to the user input, without a reference answer. Scores range from 1 (entirely incorrect) to 5 (completely accurate, clear, and thorough).
- DEFAULT_WITH_REFERENCE_RUBRICS: evaluates responses based on alignment with a provided reference answer. Scores range from 1 (entirely incorrect or irrelevant relative to the reference) to 5 (fully accurate and completely aligned with the reference).
The metric supports both single-turn and multi-turn evaluation. For single-turn evaluation, the prompt receives user input, response, retrieved contexts, reference, and reference contexts (all optional). For multi-turn evaluation, the full interaction is formatted using sample.pretty_repr() and passed as the user input.
Custom rubrics can be supplied as a dictionary mapping score keys (e.g., "score1_description" through "score5_description") to their criteria text.
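For illustration, a hypothetical domain-specific rubric following that key convention might look like this (the descriptions are illustrative, not shipped defaults):
from ragas.metrics import RubricsScore
# Hypothetical rubric; key names follow the "scoreN_description" convention described above
helpfulness_rubrics = {
    "score1_description": "The response does not address the user's question at all.",
    "score2_description": "The response addresses the question only partially and omits key details.",
    "score3_description": "The response is mostly correct but lacks depth or clarity.",
    "score4_description": "The response is accurate and clear, with only minor omissions.",
    "score5_description": "The response is accurate, clear, and thorough, fully answering the question.",
}
metric = RubricsScore(name="helpfulness_rubric", rubrics=helpfulness_rubrics)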
Usage
Use this metric when you need flexible, criteria-driven evaluation of LLM responses. It is ideal for domain-specific evaluations where standard metrics do not capture the nuances of response quality. Provide custom rubrics tailored to your use case, or use the built-in defaults for general-purpose assessment.
Code Reference
Source Location
- Repository: Vibrantlabsai_Ragas
- File: src/ragas/metrics/_domain_specific_rubrics.py
Signature
class RubricsScore(MetricWithLLM, SingleTurnMetric, MultiTurnMetric):
    def __init__(
        self,
        name: str = "domain_specific_rubrics",
        rubrics: t.Dict[str, str] = DEFAULT_REFERENCE_FREE_RUBRICS,
        llm: t.Optional[BaseRagasLLM] = None,
        required_columns: t.Optional[t.Dict[MetricType, t.Set[str]]] = None,
        output_type: t.Optional[MetricOutputType] = MetricOutputType.DISCRETE,
        single_turn_prompt: t.Optional[PydanticPrompt] = None,
        multi_turn_prompt: t.Optional[PydanticPrompt] = None,
        max_retries: int = 1,
    ):
Import
from ragas.metrics import RubricsScore
I/O Contract
Inputs (Single-Turn)
| Name | Type | Required | Description |
|---|---|---|---|
| user_input | str | No (optional) | The user's question or query |
| response | str | No (optional) | The LLM-generated response to evaluate |
| retrieved_contexts | List[str] | No (optional) | The retrieved contexts from the RAG pipeline |
| reference | str | No (optional) | The ground truth reference answer |
| reference_contexts | List[str] | No (optional) | The reference contexts for evaluation |
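As an illustrative sketch, a single-turn sample populating several of these optional fields might look like the following (field values are made up for the example):
from ragas.dataset_schema import SingleTurnSample
# Illustrative sample exercising several of the optional single-turn fields
sample = SingleTurnSample(
    user_input="What is the boiling point of water at sea level?",
    response="Water boils at 100 degrees Celsius at sea level.",
    retrieved_contexts=["At standard atmospheric pressure, water boils at 100 °C."],
    reference="Water boils at 100 °C (212 °F) at sea level.",
)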
Inputs (Multi-Turn)
| Name | Type | Required | Description |
|---|---|---|---|
| user_input | str | No (optional) | The full multi-turn interaction (formatted via pretty_repr) |
| reference | str | No (optional) | The reference answer for evaluation |
Configuration
| Name | Type | Default | Description |
|---|---|---|---|
| rubrics | Dict[str, str] | DEFAULT_REFERENCE_FREE_RUBRICS | Scoring criteria mapping score keys to descriptions |
| output_type | MetricOutputType | DISCRETE | The output type of the metric (discrete integer scores) |
| max_retries | int | 1 | Maximum number of retries for LLM generation |
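For instance, these options can be overridden at construction time (a minimal sketch; the chosen values are illustrative):
from ragas.metrics import RubricsScore
from ragas.metrics._domain_specific_rubrics import DEFAULT_WITH_REFERENCE_RUBRICS
# Illustrative configuration: reference-based rubrics and extra generation retries
metric = RubricsScore(
    rubrics=DEFAULT_WITH_REFERENCE_RUBRICS,
    max_retries=3,
)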
Outputs
| Name | Type | Description |
|---|---|---|
| score | int | A discrete integer score (typically 1-5) based on the rubric criteria |
Key Components
Prompt Models
| Class | Description |
|---|---|
| ScoreFeedback | Pydantic model with feedback (str) and score (int) fields for LLM output |
| SingleTurnInputWithoutRubric | Input model for single-turn evaluation with optional user_input, response, retrieved_contexts, reference_contexts, and reference fields |
| MultiTurnInputWithoutRubric | Input model for multi-turn evaluation with optional user_input and reference fields |
| SingleTurnPrompt | PydanticPrompt mapping SingleTurnInputWithoutRubric to ScoreFeedback |
| MultiTurnPrompt | PydanticPrompt mapping MultiTurnInputWithoutRubric to ScoreFeedback |
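As a rough sketch (not the library's exact definition), the judge's structured output described in the table pairs a textual justification with the integer score:
from pydantic import BaseModel
# Minimal sketch of the ScoreFeedback output model described above (illustrative, not the exact source)
class ScoreFeedbackSketch(BaseModel):
    feedback: str  # the judge's textual justification
    score: int     # discrete rubric score, typically 1-5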
Rubric Injection
During initialization, the rubrics dictionary is formatted as a newline-separated list of "key: value" pairs and appended to the prompt instruction of both the single-turn and multi-turn scoring prompts. This ensures the LLM judge always has access to the scoring criteria.
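The formatting step amounts to something like the following sketch (illustrative, not the library's exact code):
# Sketch of the rubric-injection step described above
rubrics = {
    "score1_description": "The response is entirely incorrect.",
    "score5_description": "The response is completely accurate, clear, and thorough.",
}
rubrics_text = "\n".join(f"{key}: {value}" for key, value in rubrics.items())
# The resulting block is appended to the instruction of both scoring prompts, e.g.:
# single_turn_prompt.instruction += "\n" + rubrics_text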
Usage Examples
Basic Usage with Default Rubrics
from ragas.metrics import RubricsScore
from ragas.dataset_schema import SingleTurnSample
metric = RubricsScore()
# metric.llm = your_llm_instance
sample = SingleTurnSample(
user_input="What is photosynthesis?",
response="Photosynthesis is the process by which plants convert light energy into chemical energy."
)
# score = await metric.single_turn_ascore(sample)
# Returns an integer score from 1-5
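To actually run the example, attach an evaluator LLM and await the coroutine. One common pattern, assuming an OpenAI-backed LangChain model wrapped with Ragas' LangchainLLMWrapper, looks roughly like this:
import asyncio
from langchain_openai import ChatOpenAI
from ragas.llms import LangchainLLMWrapper
# Wrap a LangChain chat model as the judge (assumes an OpenAI API key is configured)
metric.llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))
score = asyncio.run(metric.single_turn_ascore(sample))
print(score)  # integer rubric score, e.g. 4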
Reference-Based Default Rubrics
from ragas.metrics import RubricsScore
from ragas.metrics._domain_specific_rubrics import DEFAULT_WITH_REFERENCE_RUBRICS
metric = RubricsScore(
name="reference_rubric_eval",
rubrics=DEFAULT_WITH_REFERENCE_RUBRICS
)
# metric.llm = your_llm_instance
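Because these rubrics judge alignment with a reference answer, the evaluated sample should include a reference field; a minimal illustrative sketch:
from ragas.dataset_schema import SingleTurnSample
# Illustrative sample supplying the reference answer the rubrics compare against
sample = SingleTurnSample(
    user_input="Who wrote 'Pride and Prejudice'?",
    response="'Pride and Prejudice' was written by Jane Austen.",
    reference="Jane Austen wrote 'Pride and Prejudice', published in 1813.",
)
# score = await metric.single_turn_ascore(sample)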
Multi-Turn Evaluation
from ragas.metrics import RubricsScore
from ragas.dataset_schema import MultiTurnSample
metric = RubricsScore()
# metric.llm = your_llm_instance
# multi_turn_sample = MultiTurnSample(...)
# score = await metric.multi_turn_ascore(multi_turn_sample)
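For a concrete (illustrative) sample, the conversation can be supplied as a list of message objects; this sketch assumes the message classes exported from ragas.messages:
from ragas.messages import HumanMessage, AIMessage
# Illustrative two-turn conversation; the metric formats it via sample.pretty_repr()
sample = MultiTurnSample(
    user_input=[
        HumanMessage(content="Can you recommend a book on machine learning?"),
        AIMessage(content="'Pattern Recognition and Machine Learning' by Bishop is a solid choice."),
    ]
)
# score = await metric.multi_turn_ascore(sample)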