Implementation:Vibrantlabsai Ragas RubricsScoreV2
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Metrics |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
Evaluates LLM responses using customizable domain-specific rubrics with a 1-5 scoring scale, supporting both reference-free and reference-based evaluation modes.
Description
The DomainSpecificRubrics metric (V2 collections implementation) evaluates responses by having an LLM score them against user-defined or default rubric criteria on a 1 to 5 scale. The metric supports both reference-free and reference-based evaluation.
The evaluation process works as follows:
1. A RubricScoreInput is constructed containing the user_input, response, and optionally reference, retrieved_contexts, and reference_contexts.
2. The RubricScorePrompt is formatted with the rubric criteria and sent to the LLM.
3. The LLM returns a RubricScoreOutput containing a numeric score (1-5) and textual feedback.
4. The score and feedback are returned as a MetricResult.
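The four steps can be sketched in plain Python. This is a minimal illustration, not the ragas implementation: the dataclasses below are stand-ins that only mirror the names used in the text, and `score_with_rubric` and `fake_judge` are hypothetical helpers that replace the prompt formatting and LLM call so the sketch runs without an API key.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RubricScoreInput:
    """Stand-in for the prompt input described above (not the ragas class)."""
    user_input: Optional[str] = None
    response: Optional[str] = None
    reference: Optional[str] = None
    retrieved_contexts: Optional[List[str]] = None
    reference_contexts: Optional[List[str]] = None


@dataclass
class RubricScoreOutput:
    """Stand-in for the structured LLM output: a 1-5 score plus feedback."""
    score: int
    feedback: str


@dataclass
class MetricResult:
    """Stand-in result exposing .value and .reason, as in the usage examples."""
    value: float
    reason: str


def score_with_rubric(
    judge: Callable[[RubricScoreInput], RubricScoreOutput],
    user_input: str,
    response: str,
    reference: Optional[str] = None,
) -> MetricResult:
    # Step 1: build the input object from the sample fields.
    prompt_input = RubricScoreInput(
        user_input=user_input, response=response, reference=reference
    )
    # Steps 2-3: the judge stands in for "format the prompt and call the LLM".
    output = judge(prompt_input)
    # Step 4: wrap the score and feedback in a result object.
    return MetricResult(value=float(output.score), reason=output.feedback)


def fake_judge(prompt_input: RubricScoreInput) -> RubricScoreOutput:
    # Hypothetical judge: always returns 5, regardless of the input.
    return RubricScoreOutput(score=5, feedback="Matches the rubric for score 5.")


result = score_with_rubric(fake_judge, "What is the capital of France?", "Paris.")
print(result.value, result.reason)
```

Passing the judge as a callable keeps the data flow visible and makes the scoring path testable without network access.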
Default rubric scoring (score interpretation):
- Score 1: Response is entirely incorrect or irrelevant
- Score 2: Response has partial accuracy with major errors
- Score 3: Response is mostly accurate but lacks detail
- Score 4: Response is accurate with minor omissions
- Score 5: Response is completely accurate and thorough
When with_reference=True, the default rubrics shift to reference-based criteria (loaded from DEFAULT_WITH_REFERENCE_RUBRICS). When with_reference=False, reference-free criteria are used (loaded from DEFAULT_REFERENCE_FREE_RUBRICS). Custom rubrics can be provided as a dictionary with keys like "score1_description" through "score5_description".
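Before passing a custom dictionary, it can help to check that all five score levels are covered. The helper below is a hypothetical pre-check, not part of ragas; the text does not say whether the library itself enforces the full key set, so treat this as an optional safeguard.

```python
from typing import Dict, List

# Keys named in the text: "score1_description" through "score5_description".
EXPECTED_KEYS = [f"score{i}_description" for i in range(1, 6)]


def missing_rubric_keys(rubrics: Dict[str, str]) -> List[str]:
    """Return the expected keys that are absent or whose text is blank."""
    return [key for key in EXPECTED_KEYS if not rubrics.get(key, "").strip()]


partial = {
    "score1_description": "Completely wrong",
    "score5_description": "Fully correct and comprehensive",
}
print(missing_rubric_keys(partial))
```

An empty list means every level from 1 to 5 has a non-blank description.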
The rubric text is formatted via format_rubrics and appended to the prompt instruction at initialization time. The allowed score range is set to (1.0, 5.0).
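A rough sketch of this initialization step follows. The one-line-per-key layout, the "Scoring Rubrics:" header, and the helper names `format_rubrics_sketch`, `build_instruction`, and `in_allowed_range` are assumptions made for illustration; the actual format_rubrics output may differ.

```python
from typing import Dict, Tuple

ALLOWED_RANGE: Tuple[float, float] = (1.0, 5.0)  # range stated in the text


def format_rubrics_sketch(rubrics: Dict[str, str]) -> str:
    # Assumed layout: one "key: description" line per rubric entry.
    return "\n".join(f"{key}: {text}" for key, text in rubrics.items())


def build_instruction(base_instruction: str, rubrics: Dict[str, str]) -> str:
    # The rubric text is appended once, when the metric is constructed.
    # The "Scoring Rubrics:" header is an assumed label, not the library's.
    return f"{base_instruction}\n\nScoring Rubrics:\n{format_rubrics_sketch(rubrics)}"


def in_allowed_range(score: float) -> bool:
    # Inclusive check against the allowed score range.
    low, high = ALLOWED_RANGE
    return low <= score <= high


instruction = build_instruction(
    "Score the response.",
    {"score1_description": "Completely wrong", "score5_description": "Fully correct"},
)
print(instruction)
```

Building the instruction at construction time means the rubric is fixed for the lifetime of the metric instance; a different rubric needs a new instance.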
Two convenience subclasses are provided:
- RubricsScoreWithoutReference - equivalent to DomainSpecificRubrics(with_reference=False)
- RubricsScoreWithReference - equivalent to DomainSpecificRubrics(with_reference=True)
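The relationship between the base class and the two subclasses can be pictured as below. All three classes are hypothetical stand-ins that only store their settings; the default names are taken from the signatures on this page, and everything else is an assumption about how such wrappers are typically written.

```python
from typing import Dict, Optional


class RubricMetricSketch:
    """Hypothetical stand-in for DomainSpecificRubrics; it only stores settings."""

    def __init__(
        self,
        llm: object,
        rubrics: Optional[Dict[str, str]] = None,
        with_reference: bool = False,
        name: str = "domain_specific_rubrics",
    ) -> None:
        self.llm = llm
        self.rubrics = rubrics
        self.with_reference = with_reference
        self.name = name


class WithoutReferenceSketch(RubricMetricSketch):
    def __init__(
        self,
        llm: object,
        rubrics: Optional[Dict[str, str]] = None,
        name: str = "rubrics_score_without_reference",
    ) -> None:
        # The mode is fixed here; only the default name differs from the base.
        super().__init__(llm=llm, rubrics=rubrics, with_reference=False, name=name)


class WithReferenceSketch(RubricMetricSketch):
    def __init__(
        self,
        llm: object,
        rubrics: Optional[Dict[str, str]] = None,
        name: str = "rubrics_score_with_reference",
    ) -> None:
        # Same wrapper pattern, with reference-based mode switched on.
        super().__init__(llm=llm, rubrics=rubrics, with_reference=True, name=name)


no_ref = WithoutReferenceSketch(llm=None)
with_ref = WithReferenceSketch(llm=None)
print(no_ref.name, no_ref.with_reference)
print(with_ref.name, with_ref.with_reference)
```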
Usage
Use this metric when you need flexible, criteria-based evaluation of LLM responses. It is well-suited for domain-specific evaluation where standard metrics are insufficient, such as evaluating medical advice, legal responses, or technical documentation. The custom rubrics feature allows you to define precisely what constitutes different quality levels for your use case.
This is the V2 collections version, which takes an instructor-based LLM (InstructorBaseRagasLLM) and relies on structured output to obtain the score and feedback.
Code Reference
Source Location
- Repository: Vibrantlabsai_Ragas
- File: src/ragas/metrics/collections/domain_specific_rubrics/metric.py
Signature
class DomainSpecificRubrics(BaseMetric):
    def __init__(
        self,
        llm: "InstructorBaseRagasLLM",
        rubrics: t.Optional[t.Dict[str, str]] = None,
        with_reference: bool = False,
        name: str = "domain_specific_rubrics",
        **kwargs,
    ): ...

    async def ascore(
        self,
        user_input: t.Optional[str] = None,
        response: t.Optional[str] = None,
        retrieved_contexts: t.Optional[t.List[str]] = None,
        reference_contexts: t.Optional[t.List[str]] = None,
        reference: t.Optional[str] = None,
    ) -> MetricResult: ...

class RubricsScoreWithoutReference(DomainSpecificRubrics):
    def __init__(
        self,
        llm: "InstructorBaseRagasLLM",
        rubrics: t.Optional[t.Dict[str, str]] = None,
        name: str = "rubrics_score_without_reference",
        **kwargs,
    ): ...

class RubricsScoreWithReference(DomainSpecificRubrics):
    def __init__(
        self,
        llm: "InstructorBaseRagasLLM",
        rubrics: t.Optional[t.Dict[str, str]] = None,
        name: str = "rubrics_score_with_reference",
        **kwargs,
    ): ...
Import
from ragas.metrics.collections import DomainSpecificRubrics, RubricsScoreWithoutReference, RubricsScoreWithReference
I/O Contract
Constructor Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| llm | InstructorBaseRagasLLM | Yes | Modern instructor-based LLM used for rubric-based evaluation |
| rubrics | Dict[str, str] or None | No | Custom rubric definitions mapping score descriptions (e.g., "score1_description") to criteria text. If None, uses default rubrics based on with_reference setting |
| with_reference | bool | No | Whether to use reference-based evaluation (default: False). When True, uses DEFAULT_WITH_REFERENCE_RUBRICS; when False, uses DEFAULT_REFERENCE_FREE_RUBRICS |
| name | str | No | Metric name (default: "domain_specific_rubrics") |
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| user_input | str or None | No | The question or input provided to the system |
| response | str or None | No | The response generated by the system |
| retrieved_contexts | List[str] or None | No | Contexts retrieved for generating the response |
| reference_contexts | List[str] or None | No | Reference contexts for evaluation |
| reference | str or None | No | The reference/ground truth answer (used when with_reference=True) |
Outputs
| Name | Type | Description |
|---|---|---|
| score | MetricResult (float value) | Score between 1.0 and 5.0, available via result.value |
| reason | str | Textual feedback from the LLM explaining the score, available via result.reason |
Usage Examples
Reference-Free Evaluation
from openai import AsyncOpenAI
from ragas.llms.base import llm_factory
from ragas.metrics.collections import DomainSpecificRubrics

# Create an async OpenAI client and wrap it as a ragas LLM.
client = AsyncOpenAI()
llm = llm_factory("gpt-4o-mini", client=client)

# with_reference defaults to False, so the reference-free rubrics are used.
metric = DomainSpecificRubrics(llm=llm)
result = await metric.ascore(
    user_input="What is the capital of France?",
    response="The capital of France is Paris.",
)
print(f"Score: {result.value}, Feedback: {result.reason}")
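The examples in this section use `await` at the top level, which works in a notebook or async REPL but not in a plain script. The sketch below shows one way to drive an async `ascore` call from a script and to score several samples concurrently. `StubRubricMetric` and `StubResult` are hypothetical stand-ins so the code runs without ragas or an API key; with the real metric, only the object passed to `score_all` would change.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StubResult:
    value: float
    reason: str


class StubRubricMetric:
    """Hypothetical stand-in exposing an async ascore method."""

    async def ascore(self, user_input: str, response: str) -> StubResult:
        await asyncio.sleep(0)  # placeholder for the LLM round trip
        return StubResult(value=5.0, reason=f"Scored response to: {user_input}")


async def score_all(
    metric: StubRubricMetric, samples: List[Dict[str, str]]
) -> List[StubResult]:
    # gather runs the coroutines concurrently and preserves input order.
    return list(await asyncio.gather(*(metric.ascore(**s) for s in samples)))


samples = [
    {"user_input": "What is the capital of France?", "response": "Paris."},
    {"user_input": "Explain photosynthesis.", "response": "Plants convert light into chemical energy."},
]

# asyncio.run creates the event loop, runs the coroutine, and closes the loop.
results = asyncio.run(score_all(StubRubricMetric(), samples))
for item in results:
    print(item.value, item.reason)
```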
Reference-Based Evaluation
from ragas.metrics.collections import DomainSpecificRubrics

# Reuses the llm created in the reference-free example.
metric = DomainSpecificRubrics(llm=llm, with_reference=True)
result = await metric.ascore(
    user_input="What is the capital of France?",
    response="The capital of France is Paris.",
    reference="Paris is the capital and largest city of France.",
)
print(f"Score: {result.value}, Feedback: {result.reason}")
Custom Rubrics
from ragas.metrics.collections import DomainSpecificRubrics

# One description per score level, keyed "score1_description" to "score5_description".
custom_rubrics = {
    "score1_description": "Completely wrong",
    "score2_description": "Mostly wrong with some correct elements",
    "score3_description": "Partially correct",
    "score4_description": "Mostly correct with minor issues",
    "score5_description": "Fully correct and comprehensive",
}

# Reuses the llm created in the reference-free example.
metric = DomainSpecificRubrics(llm=llm, rubrics=custom_rubrics)
result = await metric.ascore(
    user_input="Explain photosynthesis.",
    response="Photosynthesis converts sunlight into chemical energy in plants.",
)
print(f"Score: {result.value}, Feedback: {result.reason}")
Convenience Classes
from ragas.metrics.collections import RubricsScoreWithoutReference, RubricsScoreWithReference
# Reference-free (equivalent to DomainSpecificRubrics(with_reference=False))
metric_no_ref = RubricsScoreWithoutReference(llm=llm)
# Reference-based (equivalent to DomainSpecificRubrics(with_reference=True))
metric_with_ref = RubricsScoreWithReference(llm=llm)