Implementation:Vibrantlabsai Ragas RubricsScoreV2

Knowledge Sources	Vibrantlabsai_Ragas
Domains	Evaluation, Metrics
Last Updated	2026-02-12 00:00 GMT

Overview

Evaluates LLM responses using customizable domain-specific rubrics with a 1-5 scoring scale, supporting both reference-free and reference-based evaluation modes.

Description

The DomainSpecificRubrics metric (V2 collections implementation) evaluates responses by having an LLM score them against user-defined or default rubric criteria on a 1 to 5 scale. The metric supports both reference-free and reference-based evaluation.

The evaluation process works as follows:

1. A RubricScoreInput is constructed containing the user_input, response, and optionally reference, retrieved_contexts, and reference_contexts.
2. The RubricScorePrompt is formatted with the rubric criteria and sent to the LLM.
3. The LLM returns a RubricScoreOutput containing a numeric score (1-5) and textual feedback.
4. The score and feedback are returned as a MetricResult.
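
The four steps can be traced with a small offline sketch. RubricScoreInput, RubricScoreOutput and MetricResult below are hypothetical dataclass stand-ins (the real ragas classes may have different fields), and fake_llm replaces the instructor-based LLM, so this illustrates the data flow rather than the library's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


# Hypothetical stand-ins; the real ragas classes may differ.
@dataclass
class RubricScoreInput:
    user_input: Optional[str] = None
    response: Optional[str] = None
    reference: Optional[str] = None
    retrieved_contexts: Optional[List[str]] = None
    reference_contexts: Optional[List[str]] = None


@dataclass
class RubricScoreOutput:
    score: int
    feedback: str


@dataclass
class MetricResult:
    value: float
    reason: str


def score_with_rubric(
    llm: Callable[[str, RubricScoreInput], RubricScoreOutput],
    instruction: str,
    sample: RubricScoreInput,
) -> MetricResult:
    # Steps 2-3: send the rubric-augmented instruction plus the sample to the LLM,
    # which returns a structured output object.
    output = llm(instruction, sample)
    # Step 4: wrap the numeric score and textual feedback in a result object.
    return MetricResult(value=float(output.score), reason=output.feedback)


# Fake LLM that always awards 5; used only to exercise the flow offline.
def fake_llm(instruction: str, sample: RubricScoreInput) -> RubricScoreOutput:
    return RubricScoreOutput(score=5, feedback=f"Correct answer: {sample.response}")


# Step 1: build the input from the sample fields (optional ones left as None).
sample = RubricScoreInput(
    user_input="What is the capital of France?",
    response="The capital of France is Paris.",
)
result = score_with_rubric(fake_llm, "Score the response from 1 to 5.", sample)
print(result.value, result.reason)
```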

Default rubric scale (paraphrased; the reference-free and reference-based defaults word each level differently):

Score 1: Response is entirely incorrect or irrelevant
Score 2: Response has partial accuracy with major errors
Score 3: Response is mostly accurate but lacks detail
Score 4: Response is accurate with minor omissions
Score 5: Response is completely accurate and thorough

When with_reference=True, the default rubrics switch to reference-based criteria (loaded from DEFAULT_WITH_REFERENCE_RUBRICS). When with_reference=False, reference-free criteria are used (loaded from DEFAULT_REFERENCE_FREE_RUBRICS). Custom rubrics can be provided as a dictionary whose keys run from "score1_description" to "score5_description", each mapped to the criteria text for that score.
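
The selection rule can be sketched as follows, assuming custom rubrics take precedence whenever they are not None (matching the constructor description below). The two default dictionaries are hypothetical placeholders with short text; the real DEFAULT_* constants in ragas carry full descriptions.

```python
from typing import Dict, Optional

# Placeholder defaults; the real ragas constants contain longer descriptions.
DEFAULT_REFERENCE_FREE_RUBRICS: Dict[str, str] = {
    f"score{i}_description": f"reference-free criterion {i}" for i in range(1, 6)
}
DEFAULT_WITH_REFERENCE_RUBRICS: Dict[str, str] = {
    f"score{i}_description": f"reference-based criterion {i}" for i in range(1, 6)
}


def select_rubrics(
    rubrics: Optional[Dict[str, str]], with_reference: bool
) -> Dict[str, str]:
    """Return the custom rubrics if given, else the default set for the mode."""
    if rubrics is not None:
        return rubrics
    return (
        DEFAULT_WITH_REFERENCE_RUBRICS
        if with_reference
        else DEFAULT_REFERENCE_FREE_RUBRICS
    )
```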

The rubric text is formatted via format_rubrics and appended to the prompt instruction at initialization time. The allowed score range is set to (1.0, 5.0).
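
A minimal sketch of this initialization-time step, assuming format_rubrics emits one "key: description" line per entry and that the rubric block is appended under a heading. The exact layout and the build_instruction and in_allowed_range helpers are hypothetical; only the (1.0, 5.0) bounds come from the text.

```python
from typing import Dict, Tuple

ALLOWED_VALUES: Tuple[float, float] = (1.0, 5.0)  # range stated for this metric


def format_rubrics(rubrics: Dict[str, str]) -> str:
    # Assumed format: one "key: description" line per score level.
    return "\n".join(f"{key}: {value}" for key, value in rubrics.items())


def build_instruction(base_instruction: str, rubrics: Dict[str, str]) -> str:
    # Hypothetical helper: append the formatted rubric once, at construction time.
    return f"{base_instruction}\n\nScoring Rubrics:\n{format_rubrics(rubrics)}"


def in_allowed_range(score: float, bounds: Tuple[float, float] = ALLOWED_VALUES) -> bool:
    # Hypothetical check that a score lies inside the inclusive allowed range.
    low, high = bounds
    return low <= score <= high


rubrics = {
    "score1_description": "Completely wrong",
    "score5_description": "Fully correct and comprehensive",
}
instruction = build_instruction("Evaluate the response against the rubric.", rubrics)
print(instruction)
```

Building the instruction once in the constructor means every ascore call reuses the same prompt text instead of re-rendering the rubric per sample.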

Two convenience subclasses are provided:

RubricsScoreWithoutReference - equivalent to DomainSpecificRubrics(with_reference=False)
RubricsScoreWithReference - equivalent to DomainSpecificRubrics(with_reference=True)
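
The convenience classes amount to fixing with_reference in the constructor. The sketch below reproduces that pattern with a hypothetical stand-in base class (not the real BaseMetric subclass), so it runs without ragas installed; in this sketch with_reference is simply not exposed as a subclass parameter.

```python
from typing import Any, Dict, Optional


class DomainSpecificRubricsSketch:
    """Hypothetical stand-in for DomainSpecificRubrics (not the ragas class)."""

    def __init__(
        self,
        llm: Any,
        rubrics: Optional[Dict[str, str]] = None,
        with_reference: bool = False,
        name: str = "domain_specific_rubrics",
    ) -> None:
        self.llm = llm
        self.rubrics = rubrics
        self.with_reference = with_reference
        self.name = name


class RubricsScoreWithoutReferenceSketch(DomainSpecificRubricsSketch):
    def __init__(
        self,
        llm: Any,
        rubrics: Optional[Dict[str, str]] = None,
        name: str = "rubrics_score_without_reference",
    ) -> None:
        # with_reference is pinned to False rather than taken as a parameter.
        super().__init__(llm=llm, rubrics=rubrics, with_reference=False, name=name)


class RubricsScoreWithReferenceSketch(DomainSpecificRubricsSketch):
    def __init__(
        self,
        llm: Any,
        rubrics: Optional[Dict[str, str]] = None,
        name: str = "rubrics_score_with_reference",
    ) -> None:
        # with_reference is pinned to True rather than taken as a parameter.
        super().__init__(llm=llm, rubrics=rubrics, with_reference=True, name=name)
```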

Usage

Use this metric when you need flexible, criteria-based evaluation of LLM responses. It is well-suited for domain-specific evaluation where standard metrics are insufficient, such as evaluating medical advice, legal responses, or technical documentation. The custom rubrics feature allows you to define precisely what constitutes different quality levels for your use case.
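
As an illustration of a domain-specific rubric, the hypothetical medical-advice rubric below fills all five levels, and the hypothetical check_rubric_keys helper catches a missing or empty level before any LLM call is made. The wording is only an example, and ragas itself is not confirmed to enforce all five keys.

```python
from typing import Dict

# Hypothetical rubric for a medical-advice assistant; adapt the wording to your domain.
medical_rubrics: Dict[str, str] = {
    "score1_description": "Advice is unsafe or medically incorrect.",
    "score2_description": "Contains significant inaccuracies or omits critical safety warnings.",
    "score3_description": "Broadly correct but vague, or does not suggest consulting a clinician where appropriate.",
    "score4_description": "Accurate and safe, with minor gaps in detail.",
    "score5_description": "Accurate, safe, complete, and appropriately caveated.",
}


def check_rubric_keys(rubrics: Dict[str, str]) -> None:
    """Hypothetical helper: raise if any of score1..score5 is missing or empty."""
    expected = {f"score{i}_description" for i in range(1, 6)}
    missing = expected - set(rubrics)
    if missing:
        raise ValueError(f"missing rubric keys: {sorted(missing)}")
    empty = [key for key in sorted(expected) if not rubrics[key].strip()]
    if empty:
        raise ValueError(f"empty rubric descriptions: {empty}")


check_rubric_keys(medical_rubrics)  # passes silently when all five levels are present
```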

This is the V2 collections version, which uses instructor-based LLMs (InstructorBaseRagasLLM) with structured output, so the score and feedback come back as typed fields rather than being parsed from free-form text.
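
Instructor validates the model's reply against a schema (it uses Pydantic models for this). The stdlib-only sketch below mimics the idea with a hypothetical dataclass: a reply that is not valid JSON, lacks a field, or carries a score outside 1-5 raises an error instead of silently producing a bad score. The field names score and feedback follow the RubricScoreOutput description above; the integer type is an assumption.

```python
import json
from dataclasses import dataclass


@dataclass
class RubricScoreOutputSketch:
    """Hypothetical typed container for the LLM's structured reply."""

    score: int
    feedback: str

    def __post_init__(self) -> None:
        # Reject anything outside the 1-5 rubric scale instead of passing it on.
        if not isinstance(self.score, int) or not 1 <= self.score <= 5:
            raise ValueError(f"score must be an integer in 1..5, got {self.score!r}")


def parse_llm_reply(raw: str) -> RubricScoreOutputSketch:
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed text
    return RubricScoreOutputSketch(score=data["score"], feedback=data["feedback"])


ok = parse_llm_reply('{"score": 4, "feedback": "Accurate, minor omissions."}')
print(ok.score, ok.feedback)
```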

Code Reference

Source Location

Repository: Vibrantlabsai_Ragas
File: src/ragas/metrics/collections/domain_specific_rubrics/metric.py

Signature

class DomainSpecificRubrics(BaseMetric):
    def __init__(
        self,
        llm: "InstructorBaseRagasLLM",
        rubrics: t.Optional[t.Dict[str, str]] = None,
        with_reference: bool = False,
        name: str = "domain_specific_rubrics",
        **kwargs,
    ): ...

    async def ascore(
        self,
        user_input: t.Optional[str] = None,
        response: t.Optional[str] = None,
        retrieved_contexts: t.Optional[t.List[str]] = None,
        reference_contexts: t.Optional[t.List[str]] = None,
        reference: t.Optional[str] = None,
    ) -> MetricResult: ...

class RubricsScoreWithoutReference(DomainSpecificRubrics):
    def __init__(
        self,
        llm: "InstructorBaseRagasLLM",
        rubrics: t.Optional[t.Dict[str, str]] = None,
        name: str = "rubrics_score_without_reference",
        **kwargs,
    ): ...

class RubricsScoreWithReference(DomainSpecificRubrics):
    def __init__(
        self,
        llm: "InstructorBaseRagasLLM",
        rubrics: t.Optional[t.Dict[str, str]] = None,
        name: str = "rubrics_score_with_reference",
        **kwargs,
    ): ...

Import

from ragas.metrics.collections import DomainSpecificRubrics, RubricsScoreWithoutReference, RubricsScoreWithReference

I/O Contract

Constructor Parameters

Name	Type	Required	Description
llm	InstructorBaseRagasLLM	Yes	Modern instructor-based LLM used for rubric-based evaluation
rubrics	Dict[str, str] or None	No	Custom rubric definitions mapping score descriptions (e.g., "score1_description") to criteria text. If None, uses default rubrics based on with_reference setting
with_reference	bool	No	Whether to use reference-based evaluation (default: False). When True, uses DEFAULT_WITH_REFERENCE_RUBRICS; when False, uses DEFAULT_REFERENCE_FREE_RUBRICS
name	str	No	Metric name (default: "domain_specific_rubrics")

Inputs

Name	Type	Required	Description
user_input	str or None	No	The question or input provided to the system
response	str or None	No	The response generated by the system
retrieved_contexts	List[str] or None	No	Contexts retrieved for generating the response
reference_contexts	List[str] or None	No	Reference contexts for evaluation
reference	str or None	No	The reference/ground truth answer (used when with_reference=True)

Outputs

Name	Type	Description
value	float	Score between 1.0 and 5.0, available via result.value (ascore returns a MetricResult)
reason	str	Textual feedback from the LLM explaining the score, available via result.reason

Usage Examples

Reference-Free Evaluation

import asyncio

from openai import AsyncOpenAI
from ragas.llms.base import llm_factory
from ragas.metrics.collections import DomainSpecificRubrics

# AsyncOpenAI reads the API key from the OPENAI_API_KEY environment variable.
client = AsyncOpenAI()
llm = llm_factory("gpt-4o-mini", client=client)

metric = DomainSpecificRubrics(llm=llm)


async def main():
    # In a notebook, `await metric.ascore(...)` can be called directly instead.
    result = await metric.ascore(
        user_input="What is the capital of France?",
        response="The capital of France is Paris.",
    )
    print(f"Score: {result.value}, Feedback: {result.reason}")


asyncio.run(main())

Reference-Based Evaluation

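from ragas.metrics.collections import DomainSpecificRubrics

# Reuses `llm` from the reference-free example; the top-level `await` below
# needs an async context (e.g. a Jupyter notebook or an async function).
metric = DomainSpecificRubrics(llm=llm, with_reference=True)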

result = await metric.ascore(
    user_input="What is the capital of France?",
    response="The capital of France is Paris.",
    reference="Paris is the capital and largest city of France.",
)
print(f"Score: {result.value}, Feedback: {result.reason}")

Custom Rubrics

from ragas.metrics.collections import DomainSpecificRubrics

custom_rubrics = {
    "score1_description": "Completely wrong",
    "score2_description": "Mostly wrong with some correct elements",
    "score3_description": "Partially correct",
    "score4_description": "Mostly correct with minor issues",
    "score5_description": "Fully correct and comprehensive",
}

# Reuses `llm` from the reference-free example; `await` needs an async context.
metric = DomainSpecificRubrics(llm=llm, rubrics=custom_rubrics)

result = await metric.ascore(
    user_input="Explain photosynthesis.",
    response="Photosynthesis converts sunlight into chemical energy in plants.",
)
print(f"Score: {result.value}, Feedback: {result.reason}")

Convenience Classes

from ragas.metrics.collections import RubricsScoreWithoutReference, RubricsScoreWithReference

# Reference-free (equivalent to DomainSpecificRubrics(with_reference=False))
metric_no_ref = RubricsScoreWithoutReference(llm=llm)

# Reference-based (equivalent to DomainSpecificRubrics(with_reference=True))
metric_with_ref = RubricsScoreWithReference(llm=llm)
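
Because ascore is a coroutine, several samples can be scored concurrently with asyncio.gather. The sketch below uses a hypothetical offline StubRubricMetric whose ascore has a similar keyword-argument shape, so it runs without an API key; with a real metric you would pass one of the classes above (for example RubricsScoreWithReference(llm=llm)) instead of the stub, keeping provider rate limits in mind.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StubResult:
    value: float
    reason: str


class StubRubricMetric:
    """Hypothetical offline stand-in exposing an ascore coroutine."""

    async def ascore(
        self,
        user_input: Optional[str] = None,
        response: Optional[str] = None,
        reference: Optional[str] = None,
    ) -> StubResult:
        await asyncio.sleep(0)  # placeholder for the network call to the LLM
        score = 5.0 if response and "Paris" in response else 1.0
        return StubResult(value=score, reason="stub feedback")


async def score_all(metric, samples: List[dict]) -> List[StubResult]:
    # One ascore call per sample, awaited together; gather preserves input order.
    return await asyncio.gather(*(metric.ascore(**sample) for sample in samples))


samples = [
    {"user_input": "What is the capital of France?", "response": "Paris."},
    {"user_input": "What is the capital of France?", "response": "Lyon."},
]
results = asyncio.run(score_all(StubRubricMetric(), samples))
print([r.value for r in results])
```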

Related Pages

Environment:Vibrantlabsai_Ragas_Python_3_9_Core_Environment
