Implementation:Vibrantlabsai Ragas RubricsScoreV2

From Leeroopedia
Knowledge Sources
Domains: Evaluation, Metrics
Last Updated: 2026-02-12 00:00 GMT

Overview

Evaluates LLM responses using customizable domain-specific rubrics with a 1-5 scoring scale, supporting both reference-free and reference-based evaluation modes.

Description

The DomainSpecificRubrics metric (V2 collections implementation) has an LLM score each response against user-defined or default rubric criteria on a 1 to 5 scale, either with or without a reference answer.

The evaluation process works as follows:

  1. A RubricScoreInput is constructed containing the user_input, response, and optionally reference, retrieved_contexts, and reference_contexts.
  2. The RubricScorePrompt is formatted with the rubric criteria and sent to the LLM.
  3. The LLM returns a RubricScoreOutput containing a numeric score (1-5) and textual feedback.
  4. The score and feedback are returned as a MetricResult.
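The four steps can be illustrated without the library or a live model. The following is a minimal sketch, not the Ragas implementation: InputSketch, OutputSketch, ResultSketch, FakeLLM, build_prompt, and ascore_sketch are all hypothetical stand-ins, and the prompt layout is an assumption.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InputSketch:
    """Hypothetical stand-in for the input model built in step 1."""
    user_input: Optional[str] = None
    response: Optional[str] = None
    reference: Optional[str] = None
    retrieved_contexts: Optional[List[str]] = None
    reference_contexts: Optional[List[str]] = None


@dataclass
class OutputSketch:
    """Hypothetical stand-in for the structured LLM output of step 3."""
    score: int
    feedback: str


@dataclass
class ResultSketch:
    """Hypothetical stand-in exposing the value and reason fields."""
    value: float
    reason: str


class FakeLLM:
    """Stub that returns a fixed structured output instead of calling a model."""

    async def generate(self, prompt: str) -> OutputSketch:
        return OutputSketch(score=4, feedback="Accurate, minor omissions.")


def build_prompt(rubric_text: str, data: InputSketch) -> str:
    # Step 2: combine the rubric criteria with every field that was supplied.
    parts = [rubric_text]
    for name, value in vars(data).items():
        if value is not None:
            parts.append(f"{name}: {value}")
    return "\n".join(parts)


async def ascore_sketch(llm: FakeLLM, rubric_text: str, **fields) -> ResultSketch:
    data = InputSketch(**fields)               # step 1: build the input
    prompt = build_prompt(rubric_text, data)   # step 2: format the prompt
    output = await llm.generate(prompt)        # step 3: structured output
    # Step 4: return the score and feedback together.
    return ResultSketch(value=float(output.score), reason=output.feedback)


result = asyncio.run(
    ascore_sketch(
        FakeLLM(),
        "score1_description: Completely wrong",
        user_input="What is the capital of France?",
        response="The capital of France is Paris.",
    )
)
print(result.value, result.reason)
```

Swapping FakeLLM for a real structured-output client is the only part that would need network access; the rest of the flow stays the same in this sketch.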

Default rubric scoring (score interpretation):

  • Score 1: Response is entirely incorrect or irrelevant
  • Score 2: Response has partial accuracy with major errors
  • Score 3: Response is mostly accurate but lacks detail
  • Score 4: Response is accurate with minor omissions
  • Score 5: Response is completely accurate and thorough

When with_reference=True, the default rubrics shift to reference-based criteria (loaded from DEFAULT_WITH_REFERENCE_RUBRICS). When with_reference=False, reference-free criteria are used (loaded from DEFAULT_REFERENCE_FREE_RUBRICS). Custom rubrics can be provided as a dictionary with keys like "score1_description" through "score5_description".
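The selection logic described above can be sketched as follows. This is an illustration under stated assumptions: the placeholder dictionaries stand in for the library's default rubrics (whose real text is not reproduced here), and choose_rubrics and missing_keys are hypothetical helpers. The key check in particular is a pre-flight convenience, not documented library behavior.

```python
from typing import Dict, List, Optional

EXPECTED_KEYS: List[str] = [f"score{i}_description" for i in range(1, 6)]

# Placeholder texts only: the library's real default rubrics are not reproduced here.
PLACEHOLDER_REFERENCE_FREE: Dict[str, str] = {
    key: f"reference-free criterion for {key}" for key in EXPECTED_KEYS
}
PLACEHOLDER_WITH_REFERENCE: Dict[str, str] = {
    key: f"reference-based criterion for {key}" for key in EXPECTED_KEYS
}


def choose_rubrics(
    rubrics: Optional[Dict[str, str]] = None, with_reference: bool = False
) -> Dict[str, str]:
    """Custom rubrics win; otherwise pick a default set from with_reference."""
    if rubrics is not None:
        return rubrics
    if with_reference:
        return PLACEHOLDER_WITH_REFERENCE
    return PLACEHOLDER_REFERENCE_FREE


def missing_keys(rubrics: Dict[str, str]) -> List[str]:
    """Hypothetical check: list the expected keys absent from a custom dict."""
    return [key for key in EXPECTED_KEYS if key not in rubrics]


custom = {
    "score1_description": "Completely wrong",
    "score5_description": "Fully correct and comprehensive",
}
print(missing_keys(custom))
print(choose_rubrics(with_reference=True) is PLACEHOLDER_WITH_REFERENCE)
```

Running a check like missing_keys before constructing the metric makes an incomplete custom rubric visible early, rather than leaving some score levels undefined in the prompt.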

The rubric text is formatted via format_rubrics and appended to the prompt instruction at initialization time. The allowed score range is set to (1.0, 5.0).
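As a rough illustration of this step, the sketch below formats a rubric dictionary into text, appends it to an instruction string, and checks a score against the allowed range. The exact layout produced by format_rubrics is not shown on this page, so the "key: value" lines, the "Scoring Rubrics:" heading, and the helper names format_rubrics_sketch, build_instruction, and in_allowed_range are assumptions.

```python
from typing import Dict, Tuple

ALLOWED_RANGE: Tuple[float, float] = (1.0, 5.0)


def format_rubrics_sketch(rubrics: Dict[str, str]) -> str:
    # Assumed layout: one "key: description" line per rubric entry.
    return "\n".join(f"{key}: {value}" for key, value in rubrics.items())


def build_instruction(base_instruction: str, rubrics: Dict[str, str]) -> str:
    # Done once, at construction time, so every later call reuses the same text.
    return f"{base_instruction}\n\nScoring Rubrics:\n{format_rubrics_sketch(rubrics)}"


def in_allowed_range(score: float, allowed: Tuple[float, float] = ALLOWED_RANGE) -> bool:
    low, high = allowed
    return low <= score <= high


rubrics = {
    "score1_description": "Completely wrong",
    "score5_description": "Fully correct and comprehensive",
}
instruction = build_instruction("Score the response using the rubric.", rubrics)
print(instruction)
print(in_allowed_range(4.0), in_allowed_range(0.0))
```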

Two convenience subclasses are provided:

  • RubricsScoreWithoutReference - equivalent to DomainSpecificRubrics(with_reference=False)
  • RubricsScoreWithReference - equivalent to DomainSpecificRubrics(with_reference=True)
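The equivalence can be pictured as each subclass fixing with_reference and supplying its own default name. The sketch below uses hypothetical stand-in classes (RubricMetricSketch and its two subclasses), not the library's code; only the parameter names and default metric names are taken from the signatures on this page.

```python
class RubricMetricSketch:
    """Hypothetical stand-in for the base metric; stores its settings only."""

    def __init__(self, llm, rubrics=None, with_reference=False,
                 name="domain_specific_rubrics", **kwargs):
        self.llm = llm
        self.rubrics = rubrics
        self.with_reference = with_reference
        self.name = name


class WithoutReferenceSketch(RubricMetricSketch):
    def __init__(self, llm, rubrics=None,
                 name="rubrics_score_without_reference", **kwargs):
        # Pin with_reference so callers cannot set it by accident.
        super().__init__(llm=llm, rubrics=rubrics, with_reference=False,
                         name=name, **kwargs)


class WithReferenceSketch(RubricMetricSketch):
    def __init__(self, llm, rubrics=None,
                 name="rubrics_score_with_reference", **kwargs):
        super().__init__(llm=llm, rubrics=rubrics, with_reference=True,
                         name=name, **kwargs)


no_ref = WithoutReferenceSketch(llm=None)
with_ref = WithReferenceSketch(llm=None)
print(no_ref.name, no_ref.with_reference)
print(with_ref.name, with_ref.with_reference)
```

Under this reading, the subclasses differ from the base class only in their defaults, which is why either form can be used interchangeably.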

Usage

Use this metric when you need flexible, criteria-based evaluation of LLM responses. It is well-suited for domain-specific evaluation where standard metrics are insufficient, such as evaluating medical advice, legal responses, or technical documentation. The custom rubrics feature allows you to define precisely what constitutes different quality levels for your use case.

This is the V2 collections version, which uses modern instructor-based LLMs with structured output for reliable scoring.

Code Reference

Source Location

  • Repository: Vibrantlabsai_Ragas
  • File: src/ragas/metrics/collections/domain_specific_rubrics/metric.py

Signature

class DomainSpecificRubrics(BaseMetric):
    def __init__(
        self,
        llm: "InstructorBaseRagasLLM",
        rubrics: t.Optional[t.Dict[str, str]] = None,
        with_reference: bool = False,
        name: str = "domain_specific_rubrics",
        **kwargs,
    ): ...

    async def ascore(
        self,
        user_input: t.Optional[str] = None,
        response: t.Optional[str] = None,
        retrieved_contexts: t.Optional[t.List[str]] = None,
        reference_contexts: t.Optional[t.List[str]] = None,
        reference: t.Optional[str] = None,
    ) -> MetricResult: ...

class RubricsScoreWithoutReference(DomainSpecificRubrics):
    def __init__(
        self,
        llm: "InstructorBaseRagasLLM",
        rubrics: t.Optional[t.Dict[str, str]] = None,
        name: str = "rubrics_score_without_reference",
        **kwargs,
    ): ...

class RubricsScoreWithReference(DomainSpecificRubrics):
    def __init__(
        self,
        llm: "InstructorBaseRagasLLM",
        rubrics: t.Optional[t.Dict[str, str]] = None,
        name: str = "rubrics_score_with_reference",
        **kwargs,
    ): ...

Import

from ragas.metrics.collections import DomainSpecificRubrics, RubricsScoreWithoutReference, RubricsScoreWithReference

I/O Contract

Constructor Parameters

Name | Type | Required | Description
llm | InstructorBaseRagasLLM | Yes | Modern instructor-based LLM used for rubric-based evaluation
rubrics | Dict[str, str] or None | No | Custom rubric definitions mapping score descriptions (e.g., "score1_description") to criteria text. If None, uses default rubrics based on with_reference setting
with_reference | bool | No | Whether to use reference-based evaluation (default: False). When True, uses DEFAULT_WITH_REFERENCE_RUBRICS; when False, uses DEFAULT_REFERENCE_FREE_RUBRICS
name | str | No | Metric name (default: "domain_specific_rubrics")

Inputs

Name | Type | Required | Description
user_input | str or None | No | The question or input provided to the system
response | str or None | No | The response generated by the system
retrieved_contexts | List[str] or None | No | Contexts retrieved for generating the response
reference_contexts | List[str] or None | No | Reference contexts for evaluation
reference | str or None | No | The reference/ground truth answer (used when with_reference=True)

Outputs

Name | Type | Description
score | MetricResult (float value) | Score between 1.0 and 5.0, available via result.value
reason | str | Textual feedback from the LLM explaining the score, available via result.reason

Usage Examples

Reference-Free Evaluation

from openai import AsyncOpenAI
from ragas.llms.base import llm_factory
from ragas.metrics.collections import DomainSpecificRubrics

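# AsyncOpenAI() reads the API key from the OPENAI_API_KEY environment variable.
client = AsyncOpenAI()
llm = llm_factory("gpt-4o-mini", client=client)

metric = DomainSpecificRubrics(llm=llm)

# Top-level `await` works in a notebook or under `python -m asyncio`.
# In a plain script, wrap the call in an `async def` and run it with asyncio.run().
result = await metric.ascore(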
    user_input="What is the capital of France?",
    response="The capital of France is Paris.",
)
print(f"Score: {result.value}, Feedback: {result.reason}")

Reference-Based Evaluation

from ragas.metrics.collections import DomainSpecificRubrics

# Reuses the `llm` created in the reference-free example.
metric = DomainSpecificRubrics(llm=llm, with_reference=True)

result = await metric.ascore(
    user_input="What is the capital of France?",
    response="The capital of France is Paris.",
    reference="Paris is the capital and largest city of France.",
)
print(f"Score: {result.value}, Feedback: {result.reason}")

Custom Rubrics

from ragas.metrics.collections import DomainSpecificRubrics

custom_rubrics = {
    "score1_description": "Completely wrong",
    "score2_description": "Mostly wrong with some correct elements",
    "score3_description": "Partially correct",
    "score4_description": "Mostly correct with minor issues",
    "score5_description": "Fully correct and comprehensive",
}

# Reuses the `llm` created in the reference-free example.
metric = DomainSpecificRubrics(llm=llm, rubrics=custom_rubrics)

result = await metric.ascore(
    user_input="Explain photosynthesis.",
    response="Photosynthesis converts sunlight into chemical energy in plants.",
)
print(f"Score: {result.value}, Feedback: {result.reason}")

Convenience Classes

from ragas.metrics.collections import RubricsScoreWithoutReference, RubricsScoreWithReference

# Both metrics reuse the `llm` created in the reference-free example.
# Reference-free (equivalent to DomainSpecificRubrics(with_reference=False))
metric_no_ref = RubricsScoreWithoutReference(llm=llm)

# Reference-based (equivalent to DomainSpecificRubrics(with_reference=True))
metric_with_ref = RubricsScoreWithReference(llm=llm)
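
Scoring Several Samples Concurrently

Because ascore is a coroutine, several samples can be scored concurrently with asyncio.gather. The sketch below shows only the pattern and makes no network calls: StubMetric and ResultSketch are hypothetical stand-ins, and the stub's scoring rule is arbitrary. With a real metric, the same score_all helper would apply, subject to the provider's rate limits.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ResultSketch:
    """Hypothetical stand-in exposing the value and reason fields."""
    value: float
    reason: str


class StubMetric:
    """Hypothetical stand-in that accepts the ascore keyword arguments used above."""

    async def ascore(self, user_input=None, response=None, reference=None):
        await asyncio.sleep(0)  # yield control, as a real LLM call would
        # Arbitrary stub rule: any non-empty response gets the top score.
        score = 5.0 if response else 1.0
        return ResultSketch(value=score, reason="stub feedback")


async def score_all(metric, samples):
    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(metric.ascore(**sample) for sample in samples))


samples = [
    {"user_input": "What is the capital of France?",
     "response": "The capital of France is Paris."},
    {"user_input": "Explain photosynthesis.", "response": ""},
]

results = asyncio.run(score_all(StubMetric(), samples))
for sample, result in zip(samples, results):
    print(sample["user_input"], result.value, result.reason)
```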
