Implementation:Explodinggradients Ragas Collections DomainSpecificRubrics Metric

From Leeroopedia
Revision as of 14:53, 16 February 2026 by Admin (auto-imported from implementations/Explodinggradients_Ragas_Collections_DomainSpecificRubrics_Metric.md)


Field | Value
source | Repo
domains | Metrics, Evaluation
last_updated | 2026-02-10 00:00 GMT

Overview

DomainSpecificRubrics is a v2 class-based metric that evaluates LLM responses against customizable scoring rubrics on a 1-5 scale. Two convenience subclasses, RubricsScoreWithoutReference and RubricsScoreWithReference, preset the evaluation mode.

Description

DomainSpecificRubrics extends BaseMetric with allowed_values=(1.0, 5.0) and requires an instructor-based LLM (InstructorBaseRagasLLM). The metric asks the LLM to evaluate a response against a set of rubric criteria and returns a score from 1 to 5 together with detailed textual feedback.

The evaluation process:

  1. Constructs a RubricScoreInput containing the user input, the response, and the optional reference and contexts.
  2. Uses RubricScorePrompt to format the prompt, with the rubric criteria appended.
  3. Calls the LLM, which generates a RubricScoreOutput holding a numeric score and a feedback string.
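
The three steps above can be sketched roughly as follows. This is a self-contained illustration, not the ragas source: the dataclass fields, the format_prompt helper, and the fake_llm stand-in are assumptions made for this example. Only the names RubricScoreInput and RubricScoreOutput are taken from the description.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RubricScoreInput:  # field names are assumed for illustration
    user_input: Optional[str] = None
    response: Optional[str] = None
    reference: Optional[str] = None
    retrieved_contexts: Optional[List[str]] = None


@dataclass
class RubricScoreOutput:  # numeric score plus feedback string
    score: int
    feedback: str


def format_prompt(data: RubricScoreInput, rubrics: Dict[str, str]) -> str:
    """Hypothetical stand-in for RubricScorePrompt: append the rubric criteria."""
    lines = [f"Input: {data.user_input}", f"Response: {data.response}"]
    if data.reference is not None:
        lines.append(f"Reference: {data.reference}")
    lines.append("Scoring rubric:")
    lines += [f"{key}: {text}" for key, text in sorted(rubrics.items())]
    return "\n".join(lines)


async def fake_llm(prompt: str) -> RubricScoreOutput:
    """Stand-in for the structured LLM call; always returns a fixed result."""
    return RubricScoreOutput(score=5, feedback="Matches the top rubric level.")


async def score(data: RubricScoreInput, rubrics: Dict[str, str]) -> RubricScoreOutput:
    prompt = format_prompt(data, rubrics)  # step 2: build the prompt
    return await fake_llm(prompt)          # step 3: structured LLM output


rubrics = {
    "score1_description": "Completely wrong",
    "score5_description": "Fully correct",
}
# Step 1: build the input object
data = RubricScoreInput(user_input="Capital of France?", response="Paris.")
prompt_text = format_prompt(data, rubrics)
result = asyncio.run(score(data, rubrics))
```

In the real metric the LLM call returns a parsed structured object rather than a fixed value; the stub only keeps the sketch runnable offline.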

The metric supports two modes:

  • Reference-free (default): Uses DEFAULT_REFERENCE_FREE_RUBRICS to evaluate based on general quality criteria.
  • Reference-based (with_reference=True): Uses DEFAULT_WITH_REFERENCE_RUBRICS to evaluate against a ground truth.
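
As a rough sketch of how the mode could determine the rubric set, the snippet below picks between two default dictionaries unless custom rubrics are given. The dictionary contents are placeholders and select_rubrics is a hypothetical helper; the real default texts and selection code live in the ragas source.

```python
from typing import Dict, Optional

# Placeholder contents; the actual default rubric texts are defined in ragas.
DEFAULT_REFERENCE_FREE_RUBRICS: Dict[str, str] = {
    f"score{i}_description": f"reference-free level {i}" for i in range(1, 6)
}
DEFAULT_WITH_REFERENCE_RUBRICS: Dict[str, str] = {
    f"score{i}_description": f"with-reference level {i}" for i in range(1, 6)
}


def select_rubrics(
    rubrics: Optional[Dict[str, str]] = None,
    with_reference: bool = False,
) -> Dict[str, str]:
    """Hypothetical helper: custom rubrics win, otherwise pick by mode."""
    if rubrics is not None:
        return rubrics
    if with_reference:
        return DEFAULT_WITH_REFERENCE_RUBRICS
    return DEFAULT_REFERENCE_FREE_RUBRICS


free = select_rubrics()
based = select_rubrics(with_reference=True)
custom = select_rubrics({"score1_description": "Completely wrong"}, with_reference=True)
```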

Custom rubrics can be provided as a dictionary mapping score description keys (e.g., "score1_description" through "score5_description") to criteria text.
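
Before passing a custom dictionary, it can help to check on the caller side that all five levels are described. The check_rubrics helper below is an assumption for illustration, not part of ragas, and the metric itself may accept other shapes.

```python
from typing import Dict, List

# The five keys named in the text above.
EXPECTED_KEYS: List[str] = [f"score{i}_description" for i in range(1, 6)]


def check_rubrics(rubrics: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical caller-side check: every level present and non-empty."""
    missing = [k for k in EXPECTED_KEYS if k not in rubrics]
    empty = [k for k in EXPECTED_KEYS if k in rubrics and not rubrics[k].strip()]
    if missing or empty:
        raise ValueError(f"missing keys: {missing}; empty descriptions: {empty}")
    return rubrics


complete = {key: f"criteria for {key}" for key in EXPECTED_KEYS}
checked = check_rubrics(complete)

try:
    check_rubrics({"score1_description": "Completely wrong"})
    incomplete_rejected = False
except ValueError:
    incomplete_rejected = True
```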

RubricsScoreWithoutReference and RubricsScoreWithReference are convenience subclasses that preset the with_reference flag.
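
One plausible way for such subclasses to preset the flag is to fix with_reference in their constructors. The classes below are simplified stand-ins with hypothetical names; they mirror the constructor parameters listed on this page but are not the ragas implementation.

```python
from typing import Any, Dict, Optional


class RubricsSketch:
    """Simplified stand-in for DomainSpecificRubrics (constructor only)."""

    def __init__(
        self,
        llm: Any,
        rubrics: Optional[Dict[str, str]] = None,
        with_reference: bool = False,
        name: str = "domain_specific_rubrics",
    ) -> None:
        self.llm = llm
        self.rubrics = rubrics
        self.with_reference = with_reference
        self.name = name


class WithoutReferenceSketch(RubricsSketch):
    """Presets with_reference=False so callers cannot forget the mode."""

    def __init__(self, llm: Any, rubrics: Optional[Dict[str, str]] = None, **kwargs: Any) -> None:
        super().__init__(llm, rubrics=rubrics, with_reference=False, **kwargs)


class WithReferenceSketch(RubricsSketch):
    """Presets with_reference=True for evaluation against a ground truth."""

    def __init__(self, llm: Any, rubrics: Optional[Dict[str, str]] = None, **kwargs: Any) -> None:
        super().__init__(llm, rubrics=rubrics, with_reference=True, **kwargs)


# object() stands in for a real LLM instance.
no_ref = WithoutReferenceSketch(llm=object())
with_ref = WithReferenceSketch(llm=object())
```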

Usage

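Instantiate the metric with the required llm parameter and, optionally, a rubrics dict and the with_reference flag. Every ascore parameter is optional so that a single signature serves both evaluation modes; in reference-based mode, pass reference so the rubric has a ground truth to compare against.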
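
Because every ascore parameter is optional, a caller may want its own guard that a reference is present in reference-based mode. The build_ascore_kwargs helper below is hypothetical; whether the metric itself raises on a missing reference is not stated on this page.

```python
from typing import Any, Dict, Optional


def build_ascore_kwargs(
    with_reference: bool,
    user_input: Optional[str] = None,
    response: Optional[str] = None,
    reference: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical caller-side guard: collect only the arguments that are set."""
    if with_reference and reference is None:
        raise ValueError("reference-based mode needs a reference answer")
    kwargs = {"user_input": user_input, "response": response, "reference": reference}
    # Drop unset values so the call site passes only what it has.
    return {key: value for key, value in kwargs.items() if value is not None}


free_kwargs = build_ascore_kwargs(
    with_reference=False,
    user_input="What is the capital of France?",
    response="The capital of France is Paris.",
)

try:
    build_ascore_kwargs(with_reference=True, response="Paris.")
    missing_reference_rejected = False
except ValueError:
    missing_reference_rejected = True
```

The resulting dictionary could then be unpacked into the call, for example metric.ascore(**free_kwargs).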

Code Reference

Property | Value
Source Location | src/ragas/metrics/collections/domain_specific_rubrics/metric.py, lines 1-185
Signatures | class DomainSpecificRubrics(BaseMetric); class RubricsScoreWithoutReference(DomainSpecificRubrics); class RubricsScoreWithReference(DomainSpecificRubrics)
Import | from ragas.metrics.collections import DomainSpecificRubrics

I/O Contract

Inputs

Parameter | Type | Required | Description
user_input | Optional[str] | No | The question or input provided to the system
response | Optional[str] | No | The generated response to evaluate
retrieved_contexts | Optional[List[str]] | No | Contexts retrieved for generating the response
reference_contexts | Optional[List[str]] | No | Reference contexts for evaluation
reference | Optional[str] | No | The reference (ground truth) answer

Constructor Parameters

Parameter | Type | Default | Description
llm | InstructorBaseRagasLLM | (required) | Instructor-based LLM used for rubric evaluation
rubrics | Optional[Dict[str, str]] | None | Custom rubric definitions; defaults to the built-in rubrics
with_reference | bool | False | Whether to use reference-based evaluation
name | str | "domain_specific_rubrics" | Metric name

Score Interpretation (Default Rubrics)

Score | Meaning
1 | Response is entirely incorrect or irrelevant
2 | Response has partial accuracy with major errors
3 | Response is mostly accurate but lacks detail
4 | Response is accurate with minor omissions
5 | Response is completely accurate and thorough

Outputs

Field | Type | Description
MetricResult.value | float | Rubric score in the range 1.0 to 5.0
MetricResult.reason | str | Detailed feedback from the LLM evaluation
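
A small sketch of a downstream range check on the returned value follows. ResultSketch is a stand-in carrying only the two fields listed above, and in_allowed_range is a hypothetical helper; the bounds repeat the allowed_values=(1.0, 5.0) setting from the description.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ResultSketch:
    """Stand-in for MetricResult with the two fields listed above."""
    value: float
    reason: Optional[str] = None


# Bounds as given in the description: allowed_values=(1.0, 5.0).
ALLOWED_VALUES: Tuple[float, float] = (1.0, 5.0)


def in_allowed_range(
    result: ResultSketch,
    allowed: Tuple[float, float] = ALLOWED_VALUES,
) -> bool:
    """Return True when the score lies inside the inclusive allowed range."""
    low, high = allowed
    return low <= result.value <= high


ok = in_allowed_range(ResultSketch(4.0, "Accurate with minor omissions"))
bad = in_allowed_range(ResultSketch(0.0, "Out of range"))
```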

Usage Examples

import asyncio

from openai import AsyncOpenAI
from ragas.llms.base import llm_factory
from ragas.metrics.collections import DomainSpecificRubrics
from ragas.metrics.collections.domain_specific_rubrics.metric import (
    RubricsScoreWithoutReference,
    RubricsScoreWithReference,
)


async def main():
    # Setup: AsyncOpenAI reads OPENAI_API_KEY from the environment
    client = AsyncOpenAI()
    llm = llm_factory("gpt-4o-mini", client=client)

    # Reference-free evaluation (default mode)
    metric = DomainSpecificRubrics(llm=llm)
    result = await metric.ascore(
        user_input="What is the capital of France?",
        response="The capital of France is Paris.",
    )
    print(f"Score: {result.value}, Feedback: {result.reason}")

    # Reference-based evaluation: the response is scored against a ground truth
    metric_ref = DomainSpecificRubrics(llm=llm, with_reference=True)
    result = await metric_ref.ascore(
        user_input="What is the capital of France?",
        response="The capital of France is Paris.",
        reference="Paris is the capital and largest city of France.",
    )
    print(f"Score: {result.value}, Feedback: {result.reason}")

    # Custom rubrics: one description per score level
    custom_rubrics = {
        "score1_description": "Completely wrong",
        "score2_description": "Mostly wrong with some correct elements",
        "score3_description": "Partially correct",
        "score4_description": "Mostly correct with minor issues",
        "score5_description": "Fully correct and comprehensive",
    }
    custom_metric = DomainSpecificRubrics(llm=llm, rubrics=custom_rubrics)
    result = await custom_metric.ascore(
        user_input="Explain photosynthesis.",
        response="Plants convert sunlight to energy using chlorophyll.",
    )
    print(f"Score: {result.value}, Feedback: {result.reason}")

    # Convenience subclasses preset the with_reference flag
    no_ref = RubricsScoreWithoutReference(llm=llm)
    with_ref = RubricsScoreWithReference(llm=llm)


# ascore is a coroutine, so the calls run inside an event loop
asyncio.run(main())
