Implementation:Explodinggradients Ragas Collections DomainSpecificRubrics Metric
| Field | Value |
|---|---|
| source | Repo |
| domains | Metrics, Evaluation |
| last_updated | 2026-02-10 00:00 GMT |
Overview
DomainSpecificRubrics is a v2 class-based metric that evaluates LLM responses using customizable scoring rubrics on a 1--5 scale, with convenience subclasses RubricsScoreWithoutReference and RubricsScoreWithReference.
Description
DomainSpecificRubrics extends BaseMetric with allowed_values=(1.0, 5.0) and requires a modern InstructorBaseRagasLLM. The metric uses an LLM to evaluate a response against a set of rubric criteria, producing a score from 1 to 5 with detailed textual feedback.
The evaluation process:
- Constructs a RubricScoreInput containing the user input, response, and optional reference/contexts.
- Uses RubricScorePrompt to format the prompt with the rubric criteria appended.
- The LLM generates a RubricScoreOutput with a numeric score and feedback string.
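The three steps above can be sketched without any Ragas dependency. The classes below are hypothetical stand-ins, not the library's RubricScoreInput, RubricScorePrompt, or RubricScoreOutput. Their field names, the prompt wording, and the fake_llm function are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class SketchInput:
    """Stand-in for the metric's input model (field names are assumed)."""
    user_input: Optional[str] = None
    response: Optional[str] = None
    reference: Optional[str] = None
    retrieved_contexts: List[str] = field(default_factory=list)


@dataclass
class SketchOutput:
    """Stand-in for the structured LLM output: a score plus feedback."""
    score: int
    feedback: str


def build_prompt(data: SketchInput, rubrics: Dict[str, str]) -> str:
    """Format the supplied fields, then append the rubric criteria."""
    lines = ["Rate the response against the scoring rubric."]
    if data.user_input is not None:
        lines.append(f"User input: {data.user_input}")
    if data.response is not None:
        lines.append(f"Response: {data.response}")
    if data.reference is not None:
        lines.append(f"Reference: {data.reference}")
    if data.retrieved_contexts:
        lines.append("Contexts: " + " | ".join(data.retrieved_contexts))
    lines.append("Scoring rubric:")
    for key in sorted(rubrics):
        lines.append(f"{key}: {rubrics[key]}")
    return "\n".join(lines)


def evaluate(
    data: SketchInput,
    rubrics: Dict[str, str],
    llm: Callable[[str], SketchOutput],
) -> SketchOutput:
    """Build the prompt, call the LLM, and check the score is in 1..5."""
    result = llm(build_prompt(data, rubrics))
    if not 1 <= result.score <= 5:
        raise ValueError(f"score out of range: {result.score}")
    return result


def fake_llm(prompt: str) -> SketchOutput:
    """Deterministic placeholder so the sketch runs offline."""
    return SketchOutput(score=5, feedback="Matches the top rubric level.")


rubrics = {f"score{i}_description": f"Level {i} criteria" for i in range(1, 6)}
sample = SketchInput(
    user_input="What is the capital of France?",
    response="The capital of France is Paris.",
)
prompt = build_prompt(sample, rubrics)
result = evaluate(sample, rubrics, fake_llm)
print(result.score, result.feedback)
```

In the real metric the structured output is produced by the instructor-based LLM rather than a plain callable; the sketch only shows the order of operations.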
The metric supports two modes:
- Reference-free (default): Uses DEFAULT_REFERENCE_FREE_RUBRICS to evaluate based on general quality criteria.
- Reference-based (with_reference=True): Uses DEFAULT_WITH_REFERENCE_RUBRICS to evaluate against a ground truth.
Custom rubrics can be provided as a dictionary mapping score description keys (e.g., "score1_description" through "score5_description") to criteria text.
RubricsScoreWithoutReference and RubricsScoreWithReference are convenience subclasses that preset the with_reference flag.
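As a rough illustration of how the mode flag, the default rubrics, and the convenience subclasses relate, the sketch below uses stand-in classes. SketchRubricMetric, its two subclasses, the placeholder rubric text, and the check_rubric_keys helper are all hypothetical. The key check in particular is an assumed client-side safeguard, not documented library behavior.

```python
from typing import Dict, List, Optional

# The five keys named in the text: score1_description .. score5_description.
EXPECTED_KEYS: List[str] = [f"score{i}_description" for i in range(1, 6)]

# Placeholder defaults; the real default rubric text lives in the library.
REFERENCE_FREE = {key: f"reference-free criteria for {key}" for key in EXPECTED_KEYS}
WITH_REFERENCE = {key: f"reference-based criteria for {key}" for key in EXPECTED_KEYS}


def check_rubric_keys(rubrics: Dict[str, str]) -> None:
    """Raise if any of the five expected score keys is missing."""
    missing = [key for key in EXPECTED_KEYS if key not in rubrics]
    if missing:
        raise ValueError(f"rubrics is missing keys: {missing}")


class SketchRubricMetric:
    """Stand-in showing how the defaults could depend on with_reference."""

    def __init__(
        self,
        rubrics: Optional[Dict[str, str]] = None,
        with_reference: bool = False,
    ) -> None:
        self.with_reference = with_reference
        if rubrics is None:
            # No custom rubrics: pick the default set for the chosen mode.
            rubrics = WITH_REFERENCE if with_reference else REFERENCE_FREE
        check_rubric_keys(rubrics)
        self.rubrics = dict(rubrics)


class SketchWithoutReference(SketchRubricMetric):
    """Presets with_reference=False."""

    def __init__(self, rubrics: Optional[Dict[str, str]] = None) -> None:
        super().__init__(rubrics=rubrics, with_reference=False)


class SketchWithReference(SketchRubricMetric):
    """Presets with_reference=True."""

    def __init__(self, rubrics: Optional[Dict[str, str]] = None) -> None:
        super().__init__(rubrics=rubrics, with_reference=True)


print(SketchWithReference().with_reference)
print(SketchWithoutReference().rubrics["score1_description"])
```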
Usage
Instantiate the metric with the required llm argument and, optionally, a rubrics dict and the with_reference flag. Every ascore parameter is optional so that a single signature accommodates both evaluation modes.
Code Reference
| Property | Value |
|---|---|
| Source Location | src/ragas/metrics/collections/domain_specific_rubrics/metric.py L1--185 |
| Signatures | class DomainSpecificRubrics(BaseMetric), class RubricsScoreWithoutReference(DomainSpecificRubrics), class RubricsScoreWithReference(DomainSpecificRubrics) |
| Import | from ragas.metrics.collections import DomainSpecificRubrics |
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| user_input | Optional[str] | No | The question or input provided to the system |
| response | Optional[str] | No | The generated response to evaluate |
| retrieved_contexts | Optional[List[str]] | No | Contexts retrieved for generating the response |
| reference_contexts | Optional[List[str]] | No | Reference contexts for evaluation |
| reference | Optional[str] | No | The reference / ground truth answer |
Constructor Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| llm | InstructorBaseRagasLLM | (required) | Modern instructor-based LLM for rubric evaluation |
| rubrics | Optional[Dict[str, str]] | None | Custom rubric definitions; defaults to built-in rubrics |
| with_reference | bool | False | Whether to use reference-based evaluation |
| name | str | "domain_specific_rubrics" | Metric name |
Score Interpretation (Default Rubrics)
| Score | Meaning |
|---|---|
| 1 | Response is entirely incorrect or irrelevant |
| 2 | Response has partial accuracy with major errors |
| 3 | Response is mostly accurate but lacks detail |
| 4 | Response is accurate with minor omissions |
| 5 | Response is completely accurate and thorough |
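When rubric scores are combined with metrics that report values between 0 and 1, it can help to rescale them. The sketch below is not part of the library: normalize_score and describe_score are hypothetical helpers, and the linear mapping (score - 1) / 4 is one assumed convention among several possible ones. The labels are shortened from the table above.

```python
# Short labels taken from the default score interpretation table.
SCORE_LABELS = {
    1: "entirely incorrect or irrelevant",
    2: "partial accuracy with major errors",
    3: "mostly accurate but lacks detail",
    4: "accurate with minor omissions",
    5: "completely accurate and thorough",
}


def normalize_score(score: float) -> float:
    """Map a rubric score in [1, 5] linearly onto [0, 1]."""
    if not 1.0 <= score <= 5.0:
        raise ValueError(f"expected a score between 1 and 5, got {score}")
    return (score - 1.0) / 4.0


def describe_score(score: float) -> str:
    """Return the short label for a whole-number rubric score."""
    return SCORE_LABELS[int(round(score))]


for raw in (1.0, 3.0, 5.0):
    print(raw, normalize_score(raw), describe_score(raw))
```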
Outputs
| Field | Type | Description |
|---|---|---|
| MetricResult.value | float | Rubric score in range 1.0--5.0 |
| MetricResult.reason | str | Detailed feedback from the LLM evaluation |
Usage Examples
from openai import AsyncOpenAI
from ragas.llms.base import llm_factory
from ragas.metrics.collections import DomainSpecificRubrics

# Setup
client = AsyncOpenAI()
llm = llm_factory("gpt-4o-mini", client=client)

# Reference-free evaluation
metric = DomainSpecificRubrics(llm=llm)
result = await metric.ascore(
    user_input="What is the capital of France?",
    response="The capital of France is Paris.",
)
print(f"Score: {result.value}, Feedback: {result.reason}")

# Reference-based evaluation
metric_ref = DomainSpecificRubrics(llm=llm, with_reference=True)
result = await metric_ref.ascore(
    user_input="What is the capital of France?",
    response="The capital of France is Paris.",
    reference="Paris is the capital and largest city of France.",
)

# Custom rubrics
custom_rubrics = {
    "score1_description": "Completely wrong",
    "score2_description": "Mostly wrong with some correct elements",
    "score3_description": "Partially correct",
    "score4_description": "Mostly correct with minor issues",
    "score5_description": "Fully correct and comprehensive",
}
custom_metric = DomainSpecificRubrics(llm=llm, rubrics=custom_rubrics)
result = await custom_metric.ascore(
    user_input="Explain photosynthesis.",
    response="Plants convert sunlight to energy using chlorophyll.",
)

# Convenience subclasses
from ragas.metrics.collections.domain_specific_rubrics.metric import (
    RubricsScoreWithoutReference,
    RubricsScoreWithReference,
)
no_ref = RubricsScoreWithoutReference(llm=llm)
with_ref = RubricsScoreWithReference(llm=llm)
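The snippets above use top-level await, which works in a notebook or an async REPL but not in a plain script. One way to run the same calls from a script is to wrap them in a coroutine and pass it to asyncio.run. In the sketch below, StubMetric and StubResult are hypothetical stand-ins that only mimic the ascore call shape and the value/reason fields, so the example runs offline. In real use the metric would be a DomainSpecificRubrics instance.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StubResult:
    """Mimics the two result fields described in the Outputs table."""
    value: float
    reason: str


class StubMetric:
    """Offline stand-in exposing an async ascore method."""

    async def ascore(
        self,
        user_input: Optional[str] = None,
        response: Optional[str] = None,
        reference: Optional[str] = None,
    ) -> StubResult:
        await asyncio.sleep(0)  # yield control, as a real network call would
        score = 5.0 if response else 1.0
        return StubResult(value=score, reason="stubbed feedback")


async def score_all(metric: StubMetric, samples: List[Dict[str, str]]) -> List[StubResult]:
    """Score every sample concurrently; results keep the input order."""
    tasks = [metric.ascore(**sample) for sample in samples]
    return list(await asyncio.gather(*tasks))


samples = [
    {
        "user_input": "What is the capital of France?",
        "response": "The capital of France is Paris.",
    },
    {"user_input": "Explain photosynthesis.", "response": ""},
]

results = asyncio.run(score_all(StubMetric(), samples))
for sample, res in zip(samples, results):
    print(sample["user_input"], res.value, res.reason)
```

Gathering the calls keeps the samples independent of each other; whether concurrent requests are appropriate depends on the rate limits of the LLM provider in use.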
Related Pages
- Explodinggradients_Ragas_Collections_BaseMetric_Class -- Base class for all v2 metrics
- Explodinggradients_Ragas_Collections_ContextRelevance_Metric -- Dual-judge evaluation approach
- Explodinggradients_Ragas_Collections_ResponseGroundedness_Metric -- Groundedness evaluation
- Explodinggradients_Ragas_Collections_AnswerRelevancy_Metric -- Answer relevancy evaluation