Implementation:Explodinggradients_Ragas_DiscreteMetric_Class
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| explodinggradients/ragas | LLM Evaluation, Metric Design | 2026-02-10 |
Overview
The DiscreteMetric Class provides an LLM-based evaluation metric that classifies text outputs into predefined categorical values using structured prompts and constrained LLM generation.
Description
DiscreteMetric (lines 18-126 of src/ragas/metrics/discrete.py) is a dataclass that inherits from SimpleLLMMetric and DiscreteValidator. It represents evaluation metrics whose output is one of a finite set of allowed string values. During initialization (__post_init__), it dynamically constructs a Pydantic response model with a value field constrained to Literal[allowed_values] and a reason field. The score() and ascore() methods (inherited from SimpleLLMMetric) format the prompt template with provided keyword arguments, call the LLM with the constrained response model, and return a MetricResult containing the classification value and reasoning. The class also provides get_correlation() using Cohen's Kappa and save()/load() for metric persistence.
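The key idea in `__post_init__` is that the response schema is built at runtime so the LLM can only return one of the allowed categories. A minimal standard-library sketch of that constraint (the real class builds a Pydantic model with a `Literal[allowed_values]`-typed `value` field; `DiscreteResponse` below is an illustrative stand-in, not the ragas model):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DiscreteResponse:
    """Illustrative stand-in for the dynamically built response model."""
    value: str
    reason: str
    allowed_values: List[str] = field(default_factory=lambda: ["pass", "fail"])

    def __post_init__(self):
        # Mimic the Literal[allowed_values] constraint: reject any value
        # outside the allowed categories at construction time.
        if self.value not in self.allowed_values:
            raise ValueError(
                f"value must be one of {self.allowed_values}, got {self.value!r}"
            )


ok = DiscreteResponse(value="pass", reason="Answer matches the reference.")
print(ok.value)  # pass
```

Constraining the schema (rather than parsing free-form text) is what lets the metric guarantee its output is always one of the configured categories.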
Usage
Use the DiscreteMetric class when:
- Creating LLM-as-judge metrics with categorical outputs (pass/fail, good/bad/excellent, etc.)
- Needing both a classification value and reasoning explanation from the LLM
- Validating LLM judge performance against gold-standard human labels
- Saving and sharing metric configurations across teams
Code Reference
Source Location: src/ragas/metrics/discrete.py, lines 18-126
Signature:
```python
@dataclass(repr=False)
class DiscreteMetric(SimpleLLMMetric, DiscreteValidator):
    allowed_values: List[str] = field(default_factory=lambda: ["pass", "fail"])
```
Full constructor parameters (inherited from SimpleLLMMetric and SimpleBaseMetric):
```python
DiscreteMetric(
    name: str,
    prompt: Optional[Union[str, Prompt]] = None,
    allowed_values: List[str] = ["pass", "fail"],
)
```
Import:
```python
from ragas.metrics import DiscreteMetric
```
Key Methods:
| Method | Signature | Description |
|---|---|---|
| `score` | `(llm: BaseRagasLLM, **kwargs) -> MetricResult` | Synchronously score inputs using the LLM judge (inherited from `SimpleLLMMetric`) |
| `ascore` | `(llm: BaseRagasLLM, **kwargs) -> MetricResult` | Asynchronously score inputs using the LLM judge (inherited from `SimpleLLMMetric`) |
| `get_correlation` | `(gold_labels: List[str], predictions: List[str]) -> float` | Compute Cohen's Kappa between gold labels and predictions |
| `save` | `(path: Optional[str] = None) -> None` | Serialize the metric configuration to JSON (inherited from `SimpleLLMMetric`) |
| `load` | `(path: str, embedding_model: Optional = None) -> DiscreteMetric` | Load a metric from a JSON file (class method) |
| `get_variables` | `() -> List[str]` | Extract placeholder variable names from the prompt template |
| `batch_score` | `(inputs: List[Dict], **kwargs) -> List[MetricResult]` | Score multiple inputs sequentially |
| `abatch_score` | `(inputs: List[Dict], **kwargs) -> List[MetricResult]` | Score multiple inputs concurrently |
I/O Contract
Inputs (constructor):
| Parameter | Type | Required | Description |
|---|---|---|---|
| `name` | `str` | Yes | Name identifier for the metric |
| `prompt` | `Optional[Union[str, Prompt]]` | No | Prompt template with `{placeholder}` variables for the evaluation criteria |
| `allowed_values` | `List[str]` | No | Allowed output categories (default: `["pass", "fail"]`) |

Inputs (score/ascore):
| Parameter | Type | Required | Description |
|---|---|---|---|
| `llm` | `BaseRagasLLM` or `InstructorBaseRagasLLM` | Yes | The LLM instance to use as the judge |
| `**kwargs` | `Any` | Yes | Values matching the prompt template's `{placeholder}` variables |

Outputs:
| Output | Type | Description |
|---|---|---|
| Metric result | `MetricResult` | Contains `.value` (one of the allowed values) and `.reason` (the LLM's explanation) |
| Correlation score | `float` | Cohen's Kappa agreement score (from `get_correlation()`) |
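The contract that `**kwargs` must cover the prompt's `{placeholder}` variables can be checked with the standard library's `string.Formatter`. This is a sketch of what `get_variables`-style extraction might look like, not the ragas implementation:

```python
from string import Formatter
from typing import List


def extract_placeholders(template: str) -> List[str]:
    # Pull the named {placeholder} fields out of a prompt template.
    # Formatter().parse yields (literal_text, field_name, spec, conversion).
    return [name for _, name, _, _ in Formatter().parse(template) if name]


template = (
    "Given the question: {question}\n"
    "And the answer: {answer}\n"
    "Is the answer correct?"
)
variables = extract_placeholders(template)
print(variables)  # ['question', 'answer']

# Validate that the keyword arguments cover every placeholder
# before formatting the prompt and calling the LLM.
kwargs = {"question": "What is the capital of France?", "answer": "Paris"}
missing = set(variables) - set(kwargs)
assert not missing, f"missing prompt variables: {missing}"
```

Checking coverage up front turns a confusing downstream formatting error into an immediate, descriptive one.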
Usage Examples
Basic pass/fail metric:
```python
from openai import OpenAI
from ragas.llms import llm_factory
from ragas.metrics import DiscreteMetric

client = OpenAI(api_key="your-api-key")
llm = llm_factory("gpt-4o-mini", client=client)

metric = DiscreteMetric(
    name="answer_correctness",
    prompt=(
        "Given the question: {question}\n"
        "And the answer: {answer}\n"
        "Is the answer correct? Respond with 'pass' or 'fail'."
    ),
    allowed_values=["pass", "fail"],
)

result = metric.score(
    llm=llm,
    question="What is the capital of France?",
    answer="Paris",
)
print(result.value)   # "pass"
print(result.reason)  # e.g. "Paris is indeed the capital of France."
```
Multi-category metric:
```python
from ragas.metrics import DiscreteMetric

metric = DiscreteMetric(
    name="quality_check",
    prompt=(
        "Evaluate the quality of this response: {response}. "
        "Rate as 'excellent', 'good', or 'poor'."
    ),
    allowed_values=["excellent", "good", "poor"],
)

result = metric.score(
    llm=llm,  # reuses the LLM judge created in the previous example
    response="Python is a high-level, interpreted programming language known for readability.",
)
print(result.value)  # e.g. "excellent"
```
Validating against gold labels:
```python
gold_labels = ["pass", "fail", "pass", "pass", "fail"]
predictions = ["pass", "fail", "pass", "fail", "fail"]

kappa = metric.get_correlation(gold_labels, predictions)
print(f"Cohen's Kappa: {kappa:.3f}")  # 0.615 for these labels
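For intuition, the statistic behind `get_correlation` can be reproduced by hand. Cohen's Kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each rater's label frequencies. A pure-Python sketch of the arithmetic (not the ragas implementation):

```python
from collections import Counter
from typing import List


def cohens_kappa(gold: List[str], pred: List[str]) -> float:
    """Cohen's Kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(gold)
    # Observed agreement: fraction of positions where the labels match.
    p_o = sum(g == p for g, p in zip(gold, pred)) / n
    # Expected chance agreement from each rater's label frequencies.
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    p_e = sum(
        (gold_counts[label] / n) * (pred_counts[label] / n)
        for label in set(gold) | set(pred)
    )
    return (p_o - p_e) / (1 - p_e)


gold_labels = ["pass", "fail", "pass", "pass", "fail"]
predictions = ["pass", "fail", "pass", "fail", "fail"]
print(round(cohens_kappa(gold_labels, predictions), 3))  # 0.615
```

Here p_o = 4/5 = 0.8 and p_e = (3/5)(2/5) + (2/5)(3/5) = 0.48, giving (0.8 - 0.48) / 0.52 ≈ 0.615. Kappa corrects raw agreement for chance, which is why it is preferred over simple accuracy when validating an LLM judge against human labels.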
Saving and loading a metric:
```python
# Save
metric.save("./metrics/answer_correctness.json")

# Load
loaded_metric = DiscreteMetric.load("./metrics/answer_correctness.json")
```
Related Pages
- Principle:Explodinggradients_Ragas_LLM_as_Judge_Metric
- Environment:Explodinggradients_Ragas_Python_Runtime_Environment
- Environment:Explodinggradients_Ragas_LLM_Provider_Environment
- Environment:Explodinggradients_Ragas_Optional_Metrics_Environment
- Heuristic:Explodinggradients_Ragas_LLM_Temperature_Defaults
- Heuristic:Explodinggradients_Ragas_Failed_Metrics_Return_NaN