
Implementation:Explodinggradients Ragas DiscreteMetric Class

From Leeroopedia


Knowledge Sources: explodinggradients/ragas
Domains: LLM Evaluation, Metric Design
Last Updated: 2026-02-10

Overview

The DiscreteMetric class provides an LLM-based evaluation metric that classifies text outputs into predefined categorical values using structured prompts and constrained LLM generation.

Description

DiscreteMetric (lines 18-126 of src/ragas/metrics/discrete.py) is a dataclass that inherits from SimpleLLMMetric and DiscreteValidator. It represents evaluation metrics whose output is one of a finite set of allowed string values. During initialization (__post_init__), it dynamically constructs a Pydantic response model with a value field constrained to Literal[allowed_values] and a reason field. The score() and ascore() methods (inherited from SimpleLLMMetric) format the prompt template with provided keyword arguments, call the LLM with the constrained response model, and return a MetricResult containing the classification value and reasoning. The class also provides get_correlation() using Cohen's Kappa and save()/load() for metric persistence.
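To make the dynamic response-model construction concrete, here is a minimal sketch of what __post_init__ might build, using pydantic's create_model. The model name and exact construction details are assumptions; only the field names (value, reason) and the Literal constraint come from the description above.

```python
# Sketch of the Literal-constrained response model DiscreteMetric builds
# in __post_init__. Model name and construction details are hypothetical.
from typing import Literal
from pydantic import create_model

allowed_values = ["pass", "fail"]

ResponseModel = create_model(
    "DiscreteResponseModel",
    reason=(str, ...),                            # LLM's explanation (required)
    value=(Literal[tuple(allowed_values)], ...),  # constrained classification
)

resp = ResponseModel(reason="Paris is the capital of France.", value="pass")
print(resp.value)  # "pass"

# Values outside allowed_values fail validation:
try:
    ResponseModel(reason="unsure", value="maybe")
except Exception:
    print("'maybe' rejected")
```

Because the LLM is forced to respond through this schema, the classification can never fall outside allowed_values.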

Usage

Use the DiscreteMetric class when:

  • Creating LLM-as-judge metrics with categorical outputs (pass/fail, good/bad/excellent, etc.)
  • Needing both a classification value and reasoning explanation from the LLM
  • Validating LLM judge performance against gold-standard human labels
  • Saving and sharing metric configurations across teams

Code Reference

Source Location: src/ragas/metrics/discrete.py, lines 18-126

Signature:

@dataclass(repr=False)
class DiscreteMetric(SimpleLLMMetric, DiscreteValidator):
    allowed_values: List[str] = field(default_factory=lambda: ["pass", "fail"])

Full constructor parameters (inherited from SimpleLLMMetric and SimpleBaseMetric):

DiscreteMetric(
    name: str,
    prompt: Optional[Union[str, Prompt]] = None,
    allowed_values: List[str] = ["pass", "fail"],
)

Import:

from ragas.metrics import DiscreteMetric

Key Methods:

  • score(llm: BaseRagasLLM, **kwargs) -> MetricResult: synchronously score inputs using the LLM judge (inherited from SimpleLLMMetric)
  • ascore(llm: BaseRagasLLM, **kwargs) -> MetricResult: asynchronously score inputs using the LLM judge (inherited from SimpleLLMMetric)
  • get_correlation(gold_labels: List[str], predictions: List[str]) -> float: compute Cohen's Kappa between gold labels and predictions
  • save(path: Optional[str] = None) -> None: serialize the metric configuration to JSON (inherited from SimpleLLMMetric)
  • load(path: str, embedding_model: Optional = None) -> DiscreteMetric: load a metric from a JSON file (class method)
  • get_variables() -> List[str]: extract placeholder variable names from the prompt template
  • batch_score(inputs: List[Dict], **kwargs) -> List[MetricResult]: score multiple inputs sequentially
  • abatch_score(inputs: List[Dict], **kwargs) -> List[MetricResult]: score multiple inputs concurrently
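The sequential-versus-concurrent distinction between batch_score and abatch_score can be illustrated with a stub judge; every name below is hypothetical and none of this is the ragas implementation, only a sketch of the calling pattern.

```python
# Hypothetical stub contrasting batch_score (sequential) with
# abatch_score (concurrent). Not the ragas implementation.
import asyncio

def judge(row: dict) -> str:
    # stand-in for an LLM judge call
    return "pass" if row["answer"] else "fail"

def batch_score(rows):
    # scores one row after another, like batch_score
    return [judge(r) for r in rows]

async def abatch_score(rows):
    # schedules all rows concurrently, like abatch_score
    async def one(r):
        return judge(r)
    return await asyncio.gather(*(one(r) for r in rows))

rows = [{"answer": "Paris"}, {"answer": ""}]
print(batch_score(rows))                # ['pass', 'fail']
print(asyncio.run(abatch_score(rows)))  # ['pass', 'fail']
```

Both return results in input order; the async variant simply overlaps the (here trivial) judge calls.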

I/O Contract

Inputs (constructor):

  • name (str, required): name identifier for the metric
  • prompt (Optional[Union[str, Prompt]], optional): prompt template with {placeholder} variables for the evaluation criteria
  • allowed_values (List[str], optional): allowed output categories (default: ["pass", "fail"])

Inputs (score/ascore):

  • llm (BaseRagasLLM or InstructorBaseRagasLLM, required): the LLM instance to use as the judge
  • **kwargs (Any, required): values matching the prompt template's {placeholder} variables
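The mapping from {placeholder} variables to keyword arguments is what get_variables() exposes. A plausible sketch using the standard library's string.Formatter (ragas may parse templates differently; this is an assumption about the mechanism, not the actual code):

```python
# Sketch: extracting {placeholder} names from a prompt template,
# approximating what get_variables() returns. Parsing approach assumed.
from string import Formatter

prompt = ("Given the question: {question}\n"
          "And the answer: {answer}\n"
          "Is the answer correct?")

variables = [field for _, field, _, _ in Formatter().parse(prompt) if field]
print(variables)  # ['question', 'answer']
```

Each extracted name must then be supplied as a keyword argument to score()/ascore().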

Outputs:

  • Metric result (MetricResult): contains .value (one of the allowed values) and .reason (the LLM's explanation)
  • Correlation score (float): Cohen's Kappa agreement score, returned by get_correlation()

Usage Examples

Basic pass/fail metric:

from openai import OpenAI
from ragas.metrics import DiscreteMetric
from ragas.llms import llm_factory

client = OpenAI(api_key="your-api-key")
llm = llm_factory("gpt-4o-mini", client=client)

metric = DiscreteMetric(
    name="answer_correctness",
    prompt="Given the question: {question}\nAnd the answer: {answer}\n"
           "Is the answer correct? Respond with 'pass' or 'fail'.",
    allowed_values=["pass", "fail"],
)

result = metric.score(
    llm=llm,
    question="What is the capital of France?",
    answer="Paris",
)
print(result.value)   # "pass"
print(result.reason)  # "Paris is indeed the capital of France."

Multi-category metric:

from ragas.metrics import DiscreteMetric

# reuses the `llm` judge created in the previous example

metric = DiscreteMetric(
    name="quality_check",
    prompt="Evaluate the quality of this response: {response}. "
           "Rate as 'excellent', 'good', or 'poor'.",
    allowed_values=["excellent", "good", "poor"],
)

result = metric.score(
    llm=llm,
    response="Python is a high-level, interpreted programming language known for readability.",
)
print(result.value)   # "excellent"

Validating against gold labels:

gold_labels = ["pass", "fail", "pass", "pass", "fail"]
predictions = ["pass", "fail", "pass", "fail", "fail"]

kappa = metric.get_correlation(gold_labels, predictions)
print(f"Cohen's Kappa: {kappa:.3f}")  # 0.615 for these labels
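To sanity-check what get_correlation() returns, Cohen's Kappa can be computed by hand for the same labels: observed agreement minus chance agreement, normalized by one minus chance agreement.

```python
# Computing Cohen's Kappa directly for the labels above, to verify
# the value returned by get_correlation().
from collections import Counter

gold = ["pass", "fail", "pass", "pass", "fail"]
pred = ["pass", "fail", "pass", "fail", "fail"]

n = len(gold)
po = sum(g == p for g, p in zip(gold, pred)) / n  # observed agreement: 4/5

gc, pc = Counter(gold), Counter(pred)
pe = sum(gc[c] * pc[c] for c in set(gold) | set(pred)) / n**2  # chance: 0.48

kappa = (po - pe) / (1 - pe)
print(f"{kappa:.3f}")  # 0.615
```

A kappa of 0 means no agreement beyond chance; 1 means perfect agreement.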

Saving and loading a metric:

# Save
metric.save("./metrics/answer_correctness.json")

# Load
loaded_metric = DiscreteMetric.load("./metrics/answer_correctness.json")
