
Implementation:Explodinggradients Ragas DiscreteMetric Class

From Leeroopedia


Knowledge Sources: explodinggradients/ragas
Domains: LLM Evaluation, Metric Design
Last Updated: 2026-02-10

Overview

The DiscreteMetric class provides an LLM-based evaluation metric that classifies text outputs into predefined categorical values using structured prompts and constrained LLM generation.

Description

DiscreteMetric (lines 18-126 of src/ragas/metrics/discrete.py) is a dataclass that inherits from SimpleLLMMetric and DiscreteValidator. It represents evaluation metrics whose output is one of a finite set of allowed string values. During initialization (__post_init__), it dynamically constructs a Pydantic response model with a value field constrained to Literal[allowed_values] and a reason field. The score() and ascore() methods (inherited from SimpleLLMMetric) format the prompt template with provided keyword arguments, call the LLM with the constrained response model, and return a MetricResult containing the classification value and reasoning. The class also provides get_correlation() using Cohen's Kappa and save()/load() for metric persistence.
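To make the dynamic response-model construction concrete, here is a minimal sketch of what __post_init__ might build, using pydantic's create_model. The model name and exact construction details are assumptions; only the field names (value, reason) and the Literal constraint come from the description above.

```python
# Sketch of the Literal-constrained response model DiscreteMetric builds
# in __post_init__. Model name and construction details are hypothetical.
from typing import Literal
from pydantic import create_model

allowed_values = ["pass", "fail"]

ResponseModel = create_model(
    "DiscreteResponseModel",
    reason=(str, ...),                            # LLM's explanation (required)
    value=(Literal[tuple(allowed_values)], ...),  # constrained classification
)

resp = ResponseModel(reason="Paris is the capital of France.", value="pass")
print(resp.value)  # "pass"

# Values outside allowed_values fail validation:
try:
    ResponseModel(reason="unsure", value="maybe")
except Exception:
    print("'maybe' rejected")
```

Because the LLM is forced to respond through this schema, the classification can never fall outside allowed_values.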

Usage

Use the DiscreteMetric class when:

  • Creating LLM-as-judge metrics with categorical outputs (pass/fail, good/bad/excellent, etc.)
  • Needing both a classification value and reasoning explanation from the LLM
  • Validating LLM judge performance against gold-standard human labels
  • Saving and sharing metric configurations across teams

Code Reference

Source Location: src/ragas/metrics/discrete.py, lines 18-126

Signature:

@dataclass(repr=False)
class DiscreteMetric(SimpleLLMMetric, DiscreteValidator):
    allowed_values: List[str] = field(default_factory=lambda: ["pass", "fail"])

Full constructor parameters (inherited from SimpleLLMMetric and SimpleBaseMetric):

DiscreteMetric(
    name: str,
    prompt: Optional[Union[str, Prompt]] = None,
    allowed_values: List[str] = ["pass", "fail"],
)

Import:

from ragas.metrics import DiscreteMetric

Key Methods:

  • score(llm: BaseRagasLLM, **kwargs) -> MetricResult: synchronously score inputs using the LLM judge (inherited from SimpleLLMMetric)
  • ascore(llm: BaseRagasLLM, **kwargs) -> MetricResult: asynchronously score inputs using the LLM judge (inherited from SimpleLLMMetric)
  • get_correlation(gold_labels: List[str], predictions: List[str]) -> float: compute Cohen's Kappa between gold labels and predictions
  • save(path: Optional[str] = None) -> None: serialize the metric configuration to JSON (inherited from SimpleLLMMetric)
  • load(path: str, embedding_model: Optional = None) -> DiscreteMetric: load a metric from a JSON file (class method)
  • get_variables() -> List[str]: extract placeholder variable names from the prompt template
  • batch_score(inputs: List[Dict], **kwargs) -> List[MetricResult]: score multiple inputs sequentially
  • abatch_score(inputs: List[Dict], **kwargs) -> List[MetricResult]: score multiple inputs concurrently
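The sequential-versus-concurrent distinction between batch_score and abatch_score can be illustrated with a stub judge; every name below is hypothetical and none of this is the ragas implementation, only a sketch of the calling pattern.

```python
# Hypothetical stub contrasting batch_score (sequential) with
# abatch_score (concurrent). Not the ragas implementation.
import asyncio

def judge(row: dict) -> str:
    # stand-in for an LLM judge call
    return "pass" if row["answer"] else "fail"

def batch_score(rows):
    # scores one row after another, like batch_score
    return [judge(r) for r in rows]

async def abatch_score(rows):
    # schedules all rows concurrently, like abatch_score
    async def one(r):
        return judge(r)
    return await asyncio.gather(*(one(r) for r in rows))

rows = [{"answer": "Paris"}, {"answer": ""}]
print(batch_score(rows))                # ['pass', 'fail']
print(asyncio.run(abatch_score(rows)))  # ['pass', 'fail']
```

Both return results in input order; the async variant simply overlaps the (here trivial) judge calls.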

I/O Contract

Inputs (constructor):

  • name (str, required): name identifier for the metric
  • prompt (Optional[Union[str, Prompt]], optional): prompt template with {placeholder} variables for the evaluation criteria
  • allowed_values (List[str], optional): allowed output categories (default: ["pass", "fail"])

Inputs (score/ascore):

  • llm (BaseRagasLLM or InstructorBaseRagasLLM, required): the LLM instance to use as the judge
  • **kwargs (Any, required): values matching the prompt template's {placeholder} variables
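The mapping from {placeholder} variables to keyword arguments is what get_variables() exposes. A plausible sketch using the standard library's string.Formatter (ragas may parse templates differently; this is an assumption about the mechanism, not the actual code):

```python
# Sketch: extracting {placeholder} names from a prompt template,
# approximating what get_variables() returns. Parsing approach assumed.
from string import Formatter

prompt = ("Given the question: {question}\n"
          "And the answer: {answer}\n"
          "Is the answer correct?")

variables = [field for _, field, _, _ in Formatter().parse(prompt) if field]
print(variables)  # ['question', 'answer']
```

Each extracted name must then be supplied as a keyword argument to score()/ascore().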

Outputs:

  • Metric result (MetricResult): contains .value (one of the allowed values) and .reason (the LLM's explanation)
  • Correlation score (float): Cohen's Kappa agreement score, returned by get_correlation()

Usage Examples

Basic pass/fail metric:

from openai import OpenAI
from ragas.metrics import DiscreteMetric
from ragas.llms import llm_factory

client = OpenAI(api_key="your-api-key")
llm = llm_factory("gpt-4o-mini", client=client)

metric = DiscreteMetric(
    name="answer_correctness",
    prompt="Given the question: {question}\nAnd the answer: {answer}\n"
           "Is the answer correct? Respond with 'pass' or 'fail'.",
    allowed_values=["pass", "fail"],
)

result = metric.score(
    llm=llm,
    question="What is the capital of France?",
    answer="Paris",
)
print(result.value)   # "pass"
print(result.reason)  # "Paris is indeed the capital of France."

Multi-category metric:

from ragas.metrics import DiscreteMetric

# reuses the `llm` judge created in the previous example

metric = DiscreteMetric(
    name="quality_check",
    prompt="Evaluate the quality of this response: {response}. "
           "Rate as 'excellent', 'good', or 'poor'.",
    allowed_values=["excellent", "good", "poor"],
)

result = metric.score(
    llm=llm,
    response="Python is a high-level, interpreted programming language known for readability.",
)
print(result.value)   # "excellent"

Validating against gold labels:

gold_labels = ["pass", "fail", "pass", "pass", "fail"]
predictions = ["pass", "fail", "pass", "fail", "fail"]

kappa = metric.get_correlation(gold_labels, predictions)
print(f"Cohen's Kappa: {kappa:.3f}")  # 0.615 for these labels
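To sanity-check what get_correlation() returns, Cohen's Kappa can be computed by hand for the same labels: observed agreement minus chance agreement, normalized by one minus chance agreement.

```python
# Computing Cohen's Kappa directly for the labels above, to verify
# the value returned by get_correlation().
from collections import Counter

gold = ["pass", "fail", "pass", "pass", "fail"]
pred = ["pass", "fail", "pass", "fail", "fail"]

n = len(gold)
po = sum(g == p for g, p in zip(gold, pred)) / n  # observed agreement: 4/5

gc, pc = Counter(gold), Counter(pred)
pe = sum(gc[c] * pc[c] for c in set(gold) | set(pred)) / n**2  # chance: 0.48

kappa = (po - pe) / (1 - pe)
print(f"{kappa:.3f}")  # 0.615
```

A kappa of 0 means no agreement beyond chance; 1 means perfect agreement.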

Saving and loading a metric:

# Save
metric.save("./metrics/answer_correctness.json")

# Load
loaded_metric = DiscreteMetric.load("./metrics/answer_correctness.json")
