Principle:Explodinggradients Ragas Functional Metric Definition
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| explodinggradients/ragas | LLM Evaluation, Metric Design | 2026-02-10 |
Overview
Functional Metric Definition is the principle of creating evaluation metrics from plain Python functions using decorators, where the function's parameters become metric inputs and its return value becomes the metric score.
Description
While LLM-as-judge metrics are powerful for subjective evaluation, many evaluation criteria can be expressed as straightforward Python logic: string matching, length checks, keyword detection, format validation, or any other deterministic computation. Functional Metric Definition enables these computations to be wrapped in the same metric interface as LLM-based metrics, creating a uniform evaluation API regardless of the underlying scoring mechanism.
Decorator Pattern: A decorator (such as @discrete_metric) transforms a regular Python function into a full metric object with score(), ascore(), batch_score(), and abatch_score() methods. The decorated function itself becomes the scoring logic. This eliminates the need to write boilerplate class definitions for simple metrics.
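As a rough illustration of the decorator pattern described above, the following sketch uses only the standard library. The names `simple_metric` and `FunctionMetric` are hypothetical and are not part of Ragas; the real `@discrete_metric` decorator does more (validation, result wrapping, async variants).

```python
from typing import Any, Callable, Dict, List, Optional


class FunctionMetric:
    """Hypothetical stand-in for a metric object built from a function."""

    def __init__(self, func: Callable[..., Any], name: str) -> None:
        self.func = func
        self.name = name

    def score(self, **kwargs: Any) -> Any:
        # The decorated function itself is the scoring logic.
        return self.func(**kwargs)

    def batch_score(self, inputs: List[Dict[str, Any]]) -> List[Any]:
        # Score each set of keyword arguments in order.
        return [self.score(**item) for item in inputs]


def simple_metric(
    name: Optional[str] = None,
) -> Callable[[Callable[..., Any]], FunctionMetric]:
    """Illustrative decorator factory (an assumption, not the Ragas code)."""

    def decorator(func: Callable[..., Any]) -> FunctionMetric:
        # Fall back to the function's own name when none is given.
        return FunctionMetric(func, name or func.__name__)

    return decorator


@simple_metric(name="length_check")
def length_check(response: str, max_chars: int = 20) -> str:
    return "pass" if len(response) <= max_chars else "fail"


print(length_check.score(response="short"))  # pass
```

No class definition is written for `length_check` itself; the decorator supplies the metric object around the function.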
Parameter Introspection: The decorator inspects the function's signature and type annotations to automatically create a Pydantic validation model. When score() is called with keyword arguments, these are validated against the function's expected parameters before execution. This provides clear error messages for missing or incorrectly typed inputs without any manual validation code.
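The idea of deriving validation from a signature can be sketched with the standard library's `inspect` module. This is a simplified stand-in for the dynamically built Pydantic model, not the library's implementation: the helper `validate_inputs` is hypothetical and only checks plain class annotations such as `str` or `bool`.

```python
import inspect
from typing import Any, Callable, Dict


def validate_inputs(func: Callable[..., Any], kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Check kwargs against func's signature (hypothetical helper)."""
    sig = inspect.signature(func)
    unexpected = set(kwargs) - set(sig.parameters)
    if unexpected:
        raise ValueError(f"unexpected inputs: {sorted(unexpected)}")

    validated: Dict[str, Any] = {}
    for pname, param in sig.parameters.items():
        if pname in kwargs:
            value = kwargs[pname]
        elif param.default is not inspect.Parameter.empty:
            value = param.default  # fill in the declared default
        else:
            raise ValueError(f"missing required input: {pname!r}")

        expected = param.annotation
        # Only plain classes are checked; unannotated parameters are skipped.
        if (
            expected is not inspect.Parameter.empty
            and isinstance(expected, type)
            and not isinstance(value, expected)
        ):
            raise TypeError(
                f"input {pname!r} expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )
        validated[pname] = value
    return validated


def keyword_present(response: str, keyword: str, case_sensitive: bool = False) -> bool:
    if not case_sensitive:
        response, keyword = response.lower(), keyword.lower()
    return keyword in response


inputs = validate_inputs(keyword_present, {"response": "Hello", "keyword": "hell"})
print(keyword_present(**inputs))  # True
```

A real Pydantic model would additionally coerce values and handle generic types; the sketch only shows where the field names, types, and defaults come from.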
Result Wrapping: The function can return a plain value (a string, number, or list) and it will be automatically wrapped in a MetricResult object. Alternatively, the function can return a MetricResult directly for full control over the value and reason fields.
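A minimal sketch of the wrapping step follows. The `MetricResult` dataclass here is a local stand-in that carries only the `value` and `reason` fields named above, and `wrap_result` is a hypothetical helper; neither is the Ragas class or function.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class MetricResult:
    """Local stand-in with the two fields mentioned in the text."""

    value: Any
    reason: Optional[str] = None


def wrap_result(raw: Any) -> MetricResult:
    """Return raw unchanged if it is already a MetricResult, else wrap it."""
    if isinstance(raw, MetricResult):
        return raw
    return MetricResult(value=raw, reason=None)


plain = wrap_result("pass")
explicit = wrap_result(MetricResult(value="fail", reason="response was empty"))

print(plain.value, plain.reason)        # pass None
print(explicit.value, explicit.reason)  # fail response was empty
```

Returning a plain value keeps simple metrics short, while returning a `MetricResult` lets the function attach an explanation.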
Validation of Output Values: For discrete metrics, the output is validated against the allowed values list. If the function returns a value not in the allowed set, an error result is returned with a descriptive message. This ensures that metrics always produce valid, expected output categories.
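The output check can be sketched as below. The shape of the error result is an assumption: this sketch sets `value` to `None` and puts the message in `reason`, which may differ from what Ragas actually returns. `MetricResult` and `check_allowed` are local, hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class MetricResult:
    """Local stand-in with value and reason fields."""

    value: Any
    reason: Optional[str] = None


def check_allowed(result: MetricResult, allowed_values: List[str]) -> MetricResult:
    """Pass valid results through; replace invalid ones with an error result."""
    if result.value in allowed_values:
        return result
    # Assumption: an error result carries no value, only an explanation.
    return MetricResult(
        value=None,
        reason=f"Metric returned {result.value!r}, expected one of {allowed_values}",
    )


allowed = ["pass", "fail"]
ok = check_allowed(MetricResult("pass"), allowed)
bad = check_allowed(MetricResult("maybe"), allowed)

print(ok.value)    # pass
print(bad.reason)  # Metric returned 'maybe', expected one of ['pass', 'fail']
```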
Dual Callable Nature: The metric instance remains callable as the original function (via __call__) for direct invocation, while also exposing score(**kwargs) for the standard metric interface with validation. This makes the metric useful both as a standalone function and as part of an evaluation pipeline.
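The two call paths can be illustrated with a small class. `DualMetric` is a hypothetical name, and the dictionary returned by `score()` stands in for a `MetricResult`; the sketch only shows how `__call__` and `score(**kwargs)` can coexist on one object.

```python
import inspect
from typing import Any, Callable, Dict


class DualMetric:
    """Hypothetical wrapper exposing both direct calls and score()."""

    def __init__(self, func: Callable[..., Any]) -> None:
        self.func = func
        self.signature = inspect.signature(func)

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Direct invocation: behaves like the original function.
        return self.func(*args, **kwargs)

    def score(self, **kwargs: Any) -> Dict[str, Any]:
        # Pipeline path: keyword-only, checked against the signature, wrapped.
        try:
            bound = self.signature.bind(**kwargs)
        except TypeError as exc:
            raise ValueError(f"invalid inputs: {exc}") from exc
        return {"value": self.func(*bound.args, **bound.kwargs), "reason": None}


@DualMetric
def exact_match(response: str, expected: str) -> str:
    return "pass" if response.strip() == expected.strip() else "fail"


print(exact_match("Paris ", "Paris"))                        # pass
print(exact_match.score(response="Paris", expected="Rome"))  # {'value': 'fail', 'reason': None}
```

The direct call is convenient in unit tests and ad hoc scripts, while `score()` gives an evaluation pipeline one uniform entry point.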
Usage
Use the Functional Metric Definition principle when:
- Creating evaluation metrics based on deterministic Python logic (no LLM needed)
- Wanting the same score()/ascore() interface as LLM-based metrics
- Needing automatic input validation based on function signatures
- Building metrics that can be used in both standalone and pipeline contexts
- Defining custom pass/fail or multi-category checks with pure logic
Theoretical Basis
The theoretical foundation combines the Decorator Pattern with runtime introspection:
PROCEDURE create_functional_metric(func, name, allowed_values):
    1. Introspect the function:
        Extract parameter names, types, and defaults from the signature
        Determine if the function is async
    2. Create a validation model:
        Build a dynamic Pydantic model from the function's parameters
        Each parameter becomes a model field with its type and default
    3. Create the metric instance:
        Set name (from parameter or function name)
        Set allowed_values for output validation
        Store the original function reference
    4. Define the scoring workflow:
        PROCEDURE score(**kwargs):
            a. Validate inputs against the Pydantic model
            b. Execute the function with validated inputs
            c. IF result is not MetricResult:
                Wrap it: MetricResult(value=result, reason=None)
            d. Validate result.value against allowed_values
            e. Return MetricResult
    5. Return the metric instance with:
        - score(**kwargs) for synchronous evaluation
        - ascore(**kwargs) for asynchronous evaluation
        - __call__(*args, **kwargs) for direct function invocation
This design converts any Python function into a first-class metric object while preserving the function's original behavior and adding validation, error handling, and a standardized interface.
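Step 1 of the procedure notes that the function may be async, and step 5 exposes both score() and ascore(). One way this could fit together is sketched below using only `asyncio` and `inspect`. `AsyncAwareMetric` is a hypothetical name, and the choice to run coroutines with `asyncio.run` in the synchronous path is an assumption, not a description of how Ragas does it.

```python
import asyncio
import inspect
from typing import Any, Callable


class AsyncAwareMetric:
    """Hypothetical metric wrapper that supports sync and async functions."""

    def __init__(self, func: Callable[..., Any]) -> None:
        self.func = func
        # Determine once, at creation time, whether the function is async.
        self.is_async = inspect.iscoroutinefunction(func)

    async def ascore(self, **kwargs: Any) -> Any:
        if self.is_async:
            return await self.func(**kwargs)
        return self.func(**kwargs)

    def score(self, **kwargs: Any) -> Any:
        if self.is_async:
            # Assumes no event loop is already running in this thread.
            return asyncio.run(self.func(**kwargs))
        return self.func(**kwargs)


def word_limit(response: str, limit: int = 5) -> str:
    return "pass" if len(response.split()) <= limit else "fail"


async def async_word_limit(response: str, limit: int = 5) -> str:
    await asyncio.sleep(0)  # placeholder for real asynchronous work
    return "pass" if len(response.split()) <= limit else "fail"


sync_metric = AsyncAwareMetric(word_limit)
async_metric = AsyncAwareMetric(async_word_limit)

print(sync_metric.score(response="two words"))                    # pass
print(async_metric.score(response="one two three four five six"))  # fail
print(asyncio.run(sync_metric.ascore(response="hello")))          # pass
```

Input validation, result wrapping, and the allowed-values check from step 4 are left out here to keep the focus on the sync/async split.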