Principle:Explodinggradients Ragas Functional Metric Definition

Knowledge Sources	Domains	Last Updated
explodinggradients/ragas	LLM Evaluation, Metric Design	2026-02-10

Overview

Functional Metric Definition is the principle of creating evaluation metrics from plain Python functions using decorators, where the function's parameters become metric inputs and its return value becomes the metric score.

Description

While LLM-as-judge metrics are powerful for subjective evaluation, many evaluation criteria can be expressed as straightforward Python logic: string matching, length checks, keyword detection, format validation, or any other deterministic computation. Functional Metric Definition enables these computations to be wrapped in the same metric interface as LLM-based metrics, creating a uniform evaluation API regardless of the underlying scoring mechanism.

Decorator Pattern: A decorator (such as @discrete_metric) transforms a regular Python function into a full metric object with score(), ascore(), batch_score(), and abatch_score() methods. The decorated function itself becomes the scoring logic. This eliminates the need to write boilerplate class definitions for simple metrics.

Parameter Introspection: The decorator inspects the function's signature and type annotations to automatically create a Pydantic validation model. When score() is called with keyword arguments, these are validated against the function's expected parameters before execution. This provides clear error messages for missing or mistyped inputs without any manual validation code.

Result Wrapping: The function can return a plain value (a string, number, or list) and it will be automatically wrapped in a MetricResult object. Alternatively, the function can return a MetricResult directly for full control over the value and reason fields.

Validation of Output Values: For discrete metrics, the output is validated against the allowed values list. If the function returns a value not in the allowed set, an error result is returned with a descriptive message. This ensures that metrics always produce valid, expected output categories.

Dual Callable Nature: The metric instance remains callable as the original function (via __call__) for direct invocation, while also exposing score(**kwargs) for the standard metric interface with validation. This makes the metric useful both as a standalone function and as part of an evaluation pipeline.

Usage

Use the Functional Metric Definition principle when:

Creating evaluation metrics based on deterministic Python logic (no LLM needed)
Wanting the same score()/ascore() interface as LLM-based metrics
Needing automatic input validation based on function signatures
Building metrics that can be used in both standalone and pipeline contexts
Defining custom pass/fail or multi-category checks with pure logic

Theoretical Basis

The theoretical foundation combines the Decorator Pattern with runtime introspection:

PROCEDURE create_functional_metric(func, name, allowed_values):
    1. Introspect the function:
       Extract parameter names, types, and defaults from the signature
       Determine if the function is async

    2. Create a validation model:
       Build a dynamic Pydantic model from the function's parameters
       Each parameter becomes a model field with its type and default

    3. Create the metric instance:
       Set name (from parameter or function name)
       Set allowed_values for output validation
       Store the original function reference

    4. Define the scoring workflow:
       PROCEDURE score(**kwargs):
           a. Validate inputs against the Pydantic model
           b. Execute the function with validated inputs
           c. IF result is not MetricResult:
                Wrap it: MetricResult(value=result, reason=None)
           d. Validate result.value against allowed_values
           e. Return MetricResult

    5. Return the metric instance with:
       - score(**kwargs) for synchronous evaluation
       - ascore(**kwargs) for asynchronous evaluation
       - __call__(*args, **kwargs) for direct function invocation

This design converts any Python function into a first-class metric object while preserving the function's original behavior and adding validation, error handling, and a standardized interface.

Related Pages

Implementation:Explodinggradients_Ragas_Discrete_Metric_Decorator

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment