

Implementation:Evidentlyai Evidently Metric Types

From Leeroopedia
Revision as of 12:29, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Evidentlyai_Evidently_Metric_Types.md)
Knowledge Sources
Domains: ML Monitoring, Metrics, Data Quality
Last Updated: 2026-02-14 12:00 GMT

Overview

Defines the core metric type system for Evidently's v2 metric framework, including base classes for metric configurations, metric calculations, metric results, and metric tests.

Description

The metric_types module is the type system underlying Evidently's v2 metrics engine. It provides a hierarchy of metric result types, metric configuration classes, and metric calculation base classes, plus a test binding system for automatically validating metric outputs.

Result Types:

  • MetricResult -- Abstract base class for all metric results. Contains display metadata, optional visualization widgets, and test results.
  • SingleValue -- A metric result holding a single numeric value (float or int). Used for scalar metrics like accuracy, mean, MAE.
  • ByLabelValue -- A metric result containing a dictionary mapping labels to individual SingleValue results. Used for per-class metrics like precision per label.
  • CountValue -- A metric result combining an absolute count and a share (proportion). Used for counting occurrences such as missing values or duplicates.
  • MeanStdValue -- A metric result containing mean and standard deviation. Used for statistical distribution summaries.
  • ByLabelCountValue -- A metric result combining per-label counts and per-label shares.
  • DataframeValue -- A metric result containing a pandas DataFrame. Used for tabular metric outputs such as correlation matrices.
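
To make the result shapes above concrete, the following sketch models three of them with plain dataclasses. These are simplified stand-ins written for illustration only: the Sketch* names and the count_missing helper are hypothetical, and Evidently's real result classes additionally carry display metadata, widgets, and test results.

```python
from dataclasses import dataclass
from typing import Dict, Union

# Assumption: labels are simple hashable scalars.
Label = Union[str, int, bool]


@dataclass
class SketchSingleValue:
    """Stand-in for a scalar result such as accuracy or MAE."""
    value: float


@dataclass
class SketchCountValue:
    """Stand-in for a result pairing an absolute count with a share."""
    count: SketchSingleValue
    share: SketchSingleValue


@dataclass
class SketchByLabelValue:
    """Stand-in for a per-label result: one scalar per label."""
    values: Dict[Label, SketchSingleValue]


def count_missing(column):
    """Hypothetical helper: count None entries and their proportion."""
    missing = sum(1 for x in column if x is None)
    share = missing / len(column) if column else 0.0
    return SketchCountValue(SketchSingleValue(missing), SketchSingleValue(share))


missing = count_missing([1, None, 3, None])
per_class_precision = SketchByLabelValue(
    {"cat": SketchSingleValue(0.9), "dog": SketchSingleValue(0.75)}
)
```

Note that the composite shapes wrap scalar results rather than raw numbers, which mirrors the signatures listed on this page (for example, CountValue.count is itself a SingleValue).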

Configuration Classes:

  • Metric -- Abstract base class for metric configurations. Each Metric subclass defines parameters and is associated with a MetricCalculation class that performs the computation.
  • SingleValueMetric, ByLabelMetric, CountMetric, MeanStdMetric, DataframeMetric, ByLabelCountMetric -- Specialized Metric base classes for each result type, each with appropriate test binding logic.
  • ColumnMetric -- A Metric base that includes a column field for column-specific metrics.

Calculation Classes:

  • MetricCalculationBase -- Abstract base providing the call() and calculate() interface. Contains result caching, widget rendering, and resolved parameter support.
  • MetricCalculation -- Binds a Metric config to a calculation, auto-registering the metric-to-calculation mapping via __init_subclass__.
  • SingleValueCalculation, ByLabelCalculation, CountCalculation, MeanStdCalculation, DataframeCalculation, ByLabelCountCalculation -- Typed calculation base classes with convenience result() methods.

Test Binding System:

  • MetricTest -- Base class for defining test conditions on metric results. Tests can be bound to specific metric fingerprints.
  • BoundTest -- A test associated with a specific metric instance and value location.
  • SingleValueBoundTest, ByLabelBoundTest, CountBoundTest, MeanStdBoundTest, DataframeBoundTest, ByLabelCountBoundTest -- Specialized bound tests for each result type.
  • MetricTestResult -- The result of running a test, including status (PASS/FAIL/WARNING/ERROR), description, and configuration.
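
As a rough illustration of how a test might be bound to one metric and evaluated, consider the sketch below. It is not Evidently's code: the Sketch* classes are hypothetical, the status names follow the list given on this page, and downgrading a non-critical failure to WARNING is an assumption about what is_critical controls.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class SketchStatus(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    WARNING = "WARNING"
    ERROR = "ERROR"


@dataclass
class SketchTest:
    """Stand-in for a test condition on a scalar metric value."""
    name: str
    condition: Callable[[float], bool]
    is_critical: bool = True


@dataclass
class SketchBoundTest:
    """Stand-in for a test tied to one metric via its fingerprint."""
    test: SketchTest
    metric_fingerprint: str

    def run_test(self, results: Dict[str, float]) -> SketchStatus:
        value = results.get(self.metric_fingerprint)
        if value is None:
            # The metric was never computed, so the test cannot be evaluated.
            return SketchStatus.ERROR
        if self.test.condition(value):
            return SketchStatus.PASS
        # Assumption: a non-critical failure is reported as a warning.
        return SketchStatus.FAIL if self.test.is_critical else SketchStatus.WARNING


results = {"abc123": 0.92}
strict = SketchBoundTest(SketchTest("accuracy >= 0.95", lambda v: v >= 0.95), "abc123")
lenient = SketchBoundTest(
    SketchTest("accuracy >= 0.95", lambda v: v >= 0.95, is_critical=False), "abc123"
)
passing = SketchBoundTest(SketchTest("accuracy >= 0.9", lambda v: v >= 0.9), "abc123")
```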

Location and Rendering:

  • MetricConfig -- Frozen model storing metric_id and parameters.
  • MetricValueLocation -- Navigates result hierarchies to extract specific values (e.g., a specific label from a ByLabelValue).
  • DatasetType -- Enum distinguishing Current vs Reference datasets.
  • render_results(), render_widgets() -- Utility functions for HTML rendering of metric results.
  • get_default_render(), get_default_render_ref() -- Generate default widget representations for each result type.
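
The idea of a value location, as described for MetricValueLocation, can be sketched with nested dictionaries standing in for result objects. SketchLocation, its part and label fields, and the extract function are hypothetical names chosen for this illustration; the real class may navigate results differently.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class SketchLocation:
    """Stand-in pointing at one scalar inside a (possibly nested) result."""
    metric_id: str
    part: Optional[str] = None   # e.g. "count" or "share" in a count-style result
    label: Optional[str] = None  # e.g. one class in a per-label result


def extract(results: Dict[str, Any], loc: SketchLocation) -> Any:
    """Walk from the metric's result down to the value the location names."""
    result = results[loc.metric_id]
    if loc.label is not None:
        result = result["values"][loc.label]
    if loc.part is not None:
        result = result[loc.part]
    return result


results = {
    "precision": {"values": {"cat": 0.9, "dog": 0.75}},
    "missing": {"count": 2, "share": 0.5},
}
dog_precision = extract(results, SketchLocation("precision", label="dog"))
missing_share = extract(results, SketchLocation("missing", part="share"))
```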

Usage

Use this module when:

  • Defining new custom metrics by subclassing the appropriate Metric and MetricCalculation pair.
  • Consuming metric results from a Report or monitoring pipeline.
  • Binding tests to metric values for automated quality checks.
  • Building custom rendering for metric outputs.

Code Reference

Source Location

Signature

class MetricConfig(FrozenBaseModel):
    metric_id: MetricId
    params: Dict[str, Any]

class MetricResult(AutoAliasMixin, PolymorphicModel):
    display_name: str
    widget: Optional[List[BaseWidgetInfo]]
    tests: List[MetricTestResult]

class SingleValue(MetricResult):
    value: Value  # Union[float, int]

class ByLabelValue(MetricResult):
    values: Dict[Label, SingleValue]

class CountValue(MetricResult):
    count: SingleValue
    share: SingleValue

class MeanStdValue(MetricResult):
    mean: SingleValue
    std: SingleValue

class DataframeValue(MetricResult):
    value: pd.DataFrame

class Metric(AutoAliasMixin, EvidentlyBaseModel, Generic[TCalculation]):
    def to_calculation(self) -> TCalculation: ...
    def get_bound_tests(self, context: "Context") -> Sequence[BoundTest]: ...

class MetricCalculationBase(Generic[TResult]):
    def call(self, context: "Context") -> Tuple[TResult, Optional[TResult]]: ...
    def calculate(self, context, current_data, reference_data) -> TMetricResult: ...

class MetricCalculation(MetricCalculationBase[TResult], Generic[TResult, TMetric], abc.ABC):
    metric: TMetric

class MetricTest(AutoAliasMixin, EvidentlyBaseModel):
    is_critical: bool = True
    def to_test(self) -> MetricTestProto: ...
    def run(self, context, metric, value) -> MetricTestResult: ...

class BoundTest(AutoAliasMixin, EvidentlyBaseModel, Generic[TResult], ABC):
    test: MetricTest
    metric_fingerprint: Fingerprint
    def run_test(self, context, calculation, metric_result): ...

Import

from evidently.core.metric_types import (
    Metric,
    MetricCalculation,
    MetricResult,
    SingleValue,
    SingleValueMetric,
    SingleValueCalculation,
    ByLabelValue,
    ByLabelMetric,
    ByLabelCalculation,
    CountValue,
    CountMetric,
    CountCalculation,
    MeanStdValue,
    MeanStdMetric,
    MeanStdCalculation,
    DataframeValue,
    DataframeMetric,
    DataframeCalculation,
    MetricTest,
    BoundTest,
    MetricTestResult,
    MetricConfig,
    MetricValueLocation,
    DatasetType,
    ColumnMetric,
)

I/O Contract

Inputs

Name | Type | Required | Description
context | Context | Yes | The report context containing datasets, configuration, and metric results
current_data | Dataset | Yes | The current/production dataset to evaluate
reference_data | Optional[Dataset] | No | Optional reference/baseline dataset for comparison

Outputs

Name | Type | Description
result | Tuple[TResult, Optional[TResult]] | Tuple of (current_result, optional_reference_result), where TResult is a MetricResult subclass
MetricTestResult | MetricTestResult | Contains id, name, description, status (PASS/FAIL/WARNING/ERROR), and configuration
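
The output contract above, a (current, optional reference) pair, combined with the result caching mentioned earlier, can be sketched as follows. This is an illustration under assumptions, not Evidently's implementation: call_cached, the cache keys, and mean_metric are hypothetical, and the sketch assumes a calculation may return either a single result or a pair.

```python
from typing import Callable, Dict, Hashable, List, Optional, Tuple

Result = float
Pair = Tuple[Result, Optional[Result]]


def call_cached(
    cache: Dict[Hashable, Pair],
    key: Hashable,
    calculate: Callable,
    current: List[float],
    reference: Optional[List[float]] = None,
) -> Pair:
    """Return (current_result, reference_result_or_None), computing at most once per key."""
    if key in cache:
        return cache[key]
    raw = calculate(current, reference)
    # Normalize a bare result into the (current, None) pair shape.
    pair = raw if isinstance(raw, tuple) else (raw, None)
    cache[key] = pair
    return pair


def mean_metric(current, reference):
    """Hypothetical calculation: mean of each dataset that is present."""
    cur = sum(current) / len(current)
    if reference is None:
        return cur
    return cur, sum(reference) / len(reference)


cache: Dict[Hashable, Pair] = {}
with_ref = call_cached(cache, "mean:with_ref", mean_metric, [1.0, 3.0], [2.0, 6.0])
no_ref = call_cached(cache, "mean:no_ref", mean_metric, [1.0, 3.0])
```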

Usage Examples

Defining a Custom Single Value Metric

from evidently.core.datasets import Dataset
from evidently.core.metric_types import (
    SingleValueCalculation,
    SingleValueMetric,
    TMetricResult,
)

class MyAccuracy(SingleValueMetric):
    # Name of the prediction column to compare against the target.
    column: str

class MyAccuracyCalculation(SingleValueCalculation[MyAccuracy]):
    def calculate(self, context, current_data: Dataset, reference_data=None) -> TMetricResult:
        df = current_data.as_dataframe()
        # Share of rows where the configured prediction column equals "target".
        # The "target" column name is assumed to exist in the dataset.
        acc = (df[self.metric.column] == df["target"]).mean()
        return self.result(float(acc))

    def display_name(self) -> str:
        return f"My Accuracy ({self.metric.column})"

Running a Metric in a Report

from evidently.core.report import Report
from evidently.core.metric_types import SingleValue

# current_dataset and reference_dataset are Dataset objects prepared elsewhere.
report = Report([MyAccuracy(column="prediction")])
snapshot = report.run(current_dataset, reference_dataset)

Accessing Metric Results

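from evidently.core.metric_types import ByLabelValue, CountValue, SingleValue

# SingleValue result ("metric_fingerprint_id" is a placeholder for a real metric id)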
result = snapshot.get_metric_result("metric_fingerprint_id")
if isinstance(result, SingleValue):
    print(f"Value: {result.value}")

# ByLabelValue result
if isinstance(result, ByLabelValue):
    for label in result.labels():
        sv = result.get_label_result(label)
        print(f"Label {label}: {sv.value}")

# CountValue result
if isinstance(result, CountValue):
    print(f"Count: {result.count.value}, Share: {result.share.value}")
