Implementation:Evidentlyai Evidently Legacy Data Drift Preset

From Leeroopedia
Knowledge Sources
Domains Data Quality, Drift Detection, Model Monitoring
Last Updated 2026-02-14 12:00 GMT

Overview

The DataDriftPreset class is a metric preset for Evidently reports. It bundles dataset-level drift detection and per-column drift analysis, and adds embeddings drift metrics when embeddings are defined for the data.

Description

DataDriftPreset extends MetricPreset and generates a list of drift-related metrics via its generate_metrics method. The preset always includes:

  • DatasetDriftMetric -- detects whether the overall dataset has drifted, using a configurable drift_share threshold (default: 0.5, meaning drift is flagged when at least 50% of the analyzed columns show drift)
  • DataDriftTable -- produces a per-column drift analysis table
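
The dataset-level decision described above can be illustrated with a small standalone sketch. It does not call Evidently; the function name dataset_drift_detected and the input dictionary are hypothetical, and treating the exact boundary (share equal to drift_share) as drift is an assumption of this sketch.

```python
from typing import Dict


def dataset_drift_detected(column_drift: Dict[str, bool], drift_share: float = 0.5) -> bool:
    """Return True when the share of drifted columns reaches drift_share.

    Hypothetical helper for illustration only; not part of Evidently.
    """
    if not column_drift:
        # Assumption: with no columns to inspect, report "no drift".
        return False
    drifted = sum(column_drift.values())
    share = drifted / len(column_drift)
    # Assumption: the boundary case (share == drift_share) counts as drift.
    return share >= drift_share


# Per-column drift flags, as a per-column drift table might report them.
flags = {"age": True, "income": True, "category": True, "city": False}

print(dataset_drift_detected(flags))                   # 3 of 4 columns drifted
print(dataset_drift_detected(flags, drift_share=0.9))  # stricter threshold
```

Lowering drift_share makes the dataset-level flag more sensitive; raising it requires more columns to drift before the dataset is flagged.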

Both metrics receive the full suite of statistical test configuration parameters, allowing fine-grained control over which statistical tests are used for different column types (categorical, numerical, text) and individual columns.

When embeddings data is present in the data definition, the preset additionally generates EmbeddingsDriftMetric instances for each embedding, using the helper function add_emb_drift_to_reports. Custom drift methods can be specified per embedding via the embeddings_drift_method parameter.
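
The way the metric list is assembled can be sketched without Evidently itself. Everything below is a stand-in: MetricSpec and sketch_generate_metrics are hypothetical names, drift methods are represented as plain strings, and the fallback to all defined embeddings when none are selected is an assumption rather than confirmed library behavior.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MetricSpec:
    """Hypothetical stand-in for a configured Evidently metric."""
    kind: str
    target: Optional[str] = None
    drift_method: Optional[str] = None


def sketch_generate_metrics(
    defined_embeddings: Optional[Dict[str, List[str]]],
    embeddings: Optional[List[str]] = None,
    embeddings_drift_method: Optional[Dict[str, str]] = None,
) -> List[MetricSpec]:
    # The two core metrics are always part of the result.
    result = [MetricSpec("DatasetDriftMetric"), MetricSpec("DataDriftTable")]
    if not defined_embeddings:
        return result
    # Assumption: with no explicit selection, every defined embedding is used.
    selected = embeddings if embeddings is not None else list(defined_embeddings)
    methods = embeddings_drift_method or {}
    for name in selected:
        if name not in defined_embeddings:
            continue  # Assumption: names missing from the data definition are skipped.
        # A missing entry in `methods` leaves the drift method unset (None).
        result.append(MetricSpec("EmbeddingsDriftMetric", name, methods.get(name)))
    return result


metrics = sketch_generate_metrics(
    {"text_embedding": ["e0", "e1"], "image_embedding": ["i0", "i1"]},
    embeddings=["text_embedding"],
    embeddings_drift_method={"text_embedding": "model"},
)
print([m.kind for m in metrics])
```

The sketch mirrors the shape of the result only: two fixed metrics, followed by one embeddings metric per selected embedding.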

The preset supports extensive configuration of statistical tests:

  • Global defaults via stattest and stattest_threshold
  • Type-specific overrides via cat_stattest, num_stattest, text_stattest and their corresponding threshold parameters
  • Per-column overrides via per_column_stattest and per_column_stattest_threshold dictionaries
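
The three configuration levels above suggest a most-specific-wins lookup. The sketch below illustrates that idea in plain Python; resolve_stattest is a hypothetical helper, the column type labels "cat", "num", and "text" are chosen for illustration, and the exact precedence inside Evidently is assumed rather than confirmed here.

```python
from typing import Dict, Optional


def resolve_stattest(
    column: str,
    column_type: str,  # "cat", "num", or "text" (labels chosen for this sketch)
    stattest: Optional[str] = None,
    cat_stattest: Optional[str] = None,
    num_stattest: Optional[str] = None,
    text_stattest: Optional[str] = None,
    per_column_stattest: Optional[Dict[str, str]] = None,
) -> Optional[str]:
    """Pick the statistical test for one column (hypothetical helper)."""
    # 1. A per-column override is the most specific setting.
    if per_column_stattest and column in per_column_stattest:
        return per_column_stattest[column]
    # 2. Otherwise fall back to the override for the column's type.
    by_type = {"cat": cat_stattest, "num": num_stattest, "text": text_stattest}
    typed = by_type.get(column_type)
    if typed is not None:
        return typed
    # 3. Finally use the global default; None means "let the library decide".
    return stattest


config = dict(
    stattest="psi",
    num_stattest="ks",
    per_column_stattest={"income": "wasserstein"},
)

print(resolve_stattest("income", "num", **config))    # per-column override
print(resolve_stattest("age", "num", **config))       # type-specific override
print(resolve_stattest("category", "cat", **config))  # global default
```

The threshold parameters (stattest_threshold, the type-specific thresholds, and per_column_stattest_threshold) can be read the same way, with the per-column dictionary as the most specific level.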

Usage

Use this preset when you want comprehensive data drift analysis across your dataset. It is the standard entry point for detecting feature distribution changes between reference and current data, suitable for production monitoring pipelines.

Code Reference

Source Location

Signature

class DataDriftPreset(MetricPreset):
    class Config:
        type_alias = "evidently:metric_preset:DataDriftPreset"

    columns: Optional[List[str]]
    embeddings: Optional[List[str]]
    embeddings_drift_method: Optional[Dict[str, DriftMethod]]
    drift_share: float
    stattest: Optional[PossibleStatTestType]
    cat_stattest: Optional[PossibleStatTestType]
    num_stattest: Optional[PossibleStatTestType]
    text_stattest: Optional[PossibleStatTestType]
    per_column_stattest: Optional[Dict[str, PossibleStatTestType]]
    stattest_threshold: Optional[float]
    cat_stattest_threshold: Optional[float]
    num_stattest_threshold: Optional[float]
    text_stattest_threshold: Optional[float]
    per_column_stattest_threshold: Optional[Dict[str, float]]

    def __init__(self, columns: Optional[List[str]] = None,
                 embeddings: Optional[List[str]] = None,
                 embeddings_drift_method: Optional[Dict[str, DriftMethod]] = None,
                 drift_share: float = 0.5,
                 stattest: Optional[PossibleStatTestType] = None,
                 cat_stattest: Optional[PossibleStatTestType] = None,
                 num_stattest: Optional[PossibleStatTestType] = None,
                 text_stattest: Optional[PossibleStatTestType] = None,
                 per_column_stattest: Optional[Dict[str, PossibleStatTestType]] = None,
                 stattest_threshold: Optional[float] = None,
                 cat_stattest_threshold: Optional[float] = None,
                 num_stattest_threshold: Optional[float] = None,
                 text_stattest_threshold: Optional[float] = None,
                 per_column_stattest_threshold: Optional[Dict[str, float]] = None): ...

    def generate_metrics(self, data_definition: DataDefinition,
                         additional_data: Optional[Dict[str, Any]]) -> List[AnyMetric]: ...

Import

from evidently.legacy.metric_preset.data_drift import DataDriftPreset

I/O Contract

Inputs

Name | Type | Required | Description
columns | Optional[List[str]] | No | Specific columns to analyze for drift. If None, all columns are analyzed.
embeddings | Optional[List[str]] | No | Names of the embeddings to analyze for drift
embeddings_drift_method | Optional[Dict[str, DriftMethod]] | No | Custom drift detection method per embedding name
drift_share | float | No | Fraction of drifting columns required to flag dataset-level drift (default: 0.5)
stattest | Optional[PossibleStatTestType] | No | Default statistical test for all column types
cat_stattest | Optional[PossibleStatTestType] | No | Statistical test override for categorical columns
num_stattest | Optional[PossibleStatTestType] | No | Statistical test override for numerical columns
text_stattest | Optional[PossibleStatTestType] | No | Statistical test override for text columns
per_column_stattest | Optional[Dict[str, PossibleStatTestType]] | No | Per-column statistical test overrides, keyed by column name
stattest_threshold | Optional[float] | No | Default drift detection threshold (a p-value or a distance cutoff, depending on the selected test)
cat_stattest_threshold | Optional[float] | No | Threshold override for categorical columns
num_stattest_threshold | Optional[float] | No | Threshold override for numerical columns
text_stattest_threshold | Optional[float] | No | Threshold override for text columns
per_column_stattest_threshold | Optional[Dict[str, float]] | No | Per-column threshold overrides, keyed by column name

Outputs

Name | Type | Description
metrics | List[AnyMetric] | A list containing DatasetDriftMetric, DataDriftTable, and optionally EmbeddingsDriftMetric instances

Usage Examples

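from evidently.legacy.metric_preset.data_drift import DataDriftPreset
# Assumption: the built-in embedding drift method factories (such as model)
# are importable from this module in the legacy namespace.
from evidently.legacy.metrics.data_drift.embedding_drift_methods import model

# Basic usage with defaults
preset = DataDriftPreset()

# With custom drift share threshold and specific columns
preset = DataDriftPreset(
    columns=["age", "income", "category"],
    drift_share=0.3
)

# With per-type statistical test configuration
# (stattest_threshold applies wherever no more specific threshold is set)
preset = DataDriftPreset(
    num_stattest="ks",
    cat_stattest="chi2",
    stattest_threshold=0.05
)

# With embeddings drift detection; model() is assumed to build a
# classifier-based drift method with its default settings
preset = DataDriftPreset(
    embeddings=["text_embedding"],
    embeddings_drift_method={"text_embedding": model()}
)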
