
Heuristic: Evidently Statistical Test Auto-Selection

From Leeroopedia
Knowledge Sources
Domains Data_Drift, Statistics
Last Updated 2026-02-14 10:00 GMT

Overview

Decision framework for automatically selecting the optimal statistical test for data drift detection based on sample size, feature type, and number of unique values.

Description

Evidently uses a sophisticated decision tree to automatically select the most appropriate statistical test when no explicit test is specified by the user. The selection considers three factors: (1) the feature type (Numerical, Categorical, or Text), (2) the sample size (threshold at 1000 observations), and (3) the number of unique values (threshold at 5 for numerical features, threshold at 2 for binary detection). This heuristic encodes domain expertise about which statistical tests perform best under different data characteristics.

Usage

This heuristic applies whenever you run a data drift report or metric without explicitly specifying a statistical test via `DataDriftOptions`. Understanding this selection logic is essential for interpreting drift results and knowing when to override the default test choice.

The Insight (Rule of Thumb)

  • Small samples (n <= 1000), Numerical, > 5 unique values: Kolmogorov-Smirnov test (`ks`) — default threshold: 0.05
  • Small samples (n <= 1000), Numerical, <= 5 unique values, > 2 unique: Chi-squared test (`chisquare`) — default threshold: 0.05
  • Small samples (n <= 1000), Numerical, <= 2 unique values: Z-test (`z`) — default threshold: 0.05
  • Small samples (n <= 1000), Categorical, > 2 unique values: Chi-squared test (`chisquare`) — default threshold: 0.05
  • Small samples (n <= 1000), Categorical, <= 2 unique values: Z-test (`z`) — default threshold: 0.05
  • Large samples (n > 1000), Numerical, > 5 unique values: Wasserstein distance (`wasserstein`) — default threshold: 0.1
  • Large samples (n > 1000), Numerical, <= 5 unique values: Jensen-Shannon divergence (`jensenshannon`) — default threshold: 0.1
  • Large samples (n > 1000), Categorical: Jensen-Shannon divergence (`jensenshannon`) — default threshold: 0.1
  • Text, n <= 1000: Percentage-based text content drift (`perc_text_content_drift`) — default threshold: 0.55
  • Text, n > 1000: Absolute text content drift (`abs_text_content_drift`) — default threshold: 0.55
  • Trade-off: The 1000-sample threshold balances statistical power (parametric tests need enough data) with computational efficiency (distance-based tests scale better for large data).
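The rules above can be condensed into a small standalone sketch. This is not Evidently's actual code (see Code Evidence below); the function name and the `"num"`/`"cat"`/`"text"` labels are illustrative:

```python
def select_stattest(n_obs: int, feature_type: str, n_unique: int) -> tuple:
    """Mirror the auto-selection rules: returns (test_name, default_threshold)."""
    if feature_type == "text":
        # Text uses a domain-classifier test; only the drift measure differs by size
        return ("abs_text_content_drift", 0.55) if n_obs > 1000 else ("perc_text_content_drift", 0.55)
    if n_obs <= 1000:
        # Small samples: classical hypothesis tests with p-value threshold 0.05
        if feature_type == "num" and n_unique > 5:
            return ("ks", 0.05)
        return ("chisquare", 0.05) if n_unique > 2 else ("z", 0.05)
    # Large samples: distance-based measures with threshold 0.1
    if feature_type == "num" and n_unique > 5:
        return ("wasserstein", 0.1)
    return ("jensenshannon", 0.1)
```

For example, a numerical column with 20 unique values switches from `ks` to `wasserstein` the moment the reference sample exceeds 1000 rows.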

Reasoning

The selection follows established statistical best practices:

Small samples (n <= 1000): Classical hypothesis tests (KS, Chi-squared, Z) are chosen because they have well-understood p-value distributions and Type I error control at small sample sizes. KS is the default for continuous numerical features because it is non-parametric and detects any kind of distributional shift. Chi-squared and Z-tests are used for low-cardinality features because they are designed for discrete distributions.

Large samples (n > 1000): Distance-based measures (Wasserstein, Jensen-Shannon) are preferred because classical hypothesis tests become overly sensitive at large sample sizes — they detect statistically significant but practically insignificant drift. Wasserstein distance measures the "earth mover's distance" which has an intuitive interpretation as the cost of transforming one distribution into another. Jensen-Shannon divergence is a symmetric, bounded measure suitable for discrete distributions.
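The sensitivity gap is easy to demonstrate with SciPy (a sketch, not Evidently's implementation; note that Evidently normalizes Wasserstein distance by the reference standard deviation, which this omits since the scale here is already 1):

```python
import numpy as np
from scipy.stats import ks_2samp, wasserstein_distance

rng = np.random.default_rng(42)
reference = rng.normal(loc=0.0, scale=1.0, size=100_000)
current = rng.normal(loc=0.05, scale=1.0, size=100_000)  # tiny, practically negligible shift

stat, p_value = ks_2samp(reference, current)
w = wasserstein_distance(reference, current)

print(p_value < 0.05)  # True: at n=100k the KS test flags even a 0.05-sigma shift
print(w < 0.1)         # True: the distance stays below the 0.1 default threshold
```

The same 0.05-sigma shift on 500-row samples would usually not reach KS significance, which is why the hypothesis tests remain the sensible default below the 1000-observation cutoff.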

Text features: Use a domain classifier approach (ROC AUC of a classifier trained to distinguish reference from current data). For small text datasets, percentage-based drift is more stable; for large datasets, absolute measurement is preferred.

The unique value count threshold of 5 distinguishes between truly continuous and quasi-categorical numerical features (e.g., a rating from 1-5 should be treated categorically). The binary threshold of 2 identifies features where a simpler proportions test (Z-test) is more appropriate than Chi-squared.
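For instance, a 1-to-5 rating column keeps only 5 distinct values no matter how many rows it has, so it is routed to the low-cardinality branch (a pandas sketch; the column names are invented for illustration):

```python
import pandas as pd

ratings = pd.Series([1, 2, 3, 4, 5] * 2000)   # 10,000 rows, only 5 distinct values
clicked = pd.Series([0, 1] * 5000)            # binary flag
latency = pd.Series(range(10_000)) / 100.0    # genuinely continuous

assert ratings.nunique() <= 5   # quasi-categorical: Chi-squared (small n) / Jensen-Shannon (large n)
assert clicked.nunique() <= 2   # binary: Z-test at small n
assert latency.nunique() > 5    # continuous: KS (small n) / Wasserstein (large n)
```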

Code Evidence

Default statistical test selection from `src/evidently/legacy/calculations/stattests/registry.py:137-160`:

def _get_default_stattest(reference_data, current_data, feature_type):
    n_values = pd.concat([reference_data, current_data]).nunique()
    if feature_type == ColumnType.Text:
        if reference_data.shape[0] > 1000:
            return stattests.abs_text_content_drift_stat_test
        return stattests.perc_text_content_drift_stat_test
    elif reference_data.shape[0] <= 1000:
        if feature_type == ColumnType.Numerical:
            if n_values <= 5:
                return stattests.chi_stat_test if n_values > 2 else stattests.z_stat_test
            elif n_values > 5:
                return stattests.ks_stat_test
        elif feature_type == ColumnType.Categorical:
            return stattests.chi_stat_test if n_values > 2 else stattests.z_stat_test
    elif reference_data.shape[0] > 1000:
        if feature_type == ColumnType.Numerical:
            if n_values <= 5:
                return stattests.jensenshannon_stat_test
            elif n_values > 5:
                return stattests.wasserstein_stat_test
        elif feature_type == ColumnType.Categorical:
            return stattests.jensenshannon_stat_test

Default threshold of 0.05 from `src/evidently/legacy/calculations/stattests/registry.py:38`:

@dataclasses.dataclass
class StatTest:
    name: str
    display_name: str
    allowed_feature_types: List[ColumnType]
    default_threshold: float = 0.05

Text drift classifier parameters from `src/evidently/legacy/utils/data_drift_utils.py:105-114`:

# sklearn imports implied by the snippet (defined elsewhere in the module):
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier

def roc_auc_domain_classifier(X_train, X_test, y_train, y_test):
    pipeline = Pipeline([
        ("vectorization", TfidfVectorizer(sublinear_tf=True, max_df=0.5, stop_words="english")),
        ("classification", SGDClassifier(alpha=0.0001, max_iter=50, penalty="l1",
                                         loss="modified_huber", random_state=42)),
    ])
