Implementation:Evidentlyai Evidently DataDefinition

From Leeroopedia
Knowledge Sources
Domains Data_Engineering, ML_Monitoring
Last Updated 2026-02-14 12:00 GMT

Overview

Concrete tool, provided by the Evidently library, for mapping column types and roles in datasets.

Description

The DataDefinition class is a Pydantic-based configuration model that maps column types (numerical, categorical, text, datetime) and roles (id, timestamp, target, prediction) in a dataset. It is the primary mechanism for telling Evidently how to interpret raw DataFrame columns. When passed to Dataset.from_pandas(), it drives metric selection, drift method choice, and task-specific evaluation.
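Conceptually, the per-type lists amount to a lookup from each column name to a single type. The following minimal sketch shows that idea in plain Python. The helper build_column_types and its dict input are hypothetical and not part of Evidently's API. Rejecting a column listed under two types is a choice made for this sketch, not documented Evidently behavior.

```python
from typing import Dict, List, Optional

# Hypothetical illustration of the type mapping; not part of Evidently.
def build_column_types(type_lists: Dict[str, Optional[List[str]]]) -> Dict[str, str]:
    """Invert {type: [columns]} into {column: type}, skipping lists set to None."""
    column_types: Dict[str, str] = {}
    for column_type, columns in type_lists.items():
        for column in columns or []:  # None is treated like an empty list
            if column in column_types:
                # Choice made in this sketch: each column gets exactly one type.
                raise ValueError(
                    f"column {column!r} is listed as both "
                    f"{column_types[column]!r} and {column_type!r}"
                )
            column_types[column] = column_type
    return column_types


mapping = build_column_types({
    "numerical": ["age", "salary"],
    "categorical": ["department"],
    "text": None,
})
print(mapping)  # {'age': 'numerical', 'salary': 'numerical', 'department': 'categorical'}
```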

An empty DataDefinition() triggers auto-inference from pandas dtypes. Explicit column lists override auto-inference for precise control.
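Inference from dtypes alone cannot always separate cases that share a dtype. In pandas, for example, free text and short string categories both commonly arrive as object columns, which is one reason explicit lists are useful. The sketch below shows the infer-then-override pattern with dtype names given as plain strings. The rules in the hypothetical infer_type helper are a simplification, not Evidently's actual inference logic. Letting unlisted columns fall back to inference is also an assumption of this sketch.

```python
from typing import Dict, List, Optional

# Hypothetical, simplified rules keyed on pandas dtype names.
# Evidently's real inference logic may differ (for example, for text columns).
def infer_type(dtype_name: str) -> str:
    if dtype_name.startswith(("int", "uint", "float")):
        return "numerical"
    if dtype_name.startswith("datetime64"):
        return "datetime"
    if dtype_name in ("bool", "category", "object"):
        return "categorical"
    return "unknown"


def resolve_types(
    dtypes: Dict[str, str],
    explicit: Optional[Dict[str, List[str]]] = None,
) -> Dict[str, str]:
    """Infer a type per column, then let explicit lists override the guesses."""
    resolved = {column: infer_type(dtype) for column, dtype in dtypes.items()}
    for column_type, columns in (explicit or {}).items():
        for column in columns:
            resolved[column] = column_type  # the explicit choice wins
    return resolved


dtypes = {"age": "int64", "city": "object", "review_text": "object",
          "created_at": "datetime64[ns]"}
print(resolve_types(dtypes))
print(resolve_types(dtypes, {"text": ["review_text"]}))
```

In the second call, review_text is reclassified as text because the explicit list takes precedence over the dtype-based guess. Every other column keeps its inferred type.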

Usage

Import this class whenever you create an Evidently Dataset from a pandas DataFrame and want explicit control over how its columns are typed. Pass the resulting object to Dataset.from_pandas() through its data_definition parameter.

Code Reference

Source Location

  • Repository: evidently
  • File: src/evidently/core/datasets.py
  • Lines: L367-481

Signature

class DataDefinition(BaseModel):
    def __init__(
        self,
        id_column: Optional[str] = None,
        timestamp: Optional[str] = None,
        numerical_columns: Optional[List[str]] = None,
        categorical_columns: Optional[List[str]] = None,
        text_columns: Optional[List[str]] = None,
        datetime_columns: Optional[List[str]] = None,
        classification: Optional[List[Classification]] = None,
        regression: Optional[List[Regression]] = None,
        llm: Optional[LLMDefinition] = None,
        numerical_descriptors: Optional[List[str]] = None,
        categorical_descriptors: Optional[List[str]] = None,
        unknown_columns: Optional[List[str]] = None,
        list_columns: Optional[List[str]] = None,
        test_descriptors: Optional[List[str]] = None,
        ranking: Optional[List[Recsys]] = None,
        service_columns: Optional[ServiceColumns] = None,
        special_columns: Optional[List[SpecialColumnInfo]] = None,
        embeddings: Optional[Dict[str, List[str]]] = None,
    ):
        """
        Args:
            id_column: Column name with unique identifiers.
            timestamp: Column name with timestamp values.
            numerical_columns: List of numerical column names.
            categorical_columns: List of categorical column names.
            text_columns: List of text column names.
            datetime_columns: List of datetime column names.
            classification: List of BinaryClassification or MulticlassClassification configs.
            regression: List of Regression configs.
            llm: LLM task configuration.
            numerical_descriptors: List of numerical descriptor column names.
            categorical_descriptors: List of categorical descriptor column names.
            unknown_columns: List of unknown/unclassified column names.
            list_columns: List of list/array column names.
            test_descriptors: List of test descriptor column names.
            ranking: List of Recsys configs.
            service_columns: Service columns like trace links.
            special_columns: Additional special column configurations.
            embeddings: Embeddings columns definitions (name -> list of columns).
        """

Import

from evidently import DataDefinition
# or
from evidently.core.datasets import DataDefinition

I/O Contract

Inputs

Name | Type | Required | Description
numerical_columns | Optional[List[str]] | No | List of numerical column names
categorical_columns | Optional[List[str]] | No | List of categorical column names
text_columns | Optional[List[str]] | No | List of text column names
datetime_columns | Optional[List[str]] | No | List of datetime column names
classification | Optional[List[Classification]] | No | BinaryClassification or MulticlassClassification configs
regression | Optional[List[Regression]] | No | Regression task configs
id_column | Optional[str] | No | Column with unique identifiers
timestamp | Optional[str] | No | Column with timestamps
embeddings | Optional[Dict[str, List[str]]] | No | Mapping from embedding name to its list of columns
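The embeddings parameter maps an embedding name to the list of columns holding its components. When the vector components are stored one per column under a shared prefix, such a mapping can be built from the column names. This is a minimal sketch; the helper embeddings_by_prefix, the emb_ prefix, and the column names are all hypothetical.

```python
from typing import Dict, List

# Hypothetical helper; groups component columns by a shared name prefix.
def embeddings_by_prefix(columns: List[str], prefixes: Dict[str, str]) -> Dict[str, List[str]]:
    """Build {embedding_name: [component columns]}, keeping the original column order."""
    return {
        name: [column for column in columns if column.startswith(prefix)]
        for name, prefix in prefixes.items()
    }


columns = ["user_id", "emb_0", "emb_1", "emb_2", "age"]
embeddings = embeddings_by_prefix(columns, {"review_embedding": "emb_"})
print(embeddings)  # {'review_embedding': ['emb_0', 'emb_1', 'emb_2']}
```

According to the signature, a dictionary of this shape is what DataDefinition(embeddings=...) accepts.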

Outputs

Name | Type | Description
DataDefinition | DataDefinition | Configuration object passed to Dataset.from_pandas()

Usage Examples

Basic Column Type Mapping

from evidently import DataDefinition

# Explicit column type mapping for a tabular dataset
data_definition = DataDefinition(
    numerical_columns=["age", "salary", "experience"],
    categorical_columns=["department", "city"],
    text_columns=["review_text"],
    timestamp="created_at",
    id_column="user_id",
)
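A mistyped column name in a definition is easy to miss. One option is a pre-flight check that collects every name the definition references and compares it with the DataFrame's columns before calling Dataset.from_pandas(). The helper below is hypothetical and not part of Evidently. It works on plain strings, so it runs without pandas or Evidently; with a real DataFrame, list(df.columns) would supply the available names.

```python
from typing import Iterable, List, Optional

# Hypothetical pre-flight helper, not part of Evidently.
def missing_columns(
    available: Iterable[str],
    *,
    id_column: Optional[str] = None,
    timestamp: Optional[str] = None,
    **column_lists: Optional[List[str]],
) -> List[str]:
    """Return the referenced column names that are absent from `available`."""
    # Single-column roles first, skipping those left as None.
    referenced = [name for name in (id_column, timestamp) if name is not None]
    # Then every per-type list (None means the list was not given).
    for columns in column_lists.values():
        referenced.extend(columns or [])
    present = set(available)
    return [name for name in referenced if name not in present]


# Column names as they might appear in list(df.columns); note the typo.
available = ["user_id", "created_at", "age", "salary", "experiance",
             "department", "city", "review_text"]
missing = missing_columns(
    available,
    id_column="user_id",
    timestamp="created_at",
    numerical_columns=["age", "salary", "experience"],
    categorical_columns=["department", "city"],
    text_columns=["review_text"],
)
print(missing)  # ['experience']
```

Here the misspelled experiance in the available columns is what surfaces experience as missing.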

Auto-Inference (Empty Definition)

from evidently import DataDefinition

# Let Evidently auto-infer column types from pandas dtypes
data_definition = DataDefinition()

With Classification Task

from evidently import DataDefinition
from evidently.core.datasets import BinaryClassification

data_definition = DataDefinition(
    numerical_columns=["feature_1", "feature_2"],
    classification=[
        BinaryClassification(
            target="is_fraud",
            prediction_labels="predicted_fraud",
            pos_label=1,
        )
    ],
)
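pos_label identifies which value in the target and prediction columns counts as the positive class, which affects metrics such as precision and recall. The plain-Python sketch below uses a hypothetical helper and made-up labels to show how the choice of positive class changes those numbers on the same data. It illustrates the concept only and is not how Evidently computes its metrics.

```python
from typing import Hashable, Sequence, Tuple

# Hypothetical helper for illustration; not Evidently's metric implementation.
def precision_recall(
    target: Sequence[Hashable],
    predicted: Sequence[Hashable],
    pos_label: Hashable,
) -> Tuple[float, float]:
    """Precision and recall treating `pos_label` as the positive class."""
    pairs = list(zip(target, predicted))
    tp = sum(1 for t, p in pairs if t == pos_label and p == pos_label)
    fp = sum(1 for t, p in pairs if t != pos_label and p == pos_label)
    fn = sum(1 for t, p in pairs if t == pos_label and p != pos_label)
    # Convention in this sketch: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


is_fraud = [1, 0, 1, 1, 0, 0]
predicted_fraud = [1, 0, 0, 1, 0, 0]
print(precision_recall(is_fraud, predicted_fraud, pos_label=1))  # (1.0, 0.666...)
print(precision_recall(is_fraud, predicted_fraud, pos_label=0))  # (0.75, 1.0)
```

With pos_label=1 every predicted fraud is correct, but one actual fraud case is missed. Treating 0 as the positive class yields different numbers from the same two columns.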

