Implementation:Evidentlyai Evidently DataDefinition
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, ML_Monitoring |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
Concrete tool, provided by the Evidently library, for mapping the column types and roles of a dataset.
Description
The DataDefinition class is a Pydantic-based configuration model that maps column types (numerical, categorical, text, datetime) and roles (id, timestamp, target, prediction) in a dataset. It is the primary mechanism for telling Evidently how to interpret raw DataFrame columns. When passed to Dataset.from_pandas(), it drives metric selection, drift method choice, and task-specific evaluation.
An empty DataDefinition() triggers auto-inference from pandas dtypes. Explicit column lists override auto-inference for precise control.
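An empty definition leans on pandas dtypes, so it helps to see what a purely dtype-driven mapping looks like. The function below, `sketch_infer_column_types`, is a hypothetical and deliberately simplified stand-in written for illustration. It is not Evidently's inference code, and Evidently's rules may differ (for example, in how string columns are split between categorical and text).

```python
from typing import Dict, List

import pandas as pd
from pandas.api.types import is_bool_dtype, is_datetime64_any_dtype, is_numeric_dtype


def sketch_infer_column_types(df: pd.DataFrame) -> Dict[str, List[str]]:
    """Hypothetical, simplified dtype-based mapping (NOT Evidently's real rules)."""
    result: Dict[str, List[str]] = {"numerical": [], "categorical": [], "datetime": []}
    for name, dtype in df.dtypes.items():
        if is_datetime64_any_dtype(dtype):
            result["datetime"].append(name)
        elif is_bool_dtype(dtype):
            # Checked before numeric: pandas reports bool as a numeric dtype.
            result["categorical"].append(name)
        elif is_numeric_dtype(dtype):
            result["numerical"].append(name)
        else:
            # Object, string and category dtypes all land here in this sketch,
            # so free text and categorical labels cannot be told apart.
            result["categorical"].append(name)
    return result


df = pd.DataFrame(
    {
        "age": [34, 45],
        "city": ["Berlin", "Paris"],
        "ts": pd.to_datetime(["2024-01-01", "2024-01-02"]),
        "flag": [True, False],
    }
)
print(sketch_infer_column_types(df))
```

Dtype alone cannot separate free text from categorical labels, because both usually arrive as object or string columns. Declaring `text_columns` explicitly removes that ambiguity instead of relying on any heuristic.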
Usage
Import this class whenever you need to create an Evidently Dataset from a pandas DataFrame and want explicit control over column type mapping. Pass the instance to Dataset.from_pandas() through its data_definition argument.
Code Reference
Source Location
- Repository: evidently
- File: src/evidently/core/datasets.py
- Lines: L367-481
Signature
```python
class DataDefinition(BaseModel):
    def __init__(
        self,
        id_column: Optional[str] = None,
        timestamp: Optional[str] = None,
        numerical_columns: Optional[List[str]] = None,
        categorical_columns: Optional[List[str]] = None,
        text_columns: Optional[List[str]] = None,
        datetime_columns: Optional[List[str]] = None,
        classification: Optional[List[Classification]] = None,
        regression: Optional[List[Regression]] = None,
        llm: Optional[LLMDefinition] = None,
        numerical_descriptors: Optional[List[str]] = None,
        categorical_descriptors: Optional[List[str]] = None,
        unknown_columns: Optional[List[str]] = None,
        list_columns: Optional[List[str]] = None,
        test_descriptors: Optional[List[str]] = None,
        ranking: Optional[List[Recsys]] = None,
        service_columns: Optional[ServiceColumns] = None,
        special_columns: Optional[List[SpecialColumnInfo]] = None,
        embeddings: Optional[Dict[str, List[str]]] = None,
    ):
        """
        Args:
            id_column: Column name with unique identifiers.
            timestamp: Column name with timestamp values.
            numerical_columns: List of numerical column names.
            categorical_columns: List of categorical column names.
            text_columns: List of text column names.
            datetime_columns: List of datetime column names.
            classification: List of BinaryClassification or MulticlassClassification configs.
            regression: List of Regression configs.
            llm: LLM task configuration.
            numerical_descriptors: List of numerical descriptor column names.
            categorical_descriptors: List of categorical descriptor column names.
            unknown_columns: List of unknown/unclassified column names.
            list_columns: List of list/array column names.
            test_descriptors: List of test descriptor column names.
            ranking: List of Recsys configs.
            service_columns: Service columns like trace links.
            special_columns: Additional special column configurations.
            embeddings: Embeddings columns definitions (name -> list of columns).
        """
```
Import
```python
from evidently import DataDefinition
# or
from evidently.core.datasets import DataDefinition
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| numerical_columns | Optional[List[str]] | No | List of numerical column names |
| categorical_columns | Optional[List[str]] | No | List of categorical column names |
| text_columns | Optional[List[str]] | No | List of text column names |
| datetime_columns | Optional[List[str]] | No | List of datetime column names |
| classification | Optional[List[Classification]] | No | BinaryClassification or MulticlassClassification configs |
| regression | Optional[List[Regression]] | No | Regression task configs |
| id_column | Optional[str] | No | Column with unique identifiers |
| timestamp | Optional[str] | No | Column with timestamps |
| embeddings | Optional[Dict[str, List[str]]] | No | Embedding name to column list mapping |
Outputs
| Name | Type | Description |
|---|---|---|
| DataDefinition | DataDefinition | Configuration object passed to Dataset.from_pandas() |
Usage Examples
Basic Column Type Mapping
```python
from evidently import DataDefinition

# Explicit column type mapping for a tabular dataset
data_definition = DataDefinition(
    numerical_columns=["age", "salary", "experience"],
    categorical_columns=["department", "city"],
    text_columns=["review_text"],
    timestamp="created_at",
    id_column="user_id",
)
```
Auto-Inference (Empty Definition)
```python
from evidently import DataDefinition

# Let Evidently auto-infer column types from pandas dtypes
data_definition = DataDefinition()
```
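Auto-inference reads dtypes, so columns whose dtype does not match their meaning are worth coercing before loading. Typical cases are dates stored as strings, numeric codes such as ZIP codes, and numbers read as text. The pandas-only sketch below shows that preparation step. The column names are hypothetical, and how Evidently then classifies each resulting dtype is not shown here. When in doubt, list the columns explicitly instead.

```python
import pandas as pd

# Hypothetical raw data with misleading dtypes.
raw = pd.DataFrame(
    {
        "created_at": ["2024-01-01", "2024-01-02"],  # dates stored as strings
        "zip_code": [10001, 94105],                  # identifiers stored as integers
        "salary": ["52000", "61000"],                # numbers stored as strings
    }
)

prepared = raw.assign(
    created_at=pd.to_datetime(raw["created_at"]),  # string -> datetime64
    zip_code=raw["zip_code"].astype(str),          # integer code -> string label
    salary=pd.to_numeric(raw["salary"]),           # string -> integer
)
print(prepared.dtypes)
```

The coerced frame can then be passed to `Dataset.from_pandas(prepared, data_definition=DataDefinition())`.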
With Classification Task
```python
from evidently import DataDefinition
from evidently.core.datasets import BinaryClassification

data_definition = DataDefinition(
    numerical_columns=["feature_1", "feature_2"],
    classification=[
        BinaryClassification(
            target="is_fraud",
            prediction_labels="predicted_fraud",
            pos_label=1,
        )
    ],
)
```