Implementation:Evidentlyai Evidently Dataset From Pandas
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, ML_Monitoring |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
Concrete factory method, provided by the Evidently library, for creating Dataset objects from pandas DataFrames.
Description
Dataset.from_pandas() is a classmethod factory that wraps a pandas.DataFrame with a DataDefinition to produce a PandasDataset instance. If descriptors are provided, they are computed and appended as new columns during construction.
This is the primary entry point for pandas data in Evidently's evaluation pipeline. It is used across workflows such as drift monitoring, model quality, text evaluation, and LLM assessment.
Usage
Import Dataset and call this method whenever you need to prepare a pandas DataFrame for an Evidently Report.run() call. Use it for both the reference and the current dataset.
Code Reference
Source Location
- Repository: evidently
- File: src/evidently/core/datasets.py
- Lines: L1243-1276
Signature
class Dataset:
    @classmethod
    def from_pandas(
        cls,
        data: pd.DataFrame,
        data_definition: Optional[DataDefinition] = None,
        descriptors: Optional[List[Descriptor]] = None,
        options: AnyOptions = None,
        metadata: Optional[Dict[str, MetadataValueType]] = None,
        tags: Optional[List[str]] = None,
    ) -> "Dataset":
        """
        Args:
            data: pandas.DataFrame with your data.
            data_definition: Optional DataDefinition for column mapping (auto-inferred if None).
            descriptors: Optional list of descriptors to compute and add to dataset.
            options: Optional options for descriptor computation.
            metadata: Optional metadata dictionary.
            tags: Optional list of tags.
        Returns:
            Dataset object ready for use with Report.run().
        """
Import
from evidently import Dataset
# or
from evidently.core.datasets import Dataset
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| data | pd.DataFrame | Yes | Source pandas DataFrame with your data |
| data_definition | Optional[DataDefinition] | No | Column type/role mapping (auto-inferred if None) |
| descriptors | Optional[List[Descriptor]] | No | Row-level descriptors to compute and append |
| options | AnyOptions | No | Options for descriptor computation |
| metadata | Optional[Dict[str, MetadataValueType]] | No | Metadata key-value pairs |
| tags | Optional[List[str]] | No | Tags for categorization |
Outputs
| Name | Type | Description |
|---|---|---|
| return value | Dataset | Schema-aware dataset ready for Report.run() |
Usage Examples
Basic Dataset Creation
import pandas as pd
from evidently import Dataset, DataDefinition
df = pd.read_csv("data.csv")
# With explicit schema
data_def = DataDefinition(
    numerical_columns=["age", "salary"],
    categorical_columns=["department"],
)
dataset = Dataset.from_pandas(df, data_definition=data_def)
Auto-Inferred Schema
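from evidently import Dataset, DataDefinition
# df is a pandas DataFrame, e.g. the one loaded in the previous example.
# Let Evidently auto-infer column types; per the signature above,
# leaving data_definition as None also triggers inference.
dataset = Dataset.from_pandas(df, data_definition=DataDefinition())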
With Reference and Current Datasets
from evidently import Dataset, DataDefinition
# df_reference and df_current are pandas DataFrames with the same columns.
data_def = DataDefinition(
    numerical_columns=["feature_1", "feature_2", "feature_3"],
)
reference = Dataset.from_pandas(df_reference, data_definition=data_def)
current = Dataset.from_pandas(df_current, data_definition=data_def)
# Both datasets are now ready for report.run(current, reference).