Implementation:Evidentlyai Evidently Dataset From Pandas

From Leeroopedia
Knowledge Sources
Domains Data_Engineering, ML_Monitoring
Last Updated 2026-02-14 12:00 GMT

Overview

Concrete factory method, provided by the Evidently library, for creating Evidently Dataset objects from pandas DataFrames.

Description

Dataset.from_pandas() is a classmethod factory that wraps a pandas.DataFrame with a DataDefinition to produce a PandasDataset instance. If descriptors are provided, they are computed and appended as new columns during construction.

This is the primary entry point for all data in Evidently's evaluation pipeline. It is used across every workflow: drift monitoring, model quality, text evaluation, and LLM assessment.
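The descriptor behaviour described above can be sketched as follows. This is a minimal illustration, not the library's reference usage: it assumes an Evidently version whose `evidently.descriptors` module exposes `TextLength` with an `alias` argument and whose `Dataset` offers `as_dataframe()`. The column name `review`, the derived alias, and the helper `add_length_descriptor` are hypothetical names chosen for the example.

```python
import pandas as pd


def add_length_descriptor(df: pd.DataFrame, text_column: str) -> pd.DataFrame:
    """Wrap df in a Dataset with one row-level descriptor and return it as a DataFrame."""
    # Imported lazily so the sample data below can be built without Evidently installed.
    from evidently import Dataset, DataDefinition
    from evidently.descriptors import TextLength  # assumed to exist in your version

    dataset = Dataset.from_pandas(
        df,
        data_definition=DataDefinition(text_columns=[text_column]),
        descriptors=[TextLength(text_column, alias=f"{text_column}_length")],
    )
    # The descriptor output is expected to appear as an extra column next to the originals.
    return dataset.as_dataframe()


# Hypothetical sample data for illustration only.
reviews = pd.DataFrame({"review": ["great product", "too slow", "ok"]})
```

Calling `add_length_descriptor(reviews, "review")` should, under the assumptions above, return the original `review` column plus a `review_length` column computed during construction.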

Usage

Import and call this method whenever you need to prepare data for any Evidently Report.run() call. Use it for both reference and current datasets.

Code Reference

Source Location

  • Repository: evidently
  • File: src/evidently/core/datasets.py
  • Lines: L1243-1276

Signature

class Dataset:
    @classmethod
    def from_pandas(
        cls,
        data: pd.DataFrame,
        data_definition: Optional[DataDefinition] = None,
        descriptors: Optional[List[Descriptor]] = None,
        options: AnyOptions = None,
        metadata: Optional[Dict[str, MetadataValueType]] = None,
        tags: Optional[List[str]] = None,
    ) -> "Dataset":
        """
        Args:
            data: pandas.DataFrame with your data.
            data_definition: Optional DataDefinition for column mapping (auto-inferred if None).
            descriptors: Optional list of descriptors to compute and add to dataset.
            options: Optional options for descriptor computation.
            metadata: Optional metadata dictionary.
            tags: Optional list of tags.
        Returns:
            Dataset object ready for use with Report.run().
        """

Import

from evidently import Dataset
# or
from evidently.core.datasets import Dataset

I/O Contract

Inputs

Name | Type | Required | Description
data | pd.DataFrame | Yes | Source pandas DataFrame with your data
data_definition | Optional[DataDefinition] | No | Column type/role mapping (auto-inferred if None)
descriptors | Optional[List[Descriptor]] | No | Row-level descriptors to compute and append
options | AnyOptions | No | Options for descriptor computation
metadata | Optional[Dict[str, MetadataValueType]] | No | Metadata key-value pairs
tags | Optional[List[str]] | No | Tags for categorization

Outputs

Name | Type | Description
return value | Dataset | Schema-aware dataset ready for Report.run()

Usage Examples

Basic Dataset Creation

import pandas as pd
from evidently import Dataset, DataDefinition

df = pd.read_csv("data.csv")

# With explicit schema
data_def = DataDefinition(
    numerical_columns=["age", "salary"],
    categorical_columns=["department"],
)
dataset = Dataset.from_pandas(df, data_definition=data_def)

Auto-Inferred Schema

import pandas as pd
from evidently import Dataset, DataDefinition

df = pd.read_csv("data.csv")

# Let Evidently auto-infer column types by passing an empty DataDefinition
dataset = Dataset.from_pandas(df, data_definition=DataDefinition())
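The source does not spell out how auto-inference decides column types. Assuming it depends at least partly on the DataFrame's own dtypes, it may help to inspect and normalize them before wrapping the data. The sketch below uses only pandas; the column names and the helper `coerce_numeric` are hypothetical.

```python
import pandas as pd


def coerce_numeric(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return a copy of df with the given columns converted to numeric dtypes."""
    out = df.copy()
    for col in columns:
        # errors="coerce" turns unparseable values into NaN instead of raising.
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


# Hypothetical data: "age" was read as text because of a stray "n/a" entry.
raw = pd.DataFrame({"age": ["31", "45", "n/a"], "department": ["ops", "hr", "ops"]})
clean = coerce_numeric(raw, ["age"])

# Inspect the dtypes that any schema inference would see.
print(clean.dtypes)
```

If inference still picks an unexpected type for a column, passing an explicit DataDefinition, as in the Basic Dataset Creation example, avoids relying on it.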

With Reference and Current Datasets

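import pandas as pd
from evidently import Dataset, DataDefinition

# Placeholder file names; load your own reference and current data here
df_reference = pd.read_csv("reference.csv")
df_current = pd.read_csv("current.csv")

# One shared definition keeps both datasets on the same schema
data_def = DataDefinition(
    numerical_columns=["feature_1", "feature_2", "feature_3"],
)

reference = Dataset.from_pandas(df_reference, data_definition=data_def)
current = Dataset.from_pandas(df_current, data_definition=data_def)

# Both datasets ready for report.run(current, reference)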
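Because one DataDefinition is shared by both datasets, both DataFrames need to contain every column it names. A minimal sketch of a pre-flight check is shown below; the helpers `missing_columns` and `build_pair` are hypothetical and not part of Evidently.

```python
import pandas as pd


def missing_columns(df: pd.DataFrame, required: list) -> list:
    """Return the required column names that are absent from df, in order."""
    return [col for col in required if col not in df.columns]


def build_pair(df_reference: pd.DataFrame, df_current: pd.DataFrame, numerical_columns: list):
    """Validate both frames, then wrap them with one shared DataDefinition."""
    for name, frame in (("reference", df_reference), ("current", df_current)):
        missing = missing_columns(frame, numerical_columns)
        if missing:
            raise ValueError(f"{name} data is missing columns: {missing}")

    # Imported lazily so the validation helper works without Evidently installed.
    from evidently import Dataset, DataDefinition

    data_def = DataDefinition(numerical_columns=list(numerical_columns))
    reference = Dataset.from_pandas(df_reference, data_definition=data_def)
    current = Dataset.from_pandas(df_current, data_definition=data_def)
    return reference, current
```

Failing early with a clear message is a design choice for this sketch; the source does not say how Evidently itself reacts to a DataDefinition that names absent columns.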
