Principle:Evidentlyai Evidently Dataset Creation

Knowledge Sources	Evidently Datasets Guide, Evidently
Domains	Data_Engineering, ML_Monitoring
Last Updated	2026-02-14 12:00 GMT

Overview

A data-wrapping step that converts raw pandas DataFrames into typed, schema-aware Dataset objects for evaluation.

Description

Dataset Creation is the process of wrapping a raw pandas.DataFrame with schema metadata (via DataDefinition) to produce an Evidently Dataset object. This Dataset object is the universal input for all Evidently evaluation operations (reports, metrics, tests).

The key transformation binds raw tabular data to column semantics so that downstream metrics know:

Which columns are numerical vs. categorical vs. text
Which columns represent ML task targets/predictions
How to interpret column values for statistical testing

Without this wrapping step, Evidently's evaluation engine cannot correctly dispatch metrics or compute drift statistics.
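To make the dispatch idea concrete, here is a minimal standard-library sketch. It is not Evidently code: ColumnSchema and summarize are hypothetical names invented for illustration, and plain dictionaries stand in for a DataFrame. It shows only that a declared column role, rather than the raw value type, decides which statistic gets computed.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ColumnSchema:
    """Hypothetical stand-in for schema metadata (not Evidently's DataDefinition)."""
    numerical: tuple = ()
    categorical: tuple = ()


def summarize(rows, schema):
    """Choose a summary per column from its declared role, not its Python type."""
    summary = {}
    for name in schema.numerical:
        summary[name] = {"mean": mean(row[name] for row in rows)}
    for name in schema.categorical:
        summary[name] = {"counts": dict(Counter(row[name] for row in rows))}
    return summary


rows = [
    {"age": 30, "region_code": 1},
    {"age": 40, "region_code": 2},
    {"age": 50, "region_code": 1},
]

# Both columns hold integers; only the schema says region_code is a category.
schema = ColumnSchema(numerical=("age",), categorical=("region_code",))
result = summarize(rows, schema)
```

Without the schema, region_code would look numerical and an average of region codes would be computed, which is the kind of misdispatch the wrapping step is meant to prevent.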

Usage

Use this principle as the mandatory data preparation step before running any Evidently Report. It applies to every workflow: drift monitoring, model quality evaluation, text analysis, and dashboard monitoring.

Theoretical Basis

Dataset creation follows the adapter pattern from software engineering. It adapts the pandas DataFrame interface to Evidently's internal Dataset interface:

# Pseudocode: Adapter pattern
raw_data = pd.DataFrame(...)        # Generic tabular data
schema = DataDefinition(...)         # Column semantics
dataset = Dataset.from_pandas(raw_data, data_definition=schema)  # Adapter step: typed, schema-aware object
report.run(dataset)                  # Evaluation engine consumes typed input

The factory method pattern (Dataset.from_pandas()) provides a single construction path, so a Dataset is built the same way whether or not descriptors are applied.
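The factory method idea can be sketched in plain Python as follows. This is an illustration under stated assumptions, not Evidently's implementation: TypedDataset, from_rows, and the (role, function) shape of the descriptors argument are all hypothetical. The point is that one classmethod copies the input, optionally derives extra columns, validates, and returns the same object type in every case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypedDataset:
    """Hypothetical wrapper illustrating the pattern (not Evidently's Dataset)."""
    rows: tuple
    roles: dict  # column name -> declared role

    @classmethod
    def from_rows(cls, rows, roles, descriptors=None):
        """Single construction path: copy, derive optional columns, validate, wrap."""
        rows = [dict(row) for row in rows]  # copy so the caller's data is untouched
        roles = dict(roles)
        # Each hypothetical descriptor is (role, function computing a value per row).
        for name, (role, func) in (descriptors or {}).items():
            for row in rows:
                row[name] = func(row)
            roles[name] = role
        for row in rows:
            missing = set(roles) - set(row)
            if missing:
                raise ValueError(f"columns missing from row: {sorted(missing)}")
        return cls(rows=tuple(rows), roles=roles)


raw = [{"review": "great"}, {"review": "too slow"}]

plain = TypedDataset.from_rows(raw, {"review": "text"})
enriched = TypedDataset.from_rows(
    raw,
    {"review": "text"},
    descriptors={"review_length": ("numerical", lambda row: len(row["review"]))},
)
```

Both calls return the same type and pass through the same validation, so downstream code does not need to know whether derived columns were requested.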

Related Pages

Implemented By

Implementation:Evidentlyai_Evidently_Dataset_From_Pandas
