Principle:Evidentlyai Evidently Dataset Creation

From Leeroopedia
Knowledge Sources
Domains Data_Engineering, ML_Monitoring
Last Updated 2026-02-14 12:00 GMT

Overview

A data wrapping mechanism that converts raw pandas DataFrames into typed, schema-aware Dataset objects for evaluation.

Description

Dataset Creation is the process of wrapping a raw pandas.DataFrame with schema metadata (via DataDefinition) to produce an Evidently Dataset object. This Dataset object is the universal input for all Evidently evaluation operations (reports, metrics, tests).

The key transformation is binding raw tabular data to column semantics, so that downstream metrics know:

  • Which columns are numerical vs. categorical vs. text
  • Which columns represent ML task targets/predictions
  • How to interpret column values for statistical testing

Without this wrapping step, Evidently's evaluation engine cannot correctly dispatch metrics or compute drift statistics.
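To illustrate why column semantics matter, here is a minimal library-free sketch. It is not Evidently's implementation: the `SCHEMA` mapping, the `summarize` function, and the column names are all hypothetical. It shows how the declared column type decides which statistic is computed, so an integer-coded category such as `plan_id` is summarized by value shares rather than by a meaningless mean.

```python
from collections import Counter
from statistics import mean

# Hypothetical data: "plan_id" is an integer-coded category, "age" is a true number.
rows = [
    {"age": 31, "plan_id": 1},
    {"age": 45, "plan_id": 3},
    {"age": 27, "plan_id": 1},
    {"age": 52, "plan_id": 2},
]

# Hypothetical schema: the declared column type decides which summary is computed.
SCHEMA = {"age": "numerical", "plan_id": "categorical"}


def summarize(rows, column, column_type):
    """Dispatch a summary statistic based on the declared column type."""
    values = [row[column] for row in rows]
    if column_type == "numerical":
        return {"mean": mean(values), "min": min(values), "max": max(values)}
    if column_type == "categorical":
        counts = Counter(values)
        total = len(values)
        # Share of rows per category value.
        return {value: count / total for value, count in counts.items()}
    raise ValueError(f"unsupported column type: {column_type!r}")


summaries = {col: summarize(rows, col, col_type) for col, col_type in SCHEMA.items()}
print(summaries["age"])
print(summaries["plan_id"])
```

Without the schema, both columns would look like plain integers and nothing in the data itself would indicate that `plan_id` should be treated as a category.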

Usage

Use this principle as the mandatory data preparation step before running any Evidently Report. It applies to every workflow: drift monitoring, model quality evaluation, text analysis, and dashboard monitoring.

Theoretical Basis

Dataset creation follows the adapter pattern from software engineering, converting the pandas DataFrame interface into Evidently's Dataset interface:

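# Adapter pattern sketch (column names and values are illustrative)
import pandas as pd
from evidently import Dataset, DataDefinition, Report
from evidently.presets import DataSummaryPreset

raw_data = pd.DataFrame({"age": [31, 45, 27], "plan": ["free", "pro", "free"]})  # Generic tabular data
schema = DataDefinition(numerical_columns=["age"], categorical_columns=["plan"])  # Column semantics
dataset = Dataset.from_pandas(raw_data, data_definition=schema)  # Typed, schema-aware object
snapshot = Report([DataSummaryPreset()]).run(dataset)  # Evaluation engine consumes typed input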

The factory method pattern (Dataset.from_pandas()) provides a single construction path, so a Dataset is built the same way whether or not descriptors are applied.
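The two patterns named above can be sketched without any library. This is a hedged illustration only: `ColumnSchema`, `TypedDataset`, `from_columns`, and the descriptor format (a source column paired with a row-level function) are hypothetical stand-ins, not Evidently's classes or signatures.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnSchema:
    """Hypothetical stand-in for a data definition: column name -> declared type."""
    column_types: dict


@dataclass
class TypedDataset:
    """Hypothetical schema-aware wrapper (not Evidently's Dataset class)."""
    columns: dict
    schema: ColumnSchema
    descriptor_names: list = field(default_factory=list)

    @classmethod
    def from_columns(cls, columns, schema, descriptors=None):
        """Single construction path: validate, optionally add descriptors, build."""
        missing = set(schema.column_types) - set(columns)
        if missing:
            raise ValueError(f"schema refers to missing columns: {sorted(missing)}")
        # Copy the input so the caller's raw data is never mutated.
        data = {name: list(values) for name, values in columns.items()}
        names = []
        for name, (source, fn) in (descriptors or {}).items():
            # A descriptor here is a row-level function that yields a new column.
            data[name] = [fn(value) for value in data[source]]
            names.append(name)
        return cls(columns=data, schema=schema, descriptor_names=names)

    def columns_of_type(self, column_type):
        """Return the columns declared with the given type."""
        return [n for n, t in self.schema.column_types.items() if t == column_type]


raw = {"review": ["great", "too slow"], "rating": [5, 2]}
schema = ColumnSchema({"review": "text", "rating": "numerical"})

# Same factory call with and without descriptors.
plain = TypedDataset.from_columns(raw, schema)
enriched = TypedDataset.from_columns(
    raw, schema, descriptors={"review_length": ("review", len)}
)
print(enriched.columns["review_length"])
```

Both calls go through the same validation and return the same type, which is the property the factory method is meant to guarantee; the descriptor only adds a derived column.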

Related Pages

Implemented By
