Principle:Evidentlyai Evidently Dataset Creation
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, ML_Monitoring |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
A data wrapping mechanism that converts raw pandas DataFrames into typed, schema-aware Dataset objects for evaluation.
Description
Dataset Creation is the process of wrapping a raw pandas.DataFrame with schema metadata (via DataDefinition) to produce an Evidently Dataset object. This Dataset object is the universal input for all Evidently evaluation operations (reports, metrics, tests).
The key transformation is binding raw tabular data to column semantics so that downstream metrics know:
- Which columns are numerical vs. categorical vs. text
- Which columns represent ML task targets/predictions
- How to interpret column values for statistical testing
Without this wrapping step, Evidently's evaluation engine cannot correctly dispatch metrics or compute drift statistics.
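The dispatch idea above can be illustrated with a minimal, standard-library sketch. The names `ToySchema`, `ToyDataset`, and `summarize` are hypothetical and are not part of Evidently; they only show how declared column types, rather than the raw values, decide which statistic gets computed.

```python
from collections import Counter
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ToySchema:
    """Hypothetical stand-in for column semantics (not Evidently's DataDefinition)."""
    numerical: list = field(default_factory=list)
    categorical: list = field(default_factory=list)


@dataclass
class ToyDataset:
    """Raw rows bound to a schema, loosely mirroring the wrapping step."""
    rows: list
    schema: ToySchema

    def column(self, name):
        # Collect one column's values across all rows.
        return [row[name] for row in self.rows]


def summarize(dataset):
    """Choose a summary per column based on its declared type."""
    summary = {}
    for name in dataset.schema.numerical:
        summary[name] = {"mean": mean(dataset.column(name))}
    for name in dataset.schema.categorical:
        summary[name] = {"counts": dict(Counter(dataset.column(name)))}
    return summary


rows = [
    {"age": 30, "plan": "free"},
    {"age": 40, "plan": "pro"},
    {"age": 50, "plan": "free"},
]
toy = ToyDataset(rows, ToySchema(numerical=["age"], categorical=["plan"]))
result = summarize(toy)
print(result)
```

If `age` were declared categorical instead, the same values would be counted rather than averaged, which is why the schema has to be attached before any metric runs.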
Usage
Use this principle as the mandatory data preparation step before running any Evidently Report. It applies to every workflow: drift monitoring, model quality evaluation, text analysis, and dashboard monitoring.
Theoretical Basis
Dataset creation follows the adapter pattern from software engineering, converting the pandas DataFrame interface into Evidently's internal Dataset interface:
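# Sketch: adapter pattern (assumes a recent Evidently release; column names and values are illustrative)
import pandas as pd
from evidently import DataDefinition, Dataset, Report
from evidently.presets import DataSummaryPreset
raw_data = pd.DataFrame({"age": [25, 32, 47], "city": ["Oslo", "Rome", "Oslo"]})  # Generic tabular data
schema = DataDefinition(numerical_columns=["age"], categorical_columns=["city"])  # Column semantics
dataset = Dataset.from_pandas(raw_data, data_definition=schema)  # Typed, schema-aware object
report = Report([DataSummaryPreset()])  # Any metric or preset could be used here
result = report.run(dataset)  # Evaluation engine consumes typed input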
The factory method pattern (Dataset.from_pandas()) ensures consistent construction regardless of whether descriptors are applied.
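As a rough illustration of why a single factory method keeps construction consistent, the sketch below uses only the standard library. `ToyDataset.from_rows` and the `descriptors` argument are hypothetical, and the sketch assumes descriptors behave as row-level functions that add derived columns; it is not how Evidently implements `Dataset.from_pandas()`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDataset:
    columns: dict       # column name -> list of values
    column_types: dict  # column name -> "numerical" | "categorical" | "text"

    @classmethod
    def from_rows(cls, rows, column_types, descriptors=None):
        """Single construction path, with or without descriptors."""
        columns = {name: [row[name] for row in rows] for name in column_types}
        types = dict(column_types)
        # Optional step: each descriptor derives one new column from every row.
        for name, fn in (descriptors or {}).items():
            columns[name] = [fn(row) for row in rows]
            types[name] = "numerical"  # assumption: these descriptors return numbers
        return cls(columns, types)


rows = [{"review": "great product"}, {"review": "bad"}]

plain = ToyDataset.from_rows(rows, {"review": "text"})
enriched = ToyDataset.from_rows(
    rows,
    {"review": "text"},
    descriptors={"review_length": lambda row: len(row["review"])},
)
print(enriched.columns["review_length"])
```

Both calls return the same kind of object with the same invariants, so downstream code does not need to know whether descriptors were applied.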