Implementation:Evidentlyai Evidently Legacy Data Loader

Knowledge Sources	Evidentlyai_Evidently
Domains	ML Monitoring, Data Loading, Data Pipeline
Last Updated	2026-02-14 12:00 GMT

Overview

DataLoader loads CSV files into pandas DataFrames for the Evidently legacy pipeline, with configurable row sampling: none, every nth row, or random.

Description

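This module defines a data loading subsystem with three public components (SamplingOptions, DataOptions, DataLoader), one internal class (RandomizedSkipRows), and two internal helper functions: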

SamplingOptions (dataclass) -- configures how rows are sampled during loading:

type -- sampling strategy: "none" (load all), "nth" (every nth row), or "random" (random sampling). Default: "none".
random_seed -- seed for reproducible random sampling. Default: 1.
ratio -- probability of keeping each row under random sampling (0.0 to 1.0). Default: 1.0.
n -- interval for nth-row sampling. Default: 1.

DataOptions (dataclass) -- configures CSV parsing:

date_column -- column name to parse as datetime. Default: "datetime".
separator -- CSV field separator. Default: ",".
header -- whether the CSV has a header row. Default: True.
column_names -- explicit column names, or None to infer from data.

DataLoader -- the main class with a single load method that:

Reads a CSV file using pd.read_csv
Applies a skiprows function based on the sampling options
Parses the date column if specified
Handles the header row based on DataOptions.header
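
The steps above can be sketched as a mapping from option fields to pd.read_csv keyword arguments. This is a minimal illustration, not the library's code: CsvOptions and read_csv_kwargs are hypothetical names, and passing column_names through the names argument is an assumption based on the field description.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class CsvOptions:
    """Hypothetical stand-in mirroring the DataOptions fields listed above."""

    date_column: Optional[str] = "datetime"
    separator: str = ","
    header: bool = True
    column_names: Optional[List[str]] = None


def read_csv_kwargs(
    opts: CsvOptions,
    skiprows: Optional[Callable[[int], bool]] = None,
) -> Dict[str, Any]:
    """Translate the options into keyword arguments accepted by pd.read_csv."""
    return {
        "sep": opts.separator,
        # header=0 treats the first line as column names; None means no header row.
        "header": 0 if opts.header else None,
        # Assumption: explicit column names map onto read_csv's `names` argument.
        "names": opts.column_names,
        # Parse the date column only when one is configured.
        "parse_dates": [opts.date_column] if opts.date_column else False,
        # A callable here receives a row index and returns True to skip that row.
        "skiprows": skiprows,
    }
```

The resulting dictionary could then be passed along as pd.read_csv(filename, **kwargs).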

RandomizedSkipRows -- an internal class that implements chunk-based random row selection. It generates random boolean arrays in chunks of CHUNK_SIZE (1000) rows for memory-efficient random sampling.
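
A rough sketch of chunk-based random row selection follows. It is an illustration under stated assumptions, not the library's implementation: ChunkedRandomSkip is a hypothetical name, the standard-library random module stands in for whatever generator the library uses, row 0 is assumed to be the header and is always kept, and row indices are assumed to arrive in increasing order.

```python
import random
from typing import List

CHUNK_SIZE = 1000  # same chunk size the page cites


class ChunkedRandomSkip:
    """Hypothetical stand-in for RandomizedSkipRows; not the library's code."""

    def __init__(self, ratio: float, random_seed: int) -> None:
        self._rng = random.Random(random_seed)
        self._ratio = ratio
        self._chunk_index = -1
        self._skip_flags: List[bool] = []

    def _draw_chunk(self) -> List[bool]:
        # True means "skip this row"; each row is kept with probability `ratio`.
        return [self._rng.random() >= self._ratio for _ in range(CHUNK_SIZE)]

    def skiprows(self, row_index: int) -> bool:
        if row_index == 0:
            return False  # assumption: row 0 is the header and is always kept
        chunk, offset = divmod(row_index - 1, CHUNK_SIZE)
        if chunk != self._chunk_index:
            # Assumes indices arrive in increasing order, so each chunk is drawn once.
            self._skip_flags = self._draw_chunk()
            self._chunk_index = chunk
        return self._skip_flags[offset]
```

Only one chunk of flags is held at a time, so memory use does not grow with the number of rows in the file. Two instances built with the same seed and ratio make identical decisions when queried in the same order.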

Internal helper functions:

_skiprows(sampling_options) -- resolves the sampling type to a callable skip function or None
__simple(sampling_options) -- creates a skip function for nth-row sampling (keeps rows where row_idx % n == 1)
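
The nth-row rule described above can be sketched as a small predicate of the kind pd.read_csv accepts for skiprows (return True to skip a row). make_nth_skip is a hypothetical name, and always keeping row 0 as the header is an assumption of this sketch rather than a statement about the library.

```python
from typing import Callable


def make_nth_skip(n: int) -> Callable[[int], bool]:
    """Hypothetical helper; the returned function is True for rows to skip."""

    def skip(row_idx: int) -> bool:
        if row_idx == 0:
            return False  # assumption: keep the header line
        # Keep rows where row_idx % n == 1, as described above; skip the rest.
        return row_idx % n != 1

    return skip


skip = make_nth_skip(5)
kept = [i for i in range(12) if not skip(i)]
# kept == [0, 1, 6, 11]
```

Under this rule n=1 would keep no data rows, so the sketch is only meaningful for n >= 2; how the library treats n=1 is not covered here.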

Usage

Use DataLoader when loading CSV data files for Evidently analysis, particularly when you need to sample large datasets for faster processing or development iteration.

Code Reference

Source Location

Repository: Evidentlyai_Evidently
File: src/evidently/legacy/runner/loader.py

Signature

import dataclasses
from typing import List, Optional

import pandas as pd

@dataclasses.dataclass
class SamplingOptions:
    type: str = "none"
    random_seed: int = 1
    ratio: float = 1.0
    n: int = 1

@dataclasses.dataclass
class DataOptions:
    date_column: str
    separator: str
    header: bool
    column_names: Optional[List[str]]

    def __init__(self, date_column="datetime", separator=",", header=True, column_names=None):
        ...

class DataLoader:
    def __init__(self): ...
    def load(
        self,
        filename: str,
        data_options: DataOptions,
        sampling_options: SamplingOptions = None,
    ) -> pd.DataFrame: ...

CHUNK_SIZE = 1000

class RandomizedSkipRows:
    def __init__(self, ratio: float, random_seed: int): ...
    def skiprows(self, row_index: int) -> bool: ...

Import

from evidently.legacy.runner.loader import DataLoader
from evidently.legacy.runner.loader import DataOptions
from evidently.legacy.runner.loader import SamplingOptions

I/O Contract

Inputs

Name	Type	Required	Description
filename	`str`	Yes	Path to the CSV file to load.
data_options	`DataOptions`	Yes	CSV parsing configuration (date column, separator, header, column names).
sampling_options	`SamplingOptions`	No	Row sampling configuration. Defaults to no sampling.

Outputs

Name	Type	Description
return	`pd.DataFrame`	A pandas DataFrame containing the loaded and optionally sampled data.

Usage Examples

from evidently.legacy.runner.loader import DataLoader, DataOptions, SamplingOptions

loader = DataLoader()

# Load entire CSV with default options
data_options = DataOptions(date_column="datetime", separator=",")
df = loader.load("data/train.csv", data_options)

# Load with nth-row sampling (every 5th row)
sampling = SamplingOptions(type="nth", n=5)
df_sampled = loader.load("data/train.csv", data_options, sampling_options=sampling)

# Load with random sampling (each row kept with probability 0.5, so roughly half the rows)
sampling = SamplingOptions(type="random", ratio=0.5, random_seed=42)
df_random = loader.load("data/train.csv", data_options, sampling_options=sampling)

# Load a CSV without header and with custom separator
data_options = DataOptions(
    date_column=None,
    separator="\t",
    header=False,
    column_names=["col1", "col2", "col3"],
)
df = loader.load("data/raw.tsv", data_options)

Related Pages

Environment:Evidentlyai_Evidently_Python_Core_Environment
