Implementation: Online ML - River stream.iter_csv
| Knowledge Sources | River Docs |
|---|---|
| Domains | Online_Learning Data_Ingestion ETL |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Concrete tool for converting CSV files into observation-by-observation data streams with configurable type conversion, date parsing, column dropping, and random sampling.
Description
The stream.iter_csv function reads a CSV file (or buffer) and yields one (x, y) tuple at a time, where x is a feature dictionary and y is the target value (or None if no target column is specified). It supports on-the-fly type conversion via a converters dictionary, date parsing via parse_dates, column exclusion via drop, and random sub-sampling via fraction.
Under the hood, it uses a custom DictReader subclass that extends Python's csv.DictReader with Bernoulli sampling support. Compressed files (.gz, .zip) are transparently decompressed when compression="infer".
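The sampling behaviour described above can be sketched with the standard library alone. The class below is a hypothetical illustration of a Bernoulli-sampling csv.DictReader subclass, not River's actual internal implementation:

```python
import csv
import io
import random

class SampledDictReader(csv.DictReader):
    """Hypothetical sketch: a csv.DictReader that keeps each row
    with probability `fraction` (Bernoulli sampling)."""

    def __init__(self, f, fraction=1.0, rng=None, **kwargs):
        super().__init__(f, **kwargs)
        self.fraction = fraction
        self.rng = rng or random.Random()

    def __next__(self):
        row = super().__next__()
        # Discard rows until one passes the Bernoulli draw.
        while self.rng.random() > self.fraction:
            row = super().__next__()
        return row

data = io.StringIO("a,b\n1,2\n3,4\n5,6\n7,8\n")
reader = SampledDictReader(data, fraction=0.5, rng=random.Random(42))
rows = list(reader)  # roughly half of the four rows survive
```

Seeding the random generator, as River does via its `seed` parameter, makes the sampled subset reproducible across runs.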
This function is the core data ingestion primitive used by all of River's built-in FileDataset classes (such as datasets.Phishing). It is also available directly for loading custom CSV data.
Usage
Import this function when you need to:
- Stream a custom CSV file into a River model for training or evaluation.
- Control type conversion, date parsing, or column selection during ingestion.
- Sub-sample a large CSV file for rapid prototyping.
- Build a custom dataset class that wraps a CSV file.
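To make the last use case concrete, here is a minimal, hypothetical dataset class that mimics the (x, y) streaming contract of stream.iter_csv using only the standard library; the class name, the tiny CSV, and the simplified converter handling are illustrative assumptions, not River code:

```python
import csv
import os
import tempfile

class CSVDataset:
    """Hypothetical wrapper that streams (x, y) pairs from a CSV file,
    mimicking the contract of stream.iter_csv."""

    def __init__(self, path, target=None, converters=None):
        self.path = path
        self.target = target
        self.converters = converters or {}

    def __iter__(self):
        with open(self.path, newline="") as f:
            for row in csv.DictReader(f):
                # Apply per-column type conversion.
                for col, func in self.converters.items():
                    if col in row:
                        row[col] = func(row[col])
                # Pop the target out of the feature dict.
                y = row.pop(self.target, None) if self.target else None
                yield row, y

# Write a tiny CSV and stream it.
with tempfile.NamedTemporaryFile("w", suffix=".csv",
                                 delete=False, newline="") as tmp:
    tmp.write("age,label\n21,yes\n34,no\n")
    path = tmp.name

dataset = CSVDataset(path, target="label", converters={"age": int})
pairs = list(dataset)
os.remove(path)
# pairs == [({'age': 21}, 'yes'), ({'age': 34}, 'no')]
```

In River itself, a custom dataset class would typically delegate its `__iter__` to stream.iter_csv rather than reimplement the parsing.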
Code Reference
Source Location
| File | Lines |
|---|---|
| river/stream/iter_csv.py | L34-L189 |
Signature
def iter_csv(
filepath_or_buffer,
target: str | list[str] | None = None,
converters: dict | None = None,
parse_dates: dict | None = None,
drop: list[str] | None = None,
drop_nones=False,
fraction=1.0,
compression="infer",
seed: int | None = None,
field_size_limit: int | None = None,
**kwargs,
) -> base.typing.Stream
Import
from river import stream
dataset = stream.iter_csv('data.csv', target='label')
I/O Contract
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| filepath_or_buffer | str or buffer | (required) | Path to a CSV file, or a buffer with a read method. |
| target | str \| list[str] \| None | None | Name of the target column. If a list, multiple target values are extracted. If None, y is always None. |
| converters | dict \| None | None | Mapping of column names to callables for type conversion (e.g., {'age': int, 'score': float}). |
| parse_dates | dict \| None | None | Mapping of column names to datetime format strings for date parsing. |
| drop | list[str] \| None | None | Column names to exclude from the feature dictionary. |
| drop_nones | bool | False | Whether to drop features with None values. |
| fraction | float | 1.0 | Sampling fraction in (0, 1]. Values below 1.0 enable Bernoulli sampling. |
| compression | str | "infer" | Decompression method. "infer" detects from the file extension (.gz, .zip). |
| seed | int \| None | None | Random seed for deterministic sampling. |
| field_size_limit | int \| None | None | Maximum field size for the CSV reader. |
| **kwargs | | | Additional keyword arguments passed to csv.DictReader. |
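Because the extra keyword arguments are forwarded to csv.DictReader, standard dialect options such as delimiter or quotechar pass straight through. The stdlib snippet below illustrates what that forwarding amounts to; the data is invented for illustration:

```python
import csv
import io

# Semicolon-separated data, as produced by some locales' spreadsheet exports.
buffer = io.StringIO("name;score\nada;9.5\ngrace;8.0\n")

# stream.iter_csv(..., delimiter=';') would forward delimiter=';' to
# csv.DictReader, which then parses the rows like this:
rows = list(csv.DictReader(buffer, delimiter=";"))
# rows == [{'name': 'ada', 'score': '9.5'}, {'name': 'grace', 'score': '8.0'}]
```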
Outputs
| Output | Type | Description |
|---|---|---|
| Return value | base.typing.Stream | A generator yielding (x, y) tuples, where x is a dict mapping feature names to values and y is the target value (its type depends on converters), or None if no target is specified. |
Usage Examples
Basic CSV streaming with target:
from river import stream
for x, y in stream.iter_csv('data.csv', target='label'):
print(x, y)
With type converters and date parsing:
from river import stream
params = {
'converters': {'rating': float},
'parse_dates': {'year': '%Y'}
}
for x, y in stream.iter_csv('tv_shows.csv', target='rating', **params):
print(x, y)
# {'name': 'Planet Earth II', 'year': datetime.datetime(2016, 1, 1, 0, 0)} 9.5
# ...
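Each parse_dates entry maps a column to a strptime format string; internally, every value in that column is parsed roughly as the standard library does below (unspecified components default to their minimum, which is why '2016' becomes January 1st):

```python
from datetime import datetime

# The '%Y' format from the example above, applied to one raw cell value.
parsed = datetime.strptime("2016", "%Y")
# parsed == datetime(2016, 1, 1, 0, 0)
```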
Sub-sampling a large file:
from river import stream
# Only read ~10% of the rows, deterministically
for x, y in stream.iter_csv('large_data.csv', target='label', fraction=0.1, seed=42):
print(x, y)
Without a target column:
from river import stream
for x, y in stream.iter_csv('features_only.csv'):
print(x, y)
# y is always None