Implementation:Online ml River Stream Iter Arff

Knowledge Sources	Online_ml_River
Domains	Online_Learning, Data_Streaming, File_Formats
Last Updated	2026-02-08 16:00 GMT

Overview

Iterates over rows from ARFF (Attribute-Relation File Format) files, supporting both dense and sparse data representations.

Description

The iter_arff function reads ARFF files, a format widely used in the machine learning community, particularly with Weka. It supports numerical and categorical features as well as sparse data representations, and it handles missing values (indicated by '?'). Internally it relies on scipy's ARFF reader, and it yields (features, target) tuples one row at a time, which suits online learning.
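To make the handling of missing values concrete, here is a minimal sketch of turning one dense ARFF data line into a feature dict. The helper parse_dense_row and the names and casts lists are hypothetical and written only for illustration; this is not River's code. It assumes a '?' entry is dropped from the dict rather than stored, and it ignores quoted values that contain commas.

```python
def parse_dense_row(line, names, casts):
    """Turn one comma-separated ARFF data line into a feature dict.

    Hypothetical helper for illustration only; not River's implementation.
    A naive comma split is used, so quoted values containing commas
    are not supported.
    """
    values = [v.strip() for v in line.strip().split(",")]
    return {
        name: cast(value)
        for name, cast, value in zip(names, casts, values)
        if value != "?"  # '?' marks a missing value, so the key is omitted
    }


names = ["make", "year", "price"]
casts = [str, float, float]  # categorical stays a string, numeric becomes float

row = parse_dense_row("Toyota, ?, 15000", names, casts)
print(row)  # {'make': 'Toyota', 'price': 15000.0}
```

Omitting the key fits dict-based online models, which can treat an absent feature as unknown without needing a placeholder value.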

Usage

Use this when working with datasets in ARFF format, especially those from the OpenML repository or created with Weka. It's particularly useful for handling sparse datasets where explicitly storing all zeros would be memory-intensive.
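The memory argument for sparse data can be sketched as follows. In ARFF's sparse row format, each row lists only "index value" pairs inside braces, with zero-based attribute indices, and every unlisted attribute is implicitly zero. The helper parse_sparse_row below is a hypothetical illustration, not River's parser, and it assumes every attribute is numeric.

```python
def parse_sparse_row(line, names):
    """Parse one sparse ARFF data line such as '{1 0.5, 3 0.8}'.

    Hypothetical helper for illustration only; assumes numeric attributes.
    """
    body = line.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError(f"not a sparse ARFF row: {line!r}")

    row = {}
    for pair in body[1:-1].split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate an empty row '{}'
        index, value = pair.split(None, 1)  # zero-based attribute index, then value
        row[names[int(index)]] = float(value)
    return row


names = ["x0", "x1", "x2", "x3"]
row = parse_sparse_row("{1 0.5, 3 0.8}", names)

print(row)                 # {'x1': 0.5, 'x3': 0.8}
print(row.get("x0", 0.0))  # 0.0, since unlisted attributes are implicitly zero
```

Only two of the four attributes are stored, so the size of each dict scales with the number of non-zero entries rather than with the number of columns.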

Code Reference

Source Location

Repository: Online_ml_River
File: river/stream/iter_arff.py

Signature

def iter_arff(
    filepath_or_buffer,
    target: str | list[str] | None = None,
    compression="infer",
    sparse=False
) -> base.typing.Stream:
    ...

Import

from river import stream

I/O Contract

Parameter	Type	Description
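filepath_or_buffer	str or buffer	Path to an ARFF file, or a buffer object with a read method
target	str, list[str], or None	Name(s) of the target field(s). If a list is provided, the target is returned as a dict keyed by field name.
compression	str	Decompression method for on-disk data ('infer', 'gz', 'zip'); 'infer' selects the method from the file extension
sparse	bool	Whether the data section uses the sparse ARFF format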

Returns:

Type	Description
Iterator[(dict, Any)]	Tuples of (features dict, target value); the target is itself a dict when target is a list
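The shape of the yielded tuples can be illustrated with a small sketch of how a parsed row might be split into features and target. The function split_target is hypothetical and only mirrors the contract described in the tables above; returning None as the target when no target name is given is an assumption, not something the tables state.

```python
def split_target(row, target=None):
    """Split a parsed row into (features, target).

    Hypothetical helper that mirrors the documented contract:
    a str target gives a single value, a list of names gives a dict.
    """
    features = dict(row)  # copy so the caller's row is left untouched
    if target is None:
        return features, None  # assumption: no target requested means y is None
    if isinstance(target, list):
        y = {name: features.pop(name) for name in target}
        return features, y
    y = features.pop(target)
    return features, y


row = {"make": "Toyota", "year": 2018.0, "price": 15000.0}

x_single, y_single = split_target(row, target="price")
print(x_single, y_single)  # {'make': 'Toyota', 'year': 2018.0} 15000.0

x_multi, y_multi = split_target(row, target=["year", "price"])
print(x_multi, y_multi)    # {'make': 'Toyota'} {'year': 2018.0, 'price': 15000.0}
```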

Usage Examples

from river import stream

# Create a simple ARFF file
arff_content = '''
@relation CarData
@attribute make {Toyota, Honda, Ford}
@attribute model string
@attribute year numeric
@attribute price numeric
@data
Toyota, Corolla, 2018, 15000
Honda, Civic, 2019, 16000
Ford, Mustang, 2020, 25000
'''

with open('cars.arff', 'w') as f:
    f.write(arff_content)

# Iterate over the data
for x, y in stream.iter_arff('cars.arff', target='price'):
    print(f"Features: {x}")
    print(f"Target: {y}")
    break

# Sparse ARFF example
sparse_content = '''
@RELATION sparse_data
@ATTRIBUTE y {0, 1}
@ATTRIBUTE x0 NUMERIC
@ATTRIBUTE x1 NUMERIC
@DATA
{0 1, 1 0.5, 2 0.8}
{0 0, 2 0.3}
'''

with open('sparse.arff', 'w') as f:
    f.write(sparse_content)

# Read sparse data
for x, y in stream.iter_arff('sparse.arff', target='y', sparse=True):
    print(f"Sparse features: {x}")
    print(f"Target: {y}")
    break

# Cleanup
import os
os.remove('cars.arff')
os.remove('sparse.arff')
