Implementation:Online ml River Stream Iter Arff

Knowledge Sources
Domains Online_Learning, Data_Streaming, File_Formats
Last Updated 2026-02-08 16:00 GMT

Overview

Iterates over rows from ARFF (Attribute-Relation File Format) files, supporting both dense and sparse data representations.

Description

The iter_arff function reads ARFF files, a format widely used in the machine learning community, particularly with Weka. It supports numeric and categorical (nominal) features, sparse data representations, and missing values (marked by '?'). Internally it relies on SciPy's ARFF reader and yields one (features, target) tuple per row, so a dataset can be consumed incrementally for online learning.
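To make the row handling described above concrete, here is a minimal pure-Python sketch of turning one dense ARFF data line into a feature dict. It is not River's implementation: the helper name parse_dense_row and its arguments are hypothetical, dropping missing entries is an assumption of this sketch, and the naive comma split does not handle quoted values.

```python
def parse_dense_row(line, names, numeric):
    """Turn one comma-separated ARFF data line into a feature dict.

    Hypothetical helper for illustration only, not part of River.
    """
    row = {}
    for name, raw in zip(names, line.strip().split(",")):
        value = raw.strip()
        if value == "?":
            # Missing value: this sketch simply leaves the key out (assumption).
            continue
        # Numeric attributes are cast to float; everything else stays a string.
        row[name] = float(value) if name in numeric else value
    return row


names = ["make", "model", "year", "price"]
numeric = {"year", "price"}
row = parse_dense_row("Toyota, ?, 2018, 15000", names, numeric)
print(row)
```

The missing model field is absent from the resulting dict, while year and price come back as floats.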

Usage

Use this when working with datasets in ARFF format, especially those from the OpenML repository or created with Weka. It's particularly useful for handling sparse datasets where explicitly storing all zeros would be memory-intensive.
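The memory argument for sparse data can be illustrated with a short sketch of how a sparse ARFF row, written as {index value, ...}, maps onto a dict that only stores the entries present in the file. The helper parse_sparse_row is hypothetical and is not River's code; values are kept as raw strings here for simplicity.

```python
def parse_sparse_row(line, names):
    """Parse a sparse ARFF row such as '{0 1, 2 0.3}' into a dict.

    Hypothetical helper for illustration only, not part of River.
    """
    body = line.strip()[1:-1]  # drop the surrounding braces
    row = {}
    for pair in body.split(","):
        pair = pair.strip()
        if not pair:
            continue
        # Each pair is '<attribute index> <value>'.
        index, value = pair.split(" ", 1)
        row[names[int(index)]] = value.strip()
    return row


names = ["y", "x0", "x1"]
row = parse_sparse_row("{0 0, 2 0.3}", names)

# Attributes omitted from a sparse row are implicitly zero.
x0 = float(row.get("x0", 0))
print(row, x0)
```

Only the attributes listed in the row occupy memory; anything else is read back as zero on demand.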

Code Reference

Source Location

Signature

def iter_arff(
    filepath_or_buffer,
    target: str | list[str] | None = None,
    compression="infer",
    sparse=False
) -> base.typing.Stream:
    ...

Import

from river import stream

I/O Contract

Parameter | Type | Description
filepath_or_buffer | str or buffer | Path to an ARFF file, or a buffer object with a read method
target | str, list[str], or None | Name(s) of the target field(s). If a list is given, the target is returned as a dict.
compression | str | Decompression method ('infer', 'gz', 'zip')
sparse | bool | Whether the data is stored in the sparse ARFF format
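The compression parameter can be pictured as a small dispatch on the file extension. The sketch below uses only the standard library and assumes that 'infer' maps '.gz' and '.zip' suffixes to the matching decompressor; the helper open_text is hypothetical and does not reproduce River's internals.

```python
import gzip
import io
import os
import tempfile
import zipfile


def open_text(path, compression="infer"):
    """Open a possibly compressed text file (hypothetical helper)."""
    if compression == "infer":
        # Assumption: the method is guessed from the file extension.
        ext = os.path.splitext(path)[1]
        compression = {".gz": "gz", ".zip": "zip"}.get(ext)
    if compression == "gz":
        return gzip.open(path, "rt")
    if compression == "zip":
        archive = zipfile.ZipFile(path)
        first = archive.namelist()[0]  # read the first member only
        return io.TextIOWrapper(archive.open(first))
    return open(path, "r")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cars.arff.gz")
    with gzip.open(path, "wt") as f:
        f.write("@relation CarData\n")
    with open_text(path) as f:
        first_line = f.readline()

print(repr(first_line))
```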

Returns:

Type | Description
Iterator[(dict, Any)] | Tuples of (features dict, target value); the target is a dict when a list of target names is given
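The shape of the returned tuples depends on how target is specified. The following sketch mimics that contract on a plain dict; split_target is a hypothetical helper, and returning None as the target when no name is given is an assumption of this sketch rather than documented behavior on this page.

```python
def split_target(row, target=None):
    """Split a row dict into (features, target) (hypothetical helper)."""
    x = dict(row)  # copy so the caller's row is left untouched
    if target is None:
        # Assumption: with no target name, no target value is produced.
        return x, None
    if isinstance(target, list):
        # Several target names: the target is itself a dict.
        y = {name: x.pop(name) for name in target}
        return x, y
    # Single target name: the target is a single value.
    return x, x.pop(target)


row = {"make": "Toyota", "year": 2018.0, "price": 15000.0}
x_single, y_single = split_target(row, target="price")
x_multi, y_multi = split_target(row, target=["year", "price"])
x_none, y_none = split_target(row)
print(x_single, y_single)
print(x_multi, y_multi)
```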

Usage Examples

from river import stream

# Create a simple ARFF file
arff_content = '''
@relation CarData
@attribute make {Toyota, Honda, Ford}
@attribute model string
@attribute year numeric
@attribute price numeric
@data
Toyota, Corolla, 2018, 15000
Honda, Civic, 2019, 16000
Ford, Mustang, 2020, 25000
'''

with open('cars.arff', 'w') as f:
    f.write(arff_content)

# Iterate over the data
for x, y in stream.iter_arff('cars.arff', target='price'):
    print(f"Features: {x}")
    print(f"Target: {y}")
    break

# Sparse ARFF example
sparse_content = '''
@RELATION sparse_data
@ATTRIBUTE y {0, 1}
@ATTRIBUTE x0 NUMERIC
@ATTRIBUTE x1 NUMERIC
@DATA
{0 1, 1 0.5, 2 0.8}
{0 0, 2 0.3}
'''

with open('sparse.arff', 'w') as f:
    f.write(sparse_content)

# Read sparse data
for x, y in stream.iter_arff('sparse.arff', target='y', sparse=True):
    print(f"Sparse features: {x}")
    print(f"Target: {y}")
    break

# Cleanup
import os
os.remove('cars.arff')
os.remove('sparse.arff')
