Implementation:Online ml River Stream Iter Arff
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Data_Streaming, File_Formats |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Iterates over rows from ARFF (Attribute-Relation File Format) files, supporting both dense and sparse data representations.
Description
The iter_arff function reads ARFF files, a format widely used in the machine learning community, particularly with Weka. It supports numerical and categorical features, handles sparse data representations, and accepts missing values (indicated by '?'). The function uses scipy's ARFF reader internally and yields (features, target) tuples suitable for online learning.
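To make the row format concrete, here is a minimal sketch of how one dense ARFF data row could be turned into a feature dict. It is not River's implementation: the helper `parse_dense_row`, the `numeric` set, and the choice to drop '?' entries are assumptions made for illustration, and the naive comma split does not handle quoted values.

```python
def parse_dense_row(line, names, numeric):
    """Turn one comma-separated ARFF data row into a dict.

    Hypothetical helper: names is the ordered list of attribute names and
    numeric is the set of names whose values are cast to float.
    """
    row = {}
    for name, raw in zip(names, line.strip().split(",")):
        value = raw.strip()
        if value == "?":
            # Assumption: a missing value is simply left out of the dict.
            continue
        row[name] = float(value) if name in numeric else value
    return row


names = ["make", "year", "price"]
row = parse_dense_row("Toyota, ?, 15000", names, numeric={"year", "price"})
print(row)  # {'make': 'Toyota', 'price': 15000.0}
```

Leaving missing entries out of the dict is one reasonable convention for dict-based features; another would be to store None explicitly.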
Usage
Use this when working with datasets in ARFF format, especially those from the OpenML repository or created with Weka. It's particularly useful for handling sparse datasets where explicitly storing all zeros would be memory-intensive.
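The memory argument for sparse data can be seen in a small sketch. In sparse ARFF, each row lists only `index value` pairs inside braces, and omitted attributes are implicitly zero. The helper `parse_sparse_row` below is hypothetical and is not taken from River; it keeps values as strings and performs no casting.

```python
def parse_sparse_row(line, names):
    """Expand one sparse ARFF row such as '{0 1, 2 0.3}' into a dict.

    Hypothetical helper; values are kept as strings, nothing is cast.
    """
    body = line.strip()[1:-1]  # drop the surrounding braces
    row = {}
    for pair in body.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate an empty row '{}'
        index, value = pair.split(None, 1)
        row[names[int(index)]] = value.strip()
    return row


names = ["y", "x0", "x1"]
print(parse_sparse_row("{0 0, 2 0.3}", names))  # {'y': '0', 'x1': '0.3'}
```

Only the attributes present in the row end up in the dict, so a dataset with thousands of mostly-zero columns stays small per example.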
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/stream/iter_arff.py
Signature
```python
def iter_arff(
    filepath_or_buffer,
    target: str | list[str] | None = None,
    compression="infer",
    sparse=False
) -> base.typing.Stream:
    ...
```
Import
```python
from river import stream
```
I/O Contract
| Parameter | Type | Description |
|---|---|---|
| filepath_or_buffer | str or buffer | Path to ARFF file or buffer with read method |
| target | str, list[str], or None | Name(s) of the target field(s). If a list is given, the target is returned as a dict instead of a single value. |
| compression | str | Decompression method ('infer', 'gz', 'zip') |
| sparse | bool | Whether data is in sparse format |
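As an illustration of what inferring the compression from a path could look like, here is a minimal sketch using only the standard library. The helper `open_text` is hypothetical and is not River's code; it handles only 'infer', 'gz', and None, and treating None as "no compression" is an assumption.

```python
import gzip
import os
import tempfile


def open_text(path, compression="infer"):
    """Open a plain or gzip-compressed text file for reading.

    Hypothetical helper: only 'infer', 'gz', and None are handled here.
    """
    if compression == "infer":
        compression = "gz" if path.endswith(".gz") else None
    if compression == "gz":
        return gzip.open(path, "rt")
    if compression is None:
        return open(path, "r")
    raise ValueError(f"unsupported compression: {compression!r}")


# Round-trip a tiny gzip file to show the inference at work.
path = os.path.join(tempfile.mkdtemp(), "tiny.arff.gz")
with gzip.open(path, "wt") as f:
    f.write("@relation tiny\n@data\n")

with open_text(path) as f:
    first_line = f.readline().strip()
print(first_line)  # @relation tiny
```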
Returns:
| Type | Description |
|---|---|
| Iterator[(dict, Any)] | Tuples of (features dict, target value) |
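The relationship between the `target` argument and the shape of the yielded tuple can be sketched as follows. The helper `split_target` is hypothetical and only mirrors the contract in the tables above; in particular, reporting an absent target as None is an assumption.

```python
def split_target(row, target=None):
    """Separate a parsed row into (features, target).

    Hypothetical helper mirroring the contract in the tables above.
    """
    x = dict(row)  # copy so the caller's dict is left untouched
    if target is None:
        return x, None
    if isinstance(target, list):
        # Assumption: a target missing from the row is reported as None.
        y = {name: x.pop(name, None) for name in target}
        return x, y
    y = x.pop(target)
    return x, y


row = {"make": "Toyota", "year": 2018.0, "price": 15000.0}
print(split_target(row, "price"))
# ({'make': 'Toyota', 'year': 2018.0}, 15000.0)
print(split_target(row, ["year", "price"]))
# ({'make': 'Toyota'}, {'year': 2018.0, 'price': 15000.0})
```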
Usage Examples
```python
import os

from river import stream

# Create a simple ARFF file
arff_content = '''
@relation CarData
@attribute make {Toyota, Honda, Ford}
@attribute model string
@attribute year numeric
@attribute price numeric
@data
Toyota, Corolla, 2018, 15000
Honda, Civic, 2019, 16000
Ford, Mustang, 2020, 25000
'''
with open('cars.arff', 'w') as f:
    f.write(arff_content)

# Iterate over the data
for x, y in stream.iter_arff('cars.arff', target='price'):
    print(f"Features: {x}")
    print(f"Target: {y}")
    break

# Sparse ARFF example
sparse_content = '''
@RELATION sparse_data
@ATTRIBUTE y {0, 1}
@ATTRIBUTE x0 NUMERIC
@ATTRIBUTE x1 NUMERIC
@DATA
{0 1, 1 0.5, 2 0.8}
{0 0, 2 0.3}
'''
with open('sparse.arff', 'w') as f:
    f.write(sparse_content)

# Read sparse data
for x, y in stream.iter_arff('sparse.arff', target='y', sparse=True):
    print(f"Sparse features: {x}")
    print(f"Target: {y}")
    break

# Cleanup
os.remove('cars.arff')
os.remove('sparse.arff')
```
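Because the function yields one (features, target) pair at a time, the stream fits a predict-then-update loop. The sketch below uses a hand-written list as a stand-in for the iterator and a running-mean baseline instead of a River model; `running_mean_mae` is a hypothetical name, and the whole block is illustrative only.

```python
def running_mean_mae(stream):
    """Predict each target with the mean of the targets seen so far.

    Returns the mean absolute error under a predict-then-update loop.
    """
    total, count, abs_error = 0.0, 0, 0.0
    for _x, y in stream:
        prediction = total / count if count else 0.0
        abs_error += abs(y - prediction)  # score before learning from y
        total += y
        count += 1
    return abs_error / count if count else 0.0


# Stand-in for the (features, target) tuples an ARFF iterator would yield.
rows = [
    ({"make": "Toyota", "year": 2018.0}, 15000.0),
    ({"make": "Honda", "year": 2019.0}, 16000.0),
    ({"make": "Ford", "year": 2020.0}, 25000.0),
]
print(running_mean_mae(iter(rows)))  # 8500.0
```

Each example is scored before it is used for the update, so the whole file never needs to be held in memory.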