Implementation:Online ml River Stream Iter Libsvm
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Data_Streaming, File_Formats, Sparse_Data |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Iterates over datasets stored in LIBSVM format, a popular sparse data representation used in machine learning.
Description
The iter_libsvm function reads files in LIBSVM format, where each line holds one sample: a target value followed by space-separated feature:value pairs for the features that are present. This format is widely used for storing large sparse datasets, especially in text classification and recommender systems. Only numerical feature values are supported; feature names are kept as strings, so a feature index such as 3 appears as the key '3' in the yielded dict.
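To make the line layout concrete, here is a minimal standard-library sketch of how a single LIBSVM line maps to a (features, target) pair. The helper parse_libsvm_line is hypothetical and is not river's implementation; it only illustrates the format described above.

```python
def parse_libsvm_line(line, target_type=float):
    """Parse one 'target name:value name:value ...' line into (features, target)."""
    target, *pairs = line.split()
    features = {}
    for pair in pairs:
        name, value = pair.split(":")
        features[name] = float(value)  # values are numeric, names stay strings
    return features, target_type(target)


x, y = parse_libsvm_line("+1 1:0.5 3:1.2 5:0.8", target_type=int)
print(y, x)  # 1 {'1': 0.5, '3': 1.2, '5': 0.8}
```

Features that do not appear on the line are simply absent from the dict, which is what keeps the representation sparse.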
Usage
Use this when working with datasets in LIBSVM/SVMlight format, particularly large sparse datasets from text mining, recommendation systems, or datasets from LIBSVM's repository. The sparse format saves memory and disk space when most feature values are zero.
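The space saving comes from writing only the non-zero entries. As a rough illustration, the sketch below serialises a dense row into a LIBSVM line. The helper to_libsvm_line is hypothetical, and the 1-based feature indices follow the usual LIBSVM convention rather than anything river requires.

```python
def to_libsvm_line(target, dense_row):
    """Serialise a dense row as a LIBSVM line, keeping only non-zero entries."""
    pairs = [
        f"{index}:{value}"
        for index, value in enumerate(dense_row, start=1)  # 1-based indices (assumed convention)
        if value != 0
    ]
    return " ".join([str(target), *pairs])


row = [0.0, 0.3, 0.0, 0.7, 0.0, 0.0, 0.0, 0.0]
line = to_libsvm_line(-1, row)
print(line)  # -1 2:0.3 4:0.7
```

Eight dense values shrink to two feature:value pairs here; the ratio improves as rows get wider and sparser.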
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/stream/iter_libsvm.py
Signature
def iter_libsvm(
    filepath_or_buffer: str,
    target_type=float,
    compression="infer"
) -> base.typing.Stream:
    ...
Import
from river import stream
I/O Contract
| Parameter | Type | Description |
|---|---|---|
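| filepath_or_buffer | str or buffer | Path to a file, or a buffer object with a read method |
| target_type | type | Type used to cast target values (default: float) |
| compression | str | On-the-fly decompression of on-disk data; with 'infer' and a file path, the method is inferred from a '.gz' or '.zip' extension (default: 'infer') |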
Returns:
| Type | Description |
|---|---|
| Iterator[(dict, Any)] | Stream of (sparse features dict, target) tuples |
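To illustrate what the compression parameter is for, the sketch below writes a small gzip-compressed LIBSVM file and reads it back line by line with the standard library only. It assumes that on-the-fly decompression amounts to opening the file in text mode through gzip; read_gz_lines is a hypothetical helper and not part of river.

```python
import gzip
import os
import tempfile


def read_gz_lines(path):
    """Yield the decoded, right-stripped lines of a gzip-compressed text file."""
    with gzip.open(path, mode="rt") as f:
        for line in f:
            yield line.rstrip()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sparse_data.libsvm.gz")
    # Write two samples in LIBSVM format, compressed with gzip
    with gzip.open(path, mode="wt") as f:
        f.write("+1 1:0.5 3:1.2\n-1 2:0.3 4:0.7\n")
    lines = list(read_gz_lines(path))

print(lines)  # ['+1 1:0.5 3:1.2', '-1 2:0.3 4:0.7']
```

With river installed, passing the same '.gz' path to stream.iter_libsvm with compression left at 'infer' is expected to yield these samples as (dict, target) pairs.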
Usage Examples
import io
from river import stream
# Create LIBSVM format data
# Format: target feature:value feature:value ...
libsvm_data = io.StringIO('''+1 x:-134.26 y:0.2563
1 x:-12 z:0.3
-1 y:.25
''')
# Iterate with integer targets
for x, y in stream.iter_libsvm(libsvm_data, target_type=int):
    print(f"Target: {y}, Features: {x}")
# Output:
# Target: 1, Features: {'x': -134.26, 'y': 0.2563}
# Target: 1, Features: {'x': -12.0, 'z': 0.3}
# Target: -1, Features: {'y': 0.25}
# Example with file
import os

with open('sparse_data.libsvm', 'w') as f:
    f.write("+1 1:0.5 3:1.2 5:0.8\n")
    f.write("-1 2:0.3 4:0.7\n")
    f.write("+1 1:0.9 2:0.1 3:0.4\n")

# Read and process
for features, label in stream.iter_libsvm('sparse_data.libsvm', target_type=int):
    print(f"Label: {label:+d}")
    print(f"Active features: {list(features.keys())}")
    print(f"Values: {features}")
    print()

# Cleanup
os.remove('sparse_data.libsvm')
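# The format allows comments introduced by #; text after a # on a data line is dropped before parsing.
# Empty or comment-only lines may not be skipped in every river version, so filter them out first if a file contains any.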
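If a given parser or river version turns out not to skip comment-only or blank lines, they can be filtered out before iteration. The following is a minimal sketch using only the standard library; drop_comments_and_blanks is a hypothetical helper, and treating everything after a # as a comment is an assumption based on the SVMlight convention.

```python
import io


def drop_comments_and_blanks(lines):
    """Return a text buffer holding only the lines that still carry data.

    Everything after a '#' is treated as a comment (assumed convention).
    """
    kept = []
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if data:
            kept.append(data)
    return io.StringIO("".join(f"{data}\n" for data in kept))


raw = io.StringIO("# header comment\n+1 x:1.5 y:2  # trailing note\n\n-1 y:.25\n")
clean = drop_comments_and_blanks(raw)
print(clean.getvalue().splitlines())  # ['+1 x:1.5 y:2', '-1 y:.25']
```

The returned buffer has a read method, so it can be handed to stream.iter_libsvm in place of a file path, for example stream.iter_libsvm(clean, target_type=int).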