Implementation:Online ml River Stream Iter Libsvm

Knowledge Sources	Online_ml_River
Domains	Online_Learning, Data_Streaming, File_Formats, Sparse_Data
Last Updated	2026-02-08 16:00 GMT

Overview

Iterates over datasets stored in LIBSVM format, a popular sparse data representation used in machine learning.

Description

The iter_libsvm function reads files in LIBSVM format, where each line holds one sample: a target value followed by space-separated sparse feature:value pairs. The format is widely used for large sparse datasets, especially in text classification and recommender systems. Feature values must be numeric and are parsed as floats, while feature names are kept as strings, so an index such as 3 becomes the key '3'.
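To make the line layout concrete, here is a minimal pure-Python sketch of how a single LIBSVM line can be turned into a (features, target) pair. The helper name parse_libsvm_line is hypothetical and is not part of River; the treatment of '#' as a trailing comment is an assumption of this sketch, not a statement about River's implementation.

```python
from typing import Callable, Dict, Tuple


def parse_libsvm_line(line: str, target_type: Callable = float) -> Tuple[Dict[str, float], object]:
    """Parse one LIBSVM line into (features, target). Hypothetical helper, not River's code."""
    # Assumption of this sketch: anything after a '#' is a trailing comment
    line = line.split("#")[0].strip()
    # The first whitespace-separated token is the target; the rest are name:value pairs
    target, *pairs = line.split()
    features = {}
    for pair in pairs:
        name, value = pair.split(":")
        features[name] = float(value)  # values are numeric, names stay strings
    return features, target_type(target)


x, y = parse_libsvm_line("+1 1:0.5 3:1.2 5:0.8", target_type=int)
print(y, x)
# 1 {'1': 0.5, '3': 1.2, '5': 0.8}
```

Note that the integer-looking indices come back as string keys, which matters when a downstream model or preprocessing step looks features up by name.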

Usage

Use this function to stream datasets stored in LIBSVM/SVMlight format, such as large sparse datasets from text mining or recommendation systems, or files from the LIBSVM dataset repository. Because only non-zero values are written, the format saves memory and disk space when most feature values are zero.
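The space saving comes from writing only the non-zero entries. The sketch below converts a dense row into a LIBSVM line to show the effect; to_libsvm_line is a hypothetical helper written for this page, and the 1-based indexing follows the common LIBSVM convention rather than anything River requires.

```python
def to_libsvm_line(target, dense_row):
    """Format a dense row as a LIBSVM line, skipping zeros. Hypothetical helper."""
    # Indices are 1-based here, following the common LIBSVM convention
    pairs = [f"{i}:{v}" for i, v in enumerate(dense_row, start=1) if v != 0]
    return " ".join([str(target), *pairs])


# Eight columns, but only three of them carry a non-zero value
dense = [0.5, 0, 1.2, 0, 0.8, 0, 0, 0]
line = to_libsvm_line(1, dense)
print(line)
# 1 1:0.5 3:1.2 5:0.8
```

With thousands of columns and a handful of non-zero values per row, as is typical for bag-of-words features, the sparse line stays roughly as short as the one above.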

Code Reference

Source Location

Repository: Online_ml_River
File: river/stream/iter_libsvm.py

Signature

def iter_libsvm(
    filepath_or_buffer: str,
    target_type=float,
    compression="infer"
) -> base.typing.Stream:
    ...

Import

from river import stream

I/O Contract

Parameter	Type	Description
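filepath_or_buffer	str or buffer	Path to a file, or a file-like object exposing a read method
target_type	type	Type used to cast target values (default: float)
compression	str	On-the-fly decompression of on-disk data: 'infer' (default), 'gz', or 'zip'. With 'infer' and a path argument, the method is inferred from a .gz or .zip extension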

Returns:

Type	Description
Iterator[(dict, Any)]	Stream of (sparse features dict, target) tuples
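As a rough stand-in for what gzip decompression of a LIBSVM file amounts to, the standard-library sketch below writes a small .gz file and reads it back as plain text lines. It does not call River, and River's own file-opening logic is not shown and may differ; the file name is arbitrary.

```python
import gzip
import os
import tempfile

# Write a small gzipped LIBSVM file (the file name is arbitrary)
path = os.path.join(tempfile.mkdtemp(), "sparse_data.libsvm.gz")
with gzip.open(path, "wt") as f:
    f.write("+1 1:0.5 3:1.2\n")
    f.write("-1 2:0.3\n")

# Reading it back in text mode yields ordinary LIBSVM lines
with gzip.open(path, "rt") as f:
    lines = [line.rstrip("\n") for line in f]

print(lines)
# ['+1 1:0.5 3:1.2', '-1 2:0.3']

# Cleanup
os.remove(path)
```

Going by the parameter table, passing such a path to stream.iter_libsvm with compression left at 'infer' should select gzip from the .gz extension, so no manual decompression step would be needed.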

Usage Examples

import io
from river import stream

# Create LIBSVM format data
# Format: target feature:value feature:value ...
libsvm_data = io.StringIO('''+1 x:-134.26 y:0.2563
1 x:-12 z:0.3
-1 y:.25
''')

# Iterate with integer targets
for x, y in stream.iter_libsvm(libsvm_data, target_type=int):
    print(f"Target: {y}, Features: {x}")
# Output:
# Target: 1, Features: {'x': -134.26, 'y': 0.2563}
# Target: 1, Features: {'x': -12.0, 'z': 0.3}
# Target: -1, Features: {'y': 0.25}

# Example with file
with open('sparse_data.libsvm', 'w') as f:
    f.write("+1 1:0.5 3:1.2 5:0.8\n")
    f.write("-1 2:0.3 4:0.7\n")
    f.write("+1 1:0.9 2:0.1 3:0.4\n")

# Read and process
for features, label in stream.iter_libsvm('sparse_data.libsvm', target_type=int):
    print(f"Label: {label:+d}")
    print(f"Active features: {list(features.keys())}")
    print(f"Values: {features}")
    print()

# Cleanup
import os
os.remove('sparse_data.libsvm')

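# Note: text following a '#' on a line is treated as a comment and is
# dropped before parsing. Check how your River version handles blank or
# comment-only lines before relying on them being skipped.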
