Implementation:Scikit learn Scikit learn SvmlightFormatIO

From Leeroopedia


Knowledge Sources
Domains Data Loading, File I/O
Last Updated 2026-02-08 15:00 GMT

Overview

Concrete tool for loading and saving datasets in SVMlight/LibSVM format provided by scikit-learn.

Description

This module implements a loader and a dumper for the SVMlight text-based sparse data format. Each line represents one sample: a target label followed by index:value pairs for the non-zero features. Zero-valued features are not stored, which keeps the format compact for sparse datasets. The module provides load_svmlight_file for reading a single file, load_svmlight_files for reading several files so that all resulting matrices share the same number of features, and dump_svmlight_file for writing data. Parsing is delegated to a Cython implementation for speed.
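As a small illustration of the line layout described above, the sketch below parses three hand-written samples from memory instead of from a file on disk. The sample data is made up for this example, and the indices are written one-based, which is stated explicitly through zero_based=False.

```python
from io import BytesIO

from sklearn.datasets import load_svmlight_file

# Three illustrative samples: "<label> <index>:<value> ...".
# Only non-zero features are listed; indices here are one-based.
raw = b"1 1:0.5 3:2.0\n-1 2:1.5\n1 1:1.0 2:1.0 3:1.0\n"

# A binary file-like object is accepted in place of a path.
X, y = load_svmlight_file(BytesIO(raw), zero_based=False)

dense = X.toarray()  # densify only to make the small example easy to inspect
print(X.shape, X.nnz)
print(dense)
print(y)
```

Features that are missing from a line come back as zeros in the dense view, while the sparse matrix only stores the six values that were written.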

Usage

Use this module to read and write datasets in SVMlight/LibSVM format, the default format of the svmlight and libsvm command-line tools and a common way to store sparse datasets.

Code Reference

Source Location

Signature

@validate_params(...)
def load_svmlight_file(
    f,
    *,
    n_features=None,
    dtype=np.float64,
    multilabel=False,
    zero_based="auto",
    query_id=False,
    offset=0,
    length=-1,
)

def load_svmlight_files(files, *, n_features=None, ...)
def dump_svmlight_file(X, y, f, *, zero_based=True, ...)
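A minimal sketch of load_svmlight_files follows, assuming two small files written to a temporary directory (the file names and contents are hypothetical). The second file never mentions feature 5, yet both matrices come back with the same number of columns, which is the point of loading the files together.

```python
import os
import tempfile

from sklearn.datasets import load_svmlight_files

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical train/test files with one-based feature indices.
    train_path = os.path.join(tmp, "train.svmlight")
    test_path = os.path.join(tmp, "test.svmlight")
    with open(train_path, "wb") as fh:
        fh.write(b"1 1:1.0 5:2.0\n0 2:3.0\n")
    with open(test_path, "wb") as fh:
        fh.write(b"0 1:4.0 2:1.0\n")

    # The result is a flat list: [X1, y1, X2, y2, ...].
    X_train, y_train, X_test, y_test = load_svmlight_files(
        [train_path, test_path], zero_based=False
    )

print(X_train.shape, X_test.shape)
```

Loading the two files separately with load_svmlight_file would infer a narrower matrix for the test file, so a model fitted on the training matrix could not be applied to it without passing n_features by hand.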

Import

from sklearn.datasets import load_svmlight_file, load_svmlight_files, dump_svmlight_file

I/O Contract

Inputs

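Name | Type | Required | Description
f | str, int, PathLike, or file-like | Yes | File path, open file descriptor, or binary-mode file-like object to read from
n_features | int or None | No | Number of features to use; inferred from the data if None (default: None)
dtype | numpy dtype | No | Data type of the feature values in X (default: np.float64)
multilabel | bool | No | Whether each sample may carry several labels (default: False)
zero_based | bool or 'auto' | No | Whether feature indices in the file are 0-based (True) or 1-based (False); 'auto' decides from the data (default: 'auto')
query_id | bool | No | Whether to also return the query ID of each sample (default: False)
offset | int | No | Number of leading bytes to skip; the rest of the line at that position is discarded as well (default: 0)
length | int | No | If strictly positive, stop reading new lines once the file position reaches offset + length; -1 reads to the end (default: -1)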
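The offset and length parameters can be used to read one file in byte-range chunks. The sketch below is an illustration under stated assumptions: the data is made up, the split position of 12 bytes is arbitrary and falls in the middle of the second line, and n_features is passed explicitly so that both chunks get the same width.

```python
from io import BytesIO

from sklearn.datasets import load_svmlight_file

# Three illustrative samples, 8 bytes per line, one-based indices.
raw = b"1 1:1.0\n2 2:2.0\n3 3:3.0\n"
mark = 12  # arbitrary split position, inside the second line

# First chunk: start at byte 0 and stop reading new lines once the
# position has passed `mark`. The line that straddles `mark` is kept.
X_head, y_head = load_svmlight_file(
    BytesIO(raw), n_features=3, zero_based=False, offset=0, length=mark
)

# Second chunk: seek to `mark`, drop the rest of the line found there
# (it belongs to the first chunk), then read to the end.
X_tail, y_tail = load_svmlight_file(
    BytesIO(raw), n_features=3, zero_based=False, offset=mark
)

print(y_head, y_tail)
```

Together the two chunks cover every sample exactly once, which is what makes this pattern usable for splitting a large file across workers.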

Outputs

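Name | Type | Description
X | scipy.sparse.csr_matrix | Sparse feature matrix of shape (n_samples, n_features)
y | numpy.ndarray | Target values of shape (n_samples,); a list of tuples when multilabel=True
query_id | numpy.ndarray | Query IDs of shape (n_samples,) (only if query_id=True)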
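To make the optional third output concrete, here is a short sketch with made-up ranking-style data in which each line carries a qid:<n> field after the label. With query_id=True the call returns three values instead of two.

```python
from io import BytesIO

from sklearn.datasets import load_svmlight_file

# Illustrative samples with relevance labels, query IDs, and one-based indices.
raw = b"2 qid:1 1:0.3 4:1.0\n1 qid:1 2:0.7\n0 qid:2 1:0.1\n"

X, y, qid = load_svmlight_file(BytesIO(raw), zero_based=False, query_id=True)

print(X.format, X.shape)  # sparse storage format and matrix shape
print(y)                  # one target per sample
print(qid)                # one query ID per sample
```

With the default query_id=False the qid field is still parsed and skipped, and only X and y are returned.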

Usage Examples

Basic Usage

from sklearn.datasets import load_svmlight_file, dump_svmlight_file
import numpy as np

# Load an SVMlight file (assumes "data.svmlight" exists in the working directory)
X, y = load_svmlight_file("data.svmlight")
print("Shape:", X.shape, "Labels:", np.unique(y))

# Save the data back in SVMlight format (feature indices are zero-based by default)
dump_svmlight_file(X, y, "output.svmlight")
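The example above needs a file on disk. As a self-contained variant, the sketch below round-trips a small made-up array through an in-memory buffer. It passes n_features and zero_based explicitly on load, since the last column is all zeros and is therefore never written to the file.

```python
from io import BytesIO

import numpy as np

from sklearn.datasets import dump_svmlight_file, load_svmlight_file

# Illustrative data: the third column holds only zeros.
X = np.array([[0.0, 1.5, 0.0],
              [2.0, 0.0, 0.0]])
y = np.array([1, 0])

buf = BytesIO()
dump_svmlight_file(X, y, buf, zero_based=True)

# Rewind and load with the same index convention and the known width.
buf.seek(0)
X_back, y_back = load_svmlight_file(buf, n_features=3, zero_based=True)

print(buf.getvalue().decode())
print(X_back.toarray())
```

Without n_features=3 the loader would infer only two columns here, so stating the width and the index convention on both sides keeps the round trip lossless.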
