Implementation:Scikit learn Scikit learn SvmlightFormatIO

Knowledge Sources	Scikit_learn Scikit-learn Docs
Domains	Data Loading, File I/O
Last Updated	2026-02-08 15:00 GMT

Overview

Concrete tool, provided by scikit-learn, for loading and saving datasets in SVMlight/LibSVM format.

Description

This module implements a loader and dumper for the SVMlight text-based sparse data format. Each line represents one sample: a label followed by index:value pairs for the nonzero features, which keeps files compact for sparse datasets. The module provides load_svmlight_file for reading a single file, load_svmlight_files for reading several files so that all resulting matrices share the same number of features, and dump_svmlight_file for writing data. Parsing is delegated to a fast Cython implementation for performance.
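To make the line format concrete, here is a minimal round-trip sketch that writes a small array to an in-memory buffer and reads it back. The toy data and the use of io.BytesIO in place of a file on disk are assumptions made for illustration only.

```python
from io import BytesIO

import numpy as np
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

# Toy data (hypothetical): 2 samples, 3 features, mostly zeros.
X = np.array([[0.0, 2.5, 0.0],
              [1.0, 0.0, 3.0]])
y = np.array([1.0, -1.0])

# dump_svmlight_file accepts a path or a file-like object opened in binary mode.
buf = BytesIO()
dump_svmlight_file(X, y, buf, zero_based=True)

# Each line holds a label followed by index:value pairs for the nonzero entries.
text = buf.getvalue().decode("ascii")
print(text)

# Read the same bytes back; n_features pins the column count explicitly.
buf.seek(0)
X_loaded, y_loaded = load_svmlight_file(buf, n_features=3, zero_based=True)
print(X_loaded.toarray())
print(y_loaded)
```

Passing n_features on load matters here because zero entries are never written, so trailing all-zero columns cannot be inferred from the file alone.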

Usage

Use this module to read and write datasets in SVMlight/LibSVM format, which is the default format for svmlight and libsvm command-line tools and is commonly used for storing sparse datasets.

Code Reference

Source Location

Repository: scikit-learn
File: sklearn/datasets/_svmlight_format_io.py

Signature

@validate_params(...)
def load_svmlight_file(
    f,
    *,
    n_features=None,
    dtype=np.float64,
    multilabel=False,
    zero_based="auto",
    query_id=False,
    offset=0,
    length=-1,
)

def load_svmlight_files(files, *, n_features=None, ...)
def dump_svmlight_file(X, y, f, *, zero_based=True, ...)

Import

from sklearn.datasets import load_svmlight_file, load_svmlight_files, dump_svmlight_file

I/O Contract

Inputs

Name	Type	Required	Description
f	str, int, PathLike, or file-like	Yes	File path, descriptor, or object to read from
n_features	int or None	No	Number of features; inferred if None (default: None)
dtype	numpy dtype	No	Data type of feature values (default: np.float64)
multilabel	bool	No	Whether labels are multi-label (default: False)
zero_based	bool or 'auto'	No	Whether feature indices are 0-based (default: 'auto')
query_id	bool	No	Whether to return query IDs (default: False)
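offset	int	No	Number of bytes to skip before reading; the rest of the line at that position is discarded up to the next newline (default: 0)
length	int	No	Approximate number of bytes to read: no new line is started once the file position reaches offset + length; -1 reads to the end (default: -1)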
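The effect of zero_based and n_features on the shape of the result can be sketched as follows. The two-line payload is made up for illustration, and the buffer stands in for a real file.

```python
from io import BytesIO

from sklearn.datasets import load_svmlight_file

# Hypothetical file content: feature indices 1, 2 and 3 are used, index 0 never is.
raw = b"1 1:0.5 3:1.2\n-1 2:2.0\n"

# zero_based="auto" (the default): no index 0 appears, so indices are
# treated as one-based and shifted down by one.
X_auto, _ = load_svmlight_file(BytesIO(raw))

# zero_based=True: indices are taken literally, so index 3 is the fourth column.
X_zero, _ = load_svmlight_file(BytesIO(raw), zero_based=True)

# n_features widens the matrix beyond the largest index seen in the file.
X_wide, _ = load_svmlight_file(BytesIO(raw), n_features=10, zero_based=False)

print(X_auto.shape, X_zero.shape, X_wide.shape)
```

Because the "auto" heuristic depends on the file content, setting zero_based explicitly is the safer choice when the indexing convention of the data is known.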

Outputs

Name	Type	Description
X	scipy.sparse.csr_matrix	Sparse feature matrix
y	numpy.ndarray or list of tuples	Target values; a list of label tuples (one per sample) when multilabel=True
query_id	numpy.ndarray	Query IDs (only if query_id=True)
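How the return values change with query_id and multilabel can be illustrated with a short sketch. Both payloads below are invented for the example, and in-memory buffers are used instead of files.

```python
from io import BytesIO

from sklearn.datasets import load_svmlight_file

# Hypothetical ranking data: each line carries a qid:<n> field after the label.
raw_rank = b"2 qid:10 1:0.1\n1 qid:10 2:0.3\n0 qid:11 1:0.9\n"

# With query_id=True a third value is returned alongside X and y.
X, y, qid = load_svmlight_file(BytesIO(raw_rank), query_id=True)
print(X.format, y, qid)

# Hypothetical multilabel data: labels are comma-separated on each line.
raw_multi = b"1,3 1:1.0\n2 2:1.0\n"

# With multilabel=True, y is a list of tuples rather than an array.
X_multi, y_multi = load_svmlight_file(BytesIO(raw_multi), multilabel=True)
print(y_multi)
```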

Usage Examples

Basic Usage

from sklearn.datasets import load_svmlight_file, dump_svmlight_file
import numpy as np

# Load an SVMlight file (assumes "data.svmlight" exists in the working directory)
X, y = load_svmlight_file("data.svmlight")
print("Shape:", X.shape, "Labels:", np.unique(y))

# Save the data back out in SVMlight format
dump_svmlight_file(X, y, "output.svmlight")
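When a train and a test split live in separate files, load_svmlight_files keeps their column counts aligned. The sketch below writes two small files to a temporary directory first; the file names and data are hypothetical.

```python
import os
import tempfile

import numpy as np
from sklearn.datasets import (
    dump_svmlight_file,
    load_svmlight_file,
    load_svmlight_files,
)

# Hypothetical splits: the test split only has a nonzero value in column 1.
X_train = np.array([[1.0, 0.0, 0.0, 4.0],
                    [0.0, 2.0, 0.0, 0.0]])
y_train = np.array([0.0, 1.0])
X_test = np.array([[0.0, 3.0, 0.0, 0.0]])
y_test = np.array([1.0])

with tempfile.TemporaryDirectory() as tmp:
    train_path = os.path.join(tmp, "train.svmlight")
    test_path = os.path.join(tmp, "test.svmlight")
    dump_svmlight_file(X_train, y_train, train_path, zero_based=True)
    dump_svmlight_file(X_test, y_test, test_path, zero_based=True)

    # Loaded on its own, the test file yields only as many columns as it uses.
    X_alone, _ = load_svmlight_file(test_path, zero_based=True)

    # Loaded together, every matrix gets the same number of columns.
    X_tr, y_tr, X_te, y_te = load_svmlight_files(
        [train_path, test_path], zero_based=True
    )

print(X_alone.shape, X_tr.shape, X_te.shape)
```

The same alignment can be obtained with separate load_svmlight_file calls by passing an explicit n_features to each.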

Related Pages

Principle:Scikit_learn_Scikit_learn_Dataset_Loading
