Implementation:Scikit learn Scikit learn SvmlightFormatIO
| Knowledge Sources | |
|---|---|
| Domains | Data Loading, File I/O |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Concrete tool provided by scikit-learn for loading and saving datasets in SVMlight/LibSVM format.
Description
This module implements a loader and dumper for the SVMlight text-based sparse data format. Each line represents one sample: a label followed by index:value pairs for the nonzero features, which keeps files compact for sparse datasets. The module provides load_svmlight_file for reading a single file, load_svmlight_files for reading several files so that all resulting matrices share the same number of features, and dump_svmlight_file for writing data. It delegates to a Cython implementation for performance.
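To make the line format concrete, here is a minimal sketch that parses two hand-written samples from an in-memory buffer. The byte string and its values are made up for illustration; the indices are assumed to be one-based, which is why zero_based=False is passed explicitly.

```python
from io import BytesIO

from sklearn.datasets import load_svmlight_file

# Two illustrative samples in SVMlight text form: "<label> <index>:<value> ..."
# Only nonzero features are listed; indices here are one-based.
raw = b"1 1:0.5 3:2.0\n-1 2:1.5\n"

# File-like objects must be opened in binary mode, hence BytesIO.
X, y = load_svmlight_file(BytesIO(raw), zero_based=False)

dense = X.toarray()  # densify only for display; X itself is a sparse CSR matrix
print(dense)
print(y)
```

The highest index seen (3) determines the inferred number of columns, so X has shape (2, 3).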
Usage
Use this module to read and write datasets in SVMlight/LibSVM format, which is the default format for svmlight and libsvm command-line tools and is commonly used for storing sparse datasets.
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/datasets/_svmlight_format_io.py
Signature
@validate_params(...)
def load_svmlight_file(
f,
*,
n_features=None,
dtype=np.float64,
multilabel=False,
zero_based="auto",
query_id=False,
offset=0,
length=-1,
)
def load_svmlight_files(files, *, n_features=None, ...)
def dump_svmlight_file(X, y, f, *, zero_based=True, ...)
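As a rough illustration of why load_svmlight_files exists, the sketch below writes two small files with different highest feature indices and loads them together. The file names and contents are hypothetical; the call returns the matrices and targets as a flat sequence (X1, y1, X2, y2), unpacked here.

```python
import os
import tempfile

from sklearn.datasets import load_svmlight_files

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names; contents use one-based feature indices.
    train_path = os.path.join(tmp, "train.svmlight")
    test_path = os.path.join(tmp, "test.svmlight")
    with open(train_path, "wb") as fh:
        fh.write(b"1 1:1.0 5:2.0\n0 2:3.0\n")  # highest index: 5
    with open(test_path, "wb") as fh:
        fh.write(b"0 1:4.0 2:1.0\n")  # highest index: 2

    # Both matrices get the same number of columns, taken across all files.
    X_train, y_train, X_test, y_test = load_svmlight_files(
        [train_path, test_path], zero_based=False
    )

print(X_train.shape, X_test.shape)
```

Loading the two files separately with load_svmlight_file would instead infer 5 and 2 columns, which is the mismatch this function avoids.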
Import
from sklearn.datasets import load_svmlight_file, load_svmlight_files, dump_svmlight_file
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| f | str, int, PathLike, or file-like | Yes | File path, descriptor, or object to read from |
| n_features | int or None | No | Number of features; inferred if None (default: None) |
| dtype | numpy dtype | No | Data type of feature values (default: np.float64) |
| multilabel | bool | No | Whether a sample can carry several comma-separated labels (default: False) |
| zero_based | bool or 'auto' | No | Whether feature indices are 0-based (default: 'auto') |
| query_id | bool | No | Whether to return query IDs (default: False) |
| offset | int | No | Byte offset to start reading from (default: 0) |
| length | int | No | Number of bytes to read; -1 for all (default: -1) |
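The interaction between zero_based and n_features is easiest to see on a tiny input. The following sketch uses made-up data whose smallest feature index is 1 and compares the resulting shapes; it is an illustration of the parameters above, not a recommendation for any particular setting.

```python
from io import BytesIO

from sklearn.datasets import load_svmlight_file

# Illustrative data whose smallest feature index is 1.
raw = b"1 1:0.5 3:2.0\n-1 2:1.5\n"

# "auto" finds no index 0 and treats the file as one-based: 3 columns.
X_auto, _ = load_svmlight_file(BytesIO(raw), zero_based="auto")

# Forcing zero_based=True keeps index 3 as the fourth column.
X_zero, _ = load_svmlight_file(BytesIO(raw), zero_based=True)

# n_features pads the matrix to a fixed width, e.g. to match a training set.
X_wide, _ = load_svmlight_file(BytesIO(raw), zero_based=False, n_features=6)

print(X_auto.shape, X_zero.shape, X_wide.shape)
```

When the indexing convention of a file is known, passing zero_based explicitly avoids relying on the heuristic.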
Outputs
| Name | Type | Description |
|---|---|---|
| X | scipy.sparse.csr_matrix | Sparse feature matrix |
| y | numpy.ndarray, or list of tuples | Target values; a list of tuples when multilabel=True |
| query_id | numpy.ndarray | Query IDs (only if query_id=True) |
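The optional outputs can be sketched with two small in-memory inputs. Both byte strings are invented for illustration: the first carries a qid:<n> token after each label, as used in ranking data, and the second uses comma-separated labels.

```python
from io import BytesIO

from sklearn.datasets import load_svmlight_file

# Ranking-style data: a "qid:<n>" token follows the label.
ranking = b"2 qid:1 1:0.3 2:0.7\n1 qid:1 1:0.9\n0 qid:2 2:0.4\n"
X, y, qid = load_svmlight_file(
    BytesIO(ranking), query_id=True, zero_based=False
)

# Multilabel data: each sample may list several comma-separated labels.
tagged = b"1,2 1:1.0\n3 2:1.0\n"
X_ml, y_ml = load_svmlight_file(
    BytesIO(tagged), multilabel=True, zero_based=False
)

print(qid)   # one query ID per sample
print(y_ml)  # a list with one tuple of labels per sample
```

With query_id=True the function returns three values instead of two, so callers need to unpack accordingly.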
Usage Examples
Basic Usage
from sklearn.datasets import load_svmlight_file, dump_svmlight_file
import numpy as np
# Load an SVMlight file ("data.svmlight" is a placeholder for an existing file)
X, y = load_svmlight_file("data.svmlight")
print("Shape:", X.shape, "Labels:", np.unique(y))
# Save data in SVMlight format
dump_svmlight_file(X, y, "output.svmlight")
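For a version that runs without any file on disk, the following sketch round-trips a small made-up dataset through an in-memory buffer. It writes one-based indices (zero_based=False on both sides), which is an assumption made here for compatibility with tools that expect that convention.

```python
from io import BytesIO

import numpy as np
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

# Small illustrative dataset; a dense array is accepted as well as a sparse matrix.
X = np.array([[0.0, 1.5, 0.0], [2.0, 0.0, 3.0]])
y = np.array([0, 1])

# Write to a binary buffer using one-based feature indices.
buf = BytesIO()
dump_svmlight_file(X, y, buf, zero_based=False)

# Rewind and read back with the same convention; n_features pins the width.
buf.seek(0)
X_back, y_back = load_svmlight_file(
    buf, n_features=X.shape[1], zero_based=False
)

print(np.array_equal(X_back.toarray(), X), np.array_equal(y_back, y))
```

Passing the same zero_based value to both functions keeps the column positions aligned across the round trip.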