Implementation:Scikit learn Scikit learn SafeIndexing
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Data Indexing |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Concrete utility module in scikit-learn for indexing data safely and consistently across container types.
Description
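The `_indexing` module provides functions that index NumPy arrays, SciPy sparse matrices, pandas DataFrames and Series, Polars DataFrames, and plain Python lists through one consistent interface. It hides the differences between these containers' indexing APIs and accepts integer indices, boolean masks, and slices; when selecting DataFrame columns (`axis=1`) it also accepts column names. Data splitting elsewhere in the library builds on these functions: for example, `train_test_split` uses `_safe_indexing` to extract the train and test subsets of each input.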
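The sketch below exercises the key kinds described above on a small NumPy array and its SciPy CSR counterpart. It assumes a recent scikit-learn in which `_safe_indexing` is importable from `sklearn.utils`; because the function is private, its behavior may change between releases.

```python
import numpy as np
from scipy import sparse
from sklearn.utils import _safe_indexing

X = np.arange(12).reshape(4, 3)  # 4 samples, 3 features
X_csr = sparse.csr_matrix(X)     # the same data as a sparse matrix

# Integer indices: select rows 1 and 3.
rows_int = _safe_indexing(X, [1, 3])

# Boolean mask with one entry per row: selects the same rows.
mask = np.array([False, True, False, True])
rows_bool = _safe_indexing(X, mask)

# Slice: select a contiguous block of rows.
rows_slice = _safe_indexing(X, slice(0, 2))

# axis=1 selects columns instead of rows.
cols = _safe_indexing(X, [0, 2], axis=1)

# The same call on a sparse matrix returns a sparse matrix.
rows_csr = _safe_indexing(X_csr, [1, 3])

print(rows_int.tolist())          # [[3, 4, 5], [9, 10, 11]]
print(cols.tolist())              # [[0, 2], [3, 5], [6, 8], [9, 11]]
print(sparse.issparse(rows_csr))  # True
```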
Usage
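Use these utilities when code must select rows or columns without knowing the container type in advance, for example when applying cross-validation splits or drawing stratified samples. Because their names start with an underscore, they are private helpers rather than public scikit-learn API and may change between releases without a deprecation cycle.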
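As a sketch of the cross-validation use case, the hypothetical helper `split_arrays` below applies one set of train/test indices to several containers of different types, in the spirit of how `train_test_split` indexes each of its inputs. The helper name is illustrative only; `KFold` and `_safe_indexing` are the real scikit-learn objects.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.utils import _safe_indexing


def split_arrays(train_idx, test_idx, *arrays):
    """Hypothetical helper: index every array with the same train/test indices."""
    out = []
    for a in arrays:
        # _safe_indexing picks the indexing strategy that fits each container type.
        out.append((_safe_indexing(a, train_idx), _safe_indexing(a, test_idx)))
    return out


X = np.arange(10).reshape(5, 2)  # NumPy features
y = ["a", "b", "c", "d", "e"]    # plain Python list of labels

folds = []
for train_idx, test_idx in KFold(n_splits=5).split(X):
    (X_train, X_test), (y_train, y_test) = split_arrays(train_idx, test_idx, X, y)
    folds.append((X_train, X_test, y_train, y_test))

print(folds[0][3])           # ['a']
print(folds[4][1].tolist())  # [[8, 9]]
```

Note that the list labels come back as lists and the array features as arrays, so downstream code does not need to special-case either type.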
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/utils/_indexing.py
Signature
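```python
def _array_indexing(array, key, key_dtype, axis):
    ...  # index a NumPy array or SciPy sparse matrix

def _pandas_indexing(X, key, key_dtype, axis):
    ...  # index a pandas DataFrame or Series

def _safe_indexing(X, indices, *, axis=0):
    ...  # entry point: dispatches on the type of X

def _safe_assign(X, values, *, row_indexer=None, column_indexer=None):
    ...  # assign values into X in place
```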
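To illustrate how an entry point like `_safe_indexing` can route a key to type-specific logic, here is a simplified, hypothetical dispatcher named `safe_index_sketch`. It is not scikit-learn's implementation: it is only loosely modeled on checking for `.iloc` (pandas) and `.shape` (NumPy/SciPy), and it omits Polars support, column selection by name, and most input validation.

```python
from itertools import compress

import numpy as np


def safe_index_sketch(X, indices, axis=0):
    """Hypothetical, simplified dispatcher; not scikit-learn's implementation."""
    if axis not in (0, 1):
        raise ValueError("axis must be 0 (rows) or 1 (columns)")
    if hasattr(X, "iloc"):
        # pandas DataFrame/Series: positional indexing through .iloc
        return X.iloc[indices] if axis == 0 else X.iloc[:, indices]
    if hasattr(X, "shape"):
        # NumPy arrays and SciPy sparse matrices support X[rows] and X[:, cols]
        return X[indices] if axis == 0 else X[:, indices]
    # Plain Python list: only row selection makes sense
    if axis == 1:
        raise ValueError("axis=1 is not supported for lists")
    if isinstance(indices, slice) or np.isscalar(indices):
        return X[indices]
    key = np.asarray(indices)
    if key.dtype == bool:
        # Boolean mask: keep the items whose mask entry is True
        return list(compress(X, key))
    # Integer positions: gather the items one by one
    return [X[i] for i in key]


print(safe_index_sketch([10, 20, 30], [True, False, True]))  # [10, 30]
print(safe_index_sketch(np.arange(6).reshape(3, 2), [1], axis=1).tolist())  # [[1], [3], [5]]
```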
Import
```python
from sklearn.utils import _safe_indexing
from sklearn.utils._indexing import _safe_assign
```
I/O Contract
Inputs
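| Name | Type | Required | Description |
|---|---|---|---|
| X | list, NumPy array, sparse matrix, pandas or Polars DataFrame/Series | Yes | Data to be indexed |
| indices | int, bool, or str scalar; array-like of those; or slice | Yes | Positions, boolean mask, or (with `axis=1` on a DataFrame) column names to select; `None` returns `X` unchanged |
| axis | int, default 0 | No | Axis to index along: 0 selects rows (samples), 1 selects columns (features); any other value raises `ValueError` |
| key_dtype | str | Yes (internal helpers only) | Key type, one of "int", "bool", or "str", passed positionally to `_array_indexing` and `_pandas_indexing`; `_safe_indexing` infers it from `indices` instead of taking it as an argument |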
Outputs
| Name | Type | Description |
|---|---|---|
| result | same container type as `X` | Subset of `X` selected by `indices`; a scalar index can drop a dimension (e.g. one row of a 2-D NumPy array comes back as a 1-D array) |
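The container-preserving behavior can be checked directly. This sketch assumes pandas is installed; it shows a DataFrame staying a DataFrame (including column selection by name with `axis=1`), a list staying a list, and a scalar index on a 2-D NumPy array returning a 1-D row.

```python
import numpy as np
import pandas as pd
from sklearn.utils import _safe_indexing

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

sub_df = _safe_indexing(df, [0, 2])                  # rows by position -> DataFrame
by_name = _safe_indexing(df, ["b"], axis=1)          # columns by name -> DataFrame
sub_list = _safe_indexing([10, 20, 30], [2, 0])      # list in -> list out
row = _safe_indexing(np.arange(6).reshape(3, 2), 1)  # scalar index -> 1-D row

print(type(sub_df).__name__, sub_df["a"].tolist())  # DataFrame [1, 3]
print(list(by_name.columns))                        # ['b']
print(sub_list)                                     # [30, 10]
print(row.shape)                                    # (2,)
```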
Usage Examples
Basic Usage
```python
import numpy as np
from sklearn.utils import _safe_indexing

X = np.array([[1, 2], [3, 4], [5, 6]])
indices = [0, 2]  # select the first and third rows
result = _safe_indexing(X, indices)
print(result)
# [[1 2]
#  [5 6]]
```