Implementation:Scikit learn Scikit learn SetOutput
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Output Configuration |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
A utility module in scikit-learn for configuring the container type that transformers return.
Description
The _set_output module implements the set_output API, which lets scikit-learn transformers return pandas or Polars DataFrames instead of NumPy arrays. It provides the _SetOutputMixin, the ContainerAdapterProtocol with its PandasAdapter and PolarsAdapter implementations, and functions that wrap transformer output with the appropriate metadata (column names, index).
Usage
Call set_output on a scikit-learn transformer to configure its output format. Set transform="pandas" or transform="polars" to get DataFrame output from transform and fit_transform; transform="default" restores the transformer's default output format.
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/utils/_set_output.py
Signature
class ContainerAdapterProtocol(Protocol):
    container_lib: str

    def create_container(self, X_output, X_original, columns, inplace=False):
        ...

    def is_supported_container(self, X):
        ...

class PandasAdapter:
    ...

class PolarsAdapter:
    ...

class _SetOutputMixin:
    def set_output(self, *, transform=None):
        ...

def _safe_set_output(estimator, *, transform=None):
    ...
Import
from sklearn.utils._set_output import _SetOutputMixin, _safe_set_output
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| transform | str or None | No | Output container type: "default" (the transformer's default format, typically ndarray), "pandas", or "polars"; None leaves the current configuration unchanged |
| estimator | estimator instance | Yes | Estimator to configure output for |
Outputs
| Name | Type | Description |
|---|---|---|
| self | estimator | The estimator with output configured |
| X_output | DataFrame or ndarray | Transformed data in the configured output format |
Usage Examples
Basic Usage
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Load the iris features as a pandas DataFrame (requires pandas).
X, y = load_iris(return_X_y=True, as_frame=True)

# Configure the scaler to return DataFrames from transform and fit_transform.
scaler = StandardScaler().set_output(transform="pandas")
X_scaled = scaler.fit_transform(X)

print(type(X_scaled))  # <class 'pandas.core.frame.DataFrame'>
print(X_scaled.columns.tolist())  # Original feature names preserved