Implementation: scikit-learn make_pipeline
Overview
A convenience function provided by scikit-learn for constructing a Pipeline with automatically generated step names.
Code Reference
Function: make_pipeline
Module: sklearn/pipeline.py (lines 1379-1440)
Signature:
def make_pipeline(*steps, memory=None, transform_input=None, verbose=False):
This is a convenience shorthand for the Pipeline constructor. It does not require (and does not permit) naming the estimators. Instead, each step's name is set automatically to the lowercased class name of its estimator.
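A minimal sketch of the equivalence: the two constructions below build the same pipeline, differing only in how the step names are chosen.

```python
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Explicit naming via the Pipeline constructor
explicit = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

# make_pipeline derives names from the lowercased class names
auto = make_pipeline(StandardScaler(), SVC())
print([name for name, _ in auto.steps])  # ['standardscaler', 'svc']
```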
I/O Contract
Parameters:
*steps: list of Estimator objects -- The scikit-learn estimators to be chained together. All intermediate steps must implement fit and transform. The final step only needs to implement fit.
memory: str or object with the joblib.Memory interface, default=None -- Used to cache the fitted transformers of the pipeline. The last step is never cached, even if it is a transformer. If a string is given, it is the path to the caching directory.
transform_input: list of str, default=None -- Enables transforming some input arguments to fit (other than X) by the pipeline steps. Requires metadata routing to be enabled via sklearn.set_config(enable_metadata_routing=True). Added in version 1.6.
verbose: bool, default=False -- If True, the time elapsed while fitting each step is printed as it is completed.
Returns:
p: Pipeline -- A scikit-learn Pipeline object with auto-generated step names.
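A short sketch of the keyword arguments in use. The cache directory here is a throwaway temporary path chosen for illustration; any writable path (or a joblib.Memory object) works.

```python
import tempfile
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative cache location; memory caches the fitted transformers
cachedir = tempfile.mkdtemp()

# verbose=True prints the elapsed time of each step during fit
pipe = make_pipeline(StandardScaler(), LogisticRegression(),
                     memory=cachedir, verbose=True)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
pipe.fit(X, y)
```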
Implementation Details
The function performs two actions:
- Calls _name_estimators(steps) to generate (name, estimator) tuples from the positional arguments. The naming algorithm converts each estimator's class name to lowercase. If there are duplicates, a numeric suffix is appended (e.g., standardscaler-1, standardscaler-2).
- Constructs and returns a Pipeline object with the named steps and the given memory, transform_input, and verbose arguments.
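The duplicate-suffix behavior can be observed directly by passing two estimators of the same class:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Two StandardScaler steps trigger the numeric-suffix naming
pipe = make_pipeline(StandardScaler(), StandardScaler(), LogisticRegression())
print([name for name, _ in pipe.steps])
# ['standardscaler-1', 'standardscaler-2', 'logisticregression']
```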
The underlying Pipeline class (sklearn/pipeline.py, line 91) has the constructor:
class Pipeline(_BaseComposition):
def __init__(self, steps, *, transform_input=None, memory=None, verbose=False):
The Pipeline class stores the steps as a list of (name, estimator) tuples and provides:
- named_steps: a Bunch for dictionary-like access to any step by name.
- Nested parameter access via the step_name__param_name syntax for use with GridSearchCV.
- Automatic delegation of predict, predict_proba, score, and other methods to the final estimator.
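A sketch exercising all three features on a toy dataset (the parameter grid values are illustrative only):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X, y)

# Dictionary-like access to a step by its auto-generated name
scaler = pipe.named_steps["standardscaler"]

# Nested parameter access: step_name__param_name
grid = GridSearchCV(pipe, {"logisticregression__C": [0.1, 1.0, 10.0]}, cv=3)
grid.fit(X, y)

# predict is delegated to the final estimator
preds = pipe.predict(X)
```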
Usage Examples
Basic pipeline construction:
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
pipe = make_pipeline(StandardScaler(), GaussianNB(priors=None))
# Pipeline(steps=[('standardscaler', StandardScaler()),
# ('gaussiannb', GaussianNB())])
Full preprocessing and modeling pipeline:
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
# Numeric sub-pipeline
numeric_transformer = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler()
)
# Categorical sub-pipeline
categorical_transformer = make_pipeline(
    SimpleImputer(strategy="most_frequent"),
    OneHotEncoder(handle_unknown="ignore")
)
# Column composition
preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer,
         make_column_selector(dtype_include=np.number)),
        ("cat", categorical_transformer,
         make_column_selector(dtype_include=object)),
    ]
)
# Full pipeline: preprocessing + model
clf = make_pipeline(preprocessor, LogisticRegression())
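The composed pipeline can then be fitted and used for prediction like any single estimator. The sketch below rebuilds the same preprocessor inline so it is self-contained; the DataFrame, its column names, and its values are made-up toy data for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data (illustrative only): one numeric and one categorical feature,
# each with a missing value
df = pd.DataFrame({
    "age": [25.0, np.nan, 47.0, 33.0, 52.0, 29.0],
    "city": ["NY", "LA", np.nan, "NY", "LA", "NY"],
})
y = np.array([0, 1, 1, 0, 1, 0])

preprocessor = ColumnTransformer(transformers=[
    ("num", make_pipeline(SimpleImputer(strategy="median"), StandardScaler()),
     make_column_selector(dtype_include=np.number)),
    ("cat", make_pipeline(SimpleImputer(strategy="most_frequent"),
                          OneHotEncoder(handle_unknown="ignore")),
     make_column_selector(dtype_include=object)),
])

clf = make_pipeline(preprocessor, LogisticRegression())
clf.fit(df, y)           # fit flows through every step
preds = clf.predict(df)  # predict is delegated to LogisticRegression
```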