Implementation: scikit-learn make_pipeline
Overview
A convenience function provided by scikit-learn for constructing a Pipeline with automatically generated step names.
Code Reference
Function: make_pipeline
Module: sklearn/pipeline.py (lines 1379-1440)
Signature:
def make_pipeline(*steps, memory=None, transform_input=None, verbose=False):
This is a convenience shorthand for the Pipeline constructor. It does not require (and does not permit) naming the estimators. Instead, each step's name is set automatically to the lowercased class name of its estimator.
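A minimal sketch of the equivalence: the two constructions below build the same pipeline, differing only in how the step names are chosen.

```python
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Explicit naming via the Pipeline constructor
explicit = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

# make_pipeline derives names from the lowercased class names
auto = make_pipeline(StandardScaler(), SVC())
print([name for name, _ in auto.steps])  # ['standardscaler', 'svc']
```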
I/O Contract
Parameters:
*steps: list of Estimator objects -- The scikit-learn estimators to be chained together. All intermediate steps must implement fit and transform. The final step only needs to implement fit.
memory: str or object with the joblib.Memory interface, default=None -- Used to cache the fitted transformers of the pipeline. The last step is never cached, even if it is a transformer. If a string is given, it is the path to the caching directory.
transform_input: list of str, default=None -- Enables transforming some input arguments to fit (other than X) by the pipeline steps. Requires metadata routing to be enabled via sklearn.set_config(enable_metadata_routing=True). Added in version 1.6.
verbose: bool, default=False -- If True, the time elapsed while fitting each step is printed as it is completed.
Returns:
p: Pipeline -- A scikit-learn Pipeline object with auto-generated step names.
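A short sketch of the keyword arguments in use. The cache directory here is a throwaway temporary path chosen for illustration; any writable path (or a joblib.Memory object) works.

```python
import tempfile
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative cache location; memory caches the fitted transformers
cachedir = tempfile.mkdtemp()

# verbose=True prints the elapsed time of each step during fit
pipe = make_pipeline(StandardScaler(), LogisticRegression(),
                     memory=cachedir, verbose=True)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
pipe.fit(X, y)
```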
Implementation Details
The function performs two actions:
- Calls _name_estimators(steps) to generate (name, estimator) tuples from the positional arguments. The naming algorithm converts each estimator's class name to lowercase. If there are duplicates, a numeric suffix is appended (e.g., standardscaler-1, standardscaler-2).
- Constructs and returns a Pipeline object with the named steps and the given memory, transform_input, and verbose arguments.
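The duplicate-suffix behavior can be observed directly by passing two estimators of the same class:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Two StandardScaler steps trigger the numeric-suffix naming
pipe = make_pipeline(StandardScaler(), StandardScaler(), LogisticRegression())
print([name for name, _ in pipe.steps])
# ['standardscaler-1', 'standardscaler-2', 'logisticregression']
```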
The underlying Pipeline class (sklearn/pipeline.py, line 91) has the constructor:
class Pipeline(_BaseComposition):
def __init__(self, steps, *, transform_input=None, memory=None, verbose=False):
The Pipeline class stores the steps as a list of (name, estimator) tuples and provides:
- named_steps: a Bunch for dictionary-like access to any step by name.
- Nested parameter access via the step_name__param_name syntax for use with GridSearchCV.
- Automatic delegation of predict, predict_proba, score, and other methods to the final estimator.
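A sketch exercising all three features on a toy dataset (the parameter grid values are illustrative only):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X, y)

# Dictionary-like access to a step by its auto-generated name
scaler = pipe.named_steps["standardscaler"]

# Nested parameter access: step_name__param_name
grid = GridSearchCV(pipe, {"logisticregression__C": [0.1, 1.0, 10.0]}, cv=3)
grid.fit(X, y)

# predict is delegated to the final estimator
preds = pipe.predict(X)
```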
Usage Examples
Basic pipeline construction:
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
pipe = make_pipeline(StandardScaler(), GaussianNB(priors=None))
# Pipeline(steps=[('standardscaler', StandardScaler()),
# ('gaussiannb', GaussianNB())])
Full preprocessing and modeling pipeline:
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
# Numeric sub-pipeline
numeric_transformer = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler()
)
# Categorical sub-pipeline
categorical_transformer = make_pipeline(
    SimpleImputer(strategy="most_frequent"),
    OneHotEncoder(handle_unknown="ignore")
)
# Column composition
preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer,
         make_column_selector(dtype_include=np.number)),
        ("cat", categorical_transformer,
         make_column_selector(dtype_include=object)),
    ]
)
# Full pipeline: preprocessing + model
clf = make_pipeline(preprocessor, LogisticRegression())
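The composed pipeline can then be fitted and used for prediction like any single estimator. The sketch below rebuilds the same preprocessor inline so it is self-contained; the DataFrame, its column names, and its values are made-up toy data for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data (illustrative only): one numeric and one categorical feature,
# each with a missing value
df = pd.DataFrame({
    "age": [25.0, np.nan, 47.0, 33.0, 52.0, 29.0],
    "city": ["NY", "LA", np.nan, "NY", "LA", "NY"],
})
y = np.array([0, 1, 1, 0, 1, 0])

preprocessor = ColumnTransformer(transformers=[
    ("num", make_pipeline(SimpleImputer(strategy="median"), StandardScaler()),
     make_column_selector(dtype_include=np.number)),
    ("cat", make_pipeline(SimpleImputer(strategy="most_frequent"),
                          OneHotEncoder(handle_unknown="ignore")),
     make_column_selector(dtype_include=object)),
])

clf = make_pipeline(preprocessor, LogisticRegression())
clf.fit(df, y)           # fit flows through every step
preds = clf.predict(df)  # predict is delegated to LogisticRegression
```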