Implementation:Scikit learn contrib Imbalanced learn Pipeline

Knowledge Sources	imbalanced-learn, imbalanced-learn Docs
Domains	Machine_Learning, Software_Engineering, Data_Pipeline
Last Updated	2026-02-09 03:00 GMT

Overview

A concrete tool, provided by the imbalanced-learn library, for chaining samplers, transformers, and estimators into a single leakage-free workflow.

Description

The Pipeline class extends scikit-learn's sklearn.pipeline.Pipeline to support steps that implement fit_resample (samplers). During fit(), sampler steps resample the training data; during predict() and transform(), they are skipped, so resampling never alters the data being scored. The make_pipeline convenience function names the steps automatically. The pipeline also supports caching of fitted steps, metadata routing, and parameter setting via the stepname__param syntax.

Usage

Import Pipeline or make_pipeline from imblearn.pipeline whenever you need to combine resampling with other sklearn-compatible steps. Always use this instead of sklearn's Pipeline when samplers are involved.

Code Reference

Source Location

Repository: imbalanced-learn
File: imblearn/pipeline.py
Lines: L111-1458 (Pipeline class: L111-1328, make_pipeline: L1398-1458)

Signature

class Pipeline(sklearn.pipeline.Pipeline):
    def __init__(
        self,
        steps,
        *,
        memory=None,
        transform_input=None,
        verbose=False,
    ):
        """
        Args:
            steps: list of (name, estimator) tuples - Samplers, transformers,
                and final estimator chained in order.
            memory: None, str, or joblib.Memory - Cache fitted transformers.
            transform_input: list of str or None - Metadata to transform.
            verbose: bool - Print step timing (default: False).
        """

def make_pipeline(*steps, memory=None, transform_input=None, verbose=False):
    """Construct a Pipeline with auto-named steps."""

Import

from imblearn.pipeline import Pipeline, make_pipeline

I/O Contract

Inputs

Name	Type	Required	Description
steps	list of (str, estimator) tuples	Yes	Named pipeline steps
X	{array-like, sparse matrix}	Yes (for fit/predict)	Feature matrix
y	array-like	Yes (for fit)	Target labels

Outputs

Name	Type	Description
Pipeline.fit()	self	Fitted pipeline; samplers have resampled training data
Pipeline.predict()	ndarray	Predicted labels (samplers skipped)
Pipeline.fit_resample()	(X, y)	Data transformed and resampled by every step; requires the final step to implement fit_resample

Usage Examples

Basic Pipeline

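from imblearn.pipeline import make_pipeline
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic imbalanced data for illustration; substitute your own X and y.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

# The scaler and SMOTE are fitted on the training split only.
pipeline = make_pipeline(
    StandardScaler(),
    SMOTE(random_state=42),
    LinearSVC(random_state=42),
)
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)  # SMOTE is skipped here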

Cross-Validation

from sklearn.model_selection import cross_validate

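# `pipeline` comes from the Basic Pipeline example; X and y are the full
# feature matrix and labels. The pipeline is refit on each training fold,
# so resampling never sees the held-out fold.
scores = cross_validate(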
    pipeline, X, y,
    scoring="balanced_accuracy",
    cv=5,
)
print(f"Mean: {scores['test_score'].mean():.3f}")
