Implementation:Scikit learn contrib Imbalanced learn Pipeline


Knowledge Sources
Domains Machine_Learning, Software_Engineering, Data_Pipeline
Last Updated 2026-02-09 03:00 GMT

Overview

Concrete tool, provided by the imbalanced-learn library, for chaining samplers, transformers, and estimators into a single leakage-free workflow.

Description

The Pipeline class extends scikit-learn's sklearn.pipeline.Pipeline to support steps that implement fit_resample (samplers). During fit(), each sampler step resamples the data passed on to the later steps; during predict() and transform(), sampler steps are skipped, so new data is never resampled. The make_pipeline convenience function names the steps automatically. The class also supports caching of fitted steps, metadata routing, and parameter setting via the stepname__param syntax.

Usage

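Import Pipeline or make_pipeline from imblearn.pipeline whenever you need to combine resampling with other sklearn-compatible steps. Always use it instead of sklearn.pipeline.Pipeline when samplers are involved, because scikit-learn's class requires every intermediate step to implement transform, which samplers do not.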

Code Reference

Source Location

  • Repository: imbalanced-learn
  • File: imblearn/pipeline.py
  • Lines: L111-1458 (Pipeline class: L111-1328, make_pipeline: L1398-1458)

Signature

class Pipeline(sklearn.pipeline.Pipeline):
    def __init__(
        self,
        steps,
        *,
        memory=None,
        transform_input=None,
        verbose=False,
    ):
        """
        Args:
            steps: list of (name, estimator) tuples - Samplers, transformers,
                and final estimator chained in order.
            memory: None, str, or joblib.Memory - Cache fitted transformers.
            transform_input: list of str or None - Names of metadata
                parameters that the pipeline transforms before passing
                them to the step that consumes them.
            verbose: bool - Print step timing (default: False).
        """

def make_pipeline(*steps, memory=None, transform_input=None, verbose=False):
    """Construct a Pipeline with auto-named steps."""

Import

from imblearn.pipeline import Pipeline, make_pipeline

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| steps | list of (str, estimator) tuples | Yes | Named pipeline steps |
| X | {array-like, sparse matrix} | Yes (for fit/predict) | Feature matrix |
| y | array-like | Yes (for fit) | Target labels |

Outputs

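| Name | Type | Description |
|------|------|-------------|
| Pipeline.fit() | self | Fitted pipeline; sampler steps have resampled the training data before later steps were fitted |
| Pipeline.predict() | ndarray | Predicted labels (samplers skipped) |
| Pipeline.fit_resample() | (X, y) | Data passed through every preceding step and then resampled by the final step; available only when the final step is a sampler |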

Usage Examples

Basic Pipeline

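from imblearn.over_sampling import SMOTE
from imblearn.pipeline import make_pipeline
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic imbalanced data (roughly 90% / 10%), used only for illustration
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

pipeline = make_pipeline(
    StandardScaler(),            # fitted on the training data
    SMOTE(random_state=42),      # applied during fit only
    LinearSVC(random_state=42),
)
pipeline.fit(X_train, y_train)      # scale, oversample, then train
y_pred = pipeline.predict(X_test)   # scale and predict; SMOTE is skipped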

Cross-Validation

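from sklearn.model_selection import cross_validate

# `pipeline` is the imblearn pipeline from the Basic Pipeline example;
# X and y are the full feature matrix and labels.
# Each fold refits the pipeline, so SMOTE only sees that fold's training split.
scores = cross_validate(
    pipeline, X, y,
    scoring="balanced_accuracy",
    cv=5,
)
print(f"Mean balanced accuracy: {scores['test_score'].mean():.3f}")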
