Implementation:Scikit learn contrib Imbalanced learn Pipeline
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Software_Engineering, Data_Pipeline |
| Last Updated | 2026-02-09 03:00 GMT |
Overview
Concrete tool, provided by the imbalanced-learn library, for chaining samplers, transformers, and estimators into a single leakage-free workflow.
Description
The Pipeline class extends scikit-learn's sklearn.pipeline.Pipeline to support steps that implement fit_resample (samplers). During fit(), sampler steps resample the data that is passed on to later steps; during predict() and transform(), they are skipped, so new data is never resampled. The make_pipeline convenience function names each step automatically. The pipeline also supports caching, metadata routing, and parameter setting via the stepname__param syntax.
Usage
Import Pipeline or make_pipeline from imblearn.pipeline whenever you need to combine resampling with other scikit-learn-compatible steps. Always use it instead of sklearn.pipeline.Pipeline when samplers are involved, because scikit-learn's pipeline does not handle fit_resample steps.
Code Reference
Source Location
- Repository: imbalanced-learn
- File: imblearn/pipeline.py
- Lines: L111-1458 (Pipeline class: L111-1328, make_pipeline: L1398-1458)
Signature
class Pipeline(sklearn.pipeline.Pipeline):
    def __init__(
        self,
        steps,
        *,
        memory=None,
        transform_input=None,
        verbose=False,
    ):
        """
        Args:
            steps: list of (name, estimator) tuples - Samplers, transformers,
                and final estimator chained in order.
            memory: None, str, or joblib.Memory - Cache fitted transformers.
            transform_input: list of str or None - Metadata to transform.
            verbose: bool - Print step timing (default: False).
        """

def make_pipeline(*steps, memory=None, transform_input=None, verbose=False):
    """Construct a Pipeline with auto-named steps."""
Import
from imblearn.pipeline import Pipeline, make_pipeline
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| steps | list of (str, estimator) tuples | Yes | Named pipeline steps |
| X | {array-like, sparse matrix} | Yes (for fit/predict) | Feature matrix |
| y | array-like | Yes (for fit) | Target labels |
Outputs
| Name | Type | Description |
|---|---|---|
| Pipeline.fit() | self | Fitted pipeline; samplers have resampled training data |
| Pipeline.predict() | ndarray | Predicted labels (samplers skipped) |
| Pipeline.fit_resample() | (X, y) | Data transformed and resampled by all steps in order; available when the final step is a sampler |
Usage Examples
Basic Pipeline
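from imblearn.over_sampling import SMOTE
from imblearn.pipeline import make_pipeline
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic imbalanced data (roughly 90% / 10%), used only for illustration
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# Scale the features, oversample the minority class, then fit the classifier
pipeline = make_pipeline(
    StandardScaler(),
    SMOTE(random_state=42),
    LinearSVC(random_state=42),
)
pipeline.fit(X_train, y_train)      # SMOTE resamples the training data only
y_pred = pipeline.predict(X_test)   # SMOTE is skipped at prediction time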
Cross-Validation
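from sklearn.model_selection import cross_validate

# pipeline is the one built above; X, y are the full feature matrix and labels.
# Resampling runs inside each training fold only, so test folds stay untouched.
scores = cross_validate(
    pipeline, X, y,
    scoring="balanced_accuracy",
    cv=5,
)
print(f"Mean: {scores['test_score'].mean():.3f}")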