Implementation:Scikit learn contrib Imbalanced learn NeighbourhoodCleaningRule

Knowledge Sources	imbalanced-learn, imbalanced-learn Docs, Laurikkala 2001
Domains	Machine_Learning, Data_Preprocessing, Imbalanced_Learning
Last Updated	2026-02-09 03:00 GMT

Overview

Concrete under-sampling tool, provided by the imbalanced-learn library, that implements the Neighbourhood Cleaning Rule.

Description

The NeighbourhoodCleaningRule class extends BaseCleaningSampler and removes noisy and borderline majority-class samples in two phases. Phase 1 applies Edited Nearest Neighbours (ENN) to identify majority samples that are misclassified by their own neighbourhood. Phase 2 runs a K-Nearest Neighbours classifier on the minority samples and, for each minority sample it misclassifies, identifies the majority-class neighbours responsible. The union of both sets of identified samples is removed. Because this is a cleaning method, the number of removed samples depends on the data, so the classes are generally not balanced afterwards.
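
The two phases can be sketched in plain NumPy. This is a simplified illustration under stated assumptions, not the library's implementation: the helper names (knn_indices, majority_vote, ncr_sketch) are hypothetical, distances are brute-force Euclidean, labels are assumed to be non-negative integers, and the threshold_cleaning and sampling_strategy options are ignored.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest neighbours of each row, excluding the row itself."""
    # Brute-force squared Euclidean distances between all pairs of rows
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    return np.argsort(d, axis=1, kind="stable")[:, :k]


def majority_vote(labels):
    """Most frequent label per row (ties go to the smallest label)."""
    return np.array([np.bincount(row).argmax() for row in labels])


def ncr_sketch(X, y, k=3):
    """Return the indices kept after a simplified two-phase cleaning."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[counts.argmin()]
    neigh = knn_indices(X, k)
    wrong = majority_vote(y[neigh]) != y

    # Phase 1 (ENN-like): non-minority samples contradicted by their neighbourhood
    a1 = np.flatnonzero(wrong & (y != minority))

    # Phase 2: non-minority neighbours of misclassified minority samples
    bad_minority = np.flatnonzero(wrong & (y == minority))
    a2 = np.unique(neigh[bad_minority].ravel())
    a2 = a2[y[a2] != minority]

    # Remove the union of both sets
    removed = np.union1d(a1, a2)
    return np.setdiff1d(np.arange(len(y)), removed)


# Toy 1-D data: a majority cluster near 0 and a minority cluster near 5.
# Index 8 is a majority sample inside the minority cluster (caught by Phase 1);
# index 9 is a minority sample inside the majority cluster (triggers Phase 2).
X = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 5.0, 5.1, 5.2, 5.05, 0.26]).reshape(-1, 1)
y = np.array([1, 1, 1, 1, 1, 0, 0, 0, 1, 0])
kept = ncr_sketch(X, y, k=3)
print(kept.tolist())  # [0, 1, 5, 6, 7, 9]
```

In this toy run, Phase 1 removes index 8 and Phase 2 removes indices 2, 3, and 4, the three majority neighbours of the misclassified minority sample at index 9.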

Usage

Import this class when you need a neighbourhood-based cleaning approach that not only removes noisy majority samples (via ENN) but also targets majority samples that are harmful to minority class recognition.

Code Reference

Source Location

Repository: imbalanced-learn
File: imblearn/under_sampling/_prototype_selection/_neighbourhood_cleaning_rule.py
Lines: L1-239

Signature

class NeighbourhoodCleaningRule(BaseCleaningSampler):
    def __init__(
        self,
        *,
        sampling_strategy="auto",
        edited_nearest_neighbours=None,
        n_neighbors=3,
        threshold_cleaning=0.5,
        n_jobs=None,
    ):
        """
        Args:
            sampling_strategy: str, list, or callable - Classes targeted
                by the cleaning. 'auto' (equivalent to 'not minority')
                targets every class except the minority class.
            edited_nearest_neighbours: EditedNearestNeighbours or None -
                Custom ENN object for Phase 1 cleaning. Defaults to ENN
                with kind_sel="mode" and the specified n_neighbors.
            n_neighbors: int or KNeighborsMixin estimator - Number of
                nearest neighbours for the K-NN classifier in Phase 2
                (default: 3).
            threshold_cleaning: float - Threshold for deciding which
                classes to clean in Phase 2. A class is cleaned when
                its size > minority_size * threshold (default: 0.5).
            n_jobs: int or None - Number of parallel jobs.
        """

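As a worked example of the threshold_cleaning rule stated in the docstring above, the following standalone sketch computes which classes would be eligible for Phase 2 cleaning. The helper classes_to_clean and its targeted argument are hypothetical names used only for illustration; the sketch does not call the library and places no bound of its own on the threshold value.

```python
from collections import Counter


def classes_to_clean(y, targeted, threshold_cleaning=0.5):
    """Classes that are targeted and larger than minority_size * threshold."""
    counts = Counter(y)
    minority_size = min(counts.values())
    return sorted(
        c for c, n in counts.items()
        if c in targeted and n > minority_size * threshold_cleaning
    )


# Three classes: class 1 (40 samples) is the minority
y = [0] * 100 + [1] * 40 + [2] * 900

# Cut-off is 40 * 0.5 = 20, so both targeted classes qualify
print(classes_to_clean(y, targeted={0, 2}))  # [0, 2]

# Cut-off is 40 * 3.0 = 120, so class 0 (100 samples) no longer qualifies
print(classes_to_clean(y, targeted={0, 2}, threshold_cleaning=3.0))  # [2]
```

Note that with a threshold of 1 or less, every class strictly larger than the minority passes the test, so the default of 0.5 only restricts cleaning through the set of targeted classes.
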
Import

from imblearn.under_sampling import NeighbourhoodCleaningRule

I/O Contract

Inputs

Name	Type	Required	Description
X	{array-like, sparse matrix, dataframe} of shape (n_samples, n_features)	Yes	Feature matrix of training data
y	array-like of shape (n_samples,)	Yes	Target labels
sampling_strategy	str, list, or callable	No	Classes targeted by the cleaning (default: 'auto', i.e. every class except the minority)
edited_nearest_neighbours	EditedNearestNeighbours or None	No	Custom ENN object for Phase 1 (default: None)
n_neighbors	int or KNeighborsMixin estimator	No	Neighbours for K-NN classifier (default: 3)
threshold_cleaning	float	No	Class size threshold for Phase 2 cleaning (default: 0.5)
n_jobs	int or None	No	Number of parallel jobs (default: None)

Outputs

Name	Type	Description
X_resampled	{ndarray, sparse matrix, dataframe} of shape (n_samples_new, n_features)	Feature matrix with noisy majority samples removed
y_resampled	ndarray of shape (n_samples_new,)	Target array after neighbourhood cleaning

Key Attributes After Fitting

Attribute	Type	Description
sampling_strategy_	dict	Maps class labels to number of samples to sample
edited_nearest_neighbours_	estimator object	The ENN object used for Phase 1 resampling
nn_	estimator object	Validated K-Nearest Neighbours classifier used in Phase 2
classes_to_clean_	list	Classes considered for under-sampling during Phase 2
sample_indices_	ndarray of shape (n_new_samples,)	Indices of samples selected from the original dataset
n_features_in_	int	Number of features in the input dataset
feature_names_in_	ndarray of shape (n_features_in_,)	Names of features seen during fit (when X has string feature names)

Usage Examples

Basic Usage

from collections import Counter
from sklearn.datasets import make_classification
from imblearn.under_sampling import NeighbourhoodCleaningRule

# Create imbalanced dataset
X, y = make_classification(
    n_classes=2, class_sep=2, weights=[0.1, 0.9],
    n_informative=3, n_redundant=1, flip_y=0,
    n_features=20, n_clusters_per_class=1,
    n_samples=1000, random_state=10,
)
print(f"Original: {Counter(y)}")
# Original: Counter({1: 900, 0: 100})

# Apply NeighbourhoodCleaningRule
ncr = NeighbourhoodCleaningRule()
X_res, y_res = ncr.fit_resample(X, y)
print(f"Resampled: {Counter(y_res)}")
# Resampled: Counter({1: 888, 0: 100})

Custom ENN and Threshold

from imblearn.under_sampling import (
    EditedNearestNeighbours,
    NeighbourhoodCleaningRule,
)

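# Configure custom ENN for Phase 1; kind_sel="all" requires every
# neighbour to agree, so it removes more samples than the default "mode"
custom_enn = EditedNearestNeighbours(n_neighbors=5, kind_sel="all")

# With a custom ENN, n_neighbors below only configures the Phase 2 K-NN
ncr = NeighbourhoodCleaningRule(
    edited_nearest_neighbours=custom_enn,
    n_neighbors=5,
    threshold_cleaning=0.3,
)
# X and y are the arrays created in the basic usage example
X_res, y_res = ncr.fit_resample(X, y)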

In a Pipeline

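from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from imblearn.pipeline import make_pipeline
from imblearn.under_sampling import NeighbourhoodCleaningRule

# Hold out a test set; X and y are the arrays from the basic usage example
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

# The sampler is applied only during fit; predict() skips resampling
pipeline = make_pipeline(
    NeighbourhoodCleaningRule(),
    GradientBoostingClassifier(random_state=42),
)
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)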

Related Pages

Implements Principle

Principle:Scikit_learn_contrib_Imbalanced_learn_Neighbourhood_Cleaning_Rule

Requires Environment

Environment:Scikit_learn_contrib_Imbalanced_learn_Python_Scikit_learn
