Implementation:Scikit learn contrib Imbalanced learn CondensedNearestNeighbour

Knowledge Sources	imbalanced-learn; imbalanced-learn Docs; P. Hart, "The condensed nearest neighbor rule," IEEE Transactions on Information Theory, 1968
Domains	Machine_Learning, Data_Preprocessing, Imbalanced_Learning
Last Updated	2026-02-09 03:00 GMT

Overview

Under-sampling technique that builds a condensed subset of the training data by keeping every minority sample and adding only those majority samples that a 1-nearest-neighbor classifier trained on the current subset misclassifies.

Description

The CondensedNearestNeighbour class implements the Condensed Nearest Neighbour (CNN) rule for under-sampling majority class instances. It extends BaseCleaningSampler. For each targeted class, it initializes a subset with all minority samples plus randomly chosen seed samples from that class (n_seeds_S, default 1), then adds the samples of that class that a nearest-neighbor classifier fitted on the current subset misclassifies. The goal is a consistent subset, i.e., one from which a 1-NN classifier correctly classifies every sample in the original training set. The class integrates with scikit-learn's estimator API, supporting pipeline composition, parameter validation, and multi-class resampling via a one-vs.-rest scheme.
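The procedure described above can be sketched in plain NumPy for the binary case. This is an illustrative reading of Hart's rule, not the library's code: the helper name condensed_subset, its parameters, and the synthetic two-blob data are assumptions made for this example, and it repeats full passes until nothing is added, which may differ from how imbalanced-learn iterates.

```python
import numpy as np


def condensed_subset(X, y, minority_label, n_seeds=1, seed=0):
    """Return sorted indices of a condensed subset (hypothetical helper, binary case)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Start with every minority sample plus a few random majority seeds.
    keep = np.zeros(len(y), dtype=bool)
    keep[y == minority_label] = True
    majority_idx = np.flatnonzero(y != minority_label)
    keep[rng.choice(majority_idx, size=n_seeds, replace=False)] = True

    changed = True
    while changed:  # repeat full passes until no sample is added
        changed = False
        for i in majority_idx:
            if keep[i]:
                continue
            store = np.flatnonzero(keep)
            # 1-NN prediction using only the samples currently in the store.
            dists = np.linalg.norm(X[store] - X[i], axis=1)
            nearest = store[np.argmin(dists)]
            if y[nearest] != y[i]:
                keep[i] = True  # misclassified, so the store needs this sample
                changed = True
    return np.flatnonzero(keep)


# Synthetic data (assumed for illustration): 200 majority and 20 minority points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
               rng.normal(3.0, 1.0, size=(20, 2))])
y = np.array([0] * 200 + [1] * 20)

idx = condensed_subset(X, y, minority_label=1)
print(len(idx), "of", len(y), "samples kept")
```

When the loop stops, a 1-NN classifier built on X[idx] labels every original sample correctly, which is the consistency property the rule aims for. Majority points far from the minority blob are typically the ones left out.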

Usage

Import this class when you need to reduce the size of a majority class while preserving samples near the decision boundary. Use it as a standalone resampler via fit_resample() or as a step in an imblearn.pipeline.Pipeline. CNN is most effective when the majority class contains large regions of redundant samples far from the decision boundary.

Code Reference

Source Location

Repository: imbalanced-learn
File: imblearn/under_sampling/_prototype_selection/_condensed_nearest_neighbour.py
Lines: L1-247

Signature

class CondensedNearestNeighbour(BaseCleaningSampler):
    def __init__(
        self,
        *,
        sampling_strategy="auto",
        random_state=None,
        n_neighbors=None,
        n_seeds_S=1,
        n_jobs=None,
    ):
        """
        Args:
            sampling_strategy: str, list, or callable - Classes targeted by
                the resampling. 'auto' targets all classes except the
                minority class.
            random_state: int, RandomState, or None - Seed for reproducibility
                when selecting the initial majority seed samples.
            n_neighbors: int, KNeighborsClassifier, or None - Number of
                nearest neighbors for classification. None defaults to 1-NN.
            n_seeds_S: int - Number of initial majority samples to seed the
                condensed set (default: 1).
            n_jobs: int or None - Number of parallel jobs for the nearest
                neighbor classifier.
        """

Import

from imblearn.under_sampling import CondensedNearestNeighbour

I/O Contract

Inputs

Name	Type	Required	Description
X	{array-like, sparse matrix, dataframe} of shape (n_samples, n_features)	Yes	Feature matrix of training data
y	array-like of shape (n_samples,)	Yes	Target labels indicating class membership
sampling_strategy	str, list, or callable	No	Classes targeted by the resampling; 'auto' targets all classes except the minority (default: 'auto')
n_neighbors	int, KNeighborsClassifier, or None	No	Neighbor count or nearest-neighbor estimator used for classification (default: None, i.e. 1-NN)
n_seeds_S	int	No	Number of random majority seeds to initialize the condensed set (default: 1)
random_state	int, RandomState, or None	No	Random seed for reproducibility
n_jobs	int or None	No	Number of parallel jobs for the nearest neighbor search

Outputs

Name	Type	Description
X_resampled	{ndarray, sparse matrix, dataframe} of shape (n_samples_new, n_features)	Feature matrix with redundant majority samples removed
y_resampled	ndarray of shape (n_samples_new,)	Target array with corresponding labels for the condensed subset

Attributes

Name	Type	Description
sampling_strategy_	dict	Maps each targeted class label to a sample count; CNN uses the keys to decide which classes to condense
estimators_	list of KNeighborsClassifier	One fitted nearest-neighbor estimator per resampled class
sample_indices_	ndarray of shape (n_new_samples,)	Indices of selected samples from the original dataset
n_features_in_	int	Number of features seen during fit
feature_names_in_	ndarray of shape (n_features_in_,)	Feature names seen during fit (when X has string feature names)

Usage Examples

Basic Under-sampling

from collections import Counter
from sklearn.datasets import make_classification
from imblearn.under_sampling import CondensedNearestNeighbour

# 1. Create an imbalanced dataset
X, y = make_classification(
    n_classes=2, class_sep=2, weights=[0.1, 0.9],
    n_informative=3, n_redundant=1, flip_y=0,
    n_features=20, n_clusters_per_class=1,
    n_samples=1000, random_state=10,
)
print(f"Original: {Counter(y)}")

# 2. Apply Condensed Nearest Neighbour
cnn = CondensedNearestNeighbour(random_state=42)
X_resampled, y_resampled = cnn.fit_resample(X, y)
print(f"Resampled: {Counter(y_resampled)}")

Inside a Pipeline

from imblearn.pipeline import make_pipeline
from imblearn.under_sampling import CondensedNearestNeighbour
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_validate

# Build pipeline with CNN + classifier (X and y come from the basic example above)
pipeline = make_pipeline(
    CondensedNearestNeighbour(random_state=42),
    LinearSVC(),
)

# Cross-validate (CNN applied only to training folds)
scores = cross_validate(pipeline, X, y, scoring="balanced_accuracy", cv=5)
print(f"Mean balanced accuracy: {scores['test_score'].mean():.3f}")

Custom Neighbor Count

from imblearn.under_sampling import CondensedNearestNeighbour

# Use 3-NN instead of the default 1-NN (X and y come from the basic example above)
cnn = CondensedNearestNeighbour(
    n_neighbors=3,
    n_seeds_S=1,
    random_state=42,
)
X_res, y_res = cnn.fit_resample(X, y)

# Inspect which samples were retained
print(f"Retained indices: {cnn.sample_indices_}")

Related Pages

Implements Principle

Principle:Scikit_learn_contrib_Imbalanced_learn_Condensed_Nearest_Neighbour

Requires Environment

Environment:Scikit_learn_contrib_Imbalanced_learn_Python_Scikit_learn
