Principle:Scikit learn Scikit learn Nearest Neighbors
| Knowledge Sources | |
|---|---|
| Domains | Supervised Learning, Instance-Based Learning |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Nearest neighbor methods classify or predict based on the labels or values of the closest training examples in the feature space, making predictions without learning an explicit model.
Description
Nearest neighbor algorithms are non-parametric, instance-based methods that store the entire training dataset and make predictions by finding the most similar training examples to a query point. They solve the problem of learning complex, non-linear decision boundaries without making assumptions about the underlying data distribution. For classification, the predicted label is determined by majority voting among the neighbors; for regression, the prediction is the average (or weighted average) of neighbor values. These methods are conceptually simple but surprisingly effective, and they also underpin neighbor graph construction used in manifold learning and clustering.
Usage
Use K-Nearest Neighbors (KNN) classification when decision boundaries are expected to be complex and non-linear, when the dataset is not extremely large, and when interpretability via local examples is valued. Use KNN regression for non-parametric function approximation where local smoothness is a reasonable assumption. Use nearest neighbor graphs for constructing connectivity structures needed by spectral clustering, manifold learning, or graph-based semi-supervised methods. Use the Nearest Centroid classifier for a fast, simple baseline that classifies based on the nearest class centroid. Choose the value of K carefully: small K captures local structure but is sensitive to noise; large K provides smoother boundaries but may oversmooth.
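A minimal sketch of both uses with scikit-learn's `KNeighborsClassifier` and `KNeighborsRegressor`; the toy 1-D data and `n_neighbors=3` are illustrative choices:

```python
# Sketch: KNN classification (majority vote) and regression (neighbor average).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X = np.array([[0.0], [1.0], [2.0], [9.0], [10.0], [11.0]])
y_class = np.array([0, 0, 0, 1, 1, 1])
y_reg = np.array([0.1, 0.9, 2.1, 9.2, 9.8, 11.0])

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y_class)
reg = KNeighborsRegressor(n_neighbors=3).fit(X, y_reg)

print(clf.predict([[1.5]]))   # 3 nearest points all have label 0 -> predicts 0
print(reg.predict([[10.0]]))  # mean of the 3 closest targets (9.2, 9.8, 11.0) -> 10.0
```

Passing `weights="distance"` to either estimator switches from uniform voting/averaging to the distance-weighted scheme described below in Theoretical Basis.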
Theoretical Basis
K-Nearest Neighbors (KNN) prediction for a query point $x$:
- Find the set $N_K(x)$ of the $K$ training points closest to $x$ according to a distance metric $d(x, x_i)$.
- For classification, predict by majority vote: $\hat{y} = \arg\max_{c} \sum_{x_i \in N_K(x)} \mathbb{1}[y_i = c]$
- For regression, predict by averaging: $\hat{y} = \frac{1}{K} \sum_{x_i \in N_K(x)} y_i$
Weighted KNN assigns weights inversely proportional to distance:
$$w_i = \frac{1}{d(x, x_i)}, \qquad \hat{y} = \frac{\sum_{x_i \in N_K(x)} w_i \, y_i}{\sum_{x_i \in N_K(x)} w_i}$$
giving closer neighbors more influence on the prediction.
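The uniform and distance-weighted predictions can be sketched directly in NumPy for a single query point; the toy data, query location, and $K=2$ are arbitrary illustrative choices:

```python
# Sketch: uniform vs distance-weighted KNN regression for one query point.
# Toy 1-D data; K and the query location are arbitrary.
import numpy as np

X = np.array([[0.0], [1.0], [3.0], [6.0]])
y = np.array([0.0, 1.0, 3.0, 6.0])
x_query = np.array([0.4])
K = 2

d = np.linalg.norm(X - x_query, axis=1)      # distances d(x, x_i)
idx = np.argsort(d)[:K]                      # indices of the K nearest points

y_uniform = y[idx].mean()                    # plain average of neighbor targets
w = 1.0 / d[idx]                             # w_i = 1 / d(x, x_i)
y_weighted = np.sum(w * y[idx]) / np.sum(w)  # distance-weighted average

print(y_uniform, y_weighted)                 # 0.5 vs 0.4: weighting pulls toward the closer point
```

Note that the inverse-distance weights are undefined when a neighbor coincides exactly with the query; scikit-learn's `weights="distance"` option handles that edge case internally.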
Nearest Centroid Classifier computes the centroid $\mu_c = \frac{1}{|S_c|} \sum_{x_i \in S_c} x_i$ for each class $c$ and assigns a query $x$ to the class with the nearest centroid: $\hat{y} = \arg\min_{c} d(x, \mu_c)$.
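A minimal sketch of this rule using scikit-learn's `NearestCentroid`; the two well-separated toy clusters are illustrative:

```python
# Sketch: NearestCentroid fits one mean vector per class and assigns a
# query to the class whose centroid is closest. Toy data is illustrative.
import numpy as np
from sklearn.neighbors import NearestCentroid

X = np.array([[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = NearestCentroid().fit(X, y)
print(clf.centroids_)         # per-class means, roughly (0.33, 0.33) and (10.33, 10.33)
print(clf.predict([[2, 2]]))  # closer to the class-0 centroid -> class 0
```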
Nearest Neighbor Graph connects each point to its $k$ nearest neighbors, producing a directed graph (or an undirected graph if using mutual nearest neighbors). This graph structure is used by:
- Spectral clustering (via the graph Laplacian)
- Manifold learning algorithms (Isomap, LLE)
- Semi-supervised learning (label propagation)
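Such a connectivity graph can be built with scikit-learn's `kneighbors_graph`; the tiny 1-D dataset and `n_neighbors=2` below are arbitrary choices for illustration:

```python
# Sketch: building a k-nearest-neighbor connectivity graph, of the kind
# consumed by spectral clustering or manifold learning.
import numpy as np
from sklearn.neighbors import kneighbors_graph

X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])

# Sparse adjacency matrix: A[i, j] = 1 if x_j is among the 2 nearest neighbors of x_i.
A = kneighbors_graph(X, n_neighbors=2, mode="connectivity")
print(A.toarray())

# The k-NN relation is not symmetric, so the graph is directed in general;
# symmetrizing yields an undirected graph.
A_sym = 0.5 * (A + A.T)
```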
Efficient search is critical for scalability. Common data structures include:
- KD-tree: Partitions space using axis-aligned hyperplanes, with $O(n \log n)$ construction and $O(\log n)$ average query time (degrades in high dimensions).
- Ball tree: Partitions space using hyperspheres, more efficient than KD-trees in moderate-to-high dimensions.
- Brute force: Computes all pairwise distances, with $O(n \cdot d)$ query time; preferred when $n$ or $d$ is small, or when exact results are needed.
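In scikit-learn these search structures are selected through the `algorithm` parameter of the neighbor estimators (`"auto"` picks one from the data). A sketch on random data, which is illustrative only, shows that all three exact methods return the same neighbors:

```python
# Sketch: the same exact neighbor query answered by three search structures.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # arbitrary random data for illustration

results = {}
for algo in ("brute", "kd_tree", "ball_tree"):
    nn = NearestNeighbors(n_neighbors=5, algorithm=algo).fit(X)
    _, idx = nn.kneighbors(X[:1])  # query with the first training point
    results[algo] = idx[0]

# All three are exact methods, so they agree on the neighbor indices;
# only their construction and query costs differ.
print(results["brute"])
```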
The curse of dimensionality fundamentally limits nearest neighbor methods: as the number of dimensions grows, distances between points become increasingly uniform, reducing the discriminative power of proximity.
Related Pages
- Implementation:Scikit_learn_Scikit_learn_KNeighborsClassifier
- Implementation:Scikit_learn_Scikit_learn_KNeighborsRegressor
- Implementation:Scikit_learn_Scikit_learn_KNeighborsGraph
- Implementation:Scikit_learn_Scikit_learn_NearestCentroid
- Implementation:Scikit_learn_Scikit_learn_NeighborsBase
- Implementation:Scikit_learn_Scikit_learn_NearestNeighbors