Implementation:Scikit learn Scikit learn NearestCentroid
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Classification |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Concrete tool for nearest centroid classification provided by scikit-learn.
Description
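NearestCentroid is a simple classification algorithm in which each class is represented by its centroid. For the Euclidean metric the centroid is the per-feature mean of the class's training samples, and for the Manhattan metric it is the per-feature median. A test sample is assigned to the class whose centroid is nearest. The estimator supports Euclidean and Manhattan distances, configurable class priors, and optional centroid shrinkage (the nearest shrunken centroid method), which can zero out features that do not discriminate between classes. Fitting only computes per-class statistics, so the classifier is cheap to train. Apart from choosing the metric and the priors, the only hyperparameter worth tuning is the optional shrink_threshold.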
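To make the description concrete, here is a minimal from-scratch sketch of the unshrunken Euclidean case. It assumes NumPy and the iris dataset. The helper names `fit_centroids` and `predict_nearest` are hypothetical and not part of scikit-learn. The sketch compares its centroids and predictions against `NearestCentroid` with default settings.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import NearestCentroid


def fit_centroids(X, y):
    """Hypothetical helper: return sorted class labels and per-class mean vectors."""
    classes = np.unique(y)
    centroids = np.vstack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_nearest(X, classes, centroids):
    """Hypothetical helper: assign each row of X to the class with the closest centroid."""
    # dists[i, k] = Euclidean distance between X[i] and centroids[k] (via broadcasting)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]


X, y = load_iris(return_X_y=True)
classes, centroids = fit_centroids(X, y)
manual_pred = predict_nearest(X, classes, centroids)

# Compare against the library estimator (Euclidean metric, no shrinkage, uniform priors)
clf = NearestCentroid().fit(X, y)
print(np.allclose(centroids, clf.centroids_))
print(np.array_equal(manual_pred, clf.predict(X)))
```

The broadcasting approach builds an `(n_samples, n_classes, n_features)` array, which is fine for small data. The library uses pairwise-distance routines instead, which scale better.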
Usage
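Use NearestCentroid when you need a fast, simple baseline classifier that requires little tuning. It works well when classes are well separated and roughly convex. It performs poorly on non-convex classes and when classes have very different variances, because it assumes equal variance in every dimension. It is also a common, inexpensive baseline for high-dimensional text classification with TF-IDF features, where it is known as the Rocchio classifier.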
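The Rocchio-style text setup can be sketched by chaining `TfidfVectorizer` and `NearestCentroid` in a pipeline. The six-document corpus and its labels below are made up purely for illustration. The two topics use disjoint vocabularies, so each test document shares terms only with its own class's centroid.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestCentroid
from sklearn.pipeline import make_pipeline

# Toy corpus (illustrative only): two topics with non-overlapping vocabularies
train_docs = [
    "cat kitten purr",
    "kitten cat meow",
    "cat purr meow",
    "stock market bond",
    "market bond yield",
    "stock yield bond",
]
train_labels = ["pets", "pets", "pets", "finance", "finance", "finance"]

# TfidfVectorizer produces a sparse, L2-normalised matrix; NearestCentroid accepts
# sparse input as long as shrink_threshold is left at None.
model = make_pipeline(TfidfVectorizer(), NearestCentroid())
model.fit(train_docs, train_labels)

pred = model.predict(["cat meow", "bond market"])
print(pred)
```

On a real corpus you would typically tune the vectorizer (n-grams, stop words, `min_df`) rather than the classifier itself.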
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/neighbors/_nearest_centroid.py
Signature
```python
class NearestCentroid(
    DiscriminantAnalysisPredictionMixin, ClassifierMixin, BaseEstimator
):
    def __init__(
        self,
        metric="euclidean",
        *,
        shrink_threshold=None,
        priors="uniform",
    ):
```
Import
```python
from sklearn.neighbors import NearestCentroid
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
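| metric | str | No | Distance metric: 'euclidean' or 'manhattan' (default='euclidean'). Also determines whether centroids are class means ('euclidean') or class medians ('manhattan') |
| shrink_threshold | float or None | No | Positive threshold for soft-thresholding each centroid's standardized deviation from the overall centroid; a feature whose deviation shrinks to zero for every class no longer affects classification (default=None, no shrinkage). Not supported with sparse input |
| priors | str or array-like | No | Class prior probabilities: 'uniform', 'empirical' (estimated from class frequencies in y), or array-like of shape (n_classes,) (default='uniform') |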
Outputs
| Name | Type | Description |
|---|---|---|
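| centroids_ | ndarray of shape (n_classes, n_features) | Centroid of each class: the per-class mean ('euclidean') or median ('manhattan'), shrunk toward the overall mean when shrink_threshold is set |
| classes_ | ndarray of shape (n_classes,) | The unique class labels |
| class_prior_ | ndarray of shape (n_classes,) | Class prior probabilities (scikit-learn >= 1.5) |
| deviations_ | ndarray of shape (n_classes, n_features) | Standardized deviations of each class centroid from the overall centroid, after shrinkage when shrink_threshold is set (scikit-learn >= 1.5) |
| within_class_std_dev_ | ndarray of shape (n_features,) | Pooled within-class standard deviation of the training data (scikit-learn >= 1.5) |
| n_features_in_ | int | Number of features seen during fit |
| feature_names_in_ | ndarray of shape (n_features_in_,) | Names of features seen during fit; defined only when X has feature names that are all strings |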
Usage Examples
Basic Usage
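```python
from sklearn.neighbors import NearestCentroid
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load iris and hold out 25% of the samples (the default test_size) for testing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fit one centroid per class using Euclidean distance (the default metric)
clf = NearestCentroid(metric="euclidean")
clf.fit(X_train, y_train)

# Mean accuracy on the held-out split; centroids_ has shape (n_classes, n_features)
print(f"Accuracy: {clf.score(X_test, y_test):.3f}")
print(f"Centroids shape: {clf.centroids_.shape}")
```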
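Because shrink_threshold (together with the metric) is the main knob, a cross-validated search is a natural next step after the basic example. This is a sketch using `GridSearchCV`. The candidate values in `param_grid` are illustrative assumptions, not recommended defaults, and `None` disables shrinkage.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import NearestCentroid

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Illustrative candidate values; None means "no shrinkage"
param_grid = {
    "metric": ["euclidean", "manhattan"],
    "shrink_threshold": [None, 0.1, 0.5, 1.0],
}

# 5-fold cross-validation on the training split only
search = GridSearchCV(NearestCentroid(), param_grid, cv=5)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print(f"test accuracy: {search.score(X_test, y_test):.3f}")
```

Keeping the test split out of the search gives an honest estimate of how the selected configuration generalizes.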