Implementation:Scikit learn Scikit learn KNNImputer
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Missing Data, Imputation |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Concrete scikit-learn implementation of k-Nearest Neighbors imputation of missing values.
Description
The KNNImputer class fills each missing value with the mean (or distance-weighted mean) of that feature over the k nearest neighbors in the training set that have the feature observed. The distance between two samples is computed only over the features that are present in both. The weights parameter selects uniform or distance-based weighting, and the default metric, nan_euclidean, handles NaN values natively.
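To make the distance rule above concrete, the following sketch computes a NaN-aware Euclidean distance by hand and compares it with scikit-learn's `nan_euclidean_distances`. The helper `nan_euclidean` is a hypothetical name written for this illustration and is not part of the library; it assumes the squared distance over shared features is scaled up by the ratio of total features to shared features.

```python
import numpy as np
from sklearn.metrics.pairwise import nan_euclidean_distances


def nan_euclidean(a, b):
    """Illustrative helper (not part of scikit-learn): NaN-aware Euclidean distance."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Only coordinates observed in BOTH samples contribute to the distance.
    present = ~np.isnan(a) & ~np.isnan(b)
    if not present.any():
        return np.nan
    # Scale up to compensate for the coordinates that were skipped.
    scale = a.size / present.sum()
    return np.sqrt(scale * np.sum((a[present] - b[present]) ** 2))


x = [1.0, 2.0, np.nan]
y = [3.0, 4.0, 3.0]

manual = nan_euclidean(x, y)
library = nan_euclidean_distances([x], [y])[0, 0]
print(manual, library)
```

Here two of the three features are shared, so the squared difference 8 is scaled by 3/2 before taking the square root.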
Usage
Use KNNImputer when you want to impute missing values based on the similarity of samples, leveraging the local structure of the data. It is particularly effective when similar samples tend to have similar feature values.
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/impute/_knn.py
Signature
class KNNImputer(_BaseImputer):
    def __init__(
        self,
        *,
        missing_values=np.nan,
        n_neighbors=5,
        weights="uniform",
        metric="nan_euclidean",
        copy=True,
        add_indicator=False,
        keep_empty_features=False,
    ):
        ...

    def fit(self, X, y=None):
        ...

    def transform(self, X):
        ...
Import
from sklearn.impute import KNNImputer
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| X | array-like of shape (n_samples, n_features) | Yes | Data with missing values to impute |
| n_neighbors | int | No | Number of nearest neighbors to use (default: 5) |
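| weights | str or callable | No | Weight function: "uniform" (default), "distance", or a callable that maps an array of distances to an array of weights |
| metric | str or callable | No | Distance metric (default: "nan_euclidean") |
| missing_values | int, float, str, or np.nan | No | Placeholder for missing values (default: np.nan) |
| copy | bool | No | Whether to create a copy of the input data (default: True) |
| add_indicator | bool | No | Whether to append missing-value indicator columns to the output (default: False) |
| keep_empty_features | bool | No | Whether to keep features that were entirely missing at fit time, imputed as 0 (default: False) |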
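As a rough illustration of the weights parameter, the sketch below imputes the same small array with "uniform" and "distance" weighting. The array and variable names are chosen for this example only.

```python
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([[1, 2, np.nan], [3, 4, 3], [np.nan, 6, 5], [8, 8, 7]])

# Uniform: each of the 2 nearest neighbors contributes equally.
X_uniform = KNNImputer(n_neighbors=2, weights="uniform").fit_transform(X)

# Distance: closer neighbors contribute more (weight = 1 / distance).
X_distance = KNNImputer(n_neighbors=2, weights="distance").fit_transform(X)

# Compare the imputed value for the missing entry in row 0, column 2.
print(X_uniform[0, 2], X_distance[0, 2])
```

For row 0 the two donors hold the values 3 and 5, and the donor holding 3 is twice as close, so distance weighting pulls the estimate from the plain mean of 4 toward 3.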
Outputs
| Name | Type | Description |
|---|---|---|
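| X_imputed | ndarray of shape (n_samples, n_output_features) | Data with missing values imputed using KNN; the number of columns differs from the input when add_indicator=True appends indicator columns or when features that were entirely missing at fit time are dropped |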
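The output can be wider than the input when add_indicator=True. A minimal sketch, assuming the default indicator behavior of adding one column per feature that had missing values at fit time:

```python
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([[1, 2, np.nan], [3, 4, 3], [np.nan, 6, 5], [8, 8, 7]])

imputer = KNNImputer(n_neighbors=2, add_indicator=True)
X_out = imputer.fit_transform(X)

# Columns 0-2 hold the imputed data; the trailing columns flag which
# entries were originally missing (features 0 and 2 in this array).
print(X.shape, X_out.shape)
print(X_out[:, 3:])
```

The indicator columns let a downstream model distinguish observed values from imputed ones.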
Usage Examples
Basic Usage
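import numpy as np
from sklearn.impute import KNNImputer

# Two entries are missing: row 0, column 2 and row 2, column 0.
X = np.array([[1, 2, np.nan], [3, 4, 3], [np.nan, 6, 5], [8, 8, 7]])

# Each missing entry is replaced by the mean of that feature over the 2 nearest rows.
imputer = KNNImputer(n_neighbors=2, weights="uniform")
X_imputed = imputer.fit_transform(X)
print(X_imputed)  # the missing entries become 4.0 and 5.5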
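Because neighbors are drawn from the training set, an imputer can also be fitted once and then applied to unseen samples. The sketch below shows this with separate fit and transform calls; the names X_train and X_new are illustrative.

```python
import numpy as np
from sklearn.impute import KNNImputer

X_train = np.array([[1, 2, np.nan], [3, 4, 3], [np.nan, 6, 5], [8, 8, 7]])
X_new = np.array([[np.nan, 4, 4]])

# fit() stores the training samples that later act as neighbor donors.
imputer = KNNImputer(n_neighbors=2).fit(X_train)

# transform() fills the gap in X_new using only rows from X_train.
X_new_imputed = imputer.transform(X_new)
print(X_new_imputed)
```

The missing first feature is filled from the two closest training rows that have that feature observed, here rows 0 and 1 with values 1 and 3.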