Implementation:Scikit learn Scikit learn KernelDensity
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Density Estimation |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Concrete tool for kernel density estimation provided by scikit-learn.
Description
KernelDensity implements kernel density estimation (KDE), a non-parametric method for estimating the probability density function of a random variable. It supports multiple kernel functions (gaussian, tophat, epanechnikov, exponential, linear, cosine) and tree-based algorithms (ball_tree, kd_tree) for efficient computation. The bandwidth parameter controls the smoothness of the resulting density estimate.
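To make the role of the kernel and the bandwidth concrete, the sketch below evaluates the textbook Gaussian KDE formula by brute force and compares it with `score_samples`. The helper `manual_gaussian_log_density` is hypothetical and written only for illustration; it is not how scikit-learn computes the result internally (the library uses the tree structures mentioned above).

```python
import numpy as np
from sklearn.neighbors import KernelDensity


def manual_gaussian_log_density(X_train, X_query, h):
    """Brute-force Gaussian KDE: average of Gaussian bumps of width h, in log space."""
    n, d = X_train.shape
    # Squared Euclidean distance between every query point and every training point.
    diff = X_query[:, None, :] - X_train[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Log of each Gaussian kernel value, including its normalisation constant.
    log_kernel = -0.5 * sq_dist / h**2 - 0.5 * d * np.log(2 * np.pi * h**2)
    # Average over training points using a stable log-sum-exp.
    m = log_kernel.max(axis=1, keepdims=True)
    log_sum = m + np.log(np.exp(log_kernel - m).sum(axis=1, keepdims=True))
    return log_sum.ravel() - np.log(n)


rng = np.random.RandomState(0)
X = rng.randn(50, 2)
h = 0.5

kde = KernelDensity(kernel="gaussian", bandwidth=h).fit(X)
log_dens_sklearn = kde.score_samples(X[:5])
log_dens_manual = manual_gaussian_log_density(X, X[:5], h)
```

With the default tolerances (`atol=0`, `rtol=0`) the two arrays should agree up to floating-point error. A smaller `h` makes each bump narrower and the estimate spikier; a larger `h` smooths it out.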
Usage
Use KernelDensity when you need to estimate the underlying probability distribution of data without assuming a specific parametric form. It is commonly used for density visualization, anomaly detection, and generating synthetic samples.
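As one illustration of the anomaly-detection use case, the sketch below flags points whose log-density falls below a cutoff derived from the training data. The 1st-percentile cutoff and the variable names are assumptions chosen for this example, not a rule prescribed by scikit-learn.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.RandomState(0)
X_train = rng.randn(500, 2)        # "normal" data clustered around the origin
X_new = np.array([[0.1, -0.2],     # a typical point
                  [8.0, 8.0]])     # a point far from all training data

kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(X_train)

# Assumed rule: anything less likely than the lowest 1% of training points is anomalous.
threshold = np.percentile(kde.score_samples(X_train), 1)
is_anomaly = kde.score_samples(X_new) < threshold
```

In practice the cutoff would be tuned to the tolerated false-alarm rate, and the bandwidth would be chosen by validation rather than fixed by hand.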
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/neighbors/_kde.py
Signature
class KernelDensity(BaseEstimator):
    def __init__(
        self,
        *,
        bandwidth=1.0,
        algorithm="auto",
        kernel="gaussian",
        metric="euclidean",
        atol=0,
        rtol=0,
        breadth_first=True,
        leaf_size=40,
        metric_params=None,
    ):
Import
from sklearn.neighbors import KernelDensity
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| bandwidth | float or str | No | Bandwidth of the kernel; can be float or 'scott'/'silverman' (default=1.0) |
| algorithm | str | No | Tree algorithm to use: 'kd_tree', 'ball_tree', or 'auto' (default='auto') |
| kernel | str | No | Kernel function: 'gaussian', 'tophat', 'epanechnikov', 'exponential', 'linear', 'cosine' (default='gaussian') |
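| metric | str | No | Distance metric; valid choices depend on the tree algorithm, and the density normalization is correct only for the Euclidean metric (default='euclidean') |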
| atol | float | No | Desired absolute tolerance of the result (default=0) |
| rtol | float | No | Desired relative tolerance of the result (default=0) |
| breadth_first | bool | No | Whether to use breadth-first or depth-first tree traversal (default=True) |
| leaf_size | int | No | Leaf size for the tree (default=40) |
| metric_params | dict or None | No | Additional parameters passed to the tree for use with the distance metric (default=None) |
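Because `bandwidth` has the largest effect on the estimate, it is often chosen by cross-validation rather than by hand. The sketch below is one way to do that with `GridSearchCV`, which works here because `KernelDensity.score` returns the total log-likelihood of held-out data. The candidate grid and the 5-fold split are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

rng = np.random.RandomState(0)
X = rng.randn(200, 1)

# Candidate bandwidths (an arbitrary illustrative grid from 0.1 to 10).
param_grid = {"bandwidth": np.logspace(-1, 1, 20)}

# Each candidate is scored by the held-out log-likelihood across 5 folds.
search = GridSearchCV(KernelDensity(kernel="gaussian"), param_grid, cv=5)
search.fit(X)

best_bandwidth = search.best_params_["bandwidth"]
best_kde = search.best_estimator_  # refit on all of X with the best bandwidth
```

The same pattern extends to other parameters, such as `kernel`, by adding them to `param_grid`.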
Outputs
| Name | Type | Description |
|---|---|---|
| score_samples(X) | ndarray of shape (n_samples,) | Log-density of each sample in X under the fitted model; exponentiate to obtain density values |
| sample(n_samples) | ndarray of shape (n_samples, n_features) | Randomly generated samples from the fitted density; implemented only for the 'gaussian' and 'tophat' kernels |
| n_features_in_ | int | Number of features seen during fit |
| tree_ | BallTree or KDTree | The fitted tree object used for queries |
| bandwidth_ | float | The actual bandwidth value used (after estimation if string was provided) |
Usage Examples
Basic Usage
from sklearn.neighbors import KernelDensity
import numpy as np
# Generate sample data
rng = np.random.RandomState(42)
X = rng.randn(100, 2)
# Fit kernel density estimator
kde = KernelDensity(kernel="gaussian", bandwidth=0.5)
kde.fit(X)
# Score new samples
scores = kde.score_samples(X[:5])
print(scores)
# Generate new samples
samples = kde.sample(10, random_state=42)
print(samples.shape)