Implementation:Scikit learn Scikit learn DBSCAN
| Knowledge Sources | |
|---|---|
| Domains | Clustering, Density-Based Clustering |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Concrete tool for performing density-based spatial clustering of applications with noise (DBSCAN) provided by scikit-learn.
Description
DBSCAN is a density-based clustering algorithm that groups closely packed points (high-density regions) and marks points in low-density regions as outliers (noise). It identifies core samples, which have at least min_samples samples (counting the sample itself) within a radius of eps, and then expands clusters outward from those core samples. Unlike K-Means, DBSCAN does not require the number of clusters to be specified in advance and can discover clusters of arbitrary shape.
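A minimal sketch of the core, border, and noise distinction described above. The toy data is an illustrative assumption: four points one unit apart on a line, plus one distant point. With eps=1.1 and min_samples=3, only the two interior points have three samples in their neighborhood, so only they are core samples; the two end points join the cluster as border points, and the distant point is labeled noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy data (illustrative assumption): four points on a line plus one far away.
X = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])

db = DBSCAN(eps=1.1, min_samples=3).fit(X)

# Boolean mask marking which samples are core samples.
is_core = np.zeros(len(X), dtype=bool)
is_core[db.core_sample_indices_] = True

is_noise = db.labels_ == -1
is_border = ~is_core & ~is_noise  # clustered, but not dense enough to be core

print("labels:", db.labels_)
print("core:  ", np.flatnonzero(is_core))
print("border:", np.flatnonzero(is_border))
print("noise: ", np.flatnonzero(is_noise))
```

Border points are not reported as a separate attribute; here they are derived as samples that carry a cluster label but do not appear in core_sample_indices_.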
Usage
Use DBSCAN when you expect clusters of similar density and arbitrary shape, when you need to identify outliers/noise, or when the number of clusters is unknown. It is particularly effective for spatial data and for clusters that are not convex. Avoid it when clusters have very different densities, because a single global eps and min_samples setting cannot fit all of them.
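To illustrate the non-convex case, the sketch below builds two noise-free concentric rings (synthetic data chosen for illustration, not taken from the source) and clusters them. The eps value is an assumption picked to be larger than the spacing between adjacent points on each ring but smaller than the gap between the rings.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic data (illustrative assumption): two noise-free concentric rings.
angles = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
unit_circle = np.column_stack([np.cos(angles), np.sin(angles)])
inner = 1.0 * unit_circle  # ring of radius 1
outer = 3.0 * unit_circle  # ring of radius 3
X = np.vstack([inner, outer])

# eps is smaller than the gap between the rings (2.0) but larger than the
# spacing between adjacent points on either ring.
labels = DBSCAN(eps=0.6, min_samples=5).fit_predict(X)

# Count clusters, ignoring the noise label -1 if it is present.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters)
print("inner ring labels:", np.unique(labels[:100]))
print("outer ring labels:", np.unique(labels[100:]))
```

Each ring ends up as its own cluster even though the outer ring encloses the inner one, which is the kind of shape a centroid-based method cannot separate.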
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/cluster/_dbscan.py
Signature
class DBSCAN(ClusterMixin, BaseEstimator):
    def __init__(
        self,
        eps=0.5,
        *,
        min_samples=5,
        metric="euclidean",
        metric_params=None,
        algorithm="auto",
        leaf_size=30,
        p=None,
        n_jobs=None,
    ):
Import
from sklearn.cluster import DBSCAN
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| eps | float | No | Maximum distance between two samples for neighborhood membership. Default is 0.5. |
| min_samples | int | No | Minimum number of samples in a neighborhood for a core point (includes the point itself). Default is 5. |
| metric | str or callable | No | Distance metric to use, or "precomputed" if X is already a square distance matrix. Default is "euclidean". |
| metric_params | dict or None | No | Additional keyword arguments for the metric function. Default is None. |
| algorithm | str | No | Algorithm for nearest neighbor computation: "auto", "ball_tree", "kd_tree", or "brute". Default is "auto". |
| leaf_size | int | No | Leaf size for BallTree or KDTree. Default is 30. |
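| p | float or None | No | Power of the Minkowski metric; only used when metric is "minkowski". Default is None (equivalent to p=2, Euclidean). |
| n_jobs | int or None | No | Number of parallel jobs to run for the neighbor search. Default is None (1 job unless in a joblib.parallel_backend context); -1 uses all processors. |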
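Because min_samples counts the point itself, a pair of nearby samples already satisfies min_samples=2. The following sketch, using two hypothetical samples one unit apart, shows how raising min_samples by one turns the same pair into noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two samples one unit apart (illustrative data).
X = np.array([[0.0, 0.0], [1.0, 0.0]])

# Each sample's eps-neighborhood holds 2 samples: itself and the other one.
labels_2 = DBSCAN(eps=1.5, min_samples=2).fit(X).labels_
labels_3 = DBSCAN(eps=1.5, min_samples=3).fit(X).labels_

print(labels_2)  # both samples are core samples and share one cluster
print(labels_3)  # neither neighborhood reaches 3 samples, so both are noise
```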
Outputs
| Name | Type | Description |
|---|---|---|
| core_sample_indices_ | ndarray of shape (n_core_samples,) | Indices of core samples. |
| components_ | ndarray of shape (n_core_samples, n_features) | Copy of each core sample found by training. |
| labels_ | ndarray of shape (n_samples,) | Cluster labels for each sample. Noisy samples are given the label -1. |
| n_features_in_ | int | Number of features seen during fit. |
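The fitted attributes in the table can be inspected as in the sketch below. The variable names (db, n_clusters, n_noise, same_rows) are illustrative, and the cluster count is derived from labels_ because the estimator does not store it directly.

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1, 2], [2, 2], [2, 3],
              [8, 7], [8, 8], [25, 80]])
db = DBSCAN(eps=3, min_samples=2).fit(X)

labels = db.labels_
# Number of clusters, ignoring the noise label -1 if it is present.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))

print("clusters:", n_clusters)
print("noise samples:", n_noise)
print("core sample indices:", db.core_sample_indices_)
print("components_ shape:", db.components_.shape)
print("n_features_in_:", db.n_features_in_)

# components_ holds the rows of X selected by core_sample_indices_.
same_rows = np.array_equal(db.components_, X[db.core_sample_indices_])
print("components_ matches X[core_sample_indices_]:", same_rows)
```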
Usage Examples
Basic Usage
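import numpy as np
from sklearn.cluster import DBSCAN

# Two tight groups of points plus one distant outlier.
X = np.array([[1, 2], [2, 2], [2, 3],
              [8, 7], [8, 8], [25, 80]])

# A sample is a core sample if at least 2 samples (itself included) lie within distance 3.
clustering = DBSCAN(eps=3, min_samples=2).fit(X)
print(clustering.labels_)  # [ 0  0  0  1  1 -1]; -1 marks the outlier [25, 80]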