Implementation:Scikit learn Scikit learn DBSCAN

Knowledge Sources	Scikit_learn Scikit-learn Docs
Domains	Clustering, Density-Based Clustering
Last Updated	2026-02-08 15:00 GMT

Overview

Concrete tool, provided by scikit-learn, for performing Density-Based Spatial Clustering of Applications with Noise (DBSCAN).

Description

DBSCAN is a density-based clustering algorithm that groups points in closely packed (high-density) regions while marking points in low-density regions as outliers (noise). It first identifies core samples, which are points with at least min_samples points (counting the point itself) within distance eps, and then expands each cluster outward from those core samples. A non-core point that lies within eps of a core sample joins that cluster as a border point; every remaining point is labeled noise. Unlike K-Means, DBSCAN does not require the number of clusters to be specified in advance and can discover clusters of arbitrary shape.
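The core-sample rule described above can be checked by hand on a tiny dataset. The following is a minimal sketch, not scikit-learn's internal implementation: it counts neighbors with a brute-force NumPy distance matrix and compares the result with the core_sample_indices_ attribute. The names dist, neighbor_counts, and manual_core are illustrative only.

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1, 2], [2, 2], [2, 3],
              [8, 7], [8, 8], [25, 80]], dtype=float)
eps, min_samples = 3.0, 2

# Brute-force pairwise Euclidean distances (fine for a handful of points).
diff = X[:, None, :] - X[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Each row counts the points within eps, including the point itself (distance 0).
neighbor_counts = (dist <= eps).sum(axis=1)
manual_core = np.flatnonzero(neighbor_counts >= min_samples)

model = DBSCAN(eps=eps, min_samples=min_samples).fit(X)
print("manual core samples: ", manual_core)
print("sklearn core samples:", model.core_sample_indices_)
```

On this data the last point has no other point within eps, so it is the only one that is not a core sample.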

Usage

Use DBSCAN when you expect clusters of similar density and arbitrary shape, when you need to identify outliers/noise, or when the number of clusters is unknown. It is particularly effective for spatial data and for clusters that are not convex. Avoid it when clusters have very different densities, because a single eps value cannot fit all of them.
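As an illustration of the non-convex case, the sketch below clusters the two interleaved half-circles produced by sklearn.datasets.make_moons. The eps and min_samples values are assumptions picked for the scale of this synthetic data, not recommended defaults.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaved half-circles: non-convex clusters of similar density.
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

# Illustrative parameters for this data scale (assumed, not tuned).
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Count clusters, ignoring the noise label -1.
n_clusters = len(set(labels.tolist()) - {-1})
n_noise = int((labels == -1).sum())
print("clusters found:", n_clusters)
print("noise points:", n_noise)
```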

Code Reference

Source Location

Repository: scikit-learn
File: sklearn/cluster/_dbscan.py

Signature

class DBSCAN(ClusterMixin, BaseEstimator):
    def __init__(
        self,
        eps=0.5,
        *,
        min_samples=5,
        metric="euclidean",
        metric_params=None,
        algorithm="auto",
        leaf_size=30,
        p=None,
        n_jobs=None,
    ):

Import

from sklearn.cluster import DBSCAN

I/O Contract

Inputs

Name	Type	Required	Description
eps	float	No	Maximum distance between two samples for neighborhood membership. Default is 0.5.
min_samples	int	No	Minimum number of samples in a neighborhood for a core point (includes the point itself). Default is 5.
metric	str or callable	No	Distance metric to use. Default is "euclidean".
metric_params	dict or None	No	Additional keyword arguments for the metric function. Default is None.
algorithm	str	No	Algorithm for nearest neighbor computation: "auto", "ball_tree", "kd_tree", or "brute". Default is "auto".
leaf_size	int	No	Leaf size for BallTree or KDTree. Default is 30.
p	float or None	No	Power of the Minkowski metric. Default is None (uses p=2, Euclidean).
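n_jobs	int or None	No	Number of parallel jobs for the neighbor search. Default is None (one job unless inside a joblib.parallel_backend context); -1 uses all processors.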
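The two parameters that matter most in practice are eps and min_samples. The sketch below refits the same six points under three settings, chosen only to show how each parameter changes the result; the settings list and results dictionary are illustrative names.

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1, 2], [2, 2], [2, 3],
              [8, 7], [8, 8], [25, 80]])

# (eps, min_samples) pairs chosen only for illustration.
settings = [(3, 2), (3, 3), (0.5, 2)]
results = {}
for eps, min_samples in settings:
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    results[(eps, min_samples)] = labels.tolist()
    print(f"eps={eps}, min_samples={min_samples}: {labels.tolist()}")
```

Raising min_samples to 3 turns the two-point group into noise, because neither point has three points (itself included) within eps. Shrinking eps to 0.5 leaves every point alone in its neighborhood, so all six are labeled -1.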

Outputs

Name	Type	Description
core_sample_indices_	ndarray of shape (n_core_samples,)	Indices of core samples.
components_	ndarray of shape (n_core_samples, n_features)	Copy of each core sample found by training.
labels_	ndarray of shape (n_samples,)	Cluster labels for each sample. Noisy samples are given the label -1.
n_features_in_	int	Number of features seen during fit.
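The fitted attributes above can be combined to separate core, border, and noise samples. This is a small sketch under the assumption that boolean masks are a convenient representation; core_mask, border_mask, and noise_mask are illustrative names, not scikit-learn attributes.

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1, 2], [2, 2], [2, 3],
              [8, 7], [8, 8], [25, 80]])
model = DBSCAN(eps=3, min_samples=2).fit(X)

labels = model.labels_
n_clusters = len(set(labels.tolist()) - {-1})  # -1 is noise, not a cluster
noise_mask = labels == -1

# Mark core samples by index, then derive border samples:
# points assigned to a cluster that are not core samples themselves.
core_mask = np.zeros(len(X), dtype=bool)
core_mask[model.core_sample_indices_] = True
border_mask = ~core_mask & ~noise_mask

print("clusters:", n_clusters)
print("core:", int(core_mask.sum()),
      "border:", int(border_mask.sum()),
      "noise:", int(noise_mask.sum()))
print("components_ shape:", model.components_.shape)
print("n_features_in_:", model.n_features_in_)
```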

Usage Examples

Basic Usage

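import numpy as np
from sklearn.cluster import DBSCAN

# Six 2-D points: two tight groups and one distant outlier.
X = np.array([[1, 2], [2, 2], [2, 3],
              [8, 7], [8, 8], [25, 80]])

# Neighborhood radius of 3; a core sample needs at least 2 points (itself included).
clustering = DBSCAN(eps=3, min_samples=2).fit(X)
print(clustering.labels_)  # [ 0  0  0  1  1 -1]; -1 marks the outlier as noise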

Related Pages

Principle:Scikit_learn_Scikit_learn_Clustering
