Implementation:Scikit learn Scikit learn PairwiseMetrics
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Distance Computation |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Concrete tool, provided by scikit-learn, for computing pairwise distances and kernel values between sets of samples.
Description
The pairwise metrics module provides efficient implementations for computing distances and kernel functions between pairs of samples. It includes distance functions (Euclidean, Manhattan, cosine, Haversine, NaN-aware Euclidean), kernel functions (linear, polynomial, RBF, sigmoid, Laplacian, chi-squared, cosine similarity), and utility functions for chunked distance computation. Most functions accept both dense arrays and sparse matrices, and the general-purpose functions can spread the computation across parallel jobs through the n_jobs parameter.
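As a minimal sketch of the dense and sparse input support described above, the following compares `euclidean_distances` on a dense array and on the same data stored as a SciPy CSR matrix. The toy data are made up for illustration, and sparse support varies by function, so this is not a guarantee for every function in the module.

```python
import numpy as np
from scipy import sparse
from sklearn.metrics.pairwise import euclidean_distances

# Toy data (illustrative only): mostly-zero rows, as is typical for sparse inputs.
X_dense = np.array([[0.0, 0.0, 3.0],
                    [4.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0]])
X_sparse = sparse.csr_matrix(X_dense)

# With Y omitted, distances are computed between the rows of X itself.
D_dense = euclidean_distances(X_dense)
D_sparse = euclidean_distances(X_sparse)

# The result is a dense ndarray in both cases.
print(type(D_sparse).__name__, D_sparse.shape)
print("dense and sparse inputs agree:", np.allclose(D_dense, D_sparse))
```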
Usage
Use this module to compute distance matrices for nearest-neighbor algorithms or kernel matrices for kernel-based methods (SVM, kernel PCA). For large datasets, the chunked functions bound memory usage by producing the distance matrix in pieces rather than all at once.
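The chunked processing mentioned above can be sketched as follows. The example finds each sample's nearest other sample without holding the full distance matrix at once. The helper name `nearest_other`, the random data, and the `working_memory=1` setting are assumptions chosen for illustration; the dataset is small enough that the full matrix is also computed as a cross-check.

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_distances, pairwise_distances_chunked

rng = np.random.RandomState(0)
X = rng.rand(500, 8)  # synthetic data, illustrative only

def nearest_other(D_chunk, start):
    """Reduce one chunk of rows to the index of each row's nearest other sample.

    D_chunk has shape (n_rows_in_chunk, n_samples); row i of the chunk
    corresponds to sample ``start + i`` of X.
    """
    D = D_chunk.copy()
    rows = np.arange(D.shape[0])
    D[rows, start + rows] = np.inf  # mask each sample's distance to itself
    return D.argmin(axis=1)

# working_memory is a soft limit on the size of each temporary chunk.
chunks = list(pairwise_distances_chunked(X, reduce_func=nearest_other, working_memory=1))
nearest = np.concatenate(chunks)
print("number of chunks:", len(chunks))

# Cross-check against the full matrix (only feasible because X is small).
D_full = pairwise_distances(X)
np.fill_diagonal(D_full, np.inf)
print("matches full computation:",
      np.allclose(D_full[np.arange(len(X)), nearest], D_full.min(axis=1)))
```

The reduction runs on each chunk as it is produced, so only the per-row result (here a single index) is kept.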
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/metrics/pairwise.py
Signature
# Distance functions
def euclidean_distances(X, Y=None, *, Y_norm_squared=None, squared=False, X_norm_squared=None)
def nan_euclidean_distances(X, Y=None, *, squared=False, missing_values=np.nan, copy=True)
def cosine_distances(X, Y=None)
def manhattan_distances(X, Y=None)
def haversine_distances(X, Y=None)
def paired_euclidean_distances(X, Y)
def paired_manhattan_distances(X, Y)
def paired_cosine_distances(X, Y)
def paired_distances(X, Y, *, metric="euclidean", **kwds)
# Kernel functions
def linear_kernel(X, Y=None, dense_output=True)
def polynomial_kernel(X, Y=None, degree=3, gamma=None, coef0=1)
def sigmoid_kernel(X, Y=None, gamma=None, coef0=1)
def rbf_kernel(X, Y=None, gamma=None)
def laplacian_kernel(X, Y=None, gamma=None)
def cosine_similarity(X, Y=None, dense_output=True)
def additive_chi2_kernel(X, Y=None)
def chi2_kernel(X, Y=None, gamma=1.0)
# General-purpose functions
# Note: scikit-learn 1.6 and later deprecate force_all_finite in favor of ensure_all_finite
def pairwise_distances(X, Y=None, metric="euclidean", *, n_jobs=None, force_all_finite=True, **kwds)
def pairwise_distances_chunked(X, Y=None, *, reduce_func=None, metric="euclidean", n_jobs=None, working_memory=None, **kwds)
def pairwise_distances_argmin(X, Y, *, axis=1, metric="euclidean", metric_kwargs=None)
def pairwise_distances_argmin_min(X, Y, *, axis=1, metric="euclidean", metric_kwargs=None)
def pairwise_kernels(X, Y=None, metric="linear", *, filter_params=False, n_jobs=None, **kwds)
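To illustrate how the general-purpose functions relate to the specific ones, the sketch below checks that `pairwise_distances` and `pairwise_kernels` called with a metric name give the same result as calling the named function directly, and that a callable metric is also accepted. The `chebyshev` helper and the sample arrays are hypothetical, written only for this example.

```python
import numpy as np
from sklearn.metrics.pairwise import (
    manhattan_distances,
    pairwise_distances,
    pairwise_kernels,
    rbf_kernel,
)

X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]])
Y = np.array([[1.0, 1.0], [0.0, 0.0]])

# Dispatch by name versus calling the specific function.
D_named = pairwise_distances(X, Y, metric="manhattan")
D_direct = manhattan_distances(X, Y)

# Extra keyword arguments (here gamma) are forwarded to the kernel function.
K_named = pairwise_kernels(X, Y, metric="rbf", gamma=0.5)
K_direct = rbf_kernel(X, Y, gamma=0.5)

def chebyshev(a, b):
    """Hypothetical custom metric: largest absolute coordinate difference."""
    return np.max(np.abs(a - b))

# A callable metric is applied to each pair of rows (one row of X, one of Y).
D_custom = pairwise_distances(X, Y, metric=chebyshev)

print(np.allclose(D_named, D_direct), np.allclose(K_named, K_direct))
print(D_custom)
```

A callable metric is evaluated pair by pair in Python, so it is typically much slower than a named metric.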
Import
from sklearn.metrics.pairwise import euclidean_distances, cosine_similarity
from sklearn.metrics.pairwise import rbf_kernel, pairwise_distances
from sklearn.metrics.pairwise import pairwise_kernels, pairwise_distances_chunked
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| X | array-like or sparse matrix of shape (n_samples_X, n_features) | Yes | First input array of samples |
| Y | array-like or sparse matrix of shape (n_samples_Y, n_features) | No | Second input array (defaults to X if None) |
| metric | str or callable | No | Distance metric or kernel name (default varies by function) |
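| n_jobs | int | No | Number of parallel jobs (None means 1 unless in a joblib.parallel_backend context; -1 uses all processors) |
| gamma | float | No | Kernel coefficient for the RBF, Laplacian, polynomial, sigmoid, and chi-squared kernels (None means 1 / n_features, except chi2_kernel, which defaults to 1.0) |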
| degree | int | No | Degree of polynomial kernel |
| coef0 | float | No | Independent term in polynomial and sigmoid kernels |
| working_memory | float | No | Target maximum memory (in MiB) for temporary distance matrix chunks; if None, the value from sklearn.get_config() is used |
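The roles of gamma, degree, and coef0 can be made concrete by recomputing three kernels by hand with NumPy and comparing them with the library functions. This is a sketch on made-up data; the parameter values are arbitrary.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel, sigmoid_kernel

X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]])
Y = np.array([[1.0, 1.0], [0.0, 0.0]])
gamma, degree, coef0 = 0.5, 2, 1.0  # arbitrary illustrative values

inner = X @ Y.T  # <x, y> for every pair of rows
sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)  # ||x - y||^2

# polynomial: (gamma * <x, y> + coef0) ** degree
K_poly_manual = (gamma * inner + coef0) ** degree
# sigmoid: tanh(gamma * <x, y> + coef0)
K_sig_manual = np.tanh(gamma * inner + coef0)
# RBF: exp(-gamma * ||x - y||^2)
K_rbf_manual = np.exp(-gamma * sq_dists)

K_poly = polynomial_kernel(X, Y, degree=degree, gamma=gamma, coef0=coef0)
K_sig = sigmoid_kernel(X, Y, gamma=gamma, coef0=coef0)
K_rbf = rbf_kernel(X, Y, gamma=gamma)

print(np.allclose(K_poly, K_poly_manual),
      np.allclose(K_sig, K_sig_manual),
      np.allclose(K_rbf, K_rbf_manual))
```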
Outputs
| Name | Type | Description |
|---|---|---|
| distances | ndarray of shape (n_samples_X, n_samples_Y) | Pairwise distance matrix |
| kernel_matrix | ndarray of shape (n_samples_X, n_samples_Y) | Pairwise kernel matrix |
| argmin | ndarray of shape (n_samples_X,) | Indices of nearest samples in Y for each sample in X |
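The output shapes in the table can be checked with a short sketch. It uses `pairwise_distances_argmin_min`, which returns the nearest-neighbor indices together with the corresponding distances; the sample points are made up and chosen so that no two candidates are equally close.

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_distances, pairwise_distances_argmin_min

X = np.array([[0.0, 0.2], [1.0, 0.9], [2.0, 2.0]])  # 3 query samples
Y = np.array([[1.0, 1.0], [0.0, 0.0]])              # 2 reference samples

D = pairwise_distances(X, Y)                      # full matrix, one row per X sample
idx, dist = pairwise_distances_argmin_min(X, Y)   # nearest row of Y for each row of X

print(D.shape, idx.shape, dist.shape)  # (3, 2) (3,) (3,)
print(idx)                             # index into Y of each sample's nearest neighbor
```

When only the nearest neighbor is needed, the argmin variants avoid materializing the full (n_samples_X, n_samples_Y) matrix in user code.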
Usage Examples
Basic Usage
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances, rbf_kernel, cosine_similarity
X = np.array([[0, 1], [1, 0], [2, 2]])
Y = np.array([[1, 1], [0, 0]])
# Compute Euclidean distance matrix
dist = euclidean_distances(X, Y)
print("Euclidean distances:\n", dist)
# Compute RBF kernel matrix
K = rbf_kernel(X, Y, gamma=0.5)
print("RBF kernel:\n", K)
# Compute cosine similarity
sim = cosine_similarity(X, Y)
print("Cosine similarity:\n", sim)
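Two of the listed distance functions have input conventions worth a separate sketch: `nan_euclidean_distances` skips missing coordinates and rescales the result, and `haversine_distances` expects [latitude, longitude] pairs in radians. The data below are made up, and the Earth radius of 6371 km is an approximate value assumed only for the unit conversion.

```python
import numpy as np
from sklearn.metrics.pairwise import haversine_distances, nan_euclidean_distances

# NaN-aware Euclidean: the second coordinate of the first row is missing.
X_missing = np.array([[3.0, np.nan, 5.0],
                      [1.0, 0.0, 0.0]])
D_nan = nan_euclidean_distances(X_missing)

# Manual check: only coordinates 0 and 2 are present in both rows, so the
# squared distance over those is scaled by (total coords / present coords) = 3 / 2.
expected = np.sqrt(3 / 2 * ((3.0 - 1.0) ** 2 + (5.0 - 0.0) ** 2))
print("NaN-aware distance matches manual value:", np.isclose(D_nan[0, 1], expected))

# Haversine: two points on the equator, 90 degrees of longitude apart.
points = np.radians([[0.0, 0.0], [0.0, 90.0]])
D_hav = haversine_distances(points)  # angular distance in radians
print("quarter great circle:", np.isclose(D_hav[0, 1], np.pi / 2))

EARTH_RADIUS_KM = 6371.0  # approximate mean radius (assumption for illustration)
print("distance in km:", D_hav[0, 1] * EARTH_RADIUS_KM)
```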