Principle:Scikit learn contrib Imbalanced learn Value Difference Metric

Principle: Value Difference Metric

The Value Difference Metric (VDM) is a distance metric designed specifically for categorical features. Unlike standard distance metrics that treat nominal values as either identical or completely different, VDM computes distances based on the statistical relationship between feature values and class labels. Feature values that lead to similar class distributions are considered close, even if they are nominally distinct.

Mathematical Formulation

Per-Feature Distance

For a single feature f, the distance between two values x and y is defined as:

$$\delta(x, y) = \sum_{c \in C} \left| p(c \mid x_f) - p(c \mid y_f) \right|^{k}$$

where:

C is the set of all classes.
p(c|x_f) is the conditional probability that the output class is c given that feature f has the value x.
k is an exponent, typically set to 1 or 2.

This captures the idea that two feature values are "close" if observing either value leads to a similar distribution over class labels.
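The per-feature formula can be sketched in plain Python. This is a minimal illustration, not the imbalanced-learn implementation. It assumes the conditional probabilities p(c|x_f) are estimated as simple relative frequencies from the training samples. The helper names (conditional_class_probs, vdm_feature_distance) and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def conditional_class_probs(values, labels):
    """Estimate p(c | value) for one feature by relative frequency.

    Returns a dict mapping each feature value to a dict {class: probability}.
    """
    counts = defaultdict(Counter)  # feature value -> Counter of class labels
    for v, c in zip(values, labels):
        counts[v][c] += 1
    probs = {}
    for v, class_counts in counts.items():
        total = sum(class_counts.values())
        probs[v] = {c: n / total for c, n in class_counts.items()}
    return probs


def vdm_feature_distance(x, y, probs, classes, k=1):
    """delta(x, y) = sum over c in C of |p(c|x) - p(c|y)| ** k for one feature."""
    px, py = probs[x], probs[y]
    # Classes never observed with a value get probability 0.
    return sum(abs(px.get(c, 0.0) - py.get(c, 0.0)) ** k for c in classes)


# Hypothetical toy data: one categorical feature and its class labels.
colors = ["red", "red", "red", "blue", "blue", "green", "green", "green"]
labels = ["a", "a", "b", "a", "b", "b", "b", "b"]
classes = sorted(set(labels))
probs = conditional_class_probs(colors, labels)

for pair in [("red", "blue"), ("red", "green"), ("blue", "green")]:
    print(pair, vdm_feature_distance(*pair, probs, classes))
```

With k = 1, red (2/3 "a", 1/3 "b") and blue (1/2 "a", 1/2 "b") end up at distance 1/3. Red and green (all "b") end up at distance 4/3, because their class distributions differ much more.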

Full Vector Distance

For two complete feature vectors X and Y, the distance is:

$$\Delta(X, Y) = \sum_{f=1}^{F} \delta(X_f, Y_f)^{r}$$

where:

F is the number of features.
r is an exponent, typically set to 1 or 2.

This aggregates the per-feature distances into a single scalar distance. It resembles the Minkowski distance family, but it omits the final 1/r root and operates on class-conditional probability distributions rather than raw numerical values.
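The sketch below extends the per-feature computation to whole feature vectors. It is a minimal illustration that estimates one probability table per feature and then sums delta(X_f, Y_f)^r across features, with no final root, matching the formula above. The function names, the toy data, and the defaults k = 1 and r = 2 are illustrative choices within the "typically 1 or 2" range. They are not taken from any library.

```python
from collections import Counter, defaultdict


def fit_vdm_tables(X, y):
    """Estimate p(c | value) separately for every feature (column) of X."""
    tables = []
    for f in range(len(X[0])):
        counts = defaultdict(Counter)  # value of feature f -> class counts
        for row, c in zip(X, y):
            counts[row[f]][c] += 1
        tables.append({
            v: {c: n / sum(cc.values()) for c, n in cc.items()}
            for v, cc in counts.items()
        })
    return tables


def vdm_distance(a, b, tables, classes, k=1, r=2):
    """Delta(a, b) = sum over features f of delta(a_f, b_f) ** r (no 1/r root)."""
    total = 0.0
    for f, table in enumerate(tables):
        pa, pb = table[a[f]], table[b[f]]
        delta = sum(abs(pa.get(c, 0.0) - pb.get(c, 0.0)) ** k for c in classes)
        total += delta ** r
    return total


# Hypothetical two-feature dataset (color, size) with binary labels.
X = [["red", "small"], ["red", "large"], ["blue", "small"],
     ["blue", "large"], ["green", "large"], ["green", "large"]]
y = ["a", "a", "a", "b", "b", "b"]
classes = sorted(set(y))
tables = fit_vdm_tables(X, y)

print(vdm_distance(X[0], X[2], tables, classes))  # (red, small) vs (blue, small)
print(vdm_distance(X[0], X[4], tables, classes))  # (red, small) vs (green, large)
```

Here (red, small) vs (green, large) gives 2^2 + 1.5^2 = 6.25. Setting r = 1 gives 2 + 1.5 = 3.5, so r controls how strongly a single very different feature dominates the total.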

Intuition

Consider a medical dataset with a "Symptom" feature having values {cough, sneeze, chest_pain}. If both cough and sneeze are associated with similar class distributions (e.g., both predominantly linked to "cold"), then VDM will assign a small distance between them. In contrast, chest_pain may be associated with a very different class distribution (e.g., predominantly linked to "heart_disease"), yielding a larger distance from the other two values.

This behavior is fundamentally different from Hamming distance, which would treat all three values as equally distant from each other.
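The symptom example can be made concrete with a small comparison against Hamming (overlap) distance. The diagnosis labels below are invented purely for illustration and do not come from a real medical dataset. The helper names (class_distributions, vdm, hamming) are likewise hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def class_distributions(values, labels):
    """Relative-frequency estimate of p(class | value) for one feature."""
    counts = defaultdict(Counter)
    for v, c in zip(values, labels):
        counts[v][c] += 1
    return {v: {c: n / sum(cc.values()) for c, n in cc.items()}
            for v, cc in counts.items()}


def vdm(x, y, probs, classes, k=1):
    """Per-feature VDM: sum over classes of |p(c|x) - p(c|y)| ** k."""
    return sum(abs(probs[x].get(c, 0.0) - probs[y].get(c, 0.0)) ** k
               for c in classes)


def hamming(x, y):
    """Overlap distance: 0 if the values match, 1 otherwise."""
    return 0 if x == y else 1


# Hypothetical toy data (not from a real medical dataset).
symptoms = ["cough"] * 4 + ["sneeze"] * 4 + ["chest_pain"] * 4
diagnoses = (["cold", "cold", "cold", "flu"]
             + ["cold"] * 4
             + ["heart_disease"] * 3 + ["cold"])
classes = sorted(set(diagnoses))
probs = class_distributions(symptoms, diagnoses)

for a, b in combinations(["cough", "sneeze", "chest_pain"], 2):
    print(f"{a} vs {b}: VDM={vdm(a, b, probs, classes):.2f}, "
          f"Hamming={hamming(a, b)}")
```

VDM places cough and sneeze 0.5 apart and both of them 1.5 away from chest_pain. Hamming distance reports 1 for every pair of distinct values.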

Key Properties

Class-conditional: The distance is derived entirely from the relationship between feature values and class labels, making it inherently supervised.
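Probability-based: Each feature value is represented by its normalized class distribution p(c|x_f) rather than by raw counts, so values that occur with very different frequencies can still be compared directly. Estimates for rare values rest on few samples, however, and are therefore noisier.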
Composable: The per-feature distances are aggregated via a Minkowski-like sum, allowing the metric to scale to multi-feature datasets.
Encoding requirement: In the imbalanced-learn implementation, categorical features must be ordinally encoded as integers (e.g., with scikit-learn's OrdinalEncoder) before VDM distances are computed.
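The following stdlib-only sketch ties these properties together. It ordinally encodes string categories, builds one class-probability table per feature indexed by integer code, and computes a full pairwise distance matrix. The encoder only mimics the default sorted-category behavior of scikit-learn's OrdinalEncoder; it is not that class. The function names and data are hypothetical. One plausible reason integer codes are convenient is that they can index directly into such lookup tables. That is an assumption about design intent, not a documented statement.

```python
from collections import Counter


def ordinal_encode(X):
    """Map each column's categories to integers 0..n-1 in sorted order."""
    categories = [sorted({row[f] for row in X}) for f in range(len(X[0]))]
    lookup = [{cat: i for i, cat in enumerate(cats)} for cats in categories]
    encoded = [[lookup[f][row[f]] for f in range(len(row))] for row in X]
    return encoded, categories


def fit_proba_tables(X_enc, y, n_categories, classes):
    """tables[f][code][j] = p(classes[j] | feature f has value code)."""
    tables = []
    for f, n_cat in enumerate(n_categories):
        counts = [Counter() for _ in range(n_cat)]
        for row, c in zip(X_enc, y):
            counts[row[f]][c] += 1
        # max(..., 1) guards against a category with no training samples.
        tables.append([[cnt[c] / max(sum(cnt.values()), 1) for c in classes]
                       for cnt in counts])
    return tables


def pairwise_vdm(X_enc, tables, k=1, r=2):
    """Full matrix D[i][j] = sum over features of delta ** r."""
    n = len(X_enc)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for f, table in enumerate(tables):
                pi, pj = table[X_enc[i][f]], table[X_enc[j][f]]
                delta = sum(abs(a - b) ** k for a, b in zip(pi, pj))
                D[i][j] += delta ** r
    return D


# Hypothetical raw categorical data and labels.
X = [["green", "small"], ["green", "large"], ["red", "small"], ["blue", "large"]]
y = [1, 0, 0, 0]

X_enc, categories = ordinal_encode(X)
classes = sorted(set(y))
tables = fit_proba_tables(X_enc, y, [len(c) for c in categories], classes)
D = pairwise_vdm(X_enc, tables)
for row in D:
    print(row)
```

In this toy data, red and blue have identical class distributions, so they contribute zero distance to each other even though they are different categories.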

Reference

Stanfill, Craig, and David Waltz. "Toward memory-based reasoning." Communications of the ACM 29.12 (1986): 1213-1228.

Related Pages

Implementation:Scikit_learn_contrib_Imbalanced_learn_ValueDifferenceMetric
