Implementation:Scikit learn Scikit learn MutualInfoClassif

From Leeroopedia
Revision as of 16:36, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Scikit_learn_Scikit_learn_MutualInfoClassif.md)


Knowledge Sources
Domains: Feature Selection, Information Theory
Last Updated: 2026-02-08 15:00 GMT

Overview

Concrete tool, provided by scikit-learn, for estimating the mutual information between each feature and a target variable.

Description

The _mutual_info module provides the mutual_info_classif and mutual_info_regression functions, which estimate the mutual information (MI) between each feature and the target variable. MI is a non-negative measure of the dependency between two random variables; it is zero if and only if the variables are independent, and higher values indicate stronger dependency. Unlike correlation-based methods, which detect only linear relationships, MI can capture any kind of statistical dependency, including non-linear ones. The estimators are nonparametric and rely on entropy estimation from k-nearest-neighbor distances, following Kraskov et al. and Ross.
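The contrast with correlation can be illustrated with a small synthetic experiment. This is a minimal sketch, assuming a made-up quadratic relationship (y = x² plus a little noise); the data and variable names are illustrative and not taken from scikit-learn.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

# Hypothetical data: y depends on x, but symmetrically around zero,
# so the linear (Pearson) correlation is close to zero.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=1000)
y = x ** 2 + rng.normal(scale=0.01, size=1000)

# Linear correlation between x and y
pearson_r = np.corrcoef(x, y)[0, 1]

# MI estimate for the same pair; X must be 2-D, hence the reshape
mi = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)[0]

print(f"Pearson r   = {pearson_r:.3f}")
print(f"MI estimate = {mi:.3f} nats")
```

On data like this, the correlation coefficient should stay near zero while the MI estimate is clearly positive, which is the behavior the description above refers to.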

Usage

Use mutual_info_classif for feature selection when you need to rank features by their statistical dependency with a discrete target variable (classification). Use mutual_info_regression for continuous targets. These functions are particularly useful as score functions for SelectKBest or SelectPercentile, especially when non-linear relationships between features and targets are expected.
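For a continuous target, the same pattern applies with mutual_info_regression. The sketch below assumes synthetic data from make_regression and an arbitrary choice of percentile=25; functools.partial is used only to fix random_state so that repeated runs give the same scores.

```python
from functools import partial

from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectPercentile, mutual_info_regression

# Synthetic regression problem (illustrative values only)
X, y = make_regression(
    n_samples=300, n_features=20, n_informative=5, noise=0.1, random_state=0
)

# Fix random_state so the MI scores are reproducible across calls
score_func = partial(mutual_info_regression, random_state=0)

# Keep the top 25% of features ranked by estimated MI with the target
selector = SelectPercentile(score_func, percentile=25)
X_selected = selector.fit_transform(X, y)

print(f"Selected features shape: {X_selected.shape}")
print(f"Selected feature indices: {selector.get_support(indices=True)}")
```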

Code Reference

Source Location

Signature

def mutual_info_classif(
    X,
    y,
    *,
    discrete_features="auto",
    n_neighbors=3,
    copy=True,
    random_state=None,
    n_jobs=None,
):

def mutual_info_regression(
    X,
    y,
    *,
    discrete_features="auto",
    n_neighbors=3,
    copy=True,
    random_state=None,
    n_jobs=None,
):

Import

from sklearn.feature_selection import mutual_info_classif
from sklearn.feature_selection import mutual_info_regression

I/O Contract

Inputs

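| Name | Type | Required | Description |
|---|---|---|---|
| X | array-like or sparse matrix of shape (n_samples, n_features) | Yes | Feature matrix. |
| y | array-like of shape (n_samples,) | Yes | Target variable (discrete for mutual_info_classif, continuous for mutual_info_regression). |
| discrete_features | 'auto', bool, or array-like | No | Which features are discrete. A bool applies to all features; an array-like is either a boolean mask of shape (n_features,) or an array of indices of the discrete features. 'auto' treats features as continuous for dense X and as discrete for sparse X. Default is 'auto'. |
| n_neighbors | int | No | Number of neighbors used for MI estimation of continuous variables. Higher values reduce the variance of the estimate but may introduce bias. Default is 3. |
| copy | bool | No | Whether to make a copy of the given data. If False, the input data may be overwritten. Default is True. |
| random_state | int, RandomState instance, or None | No | Seeds the small random noise added to continuous variables to remove repeated values. Pass an int for reproducible results. Default is None. |
| n_jobs | int | No | Number of parallel jobs. Default is None (1 job unless inside a joblib.parallel_backend context). |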
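The discrete_features and n_neighbors parameters can be exercised as follows. This is a sketch on hypothetical data: the three columns (one continuous informative feature, one discrete informative feature, one discrete noise feature) and the 10% label-flip rate are assumptions made for illustration.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n_samples = 500
y = rng.integers(0, 2, size=n_samples)  # binary class labels

# Column 0: continuous, shifted by the class label
continuous_informative = y + rng.normal(scale=0.5, size=n_samples)
# Column 1: discrete, equal to the label except for roughly 10% flipped entries
discrete_informative = np.where(rng.random(n_samples) < 0.1, 1 - y, y)
# Column 2: discrete, independent of the label
discrete_noise = rng.integers(0, 3, size=n_samples)

X = np.column_stack([continuous_informative, discrete_informative, discrete_noise])

# Mark columns 1 and 2 as discrete by index; a boolean mask
# such as [False, True, True] is the other accepted array form.
mi = mutual_info_classif(
    X, y, discrete_features=[1, 2], n_neighbors=5, random_state=0
)

for name, score in zip(["continuous", "discrete", "noise"], mi):
    print(f"{name:>10}: MI = {score:.4f} nats")
```

Declaring the discrete columns explicitly matters because the default 'auto' would treat every column of a dense array as continuous.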

Outputs

| Name | Type | Description |
|---|---|---|
| mi | ndarray of shape (n_features,) | Estimated mutual information between each feature and the target, in nats. Negative estimates are clipped to zero. |
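Because the scores are returned in nats, converting them to bits or ranking features takes only a line of NumPy. The following sketch assumes the same kind of synthetic data used elsewhere on this page; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, random_state=42
)
mi_nats = mutual_info_classif(X, y, random_state=42)

# One entry per feature
print(mi_nats.shape)

# Convert nats to bits: divide by ln(2)
mi_bits = mi_nats / np.log(2)

# Feature indices ordered from highest to lowest estimated MI
ranking = np.argsort(mi_nats)[::-1]
for idx in ranking[:3]:
    print(f"Feature {idx}: {mi_nats[idx]:.4f} nats = {mi_bits[idx]:.4f} bits")
```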

Usage Examples

Basic Usage

from functools import partial

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic data: 10 features, 3 of which are informative
X, y = make_classification(n_samples=200, n_features=10, n_informative=3, random_state=42)

# Estimate MI (in nats) between each feature and the class labels
mi_scores = mutual_info_classif(X, y, random_state=42)
for i, score in enumerate(mi_scores):
    print(f"Feature {i}: MI = {score:.4f}")

# Use with SelectKBest; partial fixes random_state so the selection is reproducible
selector = SelectKBest(partial(mutual_info_classif, random_state=42), k=3)
X_selected = selector.fit_transform(X, y)
print(f"Selected features shape: {X_selected.shape}")
