
Principle:Scikit learn Feature Importance Analysis

From Leeroopedia



Overview

An interpretability technique that quantifies the contribution of each input feature to model predictions.

Description

Feature importance analysis provides a way to understand which input features are most influential in a trained model's predictions. There are two primary approaches used with ensemble models in scikit-learn:

  • Impurity-based importance (Mean Decrease in Impurity, MDI): For tree-based ensembles, each feature's importance is computed as the total (normalized) reduction in the splitting criterion (e.g., Gini impurity or entropy) brought about by that feature across all trees in the ensemble. This is available as the feature_importances_ attribute on fitted forest and gradient boosting models.
  • Permutation-based importance: This model-agnostic technique measures how much the model's performance degrades when a single feature's values are randomly shuffled. For each feature, the column is permuted multiple times, and the resulting drop in the chosen scoring metric is recorded. A large drop indicates the feature is important; a negligible drop suggests the feature is uninformative or redundant.
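The two approaches above can be sketched side by side on a small synthetic dataset. The dataset shape and model settings here are illustrative choices, not defaults from the source:

```python
# Minimal sketch: MDI vs. permutation importance for a random forest.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Impurity-based (MDI) importances: a byproduct of training; they sum to 1.
mdi = model.feature_importances_

# Permutation importances: measured score drops, here on held-out data.
perm = permutation_importance(model, X_test, y_test,
                              n_repeats=10, random_state=0)
print("MDI:        ", mdi)
print("Permutation:", perm.importances_mean)
```

Note that `permutation_importance` returns a Bunch with `importances_mean` and `importances_std` across the repeats, whereas `feature_importances_` is a single normalized vector.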

Each approach has distinct advantages and drawbacks:

  • Impurity-based importance is fast to compute (it is a byproduct of tree construction), but it is biased toward high-cardinality features -- features with many unique values tend to receive artificially inflated importance scores. Additionally, impurity importance reflects training-set behavior and may not represent the feature's importance for generalization.
  • Permutation importance is model-agnostic and can be computed on a held-out test set, making it a more reliable indicator of a feature's contribution to generalization. However, it is computationally more expensive (requiring multiple model evaluations) and can give misleading results when features are strongly correlated: permuting one feature leaves its correlated partner intact, so the model can partially compensate and the permuted feature's importance is underestimated.
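The high-cardinality bias of MDI can be demonstrated directly. In this hedged sketch, a pure-noise continuous column (many unique values, no relation to the target) is appended to the data; MDI typically still assigns it a visible score, while its permutation importance on held-out data stays near zero:

```python
# Sketch: MDI's bias toward high-cardinality features vs. permutation importance.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=1000, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)
# Append a continuous noise column: high cardinality, unrelated to y.
X = np.hstack([X, rng.rand(len(X), 1)])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_tr, y_tr)

mdi_noise = model.feature_importances_[-1]     # training-set MDI for the noise column
perm = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
perm_noise = perm.importances_mean[-1]         # held-out score drop for the same column
print("noise column MDI:", mdi_noise, " permutation:", perm_noise)
```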

Usage

Feature importance analysis is appropriate when:

  • You need to interpret which features drive the predictions of an ensemble model.
  • You want to perform feature selection by identifying and removing uninformative features.
  • You need to communicate model behavior to stakeholders in an understandable way.
  • You want to diagnose potential data leakage by checking whether unexpected features rank as highly important.
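For the feature-selection use case, one simple recipe is to drop features whose held-out permutation importance is indistinguishable from zero. The threshold here (mean importance exceeding one standard deviation) is an illustrative choice, not a scikit-learn default:

```python
# Hypothetical sketch: keep only features with clearly positive
# permutation importance on held-out data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_tr, y_tr)

result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
# Keep features whose mean score drop exceeds its standard deviation.
keep = result.importances_mean > result.importances_std
print("kept feature indices:", [i for i, k in enumerate(keep) if k])
```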

Theoretical Basis

  • Mean Decrease in Impurity (MDI): For each feature, the total reduction in the node splitting criterion (weighted by the proportion of samples reaching that node) is summed across all trees and normalized. In a random forest with Gini criterion, this is often called "Gini importance." The MDI score reflects how useful a feature is for partitioning the training data, but is biased in favor of features with many possible split points.
  • Permutation-Based Score Drop: The permutation importance of feature j is defined as the decrease in the model's scoring metric when the values of feature j are randomly permuted, breaking the association between feature j and the target. This is repeated n_repeats times to obtain a distribution of importance scores (mean and standard deviation). Because permutation importance is computed on actual predictions, it captures the feature's effect on the model's generalization performance when evaluated on held-out data.
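The definition above can be worked through by hand rather than via scikit-learn's helper. This sketch shuffles one column of the held-out set several times and records the score drop each time; the feature index and repeat count are arbitrary illustrative choices:

```python
# Worked sketch of the permutation-importance definition for one feature.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_tr, y_tr)

rng = np.random.RandomState(0)
baseline = model.score(X_te, y_te)   # unpermuted score on held-out data
n_repeats = 10
j = 0                                # feature whose importance we measure
drops = np.empty(n_repeats)
for r in range(n_repeats):
    X_perm = X_te.copy()
    # Shuffle column j, breaking its association with the target.
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    drops[r] = baseline - model.score(X_perm, y_te)

importance_mean, importance_std = drops.mean(), drops.std()
print("mean drop:", importance_mean, "std:", importance_std)
```

A large mean drop relative to its standard deviation indicates the feature matters for held-out performance; a mean near zero indicates it is uninformative or redundant.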
