Principle: DistrictDataLabs Yellowbrick Feature Importance Ranking
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Model_Selection, Hyperparameter_Tuning |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Feature importance ranking is a model interpretation technique that quantifies the contribution of each input feature to a model's predictions, enabling practitioners to identify the most informative variables and guide feature engineering decisions.
Description
In machine learning, not all input features contribute equally to a model's predictive power. Some features carry strong signal, while others may be noisy, redundant, or irrelevant. Feature importance ranking provides a quantitative measure of each feature's contribution, allowing practitioners to understand which variables drive the model's decisions.
There are two primary mechanisms by which models expose feature importance. Tree-based models such as Random Forests and Gradient Boosting Machines compute `feature_importances_` based on criteria such as the total reduction in impurity (e.g. Gini impurity or entropy) that each feature provides across all splits in all trees. Linear models such as Logistic Regression and Support Vector Machines expose `coef_` (coefficient) arrays that represent the weight assigned to each feature in the decision function. The magnitude (and optionally the sign) of these coefficients indicates how strongly each feature influences predictions.
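Both mechanisms feed the same ranking workflow. The sketch below uses Yellowbrick's FeatureImportances visualizer with a tree-based model; the `yellowbrick.model_selection` import path, the `labels` keyword, and the iris dataset are assumptions based on recent Yellowbrick releases and are illustrative only.

```python
# Minimal sketch: ranking impurity-based importances with Yellowbrick.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from yellowbrick.model_selection import FeatureImportances

data = load_iris()
X, y = data.data, data.target

# The visualizer fits the wrapped estimator, reads its feature_importances_
# attribute, and renders a ranked horizontal bar chart.
model = RandomForestClassifier(n_estimators=100, random_state=42)
viz = FeatureImportances(model, labels=data.feature_names)
viz.fit(X, y)
viz.show()
```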
For multi-class classifiers, coefficient arrays may be multi-dimensional with shape (n_classes, n_features). In such cases, importances can be aggregated by computing the mean coefficient magnitude across classes, or they can be visualized per-class using stacked representations. Feature importance rankings can be displayed as absolute values for easier comparison, or in relative terms normalized to the strongest feature. Practitioners commonly examine the top-N or bottom-N features to focus on the most or least influential variables.
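For the multi-class, coefficient-based case, a hedged sketch is shown below. The `stack`, `relative`, and `topn` keywords are assumed from recent Yellowbrick releases and may not exist in older versions; the wine dataset and scaling step are illustrative choices.

```python
# Sketch: per-class coefficient importances for a multi-class linear model.
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import scale
from yellowbrick.model_selection import FeatureImportances

data = load_wine()
X, y = scale(data.data), data.target  # scaling helps the solver converge

# coef_ has shape (n_classes, n_features); stack=True draws one stacked bar
# segment per class instead of averaging coefficient magnitudes across classes.
model = LogisticRegression(max_iter=1000)
viz = FeatureImportances(
    model,
    labels=data.feature_names,
    stack=True,      # per-class stacked bars rather than the mean magnitude
    relative=False,  # plot raw magnitudes instead of percent of the maximum
    topn=5,          # focus on the five most influential features
)
viz.fit(X, y)
viz.show()
```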
Usage
Feature importance ranking should be used when:
- You want to understand which features are driving your model's predictions.
- You need to perform feature selection by identifying and removing low-importance features (a sketch follows this list).
- You want to communicate model interpretability to stakeholders.
- You are debugging a model that may be relying on spurious or unexpected features.
- You need to reduce model complexity by dropping uninformative features.
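As a sketch of the feature-selection use case above, importances can drive automated pruning of uninformative features. The example below uses scikit-learn's SelectFromModel; the breast cancer dataset and the `"mean"` threshold are illustrative assumptions, not a prescribed workflow.

```python
# Sketch: drop features whose impurity-based importance falls below the mean.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_breast_cancer(return_X_y=True)

# Fit a forest, then keep only features whose importance exceeds the mean
# importance across all features.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=0),
    threshold="mean",
)
X_reduced = selector.fit_transform(X, y)
print(X.shape, "->", X_reduced.shape)  # fewer columns after selection
```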
Theoretical Basis
Tree-Based Importance
For tree-based ensembles, the importance of a feature $f$ is computed as the total weighted reduction in the splitting criterion across all nodes where $f$ is used:

$$I(f) = \sum_{n \in N_f} w_n \, \Delta i(n)$$

where $N_f$ is the set of all tree nodes that split on feature $f$, $w_n$ is the fraction of samples reaching node $n$, and $\Delta i(n)$ is the impurity decrease at that node.
For Gini impurity:

$$i(n) = 1 - \sum_{k=1}^{K} p_k(n)^2$$

where $p_k(n)$ is the proportion of class $k$ samples at node $n$.
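To make the formulas concrete, here is a small self-contained sketch that computes the Gini impurity of a node and the weighted impurity decrease for a single split; all names and the toy data are illustrative.

```python
import numpy as np

def gini(labels):
    """Gini impurity i(n) = 1 - sum_k p_k(n)^2 for the labels at a node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def weighted_impurity_decrease(parent, left, right, n_total):
    """w_n * delta_i(n): the node's sample fraction times its impurity drop."""
    w = len(parent) / n_total
    drop = gini(parent) - (
        len(left) / len(parent) * gini(left)
        + len(right) / len(parent) * gini(right)
    )
    return w * drop

# Toy example: a perfectly separating split of a balanced binary node.
parent = np.array([0, 0, 1, 1])
print(weighted_impurity_decrease(parent, parent[:2], parent[2:], n_total=4))
# 1.0 * (0.5 - 0.0) = 0.5
```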
Coefficient-Based Importance
For linear models, the importance of feature $j$ is derived from the model coefficient:

$$I(j) = |\beta_j|$$

For multi-class models with coefficient matrix $B \in \mathbb{R}^{C \times d}$ (one row of $d$ coefficients per class), the aggregated importance is:

$$I(j) = \frac{1}{C} \sum_{c=1}^{C} |B_{cj}|$$
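The aggregation is a one-line reduction over the coefficient matrix; the sketch below uses a made-up 3-class, 3-feature matrix purely for illustration.

```python
# Sketch: collapse a (n_classes, n_features) coefficient matrix into a single
# importance per feature by averaging absolute values across classes.
import numpy as np

coef = np.array([
    [ 0.8, -0.1,  0.0],   # class 0 coefficients
    [-0.4,  0.5,  0.1],   # class 1 coefficients
    [ 0.2, -0.3,  0.9],   # class 2 coefficients
])

importances = np.abs(coef).mean(axis=0)   # mean |B_cj| over classes c
print(importances)                        # [0.4667 0.3    0.3333]
```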
Relative Importance
Relative importance normalizes all values to the maximum, expressing each feature as a percentage of the strongest one:

$$I_{\text{rel}}(j) = \frac{I(j)}{\max_k I(k)} \times 100$$
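A minimal sketch of this normalization, using made-up importance values:

```python
# Sketch: convert raw importances to percent of the strongest feature.
import numpy as np

importances = np.array([0.05, 0.40, 0.15, 0.20])
relative = 100.0 * importances / importances.max()
print(relative)   # [ 12.5 100.   37.5  50. ]
```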