Implementation:Fastai Fastbook Feature Importances
| Knowledge Sources | |
|---|---|
| Domains | Model Interpretation, Feature Selection |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
Concrete tools for analyzing feature importance, partial dependence, and per-row prediction decomposition, provided by scikit-learn, treeinterpreter, and pandas.
Description
This implementation covers three complementary model interpretation techniques used in the fastbook Tabular Modeling chapter:
- model.feature_importances_: A scikit-learn attribute on fitted tree ensemble models that provides a normalized array of importance scores (summing to 1.0) based on mean decrease in impurity across all trees.
- plot_partial_dependence: A scikit-learn function that computes and visualizes partial dependence plots, showing how the model's average prediction changes as a single feature varies while all other features remain at their observed values.
- treeinterpreter.predict: A third-party library function that decomposes each prediction into a bias term (global mean) plus per-feature contributions, enabling row-level explanation of model decisions.
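The partial dependence idea can be sketched by hand: overwrite one feature with a grid value in every row, leave the other columns as observed, and average the model's predictions. The block below is a minimal illustration on synthetic data, not scikit-learn's implementation. The helper name `manual_partial_dependence`, the synthetic data, and the min-to-max grid are assumptions made for the example (scikit-learn builds its grid from percentiles by default).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (hypothetical): the target depends strongly on feature 0.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, size=500)

model = RandomForestRegressor(n_estimators=30, min_samples_leaf=5,
                              random_state=0).fit(X, y)

def manual_partial_dependence(model, X, feature_idx, grid_resolution=20):
    """Average prediction as one feature is swept over a grid (illustrative helper)."""
    col = X[:, feature_idx]
    grid = np.linspace(col.min(), col.max(), grid_resolution)
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = value      # overwrite one feature in every row
        averages.append(model.predict(X_mod).mean())  # other columns stay as observed
    return grid, np.array(averages)

grid, pd_curve = manual_partial_dependence(model, X, feature_idx=0)
```

Plotting `pd_curve` against `grid` gives the same kind of line that a partial dependence plot shows for a single feature.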
Usage
Use these tools after training a RandomForestRegressor to understand model behavior. Start with global feature importance to identify the most influential features and remove low-importance ones. Then use partial dependence plots to understand the shape of each important feature's relationship with the target. Finally, use treeinterpreter for case-by-case explanation of individual predictions in production.
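As a rough illustration of the first two steps (rank features, then drop the weak ones), the sketch below fits a small forest on synthetic data. The column names, the synthetic target, and the forest settings are assumptions made for the example; only the 0.005 threshold and the DataFrame layout mirror the usage examples on this page.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (hypothetical column names): one strong signal, one weak, two noise.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(0, 1, size=(500, 4)),
                  columns=["strong", "weak", "noise_a", "noise_b"])
target = 5.0 * df["strong"] + 0.5 * df["weak"] + rng.normal(0, 0.1, size=500)

forest = RandomForestRegressor(n_estimators=30, min_samples_leaf=5,
                               random_state=0).fit(df, target)

# Rank features by impurity-based importance; the scores are normalized to sum to 1.
fi = pd.DataFrame({"cols": df.columns, "imp": forest.feature_importances_}
                  ).sort_values("imp", ascending=False)

# Keep only the columns whose importance clears the threshold.
to_keep = fi[fi.imp > 0.005].cols
df_imp = df[to_keep]
print(fi)
```

After filtering, the model should be refit on the reduced column set and its validation error compared with the original before the smaller feature set is adopted.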
Code Reference
Source Location
- Repository: fastbook
- File: translations/cn/09_tabular.md (Lines 753-1070)
- Note: These are external tools (scikit-learn, treeinterpreter) demonstrated in the fastbook chapter.
Signature
# 1. Global feature importance (attribute on fitted model)
model.feature_importances_ # numpy.ndarray of shape (n_features,)
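# 2. Partial dependence plots
# Note: plot_partial_dependence was removed in scikit-learn 1.2; on newer
# versions use sklearn.inspection.PartialDependenceDisplay.from_estimator.
from sklearn.inspection import plot_partial_dependence
plot_partial_dependence(estimator, X, features, grid_resolution=100, ax=None)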
# 3. Tree interpretation (per-row decomposition)
from treeinterpreter import treeinterpreter
prediction, bias, contributions = treeinterpreter.predict(model, X)
Import
from sklearn.inspection import plot_partial_dependence
from treeinterpreter import treeinterpreter
import pandas as pd
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | RandomForestRegressor (fitted) | Yes | A trained random forest model from which to extract importances and predictions. |
| X (for partial dependence) | pandas.DataFrame or numpy.ndarray | Yes | Feature matrix (typically the validation set features) over which to compute partial dependence. |
| features (for partial dependence) | list of str or list of int | Yes | Column names or indices of features to plot. |
| grid_resolution | int | No | Number of grid points for partial dependence computation. scikit-learn defaults to 100; the examples here pass 20. |
| X (for treeinterpreter) | numpy.ndarray | Yes | Feature matrix for the rows to decompose. Use df.values to convert from DataFrame. |
Outputs
| Name | Type | Description |
|---|---|---|
| feature_importances_ | numpy.ndarray (n_features,) | Normalized importance scores for each feature, summing to 1.0. Higher values indicate features used for more impactful splits. |
| Partial dependence plot | matplotlib figure | Line plot showing how the average prediction varies as the specified feature is swept over a grid of values, with all other features left at their observed values. |
| prediction (treeinterpreter) | numpy.ndarray (n_rows, 1) | The model's prediction for each input row (the same values as model.predict(X), with an extra trailing axis). |
| bias (treeinterpreter) | numpy.ndarray (n_rows,) | The mean of the root-node values across trees, approximately the mean of the training target. It represents the prediction before any feature-based adjustments. |
| contributions (treeinterpreter) | numpy.ndarray (n_rows, n_features) | Per-feature contribution for each row. Up to floating-point error, prediction[:, 0] equals bias + contributions.sum(axis=1). |
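The additive identity in the last row can be checked without treeinterpreter by walking each tree's decision path and crediting every change in node mean to the feature that was split on. The sketch below shows that idea using scikit-learn's tree internals. It is not treeinterpreter's own code, and the helper names `decompose_tree` and `decompose_forest` and the synthetic data are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def decompose_tree(tree, X):
    """Bias and per-feature contributions for one fitted regression tree."""
    t = tree.tree_
    node_value = t.value[:, 0, 0]            # mean target at each node (single output)
    bias = np.full(X.shape[0], node_value[0])  # root-node mean
    contribs = np.zeros(X.shape)
    paths = tree.decision_path(X)            # sparse (n_rows, n_nodes) indicator
    for i in range(X.shape[0]):
        # Child node ids are always larger than their parent's, so sorting gives path order.
        nodes = np.sort(paths.indices[paths.indptr[i]:paths.indptr[i + 1]])
        for parent, child in zip(nodes[:-1], nodes[1:]):
            # Credit the change in node mean to the feature split on at the parent.
            contribs[i, t.feature[parent]] += node_value[child] - node_value[parent]
    return bias, contribs

def decompose_forest(forest, X):
    """Average the per-tree decompositions across the forest."""
    parts = [decompose_tree(est, X) for est in forest.estimators_]
    bias = np.mean([b for b, _ in parts], axis=0)
    contribs = np.mean([c for _, c in parts], axis=0)
    return bias, contribs

# Synthetic data (hypothetical) to exercise the helpers.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 4))
y = 2.0 * X[:, 0] - X[:, 2] + rng.normal(0, 0.1, size=300)
forest = RandomForestRegressor(n_estimators=20, min_samples_leaf=5,
                               random_state=0).fit(X, y)

rows = X[:5]
bias, contribs = decompose_forest(forest, rows)
pred = forest.predict(rows)
print(np.allclose(pred, bias + contribs.sum(axis=1)))
```

Because the per-node differences telescope from the root value to the leaf value, the sum reproduces each tree's prediction, and averaging over trees reproduces the forest's prediction.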
Usage Examples
Basic Usage
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.inspection import plot_partial_dependence
from treeinterpreter import treeinterpreter
# Assume 'm' is a fitted RandomForestRegressor and 'xs' is the feature DataFrame
# --- 1. Global Feature Importance ---
def rf_feat_importance(m, df):
    return pd.DataFrame({'cols': df.columns, 'imp': m.feature_importances_}
                        ).sort_values('imp', ascending=False)
fi = rf_feat_importance(m, xs)
print(fi[:10])
# Plot top 30 features
fi[:30].plot('cols', 'imp', 'barh', figsize=(12, 7), legend=False)
plt.title('Feature Importances')
plt.show()
# --- 2. Remove Low-Importance Features ---
to_keep = fi[fi.imp > 0.005].cols
xs_imp = xs[to_keep]
valid_xs_imp = valid_xs[to_keep]
# Retrain m on xs_imp and verify that RMSE is maintained;
# the steps below assume m was fitted on these columns
# --- 3. Partial Dependence Plots ---
fig, ax = plt.subplots(figsize=(12, 4))
plot_partial_dependence(m, valid_xs_imp, ['YearMade', 'ProductSize'],
                        grid_resolution=20, ax=ax)
plt.show()
# --- 4. Tree Interpretation (Per-Row) ---
row = valid_xs_imp.iloc[:5]
prediction, bias, contributions = treeinterpreter.predict(m, row.values)
# For the first row:
print(f"Prediction: {prediction[0]}")
print(f"Bias: {bias[0]}")
print(f"Sum: {bias[0] + contributions[0].sum()}")
Redundancy Analysis
from sklearn.ensemble import RandomForestRegressor
# Quick OOB score function for comparing feature subsets
# (y is the training target, assumed to be defined elsewhere)
def get_oob(df):
    m = RandomForestRegressor(n_estimators=40, min_samples_leaf=15,
                              max_samples=50000, max_features=0.5,
                              n_jobs=-1, oob_score=True)
    m.fit(df, y)
    return m.oob_score_
# Baseline
print(f"Baseline OOB: {get_oob(xs_imp)}")
# Test removing potentially redundant columns one at a time
for c in ('saleYear', 'saleElapsed', 'ProductGroupDesc', 'ProductGroup',
          'fiModelDesc', 'fiBaseModel'):
    print(f"Drop {c}: OOB = {get_oob(xs_imp.drop(c, axis=1))}")
# Drop multiple redundant columns
to_drop = ['saleYear', 'ProductGroupDesc', 'fiBaseModel', 'Grouser_Tracks']
xs_final = xs_imp.drop(to_drop, axis=1)
print(f"Final OOB: {get_oob(xs_final)}")