Principle:Scikit learn Scikit learn Partial Dependence Analysis

Knowledge Sources	Scikit_learn Scikit-learn Docs
Domains	Model Interpretation, Data Visualization
Last Updated	2026-02-08 15:00 GMT

Overview

Partial dependence analysis visualizes the marginal effect of one or two features on the predicted outcome of a machine learning model, isolating feature effects from the rest of the feature space.

Description

Partial dependence analysis is a model interpretation technique that shows how the prediction of a trained model changes as a function of one or two features, averaged over all other feature values. It solves the problem of understanding what a complex model has learned about the relationship between individual features and the target, providing a global summary of feature effects. Decision boundary visualization complements this by showing how a classifier partitions the feature space. These tools sit within the model interpretability pipeline and are essential for explaining model behavior to stakeholders, debugging models, and validating that learned relationships are scientifically plausible.

Usage

Use partial dependence plots (PDPs) to understand the average effect of one or two features on the predictions of any supervised model, particularly when the model is a black box (e.g., gradient boosting, random forest). Use Individual Conditional Expectation (ICE) plots to see the effect for each instance, revealing heterogeneity that the averaged PDP may hide. Use decision boundary displays to visualize how a classifier partitions a two-dimensional feature space, which is useful in educational and diagnostic contexts.
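A minimal sketch of the PDP and ICE workflow described above, using `partial_dependence` from `sklearn.inspection`. The synthetic dataset, the choice of `GradientBoostingRegressor`, and the grid resolution are illustrative assumptions, not part of the original page.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence

# Illustrative data and model (assumed for this sketch).
X, y = make_regression(n_samples=200, n_features=4, noise=0.1, random_state=0)
model = GradientBoostingRegressor(n_estimators=50, random_state=0).fit(X, y)

# kind="both" returns the averaged curve (PDP) and the per-instance curves (ICE).
result = partial_dependence(model, X, features=[0], kind="both", grid_resolution=20)

pdp_curve = result["average"][0]       # one value per grid point
ice_curves = result["individual"][0]   # one row per instance, one column per grid point

print(pdp_curve.shape, ice_curves.shape)
```

To draw the same curves, `PartialDependenceDisplay.from_estimator(model, X, [0], kind="both")` plots them with matplotlib, and `DecisionBoundaryDisplay.from_estimator` plays the analogous role for a classifier fitted on two features.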

Theoretical Basis

Partial Dependence Function: For a model $f$ , the partial dependence of the prediction on features $X_{S}$ is:

${\hat{f}}_{S} (x_{S}) = E_{X_{C}} [f (x_{S}, X_{C})] = \frac{1}{n} \sum_{i = 1}^{n} f (x_{S}, x_{C}^{(i)})$

where $X_{S}$ is the set of features of interest, $X_{C}$ is the complement set, and the expectation is approximated by averaging over all training instances' values of $X_{C}$ .

For a single feature $x_{j}$ :

${\hat{f}}_{j} (x_{j}) = \frac{1}{n} \sum_{i = 1}^{n} f (x_{1}^{(i)}, \dots, x_{j}, \dots, x_{d}^{(i)})$

For two features $(x_{j}, x_{k})$ , the partial dependence surface is:

${\hat{f}}_{j k} (x_{j}, x_{k}) = \frac{1}{n} \sum_{i = 1}^{n} f (x_{1}^{(i)}, \dots, x_{j}, \dots, x_{k}, \dots, x_{d}^{(i)})$
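The averaging in these formulas can be sketched directly in NumPy. The helper `partial_dependence_brute` and the model `toy_predict` are hypothetical names introduced only for illustration. For each grid point, the features of interest are overwritten in every row while the complement features keep their observed values, and the predictions are averaged.

```python
import numpy as np

def partial_dependence_brute(predict, X, feature_idx, grid):
    """Average predict() over the rows of X with the columns in
    feature_idx forced to each grid point in turn."""
    values = []
    for point in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = point      # fix x_S; x_C keeps its observed values
        values.append(predict(X_mod).mean())
    return np.array(values)

# Hypothetical model for illustration: f(x) = 2*x0 + x1*x2
def toy_predict(X):
    return 2.0 * X[:, 0] + X[:, 1] * X[:, 2]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
grid = np.linspace(-1.0, 1.0, 5)

# One feature: a curve over the grid.
pd_x0 = partial_dependence_brute(toy_predict, X, [0], grid)

# Two features: a surface over all grid pairs.
pairs = [(a, b) for a in grid for b in grid]
pd_x1x2 = partial_dependence_brute(toy_predict, X, [1, 2], pairs)
pd_x1x2 = pd_x1x2.reshape(len(grid), len(grid))
```

For this toy model the curve for the first feature is the line $2 x_{0}$ shifted by the sample mean of $x_{1} x_{2}$, which makes the sketch easy to check by hand.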

Individual Conditional Expectation (ICE): Instead of averaging, ICE plots show the prediction for each individual instance as the feature varies:

${\hat{f}}_{j}^{(i)} (x_{j}) = f (x_{1}^{(i)}, \dots, x_{j}, \dots, x_{d}^{(i)})$

The PDP is the mean of all ICE curves. ICE plots reveal interaction effects and heterogeneous feature relationships that PDPs mask.
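The following sketch illustrates how averaging can hide heterogeneity. It assumes a hypothetical model `toy_predict` with a pure interaction, $f(x) = x_{0} x_{1}$, and a balanced binary second feature. Both are assumptions chosen so the effect is easy to see.

```python
import numpy as np

def ice_curves(predict, X, j, grid):
    """Return an (n_instances, n_grid) array: row i is the ICE curve of instance i."""
    curves = np.empty((X.shape[0], len(grid)))
    for g, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, j] = value                # vary feature j, keep the rest as observed
        curves[:, g] = predict(X_mod)
    return curves

# Hypothetical model with a pure interaction between x0 and x1.
def toy_predict(X):
    return X[:, 0] * X[:, 1]

rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
x1 = np.repeat([-1.0, 1.0], 100)           # half the instances have x1 = -1, half +1
X = np.column_stack([x0, x1])

grid = np.linspace(-2.0, 2.0, 9)
curves = ice_curves(toy_predict, X, 0, grid)
pdp = curves.mean(axis=0)                  # the PDP is the mean of the ICE curves
```

Here half of the ICE curves slope upward and half slope downward, while their mean is flat. The PDP alone would suggest that the first feature has no effect.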

Assumptions and Limitations:

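PDPs assume that the features in $X_{S}$ are independent of those in $X_{C}$. When features are correlated, the averaging evaluates the model on feature combinations that rarely or never occur in the data, so the resulting curve may reflect extrapolation rather than learned behavior.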
Accumulated Local Effects (ALE) plots address this limitation by computing effects using conditional rather than marginal distributions.
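The independence issue can be illustrated with a small sketch. The data, the correlation strength, and the "gap larger than anything observed" criterion are all assumptions made for illustration. The sketch measures how many of the synthetic rows built for a single grid point fall outside the range of feature combinations seen in the data.

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.normal(size=1000)
x1 = x0 + 0.1 * rng.normal(size=1000)      # x1 is strongly correlated with x0
X = np.column_stack([x0, x1])

# Largest |x0 - x1| that actually occurs in the data.
max_gap = np.abs(X[:, 0] - X[:, 1]).max()

# PDP-style substitution: force x0 = 2.0 in every row, keep x1 as observed.
X_mod = X.copy()
X_mod[:, 0] = 2.0
synthetic_gap = np.abs(X_mod[:, 0] - X_mod[:, 1])

# Share of synthetic rows that lie further from the x0 = x1 line than any real row.
frac_unrealistic = (synthetic_gap > max_gap).mean()
print(f"{frac_unrealistic:.0%} of synthetic rows are outside the observed range")
```

Most of the rows the model is asked to score at this grid point are combinations it never saw during training, which is the situation that conditional approaches such as ALE are designed to avoid.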

Decision Boundary Display: For a two-dimensional feature space, the decision boundary is the set of points where the predicted class changes:

$\partial D = \{ x : \hat{y} (x + \epsilon) \neq \hat{y} (x) \text{ for some infinitesimal } \epsilon \}$

This is visualized by evaluating the classifier on a grid and coloring regions by predicted class.
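A minimal sketch of the grid evaluation, assuming a hypothetical linear classifier `toy_classify` in place of a fitted estimator. The boundary is approximated as the set of grid cells whose right or upper neighbor receives a different predicted class.

```python
import numpy as np

# Hypothetical classifier for illustration: class 1 above the line x0 + x1 = 0.
def toy_classify(points):
    return (points[:, 0] + points[:, 1] > 0).astype(int)

# Evaluate the classifier on a regular 50 x 50 grid over the feature space.
xx, yy = np.meshgrid(np.linspace(-1.0, 1.0, 50), np.linspace(-1.0, 1.0, 50))
grid_points = np.column_stack([xx.ravel(), yy.ravel()])
labels = toy_classify(grid_points).reshape(xx.shape)

# Mark cells where the predicted class changes between adjacent grid points.
boundary = np.zeros(labels.shape, dtype=bool)
boundary[:, :-1] |= labels[:, :-1] != labels[:, 1:]   # change along x0
boundary[:-1, :] |= labels[:-1, :] != labels[1:, :]   # change along x1
```

Coloring `labels` (for example with matplotlib's `pcolormesh`) gives the region plot. With a fitted scikit-learn classifier, `DecisionBoundaryDisplay.from_estimator` performs the grid evaluation and plotting in one call.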
