Principle:DistrictDataLabs Yellowbrick Joint Plot Analysis
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Feature_Analysis, Visualization |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Joint plot analysis is a visualization technique that combines a bivariate scatter plot (or hexbin plot) of two variables with marginal histograms showing each variable's univariate distribution, along with a summary statistic, such as a correlation coefficient, that quantifies their relationship.
Description
A joint plot displays the relationship between two variables in three coordinated views: a central plot showing the joint distribution, and two marginal plots along the axes showing the individual distributions. The central plot is typically a scatter plot or hexagonal bin plot. A summary statistic (Pearson, Spearman, or Kendall tau correlation, or the covariance) is computed and displayed to describe the strength and direction of the relationship.
This combination lets the analyst read three things at once: (1) the form of the bivariate relationship (linear, monotonic, clustered), (2) the shape of each variable's own distribution (skewed, multimodal, normal), and (3) the statistical strength of the association. This makes joint plots especially useful for feature selection and feature engineering in machine learning workflows.
Joint plots can operate in two modes: pairwise feature analysis, where two feature columns from the dataset are compared, and feature-to-target analysis, where a single feature is plotted against the target variable. In both cases, optional marginal histograms (frequency or probability density) provide context about the data distribution.
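The two modes differ only in which pair of vectors ends up on the axes. The following is a minimal sketch of that selection step, assuming row-oriented data stored as plain Python lists. The helper name `select_pair` and its `columns` argument are hypothetical and introduced for illustration only; this is not Yellowbrick's implementation.

```python
def select_pair(X, y=None, columns=None):
    """Pick the two vectors a joint plot would draw (illustrative helper).

    X is a list of rows. `columns` is either a single column index
    (feature-to-target mode) or a pair of indices (pairwise feature mode).
    """
    if isinstance(columns, int):
        # Feature-to-target: one feature column against the target vector.
        if y is None:
            raise ValueError("feature-to-target mode needs a target vector y")
        return [row[columns] for row in X], list(y)

    # Pairwise: two feature columns taken from the same dataset.
    first, second = columns
    return [row[first] for row in X], [row[second] for row in X]


X = [[1.0, 10.0, 0.5], [2.0, 20.0, 0.4], [3.0, 30.0, 0.9]]
y = [0, 1, 1]

pairwise = select_pair(X, columns=(0, 1))
feature_vs_target = select_pair(X, y, columns=2)
```

Either way the result is two equal-length vectors, so the central plot, the marginal histograms, and the summary statistic can all be computed the same way in both modes.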
Usage
Joint plot analysis is used to:
- Examine pairwise feature relationships to detect linear or non-linear correlations.
- Explore feature-target relationships to identify features with strong predictive signal.
- Understand marginal distributions through the histogram panels.
- Detect heteroscedasticity by observing whether the scatter spread changes across the range.
- Identify outliers that appear distant from the main data cloud.
- Compare correlation measures (Pearson vs. Spearman vs. Kendall tau) to assess robustness of the relationship.
Theoretical Basis
Correlation Measures
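The joint plot typically displays one of the following statistics, computed from $n$ paired observations $(x_i, y_i)$ with sample means $\bar{x}$ and $\bar{y}$:
Pearson correlation measures linear dependence:
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
Spearman rank correlation measures monotonic dependence by applying Pearson's formula to rank-transformed values.
Kendall tau measures ordinal association based on concordant and discordant pairs. In its simplest form (tau-a, which ignores ties), with $n_c$ concordant and $n_d$ discordant pairs:
$$\tau = \frac{n_c - n_d}{n(n-1)/2}$$
Covariance measures the joint variability of two variables. The sample covariance is:
$$\operatorname{cov}(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$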
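To make the four measures concrete, here is a small pure-Python sketch. It assumes the sample covariance uses the $n - 1$ denominator and that Kendall tau is the tie-ignoring tau-a variant. Library implementations may differ (for example by adjusting for ties), so treat this as an illustration rather than a reference implementation.

```python
from itertools import combinations
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Sample covariance with the n - 1 denominator."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def pearson(x, y):
    """Covariance rescaled by the two standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average_rank
        start = end + 1
    return result


def spearman(x, y):
    """Pearson correlation of the rank-transformed values."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


# A monotonic but non-linear relationship.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]
r, rho, tau = pearson(x, y), spearman(x, y), kendall_tau_a(x, y)
```

On this data the rank-based measures reach their maximum because `y` always increases with `x`, while Pearson stays below 1 because the relationship is not a straight line. That gap is what comparing the measures side by side reveals.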
Marginal Distributions
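The marginal histograms display the univariate distribution of each variable. In "frequency" mode the bar heights are raw counts. In "density" mode the heights are rescaled so that the total bar area equals 1, which makes the histogram an approximation of the probability density function. Because this normalization removes the dependence on sample size, the shapes of two distributions can be compared directly.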
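The density normalization can be sketched in a few lines. This example assumes equal-width bins spanning the observed range and a non-constant input. The function name `histogram` and its parameters are illustrative and do not belong to any particular library.

```python
def histogram(values, bins, density=False):
    """Equal-width histogram; returns (bar heights, bin width)."""
    low, high = min(values), max(values)
    width = (high - low) / bins  # assumes high > low
    counts = [0] * bins
    for v in values:
        # The maximum value falls on the right edge; keep it in the last bin.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    if not density:
        return counts, width
    # Divide by n * width so that the total bar area equals 1.
    n = len(values)
    return [c / (n * width) for c in counts], width


values = [0, 1, 1, 2, 2, 2, 3, 4]
counts, width = histogram(values, bins=4)
heights, _ = histogram(values, bins=4, density=True)
area = sum(h * width for h in heights)
```

Both calls produce bars with the same relative shape. Only the vertical scale differs: counts in frequency mode, and heights whose total area is 1 in density mode.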
Hexbin Plots
As an alternative to scatter plots, hexagonal binning aggregates points into hexagonal cells and colors each cell by the count of points it contains. This is more effective than scatter plots for large datasets where overplotting obscures the density of points.
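The aggregation step behind a hexbin plot can be sketched without any plotting library. The version below assumes regular hexagons whose centers lie on two interleaved rectangular lattices, and it assigns each point to the nearest center. The function name `hexbin_counts` and the `width` parameter (the horizontal distance between neighboring centers) are illustrative choices, not a specific library's API.

```python
from collections import Counter
from math import floor, sqrt


def hexbin_counts(points, width):
    """Count points per hexagonal cell; keys are the cell centers."""
    height = width * sqrt(3)  # vertical lattice spacing for regular hexagons
    counts = Counter()
    for x, y in points:
        # Candidate 1: nearest center on the lattice (i * width, j * height).
        ax = round(x / width) * width
        ay = round(y / height) * height
        # Candidate 2: nearest center on the same lattice shifted by half a cell.
        bx = (floor(x / width) + 0.5) * width
        by = (floor(y / height) + 0.5) * height
        # The closer of the two candidates is the hexagon containing the point.
        dist_a = (x - ax) ** 2 + (y - ay) ** 2
        dist_b = (x - bx) ** 2 + (y - by) ** 2
        center = (ax, ay) if dist_a <= dist_b else (bx, by)
        counts[center] += 1
    return counts


points = [(0.1, 0.0), (-0.1, 0.1), (0.0, -0.1), (0.5, 0.9)]
cells = hexbin_counts(points, width=1.0)
```

A plotting layer would then draw one hexagon per key and map each count to a color, so dense regions stay legible even when individual points overlap heavily.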