
Implementation:Fastai Fastbook Embedding Inspection

From Leeroopedia


Knowledge Sources
Domains Recommender Systems, Interpretability
Last Updated 2026-02-09 17:00 GMT

Overview

Concrete patterns for inspecting learned embeddings and biases from a trained collaborative filtering model using PyTorch, fastai, and scikit-learn.

Description

This implementation covers three post-training analysis techniques applied to the embeddings of a trained EmbeddingDotBias or EmbeddingNN model:

  1. Bias ranking: Extract the item bias vector from model.i_bias.weight, squeeze it to 1D, and sort it to find the highest- and lowest-rated items independently of how well they match any particular user's preferences.
  2. Cosine similarity search: Extract the item weight matrix from model.i_weight.weight, select a query item's embedding vector, and compute nn.CosineSimilarity against all other items to find the most similar items.
  3. PCA scatter plot: Extract the item weight matrix, detach it from the computation graph, apply sklearn.decomposition.PCA with 2 components, and plot the results to visualize latent structure.
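The three steps above can be sketched end to end on randomly initialized item parameters. This is a stand-in for a trained model, not fastai's actual EmbeddingDotBias class; the sizes and names here are illustrative:

```python
import torch
import torch.nn as nn
from sklearn.decomposition import PCA

torch.manual_seed(0)
n_items, n_factors = 100, 8

# Stand-ins for a trained model's i_weight and i_bias embedding layers
i_weight = nn.Embedding(n_items, n_factors)
i_bias = nn.Embedding(n_items, 1)

# 1. Bias ranking: squeeze the (n_items, 1) weight to 1D, then argsort
bias = i_bias.weight.squeeze()            # shape: (n_items,)
worst = bias.argsort()[:5]                # 5 lowest-bias item indices
best = bias.argsort(descending=True)[:5]  # 5 highest-bias item indices

# 2. Cosine similarity: query row broadcast against all rows
factors = i_weight.weight                 # shape: (n_items, n_factors)
query = factors[0][None]                  # shape: (1, n_factors)
sims = nn.CosineSimilarity(dim=1)(factors, query)  # shape: (n_items,)

# 3. PCA: detach from the graph before handing to numpy/sklearn
components = PCA(n_components=2).fit_transform(factors.detach().numpy())
```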

Usage

Use these patterns after training any collaborative filtering model. For the dot-product model (EmbeddingDotBias), access embeddings via learn.model.i_weight, learn.model.i_bias, learn.model.u_weight, and learn.model.u_bias. For neural models, the embedding layers are accessible through the model's embeds attribute.
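For the neural model, a minimal sketch of pulling weights out of an embeds-style nn.ModuleList. The ordering of the user and item layers follows the DataLoaders' categorical column order, and the sizes below are illustrative, so verify both on your own model:

```python
import torch.nn as nn

# Stand-in for EmbeddingNN's `embeds` attribute: a ModuleList with one
# embedding layer per categorical variable (here: users first, then items)
embeds = nn.ModuleList([
    nn.Embedding(1000, 50),  # user embedding (n_users, user_factors)
    nn.Embedding(1700, 60),  # item embedding (n_items, item_factors)
])

user_factors = embeds[0].weight  # shape: (1000, 50)
item_factors = embeds[1].weight  # shape: (1700, 60)
```

Unlike the dot-product model, the user and item embeddings of a neural model may have different widths, so check each layer's shape before reusing the dot-product recipes.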

Code Reference

Source Location

  • Repository: fastbook
  • File: translations/cn/08_collab.md (Lines 504-613)

Signature

# Bias extraction and ranking
model.i_bias.weight.squeeze() -> Tensor  # shape: (n_items,)

# Embedding weight extraction
model.i_weight.weight -> Tensor  # shape: (n_items, n_factors)

# Cosine similarity between item embeddings
nn.CosineSimilarity(dim=1)(
    movie_factors,           # shape: (n_items, n_factors)
    query_vector[None]       # shape: (1, n_factors), broadcast against movie_factors
) -> Tensor                  # shape: (n_items,)

# PCA dimensionality reduction
from sklearn.decomposition import PCA
PCA(n_components=2).fit_transform(
    movie_factors.detach().cpu().numpy()  # shape: (n_items, n_factors)
) -> ndarray                          # shape: (n_items, 2)

Import

import torch.nn as nn
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

I/O Contract

Inputs

  • learn.model (EmbeddingDotBias or EmbeddingNN, required): a trained collaborative filtering model with accessible embedding layers
  • dls.classes (dict, required): mapping from contiguous indices to original category labels (e.g., movie titles)
  • query_item (str, optional): item title for the cosine similarity search, e.g. 'Silence of the Lambs, The (1991)'

Outputs

  • Bias ranking, lowest (list of str): item titles with the most negative bias (universally disliked)
  • Bias ranking, highest (list of str): item titles with the most positive bias (universally liked)
  • Cosine similarity result (str): title of the item most similar to the query
  • PCA scatter plot (matplotlib Figure): 2D scatter of items positioned by their top-2 principal components

Usage Examples

Bias Ranking

# Extract movie biases from the trained dot-product model
movie_bias = learn.model.i_bias.weight.squeeze()

# Find 5 movies with the lowest bias (universally disliked)
idxs = movie_bias.argsort()[:5]
worst_movies = [dls.classes['title'][i] for i in idxs]
print('Lowest bias:', worst_movies)
# Output: ['Children of the Corn: The Gathering (1996)',
#          'Lawnmower Man 2: Beyond Cyberspace (1996)',
#          'Beautician and the Beast, The (1997)',
#          'Crow: City of Angels, The (1996)',
#          'Home Alone 3 (1997)']

# Find 5 movies with the highest bias (universally loved)
idxs = movie_bias.argsort(descending=True)[:5]
best_movies = [dls.classes['title'][i] for i in idxs]
print('Highest bias:', best_movies)
# Output: ['Titanic (1997)', "Schindler's List (1993)",
#          'Shawshank Redemption, The (1994)',
#          'L.A. Confidential (1997)',
#          'Silence of the Lambs, The (1991)']
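As a usage note, torch.topk returns the top-k values and indices in one call, without slicing a full argsort; a minimal sketch on a synthetic bias tensor:

```python
import torch

# Synthetic stand-in for the squeezed bias vector
bias = torch.tensor([0.3, -1.2, 0.9, 0.0, -0.5])

# Highest-bias items: largest=True is the default
top_vals, top_idxs = bias.topk(2)
# Lowest-bias items: flip largest to False
low_vals, low_idxs = bias.topk(2, largest=False)
```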

Cosine Similarity Search

import torch.nn as nn

# Extract item embedding weights
movie_factors = learn.model.i_weight.weight

# Look up the index for a query movie
idx = dls.classes['title'].o2i['Silence of the Lambs, The (1991)']

# Compute cosine similarity between this movie and all others
distances = nn.CosineSimilarity(dim=1)(movie_factors, movie_factors[idx][None])

# Find the most similar movie; the query itself has similarity 1.0 and
# sorts first, so take the entry at position 1
most_similar_idx = distances.argsort(descending=True)[1]
most_similar_title = dls.classes['title'][most_similar_idx]
print('Most similar to Silence of the Lambs:', most_similar_title)
# Output: 'Dial M for Murder (1954)'
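If the catalog contains duplicate or near-identical embeddings, taking position 1 of the sorted order can still return the query itself; a more defensive pattern masks the query index before ranking. A sketch with synthetic tensors (the sizes are illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
factors = torch.randn(10, 4)  # synthetic item embeddings
idx = 3                       # index of the query item

sims = nn.CosineSimilarity(dim=1)(factors, factors[idx][None])
sims[idx] = -float('inf')     # exclude the query itself outright
most_similar_idx = sims.argmax().item()
```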

PCA Visualization

from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Extract item embeddings and convert to numpy
movie_factors_np = learn.model.i_weight.weight.detach().cpu().numpy()

# Reduce to 2 dimensions with PCA
pca = PCA(n_components=2)
components = pca.fit_transform(movie_factors_np)

# Plot the 2D projection
fig, ax = plt.subplots(figsize=(12, 8))
ax.scatter(components[:, 0], components[:, 1], alpha=0.3, s=10)

# Annotate a few well-known movies for reference
notable_movies = [
    'Toy Story (1995)', 'Star Wars (1977)',
    'Titanic (1997)', 'Fargo (1996)',
    'Silence of the Lambs, The (1991)'
]
for title in notable_movies:
    idx = dls.classes['title'].o2i[title]
    ax.annotate(title, (components[idx, 0], components[idx, 1]),
                fontsize=8, alpha=0.8)

ax.set_xlabel('First Principal Component')
ax.set_ylabel('Second Principal Component')
ax.set_title('Movie Embeddings - PCA Projection')
plt.tight_layout()
plt.show()
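Before trusting a 2D projection, it is worth checking how much variance the top two components actually capture; sklearn's pca.explained_variance_ratio_ reports this. A sketch on synthetic data built to have strong two-dimensional structure (real embeddings often capture far less in two components):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic "embeddings": rank-2 structure plus a little isotropic noise
base = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 50)) * 3.0
factors = base + rng.normal(size=(200, 50)) * 0.1

pca = PCA(n_components=2)
components = pca.fit_transform(factors)
captured = pca.explained_variance_ratio_.sum()  # fraction of variance kept
```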

Related Pages

Implements Principle

Requires Environment
