Heuristic:Recommenders team Recommenders SAR Cold Start Items

Knowledge Sources	SAR source code warning
Domains	Recommendation_Systems, Debugging, Collaborative_Filtering
Last Updated	2026-02-10 00:00 GMT

Overview

SAR assigns a score of 0 to items in the test set that were not seen during training, which can silently degrade recommendation quality.

Description

The SAR (Simple Algorithm for Recommendations) model builds an item co-occurrence matrix during training. When generating recommendations at prediction time, any items present in the test set but absent from the training set cannot have their co-occurrence scores computed. The model handles this by appending a zero-score column and mapping unknown items to index `n_items`, effectively giving them a score of 0. A `logger.warning` is emitted, but the prediction still completes without raising an error.
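Because the prediction completes without an error, the warning is easy to miss in a long run. The following is a minimal sketch of counting it with a standard `logging.Handler` attached to the root logger. The `ColdStartWatcher` class and the `demo.sar` logger name are hypothetical, and the stand-in `warning(...)` call replaces a real `predict()` call so that the snippet runs without the library installed.

```python
import logging

# Prefix of the message quoted in this page.
WARNING_PREFIX = "Items found in test not seen during training"


class ColdStartWatcher(logging.Handler):
    """Hypothetical helper: counts cold-start warnings that reach it."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.hits = 0

    def emit(self, record):
        if record.getMessage().startswith(WARNING_PREFIX):
            self.hits += 1


watcher = ColdStartWatcher()
root = logging.getLogger()
root.addHandler(watcher)
try:
    # Stand-in for a real model.predict(test) call; the logger name is made up.
    logging.getLogger("demo.sar").warning(
        "Items found in test not seen during training, new items will have score of 0"
    )
finally:
    root.removeHandler(watcher)

cold_start_warned = watcher.hits > 0
```

Attaching to the root logger relies on normal log propagation, so it should not matter which logger name the library uses, provided propagation has not been disabled.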

Usage

Be aware of this heuristic when evaluating SAR models on a train/test split that may leave some items only in the test set. This is especially relevant on datasets with a long tail of infrequent items, where an item with one or two interactions can land entirely in the test set under a random split; chronological splits can also introduce items that first appear after the cutoff. If evaluation metrics are unexpectedly low, or many predicted scores are exactly 0, check for cold-start items in the test set.
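A quick way to run that check is to compare the item IDs of the two splits before training. This is a minimal sketch with pandas; the `cold_start_report` helper and the toy data are hypothetical, and the `itemID` column name is an assumption that should be replaced by whatever column your data uses.

```python
import pandas as pd


def cold_start_report(train, test, col_item="itemID"):
    """Return the unseen item IDs and the share of test rows they account for."""
    unseen_mask = ~test[col_item].isin(train[col_item])
    unseen_items = sorted(test.loc[unseen_mask, col_item].unique().tolist())
    return unseen_items, float(unseen_mask.mean())


# Toy data: item 99 appears only in the test split.
train = pd.DataFrame(
    {"userID": [1, 1, 2, 2], "itemID": [10, 11, 10, 12], "rating": [5, 4, 3, 5]}
)
test = pd.DataFrame(
    {"userID": [1, 2, 2], "itemID": [12, 11, 99], "rating": [4, 2, 5]}
)

unseen_items, unseen_share = cold_start_report(train, test)
print(f"unseen items: {unseen_items}, share of test rows: {unseen_share:.1%}")
```

A non-trivial share of unseen rows suggests that low metrics may come from the split rather than from the model.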

The Insight (Rule of Thumb)

Action: Monitor the warning log for "Items found in test not seen during training" after calling `SARSingleNode.recommend_k_items()` or `SARSingleNode.predict()`.
Value: Unseen items receive a score of exactly 0.
Trade-off: The model gracefully handles cold-start items (no crash) but may silently reduce metric scores if many test items are unseen.
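Mitigation: Use stratified splitting (`python_stratified_split`) to improve item coverage across train/test. Note that it stratifies by user by default; `filter_by="item"` is the setting that stratifies by item. Alternatively, filter cold-start items from evaluation.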
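The filtering option can be sketched in a few lines of pandas. The `drop_cold_start_rows` helper below is hypothetical (it is not part of the library), and the `itemID` column name is an assumption.

```python
import pandas as pd


def drop_cold_start_rows(train, test, col_item="itemID"):
    """Keep only test rows whose item also occurs in train.

    Returns the filtered test frame and the number of rows dropped.
    """
    known = test[col_item].isin(train[col_item])
    return test[known].reset_index(drop=True), int((~known).sum())


train = pd.DataFrame({"userID": [1, 1, 2], "itemID": [10, 11, 12]})
test = pd.DataFrame({"userID": [1, 2, 2], "itemID": [12, 11, 99]})

eval_test, n_dropped = drop_cold_start_rows(train, test)
print(f"dropped {n_dropped} cold-start row(s); {len(eval_test)} remain")
```

Reporting the number of dropped rows alongside the metrics keeps the evaluation honest, since filtering changes what the metric measures.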

Reasoning

SAR is a pure collaborative filtering method with no content features, so it cannot score items it has never observed. The code handles this by:

Mapping unknown items to `np.NaN` via `self.item2index.get(item, np.NaN)`
Detecting NaN values with `np.isnan(item_ids)`
Appending a zero-score column: `np.zeros((self.n_users, 1))`
Remapping NaN item IDs to `self.n_items` (the new zero column index)

Code evidence from `recommenders/models/sar/sar_singlenode.py:575-590`:

```python
item_ids = np.asarray(
    list(
        map(
            lambda item: self.item2index.get(item, np.NaN),
            test[self.col_item].values,
        )
    )
)
nans = np.isnan(item_ids)
if any(nans):
    logger.warning(
        "Items found in test not seen during training, new items will have score of 0"
    )
    test_scores = np.append(test_scores, np.zeros((self.n_users, 1)), axis=1)
    item_ids[nans] = self.n_items
    item_ids = item_ids.astype("int64")
```
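To see the effect in isolation, the same steps can be reproduced on a toy score matrix. This is a standalone sketch, not the library code: the vocabulary, scores, and test pairs are made up, and the final fancy-indexing lookup is an assumption about how per-pair scores are read out. It uses `np.nan` because the `np.NaN` alias was removed in NumPy 2.0.

```python
import numpy as np

# Hypothetical training vocabulary and a made-up score matrix (users x items).
item2index = {"a": 0, "b": 1, "c": 2}
n_users, n_items = 2, len(item2index)
test_scores = np.array([[0.9, 0.1, 0.4],
                        [0.2, 0.7, 0.3]])

# Test pairs as (user index, item ID); "zzz" was never seen in training.
test_users = np.array([0, 1, 1])
test_items = ["a", "zzz", "c"]

# Unknown items map to NaN, which forces a float array.
item_ids = np.asarray([item2index.get(item, np.nan) for item in test_items])
nans = np.isnan(item_ids)
if nans.any():
    # Extra all-zero column at index n_items absorbs every unknown item.
    test_scores = np.append(test_scores, np.zeros((n_users, 1)), axis=1)
    item_ids[nans] = n_items
item_ids = item_ids.astype("int64")

# Assumed readout: one score per (user, item) pair.
predictions = test_scores[test_users, item_ids]
print(predictions)
```

The unseen item ends up with a prediction of exactly 0 regardless of the user, which matches the behaviour described above.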
