Heuristic:Recommenders team Recommenders SAR Cold Start Items
| Knowledge Sources | |
|---|---|
| Domains | Recommendation_Systems, Debugging, Collaborative_Filtering |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
SAR assigns a score of 0 to items in the test set that were not seen during training, which can silently degrade recommendation quality.
Description
The SAR (Simple Algorithm for Recommendations) model builds an item co-occurrence matrix during training. When generating recommendations at prediction time, any items present in the test set but absent from the training set cannot have their co-occurrence scores computed. The model handles this by appending a zero-score column and mapping unknown items to index `n_items`, effectively giving them a score of 0. A `logger.warning` is emitted, but the prediction still completes without raising an error.
Usage
Apply this heuristic when evaluating SAR models where the train/test split may leave some items only in the test set. This can happen with any split that does not stratify by item, including random and chronological splits, and it is most likely on datasets with a long tail of infrequent items. If you observe unexpectedly low recommendation scores, check for cold-start items in the test set.
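The check described above can be sketched with plain pandas, before any model is trained. This is a minimal illustration, not part of the library: the helper names `find_cold_start_items` and `cold_start_share` are hypothetical, and the column name `itemID` is an assumption that should be replaced with whatever `col_item` the model was configured with.

```python
import pandas as pd


def find_cold_start_items(train, test, col_item="itemID"):
    """Return the set of item IDs that occur in `test` but never in `train`."""
    return set(test[col_item].unique()) - set(train[col_item].unique())


def cold_start_share(train, test, col_item="itemID"):
    """Return the fraction of test rows that refer to a cold-start item."""
    if len(test) == 0:
        return 0.0
    unseen = list(find_cold_start_items(train, test, col_item))
    return float(test[col_item].isin(unseen).mean())


# Toy data: item 99 appears only in the test split.
train = pd.DataFrame({"userID": [1, 1, 2, 2], "itemID": [10, 11, 10, 12]})
test = pd.DataFrame({"userID": [1, 2, 2], "itemID": [12, 11, 99]})

unseen_items = find_cold_start_items(train, test)
share = cold_start_share(train, test)
print(f"cold-start items: {sorted(unseen_items)}, share of test rows: {share:.2f}")
```

A share close to zero suggests the zero-score fallback has little effect on metrics; a large share suggests the split itself deserves a second look.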
The Insight (Rule of Thumb)
- Action: Monitor the warning log for "Items found in test not seen during training" after calling `SARSingleNode.recommend_k_items()` or `SARSingleNode.predict()`.
- Value: Unseen items receive a score of exactly 0.
- Trade-off: The model gracefully handles cold-start items (no crash) but may silently reduce metric scores if many test items are unseen.
- Mitigation: Use stratified splitting by item (`python_stratified_split` with `filter_by="item"`; the default stratifies by user), which divides each item's interactions between train and test, or filter cold-start items out of the evaluation set.
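The filtering option from the mitigation bullet could look like the sketch below. It uses only pandas; the helper name `drop_cold_start_rows` and the column names are hypothetical and chosen for illustration.

```python
import pandas as pd


def drop_cold_start_rows(train, test, col_item="itemID"):
    """Keep only test rows whose item also appears in train.

    Returns the filtered test frame and the number of rows removed.
    """
    known = test[col_item].isin(train[col_item].unique())
    return test[known].reset_index(drop=True), int((~known).sum())


# Toy data: the last test row refers to item 99, which train never contains.
train = pd.DataFrame(
    {"userID": [1, 1, 2], "itemID": [10, 11, 10], "rating": [5.0, 3.0, 4.0]}
)
test = pd.DataFrame(
    {"userID": [1, 2, 2], "itemID": [11, 10, 99], "rating": [4.0, 5.0, 2.0]}
)

test_warm, n_dropped = drop_cold_start_rows(train, test)
print(f"kept {len(test_warm)} rows, dropped {n_dropped} cold-start rows")
```

Filtering changes what the metric measures, so it is worth reporting the number of dropped rows next to any score computed on the filtered set.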
Reasoning
SAR is a pure collaborative filtering method with no content features, so it cannot score items it has never observed. The code handles this by:
- Mapping unknown items to `np.NaN` via `self.item2index.get(item, np.NaN)`
- Detecting NaN values with `np.isnan(item_ids)`
- Appending a zero-score column: `np.zeros((self.n_users, 1))`
- Remapping NaN item IDs to `self.n_items` (the new zero column index)
Code evidence from `recommenders/models/sar/sar_singlenode.py:575-590`:
    item_ids = np.asarray(
        list(
            map(
                lambda item: self.item2index.get(item, np.NaN),
                test[self.col_item].values,
            )
        )
    )
    nans = np.isnan(item_ids)
    if any(nans):
        logger.warning(
            "Items found in test not seen during training, new items will have score of 0"
        )
        test_scores = np.append(test_scores, np.zeros((self.n_users, 1)), axis=1)
        item_ids[nans] = self.n_items
        item_ids = item_ids.astype("int64")
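To see the effect of these steps in isolation, the following standalone sketch replays them on toy data with NumPy only. It is not the library code: the score matrix, the `item2index` mapping, and the test rows are made-up values, and it uses the lowercase `np.nan` because the `np.NaN` alias was removed in NumPy 2.0.

```python
import numpy as np

# Toy stand-ins for the fitted model's state (hypothetical values).
item2index = {"A": 0, "B": 1, "C": 2}
n_items = len(item2index)
n_users = 2

# One row per user, one column per item known from training.
test_scores = np.array([[0.9, 0.1, 0.4],
                        [0.2, 0.7, 0.3]])

# Test rows as (user index, item ID); item "Z" was never seen in training.
test_user_idx = np.array([0, 1, 1])
test_items = ["A", "Z", "C"]

# Unknown items map to NaN, which forces a float array.
item_ids = np.asarray([item2index.get(item, np.nan) for item in test_items])
nans = np.isnan(item_ids)
if nans.any():
    # Append one all-zero column and point every unknown item at it.
    test_scores = np.append(test_scores, np.zeros((n_users, 1)), axis=1)
    item_ids[nans] = n_items
item_ids = item_ids.astype("int64")

# Look up one score per test row.
predictions = test_scores[test_user_idx, item_ids]
print(predictions)
```

The row for item "Z" picks up its score from the appended column, so it comes back as 0 regardless of the user, while rows for known items are unaffected.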