Principle: News Prediction Export
| Knowledge Sources | |
|---|---|
| Team | Recommenders |
| Domains | News Recommendation, Inference, Embedding |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Efficient news recommendation inference is achieved by pre-computing news and user embeddings once, then scoring all candidate-user pairs via fast dot products instead of running the full model for each impression.
Description
Standard (slow) news recommendation inference runs the entire model (news encoder + user encoder + scorer) for every impression. This is computationally expensive because:
- The same news article may appear in thousands of impressions and gets re-encoded each time.
- The same user's click history is re-encoded for every impression they appear in.
The fast evaluation principle eliminates this redundancy through a two-phase approach:
Phase 1: Embedding Pre-computation
- News Encoding — Run the news encoder on every unique news article in the dataset exactly once, producing a dictionary mapping news IDs to their embedding vectors.
- User Encoding — Run the user encoder once per impression's click history, producing a dictionary mapping impression indices to user embedding vectors.
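The two encoding passes above can be sketched as follows. This is a minimal illustration, not the library's actual code: `precompute_embeddings`, `toy_encoder`, and the `(id, words)` record shapes are all hypothetical stand-ins for the model's real encoders and data loaders.

```python
import numpy as np

def precompute_embeddings(news, behaviors, news_encoder, user_encoder):
    """Run each encoder exactly once per unique item (hypothetical helpers).

    news:      iterable of (news_id, title_words) pairs
    behaviors: iterable of (impr_index, clicked_news_titles) pairs
    """
    # Phase 1a: one encoder forward pass per unique news article
    news_vecs = {news_id: news_encoder(title) for news_id, title in news}
    # Phase 1b: one encoder forward pass per impression's click history
    user_vecs = {idx: user_encoder(history) for idx, history in behaviors}
    return news_vecs, user_vecs

# Toy encoder standing in for the neural encoders: hash words into a vector.
def toy_encoder(words, d=4):
    vec = np.zeros(d)
    for w in words:
        vec[hash(w) % d] += 1.0
    return vec

news = [("N1", ["rain", "today"]), ("N2", ["stocks", "rise"])]
behaviors = [(0, ["rain", "stocks"])]
news_vecs, user_vecs = precompute_embeddings(news, behaviors, toy_encoder, toy_encoder)
```

After this phase, every later lookup is a dictionary access; no encoder runs again.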
Phase 2: Fast Scoring
- For each impression in the behaviors file:
- Look up the pre-computed news vectors for all candidate articles.
- Look up the pre-computed user vector for this impression.
- Compute scores via numpy dot product (no TensorFlow inference needed).
- Collect the impression index, labels, and predictions.
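Under those assumptions, the per-impression loop above reduces to dictionary lookups plus one matrix-vector product. A sketch with fixed toy vectors (the dict names mirror the pseudocode later in this article, not the library's exact attributes):

```python
import numpy as np

# Pre-computed embeddings from Phase 1, here just fixed toy vectors (d = 2).
news_vecs = {"N1": np.array([1.0, 0.0]),
             "N2": np.array([0.0, 1.0]),
             "N3": np.array([0.5, 0.5])}
user_vecs = {0: np.array([0.2, 0.8])}

def score_impression(impr_index, candidate_news_ids):
    # Pure lookups and numpy math; no model inference happens here.
    news_matrix = np.stack([news_vecs[n] for n in candidate_news_ids])  # (k, d)
    user_vec = user_vecs[impr_index]                                    # (d,)
    return news_matrix @ user_vec                                       # (k,)

scores = score_impression(0, ["N1", "N2", "N3"])
# N2 scores highest: its vector aligns best with this user's vector.
```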
This approach provides significant speedup, especially on large datasets where the same articles and users appear across many impressions. The trade-off is memory usage (all embeddings must fit in memory), which is acceptable for most practical dataset sizes.
The pre-computed embeddings are also stored as self.news_vecs and self.user_vecs on the model object, making them available for downstream analysis (e.g., nearest-neighbor lookup, embedding visualization, or serving in a production system).
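As one example of downstream use, the stored news vectors support a simple cosine nearest-neighbor lookup. This sketch operates on a plain dict with made-up vectors; `nearest_news` is a hypothetical helper, not part of the model API.

```python
import numpy as np

news_vecs = {"N1": np.array([1.0, 0.0]),
             "N2": np.array([0.9, 0.1]),
             "N3": np.array([0.0, 1.0])}

def nearest_news(query_id, news_vecs, k=1):
    """Return the k article IDs whose embeddings are closest (cosine) to query_id."""
    q = news_vecs[query_id]
    q = q / np.linalg.norm(q)
    sims = {nid: float(v @ q / np.linalg.norm(v))
            for nid, v in news_vecs.items() if nid != query_id}
    return sorted(sims, key=sims.get, reverse=True)[:k]

neighbors = nearest_news("N1", news_vecs)  # N2 points almost the same way as N1
```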
Usage
Use fast prediction export when you need efficient inference on large test sets, when you want to export embeddings for a production serving system, or when you need to analyze the learned representations. It is enabled by setting support_quick_scoring=True in the hyperparameters.
Theoretical Basis
Embedding Pre-computation
News Encoding (run once per unique article):

```
for each news article n in news_file:
    news_vecs[n.id] = NewsEncoder(n.title_words)
```

User Encoding (run once per unique impression):

```
for each user impression u in behaviors_file:
    user_vecs[u.impr_index] = UserEncoder(u.clicked_news_titles)
```
Fast Dot-Product Scoring
```
for each impression i in behaviors_file:
    candidate_news_ids = [n1, n2, ..., nk]  # news shown in impression
    labels = [l1, l2, ..., lk]              # click labels (0 or 1)

    # Stack pre-computed news vectors into a matrix
    news_matrix = stack([news_vecs[n] for n in candidate_news_ids])  # shape: (k, d)

    # Get pre-computed user vector
    user_vec = user_vecs[i.impr_index]  # shape: (d,)

    # Score via dot product
    scores = news_matrix @ user_vec  # shape: (k,)
```
Complexity Comparison
Slow eval: O(I * C * model_inference_cost), where I = number of impressions and C = average candidates per impression.

Fast eval: O(N * news_encode_cost + U * user_encode_cost + I * C * d), where N = unique news articles, U = unique users, and d = embedding dimension.

Since N << I*C and U << I, and a dot product is far cheaper than a full model inference, fast eval provides an orders-of-magnitude speedup on large datasets.
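To make the asymptotics concrete, here is a back-of-the-envelope comparison. All sizes and the per-pass encoder cost are illustrative assumptions, not measurements from any real dataset:

```python
# Hypothetical sizes, for illustration only.
I = 300_000             # impressions
C = 40                  # average candidates per impression
N = 60_000              # unique news articles
U = 250_000             # unique user histories (at most I)
d = 400                 # embedding dimension
encode_cost = 1_000_000 # assumed rough op count for one encoder forward pass

# Slow eval: a full model pass for every (impression, candidate) pair.
slow = I * C * encode_cost
# Fast eval: encode each unique item once, then cheap dot products.
fast = (N + U) * encode_cost + I * C * d

speedup = slow / fast  # dozens of times faster under these assumptions
```

The dominant term in fast eval is the one-time encoding cost; the dot-product term (I * C * d) is comparatively negligible, which is why the savings grow as the same articles and users recur across more impressions.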