Principle: News Prediction Export
| Knowledge Sources | |
|---|---|
| Team | Recommenders |
| Domains | News Recommendation, Inference, Embedding |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Efficient news recommendation inference is achieved by pre-computing news and user embeddings once, then scoring all candidate-user pairs via fast dot products instead of running the full model for each impression.
Description
Standard (slow) news recommendation inference runs the entire model (news encoder + user encoder + scorer) for every impression. This is computationally expensive because:
- The same news article may appear in thousands of impressions and gets re-encoded each time.
- The same user's click history is re-encoded for every impression they appear in.
The fast evaluation principle eliminates this redundancy through a two-phase approach:
Phase 1: Embedding Pre-computation
- News Encoding — Run the news encoder on every unique news article in the dataset exactly once, producing a dictionary mapping news IDs to their embedding vectors.
- User Encoding — Run the user encoder once per impression's click history, producing a dictionary mapping impression indices to user embedding vectors.
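The two encoding passes above can be sketched as follows. This is a minimal illustration, not the library's actual code: `precompute_embeddings`, `toy_encoder`, and the `(id, words)` record shapes are all hypothetical stand-ins for the model's real encoders and data loaders.

```python
import numpy as np

def precompute_embeddings(news, behaviors, news_encoder, user_encoder):
    """Run each encoder exactly once per unique item (hypothetical helpers).

    news:      iterable of (news_id, title_words) pairs
    behaviors: iterable of (impr_index, clicked_news_titles) pairs
    """
    # Phase 1a: one encoder forward pass per unique news article
    news_vecs = {news_id: news_encoder(title) for news_id, title in news}
    # Phase 1b: one encoder forward pass per impression's click history
    user_vecs = {idx: user_encoder(history) for idx, history in behaviors}
    return news_vecs, user_vecs

# Toy encoder standing in for the neural encoders: hash words into a vector.
def toy_encoder(words, d=4):
    vec = np.zeros(d)
    for w in words:
        vec[hash(w) % d] += 1.0
    return vec

news = [("N1", ["rain", "today"]), ("N2", ["stocks", "rise"])]
behaviors = [(0, ["rain", "stocks"])]
news_vecs, user_vecs = precompute_embeddings(news, behaviors, toy_encoder, toy_encoder)
```

After this phase, every later lookup is a dictionary access; no encoder runs again.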
Phase 2: Fast Scoring
- For each impression in the behaviors file:
- Look up the pre-computed news vectors for all candidate articles.
- Look up the pre-computed user vector for this impression.
- Compute scores via numpy dot product (no TensorFlow inference needed).
- Collect the impression index, labels, and predictions.
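Under those assumptions, the per-impression loop above reduces to dictionary lookups plus one matrix-vector product. A sketch with fixed toy vectors (the dict names mirror the pseudocode later in this article, not the library's exact attributes):

```python
import numpy as np

# Pre-computed embeddings from Phase 1, here just fixed toy vectors (d = 2).
news_vecs = {"N1": np.array([1.0, 0.0]),
             "N2": np.array([0.0, 1.0]),
             "N3": np.array([0.5, 0.5])}
user_vecs = {0: np.array([0.2, 0.8])}

def score_impression(impr_index, candidate_news_ids):
    # Pure lookups and numpy math; no model inference happens here.
    news_matrix = np.stack([news_vecs[n] for n in candidate_news_ids])  # (k, d)
    user_vec = user_vecs[impr_index]                                    # (d,)
    return news_matrix @ user_vec                                       # (k,)

scores = score_impression(0, ["N1", "N2", "N3"])
# N2 scores highest: its vector aligns best with this user's vector.
```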
This approach provides significant speedup, especially on large datasets where the same articles and users appear across many impressions. The trade-off is memory usage (all embeddings must fit in memory), which is acceptable for most practical dataset sizes.
The pre-computed embeddings are also stored as self.news_vecs and self.user_vecs on the model object, making them available for downstream analysis (e.g., nearest-neighbor lookup, embedding visualization, or serving in a production system).
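As one example of downstream use, the stored news vectors support a simple cosine nearest-neighbor lookup. This sketch operates on a plain dict with made-up vectors; `nearest_news` is a hypothetical helper, not part of the model API.

```python
import numpy as np

news_vecs = {"N1": np.array([1.0, 0.0]),
             "N2": np.array([0.9, 0.1]),
             "N3": np.array([0.0, 1.0])}

def nearest_news(query_id, news_vecs, k=1):
    """Return the k article IDs whose embeddings are closest (cosine) to query_id."""
    q = news_vecs[query_id]
    q = q / np.linalg.norm(q)
    sims = {nid: float(v @ q / np.linalg.norm(v))
            for nid, v in news_vecs.items() if nid != query_id}
    return sorted(sims, key=sims.get, reverse=True)[:k]

neighbors = nearest_news("N1", news_vecs)  # N2 points almost the same way as N1
```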
Usage
Use fast prediction export when you need efficient inference on large test sets, when you want to export embeddings for a production serving system, or when you need to analyze the learned representations. It is enabled by setting support_quick_scoring=True in the hyperparameters.
Theoretical Basis
Embedding Pre-computation
News Encoding (run once per unique article):

```
for each news article n in news_file:
    news_vecs[n.id] = NewsEncoder(n.title_words)
```

User Encoding (run once per unique impression):

```
for each user impression u in behaviors_file:
    user_vecs[u.impr_index] = UserEncoder(u.clicked_news_titles)
```
Fast Dot-Product Scoring
```
for each impression i in behaviors_file:
    candidate_news_ids = [n1, n2, ..., nk]  # news shown in impression
    labels = [l1, l2, ..., lk]              # click labels (0 or 1)

    # Stack pre-computed news vectors into a matrix
    news_matrix = stack([news_vecs[n] for n in candidate_news_ids])  # shape: (k, d)

    # Get pre-computed user vector
    user_vec = user_vecs[i.impr_index]  # shape: (d,)

    # Score via dot product
    scores = news_matrix @ user_vec  # shape: (k,)
```
Complexity Comparison
Slow eval: O(I * C * model_inference_cost), where I = number of impressions and C = average candidates per impression.

Fast eval: O(N * news_encode_cost + U * user_encode_cost + I * C * d), where N = unique news articles, U = unique users, and d = embedding dimension.

Since N << I*C and U << I, and a dot product is far cheaper than a full model inference, fast eval provides an orders-of-magnitude speedup on large datasets.
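To make the asymptotics concrete, here is a back-of-the-envelope comparison. All sizes and the per-pass encoder cost are illustrative assumptions, not measurements from any real dataset:

```python
# Hypothetical sizes, for illustration only.
I = 300_000             # impressions
C = 40                  # average candidates per impression
N = 60_000              # unique news articles
U = 250_000             # unique user histories (at most I)
d = 400                 # embedding dimension
encode_cost = 1_000_000 # assumed rough op count for one encoder forward pass

# Slow eval: a full model pass for every (impression, candidate) pair.
slow = I * C * encode_cost
# Fast eval: encode each unique item once, then cheap dot products.
fast = (N + U) * encode_cost + I * C * d

speedup = slow / fast  # dozens of times faster under these assumptions
```

The dominant term in fast eval is the one-time encoding cost; the dot-product term (I * C * d) is comparatively negligible, which is why the savings grow as the same articles and users recur across more impressions.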