
Implementation:Recommenders BaseModel Run Eval

From Leeroopedia


Knowledge Sources
Domains: News Recommendation, Evaluation Metrics
Last Updated: 2026-02-10 00:00 GMT

Overview

A concrete tool for evaluating a trained news recommendation model on validation or test data. It computes impression-level AUC, MRR, NDCG@5, and NDCG@10 metrics.

Description

BaseModel.run_eval is the primary evaluation entry point for all neural news recommendation models that inherit from BaseModel (including NRMSModel, NAMLModel, LSTURModel, and NPAModel). It performs the following:

  1. Mode selection — Checks self.support_quick_scoring to decide between fast and slow evaluation:
    • If True, delegates to run_fast_eval which pre-computes news and user embeddings.
    • If False, delegates to run_slow_eval which runs the full scorer per impression.
  2. Metric computation — Passes the grouped labels and predictions to cal_metric from deeprec_utils, which computes the metrics specified in self.hparams.metrics.
  3. Result return — Returns a dictionary mapping metric names to their computed values.
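The metric computation in step 2 can be illustrated with a minimal numpy sketch of per-impression MRR and NDCG@k. This is an illustration of the standard formulas, not the library's cal_metric implementation (which also relies on scikit-learn for AUC):

```python
import numpy as np

def mrr_score(labels, scores):
    """Reciprocal-rank score for one impression (1.0 = clicked item ranked first)."""
    order = np.argsort(scores)[::-1]            # rank candidates by score, descending
    ranked = np.asarray(labels)[order]
    rr = ranked / (np.arange(len(ranked)) + 1)  # relevance weighted by 1/rank
    return rr.sum() / ranked.sum()

def ndcg_score(labels, scores, k):
    """NDCG@k for one impression with binary relevance labels."""
    order = np.argsort(scores)[::-1][:k]
    gains = 2 ** np.asarray(labels, dtype=float)[order] - 1
    dcg = (gains / np.log2(np.arange(len(order)) + 2)).sum()
    ideal = np.sort(np.asarray(labels, dtype=float))[::-1][:k]
    idcg = ((2 ** ideal - 1) / np.log2(np.arange(len(ideal)) + 2)).sum()
    return dcg / idcg

# One impression: five candidate articles, the second one was clicked.
labels = [0, 1, 0, 0, 0]
scores = [0.1, 0.9, 0.3, 0.2, 0.5]
print(mrr_score(labels, scores))       # 1.0 -- clicked item ranked first
print(ndcg_score(labels, scores, 5))   # 1.0
```

run_eval averages such per-impression values over all impressions to produce mean_mrr, ndcg@5, and ndcg@10.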

The slow evaluation path iterates over all test data batches, collects per-sample predictions, labels, and impression indices, then groups them. The fast evaluation path encodes all news articles and users once, then computes scores via numpy dot products.
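Both paths reduce to the same grouped structure before metrics are computed. A sketch of the two ingredients, with made-up arrays standing in for the model's real outputs and embeddings:

```python
import numpy as np

rng = np.random.default_rng(42)

# --- Fast path: score candidates with dot products over cached embeddings ---
news_vecs = rng.standard_normal((100, 64))    # one cached vector per article
user_vec = rng.standard_normal(64)            # one cached vector per user
candidate_ids = [3, 17, 42, 99]               # candidates in one impression
scores = news_vecs[candidate_ids] @ user_vec  # no per-impression forward pass

# --- Slow path: group flat per-sample outputs by impression index ---
imp_indices = np.array([0, 0, 0, 1, 1])
labels = np.array([0, 1, 0, 1, 0])
preds = np.array([0.2, 0.8, 0.1, 0.6, 0.3])
group_labels = [labels[imp_indices == i].tolist() for i in np.unique(imp_indices)]
group_preds = [preds[imp_indices == i].tolist() for i in np.unique(imp_indices)]
print(group_labels)   # [[0, 1, 0], [1, 0]]
```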

Usage

Call run_eval after training to assess model quality. It is also called automatically at the end of each epoch by fit() to report validation metrics.
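The per-epoch hook can be sketched as a hypothetical training loop. fit_sketch below is illustrative only; the library's actual fit() takes news/behaviors file paths and handles batching internally:

```python
def fit_sketch(model, train_batches, valid_news_file, valid_behaviors_file, epochs=3):
    """Hypothetical loop showing where fit() invokes run_eval each epoch."""
    history = []
    for epoch in range(epochs):
        for batch in train_batches:
            model.train(batch)  # one gradient step per batch
        # End of epoch: report validation metrics via run_eval.
        metrics = model.run_eval(valid_news_file, valid_behaviors_file)
        history.append(metrics)
    return history
```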

Code Reference

Source Location

Signature

def run_eval(self, news_filename: str, behaviors_file: str) -> dict:
    """Evaluate the given file and returns some evaluation metrics.

    Args:
        news_filename (str): Path to the news metadata file (news.tsv).
        behaviors_file (str): Path to the user behaviors file (behaviors.tsv).

    Returns:
        dict: A dictionary containing evaluation metrics
              (e.g., {"group_auc": 0.67, "mean_mrr": 0.33, "ndcg@5": 0.36, "ndcg@10": 0.42}).
    """

Import

# Accessed via the NRMSModel class (inherits from BaseModel)
from recommenders.models.newsrec.models.nrms import NRMSModel
from recommenders.models.newsrec.io.mind_iterator import MINDIterator

model = NRMSModel(hparams, MINDIterator, seed=42)
# model.run_eval(...) is inherited from BaseModel

I/O Contract

Parameter | Type | Description
news_filename | str | Path to the news.tsv file containing news article metadata
behaviors_file | str | Path to the behaviors.tsv file containing user impression logs

Return | Type | Description
res | dict | Dictionary mapping metric names to values, e.g., {"group_auc": 0.67, "mean_mrr": 0.33, "ndcg@5": 0.36, "ndcg@10": 0.42}

Usage Examples

import os

# After training the model
valid_news_file = os.path.join(data_path, "valid", "news.tsv")
valid_behaviors_file = os.path.join(data_path, "valid", "behaviors.tsv")

# Run evaluation
eval_results = model.run_eval(valid_news_file, valid_behaviors_file)

# Print results
for metric, value in sorted(eval_results.items()):
    print(f"{metric}: {value:.4f}")

# Example output:
# group_auc: 0.6713
# mean_mrr: 0.3298
# ndcg@10: 0.4231
# ndcg@5: 0.3612

Dependencies

  • tensorflow — For running model inference during slow evaluation
  • numpy — For array operations and grouping predictions
  • recommenders.models.deeprec.deeprec_utils.cal_metric — Computes AUC, MRR, NDCG, and other ranking metrics from grouped predictions
