
Implementation:Recommenders BaseModel Run Eval

From Leeroopedia


Knowledge Sources
Domains: News Recommendation, Evaluation Metrics
Last Updated: 2026-02-10 00:00 GMT

Overview

A concrete tool for evaluating a trained news recommendation model on validation or test data. It computes impression-level AUC, MRR, NDCG@5, and NDCG@10 metrics.

Description

BaseModel.run_eval is the primary evaluation entry point for all neural news recommendation models that inherit from BaseModel (including NRMSModel, NAMLModel, LSTURModel, and NPAModel). It performs the following:

  1. Mode selection — Checks self.support_quick_scoring to decide between fast and slow evaluation:
    • If True, delegates to run_fast_eval which pre-computes news and user embeddings.
    • If False, delegates to run_slow_eval which runs the full scorer per impression.
  2. Metric computation — Passes the grouped labels and predictions to cal_metric from deeprec_utils, which computes the metrics specified in self.hparams.metrics.
  3. Result return — Returns a dictionary mapping metric names to their computed values.
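The metric computation in step 2 can be illustrated with a minimal numpy sketch of per-impression MRR and NDCG@k. This is an illustration of the standard formulas, not the library's cal_metric implementation (which also relies on scikit-learn for AUC):

```python
import numpy as np

def mrr_score(labels, scores):
    """Reciprocal-rank score for one impression (1.0 = clicked item ranked first)."""
    order = np.argsort(scores)[::-1]            # rank candidates by score, descending
    ranked = np.asarray(labels)[order]
    rr = ranked / (np.arange(len(ranked)) + 1)  # relevance weighted by 1/rank
    return rr.sum() / ranked.sum()

def ndcg_score(labels, scores, k):
    """NDCG@k for one impression with binary relevance labels."""
    order = np.argsort(scores)[::-1][:k]
    gains = 2 ** np.asarray(labels, dtype=float)[order] - 1
    dcg = (gains / np.log2(np.arange(len(order)) + 2)).sum()
    ideal = np.sort(np.asarray(labels, dtype=float))[::-1][:k]
    idcg = ((2 ** ideal - 1) / np.log2(np.arange(len(ideal)) + 2)).sum()
    return dcg / idcg

# One impression: five candidate articles, the second one was clicked.
labels = [0, 1, 0, 0, 0]
scores = [0.1, 0.9, 0.3, 0.2, 0.5]
print(mrr_score(labels, scores))       # 1.0 -- clicked item ranked first
print(ndcg_score(labels, scores, 5))   # 1.0
```

run_eval averages such per-impression values over all impressions to produce mean_mrr, ndcg@5, and ndcg@10.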

The slow evaluation path iterates over all test data batches, collects per-sample predictions, labels, and impression indices, then groups them. The fast evaluation path encodes all news articles and users once, then computes scores via numpy dot products.
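Both paths reduce to the same grouped structure before metrics are computed. A sketch of the two ingredients, with made-up arrays standing in for the model's real outputs and embeddings:

```python
import numpy as np

rng = np.random.default_rng(42)

# --- Fast path: score candidates with dot products over cached embeddings ---
news_vecs = rng.standard_normal((100, 64))    # one cached vector per article
user_vec = rng.standard_normal(64)            # one cached vector per user
candidate_ids = [3, 17, 42, 99]               # candidates in one impression
scores = news_vecs[candidate_ids] @ user_vec  # no per-impression forward pass

# --- Slow path: group flat per-sample outputs by impression index ---
imp_indices = np.array([0, 0, 0, 1, 1])
labels = np.array([0, 1, 0, 1, 0])
preds = np.array([0.2, 0.8, 0.1, 0.6, 0.3])
group_labels = [labels[imp_indices == i].tolist() for i in np.unique(imp_indices)]
group_preds = [preds[imp_indices == i].tolist() for i in np.unique(imp_indices)]
print(group_labels)   # [[0, 1, 0], [1, 0]]
```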

Usage

Call run_eval after training to assess model quality. It is also called automatically at the end of each epoch by fit() to report validation metrics.
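The per-epoch hook can be sketched as a hypothetical training loop. fit_sketch below is illustrative only; the library's actual fit() takes news/behaviors file paths and handles batching internally:

```python
def fit_sketch(model, train_batches, valid_news_file, valid_behaviors_file, epochs=3):
    """Hypothetical loop showing where fit() invokes run_eval each epoch."""
    history = []
    for epoch in range(epochs):
        for batch in train_batches:
            model.train(batch)  # one gradient step per batch
        # End of epoch: report validation metrics via run_eval.
        metrics = model.run_eval(valid_news_file, valid_behaviors_file)
        history.append(metrics)
    return history
```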

Code Reference

Source Location

Signature

def run_eval(self, news_filename: str, behaviors_file: str) -> dict:
    """Evaluate the given file and returns some evaluation metrics.

    Args:
        news_filename (str): Path to the news metadata file (news.tsv).
        behaviors_file (str): Path to the user behaviors file (behaviors.tsv).

    Returns:
        dict: A dictionary containing evaluation metrics
              (e.g., {"group_auc": 0.67, "mean_mrr": 0.33, "ndcg@5": 0.36, "ndcg@10": 0.42}).
    """

Import

# Accessed via the NRMSModel class (inherits from BaseModel)
from recommenders.models.newsrec.models.nrms import NRMSModel
from recommenders.models.newsrec.io.mind_iterator import MINDIterator

model = NRMSModel(hparams, MINDIterator, seed=42)
# model.run_eval(...) is inherited from BaseModel

I/O Contract

Parameter | Type | Description
news_filename | str | Path to the news.tsv file containing news article metadata
behaviors_file | str | Path to the behaviors.tsv file containing user impression logs

Return | Type | Description
res | dict | Dictionary mapping metric names to values, e.g., {"group_auc": 0.67, "mean_mrr": 0.33, "ndcg@5": 0.36, "ndcg@10": 0.42}

Usage Examples

import os

# After training the model
valid_news_file = os.path.join(data_path, "valid", "news.tsv")
valid_behaviors_file = os.path.join(data_path, "valid", "behaviors.tsv")

# Run evaluation
eval_results = model.run_eval(valid_news_file, valid_behaviors_file)

# Print results
for metric, value in sorted(eval_results.items()):
    print(f"{metric}: {value:.4f}")

# Example output:
# group_auc: 0.6713
# mean_mrr: 0.3298
# ndcg@10: 0.4231
# ndcg@5: 0.3612

Dependencies

  • tensorflow — For running model inference during slow evaluation
  • numpy — For array operations and grouping predictions
  • recommenders.models.deeprec.deeprec_utils.cal_metric — Computes AUC, MRR, NDCG, and other ranking metrics from grouped predictions
