Implementation:Recommenders team Recommenders MINDAll Iterator

Knowledge Sources	Recommenders
Domains	News Recommendation, Data Loading, MIND Dataset
Last Updated	2026-02-10 00:00 GMT

Overview

The MINDAllIterator is a specialized data loader that reads and parses the full MIND dataset format for the NAML news recommendation model, handling multi-field news representations including titles, bodies, categories, and subcategories.

Description

MINDAllIterator extends BaseIterator to provide mini-batch data loading for the NAML model's multi-view architecture. Unlike the simpler MINDIterator which only handles news titles, this iterator processes the complete set of article features required by NAML: title words, body (abstract) words, verticals (categories), and sub-verticals (subcategories).

The iterator loads four pickled dictionaries at initialization: a word dictionary for mapping tokens to indices, a vertical dictionary for category mapping, a sub-vertical dictionary for subcategory mapping, and a user dictionary for user ID indexing. News articles are parsed from a tab-separated file where each article's title and body are tokenized and converted to fixed-length integer index arrays. Behavior logs record each user's click history and impression-level interactions.
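The dictionary loading and fixed-length encoding described above can be sketched roughly as follows. This is a minimal illustration, not the library's code: the `tokenize` and `encode_fixed_length` helpers are hypothetical names, the regex tokenizer and the choice to map unknown tokens to 0 are assumptions, and plain Python lists stand in for the NumPy arrays the iterator produces.

```python
import pickle
import re


def load_dict(file_path):
    """Load a pickled dictionary (for example token -> index) from disk."""
    with open(file_path, "rb") as f:
        return pickle.load(f)


def tokenize(text):
    """Hypothetical tokenizer: lowercase, then split into words and basic punctuation."""
    return re.findall(r"\w+|[.,!?;|]", text.lower())


def encode_fixed_length(text, word_dict, size):
    """Map tokens to indices, truncating to `size` and zero-padding the remainder.

    Tokens missing from `word_dict` stay at 0 (an assumption of this sketch).
    """
    indices = [0] * size
    for pos, token in enumerate(tokenize(text)[:size]):
        indices[pos] = word_dict.get(token, 0)
    return indices


# Toy dictionary standing in for the pickled word dictionary.
word_dict = {"stocks": 1, "rally": 2, "as": 3, "rates": 4, "fall": 5}
title_index = encode_fixed_length("Stocks rally as rates fall!", word_dict, size=8)
```

Every title ends up with the same length regardless of how many tokens it contains, which is what allows articles to be stacked into a rectangular batch.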

During training, the iterator supports negative sampling with a configurable negative-to-positive ratio (npratio). For each positive click in an impression, it samples npratio negative articles and yields the combined batch. When npratio is set to -1, no negative sampling is performed and each news article in an impression is yielded individually. Data is loaded per mini-batch rather than loading the entire dataset into memory, enabling efficient processing of large files.
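The two npratio modes can be illustrated with a small sketch. The function names `sample_negatives` and `build_training_instances` are hypothetical, and padding with a placeholder ID of 0 when an impression has fewer negatives than npratio is an assumption of this example rather than a statement about the library's behavior.

```python
import random


def sample_negatives(negatives, npratio, rng):
    """Return exactly `npratio` negative news IDs.

    If there are too few negatives, pad with the placeholder ID 0 (an assumption
    of this sketch); otherwise sample without replacement.
    """
    if npratio > len(negatives):
        return negatives + [0] * (npratio - len(negatives))
    return rng.sample(negatives, npratio)


def build_training_instances(impression_news, impression_labels, npratio, seed=0):
    """Yield (candidates, labels) pairs for a single impression."""
    rng = random.Random(seed)
    pairs = list(zip(impression_news, impression_labels))
    if npratio > 0:
        positives = [news for news, label in pairs if label == 1]
        negatives = [news for news, label in pairs if label == 0]
        for pos in positives:
            # One positive followed by npratio sampled negatives.
            yield [pos] + sample_negatives(negatives, npratio, rng), [1] + [0] * npratio
    else:
        # No negative sampling: every article is its own instance.
        for news, label in pairs:
            yield [news], [label]


news = [11, 12, 13, 14, 15, 16]
labels = [0, 1, 0, 0, 0, 1]
sampled = list(build_training_instances(news, labels, npratio=2))
unsampled = list(build_training_instances(news, labels, npratio=-1))
padded = list(build_training_instances([1, 2], [1, 0], npratio=3))
```

With npratio=2 each of the two clicked articles produces one instance of three candidates, while npratio=-1 produces six single-candidate instances, one per article in the impression.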

The iterator provides separate loading methods for the different stages of training and evaluation: load_data_from_file for training batches, load_user_from_file for user encoder inference, load_news_from_file for news encoder inference, and load_impression_from_file for impression-level evaluation.

Usage

Use MINDAllIterator when working with the NAML model or any model that requires the full set of MIND article features (title, body, category, subcategory). It is specifically required by the NAML architecture due to its multi-view news encoder design. For models that only require title features (such as NRMS, LSTUR, or NPA), use the simpler MINDIterator instead.

Code Reference

Source Location

Repository: Recommenders
File: recommenders/models/newsrec/io/mind_all_iterator.py
Lines: 1-602

Signature

class MINDAllIterator(BaseIterator):
    def __init__(self, hparams, npratio=-1, col_spliter="\t", ID_spliter="%")
    def load_dict(self, file_path)
    def init_news(self, news_file)
    def init_behaviors(self, behaviors_file)
    def parser_one_line(self, line)
    def load_data_from_file(self, news_file, behavior_file)
    def _convert_data(self, label_list, imp_indexes, user_indexes, candidate_title_indexes, candidate_ab_indexes, candidate_vert_indexes, candidate_subvert_indexes, click_title_indexes, click_ab_indexes, click_vert_indexes, click_subvert_indexes)
    def load_user_from_file(self, news_file, behavior_file)
    def _convert_user_data(self, user_indexes, impr_indexes, click_title_indexes, click_ab_indexes, click_vert_indexes, click_subvert_indexes)
    def load_news_from_file(self, news_file)
    def _convert_news_data(self, news_indexes, candidate_title_indexes, candidate_ab_indexes, candidate_vert_indexes, candidate_subvert_indexes)
    def load_impression_from_file(self, behaivors_file)

Import

from recommenders.models.newsrec.io.mind_all_iterator import MINDAllIterator

I/O Contract

Inputs

Name	Type	Required	Description
hparams	object	Yes	Global hyper-parameters containing batch_size, title_size, body_size, his_size, wordDict_file, vertDict_file, subvertDict_file, and userDict_file
npratio	int	No	Negative-to-positive sampling ratio. Default is -1 (no negative sampling). Set to a positive integer for training.
col_spliter	str	No	Column delimiter in data files. Default is tab character.
ID_spliter	str	No	ID delimiter in data files. Default is "%".
news_file	str	Yes (for load methods)	Path to the news file containing article metadata (ID, category, subcategory, title, abstract, URL).
behavior_file	str	Yes (for load methods)	Path to the behaviors file containing user impression logs.
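To make the hparams requirement concrete, the sketch below builds a stand-in object carrying the eight fields listed in the table and checks that none is missing. It is illustrative only: `hparams_sketch`, `REQUIRED_FIELDS`, and `missing_fields` are hypothetical names, the numeric values are arbitrary, and the file paths are placeholders. In practice the hparams object would normally come from the library's own hyper-parameter utilities.

```python
from types import SimpleNamespace

# Fields the Inputs table lists as required on hparams.
REQUIRED_FIELDS = (
    "batch_size",
    "title_size",
    "body_size",
    "his_size",
    "wordDict_file",
    "vertDict_file",
    "subvertDict_file",
    "userDict_file",
)


def missing_fields(hparams):
    """Return the names of required fields that are absent from `hparams`."""
    return [name for name in REQUIRED_FIELDS if not hasattr(hparams, name)]


# Arbitrary illustrative values; the paths are placeholders, not real files.
hparams_sketch = SimpleNamespace(
    batch_size=32,
    title_size=30,
    body_size=50,
    his_size=50,
    wordDict_file="path/to/word_dict.pkl",
    vertDict_file="path/to/vert_dict.pkl",
    subvertDict_file="path/to/subvert_dict.pkl",
    userDict_file="path/to/uid2index.pkl",
)

problems = missing_fields(hparams_sketch)
```

A check like this fails fast with a readable list of missing names before any dictionary file is opened.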

Outputs

Name	Type	Description
training batch (from load_data_from_file)	dict	Dictionary with keys: "impression_index_batch", "user_index_batch", "clicked_title_batch", "clicked_ab_batch", "clicked_vert_batch", "clicked_subvert_batch", "candidate_title_batch", "candidate_ab_batch", "candidate_vert_batch", "candidate_subvert_batch", "labels" -- all as numpy arrays.
user batch (from load_user_from_file)	dict	Dictionary with keys: "user_index_batch", "impr_index_batch", "clicked_title_batch", "clicked_ab_batch", "clicked_vert_batch", "clicked_subvert_batch" -- all as numpy arrays.
news batch (from load_news_from_file)	dict	Dictionary with keys: "news_index_batch", "candidate_title_batch", "candidate_ab_batch", "candidate_vert_batch", "candidate_subvert_batch" -- all as numpy arrays.
impression data (from load_impression_from_file)	tuple	Tuple of (impression_index, impression_news_indices, user_index, impression_labels), yielded once per impression.
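One way to consume the impression tuples during evaluation is to score each candidate article against a precomputed user vector. The sketch below assumes that news vectors are keyed by news index, that user vectors are keyed by impression index, and that the score is a plain dot product; `score_impressions` and the toy data are hypothetical and not part of the library.

```python
def dot(a, b):
    """Dot product of two equal-length vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def score_impressions(impressions, news_vecs, user_vecs):
    """Yield (impression_index, labels, scores) for each impression tuple.

    `impressions` yields (impression_index, news_indices, user_index, labels).
    `user_vecs` is keyed by impression index here (an assumption of this sketch).
    """
    for impr_index, news_indices, _user_index, labels in impressions:
        user_vec = user_vecs[impr_index]
        scores = [dot(news_vecs[n], user_vec) for n in news_indices]
        yield impr_index, labels, scores


# Toy stand-ins for encoder outputs and for load_impression_from_file results.
news_vecs = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.5, 0.5]}
user_vecs = {0: [2.0, 1.0]}
impressions = [(0, [1, 2, 3], 7, [1, 0, 0])]

results = list(score_impressions(impressions, news_vecs, user_vecs))
```

The per-impression labels and scores can then be passed to ranking metrics such as AUC or nDCG.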

Usage Examples

Basic Usage

from recommenders.models.newsrec.io.mind_all_iterator import MINDAllIterator

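# hparams, news_file, and behavior_file are assumed to be defined by the caller.
# Initialize the iterator with hyper-parameters and 4 negatives per positive click
iterator = MINDAllIterator(hparams, npratio=4)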

# Load training batches from news and behavior files
for batch in iterator.load_data_from_file(news_file, behavior_file):
    # batch is a dict of numpy arrays ready for model consumption
    labels = batch["labels"]
    candidate_titles = batch["candidate_title_batch"]
    clicked_titles = batch["clicked_title_batch"]
    # ... process batch through NAML model

# Load user features for inference
for user_batch in iterator.load_user_from_file(news_file, behavior_file):
    user_indices = user_batch["user_index_batch"]
    clicked_history = user_batch["clicked_title_batch"]

# Load news features for inference
for news_batch in iterator.load_news_from_file(news_file):
    news_indices = news_batch["news_index_batch"]
    news_titles = news_batch["candidate_title_batch"]
