Principle:LaurentMazare Tch rs Dataset Iteration

From Leeroopedia


Knowledge Sources
Domains Deep_Learning, Data_Loading
Last Updated 2026-02-08 14:00 GMT

Overview

Pattern for iterating over datasets in mini-batches with optional shuffling and device transfer for training loops.

Description

Dataset iteration splits a large dataset into mini-batches for stochastic gradient descent training. Each step of the iterator yields a pair of tensors (inputs, labels) whose first dimension is the requested batch size. The iterator supports shuffling (randomizing sample order each epoch) and device transfer (moving batches to the GPU). The final batch may be smaller than the requested batch size when the dataset size is not an exact multiple of it. This is the standard data-feeding mechanism for training workflows in tch-rs.
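The batching behavior described above, fixed-size (inputs, labels) pairs plus a possibly smaller final batch, can be sketched in plain Rust with no tch dependency. The function name `mini_batches` is illustrative and not part of the tch-rs API; in tch-rs this role is played by `Iter2` over `Tensor`s.

```rust
// Illustrative sketch: split parallel input/label slices into mini-batches.
// Plain `Vec`s stand in for tensors so the example is self-contained.
fn mini_batches(xs: &[f32], ys: &[i64], batch_size: usize) -> Vec<(Vec<f32>, Vec<i64>)> {
    xs.chunks(batch_size)
        .zip(ys.chunks(batch_size))
        .map(|(x, y)| (x.to_vec(), y.to_vec()))
        .collect()
}

fn main() {
    // 10 samples with batch size 4 -> batches of 4, 4, and a smaller last batch of 2.
    let xs: Vec<f32> = (0..10).map(|i| i as f32).collect();
    let ys: Vec<i64> = (0..10).collect();
    let batches = mini_batches(&xs, &ys, 4);
    assert_eq!(batches.len(), 3);
    assert_eq!(batches[2].0.len(), 2); // partial final batch
    println!("{} batches, last has {} samples", batches.len(), batches[2].0.len());
}
```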

Usage

Use in every training loop to iterate over training data in mini-batches. Call .shuffle() for randomized sample ordering and .to_device(device) for GPU transfer.

Theoretical Basis

Mini-batch Training:
  For each epoch:
    For each batch (xs, ys) in dataset.train_iter(batch_size).shuffle():
      logits = model.forward(xs)
      loss = loss_fn(logits, ys)
      optimizer.backward_step(loss)
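The per-epoch shuffle in the loop above can be sketched in plain Rust. The `shuffled_indices` helper is illustrative: it uses Fisher-Yates driven by a tiny deterministic linear congruential generator so the example needs no external crate, whereas real code would use a proper RNG (and tch-rs's `.shuffle()` handles this internally over tensor rows).

```rust
// Illustrative sketch: each epoch visits every sample index exactly once,
// in a freshly shuffled order.
fn shuffled_indices(n: usize, seed: &mut u64) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..n).collect();
    // Fisher-Yates shuffle; the LCG step below stands in for a real RNG.
    for i in (1..n).rev() {
        *seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        let j = (*seed >> 33) as usize % (i + 1);
        idx.swap(i, j);
    }
    idx
}

fn main() {
    let mut seed = 42u64;
    for epoch in 0..3 {
        let order = shuffled_indices(8, &mut seed);
        // Every epoch's order is a permutation of 0..8: no sample skipped or repeated.
        let mut sorted = order.clone();
        sorted.sort();
        assert_eq!(sorted, (0..8).collect::<Vec<_>>());
        println!("epoch {}: {:?}", epoch, order);
    }
}
```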

Iter2 API chain:
  dataset.train_iter(64)         → Basic iterator
    .shuffle()                   → Random ordering
    .to_device(Device::Cuda(0))  → GPU transfer
    .return_smaller_last_batch() → Include partial final batch
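The chained configuration above can be mimicked with a small builder in plain Rust. `BatchIter` is not the real tch-rs `Iter2` type (whose fields and method signatures differ); this is purely a sketch of the call shape and of how `return_smaller_last_batch()` changes the number of batches per epoch.

```rust
// Illustrative builder mirroring the Iter2 call chain; not the real tch-rs type.
struct BatchIter {
    n_samples: usize,
    batch_size: usize,
    shuffle: bool,
    return_smaller_last_batch: bool,
}

impl BatchIter {
    fn new(n_samples: usize, batch_size: usize) -> Self {
        Self { n_samples, batch_size, shuffle: false, return_smaller_last_batch: false }
    }
    // Each configuration method consumes and returns self, enabling chaining.
    fn shuffle(mut self) -> Self { self.shuffle = true; self }
    fn return_smaller_last_batch(mut self) -> Self { self.return_smaller_last_batch = true; self }
    // Number of batches one epoch yields under the current settings.
    fn num_batches(&self) -> usize {
        if self.return_smaller_last_batch {
            (self.n_samples + self.batch_size - 1) / self.batch_size // ceiling division
        } else {
            self.n_samples / self.batch_size // partial final batch dropped
        }
    }
}

fn main() {
    // 100 samples, batch size 64: one full batch, plus a partial batch of 36 if kept.
    let drop_last = BatchIter::new(100, 64).shuffle();
    let keep_last = BatchIter::new(100, 64).shuffle().return_smaller_last_batch();
    assert_eq!(drop_last.num_batches(), 1);
    assert_eq!(keep_last.num_batches(), 2);
    println!("shuffle enabled: {}", keep_last.shuffle);
    println!("drop_last: {}, keep_last: {}", drop_last.num_batches(), keep_last.num_batches());
}
```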
