
Heuristic: Roboflow RF-DETR Small Dataset Oversampling

From Leeroopedia



Knowledge Sources
Domains Optimization, Computer_Vision, Deep_Learning
Last Updated 2026-02-08 15:00 GMT

Overview

An automatic oversampling mechanism that switches to uniform random sampling with replacement when the training dataset contains fewer images than five effective batches, preventing training instability on very small datasets.

Description

When the training dataset is smaller than `effective_batch_size * min_batches` (default: 5 batches), RF-DETR automatically replaces the standard sampler with a `RandomSampler(replacement=True)` that guarantees at least 5 full batches per epoch. This prevents degenerate training behavior (too few gradient updates per epoch) on datasets with as few as 10-50 images.

Usage

This heuristic is applied automatically when training on very small datasets. Be aware of it when working with datasets under ~80 images: with the defaults `batch_size=4` and `grad_accum_steps=4`, the effective batch size is 16, so the threshold is `effective_batch_size * min_batches = 16 * 5 = 80` images.
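As a quick check, the trigger condition can be computed directly (a minimal sketch; `oversampling_triggers` is an illustrative helper, not part of the RF-DETR API):

```python
def oversampling_triggers(dataset_size, batch_size=4, grad_accum_steps=4, min_batches=5):
    """Return True if RF-DETR would switch to oversampling for this dataset size."""
    effective_batch_size = batch_size * grad_accum_steps
    return dataset_size < effective_batch_size * min_batches

# With the defaults, the threshold is 4 * 4 * 5 = 80 images.
print(oversampling_triggers(50))   # 50-image dataset  -> True
print(oversampling_triggers(200))  # 200-image dataset -> False
```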

The Insight (Rule of Thumb)

  • Action: The system automatically switches to oversampling when `len(dataset) < effective_batch_size * min_batches`.
  • Value: `min_batches=5` (hardcoded default). This means at least 5 gradient updates per epoch are guaranteed.
  • Trade-off: Images are sampled with replacement, so some images appear multiple times per epoch while others may not appear at all. This introduces variance but prevents the training loop from having too few steps to learn meaningful gradients.
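The with-replacement trade-off can be illustrated with the standard library alone (a sketch of the sampling statistics, not the actual `RandomSampler`; the sizes are the defaults discussed above):

```python
import random
from collections import Counter

random.seed(0)
dataset_size = 20       # a very small dataset
num_samples = 16 * 5    # effective_batch_size * min_batches = 80 draws per epoch

# Uniform sampling with replacement, as RandomSampler(replacement=True) does.
indices = random.choices(range(dataset_size), k=num_samples)
counts = Counter(indices)

# Some images are drawn several times; a few may be skipped this epoch.
print(max(counts.values()) > 1)     # duplicates are expected
print(len(counts) <= dataset_size)  # coverage may be incomplete
```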

Reasoning

With very small datasets and large effective batch sizes, standard sampling without replacement would produce only 1-2 batches per epoch. This makes learning rate scheduling erratic (too few steps for warmup/cosine decay) and produces noisy gradient estimates. By oversampling to at least 5 batches, the training loop has enough steps per epoch for the LR scheduler to function properly and for loss to decrease steadily.
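To see why the LR scheduler suffers, consider a step-based linear warmup (a rough sketch with hypothetical numbers; the helper name is illustrative):

```python
def epochs_to_finish_warmup(warmup_steps, steps_per_epoch):
    """Epochs needed before a step-based warmup completes (ceiling division)."""
    return -(-warmup_steps // steps_per_epoch)

# A hypothetical 100-step warmup on a tiny dataset:
print(epochs_to_finish_warmup(100, 1))  # 1 batch/epoch   -> 100 epochs of warmup
print(epochs_to_finish_warmup(100, 5))  # 5 batches/epoch -> 20 epochs of warmup
```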

The code logs a message when this kicks in:

# rfdetr/main.py:272-280
if len(dataset_train) < effective_batch_size * min_batches:
    logger.info(
        f"Training with uniform sampler because dataset is too small: "
        f"{len(dataset_train)} < {effective_batch_size * min_batches}"
    )
    sampler = torch.utils.data.RandomSampler(
        dataset_train,
        replacement=True,
        num_samples=effective_batch_size * min_batches,
    )
