Principle: Roboflow RF-DETR Training Loop Execution
| Knowledge Sources | |
|---|---|
| Domains | Object_Detection, Training |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
The iterative process of updating model parameters through forward passes, loss computation, and backpropagation over the training dataset.
Description
The training loop in RF-DETR implements a modern training pipeline with:
- Gradient accumulation: Splitting effective batch size across multiple forward-backward passes to simulate larger batches
- Mixed precision (AMP): Using bfloat16 for forward/backward passes with float32 master weights for memory efficiency
- Gradient clipping: Constraining gradient norms to prevent training instability
- Multi-scale training: Randomly varying input resolution each iteration for improved scale robustness
- EMA updates: Maintaining an exponential moving average of model weights after each optimization step
- Drop path scheduling: Adjusting stochastic depth rates throughout training
- LR scheduling: Step or cosine learning rate decay with optional warmup
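Several of the mechanics above can be sketched together in one optimization step. The following is a toy scalar model in plain Python, not RF-DETR's actual implementation: the loss per sample is assumed to be (w - t)^2, the optimizer is plain SGD rather than AdamW, and all names and defaults (`lr`, `clip_norm`, `ema_decay`) are illustrative.

```python
def train_step(w, ema_w, micro_batches, lr=0.1, clip_norm=1.0, ema_decay=0.9):
    """One step with gradient accumulation, clipping, and an EMA update.

    Toy scalar model: loss per sample t is (w - t)^2, so d(loss)/dw = 2*(w - t).
    """
    grad = 0.0
    n = sum(len(mb) for mb in micro_batches)
    for mb in micro_batches:
        # Gradient accumulation: sum gradients over several small
        # forward-backward passes to simulate one larger batch.
        for t in mb:
            grad += 2.0 * (w - t) / n
    # Gradient clipping: rescale the gradient if its norm exceeds clip_norm.
    norm = abs(grad)
    if norm > clip_norm:
        grad *= clip_norm / norm
    # Optimizer update (plain SGD here; RF-DETR uses AdamW).
    w -= lr * grad
    # EMA update: exponential moving average of the weights after each step.
    ema_w = ema_decay * ema_w + (1 - ema_decay) * w
    return w, ema_w
```

Running this repeatedly on a fixed target shows the expected behavior: early steps are clipped to a fixed-size update, later steps shrink geometrically, and the EMA weight trails the raw weight smoothly.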
Usage
This principle is applied during model fine-tuning on custom datasets. The training loop handles all low-level training mechanics automatically.
Theoretical Basis
The training loop optimizes a set prediction loss combining:
- Classification loss: Focal loss or IoU-aware BCE for class predictions
- Box regression loss: L1 loss + Generalized IoU loss for bounding box predictions
- Auxiliary losses: Applied at intermediate decoder layers to improve gradient flow
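To make the individual terms concrete, here are scalar sketches of a binary focal loss and the GIoU-based box loss. These are illustrative implementations, not RF-DETR's actual code; the `alpha` and `gamma` defaults follow common focal-loss conventions, and boxes are assumed to be `(x1, y1, x2, y2)` corner tuples.

```python
import math

def focal_loss(p, target, alpha=0.25, gamma=2.0):
    # Binary focal loss for one predicted probability p and target in {0, 1}.
    # The (1 - p)^gamma factor down-weights easy, well-classified examples.
    if target == 1:
        return -alpha * (1.0 - p) ** gamma * math.log(p)
    return -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)

def giou_loss(a, b):
    # Generalized IoU loss for two boxes given as (x1, y1, x2, y2).
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest enclosing box C; the penalty term pushes disjoint boxes together.
    c_w = max(ax2, bx2) - min(ax1, bx1)
    c_h = max(ay2, by2) - min(ay1, by1)
    c_area = c_w * c_h
    giou = iou - (c_area - union) / c_area
    return 1.0 - giou
```

Unlike plain IoU, the GIoU loss is non-zero (and informative) even for non-overlapping boxes, which is why it is paired with L1 loss for box regression.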
The optimizer (AdamW) applies decoupled weight decay regularization. Learning rate scheduling follows either step decay or cosine annealing with warmup.
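A cosine schedule with linear warmup can be sketched as follows. This is a common formulation, not necessarily RF-DETR's exact one; `base_lr`, `warmup_steps`, and `min_lr` are illustrative defaults.

```python
import math

def lr_at(step, total_steps, base_lr=1e-4, warmup_steps=500, min_lr=0.0):
    """Cosine-annealed learning rate with linear warmup."""
    if step < warmup_steps:
        # Linear warmup from ~0 up to base_lr over the first warmup_steps.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay from base_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

The warmup phase avoids large, noisy updates while AdamW's moment estimates are still unreliable; the cosine tail then decays the rate smoothly toward `min_lr`.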
Related Pages
Implemented By
Uses Heuristic