Principle: Roboflow RF-DETR Training Loop Execution
| Knowledge Sources | |
|---|---|
| Domains | Object_Detection, Training |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
The iterative process of updating model parameters through forward passes, loss computation, and backpropagation over the training dataset.
Description
The training loop in RF-DETR implements a modern training pipeline with:
- Gradient accumulation: Splitting effective batch size across multiple forward-backward passes to simulate larger batches
- Mixed precision (AMP): Using bfloat16 for forward/backward passes with float32 master weights for memory efficiency
- Gradient clipping: Constraining gradient norms to prevent training instability
- Multi-scale training: Randomly varying input resolution each iteration for improved scale robustness
- EMA updates: Maintaining an exponential moving average of model weights after each optimization step
- Drop path scheduling: Adjusting stochastic depth rates throughout training
- LR scheduling: Step or cosine learning rate decay with optional warmup
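Several of the mechanics above can be sketched together in one optimization step. The following is a toy scalar model in plain Python, not RF-DETR's actual implementation: the loss per sample is assumed to be (w - t)^2, the optimizer is plain SGD rather than AdamW, and all names and defaults (`lr`, `clip_norm`, `ema_decay`) are illustrative.

```python
def train_step(w, ema_w, micro_batches, lr=0.1, clip_norm=1.0, ema_decay=0.9):
    """One step with gradient accumulation, clipping, and an EMA update.

    Toy scalar model: loss per sample t is (w - t)^2, so d(loss)/dw = 2*(w - t).
    """
    grad = 0.0
    n = sum(len(mb) for mb in micro_batches)
    for mb in micro_batches:
        # Gradient accumulation: sum gradients over several small
        # forward-backward passes to simulate one larger batch.
        for t in mb:
            grad += 2.0 * (w - t) / n
    # Gradient clipping: rescale the gradient if its norm exceeds clip_norm.
    norm = abs(grad)
    if norm > clip_norm:
        grad *= clip_norm / norm
    # Optimizer update (plain SGD here; RF-DETR uses AdamW).
    w -= lr * grad
    # EMA update: exponential moving average of the weights after each step.
    ema_w = ema_decay * ema_w + (1 - ema_decay) * w
    return w, ema_w
```

Running this repeatedly on a fixed target shows the expected behavior: early steps are clipped to a fixed-size update, later steps shrink geometrically, and the EMA weight trails the raw weight smoothly.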
Usage
This principle is applied during model fine-tuning on custom datasets. The training loop handles all low-level training mechanics automatically.
Theoretical Basis
The training loop optimizes a set prediction loss combining:
- Classification loss: Focal loss or IoU-aware BCE for class predictions
- Box regression loss: L1 loss + Generalized IoU loss for bounding box predictions
- Auxiliary losses: Applied at intermediate decoder layers to improve gradient flow
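To make the individual terms concrete, here are scalar sketches of a binary focal loss and the GIoU-based box loss. These are illustrative implementations, not RF-DETR's actual code; the `alpha` and `gamma` defaults follow common focal-loss conventions, and boxes are assumed to be `(x1, y1, x2, y2)` corner tuples.

```python
import math

def focal_loss(p, target, alpha=0.25, gamma=2.0):
    # Binary focal loss for one predicted probability p and target in {0, 1}.
    # The (1 - p)^gamma factor down-weights easy, well-classified examples.
    if target == 1:
        return -alpha * (1.0 - p) ** gamma * math.log(p)
    return -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)

def giou_loss(a, b):
    # Generalized IoU loss for two boxes given as (x1, y1, x2, y2).
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest enclosing box C; the penalty term pushes disjoint boxes together.
    c_w = max(ax2, bx2) - min(ax1, bx1)
    c_h = max(ay2, by2) - min(ay1, by1)
    c_area = c_w * c_h
    giou = iou - (c_area - union) / c_area
    return 1.0 - giou
```

Unlike plain IoU, the GIoU loss is non-zero (and informative) even for non-overlapping boxes, which is why it is paired with L1 loss for box regression.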
The optimizer (AdamW) applies decoupled weight decay regularization. Learning rate scheduling follows either step decay or cosine annealing with warmup.
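A cosine schedule with linear warmup can be sketched as follows. This is a common formulation, not necessarily RF-DETR's exact one; `base_lr`, `warmup_steps`, and `min_lr` are illustrative defaults.

```python
import math

def lr_at(step, total_steps, base_lr=1e-4, warmup_steps=500, min_lr=0.0):
    """Cosine-annealed learning rate with linear warmup."""
    if step < warmup_steps:
        # Linear warmup from ~0 up to base_lr over the first warmup_steps.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay from base_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

The warmup phase avoids large, noisy updates while AdamW's moment estimates are still unreliable; the cosine tail then decays the rate smoothly toward `min_lr`.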
Related Pages
Implemented By
Uses Heuristic