Principle:Roboflow Rf detr Checkpoint Management
| Knowledge Sources | |
|---|---|
| Domains | Training, Model_Selection |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
The strategy for saving, tracking, and selecting the best model checkpoint during training. It combines an exponential moving average (EMA) of the model weights with best-metric tracking.
Description
Checkpoint management ensures the best-performing model is preserved during training:
- Regular checkpoints: Saved each epoch and at configurable intervals
- Best regular checkpoint: The epoch with the highest mAP among regular model evaluations
- EMA checkpoint: An exponential moving average of the model weights, which often generalizes better than the raw (non-averaged) weights
- Best total checkpoint: The overall best between regular and EMA models, stripped of optimizer state for deployment
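The bookkeeping behind these checkpoints can be sketched in plain Python. This is a minimal illustration under stated assumptions, not RF-DETR's code:

- `BestCheckpointTracker` and `TRAINING_ONLY_KEYS` are hypothetical names.
- Checkpoints are modeled as plain dicts.
- The key names `"model"`, `"optimizer"`, and `"lr_scheduler"` are assumed, not taken from the library.

```python
from copy import deepcopy
from typing import Any, Dict, Optional

# Hypothetical key names for training-only state; a real checkpoint may differ.
TRAINING_ONLY_KEYS = {"optimizer", "lr_scheduler"}


class BestCheckpointTracker:
    """Illustrative stand-in for best-metric tracking (not RF-DETR's BestMetricHolder)."""

    def __init__(self) -> None:
        # Each entry holds (mAP, epoch, checkpoint dict) for the best evaluation so far.
        self.best: Dict[str, Optional[tuple]] = {"regular": None, "ema": None}

    def update(self, kind: str, epoch: int, map_value: float,
               checkpoint: Dict[str, Any]) -> bool:
        """Record an evaluation of the 'regular' or 'ema' model; True if it is a new best."""
        current = self.best[kind]
        if current is None or map_value > current[0]:
            # Deep-copy so later training steps cannot mutate the saved snapshot.
            self.best[kind] = (map_value, epoch, deepcopy(checkpoint))
            return True
        return False

    def best_total(self) -> Dict[str, Any]:
        """Pick the better of the regular and EMA bests and strip training-only state."""
        candidates = [(kind, entry) for kind, entry in self.best.items() if entry is not None]
        if not candidates:
            raise ValueError("no evaluations recorded yet")
        kind, (map_value, epoch, checkpoint) = max(candidates, key=lambda item: item[1][0])
        deployable = {k: v for k, v in checkpoint.items() if k not in TRAINING_ONLY_KEYS}
        return {"source": kind, "epoch": epoch, "map": map_value, "checkpoint": deployable}


tracker = BestCheckpointTracker()
state = {"model": [0.1, 0.2], "optimizer": {"lr": 1e-4}, "lr_scheduler": {"step": 3}}
tracker.update("regular", epoch=0, map_value=0.41, checkpoint=state)
tracker.update("ema", epoch=0, map_value=0.43, checkpoint=state)
tracker.update("regular", epoch=1, map_value=0.40, checkpoint=state)  # not a new best
best = tracker.best_total()
print(best["source"], best["epoch"], sorted(best["checkpoint"]))  # ema 0 ['model']
```

In this sketch, ties go to the regular model because `max` returns the first maximal candidate. The actual selection rule may differ.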
The ModelEma class maintains the EMA model with configurable decay and warmup. The BestMetricHolder tracks the best mAP across both regular and EMA models.
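Below is a minimal sketch of an EMA with a decay rate and tau-based warmup. It assumes parameters are plain lists of floats so it runs without PyTorch. `SimpleModelEma` is a hypothetical stand-in for `ModelEma`, not its actual implementation. The decay and tau values in the usage lines are arbitrary illustrations, not RF-DETR defaults.

```python
import math
from typing import Dict, List

Params = Dict[str, List[float]]


class SimpleModelEma:
    """Hypothetical EMA of parameters with tau-based warmup (not RF-DETR's ModelEma)."""

    def __init__(self, params: Params, decay: float = 0.999, tau: float = 0.0) -> None:
        self.decay = decay
        self.tau = tau
        self.updates = 0
        # Start the EMA as an exact copy of the current weights.
        self.ema = {name: list(values) for name, values in params.items()}

    def current_decay(self) -> float:
        """Effective decay: ramps from ~0 toward `decay` when tau > 0, else constant."""
        if self.tau > 0:
            return self.decay * (1.0 - math.exp(-self.updates / self.tau))
        return self.decay

    def update(self, params: Params) -> None:
        """Blend the latest weights into the EMA: ema <- d * ema + (1 - d) * theta."""
        self.updates += 1
        d = self.current_decay()
        for name, values in params.items():
            ema_values = self.ema[name]
            for i, v in enumerate(values):
                ema_values[i] = d * ema_values[i] + (1.0 - d) * v


ema = SimpleModelEma({"w": [0.0]}, decay=0.9, tau=0.0)
ema.update({"w": [1.0]})
print(round(ema.ema["w"][0], 4))  # 0.1
```

The update counter is incremented before the decay is computed. With tau > 0, the very first update therefore uses a decay close to zero, so the EMA starts by closely following the trained weights.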
Usage
This principle is applied automatically during training. After training completes, the checkpoint_best_total.pth file contains the best model ready for inference or deployment.
Theoretical Basis
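Exponential Moving Average (EMA) of model weights provides a form of temporal ensembling. At each update, the EMA weights are blended with the current model weights θ:
$$\theta_{\text{EMA}} \leftarrow \alpha\,\theta_{\text{EMA}} + (1 - \alpha)\,\theta$$
where α is the decay rate. With tau-based warmup, the effective decay ramps up with the number of updates t:
$$\alpha_t = \alpha\left(1 - e^{-t/\tau}\right)$$
As a result, early updates track the model closely, and α_t approaches α once t is several multiples of τ.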
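The EMA update and the warmup schedule can be checked numerically. The sketch below uses hypothetical helper names and illustrative values (decay 0.9997, tau 2000) that are not claimed to be RF-DETR's defaults. It does two things:

- It evaluates the warmup schedule α_t = α(1 − e^(−t/τ)).
- It shows that the recursive EMA update equals an explicit weighted average of past weights, with weight (1 − α)·α^(n−k) on the k-th snapshot and α^n left on the initial one. That weighting is what makes EMA a temporal ensemble.

```python
import math
from typing import List


def warmup_decay(updates: int, decay: float, tau: float) -> float:
    """Effective decay alpha_t = decay * (1 - exp(-t / tau))."""
    return decay * (1.0 - math.exp(-updates / tau))


def ema_recursive(values: List[float], alpha: float) -> float:
    """Apply ema <- alpha * ema + (1 - alpha) * theta over a sequence of snapshots."""
    ema = values[0]
    for v in values[1:]:
        ema = alpha * ema + (1.0 - alpha) * v
    return ema


def ema_unrolled(values: List[float], alpha: float) -> float:
    """The same result written as an explicit weighted average of all snapshots."""
    n = len(values) - 1
    total = alpha ** n * values[0]  # weight remaining on the initial snapshot
    for k in range(1, n + 1):
        # Snapshot k was blended in n - k steps before the end.
        total += (1.0 - alpha) * alpha ** (n - k) * values[k]
    return total


for t in (1, 100, 2000, 10000):
    # Near 0 at t=1, about 0.632 * decay at t=tau, close to decay for t >> tau.
    print(t, round(warmup_decay(t, decay=0.9997, tau=2000.0), 4))

history = [0.0, 1.0, 2.0, 3.0]
print(ema_recursive(history, 0.9), ema_unrolled(history, 0.9))  # equal up to rounding
```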
Because EMA weights average out the noise of individual optimizer steps, they tend to settle in flatter regions of the loss landscape. This often improves generalization, particularly on small datasets.