Heuristic: Roboflow RF-DETR EMA Best Checkpoint Strategy
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Deep_Learning, Computer_Vision |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
A dual-checkpoint tracking strategy that maintains both the regular and the EMA (Exponential Moving Average) model weights throughout training and automatically selects whichever achieves the higher mAP as the final best model.
Description
During training, RF-DETR tracks two parallel sets of weights: the regular model weights, updated by the optimizer, and an EMA copy that holds a running exponential average of them. At the end of training, the system compares the best mAP@[.50:.95] reached by each variant and copies the winning checkpoint to `checkpoint_best_total.pth`. That file is then stripped of optimizer state so it is smaller for deployment.
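A minimal sketch of the per-epoch bookkeeping described above, in plain Python. It assumes each epoch yields one mAP@[.50:.95] score for the regular weights and one for the EMA weights; the helper name `track_best` and the epoch scores are hypothetical, and the checkpoint writes are replaced by recording which epoch each best score came from.

```python
from typing import Dict, List, Tuple


def track_best(scores: List[Tuple[float, float]]) -> Dict[str, object]:
    """Track the best regular and best EMA mAP across epochs (hypothetical helper).

    `scores` holds one (regular_map, ema_map) pair per epoch.
    """
    best_map, best_map_ema = -1.0, -1.0
    best_epoch, best_epoch_ema = -1, -1
    for epoch, (regular_map, ema_map) in enumerate(scores):
        # Stands in for overwriting checkpoint_best_regular.pth
        if regular_map > best_map:
            best_map, best_epoch = regular_map, epoch
        # Stands in for overwriting checkpoint_best_ema.pth
        if ema_map > best_map_ema:
            best_map_ema, best_epoch_ema = ema_map, epoch
    # Strict '>' as in the selection step, so a tie goes to the regular weights
    best_is_ema = best_map_ema > best_map
    return {
        "best_is_ema": best_is_ema,
        "best_map": best_map,
        "best_map_ema": best_map_ema,
        "winner_epoch": best_epoch_ema if best_is_ema else best_epoch,
    }


result = track_best([(0.41, 0.40), (0.44, 0.45), (0.43, 0.47)])
print(result)  # best_is_ema is True and winner_epoch is 2
```

Note that the two "best" epochs can differ: here the regular weights peak at epoch 1 while the EMA weights peak at epoch 2, which is why two separate best files are kept.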
Usage
This heuristic is applied automatically when `use_ema=True` (the default in `TrainConfig`). The EMA model often performs better because weight averaging smooths out the noise of stochastic mini-batch gradient updates. Set `early_stopping_use_ema=True` if you want early stopping to monitor the EMA model's mAP rather than the regular model's.
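To illustrate why the choice of monitored metric matters, here is a hypothetical patience-based early stopper. The `EarlyStopper` class, its `patience`/`min_delta` fields and the score history are assumptions for illustration only; the `use_ema` field mirrors the intent of `early_stopping_use_ema` but is not RF-DETR's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class EarlyStopper:
    """Hypothetical patience-based early stopping on one mAP stream."""
    patience: int = 3
    min_delta: float = 0.0
    use_ema: bool = False  # mirrors the intent of early_stopping_use_ema
    best: float = float("-inf")
    bad_epochs: int = 0

    def step(self, regular_map: float, ema_map: float) -> bool:
        """Record one epoch's scores; return True when training should stop."""
        current = ema_map if self.use_ema else regular_map
        if current > self.best + self.min_delta:
            self.best = current
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def first_stop(stopper: EarlyStopper, history: List[Tuple[float, float]]) -> Optional[int]:
    """Return the epoch at which the stopper fires, or None if it never does."""
    for epoch, (regular_map, ema_map) in enumerate(history):
        if stopper.step(regular_map, ema_map):
            return epoch
    return None


# Regular mAP plateaus while EMA mAP keeps improving (made-up numbers)
history = [(0.40, 0.40), (0.42, 0.43), (0.42, 0.45), (0.41, 0.46), (0.42, 0.47)]
regular_stop = first_stop(EarlyStopper(patience=2), history)
ema_stop = first_stop(EarlyStopper(patience=2, use_ema=True), history)
print(regular_stop, ema_stop)
```

In this made-up history, monitoring the regular weights would stop training at epoch 3 even though the EMA weights are still improving.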
The Insight (Rule of Thumb)
- Action: Keep `use_ema=True` (the default) and let the system select the best checkpoint automatically.
- Value:
  - `ema_decay=0.993` (default): the smoothing factor; higher values mean more smoothing.
  - `ema_tau=100` (default): controls the warmup period of the EMA updates.
  - Three checkpoint files are saved: `checkpoint_best_regular.pth`, `checkpoint_best_ema.pth`, and `checkpoint_best_total.pth` (the winner).
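The sketch below shows how `ema_decay` and `ema_tau` could interact. It assumes the warmup form `decay * (1 - exp(-updates / tau))`, which several DETR-family EMA implementations use; treat that formula, and the helper names `effective_decay` and `averaging_window`, as assumptions rather than RF-DETR's confirmed code.

```python
import math


def effective_decay(decay: float, tau: float, updates: int) -> float:
    """Warmed-up EMA decay under the assumed form decay * (1 - exp(-updates / tau)).

    Early on the decay is small, so the EMA follows the raw weights closely;
    it approaches `decay` after a few multiples of `tau` updates.
    """
    if tau == 0:
        return decay  # no warmup
    return decay * (1.0 - math.exp(-updates / tau))


def averaging_window(decay: float) -> float:
    """Rough number of recent steps that dominate the average: 1 / (1 - decay)."""
    return 1.0 / (1.0 - decay)


for step in (1, 100, 500):
    print(step, round(effective_decay(0.993, 100, step), 4))
print(round(averaging_window(0.993)))
```

Under this assumed schedule, the defaults give an effective decay of about 0.63 after 100 updates and about 0.986 after 500, with a steady-state averaging window of roughly 143 steps.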
- Trade-off: EMA doubles the memory footprint of the model weights, since it keeps a second copy. For very large models this can be significant. The compute overhead is minimal: one weighted average over the parameters per step.
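A back-of-the-envelope estimate of the extra memory makes the trade-off concrete. The parameter count below is hypothetical, and the sketch assumes the EMA copy holds only weights (no optimizer state) and ignores buffers.

```python
def ema_extra_memory_mb(num_params: int, bytes_per_param: int = 4) -> float:
    """Extra memory for one EMA copy of the weights, in MB (10**6 bytes).

    Counts parameters only; buffers are ignored, and the EMA copy is assumed
    to carry no optimizer state.
    """
    return num_params * bytes_per_param / 1e6


# Hypothetical 30M-parameter model in fp32 and fp16
print(ema_extra_memory_mb(30_000_000))
print(ema_extra_memory_mb(30_000_000, bytes_per_param=2))
```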
- Best practice: Always use `checkpoint_best_total.pth` for inference, since it contains the best weights whether they came from regular or EMA tracking.
Reasoning
EMA acts as a form of ensemble averaging over the training trajectory. The smoothed weights are less sensitive to the noise from individual mini-batches and tend to generalize better. The automatic selection ensures the user always gets the best available model without having to manually compare checkpoints.
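The toy simulation below illustrates only the variance-reduction part of this argument. It assumes a single weight that jitters with Gaussian noise around a fixed hypothetical target value, which is a strong simplification: real training trajectories drift, and the simulation says nothing directly about generalization.

```python
import random
import statistics


def ema_trajectory(values, decay):
    """Return the EMA of a sequence, initialised at the first value."""
    ema = values[0]
    out = [ema]
    for v in values[1:]:
        ema = decay * ema + (1.0 - decay) * v
        out.append(ema)
    return out


random.seed(0)
target = 1.0  # hypothetical "ideal" value of one weight
# Noisy per-step weight values, standing in for mini-batch noise around the target
raw = [target + random.gauss(0.0, 0.1) for _ in range(5000)]
smoothed = ema_trajectory(raw, decay=0.993)

# Compare spread after the EMA has had time to settle
tail = slice(1000, None)
raw_std = statistics.pstdev(raw[tail])
ema_std = statistics.pstdev(smoothed[tail])
print(f"raw std={raw_std:.4f}, ema std={ema_std:.4f}")
```

For independent noise, the EMA's steady-state variance is the raw variance scaled by (1 - decay) / (1 + decay), so with `decay=0.993` the smoothed weight fluctuates far less than the raw one.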
Best checkpoint selection from `rfdetr/main.py:504-512`:
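# Excerpt; assumes `shutil`, rfdetr's `utils`, and a pathlib.Path `output_dir` in scope
best_is_ema = best_map_ema_5095 > best_map_5095  # ties go to the regular weights
if utils.is_main_process():  # only rank 0 writes files in distributed runs
    if best_is_ema:
        shutil.copy2(output_dir / 'checkpoint_best_ema.pth',
                     output_dir / 'checkpoint_best_total.pth')
    else:
        shutil.copy2(output_dir / 'checkpoint_best_regular.pth',
                     output_dir / 'checkpoint_best_total.pth')
    # Drop optimizer state so the final checkpoint is smaller
    utils.strip_checkpoint(output_dir / 'checkpoint_best_total.pth')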
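To exercise this selection logic without a training run, the sketch below recreates it with stand-ins: `pickle` replaces PyTorch serialization, the distributed `is_main_process` check is omitted, and `strip_checkpoint` here is a hypothetical reimplementation that keeps only the `model` and `args` entries. The real `utils.strip_checkpoint` may keep a different set of keys.

```python
import pickle
import shutil
import tempfile
from pathlib import Path


def strip_checkpoint(path: Path) -> None:
    """Hypothetical stand-in: drop training-only state, keep weights and args."""
    with open(path, "rb") as f:
        ckpt = pickle.load(f)
    slim = {k: ckpt[k] for k in ("model", "args") if k in ckpt}
    with open(path, "wb") as f:
        pickle.dump(slim, f)


def select_best(output_dir: Path, best_map: float, best_map_ema: float) -> bool:
    """Copy the winning checkpoint to checkpoint_best_total.pth and strip it."""
    best_is_ema = best_map_ema > best_map
    src = "checkpoint_best_ema.pth" if best_is_ema else "checkpoint_best_regular.pth"
    total = output_dir / "checkpoint_best_total.pth"
    shutil.copy2(output_dir / src, total)
    strip_checkpoint(total)
    return best_is_ema


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    for name, tag in (("checkpoint_best_regular.pth", "regular"),
                      ("checkpoint_best_ema.pth", "ema")):
        with open(out / name, "wb") as f:
            # Fake checkpoint: weights tagged by origin plus training-only state
            pickle.dump({"model": {"tag": tag}, "optimizer": {"lr": 1e-4},
                         "epoch": 10, "args": {"epochs": 12}}, f)
    is_ema = select_best(out, best_map=0.44, best_map_ema=0.47)
    with open(out / "checkpoint_best_total.pth", "rb") as f:
        total_ckpt = pickle.load(f)
    print(is_ema, total_ckpt)
```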
EMA update call in the training loop (the `update` method belongs to the `ModelEma` class in `rfdetr/util/utils.py`):
# Update the EMA copy after every optimizer step; `epoch >= 0` holds
# for every epoch, so in practice this runs on every step
if ema_m is not None:
    if epoch >= 0:
        ema_m.update(model)
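A framework-free analogue of what an EMA wrapper's `update` does to a state dict may help. `SimpleEma` is a hypothetical class, the weights are a dict of floats standing in for tensors, and the warmup form `decay * (1 - exp(-updates / tau))` is an assumption, not a quote of `ModelEma`.

```python
import copy
import math


class SimpleEma:
    """Hypothetical EMA wrapper over a dict of name -> float weights."""

    def __init__(self, weights, decay=0.993, tau=100):
        self.weights = copy.deepcopy(weights)  # separate EMA copy of the weights
        self.decay = decay
        self.tau = tau
        self.updates = 1

    def update(self, model_weights):
        # Assumed warmup: small decay early, approaching `decay` later
        if self.tau:
            d = self.decay * (1 - math.exp(-self.updates / self.tau))
        else:
            d = self.decay
        for name, value in model_weights.items():
            self.weights[name] = d * self.weights[name] + (1 - d) * value
        self.updates += 1


model = {"w": 0.0}
ema = SimpleEma(model)
model["w"] = 5.0
untouched = ema.weights["w"]  # still 0.0: the EMA holds its own copy
for step in range(1, 301):
    model["w"] = step * 0.01  # pretend the optimizer moves the weight steadily
    ema.update(model)
print(untouched, model["w"], round(ema.weights["w"], 3))
```

Because the EMA is a weighted average of past values, it lags behind a steadily moving weight; that lag is the price of the smoothing.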
After training, the model switches to the winning variant (`rfdetr/main.py:531-533`):
# Use the EMA weights if they won; either way, switch to eval mode
if best_is_ema:
    self.model = self.ema_m.module
self.model.eval()