Principle:Roboflow Rf detr Checkpoint Management

From Leeroopedia


Knowledge Sources
Domains: Training, Model_Selection
Last Updated: 2026-02-08 15:00 GMT

Overview

The strategy for saving, tracking, and selecting the best model checkpoint during training, using an exponential moving average (EMA) of the model weights together with best-metric tracking.

Description

Checkpoint management ensures that the best-performing model is preserved during training. It maintains four kinds of checkpoint:

  • Regular checkpoints: Saved every epoch and at configurable intervals
  • Best regular checkpoint: The weights from the epoch with the highest mAP among evaluations of the regular (non-EMA) model
  • EMA checkpoint: An exponential moving average of the model weights, which often generalizes better than the raw weights
  • Best total checkpoint: Whichever of the regular and EMA models scored the highest mAP overall, stripped of optimizer state for deployment

The ModelEma class maintains the EMA model with configurable decay and warmup. The BestMetricHolder tracks the best mAP across both regular and EMA models.
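The selection logic described above can be sketched as a small tracker that, after each evaluation, decides which checkpoint files to (re)write. This is a minimal sketch under stated assumptions: the class name BestMetricTracker is hypothetical and does not reproduce BestMetricHolder's actual interface, and every file name except checkpoint_best_total.pth is an illustrative placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BestMetricTracker:
    """Hypothetical stand-in for BestMetricHolder (not its real API).

    Tracks the best mAP of the regular model, of the EMA model, and the
    overall best across both.
    """
    best_regular: float = float("-inf")
    best_ema: float = float("-inf")
    best_total: float = float("-inf")
    best_total_source: Optional[str] = None  # "regular" or "ema"
    best_total_epoch: Optional[int] = None

    def update(self, epoch: int, regular_map: float,
               ema_map: Optional[float] = None) -> List[str]:
        """Record one epoch's mAP values and return the checkpoint files
        that would be written this epoch (names are hypothetical, except
        checkpoint_best_total.pth)."""
        to_save = ["checkpoint.pth"]  # regular checkpoint, every epoch
        if regular_map > self.best_regular:
            self.best_regular = regular_map
            to_save.append("checkpoint_best_regular.pth")
        if ema_map is not None and ema_map > self.best_ema:
            self.best_ema = ema_map
            to_save.append("checkpoint_best_ema.pth")
        # Overall best across both models; on a tie the regular model wins.
        source, score = "regular", regular_map
        if ema_map is not None and ema_map > regular_map:
            source, score = "ema", ema_map
        if score > self.best_total:
            self.best_total = score
            self.best_total_source = source
            self.best_total_epoch = epoch
            to_save.append("checkpoint_best_total.pth")  # stripped for deployment
        return to_save


tracker = BestMetricTracker()
print(tracker.update(0, regular_map=0.30, ema_map=0.28))
# ['checkpoint.pth', 'checkpoint_best_regular.pth', 'checkpoint_best_ema.pth', 'checkpoint_best_total.pth']
print(tracker.update(1, regular_map=0.29, ema_map=0.33))
# ['checkpoint.pth', 'checkpoint_best_ema.pth', 'checkpoint_best_total.pth']
```

Keeping separate best values for the regular and EMA models means the best total checkpoint can switch sources between epochs, as in epoch 1 above, where the EMA model overtakes the best regular score.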

Usage

This principle is applied automatically during training. After training completes, the checkpoint_best_total.pth file contains the best model ready for inference or deployment.
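Stripping optimizer state, as done for the best total checkpoint, amounts to dropping training-only entries from the saved dictionary. The sketch below assumes the checkpoint is a plain dict with keys such as "model", "optimizer", and "lr_scheduler"; these key names are assumptions, so inspect a real checkpoint's keys before relying on them. In a PyTorch workflow the dict would typically come from torch.load and be written back with torch.save; plain dicts are used here so the example runs without PyTorch.

```python
from typing import Any, Dict

# Hypothetical names for training-only entries (assumption, not confirmed).
TRAINING_ONLY_KEYS = ("optimizer", "lr_scheduler")


def strip_for_deployment(checkpoint: Dict[str, Any]) -> Dict[str, Any]:
    """Return a shallow copy of a checkpoint dict with optimizer and
    scheduler state removed, keeping the model weights and metadata."""
    return {k: v for k, v in checkpoint.items() if k not in TRAINING_ONLY_KEYS}


ckpt = {"model": {"w": [0.1, 0.2]}, "optimizer": {"lr": 1e-4},
        "lr_scheduler": {"last_epoch": 11}, "epoch": 11}
print(sorted(strip_for_deployment(ckpt)))  # ['epoch', 'model']
```

Optimizer state (for example, Adam's per-parameter moment estimates) can be as large as the weights themselves, so removing it shrinks the file without affecting inference.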

Theoretical Basis

Exponential Moving Average (EMA) of model weights provides a form of temporal ensembling:

$$\theta_{\mathrm{EMA}}^{(t)} = \alpha\,\theta_{\mathrm{EMA}}^{(t-1)} + (1-\alpha)\,\theta^{(t)}$$

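where $\alpha$ is the decay rate and $\theta^{(t)}$ denotes the model weights after update step $t$. With tau-based warmup, $\alpha$ is replaced by an effective decay that ramps up from 0 toward $\alpha$ as $t$ grows, at a rate set by the time constant $\tau$: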

$$\alpha_{\mathrm{eff}}(t) = \alpha\left(1 - e^{-t/\tau}\right)$$

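Because the EMA averages weights over many update steps, it damps the noise of individual optimizer updates. EMA weights tend to lie in smoother regions of the loss landscape and often generalize better than the raw weights, particularly on small datasets.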
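The EMA update and its tau-based warmup can be sketched in a few lines. This is a minimal illustration, assuming weights are represented as named lists of floats rather than tensors; the class name SimpleEma and the values alpha=0.99 and tau=10 are hypothetical and are neither ModelEma's API nor RF-DETR's defaults.

```python
import math
from typing import Dict, List


def effective_decay(alpha: float, tau: float, t: int) -> float:
    """alpha_eff = alpha * (1 - exp(-t / tau)): 0 at t = 0, approaches alpha."""
    return alpha * (1.0 - math.exp(-t / tau))


class SimpleEma:
    """Minimal EMA over named lists of floats standing in for model tensors.

    Illustrative only; this is not the ModelEma class itself.
    """

    def __init__(self, weights: Dict[str, List[float]], alpha: float, tau: float):
        self.alpha = alpha
        self.tau = tau
        self.updates = 0
        # The EMA copy starts from the initial weights.
        self.shadow = {name: list(values) for name, values in weights.items()}

    def update(self, weights: Dict[str, List[float]]) -> None:
        """Apply theta_ema = a * theta_ema + (1 - a) * theta with a = alpha_eff."""
        self.updates += 1
        a = effective_decay(self.alpha, self.tau, self.updates)
        for name, values in weights.items():
            self.shadow[name] = [a * s + (1.0 - a) * w
                                 for s, w in zip(self.shadow[name], values)]


# Early updates (t small relative to tau) let the EMA track the raw weights
# closely; later updates approach the full decay alpha.
for t in (1, 10, 100):
    print(t, round(effective_decay(0.99, 10.0, t), 4))
# 1 0.0942
# 10 0.6258
# 100 0.99
```

The warmup matters because with a fixed $\alpha$ close to 1, the EMA copy would stay dominated by the early, poorly trained weights for many steps; ramping the decay up lets the average follow the model closely at first and smooth it more heavily later.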

