
Principle:Volcengine Verl Evaluation And Checkpointing

From Leeroopedia


Knowledge Sources
Domains Training_Infrastructure, Model_Management, Evaluation
Last Updated 2026-02-07 14:00 GMT

Overview

Periodic evaluation of the model on held-out data, combined with checkpoint serialization, enables training monitoring, model selection, and fault recovery.

Description

Evaluation and Checkpointing encompasses two related processes in the RL training loop:

Evaluation: At configurable intervals, the current policy is evaluated on a held-out test set using the same rollout and reward infrastructure as training. This provides metrics (reward scores, generation quality) that track training progress and enable model selection.

Checkpointing: Model weights (and optionally optimizer states) are saved to disk at configurable intervals. Checkpoints serve multiple purposes:

  • Fault tolerance — resume training after interruptions
  • Model selection — pick the best checkpoint based on evaluation metrics
  • Deployment — export trained models in HuggingFace format for serving

In verl, both evaluation and checkpointing are orchestrated by the RayPPOTrainer, which manages the distributed workers and coordinates weight collection across FSDP/Megatron shards.
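
The weight-collection step can be illustrated with a minimal sketch. Real FSDP/Megatron gathering uses collective communication over tensor shards, not a Python loop; the function name `gather_sharded_weights` and the list-of-dicts shard representation here are illustrative assumptions, showing only the idea that each worker holds a slice of every parameter and the driver reassembles full tensors before export.

```python
def gather_sharded_weights(shards):
    """Merge per-worker weight shards into one full state dict.

    Each element of `shards` maps parameter names to that worker's
    slice of the parameter; slices are concatenated in rank order.
    """
    full = {}
    for name in shards[0]:
        full[name] = [x for shard in shards for x in shard[name]]
    return full
```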

Usage

Evaluation and checkpointing are configured via:

  • trainer.test_freq — how often to run validation (in training steps)
  • trainer.save_freq — how often to save checkpoints (in training steps)
</gr-replace>
  • trainer.total_epochs — total number of training epochs
  • trainer.default_local_dir — where to save checkpoints
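
The scheduling implied by these options can be sketched as follows. The config keys mirror the paths listed above, but the `TrainerConfig` dataclass, its default values, and the predicate names are assumptions for illustration, not verl's actual classes.

```python
from dataclasses import dataclass

# Hypothetical mirror of the trainer config keys above;
# defaults are illustrative only.
@dataclass
class TrainerConfig:
    test_freq: int = 10          # validate every N training steps
    save_freq: int = 20          # checkpoint every N training steps
    total_epochs: int = 1        # total passes over the training data
    default_local_dir: str = "checkpoints"  # checkpoint output directory

def should_validate(step: int, cfg: TrainerConfig) -> bool:
    """True when a validation pass is due at this step."""
    return cfg.test_freq > 0 and step % cfg.test_freq == 0

def should_checkpoint(step: int, cfg: TrainerConfig) -> bool:
    """True when a checkpoint save is due at this step."""
    return cfg.save_freq > 0 and step % cfg.save_freq == 0
```

A non-positive frequency disables the corresponding action, which is a common convention for turning off validation or checkpointing in long runs.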

Theoretical Basis

Evaluation and checkpointing follow standard ML training practices:

Evaluation:

# Abstract evaluation loop
for step in training_steps:
    if step % test_freq == 0:
        eval_metrics = evaluate(policy, test_dataset)
        log(eval_metrics)  # W&B, MLflow, or console
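
The `evaluate` call above can be made concrete with a minimal sketch. The `policy`, `reward_fn`, and metric-key names here are hypothetical stand-ins, not verl APIs; the point is that validation reuses the same rollout-then-score path as training and reduces per-sample rewards to a logged scalar.

```python
def evaluate(policy, test_dataset, reward_fn):
    """Return the mean reward of the policy over a held-out test set."""
    scores = []
    for sample in test_dataset:
        response = policy(sample["prompt"])         # rollout, same path as training
        scores.append(reward_fn(sample, response))  # score with the training reward
    return {"val/mean_reward": sum(scores) / len(scores)}
```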

Checkpointing:

# Abstract checkpoint save
for step in training_steps:
    if step % save_freq == 0:
        gather_fsdp_weights()  # Collect sharded weights
        save_hf_format(model, tokenizer, path)  # HuggingFace format

Related Pages

Implemented By
