Principle: Volcengine verl Evaluation and Checkpointing
| Knowledge Sources | |
|---|---|
| Domains | Training_Infrastructure, Model_Management, Evaluation |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
Periodic model evaluation on held-out data and checkpoint serialization that together enable training monitoring, model selection, and fault recovery.
Description
Evaluation and Checkpointing encompasses two related processes in the RL training loop:
Evaluation: At configurable intervals, the current policy is evaluated on a held-out test set using the same rollout and reward infrastructure as training. This provides metrics (reward scores, generation quality) that track training progress and enable model selection.
Checkpointing: Model weights (and optionally optimizer states) are saved to disk at configurable intervals. Checkpoints serve multiple purposes:
- Fault tolerance — resume training after interruptions
- Model selection — pick the best checkpoint based on evaluation metrics
- Deployment — export trained models in HuggingFace format for serving
In verl, both evaluation and checkpointing are orchestrated by the RayPPOTrainer, which manages the distributed workers and coordinates weight collection across FSDP/Megatron shards.
Usage
Evaluation and checkpointing are configured via:
- `trainer.test_freq` — how often to run validation (in training steps)
- `trainer.save_freq` — how often to save checkpoints (in training steps)
- `trainer.total_epochs` — total number of training epochs
- `trainer.default_local_dir` — where to save checkpoints
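As a sketch, these options are typically passed as Hydra-style overrides when launching the trainer. The exact entrypoint and any other required overrides depend on the verl version installed; the values below are illustrative:

```shell
# Illustrative launch: evaluate every 5 steps, checkpoint every 10 steps
python3 -m verl.trainer.main_ppo \
    trainer.test_freq=5 \
    trainer.save_freq=10 \
    trainer.total_epochs=15 \
    trainer.default_local_dir=./checkpoints
```

Setting `trainer.test_freq` or `trainer.save_freq` to a non-positive value is commonly used to disable the corresponding behavior, but check the config defaults for your version.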
Theoretical Basis
Evaluation and checkpointing follow standard ML training practices:
Evaluation:
```python
# Abstract evaluation loop
for step in training_steps:
    if step % test_freq == 0:
        eval_metrics = evaluate(policy, test_dataset)
        log(eval_metrics)  # W&B, MLflow, or console
```
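The abstract loop above can be made concrete with a minimal runnable sketch. `evaluate`, the stand-in policy, and the metric names are all hypothetical placeholders, not verl APIs:

```python
# Minimal sketch of interval-gated evaluation (names are illustrative).
def evaluate(policy, test_dataset):
    # Score the policy on held-out inputs; here, a dummy mean reward.
    rewards = [policy(x) for x in test_dataset]
    return {"eval/reward_mean": sum(rewards) / len(rewards)}

def run(total_steps, test_freq):
    policy = lambda x: x * 0.5      # stand-in policy: reward = 0.5 * input
    test_dataset = [1.0, 2.0, 3.0]  # stand-in held-out set
    history = []
    for step in range(1, total_steps + 1):
        # ... one training step would happen here ...
        if step % test_freq == 0:
            metrics = evaluate(policy, test_dataset)
            history.append((step, metrics))  # stand-in for W&B/MLflow logging
    return history

history = run(total_steps=10, test_freq=4)
# Evaluations fire at steps 4 and 8.
```

In verl the real evaluation reuses the rollout workers and reward function from training, so eval metrics are directly comparable to training-time rewards.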
Checkpointing:
```python
# Abstract checkpoint save
for step in training_steps:
    if step % save_freq == 0:
        gather_fsdp_weights()                   # Collect sharded weights
        save_hf_format(model, tokenizer, path)  # HuggingFace format
```
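The checkpoint loop can likewise be sketched as runnable code. In verl the real save gathers FSDP/Megatron shards and writes HuggingFace-format files; the sketch below substitutes a toy JSON state dict, and all names (including the `global_step_N` directory layout) are illustrative assumptions:

```python
# Minimal sketch of interval-gated checkpointing (toy state, illustrative names).
import json
import os
import tempfile

def save_checkpoint(state, root, step):
    # Write one checkpoint directory per save step.
    path = os.path.join(root, f"global_step_{step}")
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "model.json"), "w") as f:
        json.dump(state, f)  # stand-in for save_hf_format(...)
    return path

root = tempfile.mkdtemp()
state = {"w": [0.1, 0.2]}    # stand-in for gathered model weights
saved = []
for step in range(1, 7):
    if step % 3 == 0:        # save_freq = 3
        saved.append(save_checkpoint(state, root, step))
# Checkpoints written for steps 3 and 6.
```

Keeping one directory per step makes it easy to resume from the latest checkpoint after a failure or to pick the best step by its evaluation metrics.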