Principle: Volcengine verl Evaluation and Checkpointing
| Knowledge Sources | |
|---|---|
| Domains | Training_Infrastructure, Model_Management, Evaluation |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
Periodic model evaluation on held-out data and checkpoint serialization that together enable training monitoring, model selection, and fault recovery.
Description
Evaluation and Checkpointing encompasses two related processes in the RL training loop:
Evaluation: At configurable intervals, the current policy is evaluated on a held-out test set using the same rollout and reward infrastructure as training. This provides metrics (reward scores, generation quality) that track training progress and enable model selection.
Checkpointing: Model weights (and optionally optimizer states) are saved to disk at configurable intervals. Checkpoints serve multiple purposes:
- Fault tolerance — resume training after interruptions
- Model selection — pick the best checkpoint based on evaluation metrics
- Deployment — export trained models in HuggingFace format for serving
In verl, both evaluation and checkpointing are orchestrated by the RayPPOTrainer, which manages the distributed workers and coordinates weight collection across FSDP/Megatron shards.
Usage
Evaluation and checkpointing are configured via:
- `trainer.test_freq` — how often to run validation (in training steps)
- `trainer.save_freq` — how often to save checkpoints (in training steps)
- `trainer.total_epochs` — total number of training epochs
- `trainer.default_local_dir` — where to save checkpoints
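As a sketch, these options are typically passed as Hydra-style overrides when launching the trainer. The exact entrypoint and any other required overrides depend on the verl version installed; the values below are illustrative:

```shell
# Illustrative launch: evaluate every 5 steps, checkpoint every 10 steps
python3 -m verl.trainer.main_ppo \
    trainer.test_freq=5 \
    trainer.save_freq=10 \
    trainer.total_epochs=15 \
    trainer.default_local_dir=./checkpoints
```

Setting `trainer.test_freq` or `trainer.save_freq` to a non-positive value is commonly used to disable the corresponding behavior, but check the config defaults for your version.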
Theoretical Basis
Evaluation and checkpointing follow standard ML training practices:
Evaluation:
```python
# Abstract evaluation loop
for step in training_steps:
    if step % test_freq == 0:
        eval_metrics = evaluate(policy, test_dataset)
        log(eval_metrics)  # W&B, MLflow, or console
```
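The abstract loop above can be made concrete with a minimal runnable sketch. `evaluate`, the stand-in policy, and the metric names are all hypothetical placeholders, not verl APIs:

```python
# Minimal sketch of interval-gated evaluation (names are illustrative).
def evaluate(policy, test_dataset):
    # Score the policy on held-out inputs; here, a dummy mean reward.
    rewards = [policy(x) for x in test_dataset]
    return {"eval/reward_mean": sum(rewards) / len(rewards)}

def run(total_steps, test_freq):
    policy = lambda x: x * 0.5      # stand-in policy: reward = 0.5 * input
    test_dataset = [1.0, 2.0, 3.0]  # stand-in held-out set
    history = []
    for step in range(1, total_steps + 1):
        # ... one training step would happen here ...
        if step % test_freq == 0:
            metrics = evaluate(policy, test_dataset)
            history.append((step, metrics))  # stand-in for W&B/MLflow logging
    return history

history = run(total_steps=10, test_freq=4)
# Evaluations fire at steps 4 and 8.
```

In verl the real evaluation reuses the rollout workers and reward function from training, so eval metrics are directly comparable to training-time rewards.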
Checkpointing:
```python
# Abstract checkpoint save
for step in training_steps:
    if step % save_freq == 0:
        gather_fsdp_weights()                   # Collect sharded weights
        save_hf_format(model, tokenizer, path)  # HuggingFace format
```
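The checkpoint loop can likewise be sketched as runnable code. In verl the real save gathers FSDP/Megatron shards and writes HuggingFace-format files; the sketch below substitutes a toy JSON state dict, and all names (including the `global_step_N` directory layout) are illustrative assumptions:

```python
# Minimal sketch of interval-gated checkpointing (toy state, illustrative names).
import json
import os
import tempfile

def save_checkpoint(state, root, step):
    # Write one checkpoint directory per save step.
    path = os.path.join(root, f"global_step_{step}")
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "model.json"), "w") as f:
        json.dump(state, f)  # stand-in for save_hf_format(...)
    return path

root = tempfile.mkdtemp()
state = {"w": [0.1, 0.2]}    # stand-in for gathered model weights
saved = []
for step in range(1, 7):
    if step % 3 == 0:        # save_freq = 3
        saved.append(save_checkpoint(state, root, step))
# Checkpoints written for steps 3 and 6.
```

Keeping one directory per step makes it easy to resume from the latest checkpoint after a failure or to pick the best step by its evaluation metrics.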