Workflow: Danijar DreamerV3 Evaluation Only
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, World_Models, Model_Based_RL |
| Last Updated | 2026-02-15 09:00 GMT |
Overview
Process for loading a pretrained DreamerV3 agent from a checkpoint and evaluating its performance on an environment without any training.
Description
This workflow runs a trained DreamerV3 agent in pure evaluation mode. It loads agent parameters from a saved checkpoint, constructs the target environment, and runs the policy to collect episodes for performance measurement. No training occurs, no replay buffer is maintained, and no gradient updates are performed. The agent uses its learned world model and policy to act in the environment, and episode scores and lengths are logged for analysis.
Usage
Execute this workflow when you have a trained DreamerV3 checkpoint and want to measure its performance on an environment. Common use cases include: generating final benchmark scores after training, testing a model on a different environment variant, recording policy behavior videos, or comparing multiple checkpoints.
Execution Steps
Step 1: Configuration and Checkpoint Specification
Parse command-line arguments with --script eval_only and specify the checkpoint path via --from_checkpoint. The checkpoint path is required for evaluation-only mode. The configuration determines which environment to evaluate on and how many parallel environment instances to use.
Key considerations:
- The from_checkpoint argument is mandatory for this mode
- The environment config must match the observation and action spaces of the trained agent
- The same environment-specific presets should be used as during training
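The argument handling above can be sketched as follows. This is a minimal illustration using `argparse`, not the actual DreamerV3 entry point (which parses many more options); only the `--script` and `--from_checkpoint` flag names come from this workflow, and the `--envs` flag is a hypothetical stand-in for the parallel-instance setting.

```python
import argparse

def parse_eval_args(argv):
    """Sketch of the flags evaluation-only mode needs (illustrative only)."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--script', required=True)
    parser.add_argument('--from_checkpoint', default='')
    parser.add_argument('--envs', type=int, default=1)  # parallel env instances
    args = parser.parse_args(argv)
    # Enforce the mandatory checkpoint path for eval_only mode.
    if args.script == 'eval_only' and not args.from_checkpoint:
        parser.error('--from_checkpoint is required when --script eval_only')
    return args
```

The validation mirrors the first key consideration: without a checkpoint there is nothing to evaluate, so the mode refuses to start.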
Step 2: Agent and Environment Construction
Construct the DreamerV3 agent architecture and the evaluation environments. The agent is built with the same network architecture as during training, determined by the config. The environment pool is created for parallel episode collection.
Key considerations:
- The agent architecture must be compatible with the checkpoint being loaded
- No replay buffer is needed since no training occurs
- Environment wrappers are applied identically to the training configuration
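A construction sketch under stated assumptions: `build_eval_components`, `DummyEnv`, and the constructor signatures below are illustrative placeholders, not the actual DreamerV3 API. The point it demonstrates is structural: the env pool is built first, the agent's shapes are derived from the env spaces, and no replay buffer is created.

```python
class DummyEnv:
    """Stand-in for a wrapped evaluation environment (hypothetical)."""
    obs_space = {'image': (64, 64, 3)}
    act_space = {'action': 4}

def build_eval_components(config, env_ctor, agent_ctor):
    """Build the agent and a pool of identically wrapped environments.

    Deliberately constructs no replay buffer: evaluation-only mode
    never stores transitions or performs gradient updates.
    """
    envs = [env_ctor() for _ in range(config['num_envs'])]
    # Network shapes are derived from the env spaces, so the spaces must
    # match those the checkpoint was trained with.
    agent = agent_ctor(envs[0].obs_space, envs[0].act_space, config)
    return agent, envs
```

Because the spaces flow from the environment into the agent constructor, a mismatched environment variant fails at construction or load time rather than silently producing garbage actions.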
Step 3: Checkpoint Loading
Load the saved agent parameters from the checkpoint file. Only the agent state is loaded (no replay buffer, no step counter from the checkpoint). The loaded parameters fully restore the world model, policy, and value networks to their trained states.
Key considerations:
- Only the agent key is loaded from the checkpoint, ignoring replay and step data
- Parameter shapes must match between the checkpoint and the current agent architecture
- The agent is ready to act immediately after loading without any warmup
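The selective-restore logic can be sketched like this. The checkpoint is assumed to be a pickled dict for illustration (the real format may differ), and the function names are hypothetical; what the sketch preserves from the workflow is that only the `agent` entry is restored while replay and step data are ignored, and that parameter keys are validated against the current architecture.

```python
import pickle

def extract_agent_state(ckpt, expected_params):
    """Keep only the agent entry of a checkpoint dict.

    Replay-buffer and step-counter entries, if present, are
    deliberately ignored in evaluation-only mode.
    """
    agent_state = ckpt['agent']
    missing = set(expected_params) - set(agent_state)
    if missing:
        raise KeyError(f'checkpoint is missing parameters: {sorted(missing)}')
    return agent_state

def load_agent_only(path, expected_params):
    """Read a pickled checkpoint from disk and restore only the agent."""
    with open(path, 'rb') as f:
        return extract_agent_state(pickle.load(f), expected_params)
```

A full implementation would also compare parameter shapes, since matching names with mismatched shapes still indicate an incompatible architecture.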
Step 4: Evaluation Episode Collection
Run the agent policy across parallel environments to collect evaluation episodes. The driver steps each environment and queries the agent policy in eval mode. Episode boundaries (start/end) are tracked per environment worker. Each completed episode records its cumulative score, length, and any logged environment metrics.
Key considerations:
- The policy runs in eval mode, which may use deterministic action selection (e.g. taking the mode of the action distribution rather than sampling)
- The step counter increments with environment steps for logging frequency control
- Episode statistics are aggregated across all parallel environment workers
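The collection loop above can be sketched as follows. `ToyEnv` is a stand-in environment and `run_eval_episodes` a hypothetical driver, not the DreamerV3 driver itself; the sketch shows the essentials: per-worker score/length tracking, episode-boundary handling with an immediate reset, and aggregation of completed episodes across workers.

```python
class ToyEnv:
    """Stand-in environment: fixed-length episodes with reward 1 per step."""
    def __init__(self, length):
        self.length = length
        self.t = 0
    def reset(self):
        self.t = 0
        return 0  # observation placeholder
    def step(self, action):
        self.t += 1
        return 0, 1.0, self.t >= self.length  # obs, reward, done

def run_eval_episodes(policy, envs, num_episodes):
    """Step parallel envs with an eval-mode policy, collecting episode stats."""
    stats = []
    obs = [env.reset() for env in envs]
    scores = [0.0] * len(envs)
    lengths = [0] * len(envs)
    while len(stats) < num_episodes:
        for i, env in enumerate(envs):
            obs[i], reward, done = env.step(policy(obs[i]))
            scores[i] += reward
            lengths[i] += 1
            if done:
                # Record the completed episode for this worker, then reset it.
                stats.append({'score': scores[i], 'length': lengths[i]})
                scores[i], lengths[i] = 0.0, 0
                obs[i] = env.reset()
    return stats
```

The real driver steps environments in batched lockstep and feeds observations to the agent as arrays, but the per-worker bookkeeping is the same idea.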
Step 5: Results Logging
Periodically write evaluation metrics including episode scores, lengths, reward rates, system resource usage, and policy throughput (FPS). Results are output to the configured logging backends.
Key considerations:
- Logging frequency is controlled by log_every in the run config
- The evaluation loop continues until the configured step limit is reached
- No training metrics or replay statistics are generated in this mode
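The periodic logging pattern can be sketched as a small aggregator gated by `log_every`. The class and metric names below are illustrative assumptions, not the actual logging backend; only the `log_every` setting and the score/length metrics come from this workflow.

```python
class EvalLogger:
    """Sketch of metric emission gated by an environment-step interval."""
    def __init__(self, log_every):
        self.log_every = log_every
        self.last_logged = 0
        self.scores = []
        self.lengths = []

    def add_episode(self, score, length):
        self.scores.append(score)
        self.lengths.append(length)

    def maybe_log(self, step):
        """Emit aggregated stats once log_every steps have elapsed."""
        if step - self.last_logged < self.log_every or not self.scores:
            return None
        out = {
            'step': step,
            'episode/score_mean': sum(self.scores) / len(self.scores),
            'episode/length_mean': sum(self.lengths) / len(self.lengths),
        }
        self.last_logged = step
        self.scores.clear()
        self.lengths.clear()
        return out
```

A full logger would also report throughput (FPS) and system resource usage, and fan the dict out to the configured backends.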