Workflow: Danijar DreamerV3 Evaluation Only
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, World_Models, Model_Based_RL |
| Last Updated | 2026-02-15 09:00 GMT |
Overview
Process for loading a pretrained DreamerV3 agent from a checkpoint and evaluating its performance on an environment without any training.
Description
This workflow runs a trained DreamerV3 agent in pure evaluation mode. It loads agent parameters from a saved checkpoint, constructs the target environment, and runs the policy to collect episodes for performance measurement. No training occurs, no replay buffer is maintained, and no gradient updates are performed. The agent uses its learned world model and policy to act in the environment, and episode scores and lengths are logged for analysis.
Usage
Execute this workflow when you have a trained DreamerV3 checkpoint and want to measure its performance on an environment. Common use cases include: generating final benchmark scores after training, testing a model on a different environment variant, recording policy behavior videos, or comparing multiple checkpoints.
Execution Steps
Step 1: Configuration and Checkpoint Specification
Parse command-line arguments with --script eval_only and specify the checkpoint path via --from_checkpoint. The checkpoint path is required for evaluation-only mode. The configuration determines which environment to evaluate on and how many parallel environment instances to use.
Key considerations:
- The from_checkpoint argument is mandatory for this mode
- The environment config must match the observation and action spaces of the trained agent
- The same environment-specific presets should be used as during training
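The argument handling above can be sketched as follows. This is a minimal illustration using `argparse`, not the actual DreamerV3 entry point (which parses many more options); only the `--script` and `--from_checkpoint` flag names come from this workflow, and the `--envs` flag is a hypothetical stand-in for the parallel-instance setting.

```python
import argparse

def parse_eval_args(argv):
    """Sketch of the flags evaluation-only mode needs (illustrative only)."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--script', required=True)
    parser.add_argument('--from_checkpoint', default='')
    parser.add_argument('--envs', type=int, default=1)  # parallel env instances
    args = parser.parse_args(argv)
    # Enforce the mandatory checkpoint path for eval_only mode.
    if args.script == 'eval_only' and not args.from_checkpoint:
        parser.error('--from_checkpoint is required when --script eval_only')
    return args
```

The validation mirrors the first key consideration: without a checkpoint there is nothing to evaluate, so the mode refuses to start.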
Step 2: Agent and Environment Construction
Construct the DreamerV3 agent architecture and the evaluation environments. The agent is built with the same network architecture as during training, determined by the config. The environment pool is created for parallel episode collection.
Key considerations:
- The agent architecture must be compatible with the checkpoint being loaded
- No replay buffer is needed since no training occurs
- Environment wrappers are applied identically to the training configuration
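A construction sketch under stated assumptions: `build_eval_components`, `DummyEnv`, and the constructor signatures below are illustrative placeholders, not the actual DreamerV3 API. The point it demonstrates is structural: the env pool is built first, the agent's shapes are derived from the env spaces, and no replay buffer is created.

```python
class DummyEnv:
    """Stand-in for a wrapped evaluation environment (hypothetical)."""
    obs_space = {'image': (64, 64, 3)}
    act_space = {'action': 4}

def build_eval_components(config, env_ctor, agent_ctor):
    """Build the agent and a pool of identically wrapped environments.

    Deliberately constructs no replay buffer: evaluation-only mode
    never stores transitions or performs gradient updates.
    """
    envs = [env_ctor() for _ in range(config['num_envs'])]
    # Network shapes are derived from the env spaces, so the spaces must
    # match those the checkpoint was trained with.
    agent = agent_ctor(envs[0].obs_space, envs[0].act_space, config)
    return agent, envs
```

Because the spaces flow from the environment into the agent constructor, a mismatched environment variant fails at construction or load time rather than silently producing garbage actions.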
Step 3: Checkpoint Loading
Load the saved agent parameters from the checkpoint file. Only the agent state is loaded (no replay buffer, no step counter from the checkpoint). The loaded parameters fully restore the world model, policy, and value networks to their trained states.
Key considerations:
- Only the agent key is loaded from the checkpoint, ignoring replay and step data
- Parameter shapes must match between the checkpoint and the current agent architecture
- The agent is ready to act immediately after loading without any warmup
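The selective-restore logic can be sketched like this. The checkpoint is assumed to be a pickled dict for illustration (the real format may differ), and the function names are hypothetical; what the sketch preserves from the workflow is that only the `agent` entry is restored while replay and step data are ignored, and that parameter keys are validated against the current architecture.

```python
import pickle

def extract_agent_state(ckpt, expected_params):
    """Keep only the agent entry of a checkpoint dict.

    Replay-buffer and step-counter entries, if present, are
    deliberately ignored in evaluation-only mode.
    """
    agent_state = ckpt['agent']
    missing = set(expected_params) - set(agent_state)
    if missing:
        raise KeyError(f'checkpoint is missing parameters: {sorted(missing)}')
    return agent_state

def load_agent_only(path, expected_params):
    """Read a pickled checkpoint from disk and restore only the agent."""
    with open(path, 'rb') as f:
        return extract_agent_state(pickle.load(f), expected_params)
```

A full implementation would also compare parameter shapes, since matching names with mismatched shapes still indicate an incompatible architecture.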
Step 4: Evaluation Episode Collection
Run the agent policy across parallel environments to collect evaluation episodes. The driver steps each environment and queries the agent policy in eval mode. Episode boundaries (start/end) are tracked per environment worker. Each completed episode records its cumulative score, length, and any logged environment metrics.
Key considerations:
- The policy runs in eval mode, which may use deterministic action selection (e.g. taking the mode of the action distribution rather than sampling)
- The step counter increments with environment steps for logging frequency control
- Episode statistics are aggregated across all parallel environment workers
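The collection loop above can be sketched as follows. `ToyEnv` is a stand-in environment and `run_eval_episodes` a hypothetical driver, not the DreamerV3 driver itself; the sketch shows the essentials: per-worker score/length tracking, episode-boundary handling with an immediate reset, and aggregation of completed episodes across workers.

```python
class ToyEnv:
    """Stand-in environment: fixed-length episodes with reward 1 per step."""
    def __init__(self, length):
        self.length = length
        self.t = 0
    def reset(self):
        self.t = 0
        return 0  # observation placeholder
    def step(self, action):
        self.t += 1
        return 0, 1.0, self.t >= self.length  # obs, reward, done

def run_eval_episodes(policy, envs, num_episodes):
    """Step parallel envs with an eval-mode policy, collecting episode stats."""
    stats = []
    obs = [env.reset() for env in envs]
    scores = [0.0] * len(envs)
    lengths = [0] * len(envs)
    while len(stats) < num_episodes:
        for i, env in enumerate(envs):
            obs[i], reward, done = env.step(policy(obs[i]))
            scores[i] += reward
            lengths[i] += 1
            if done:
                # Record the completed episode for this worker, then reset it.
                stats.append({'score': scores[i], 'length': lengths[i]})
                scores[i], lengths[i] = 0.0, 0
                obs[i] = env.reset()
    return stats
```

The real driver steps environments in batched lockstep and feeds observations to the agent as arrays, but the per-worker bookkeeping is the same idea.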
Step 5: Results Logging
Periodically write evaluation metrics including episode scores, lengths, reward rates, system resource usage, and policy throughput (FPS). Results are output to the configured logging backends.
Key considerations:
- Logging frequency is controlled by log_every in the run config
- The evaluation loop continues until the configured step limit is reached
- No training metrics or replay statistics are generated in this mode
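The periodic logging pattern can be sketched as a small aggregator gated by `log_every`. The class and metric names below are illustrative assumptions, not the actual logging backend; only the `log_every` setting and the score/length metrics come from this workflow.

```python
class EvalLogger:
    """Sketch of metric emission gated by an environment-step interval."""
    def __init__(self, log_every):
        self.log_every = log_every
        self.last_logged = 0
        self.scores = []
        self.lengths = []

    def add_episode(self, score, length):
        self.scores.append(score)
        self.lengths.append(length)

    def maybe_log(self, step):
        """Emit aggregated stats once log_every steps have elapsed."""
        if step - self.last_logged < self.log_every or not self.scores:
            return None
        out = {
            'step': step,
            'episode/score_mean': sum(self.scores) / len(self.scores),
            'episode/length_mean': sum(self.lengths) / len(self.lengths),
        }
        self.last_logged = step
        self.scores.clear()
        self.lengths.clear()
        return out
```

A full logger would also report throughput (FPS) and system resource usage, and fan the dict out to the configured backends.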