Implementation:Danijar Dreamerv3 RandomAgent

Knowledge Sources	Danijar_Dreamerv3
Domains	Reinforcement_Learning, Baseline
Last Updated	2026-02-15 09:00 GMT

Overview

A concrete agent class that generates uniformly random actions while conforming to the DreamerV3 agent interface. It serves as a baseline and as a way to populate the replay buffer before training.

Description

The RandomAgent class implements the full DreamerV3 agent interface (init_policy, init_train, init_report, policy, train, report, stream, save, load) but performs no learning. Its policy method reads the batch size from the length of obs['is_first'] and, for each element in the batch, samples uniformly at random from every entry of the provided action space except 'reset'. The train and report methods are no-ops: they return the carry unchanged together with empty outputs and metrics. This allows it to be used as a drop-in replacement for the learned agent in the training pipeline.
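
The sampling behaviour described above can be sketched as follows. This is a minimal illustration rather than the repository's code: ToySpace and sample_actions are hypothetical names, and the only thing assumed about a real Space object is that it exposes a sample() method returning one action.

```python
import numpy as np


class ToySpace:
    """Hypothetical stand-in for a Space object; only sample() is modelled."""

    def __init__(self, low, high, shape=(), seed=0):
        self.low, self.high, self.shape = low, high, shape
        self._rng = np.random.default_rng(seed)

    def sample(self):
        # Uniform draw over [low, high) with the space's per-element shape.
        draw = self._rng.uniform(self.low, self.high, self.shape)
        return np.asarray(draw, dtype=np.float32)


def sample_actions(act_space, obs):
    """Draw one independent sample per batch element for every action key."""
    batch_size = len(obs['is_first'])  # batch size comes from 'is_first'
    return {
        name: np.stack([space.sample() for _ in range(batch_size)])
        for name, space in act_space.items()
        if name != 'reset'  # 'reset' is excluded from sampling
    }


act_space = {
    'action': ToySpace(-1.0, 1.0, shape=(3,)),
    'reset': ToySpace(0.0, 1.0),
}
obs = {'is_first': np.array([True, False, False, False])}
actions = sample_actions(act_space, obs)
print({k: v.shape for k, v in actions.items()})  # {'action': (4, 3)}
```

Stacking per-element samples keeps the batch dimension first, which is the layout the rest of the pipeline expects from a policy call.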

Usage

Import this class when you need a baseline agent for benchmarking, for populating an experience replay buffer with random exploration data before training begins, or for testing the training pipeline without a learned agent. It conforms to the same interface as the DreamerV3 agent, so it can be passed to any run loop (train, train_eval, eval_only, parallel).
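
To illustrate the replay-prefill use case, here is a minimal sketch of a collection loop that only relies on init_policy and policy. Everything in it is a stand-in: StubRandomAgent, toy_env_step, and prefill are hypothetical names, a plain list plays the role of the replay buffer, and the real run loops in the repository are considerably more involved.

```python
import numpy as np


class StubRandomAgent:
    """Hypothetical stand-in exposing the same policy signature as RandomAgent."""

    def __init__(self, num_actions, seed=0):
        self.num_actions = num_actions
        self.rng = np.random.default_rng(seed)

    def init_policy(self, batch_size):
        return ()  # no recurrent state to carry

    def policy(self, carry, obs, mode='train'):
        batch_size = len(obs['is_first'])
        action = self.rng.integers(0, self.num_actions, size=batch_size)
        return carry, {'action': action}, {}


def toy_env_step(actions, step):
    """Hypothetical batched environment: returns a dummy observation dict."""
    batch_size = len(actions['action'])
    return {
        'state': np.full((batch_size, 2), step, dtype=np.float32),
        'is_first': np.zeros(batch_size, dtype=bool),
    }


def prefill(agent, batch_size, num_steps):
    """Collect observation/action records by stepping the agent, no training."""
    buffer = []  # a plain list stands in for the replay buffer
    carry = agent.init_policy(batch_size)
    obs = {
        'state': np.zeros((batch_size, 2), dtype=np.float32),
        'is_first': np.ones(batch_size, dtype=bool),
    }
    for step in range(1, num_steps + 1):
        carry, actions, _ = agent.policy(carry, obs, mode='train')
        buffer.append({**obs, **actions})
        obs = toy_env_step(actions, step)
    return buffer


buffer = prefill(StubRandomAgent(num_actions=4), batch_size=2, num_steps=5)
print(len(buffer), buffer[0]['action'].shape)  # 5 (2,)
```

Because the loop touches nothing beyond the shared interface, swapping the stub for a learned agent would not require changing prefill.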

Code Reference

Source Location

Repository: Danijar_Dreamerv3
File: embodied/core/random.py
Lines: 1-39

Signature

class RandomAgent:

    def __init__(self, obs_space: dict, act_space: dict):
        """
        Args:
            obs_space: Dictionary mapping observation names to Space objects.
            act_space: Dictionary mapping action names to Space objects
                       (includes 'reset' key which is excluded from sampling).
        """

    def init_policy(self, batch_size: int) -> tuple:
        """Return empty carry state for policy."""

    def init_train(self, batch_size: int) -> tuple:
        """Return empty carry state for training."""

    def init_report(self, batch_size: int) -> tuple:
        """Return empty carry state for reporting."""

    def policy(self, carry: tuple, obs: dict, mode: str = 'train') -> tuple:
        """
        Sample random actions from the action space.

        Args:
            carry: Empty carry state (unused).
            obs: Dictionary of observations; the length of obs['is_first']
                 gives the batch size.
            mode: Operating mode ('train', 'eval'); ignored by the random agent.

        Returns:
            Tuple of (carry, actions_dict, empty_metrics_dict).
        """

    def train(self, carry: tuple, data: dict) -> tuple:
        """No-op training step. Returns (carry, empty_outs, empty_metrics)."""

    def report(self, carry: tuple, data: dict) -> tuple:
        """No-op reporting. Returns (carry, empty_metrics)."""

    def stream(self, st):
        """Pass-through stream state."""

    def save(self) -> None:
        """No-op save. Returns None."""

    def load(self, data=None) -> None:
        """No-op load."""

Import

from embodied.core.random import RandomAgent

# Or via the package re-export:
from embodied.core import RandomAgent

I/O Contract

Inputs

Name	Type	Required	Description
obs_space	dict	Yes	Observation space mapping names to Space objects
act_space	dict	Yes	Action space mapping names to Space objects (includes 'reset', which is not sampled)
obs (policy)	dict	Yes	Current observations; the length of the 'is_first' entry gives the batch size
mode (policy)	str	No	Operating mode, ignored by random agent (default: 'train')

Outputs

Name	Type	Description
carry	tuple	Empty tuple (no state maintained)
actions	dict	Dictionary mapping each action name except 'reset' to a numpy array of random samples, one per batch element
metrics	dict	Empty dictionary (no metrics computed)
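
The contract in the tables above can be turned into a small runtime check. The helper below is a hypothetical sketch (check_policy_output is not part of the repository); it assumes the carry passed to policy came from init_policy and is therefore an empty tuple.

```python
import numpy as np


def check_policy_output(result, act_space_keys, batch_size):
    """Validate a policy() return value against the I/O contract."""
    carry, actions, metrics = result
    assert carry == (), 'carry should be an empty tuple'
    assert metrics == {}, 'metrics should be an empty dict'
    expected = set(act_space_keys) - {'reset'}  # 'reset' is never sampled
    assert set(actions) == expected, f'unexpected action keys: {set(actions)}'
    for name, value in actions.items():
        assert isinstance(value, np.ndarray), f'{name} is not a numpy array'
        assert value.shape[0] == batch_size, f'{name} has wrong batch dimension'
    return True


# Hand-built example of a conforming return value (batch of 4, 2-D actions).
example = ((), {'action': np.zeros((4, 2), dtype=np.float32)}, {})
ok = check_policy_output(example, act_space_keys=['action', 'reset'], batch_size=4)
print(ok)  # True
```

A check like this is mainly useful in tests, where it can guard any agent that claims to follow the same interface.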

Usage Examples

Random Agent for Initial Exploration

from embodied.core.random import RandomAgent
import numpy as np

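# Create random agent with environment spaces.
# Each ... is a placeholder: replace it with the environment's real Space
# object (for example by passing env.obs_space and env.act_space) before running.
obs_space = {'image': ..., 'is_first': ...}
act_space = {'action': ..., 'reset': ...}

agent = RandomAgent(obs_space, act_space)

# Initialize carry state (an empty tuple for this agent)
carry = agent.init_policy(batch_size=4)

# Generate random actions; the batch size is taken from len(obs['is_first'])
obs = {'is_first': np.array([True, False, False, False])}
carry, actions, metrics = agent.policy(carry, obs)
# actions = {'action': np.array([...])} with leading dimension 4, one sample
# per batch element; trailing dimensions follow the action Space's shape.
# 'reset' is not sampled, and metrics is an empty dict.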

Using RandomAgent in Training Pipeline

from embodied.core.random import RandomAgent
from embodied.run import train

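# Assumes env, replay, logger, and args have already been constructed.
# RandomAgent conforms to the agent interface,
# so it can be passed to a run loop in place of the learned agent:
agent = RandomAgent(env.obs_space, env.act_space)

# Use for initial replay buffer population
# or as a baseline comparison.
# The argument list below is illustrative; check the signature of
# embodied.run.train in your checkout, as it may differ between versions.
train(agent, env, replay, logger, args)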

Related Pages

Principle:Danijar_Dreamerv3_Random_Baseline_Agent
