Implementation:Danijar Dreamerv3 RandomAgent

From Leeroopedia
Knowledge Sources
Domains Reinforcement_Learning, Baseline
Last Updated 2026-02-15 09:00 GMT

Overview

Concrete tool for generating uniformly random actions conforming to the DreamerV3 agent interface, used as a baseline and for initial replay buffer population.

Description

The RandomAgent class implements the full DreamerV3 agent interface (policy, train, report, save, load) but performs no learning. Its policy method samples an action uniformly at random from each entry of the provided action space, once per element in the batch; the 'reset' key is excluded from sampling. The train and report methods are no-ops that return the carry unchanged together with empty outputs and metrics, and save and load do nothing. As a result, RandomAgent can be used as a drop-in replacement for the learned agent in the training pipeline.
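
The behavior described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the repository's source: `ToySpace` is a hypothetical stand-in for the library's Space objects, assumed only to expose a `sample()` method, and `SketchRandomAgent` is an illustrative name.

```python
import numpy as np


class ToySpace:
    """Hypothetical stand-in for a Space object; only sample() is assumed."""

    def __init__(self, low, high, shape=()):
        self.low, self.high, self.shape = low, high, shape

    def sample(self):
        # Draw one uniformly random element of the space.
        value = np.random.uniform(self.low, self.high, self.shape)
        return np.asarray(value, dtype=np.float32)


class SketchRandomAgent:
    """Illustrative random agent following the interface described above."""

    def __init__(self, obs_space, act_space):
        self.obs_space = obs_space
        self.act_space = act_space

    def init_policy(self, batch_size):
        return ()  # No recurrent state to carry.

    def policy(self, carry, obs, mode='train'):
        # The batch size is inferred from the length of 'is_first'.
        batch_size = len(obs['is_first'])
        actions = {
            name: np.stack([space.sample() for _ in range(batch_size)])
            for name, space in self.act_space.items()
            if name != 'reset'  # 'reset' is excluded from sampling.
        }
        return carry, actions, {}

    def train(self, carry, data):
        return carry, {}, {}  # No-op: nothing is learned.

    def report(self, carry, data):
        return carry, {}  # No-op: no metrics are computed.


act_space = {'action': ToySpace(-1.0, 1.0, (3,)), 'reset': ToySpace(0, 1)}
agent = SketchRandomAgent({}, act_space)
carry, actions, metrics = agent.policy(
    agent.init_policy(batch_size=4),
    {'is_first': np.array([True, False, False, False])})
print(actions['action'].shape)  # (4, 3)
```

Because the carry is an empty tuple and the mode argument is ignored, such an agent keeps no state between calls, and each call produces independent samples.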

Usage

Import this class when you need a baseline agent for benchmarking, for populating an experience replay buffer with random exploration data before training begins, or for testing the training pipeline without a learned agent. It conforms to the same interface as the DreamerV3 agent, so it can be passed to any run loop (train, train_eval, eval_only, parallel).
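
To illustrate the replay pre-fill use case, the sketch below steps a toy environment with random actions and stores the transitions in a plain list. Everything in it is hypothetical: `CounterEnv`, `random_policy`, and `prefill` are illustrative names and do not reflect the repository's environment, replay, or run-loop APIs.

```python
import numpy as np


class CounterEnv:
    """Hypothetical toy environment: an episode ends after `length` steps."""

    def __init__(self, length=5):
        self.length = length
        self.t = 0

    def step(self, action):
        # Start a new episode on request or once the previous one has ended.
        if action['reset'] or self.t >= self.length:
            self.t = 0
        else:
            self.t += 1
        return {
            'obs': np.float32(self.t),
            'is_first': self.t == 0,
            'is_last': self.t == self.length,
        }


def random_policy(rng, num_actions):
    # Uniformly random discrete action; stands in for the agent's policy().
    return int(rng.integers(num_actions))


def prefill(env, buffer, steps, num_actions=4, seed=0):
    """Collect `steps` transitions using only random actions."""
    rng = np.random.default_rng(seed)
    obs = env.step({'action': 0, 'reset': True})
    for _ in range(steps):
        action = random_policy(rng, num_actions)
        buffer.append({**obs, 'action': action})
        obs = env.step({'action': action, 'reset': obs['is_last']})
    return buffer


buffer = prefill(CounterEnv(length=5), [], steps=20)
print(len(buffer))  # 20
```

In a real pipeline the learned agent would take over once the buffer holds enough data; the random phase only needs to produce transitions with the same keys the learner expects.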

Code Reference

Source Location

Signature

class RandomAgent:

    def __init__(self, obs_space: dict, act_space: dict):
        """
        Args:
            obs_space: Dictionary mapping observation names to Space objects.
            act_space: Dictionary mapping action names to Space objects
                       (includes 'reset' key which is excluded from sampling).
        """

    def init_policy(self, batch_size: int) -> tuple:
        """Return empty carry state for policy."""

    def init_train(self, batch_size: int) -> tuple:
        """Return empty carry state for training."""

    def init_report(self, batch_size: int) -> tuple:
        """Return empty carry state for reporting."""

    def policy(self, carry: tuple, obs: dict, mode: str = 'train') -> tuple:
        """
        Sample random actions from the action space.

        Args:
            carry: Empty carry state (unused).
            obs: Dictionary of observations; the length of its 'is_first'
                 entry determines the batch size.
            mode: Operating mode ('train' or 'eval'); ignored by the random agent.

        Returns:
            Tuple of (carry, actions_dict, empty_metrics_dict).
        """

    def train(self, carry: tuple, data: dict) -> tuple:
        """No-op training step. Returns (carry, empty_outs, empty_metrics)."""

    def report(self, carry: tuple, data: dict) -> tuple:
        """No-op reporting. Returns (carry, empty_metrics)."""

    def stream(self, st):
        """Pass-through stream state."""

    def save(self) -> None:
        """No-op save. Returns None."""

    def load(self, data=None) -> None:
        """No-op load."""

Import

from embodied.core.random import RandomAgent

# Or via the package re-export:
from embodied.core import RandomAgent

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| obs_space | dict | Yes | Observation space mapping names to Space objects |
| act_space | dict | Yes | Action space mapping names to Space objects (includes 'reset') |
| obs (policy) | dict | Yes | Current observations; the length of the 'is_first' entry determines the batch size |
| mode (policy) | str | No | Operating mode, ignored by the random agent (default: 'train') |

Outputs

| Name | Type | Description |
|---|---|---|
| carry | tuple | Empty tuple (no state maintained) |
| actions | dict | Dictionary mapping action names to numpy arrays of random samples |
| metrics | dict | Empty dictionary (no metrics computed) |
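
The inputs and outputs listed above can be checked mechanically. The helper below is a hypothetical sketch: `check_policy_contract` and `StubAgent` are illustrative names, not part of the library, and the check assumes only that the agent exposes `init_policy` and `policy` as described on this page.

```python
import numpy as np


def check_policy_contract(agent, act_keys, batch_size):
    """Assert that one policy() call matches the documented I/O contract."""
    carry = agent.init_policy(batch_size)
    obs = {'is_first': np.zeros(batch_size, dtype=bool)}
    carry, actions, metrics = agent.policy(carry, obs, mode='eval')
    assert carry == (), 'carry should be an empty tuple'
    assert metrics == {}, 'metrics should be an empty dict'
    assert set(actions) == set(act_keys) - {'reset'}, "'reset' is not sampled"
    for name, value in actions.items():
        # Every action array carries a leading batch dimension.
        assert isinstance(value, np.ndarray), name
        assert value.shape[0] == batch_size, name
    return True


class StubAgent:
    """Hypothetical agent used only to exercise the check."""

    def init_policy(self, batch_size):
        return ()

    def policy(self, carry, obs, mode='train'):
        n = len(obs['is_first'])
        return carry, {'action': np.random.uniform(-1, 1, (n, 2))}, {}


ok = check_policy_contract(StubAgent(), ['action', 'reset'], batch_size=3)
```

A check of this kind is useful when swapping a random agent in for a learned one, since a mismatch in keys or batch dimension tends to surface much later in the run loop.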

Usage Examples

Random Agent for Initial Exploration

from embodied.core.random import RandomAgent
import numpy as np

# Create the random agent from the environment's spaces.
# Each `...` below is a placeholder for a Space object.
obs_space = {'image': ..., 'is_first': ...}
act_space = {'action': ..., 'reset': ...}

agent = RandomAgent(obs_space, act_space)

# Initialize carry state
carry = agent.init_policy(batch_size=4)

# Generate random actions
obs = {'is_first': np.array([True, False, False, False])}
carry, actions, metrics = agent.policy(carry, obs)
# actions = {'action': np.array([...])}, with a leading batch dimension of 4
# followed by the shape of the 'action' space; 'reset' is not sampled.

Using RandomAgent in Training Pipeline

from embodied.core.random import RandomAgent
from embodied.run import train

# RandomAgent conforms to the Agent interface,
# so it can be passed directly to any run loop.
# `env`, `replay`, `logger`, and `args` are assumed to be constructed elsewhere.
agent = RandomAgent(env.obs_space, env.act_space)

# Use for initial replay buffer population
# or as a baseline comparison
train(agent, env, replay, logger, args)
