Implementation:Danijar Dreamerv3 RandomAgent
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Baseline |
| Last Updated | 2026-02-15 09:00 GMT |
Overview
Concrete tool for generating uniformly random actions conforming to the DreamerV3 agent interface, used as a baseline and for initial replay buffer population.
Description
The RandomAgent class implements the full DreamerV3 agent interface (policy, train, report, save, load) but performs no learning. Its policy method samples an action uniformly at random from each entry of the provided action space, once per batch element, skipping the 'reset' key. The init_* methods return an empty carry, and train and report are no-ops that pass the carry through and return empty outputs and metrics. This allows it to be used as a drop-in replacement for the learned agent in the training pipeline.
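The sampling behavior described above can be sketched as follows. This is a minimal illustration and not the repository's code: `ToySpace` and `SketchRandomAgent` are hypothetical stand-ins, and the only assumption about a Space object is that it exposes a `sample()` method.

```python
import numpy as np


class ToySpace:
  """Hypothetical stand-in for a Space object; only sample() is modeled."""

  def __init__(self, low, high, shape=(), seed=0):
    self.low, self.high, self.shape = low, high, shape
    self.rng = np.random.default_rng(seed)

  def sample(self):
    # Uniform draw over [low, high) with the space's shape.
    draw = self.rng.uniform(self.low, self.high, size=self.shape)
    return np.asarray(draw, dtype=np.float32)


class SketchRandomAgent:
  """Illustrative re-creation of the policy behavior described above."""

  def __init__(self, obs_space, act_space):
    self.obs_space = obs_space
    self.act_space = act_space

  def init_policy(self, batch_size):
    return ()  # No state is carried between calls.

  def policy(self, carry, obs, mode='train'):
    # The batch size is read from the length of the 'is_first' array.
    batch_size = len(obs['is_first'])
    # One independent sample per batch element; 'reset' is skipped.
    act = {
        k: np.stack([v.sample() for _ in range(batch_size)])
        for k, v in self.act_space.items() if k != 'reset'}
    return carry, act, {}


act_space = {
    'action': ToySpace(-1.0, 1.0, shape=(3,)),
    'reset': ToySpace(0.0, 1.0),
}
agent = SketchRandomAgent({}, act_space)
carry = agent.init_policy(batch_size=4)
obs = {'is_first': np.array([True, False, False, False])}
carry, actions, metrics = agent.policy(carry, obs)
print(actions['action'].shape)  # (4, 3)
```

Stacking per-element samples gives every action array a leading batch dimension, which is the layout a batched run loop would expect.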
Usage
Import this class when you need a baseline agent for benchmarking, for populating an experience replay buffer with random exploration data before training begins, or for testing the training pipeline without a learned agent. It conforms to the same interface as the DreamerV3 agent, so it can be passed to any run loop (train, train_eval, eval_only, parallel).
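As a rough sketch of the replay-population use case, the loop below drives a random policy for a fixed number of steps and appends each batched step to a buffer. Everything here is hypothetical scaffolding for illustration: `ToySpace`, `ToyRandomAgent`, the `prefill` helper, the dummy observations, and the `deque` used as a buffer are not part of the repository, whose run loops handle environments and replay themselves.

```python
import collections

import numpy as np


class ToySpace:
  """Hypothetical discrete space; only sample() is modeled."""

  def __init__(self, num_actions, seed=0):
    self.num_actions = num_actions
    self.rng = np.random.default_rng(seed)

  def sample(self):
    return self.rng.integers(0, self.num_actions)


class ToyRandomAgent:
  """Hypothetical stand-in exposing only init_policy and policy."""

  def __init__(self, obs_space, act_space):
    self.act_space = act_space

  def init_policy(self, batch_size):
    return ()

  def policy(self, carry, obs, mode='train'):
    batch_size = len(obs['is_first'])
    act = {
        k: np.stack([v.sample() for _ in range(batch_size)])
        for k, v in self.act_space.items() if k != 'reset'}
    return carry, act, {}


def prefill(agent, buffer, num_steps, batch_size):
  """Append `num_steps` batched random-policy steps to `buffer`."""
  carry = agent.init_policy(batch_size)
  # Dummy observations: only the very first step is flagged as an episode start.
  obs = {'is_first': np.ones(batch_size, dtype=bool)}
  for _ in range(num_steps):
    carry, act, _ = agent.policy(carry, obs)
    buffer.append({**obs, **act})
    obs = {'is_first': np.zeros(batch_size, dtype=bool)}
  return buffer


agent = ToyRandomAgent({}, {'action': ToySpace(5), 'reset': ToySpace(2)})
buffer = prefill(
    agent, collections.deque(maxlen=1000), num_steps=10, batch_size=4)
print(len(buffer))  # 10
```

Because `prefill` only calls `init_policy` and `policy`, any object with the same two methods, including a learned agent, could be substituted without changing the loop.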
Code Reference
Source Location
- Repository: Danijar_Dreamerv3
- File: embodied/core/random.py
- Lines: 1-39
Signature
class RandomAgent:

  def __init__(self, obs_space: dict, act_space: dict):
    """
    Args:
      obs_space: Dictionary mapping observation names to Space objects.
      act_space: Dictionary mapping action names to Space objects
        (includes 'reset' key which is excluded from sampling).
    """

  def init_policy(self, batch_size: int) -> tuple:
    """Return empty carry state for policy."""

  def init_train(self, batch_size: int) -> tuple:
    """Return empty carry state for training."""

  def init_report(self, batch_size: int) -> tuple:
    """Return empty carry state for reporting."""

  def policy(self, carry: tuple, obs: dict, mode: str = 'train') -> tuple:
    """
    Sample random actions from the action space.

    Args:
      carry: Empty carry state (unused).
      obs: Dictionary of observations; the length of obs['is_first']
        determines the batch size.
      mode: Operating mode ('train', 'eval'); ignored by the random agent.

    Returns:
      Tuple of (carry, actions_dict, empty_metrics_dict).
    """

  def train(self, carry: tuple, data: dict) -> tuple:
    """No-op training step. Returns (carry, empty_outs, empty_metrics)."""

  def report(self, carry: tuple, data: dict) -> tuple:
    """No-op reporting. Returns (carry, empty_metrics)."""

  def stream(self, st):
    """Pass-through stream state."""

  def save(self) -> None:
    """No-op save. Returns None."""

  def load(self, data=None) -> None:
    """No-op load."""
Import
from embodied.core.random import RandomAgent
# Or via the package re-export:
from embodied.core import RandomAgent
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| obs_space | dict | Yes | Observation space mapping names to Space objects |
| act_space | dict | Yes | Action space mapping names to Space objects (includes 'reset') |
| carry (policy) | tuple | Yes | Carry state from init_policy (empty and unused) |
| obs (policy) | dict | Yes | Current observations; the length of the 'is_first' entry sets the batch size |
| mode (policy) | str | No | Operating mode, ignored by random agent (default: 'train') |
Outputs
| Name | Type | Description |
|---|---|---|
| carry | tuple | Empty tuple (no state maintained) |
| actions | dict | Dictionary mapping each action name except 'reset' to a numpy array of random samples with a leading batch dimension |
| metrics | dict | Empty dictionary (no metrics computed) |
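The return shapes listed above can be checked mechanically. The sketch below is one possible way to do so, assuming nothing beyond the method names and return tuples documented on this page: `AgentLike`, `NoOpAgent`, and `check_contract` are hypothetical names introduced for illustration, and the repository itself is not known to define such a protocol.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class AgentLike(Protocol):
  """Hypothetical structural type listing the methods named on this page."""

  def init_policy(self, batch_size): ...
  def init_train(self, batch_size): ...
  def init_report(self, batch_size): ...
  def policy(self, carry, obs, mode='train'): ...
  def train(self, carry, data): ...
  def report(self, carry, data): ...
  def save(self): ...
  def load(self, data=None): ...


class NoOpAgent:
  """Hypothetical stand-in whose return shapes follow the tables above."""

  def init_policy(self, batch_size):
    return ()

  def init_train(self, batch_size):
    return ()

  def init_report(self, batch_size):
    return ()

  def policy(self, carry, obs, mode='train'):
    return carry, {}, {}  # (carry, actions, metrics); sampling omitted here.

  def train(self, carry, data):
    return carry, {}, {}  # (carry, outs, metrics)

  def report(self, carry, data):
    return carry, {}  # (carry, metrics)

  def save(self):
    return None

  def load(self, data=None):
    pass


def check_contract(agent):
  """Check method presence and the arity of each return value."""
  assert isinstance(agent, AgentLike)
  assert len(agent.policy(agent.init_policy(2), {'is_first': [True]})) == 3
  assert len(agent.train(agent.init_train(2), {})) == 3
  assert len(agent.report(agent.init_report(2), {})) == 2
  return True


ok = check_contract(NoOpAgent())
```

A `runtime_checkable` protocol only verifies that the methods exist, so the explicit arity checks in `check_contract` are what catch a mismatched return tuple.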
Usage Examples
Random Agent for Initial Exploration
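import numpy as np
from embodied.core.random import RandomAgent
# Create random agent with environment spaces.
# The `...` placeholders stand for Space objects, typically taken from
# env.obs_space and env.act_space; replace them before running.
obs_space = {'image': ..., 'is_first': ...}
act_space = {'action': ..., 'reset': ...}
agent = RandomAgent(obs_space, act_space)
# Initialize carry state (an empty tuple)
carry = agent.init_policy(batch_size=4)
# Generate random actions; the batch size is read from len(obs['is_first'])
obs = {'is_first': np.array([True, False, False, False])}
carry, actions, metrics = agent.policy(carry, obs)
# actions['action'] has a leading batch dimension of 4, followed by the
# shape of act_space['action']; the 'reset' key is not sampled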
Using RandomAgent in Training Pipeline
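from embodied.core.random import RandomAgent
from embodied.run import train
# env, replay, logger, and args are assumed to be constructed elsewhere.
# RandomAgent conforms to the Agent interface,
# so it can be passed directly to any run loop:
agent = RandomAgent(env.obs_space, env.act_space)
# Use for initial replay buffer population
# or as a baseline comparison.
# The argument list below may differ between repository versions;
# check the signature of train in your checkout.
train(agent, env, replay, logger, args)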