Principle:Danijar Dreamerv3 Environment Construction

From Leeroopedia
Revision as of 18:04, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Danijar_Dreamerv3_Environment_Construction.md)
Knowledge Sources
Domains Reinforcement_Learning, Environment
Last Updated 2026-02-15 09:00 GMT

Overview

A factory pattern for constructing and wrapping diverse RL environments behind a unified interface, enabling a single agent to operate across fundamentally different domains.

Description

Environment Construction in DreamerV3 uses a registry-based factory to instantiate environment objects from a task string of the form suite_taskname (e.g., atari_pong, dmc_walker_walk, crafter_reward). The suite is the part before the first underscore, so the task name may itself contain underscores (dmc_walker_walk is suite dmc, task walker_walk). The factory maps the suite name to a constructor class, instantiates the environment, and then applies a standard chain of wrappers (action normalization, dtype unification, space checking, action clipping) so that every environment exposes the same interface regardless of its underlying API (Gym, DeepMind, or custom).

This lets a single RL algorithm run across 150+ tasks spanning Atari, DeepMind Control Suite, Crafter, DMLab, Minecraft, ProcGen, BSuite, and custom environments, without any environment-specific code in the agent.
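The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not the DreamerV3 source: DummyEnv, the REGISTRY contents, parse_task_string, make_env, and the suite_configs argument are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, Tuple


class DummyEnv:
    """Hypothetical stand-in for a real suite-specific environment class."""

    def __init__(self, task: str, **kwargs):
        self.task = task
        self.kwargs = kwargs


# Hypothetical registry: suite name -> constructor.
REGISTRY: Dict[str, Callable[..., DummyEnv]] = {
    "atari": DummyEnv,
    "dmc": DummyEnv,
    "crafter": DummyEnv,
}


def parse_task_string(task: str) -> Tuple[str, str]:
    """Split on the first underscore only, so 'dmc_walker_walk' keeps 'walker_walk'."""
    suite, _, name = task.partition("_")
    if not name:
        raise ValueError(f"expected 'suite_taskname', got {task!r}")
    return suite, name


def make_env(task: str, suite_configs: Dict[str, dict]) -> DummyEnv:
    """Look up the constructor for the suite and instantiate it with its options."""
    suite, name = parse_task_string(task)
    if suite not in REGISTRY:
        raise KeyError(f"unknown suite {suite!r}; known: {sorted(REGISTRY)}")
    kwargs = suite_configs.get(suite, {})  # per-suite options (assumed layout)
    return REGISTRY[suite](name, **kwargs)
```

Adding a new suite in this sketch means adding one registry entry; the calling code and the agent stay unchanged, which is the point of the pattern.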

Usage

Use this principle whenever creating environment instances for training, evaluation, or distributed data collection. It is always the second step, directly after configuration loading, and it produces the obs_space and act_space dictionaries that define the agent's interface.
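To illustrate why the two space dictionaries are enough to define the agent's interface, here is a small sketch in which agent-side code derives its input and output sizes purely from them. The Space dataclass, the keys (image, reward, action), and the shapes are assumptions made for the example, not the actual DreamerV3 types.

```python
from dataclasses import dataclass
from math import prod
from typing import Dict, Tuple


@dataclass(frozen=True)
class Space:
    """Hypothetical space description: dtype name, shape, and discreteness."""
    dtype: str
    shape: Tuple[int, ...]
    discrete: bool = False


# Example dictionaries such as a constructed environment might expose (assumed keys).
obs_space: Dict[str, Space] = {
    "image": Space("uint8", (64, 64, 3)),
    "reward": Space("float32", ()),
}
act_space: Dict[str, Space] = {
    "action": Space("float32", (6,)),
}


def flat_size(spaces: Dict[str, Space]) -> int:
    """Total number of scalar entries across all keys (a scalar space counts as 1)."""
    return sum(prod(space.shape) for space in spaces.values())


obs_size = flat_size(obs_space)
act_size = flat_size(act_space)
```

Nothing in flat_size refers to a particular suite; it only inspects the dictionaries, which is how the agent can stay free of environment-specific code.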

Theoretical Basis

The environment construction follows the factory pattern, with a registry keyed by suite name selecting the concrete constructor:

Pseudo-code Logic:

# Abstract algorithm
suite, task = config.task.split("_", 1)        # "dmc_walker_walk" -> ("dmc", "walker_walk")
constructor = REGISTRY[suite]                  # Look up environment class
env = constructor(task, **suite_config)         # Instantiate
env = apply_wrappers(env, config)              # Normalize interface
# env now exposes: obs_space, act_space, step(action) -> obs

The wrapper chain ensures:

  • NormalizeAction: Continuous actions mapped to [-1, 1]
  • UnifyDtypes: Consistent observation dtypes
  • CheckSpaces: Runtime validation of obs/act shapes
  • ClipAction: Clamp continuous actions to valid range
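The two action wrappers in the list above can be sketched as follows. This is a simplified illustration with a single scalar action and a hypothetical ToyEnv; the class bodies are assumptions and not the DreamerV3 implementations, and UnifyDtypes and CheckSpaces are omitted for brevity. ClipAction is placed outermost here on the assumption that the wrappers are applied in the listed order.

```python
class ToyEnv:
    """Hypothetical environment whose single continuous action lives in [0, 10]."""

    act_low, act_high = 0.0, 10.0

    def step(self, action: float) -> dict:
        # Echo the action back so the effect of the wrappers is visible.
        return {"applied_action": action}


class Wrapper:
    """Forward attributes that the wrapper does not define to the wrapped env."""

    def __init__(self, env):
        self.env = env

    def __getattr__(self, name):
        if name == "env" or name.startswith("__"):
            raise AttributeError(name)  # avoid recursion before env is set
        return getattr(self.env, name)


class NormalizeAction(Wrapper):
    """Advertise the action range as [-1, 1] and rescale to the native range."""

    act_low, act_high = -1.0, 1.0

    def step(self, action: float) -> dict:
        low, high = self.env.act_low, self.env.act_high
        native = low + (action + 1.0) / 2.0 * (high - low)
        return self.env.step(native)


class ClipAction(Wrapper):
    """Clamp the incoming action to the range the wrapped env advertises."""

    def step(self, action: float) -> dict:
        clipped = min(max(action, self.env.act_low), self.env.act_high)
        return self.env.step(clipped)


# Outermost wrapper sees the agent's action first: clip, then rescale.
env = ClipAction(NormalizeAction(ToyEnv()))
```

With this ordering, an out-of-range agent action such as 3.0 is first clamped to 1.0 and only then rescaled, so the underlying environment never receives a value outside its native range.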

Related Pages

Implemented By
