Principle:Danijar Dreamerv3 Process Spawning

From Leeroopedia
Knowledge Sources
Domains: Reinforcement_Learning, Distributed_Systems
Last Updated: 2026-02-15 09:00 GMT

Overview

A distributed process orchestration pattern that spawns separate OS processes for the agent (actor inference and learner training), replay management, logging, and environment stepping, all connected via RPC.

Description

Process Spawning in DreamerV3 implements a multi-process architecture where each computational role runs in its own process (or thread):

  • Agent Process: Contains both actor (inference) and learner (training) threads sharing a single agent object
  • Replay Process: Manages replay buffers and data streams, enforcing rate limits
  • Logger Process: Aggregates metrics from all other processes
  • Environment Processes: One per environment, sending observations to the actor and receiving actions
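The first bullet, two threads sharing one agent object, can be illustrated with a minimal sketch. Everything here is a stand-in: the `Agent` class, its `policy` and `train` methods, the integer `version` that plays the role of model parameters, and the lock are assumptions made for illustration, not DreamerV3's actual classes or synchronization scheme.

```python
import threading


class Agent:
    """Hypothetical stand-in for the shared agent object."""

    def __init__(self):
        self.lock = threading.Lock()
        self.version = 0  # stands in for the model parameters

    def policy(self, obs):
        # Inference reads the current parameters.
        with self.lock:
            return obs + self.version, self.version

    def train(self):
        # Training updates the parameters in place.
        with self.lock:
            self.version += 1


def run_agent_process(num_updates=100, num_queries=100):
    """Run a learner thread and an actor thread against one Agent."""
    agent = Agent()
    seen_versions = []

    def learner():
        for _ in range(num_updates):
            agent.train()

    def actor():
        for obs in range(num_queries):
            _, version = agent.policy(obs)
            seen_versions.append(version)

    threads = [threading.Thread(target=learner), threading.Thread(target=actor)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return agent.version, seen_versions
```

Because both threads hold a reference to the same object, the actor sees parameter updates without any copy or transfer step, which is the main reason to keep the two roles in one process.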

All cross-process communication uses the portal library's RPC mechanism (Server/Client/BatchServer). Factory functions are serialized via cloudpickle and deserialized in their target processes. Network addresses are auto-resolved using free ports.
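One common way to resolve a free port, shown here as a sketch using only the Python standard library, is to bind a socket to port 0 and read back the port the operating system assigned. The helper names `find_free_port` and `resolve_addresses` and the `host:port` address format are assumptions for illustration; this is not the portal library's implementation.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS choose a free port
        return sock.getsockname()[1]


def resolve_addresses(roles, host="127.0.0.1"):
    """Assign one address per role; the 'host:port' format is an assumption."""
    return {role: f"{host}:{find_free_port(host)}" for role in roles}


if __name__ == "__main__":
    print(resolve_addresses(["actor", "replay", "logger"]))
```

The socket is closed before the port is handed to its eventual server, so another program could claim the port in between. For a local launcher that window is usually acceptable, but it is worth knowing about when startup fails intermittently.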

This architecture enables scaling: environments can run on CPU-only machines while the agent uses GPU, the replay buffer can be on a high-memory machine, and the logger runs independently.

Usage

Use this principle when config.script == 'parallel'. It replaces the single-process training loop with a distributed version. Individual processes (parallel_env, parallel_replay) can also be launched as separate jobs for remote execution.
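The selection described above can be pictured as a dispatch on the script name. The sketch below is illustrative only: the registry, the `train` entry for the single-process loop, and the string return values are hypothetical, and it assumes the individual roles are selected through the same script setting.

```python
def make_dispatcher(entry_points):
    """Return a function that runs the entry point registered for a script name."""

    def dispatch(script, *args):
        try:
            entry = entry_points[script]
        except KeyError:
            raise NotImplementedError(f"unknown script: {script!r}") from None
        return entry(*args)

    return dispatch


# Stand-in entry points; real ones would start training or worker loops.
ENTRY_POINTS = {
    "train": lambda: "single-process loop",  # hypothetical name
    "parallel": lambda: "spawn agent, replay, logger, and env processes",
    "parallel_env": lambda env_id: f"environment worker {env_id} only",
    "parallel_replay": lambda: "replay worker only",
}

dispatch = make_dispatcher(ENTRY_POINTS)
```

Calling `dispatch("parallel")` selects the full launcher, while `dispatch("parallel_env", 3)` stands for a job that runs a single environment worker, for example on a remote CPU-only machine.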

Theoretical Basis

Pseudo-code Logic:

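# Abstract algorithm (pseudocode; helper names are illustrative)
# Serialize all factory functions so each can be rebuilt in its target process
factories = {name: cloudpickle.dumps(fn) for name, fn in factories.items()}

# Resolve one network address per RPC server
actor_addr = find_free_port()
replay_addr = find_free_port()
logger_addr = find_free_port()

# Spawn processes (each role also receives the addresses it connects to; omitted for brevity)
processes = [
    # Actor and learner run as threads inside one agent process
    Process(agent_fn, [actor_thread, learner_thread]),
    Process(replay_fn, replay_addr),
    Process(logger_fn, logger_addr),
    # One process per environment, each connecting to the actor
    *[Process(env_fn, i, actor_addr) for i in range(num_envs)],
]
# Block until any process fails, then stop the rest
run_until_any_fails(processes)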
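The last step of the pseudocode, supervising the workers and stopping when one of them goes down, can be approximated with the standard library alone. The sketch below assumes each role is a separate OS process started from a command line. `run_until_any_exits` and `worker` are hypothetical helpers, the inline scripts are stand-ins for real roles, and the function reacts to any exit rather than only to failures.

```python
import subprocess
import sys
import time


def run_until_any_exits(commands, poll_interval=0.05):
    """Start one OS process per command; stop all once any of them exits.

    Returns the exit code of the first process observed to have exited.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    try:
        while True:
            for proc in procs:
                code = proc.poll()  # None while the process is still running
                if code is not None:
                    return code
            time.sleep(poll_interval)
    finally:
        # Tear down the survivors so no worker is left orphaned.
        for proc in procs:
            if proc.poll() is None:
                proc.terminate()
        for proc in procs:
            proc.wait()


def worker(code):
    """Hypothetical helper: command line for a stand-in worker process."""
    return [sys.executable, "-c", code]


if __name__ == "__main__":
    commands = [
        worker("import time; time.sleep(30)"),  # stand-in for a healthy role
        worker("import sys, time; time.sleep(0.2); sys.exit(3)"),  # failing role
    ]
    print(run_until_any_exits(commands))
```

Stopping everything on the first exit keeps the system from running in a half-alive state, for example with environments still stepping after the agent process has crashed.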
