

Principle:Danijar Dreamerv3 Distributed Actor Inference

From Leeroopedia
Knowledge Sources
Domains Reinforcement_Learning, Distributed_Systems
Last Updated 2026-02-15 09:00 GMT

Overview

A batched inference server that receives observations from multiple environment processes, runs the agent policy on the GPU in batches, returns the resulting actions to the environments, and forwards the full transitions to the replay and logger servers.

Description

The distributed actor uses a BatchServer pattern: environment processes send individual observations via RPC, and the server accumulates them into batches of size actor_batch, runs a single batched policy inference on the GPU, and then fans the individual actions back out to each environment.
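The accumulate, run once, fan out cycle can be sketched in plain Python with a request queue and one reply queue per caller. The class `MiniBatchServer`, its call interface, and `toy_policy` are hypothetical; they do not reproduce DreamerV3's actual BatchServer or RPC layer, and threads stand in for the environment processes.

```python
import queue
import threading

import numpy as np


class MiniBatchServer:
    """Hypothetical batching server: collect `batch_size` single requests,
    run `fn` once on the stacked batch, and send each caller its own row."""

    def __init__(self, fn, batch_size):
        self.fn = fn
        self.batch_size = batch_size
        self.requests = queue.Queue()
        threading.Thread(target=self._serve, daemon=True).start()

    def __call__(self, obs):
        # Called once per environment step; blocks until the action is ready.
        reply = queue.Queue(maxsize=1)
        self.requests.put((obs, reply))
        result = reply.get()
        if isinstance(result, Exception):
            raise result
        return result

    def _serve(self):
        while True:
            # Block until a full batch of individual requests has arrived.
            items = [self.requests.get() for _ in range(self.batch_size)]
            obs_list = [obs for obs, _ in items]
            batch = {k: np.stack([o[k] for o in obs_list]) for k in obs_list[0]}
            try:
                outputs = self.fn(batch)  # a single batched call
            except Exception as e:
                for _, reply in items:
                    reply.put(e)
                continue
            # Fan out: row i of every output goes back to the i-th caller.
            for i, (_, reply) in enumerate(items):
                reply.put({k: v[i] for k, v in outputs.items()})


def toy_policy(batch):
    # Stand-in for batched GPU inference: action = 2 * observation.
    return {'action': batch['image'] * 2}


server = MiniBatchServer(toy_policy, batch_size=4)
results = {}


def env_worker(i):
    # Each thread plays one environment process sending one observation.
    results[i] = server({'image': np.full(3, i, np.float32)})


threads = [threading.Thread(target=env_worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)
```

Because this sketch waits for exactly `batch_size` requests before running the policy, it assumes at least actor_batch environments are sending observations; with fewer, the worker would wait indefinitely.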

After inference, a post-processing function asynchronously sends the full transitions (observations + actions + policy outputs) to both the replay server (for storage) and the logger server (for episode tracking).
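The asynchronous hand-off can be sketched with a single-worker `ThreadPoolExecutor`: the inference path schedules the sends and returns the actions immediately, while the worker builds each transition and delivers it in submission order. `RecordingClient` is a hypothetical stand-in for the replay and logger RPC clients; only the method names `add_batch` and `tran` are taken from the pseudo-code on this page, and the `logprob` output key is illustrative.

```python
import concurrent.futures

import numpy as np


class RecordingClient:
    """Hypothetical stand-in for the replay/logger RPC clients."""

    def __init__(self):
        self.received = []

    def add_batch(self, tran):  # replay-side method name from the pseudo-code
        self.received.append(tran)

    def tran(self, tran):  # logger-side method name from the pseudo-code
        self.received.append(tran)


replay, logger = RecordingClient(), RecordingClient()
# One worker keeps post-processing ordered and off the inference path.
postproc = concurrent.futures.ThreadPoolExecutor(max_workers=1)


def postprocess(obs, acts, outs):
    # A full transition = observations + actions + policy outputs.
    tran = {**obs, **acts, **outs}
    replay.add_batch(tran)
    logger.tran(tran)


def on_batch(obs, acts, outs):
    # Schedule the sends and return the actions without waiting for them.
    postproc.submit(postprocess, obs, acts, outs)
    return acts


obs = {'image': np.zeros((2, 4)), 'envid': np.array([0, 1])}
acts = {'action': np.array([1, 0])}
outs = {'logprob': np.array([-0.1, -0.7])}
returned = on_batch(obs, acts, outs)
postproc.shutdown(wait=True)  # flush pending sends, e.g. at shutdown
```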

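The actor keeps a separate carry state for each environment in a dictionary keyed by envid. Because consecutive batches can contain different environments in any order, this keeps RSSM inference stateful for each environment individually, across steps and episodes.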
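Keeping one carry per envid means each batch must gather the carries of the environments it contains and scatter the updated carries back afterwards. A minimal NumPy sketch, assuming each carry is a dict of arrays; `initial_carry`, the `deter`/`step` fields, and `toy_policy` are hypothetical placeholders for the real RSSM state and policy.

```python
import numpy as np

DETER = 4  # hypothetical width of the recurrent state


def initial_carry():
    # Hypothetical fresh state for an envid the actor has not seen before.
    return {'deter': np.zeros(DETER, np.float32), 'step': np.zeros((), np.int64)}


def gather(state_dict, envids):
    # Look up (or create) each env's carry and stack them along a batch axis.
    carries = [state_dict[e] if e in state_dict else initial_carry() for e in envids]
    return {k: np.stack([c[k] for c in carries]) for k in carries[0]}


def scatter(state_dict, envids, batched):
    # Split the batched carry back into one entry per envid.
    for i, e in enumerate(envids):
        state_dict[e] = {k: v[i] for k, v in batched.items()}


def toy_policy(carry, obs):
    # Stand-in for one RSSM + actor step on the whole batch.
    carry = {'deter': 0.5 * carry['deter'] + obs['image'], 'step': carry['step'] + 1}
    return carry, {'action': carry['deter'].sum(-1)}


state_dict = {}
# Two consecutive batches: env 7 appears in both, envs 3 and 5 once each.
for envids in ([3, 7], [7, 5]):
    obs = {'image': np.ones((len(envids), DETER), np.float32), 'envid': envids}
    carry = gather(state_dict, obs['envid'])
    carry, acts = toy_policy(carry, obs)
    scatter(state_dict, obs['envid'], carry)
```

Env 7 ends with two updates of history (its `deter` is 1.5 after the second step) even though its partner in the batch changed, which is what keying the carries by envid buys.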

Usage

The actor runs as a thread within the agent process during distributed training. It shares the agent object with the learner thread, which continuously updates the agent's parameters.
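One simple way to let an actor thread read parameters while a learner thread keeps replacing them is to swap the whole parameter object under a lock, so the actor always acts with a complete, possibly slightly stale, snapshot. `SharedAgent` and its `version` field are hypothetical; this sketch does not claim to reproduce how DreamerV3 shares parameters between its threads.

```python
import threading
import time


class SharedAgent:
    """Hypothetical agent: the learner writes parameters, the actor reads them."""

    def __init__(self):
        self.params = {'version': 0}
        self.lock = threading.Lock()

    def train_step(self):
        with self.lock:
            # Replace the whole dict so readers never see a half-updated state.
            self.params = {'version': self.params['version'] + 1}

    def policy(self, obs):
        with self.lock:
            params = self.params  # snapshot of the latest parameters
        return params['version']  # toy "action": which version was used


agent = SharedAgent()
stop = threading.Event()
seen_versions = []


def learner():
    while not stop.is_set():
        agent.train_step()
        time.sleep(0.001)


def actor():
    for _ in range(50):
        seen_versions.append(agent.policy(obs=None))
        time.sleep(0.001)


learner_thread = threading.Thread(target=learner)
actor_thread = threading.Thread(target=actor)
learner_thread.start()
actor_thread.start()
actor_thread.join()
stop.set()
learner_thread.join()
```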

Theoretical Basis

Pseudo-code Logic:

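# Abstract algorithm (BatchServer, agent, replay, logger and actor_batch
# come from the surrounding system; initial_carry() is a placeholder)
batch_server = BatchServer(batch_size=actor_batch)
state_dict = {}  # envid -> this environment's RSSM carry

def on_batch(obs):
    # Gather per-env carry states (a fresh carry for a new envid)
    carries = [state_dict.get(env_id, initial_carry()) for env_id in obs['envid']]
    # Run one batched policy inference for the whole batch
    carries, actions, outputs = agent.policy(carries, obs)
    # Update per-env states
    for env_id, carry in zip(obs['envid'], carries):
        state_dict[env_id] = carry
    # Async: send full transitions (observations + actions + policy outputs)
    transitions = {**obs, **actions, **outputs}
    replay.add_batch(transitions)
    logger.tran(transitions)
    # The server fans these actions back out to the individual environments
    return actions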

Related Pages

Implemented By
