Principle:Danijar Dreamerv3 Distributed Actor Inference
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Distributed_Systems |
| Last Updated | 2026-02-15 09:00 GMT |
Overview
A batched inference server that receives observations from multiple environment processes, runs the agent policy on the GPU in batches, sends each action back to its environment, and forwards the resulting transitions to the replay and logger servers.
Description
The distributed actor uses a BatchServer pattern: environment processes send individual observations via RPC, the server accumulates them into batches of size actor_batch, runs a single batched policy inference on GPU, then fans out individual actions back to each environment.
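A minimal in-process sketch of this batching pattern follows. It assumes Python threads and a `queue.Queue` as a stand-in for the RPC transport. The class name mirrors BatchServer, but its constructor, the `submit` method, and the `on_batch` callback signature are hypothetical and are not DreamerV3's actual API. Each caller gets a `concurrent.futures.Future` that resolves once its batch has been processed.

```python
import queue
import threading
from concurrent.futures import Future


class BatchServer:
    """Hypothetical sketch: collects single requests and serves them in batches."""

    def __init__(self, batch_size, on_batch):
        self.batch_size = batch_size
        self.on_batch = on_batch  # fn(list of observations) -> list of actions
        self.requests = queue.Queue()
        self.thread = threading.Thread(target=self._loop, daemon=True)
        self.thread.start()

    def submit(self, obs):
        # Called once per environment step; returns a future for that env's action.
        future = Future()
        self.requests.put((obs, future))
        return future

    def _loop(self):
        while True:
            # Block until a full batch of requests has arrived.
            batch = [self.requests.get() for _ in range(self.batch_size)]
            observations = [obs for obs, _ in batch]
            try:
                actions = self.on_batch(observations)  # one batched inference call
            except Exception as exc:
                # Propagate the failure to every waiting caller instead of hanging them.
                for _, future in batch:
                    future.set_exception(exc)
                continue
            # Fan the batched results back out to the individual callers.
            for (_, future), action in zip(batch, actions):
                future.set_result(action)
```

Because `_loop` waits for exactly `batch_size` requests, this sketch assumes at least that many environments submit concurrently; with fewer, it would wait indefinitely. A real server would likely need a timeout or partial-batch policy.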
After inference, a post-processing function asynchronously sends the full transitions (observations + actions + policy outputs) to both the replay server (for storage) and the logger server (for episode tracking).
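The asynchronous hand-off can be sketched with a thread pool, so that the inference path only enqueues the sends and returns immediately. `FakeReplay` and `FakeLogger` are hypothetical stubs that record what they receive. The method names `add_batch` and `tran` are taken from the pseudo-code on this page. Merging observations, actions, and policy outputs into one dict assumes their keys do not collide.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeReplay:
    """Hypothetical stub for the replay server; stores every batch it receives."""

    def __init__(self):
        self.items = []

    def add_batch(self, transitions):
        self.items.append(transitions)


class FakeLogger:
    """Hypothetical stub for the logger server; stores every batch it receives."""

    def __init__(self):
        self.items = []

    def tran(self, transitions):
        self.items.append(transitions)


def make_postprocess(replay, logger, executor):
    """Returns a callback that ships transitions off the inference path."""

    def postprocess(obs, actions, outputs):
        # Merge observation, action, and policy-output keys into one transition dict.
        transitions = {**obs, **actions, **outputs}
        # Run both sends on background workers so inference is not blocked.
        return [
            executor.submit(replay.add_batch, transitions),
            executor.submit(logger.tran, transitions),
        ]

    return postprocess
```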
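The actor keeps a per-environment carry state (the recurrent RSSM state) in a dictionary keyed by envid. As a result, each environment's recurrent state persists across its consecutive steps, even though successive batches may contain different subsets of environments.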
Usage
The actor runs as a thread within the agent process during distributed training. It shares the agent object with the learner thread, which continuously updates the agent's parameters.
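This page does not say how the actor and learner threads synchronize access to the shared agent. One common approach, sketched here purely as an assumption, is for the learner to swap in a complete new parameter object under a lock while the actor reads a consistent snapshot under the same lock. `SharedAgent`, `policy`, and `update` are hypothetical names.

```python
import threading


class SharedAgent:
    """Hypothetical agent: the actor reads parameters, the learner replaces them."""

    def __init__(self):
        self.lock = threading.Lock()
        self.params = {'version': 0}

    def policy(self, obs):
        with self.lock:
            params = self.params  # take a consistent snapshot for this batch
        return [params['version']] * len(obs)

    def update(self):
        with self.lock:
            # Replace the whole dict so readers never see a half-updated state.
            self.params = {'version': self.params['version'] + 1}


def run(num_updates=100, num_steps=50):
    agent = SharedAgent()
    seen = []  # parameter version used for each actor step

    def learner():
        for _ in range(num_updates):
            agent.update()

    def actor():
        for _ in range(num_steps):
            seen.append(agent.policy([None])[0])

    threads = [threading.Thread(target=learner), threading.Thread(target=actor)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return agent, seen
```

The actor may act with parameters that are a few updates stale, which this design accepts in exchange for never blocking inference on a full training step.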
Theoretical Basis
Pseudo-code Logic:
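# Abstract algorithm (pseudo-code)
batch_server = BatchServer(batch_size=actor_batch)
state_dict = {}  # envid -> per-environment carry (RSSM state)

def on_batch(obs):
    # Gather the carry state of every environment in this batch
    carries = [state_dict[env_id] for env_id in obs['envid']]
    # Run one batched policy inference on the GPU
    carries, actions, outputs = agent.policy(carries, obs)
    # Store the updated carries under the same envids
    for env_id, carry in zip(obs['envid'], carries):
        state_dict[env_id] = carry
    # Async post-processing: full transitions (obs + actions + outputs) go to replay and logger
    transitions = {**obs, **actions, **outputs}
    replay.add_batch(transitions)
    logger.tran(transitions)
    # Fan the batched actions back out to the individual environments
    return actions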
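The pseudo-code can be exercised end to end with stand-ins. In this sketch, `CounterAgent` is a hypothetical agent whose carry simply counts steps per environment. The replay and logger are plain lists appended to synchronously, and `0` is an assumed initial carry for environments that have not been seen yet.

```python
class CounterAgent:
    """Hypothetical stand-in for the agent: each env's carry counts its steps."""

    def policy(self, carries, obs):
        new_carries = [c + 1 for c in carries]
        actions = {'action': [c % 2 for c in new_carries]}
        outputs = {'step': new_carries}
        return new_carries, actions, outputs


def make_on_batch(agent, replay, logger):
    state_dict = {}  # envid -> carry

    def on_batch(obs):
        envids = obs['envid']
        # 0 is an assumed initial carry for environments seen for the first time.
        carries = [state_dict.get(e, 0) for e in envids]
        carries, actions, outputs = agent.policy(carries, obs)
        for e, c in zip(envids, carries):
            state_dict[e] = c
        transitions = {**obs, **actions, **outputs}
        # Synchronous stand-ins for replay.add_batch and logger.tran.
        replay.append(transitions)
        logger.append(transitions)
        return actions

    return on_batch


replay, logger = [], []
on_batch = make_on_batch(CounterAgent(), replay, logger)
on_batch({'envid': [0, 1], 'image': ['a', 'b']})  # returns {'action': [1, 1]}
on_batch({'envid': [1, 2], 'image': ['c', 'd']})  # returns {'action': [0, 1]}
```

The second batch contains env 1 again alongside a new env 2. Env 1 therefore resumes from its stored carry while env 2 starts fresh, which is why carries are keyed by envid rather than by slot position in the batch.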