Principle: Danijar DreamerV3 Distributed Learner Training
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Distributed_Systems, Model_Based_RL |
| Last Updated | 2026-02-15 09:00 GMT |
Overview
A continuous learning loop that fetches batches from a remote replay process, trains the shared agent, sends replay updates back, and periodically generates evaluation reports.
Description
The distributed learner is the training engine of parallel DreamerV3. It runs as a thread that shares the agent object with the actor thread. The learner:
- Restores from checkpoint (with barrier synchronization so the actor waits)
- Creates prefetched data streams that fetch batches from the remote replay server via portal RPC
- Runs an infinite training loop: fetch batch, call agent.train(), send replay context updates back
- Periodically evaluates by calling agent.report() on report and eval streams
- Logs training metrics to the remote logger server
- Saves checkpoints at regular intervals
The learner uses GlobalClock for time-based scheduling (log, report, save intervals) that accounts for wall-clock time across distributed processes.
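The wall-clock scheduling can be illustrated with a minimal stand-in. The `IntervalClock` class below is a hypothetical sketch (not the actual GlobalClock API): each call returns True at most once per interval, which is how log/report/save cadences can be driven by elapsed time rather than step counts.

```python
import time

class IntervalClock:
    """Hypothetical sketch of time-based scheduling: fires at most
    once per `every_seconds` of wall-clock time."""

    def __init__(self, every_seconds):
        self.every = every_seconds
        self.last = None  # monotonic timestamp of the last firing

    def __call__(self):
        now = time.monotonic()
        if self.last is None or now - self.last >= self.every:
            self.last = now
            return True
        return False

# Example cadences (illustrative values, not the real defaults):
log_clock = IntervalClock(every_seconds=60)    # log once per minute
save_clock = IntervalClock(every_seconds=900)  # checkpoint every 15 minutes
```

Using `time.monotonic()` keeps the schedule robust to system clock adjustments; the real GlobalClock additionally coordinates these intervals across distributed processes.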
Usage
The learner runs as a thread within the agent process during distributed training. It trains continuously while the actor thread collects data in parallel.
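The startup handshake between the two threads can be sketched with the standard library's `threading.Barrier`. This is a simplified illustration under assumed names ("restored", "collecting" are placeholder events), showing how the actor is held back until the learner has finished restoring the checkpoint:

```python
import threading

events = []                      # records the order of operations
barrier = threading.Barrier(2)   # one learner thread + one actor thread

def learner():
    # Restore (or create) the checkpoint before any training happens.
    events.append('restored')
    barrier.wait()               # release the actor
    # ... training loop would start here ...

def actor():
    barrier.wait()               # block until the learner restored state
    events.append('collecting')  # safe to collect data with restored agent

threads = [threading.Thread(target=learner), threading.Thread(target=actor)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# events is now ['restored', 'collecting']: restore always precedes collection.
```

The barrier guarantees the actor never acts with uninitialized weights, regardless of thread scheduling.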
Theoretical Basis
Pseudo-code Logic:
# Abstract algorithm
checkpoint.load_or_save()
barrier.wait()  # Signal actor that checkpoint is restored
carry = None  # Agent state threaded through successive train calls
while True:
    batch = prefetch_from_remote_replay('train')
    carry, outs, metrics = agent.train(carry, batch)
    if 'replay' in outs:
        remote_replay.update(outs['replay'])
    if should_report():
        report_metrics = evaluate(report_stream)
        eval_metrics = evaluate(eval_stream)
        remote_logger.add(report_metrics)
        remote_logger.add(eval_metrics)
    if should_save():
        checkpoint.save()
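The prefetching step in the loop above can be sketched with standard-library primitives. The `prefetch` generator below is an assumption-laden stand-in for the real prefetched data stream: a background thread keeps a small buffer of batches filled so the training step never waits on the (stand-in) fetch call.

```python
import queue
import threading

def prefetch(fetch_fn, depth=4):
    """Sketch of a prefetched data stream: a daemon thread keeps up to
    `depth` batches buffered ahead of the consumer. In the real system
    `fetch_fn` would be an RPC to the remote replay server."""
    buf = queue.Queue(maxsize=depth)

    def worker():
        while True:
            buf.put(fetch_fn())  # blocks once the buffer is full

    threading.Thread(target=worker, daemon=True).start()
    while True:
        yield buf.get()

# Usage with a stand-in fetch function producing numbered "batches":
counter = iter(range(1000))
stream = prefetch(lambda: next(counter))
batches = [next(stream) for _ in range(3)]  # consumes the first three batches
```

Bounding the queue with `maxsize` applies backpressure: if the learner stalls, the fetcher stops pulling batches instead of buffering without limit.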