Principle: Danijar DreamerV3 Replay Buffer Setup
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Experience_Replay |
| Last Updated | 2026-02-15 09:00 GMT |
Overview
A mechanism for storing, sampling, and replaying fixed-length sequences of environment transitions to enable off-policy training of world models and value functions.
Description
Experience replay is fundamental to sample-efficient RL. DreamerV3's replay buffer stores transitions in fixed-size chunks on disk, supports configurable sampling strategies (uniform, prioritized, recency-weighted, or mixtures thereof), maintains online queues for fresh data, and provides concurrent read/write access via reader-writer locks.
The replay buffer solves three problems:
- Temporal correlation: Breaking sequential correlation by sampling random subsequences
- Sample efficiency: Reusing past transitions multiple times for world model learning
- Scalability: Disk-backed chunked storage enabling millions of stored transitions
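The three properties above can be illustrated with a minimal in-memory sketch. This is a hedged illustration, not the DreamerV3 API: all class and method names here are invented, sampling is uniform only, and the chunked disk storage and reader-writer locking from the description are omitted.

```python
import random
from collections import deque


class SequenceReplayBuffer:
    """Minimal uniform-sampling replay buffer for fixed-length sequences (illustrative)."""

    def __init__(self, capacity, seq_len):
        self.seq_len = seq_len                      # length of each sampled subsequence
        self.transitions = deque(maxlen=capacity)   # oldest data evicted at capacity

    def add(self, transition):
        # Appending past capacity silently drops the oldest transition.
        self.transitions.append(transition)

    def sample(self, batch_size):
        # Sampling random start indices breaks temporal correlation,
        # and each transition can appear in many batches (sample reuse).
        max_start = len(self.transitions) - self.seq_len
        assert max_start >= 0, "not enough data for one sequence"
        batch = []
        for _ in range(batch_size):
            start = random.randint(0, max_start)
            batch.append([self.transitions[start + i] for i in range(self.seq_len)])
        return batch
```

Each sampled element is a consecutive subsequence, which is what a world model needs for sequence training; the real buffer additionally pages chunks to disk to scale past memory limits.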
The buffer stores sequences of length consec * batch_length + replay_context, so each sample contains consecutive training segments preceded by a few context steps used to initialize the RSSM carry state.
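As a quick sanity check of that length formula (the parameter values below are illustrative, not the repository's defaults):

```python
def sequence_length(consec, batch_length, replay_context):
    # Total stored sequence: `consec` consecutive training segments of
    # `batch_length` steps each, plus `replay_context` extra leading steps
    # used to initialize the RSSM carry state.
    return consec * batch_length + replay_context


# One 64-step segment with a single context step:
print(sequence_length(consec=1, batch_length=64, replay_context=1))  # 65
```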
Usage
Use this principle when setting up the training pipeline. One replay buffer is created for training data; a second (smaller) buffer may be created for evaluation data in the train_eval workflow.
Theoretical Basis
Experience replay decouples the training data distribution from the order in which transitions were collected: stored sequences are drawn according to a selector strategy rather than replayed in collection order.
Selector strategies:
- Uniform: Each stored sequence equally likely
- Prioritized: Weighted by TD-error or loss magnitude
- Recency: Power-law decay favoring recent data
- Mixture: Weighted combination (e.g., 80% uniform + 10% priority + 10% recency)
Pseudo-code Logic:
# Abstract algorithm (names are illustrative)
buffer = ReplayBuffer(capacity=1_000_000, chunk_size=1024)
selector = MixtureSampler(uniform=0.8, priority=0.1, recency=0.1)
# On each env step: append the transition under its worker's stream
buffer.add(transition, worker_id)
# On each train step: draw a batch of sequences via the selector
batch = buffer.sample(batch_size, selector)
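The MixtureSampler used above can be sketched concretely. This is a minimal illustration under assumed interfaces, not the DreamerV3 implementation: it takes insertion-ordered sequence keys plus per-key priorities, draws a strategy by its mixture weight, then delegates to that strategy.

```python
import random


class MixtureSampler:
    """Illustrative mixture of uniform, prioritized, and recency selection."""

    def __init__(self, uniform=0.8, priority=0.1, recency=0.1):
        self.weights = {"uniform": uniform, "priority": priority, "recency": recency}

    def __call__(self, keys, priorities):
        # keys: sequence ids ordered oldest-first; priorities: e.g. loss magnitudes.
        strategy = random.choices(
            list(self.weights), weights=list(self.weights.values()))[0]
        if strategy == "uniform":
            return random.choice(keys)  # every stored sequence equally likely
        if strategy == "priority":
            return random.choices(keys, weights=priorities)[0]  # loss-weighted
        # Recency: power-law decay over age, so the newest keys weigh the most.
        n = len(keys)
        recency_weights = [(n - i) ** -0.7 for i in range(n)]
        return random.choices(keys, weights=recency_weights)[0]
```

Setting one weight to 1.0 and the rest to 0.0 recovers a single pure strategy, which is handy for ablating the sampling mixture.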