Principle: Danijar DreamerV3 Replay Buffer Setup
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Experience_Replay |
| Last Updated | 2026-02-15 09:00 GMT |
Overview
A mechanism for storing, sampling, and replaying fixed-length sequences of environment transitions to enable off-policy training of world models and value functions.
Description
Experience replay is fundamental to sample-efficient RL. DreamerV3's replay buffer stores transitions in fixed-size chunks on disk, supports configurable sampling strategies (uniform, prioritized, recency-weighted, or mixtures thereof), maintains online queues for fresh data, and provides concurrent read/write access via reader-writer locks.
The replay buffer solves three problems:
- Temporal correlation: Breaking sequential correlation by sampling random subsequences
- Sample efficiency: Reusing past transitions multiple times for world model learning
- Scalability: Disk-backed chunked storage enabling millions of stored transitions
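The three properties above can be illustrated with a minimal in-memory sketch. This is a hedged illustration, not the DreamerV3 API: all class and method names here are invented, sampling is uniform only, and the chunked disk storage and reader-writer locking from the description are omitted.

```python
import random
from collections import deque


class SequenceReplayBuffer:
    """Minimal uniform-sampling replay buffer for fixed-length sequences (illustrative)."""

    def __init__(self, capacity, seq_len):
        self.seq_len = seq_len                      # length of each sampled subsequence
        self.transitions = deque(maxlen=capacity)   # oldest data evicted at capacity

    def add(self, transition):
        # Appending past capacity silently drops the oldest transition.
        self.transitions.append(transition)

    def sample(self, batch_size):
        # Sampling random start indices breaks temporal correlation,
        # and each transition can appear in many batches (sample reuse).
        max_start = len(self.transitions) - self.seq_len
        assert max_start >= 0, "not enough data for one sequence"
        batch = []
        for _ in range(batch_size):
            start = random.randint(0, max_start)
            batch.append([self.transitions[start + i] for i in range(self.seq_len)])
        return batch
```

Each sampled element is a consecutive subsequence, which is what a world model needs for sequence training; the real buffer additionally pages chunks to disk to scale past memory limits.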
The buffer stores sequences of length consec * batch_length + replay_context, so each sample contains consecutive training segments preceded by a few context steps used to initialize the RSSM carry state.
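As a quick sanity check of that length formula (the parameter values below are illustrative, not the repository's defaults):

```python
def sequence_length(consec, batch_length, replay_context):
    # Total stored sequence: `consec` consecutive training segments of
    # `batch_length` steps each, plus `replay_context` extra leading steps
    # used to initialize the RSSM carry state.
    return consec * batch_length + replay_context


# One 64-step segment with a single context step:
print(sequence_length(consec=1, batch_length=64, replay_context=1))  # 65
```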
Usage
Use this principle when setting up the training pipeline. One replay buffer is created for training data; a second (smaller) buffer may be created for evaluation data in the train_eval workflow.
Theoretical Basis
Experience replay decouples the training data distribution from the order in which transitions were collected: stored sequences are drawn according to a selector strategy rather than replayed in collection order.
Selector strategies:
- Uniform: Each stored sequence equally likely
- Prioritized: Weighted by TD-error or loss magnitude
- Recency: Power-law decay favoring recent data
- Mixture: Weighted combination (e.g., 80% uniform + 10% priority + 10% recency)
Pseudo-code Logic:
# Abstract algorithm (names are illustrative)
buffer = ReplayBuffer(capacity=1_000_000, chunk_size=1024)
selector = MixtureSampler(uniform=0.8, priority=0.1, recency=0.1)
# On each env step: append the transition under its worker's stream
buffer.add(transition, worker_id)
# On each train step: draw a batch of sequences via the selector
batch = buffer.sample(batch_size, selector)
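The MixtureSampler used above can be sketched concretely. This is a minimal illustration under assumed interfaces, not the DreamerV3 implementation: it takes insertion-ordered sequence keys plus per-key priorities, draws a strategy by its mixture weight, then delegates to that strategy.

```python
import random


class MixtureSampler:
    """Illustrative mixture of uniform, prioritized, and recency selection."""

    def __init__(self, uniform=0.8, priority=0.1, recency=0.1):
        self.weights = {"uniform": uniform, "priority": priority, "recency": recency}

    def __call__(self, keys, priorities):
        # keys: sequence ids ordered oldest-first; priorities: e.g. loss magnitudes.
        strategy = random.choices(
            list(self.weights), weights=list(self.weights.values()))[0]
        if strategy == "uniform":
            return random.choice(keys)  # every stored sequence equally likely
        if strategy == "priority":
            return random.choices(keys, weights=priorities)[0]  # loss-weighted
        # Recency: power-law decay over age, so the newest keys weigh the most.
        n = len(keys)
        recency_weights = [(n - i) ** -0.7 for i in range(n)]
        return random.choices(keys, weights=recency_weights)[0]
```

Setting one weight to 1.0 and the rest to 0.0 recovers a single pure strategy, which is handy for ablating the sampling mixture.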