

Principle:Danijar Dreamerv3 Replay Buffer Setup

From Leeroopedia
Knowledge Sources
Domains Reinforcement_Learning, Experience_Replay
Last Updated 2026-02-15 09:00 GMT

Overview

A mechanism for storing, sampling, and replaying fixed-length sequences of environment transitions to enable off-policy training of world models and value functions.

Description

Experience replay is fundamental to sample-efficient RL. DreamerV3's replay buffer stores transitions in fixed-size chunks on disk, supports configurable sampling strategies (uniform, prioritized, recency-weighted, or mixtures thereof), maintains online queues for fresh data, and provides concurrent read/write access via reader-writer locks.
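The chunked, lock-protected storage described above can be sketched in plain Python. This is a toy illustration, not DreamerV3's actual chunk format: the `ChunkStore` class, its method names, and the use of `pickle` files are all assumptions for the sketch, and a single `threading.Lock` stands in for the reader-writer lock mentioned in the text.

```python
import os
import pickle
import tempfile
import threading

class ChunkStore:
    """Toy sketch of chunked, disk-backed transition storage.

    Transitions accumulate in an in-memory chunk; when the chunk is full
    it is flushed to disk as one file. A single lock (a simplification of
    a reader-writer lock) protects concurrent add/load calls.
    """

    def __init__(self, directory, chunk_size=4):
        self.directory = directory
        self.chunk_size = chunk_size
        self.current = []           # in-memory chunk being filled
        self.num_chunks = 0         # chunks flushed to disk so far
        self.lock = threading.Lock()

    def add(self, transition):
        with self.lock:
            self.current.append(transition)
            if len(self.current) == self.chunk_size:
                # Flush the full chunk to its own file on disk.
                path = os.path.join(self.directory, f"chunk_{self.num_chunks}.pkl")
                with open(path, "wb") as f:
                    pickle.dump(self.current, f)
                self.current = []
                self.num_chunks += 1

    def load_chunk(self, index):
        with self.lock:
            path = os.path.join(self.directory, f"chunk_{index}.pkl")
            with open(path, "rb") as f:
                return pickle.load(f)

# Write 10 transitions with chunk_size=4: two full chunks hit disk,
# two transitions remain in the partially filled in-memory chunk.
store = ChunkStore(tempfile.mkdtemp(), chunk_size=4)
for step in range(10):
    store.add({"obs": step, "action": step % 2})
```

Disk-backed chunks are what let the buffer scale to millions of transitions without holding them all in RAM.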

The replay buffer solves three problems:

  • Temporal correlation: Breaking sequential correlation by sampling random subsequences
  • Sample efficiency: Reusing past transitions multiple times for world model learning
  • Scalability: Disk-backed chunked storage enabling millions of stored transitions

The buffer stores sequences of length consec * batch_length + replay_context, allowing consecutive training segments with context for RSSM carry state initialization.
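The stored sequence length is a simple arithmetic combination of those three settings. The numeric values below are illustrative placeholders, not the actual DreamerV3 defaults:

```python
# Illustrative values (assumptions, not DreamerV3's shipped config):
consec = 1           # consecutive training segments drawn per sequence
batch_length = 64    # timesteps per training segment
replay_context = 1   # extra leading steps to initialize the RSSM carry state

# Total timesteps stored per replayed sequence.
sequence_length = consec * batch_length + replay_context
```

With these values each sampled sequence spans 65 timesteps: 64 for the training segment plus 1 context step to warm up the recurrent state.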

Usage

Use this principle when setting up the training pipeline. One replay buffer is created for training data; a second (smaller) buffer may be created for evaluation data in the train_eval workflow.

Theoretical Basis

Experience replay samples transitions independently from their collection order. The buffer holds tuples

    {(o_t, a_t, r_t, o_{t+1})}

and a selector draws training batches from them:

    batch ~ Selector(buffer)

Selector strategies:

  • Uniform: Each stored sequence equally likely
  • Prioritized: Weighted by TD-error or loss magnitude
  • Recency: Power-law decay favoring recent data
  • Mixture: Weighted combination (e.g., 80% uniform + 10% priority + 10% recency)
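The four strategies above can be sketched as one sampling function: first pick a strategy according to the mixture weights, then pick an index with that strategy. The function name, the `priority` field, and the decay exponent `0.7` are assumptions for illustration, not DreamerV3's API.

```python
import random

def mixture_sample(buffer, weights, rng):
    """Pick a sequence index: choose a strategy by mixture weight, then sample.

    weights maps strategy name -> probability; strategies are toy stand-ins
    for the selectors described above.
    """
    strategy = rng.choices(list(weights), weights=list(weights.values()))[0]
    n = len(buffer)
    if strategy == "uniform":
        return rng.randrange(n)  # every stored sequence equally likely
    if strategy == "priority":
        # Weight by a stored priority (e.g. TD-error or loss magnitude).
        prios = [item["priority"] for item in buffer]
        return rng.choices(range(n), weights=prios)[0]
    if strategy == "recency":
        # Power-law decay over age: the newest index (n - 1) gets weight 1,
        # the oldest gets weight n ** -0.7.
        decay = [(n - i) ** -0.7 for i in range(n)]
        return rng.choices(range(n), weights=decay)[0]
    raise ValueError(f"unknown strategy: {strategy}")

rng = random.Random(0)
buffer = [{"priority": i + 1} for i in range(100)]
weights = {"uniform": 0.8, "priority": 0.1, "recency": 0.1}
picks = [mixture_sample(buffer, weights, rng) for _ in range(1000)]
```

Because uniform dominates the mixture, most draws remain unbiased while the priority and recency components nudge sampling toward informative and fresh data.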

Pseudo-code Logic:

# Abstract algorithm
buffer = ReplayBuffer(capacity=1_000_000, chunk_size=1024)
selector = MixtureSampler(uniform=0.8, priority=0.1, recency=0.1)

# On each environment step: append the transition under its worker's stream
buffer.add(transition, worker_id)

# On each train step: draw a batch of sequences via the selector
batch = buffer.sample(batch_size, selector)
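A minimal runnable version of this add/sample pattern, restricted to uniform selection, might look as follows. The `SequenceReplay` class and its per-worker stream layout are assumptions for the sketch, not the actual DreamerV3 implementation:

```python
import collections
import random

class SequenceReplay:
    """In-memory sketch of the add/sample pattern above (uniform only).

    Transitions are appended per worker; sample() returns batch_size random
    fixed-length subsequences, breaking temporal correlation.
    """

    def __init__(self, capacity, seq_len):
        self.capacity = capacity  # max transitions kept per worker stream
        self.seq_len = seq_len
        self.streams = collections.defaultdict(list)  # worker_id -> transitions

    def add(self, transition, worker_id):
        stream = self.streams[worker_id]
        stream.append(transition)
        if len(stream) > self.capacity:
            stream.pop(0)  # evict the oldest transition (FIFO)

    def sample(self, batch_size, rng):
        # Only streams long enough to yield a full sequence are eligible.
        eligible = [s for s in self.streams.values() if len(s) >= self.seq_len]
        batch = []
        for _ in range(batch_size):
            stream = rng.choice(eligible)
            start = rng.randrange(len(stream) - self.seq_len + 1)
            batch.append(stream[start:start + self.seq_len])
        return batch

rng = random.Random(0)
replay = SequenceReplay(capacity=1000, seq_len=8)
# Two interleaved workers each contribute 100 transitions.
for t in range(200):
    replay.add({"obs": t}, worker_id=t % 2)
batch = replay.sample(batch_size=16, rng=rng)
```

Each sampled sequence is contiguous within a single worker's stream, which is the property the RSSM needs to carry recurrent state across the sequence.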

Related Pages

Implemented By
