Implementation: AllenAI Open Instruct StreamingDataLoaderConfig
| Type | Dataclass |
|---|---|
| Source | open_instruct/data_loader.py:L297-437 |
| Dependencies | dataclasses, vllm, datasets, transformers |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete configuration dataclass for controlling streaming generation, reward computation, and batch preparation in the GRPO training pipeline, provided by the Open Instruct library.
Description
StreamingDataLoaderConfig is a Python dataclass that centralizes all configuration for the generation side of GRPO training. It includes parameters for:
- Data loading and packing: Maximum prompt/response lengths and pack length.
- Batching: Number of unique prompts per rollout, samples per prompt, and async steps.
- GRPO sampling/filtering: Active sampling, zero-std filtering, advantage normalization type, completion masking.
- Dataset specification: Dataset mixer lists, splits, transform functions, and caching modes.
- Generation: Temperature, stop strings, inflight weight updates.
- Reward: Verifiable reward toggles, R1-style format rewards, LLM judge configuration, code verifier settings, non-stop penalties.
- Rollout saving: Whether to save rollout traces to disk for analysis.
The __post_init__ method enforces invariants and computes derived fields such as max_possible_score.
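The derived-field pattern can be sketched with a minimal standalone dataclass. This is illustrative only, not the actual open-instruct implementation: the `RewardConfigSketch` class, its invariant check, and the exact accumulation logic are assumptions, though the field names mirror those documented below.

```python
from dataclasses import dataclass


@dataclass
class RewardConfigSketch:
    # A few representative fields; the real config has many more.
    apply_verifiable_reward: bool = True
    verification_reward: float = 10.0
    apply_r1_style_format_reward: bool = False
    r1_style_format_reward: float = 1.0
    max_prompt_token_length: int = 256
    response_length: int = 256
    pack_length: int = 512

    def __post_init__(self) -> None:
        # Example invariant: a packed sequence must fit one prompt + response.
        if self.pack_length < self.max_prompt_token_length + self.response_length:
            raise ValueError("pack_length must cover prompt + response lengths")
        # Derived field: best achievable score given the enabled reward toggles.
        self.max_possible_score = 0.0
        if self.apply_verifiable_reward:
            self.max_possible_score += self.verification_reward
        if self.apply_r1_style_format_reward:
            self.max_possible_score += self.r1_style_format_reward


cfg = RewardConfigSketch(apply_r1_style_format_reward=True)
print(cfg.max_possible_score)  # 11.0
```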
Usage
This dataclass is typically populated from command-line arguments and passed to the GRPO main function. It is consumed by the DataPreparationActor, build_all_verifiers(), and the generation engine configuration.
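The "populated from command-line arguments" flow can be sketched with the stdlib. Open Instruct has its own argument-parsing utilities, so the `dataclass_to_parser` helper below is purely hypothetical; it only illustrates the general dataclass-to-CLI mapping.

```python
import argparse
import dataclasses
from dataclasses import dataclass


@dataclass
class LoaderConfigSketch:
    # A few representative fields; the real config has many more.
    num_unique_prompts_rollout: int = 16
    num_samples_per_prompt_rollout: int = 4
    temperature: float = 0.7


def dataclass_to_parser(cls) -> argparse.ArgumentParser:
    # Hypothetical helper: one --flag per field, typed from the default value.
    # (bool fields would need store_true handling; omitted for brevity.)
    parser = argparse.ArgumentParser()
    for f in dataclasses.fields(cls):
        parser.add_argument(f"--{f.name}", type=type(f.default), default=f.default)
    return parser


args = dataclass_to_parser(LoaderConfigSketch).parse_args(
    ["--num_unique_prompts_rollout", "32", "--temperature", "0.8"]
)
config = LoaderConfigSketch(**vars(args))
print(config.num_unique_prompts_rollout, config.temperature)  # 32 0.8
```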
Code Reference
Source Location
- Repository: Open Instruct
- File:
open_instruct/data_loader.py
Signature
```python
@dataclass
class StreamingDataLoaderConfig:
    # Data loading/packing
    max_prompt_token_length: int = 256
    response_length: int = 256
    pack_length: int = 512
    # Batching
    async_steps: int = 1
    num_samples_per_prompt_rollout: int = 4
    num_unique_prompts_rollout: int = 16
    # GRPO sampling/filtering
    active_sampling: bool = False
    filter_zero_std_samples: bool = True
    no_resampling_pass_rate: float | None = None
    advantage_normalization_type: str = "standard"
    mask_truncated_completions: bool = False
    mask_tool_use: bool = True
    # Dataset
    dataset_mixer_list: list[str] = ...
    dataset_mixer_eval_list: list[str] = ...
    dataset_transform_fn: list[str] = ...
    # Generation
    temperature: float = 0.7
    stop_strings: list[str] | None = None
    inflight_updates: bool = False
    # Reward - Verifiable reward
    apply_verifiable_reward: bool = True
    verification_reward: float = 10.0
    # Reward - R1 style format reward
    apply_r1_style_format_reward: bool = False
    r1_style_format_reward: float = 1.0
    # ... additional reward fields (LLM judge, code verifier, etc.)
```
Import
```python
from open_instruct.data_loader import StreamingDataLoaderConfig
```
I/O Contract
Key Fields
| Field | Type | Default | Description |
|---|---|---|---|
| num_unique_prompts_rollout | int | 16 | Number of unique prompts per generation rollout. |
| num_samples_per_prompt_rollout | int | 4 | Number of completions to sample per prompt (GRPO group size). |
| response_length | int | 256 | Maximum response token length. |
| temperature | float | 0.7 | Sampling temperature for generation. |
| pack_length | int | 512 | Maximum length of packed sequences for training. |
| async_steps | int | 1 | Number of generation batches queued ahead of the trainer. |
| filter_zero_std_samples | bool | True | Filter prompts where all completions receive the same reward. |
| stop_strings | list[str] \| None | None | Stop strings for early generation termination. |
| verification_reward | float | 10.0 | Reward value for correct verifiable answers. |
| advantage_normalization_type | str | "standard" | "standard" (z-score) or "centered" (mean subtraction only). |
Computed Fields
| Field | Description |
|---|---|
| max_possible_score | Sum of all enabled reward components; computed in __post_init__. |
Key Method
| Method | Description |
|---|---|
| build_dataloader(...) | Constructs a StreamingDataLoader that pulls pre-prepared data from a DataPreparationActor. |
Usage Examples
```python
from open_instruct.data_loader import StreamingDataLoaderConfig

config = StreamingDataLoaderConfig(
    num_unique_prompts_rollout=32,
    num_samples_per_prompt_rollout=8,
    response_length=1024,
    temperature=0.8,
    pack_length=2048,
    max_prompt_token_length=512,
    async_steps=2,
    dataset_mixer_list=["ai2-adapt-dev/rlvr_gsm8k_zs", "0.5",
                        "ai2-adapt-dev/rlvr_math_zs", "0.5"],
    filter_zero_std_samples=True,
    apply_verifiable_reward=True,
    verification_reward=10.0,
)

# Total completions per step: 32 * 8 = 256
# Total tokens per step (max): 256 * 2048 = 524288
```
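The role of pack_length in the example above can be made concrete with a greedy first-fit packer. This is only a sketch of the concept; open-instruct's actual packing algorithm may differ, and `pack_sequences` is a hypothetical helper.

```python
def pack_sequences(lengths: list[int], pack_length: int) -> list[list[int]]:
    """First-fit packing sketch: place each sequence into the first pack
    with enough remaining room, opening a new pack when none fits.
    Illustrative only -- not open-instruct's actual packing logic."""
    packs: list[list[int]] = []
    remaining: list[int] = []  # room left in each pack
    for n in lengths:
        if n > pack_length:
            raise ValueError(f"sequence of length {n} exceeds pack_length")
        for i, room in enumerate(remaining):
            if n <= room:
                packs[i].append(n)
                remaining[i] -= n
                break
        else:
            packs.append([n])
            remaining.append(pack_length - n)
    return packs


# Prompt+response token counts packed into pack_length=2048 buckets:
print(pack_sequences([1500, 600, 400, 900, 100], 2048))
# [[1500, 400, 100], [600, 900]]
```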