# Implementation: AllenAI Open Instruct Pack Sequences

| Type | Function + Dataclass |
|---|---|
| Source | `open_instruct/rl_utils.py:L184-339` |
| Dependencies | torch, numpy |
| Last Updated | 2026-02-07 00:00 GMT |
## Overview
Concrete function and dataclass for packing variable-length query-response pairs into fixed-size sequences with 3D attention masks for efficient RL training, provided by the Open Instruct library.
## Description

The `pack_sequences()` function and `PackedSequences` dataclass together implement the sequence-packing strategy for GRPO training. The function:
- Filters out padding tokens from queries and responses.
- Greedily packs query-response pairs into buffers of size `effective_pack_length`.
- Automatically reduces `pack_length` when `min_num_batches > 1` to ensure at least one packed sequence per distributed rank.
- Constructs 3D intra-document attention masks for each packed sequence.
- Resets position IDs at each sequence boundary.
- Aligns vLLM log-probabilities with the packed layout.
- Supports tool use masking via the `mask_tool_use` parameter.
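The greedy packing step can be sketched as a first-fit loop over concatenated query-response pairs. This is an illustrative sketch only (`greedy_pack` is a hypothetical helper, not the library's implementation, and it does not handle pairs longer than `pack_length`):

```python
def greedy_pack(
    queries: list[list[int]],
    responses: list[list[int]],
    pack_length: int,
) -> list[list[int]]:
    """Greedily concatenate query-response pairs into buffers of at most
    pack_length tokens, starting a new buffer whenever the next pair
    would overflow the current one (illustrative sketch)."""
    packs: list[list[int]] = []
    current: list[int] = []
    for q, r in zip(queries, responses):
        pair = q + r
        if current and len(current) + len(pair) > pack_length:
            packs.append(current)
            current = []
        current.extend(pair)
    if current:
        packs.append(current)
    return packs

# Two pairs of total lengths 5 and 5 fit into a single pack of length 16:
packs = greedy_pack([[10, 20, 30], [40, 50]], [[100, 200], [300, 400, 500]], 16)
```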
The `PackedSequences` dataclass stores all packed tensors and metadata needed for the training forward pass, including query-responses, attention masks, response masks, position IDs, vLLM log-probabilities, and sequence boundary markers.
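A minimal sketch of the intra-document attention masking and per-sequence position-ID reset described above, for a single packed sequence. This is illustrative only: `make_intra_doc_mask` is a hypothetical helper, it returns a boolean mask, whereas the library encodes sequence membership with integer values:

```python
import torch


def make_intra_doc_mask(seq_lens: list[int]) -> tuple[torch.Tensor, torch.Tensor]:
    """Build a causal block-diagonal attention mask and reset position IDs
    for one packed sequence (illustrative sketch, not the library code)."""
    total = sum(seq_lens)
    # segment_ids[i] = 1-based index of the sub-sequence token i belongs to
    segment_ids = torch.repeat_interleave(
        torch.arange(1, len(seq_lens) + 1), torch.tensor(seq_lens)
    )
    # token i may attend to token j iff same sub-sequence and j <= i (causal)
    same_seq = segment_ids.unsqueeze(0) == segment_ids.unsqueeze(1)
    causal = torch.tril(torch.ones(total, total, dtype=torch.bool))
    mask = same_seq & causal
    # position IDs restart at 0 at every sequence boundary
    position_ids = torch.cat([torch.arange(n) for n in seq_lens])
    return mask, position_ids


mask, pos = make_intra_doc_mask([3, 2])
# pos is [0, 1, 2, 0, 1]; tokens of the second sequence cannot attend
# to tokens of the first, even though they share one packed buffer.
```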
## Usage

Called by the `DataPreparationActor` after generation and reward computation, before distributing data to learner GPUs. The output is consumed by `prepare_collated_data_for_workers()`, which splits packed sequences across data-parallel ranks.
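For intuition, splitting packed sequences across data-parallel ranks can be as simple as a strided slice. This is a hypothetical sketch of the idea, not the actual `prepare_collated_data_for_workers()` logic:

```python
def split_across_ranks(packed_items: list, world_size: int) -> list[list]:
    """Round-robin assignment of packed sequences to data-parallel ranks
    (hypothetical sketch; the real collation logic is more involved)."""
    return [packed_items[rank::world_size] for rank in range(world_size)]


# Five packed sequences split across two ranks:
shards = split_across_ranks(["p0", "p1", "p2", "p3", "p4"], world_size=2)
```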
## Code Reference

### Source Location

- Repository: Open Instruct
- File: `open_instruct/rl_utils.py`
### Signature

```python
def pack_sequences(
    queries: list[list[int]],
    responses: list[list[int]],
    masks: list[list[int]],
    pack_length: int,
    pad_token_id: int,
    vllm_logprobs: list[list[float]],
    min_num_batches: int = 1,
    mask_tool_use: bool = False,
) -> PackedSequences:
```
### Dataclass

```python
@dataclass
class PackedSequences(Generic[T]):
    query_responses: list[torch.Tensor]
    """packed query and response (batch_size, pack_length)"""
    attention_masks: list[torch.Tensor]
    """3D attention mask for packed sequences (batch_size, pack_length, pack_length)"""
    response_masks: list[torch.Tensor]
    """bool response mask for packed sequences (batch_size, pack_length)"""
    original_responses: list[list[int]]
    """original response for broadcast (batch_size, response_length)"""
    advantages: list[torch.Tensor] | None = None
    position_ids: list[torch.Tensor] | None = None
    packed_seq_lens: list[torch.Tensor] | None = None
    vllm_logprobs: list[torch.Tensor] | None = None
    dones: list[torch.Tensor] | None = None
    rewards: list[torch.Tensor] | None = None
```
### Import

```python
from open_instruct.rl_utils import pack_sequences, PackedSequences
```
## I/O Contract

### Inputs

| Name | Type | Description |
|---|---|---|
| `queries` | `list[list[int]]` | List of tokenized prompt sequences. |
| `responses` | `list[list[int]]` | List of tokenized response sequences (one per query, or K per query with GRPO). |
| `masks` | `list[list[int]]` | Tool masks for each response (1 = model-generated, 0 = tool output). Used when `mask_tool_use=True`. |
| `pack_length` | `int` | Maximum length of each packed sequence. |
| `pad_token_id` | `int` | Token ID used for padding; must not appear in queries. |
| `vllm_logprobs` | `list[list[float]]` | Per-token log-probabilities from vLLM generation (one per response token). |
| `min_num_batches` | `int` | Minimum number of packed batches to produce (ensures at least one per DP rank). |
| `mask_tool_use` | `bool` | Whether to apply tool use masks to response masks. |
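Per the `masks`/`mask_tool_use` contract above, tool-output tokens (mask value 0) can be dropped from a response mask so they contribute no loss. A minimal illustration of that masking step (not the library code):

```python
import torch

# Tokens belonging to response 1, per the response_masks encoding:
response_mask = torch.tensor([1, 1, 1, 1, 1])
# Tool mask for the same span: the middle tokens came from a tool call
tool_mask = torch.tensor([1, 1, 0, 0, 1])

# Elementwise product zeroes out tool-output tokens from the loss mask
masked = response_mask * tool_mask
# masked is tensor([1, 1, 0, 0, 1])
```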
### Outputs

| Name | Type | Description |
|---|---|---|
| Return value | `PackedSequences` | Dataclass containing packed tensors. `query_responses` are 1D tensors of length `pack_length`. `attention_masks` are 2D tensors of shape `(pack_length, pack_length)` with integer values encoding sequence membership. `response_masks` are 1D integer tensors where 0 = prompt/padding and N = sequence index of the response token. `position_ids` are 1D tensors with positions reset per sequence. |
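Given the `response_masks` encoding above (0 = prompt/padding, N = index of the N-th response), the tokens of any single response can be recovered with a boolean comparison:

```python
import torch

# Hand-built example following the stated encoding: two responses
# packed with their prompts into one buffer of length 10.
response_mask = torch.tensor([0, 0, 0, 1, 1, 0, 0, 2, 2, 2])
token_ids = torch.arange(100, 110)

# Select only the tokens generated for response 2:
second_response = token_ids[response_mask == 2]
# second_response is tensor([107, 108, 109])
```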
## Usage Examples

```python
import torch

from open_instruct.rl_utils import pack_sequences

queries = [[10, 20, 30], [40, 50]]
responses = [[100, 200], [300, 400, 500]]
masks = [[1, 1], [1, 1, 1]]
vllm_logprobs = [[-0.5, -1.0], [-0.3, -0.7, -1.2]]

packed = pack_sequences(
    queries=queries,
    responses=responses,
    masks=masks,
    pack_length=16,
    pad_token_id=0,
    vllm_logprobs=vllm_logprobs,
    min_num_batches=1,
)

print(f"Original sequences: {len(queries)}")
print(f"Packed sequences: {len(packed.query_responses)}")
print(f"Pack length: {packed.query_responses[0].shape}")

# Advantages are set externally after packing:
packed.advantages = [torch.zeros(16) for _ in packed.query_responses]
```
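The `min_num_batches`-driven reduction of `pack_length` mentioned in the description is not exercised by the example above. One plausible way such an effective pack length could be computed is sketched below; this is a hypothetical helper, and the exact formula in `open_instruct/rl_utils.py` may differ:

```python
import math


def effective_pack_length(total_tokens: int, pack_length: int,
                          min_num_batches: int) -> int:
    """Shrink pack_length so that packing total_tokens yields at least
    min_num_batches packs (hypothetical sketch; the library's exact
    formula may differ)."""
    if min_num_batches <= 1:
        return pack_length
    # Ceiling division spreads the tokens over min_num_batches packs
    return min(pack_length, math.ceil(total_tokens / min_num_batches))


# With one batch the configured pack_length is kept unchanged;
# with four batches over 100 tokens it shrinks to 25.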