# Implementation: AllenAI Open Instruct Pack Sequences

| Type | Function + Dataclass |
|---|---|
| Source | `open_instruct/rl_utils.py:L184-339` |
| Dependencies | torch, numpy |
| Last Updated | 2026-02-07 00:00 GMT |
## Overview
Concrete function and dataclass for packing variable-length query-response pairs into fixed-size sequences with 3D attention masks for efficient RL training, provided by the Open Instruct library.
## Description

The `pack_sequences()` function and `PackedSequences` dataclass together implement the sequence-packing strategy for GRPO training. The function:
- Filters out padding tokens from queries and responses.
- Greedily packs query-response pairs into buffers of size `effective_pack_length`.
- Automatically reduces `pack_length` when `min_num_batches > 1` to ensure at least one packed sequence per distributed rank.
- Constructs 3D intra-document attention masks for each packed sequence.
- Resets position IDs at each sequence boundary.
- Aligns vLLM log-probabilities with the packed layout.
- Supports tool use masking via the `mask_tool_use` parameter.
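The greedy packing step can be sketched as a first-fit loop over concatenated query-response pairs. This is an illustrative sketch only (`greedy_pack` is a hypothetical helper, not the library's implementation, and it does not handle pairs longer than `pack_length`):

```python
def greedy_pack(
    queries: list[list[int]],
    responses: list[list[int]],
    pack_length: int,
) -> list[list[int]]:
    """Greedily concatenate query-response pairs into buffers of at most
    pack_length tokens, starting a new buffer whenever the next pair
    would overflow the current one (illustrative sketch)."""
    packs: list[list[int]] = []
    current: list[int] = []
    for q, r in zip(queries, responses):
        pair = q + r
        if current and len(current) + len(pair) > pack_length:
            packs.append(current)
            current = []
        current.extend(pair)
    if current:
        packs.append(current)
    return packs

# Two pairs of total lengths 5 and 5 fit into a single pack of length 16:
packs = greedy_pack([[10, 20, 30], [40, 50]], [[100, 200], [300, 400, 500]], 16)
```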
The `PackedSequences` dataclass stores all packed tensors and metadata needed for the training forward pass, including query-responses, attention masks, response masks, position IDs, vLLM log-probabilities, and sequence boundary markers.
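A minimal sketch of the intra-document attention masking and per-sequence position-ID reset described above, for a single packed sequence. This is illustrative only: `make_intra_doc_mask` is a hypothetical helper, it returns a boolean mask, whereas the library encodes sequence membership with integer values:

```python
import torch


def make_intra_doc_mask(seq_lens: list[int]) -> tuple[torch.Tensor, torch.Tensor]:
    """Build a causal block-diagonal attention mask and reset position IDs
    for one packed sequence (illustrative sketch, not the library code)."""
    total = sum(seq_lens)
    # segment_ids[i] = 1-based index of the sub-sequence token i belongs to
    segment_ids = torch.repeat_interleave(
        torch.arange(1, len(seq_lens) + 1), torch.tensor(seq_lens)
    )
    # token i may attend to token j iff same sub-sequence and j <= i (causal)
    same_seq = segment_ids.unsqueeze(0) == segment_ids.unsqueeze(1)
    causal = torch.tril(torch.ones(total, total, dtype=torch.bool))
    mask = same_seq & causal
    # position IDs restart at 0 at every sequence boundary
    position_ids = torch.cat([torch.arange(n) for n in seq_lens])
    return mask, position_ids


mask, pos = make_intra_doc_mask([3, 2])
# pos is [0, 1, 2, 0, 1]; tokens of the second sequence cannot attend
# to tokens of the first, even though they share one packed buffer.
```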
## Usage

Called by the `DataPreparationActor` after generation and reward computation, before distributing data to learner GPUs. The output is consumed by `prepare_collated_data_for_workers()`, which splits packed sequences across data-parallel ranks.
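For intuition, splitting packed sequences across data-parallel ranks can be as simple as a strided slice. This is a hypothetical sketch of the idea, not the actual `prepare_collated_data_for_workers()` logic:

```python
def split_across_ranks(packed_items: list, world_size: int) -> list[list]:
    """Round-robin assignment of packed sequences to data-parallel ranks
    (hypothetical sketch; the real collation logic is more involved)."""
    return [packed_items[rank::world_size] for rank in range(world_size)]


# Five packed sequences split across two ranks:
shards = split_across_ranks(["p0", "p1", "p2", "p3", "p4"], world_size=2)
```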
## Code Reference

### Source Location

- Repository: Open Instruct
- File: `open_instruct/rl_utils.py`
### Signature

```python
def pack_sequences(
    queries: list[list[int]],
    responses: list[list[int]],
    masks: list[list[int]],
    pack_length: int,
    pad_token_id: int,
    vllm_logprobs: list[list[float]],
    min_num_batches: int = 1,
    mask_tool_use: bool = False,
) -> PackedSequences:
```
### Dataclass

```python
@dataclass
class PackedSequences(Generic[T]):
    query_responses: list[torch.Tensor]
    """packed query and response (batch_size, pack_length)"""
    attention_masks: list[torch.Tensor]
    """3D attention mask for packed sequences (batch_size, pack_length, pack_length)"""
    response_masks: list[torch.Tensor]
    """bool response mask for packed sequences (batch_size, pack_length)"""
    original_responses: list[list[int]]
    """original response for broadcast (batch_size, response_length)"""
    advantages: list[torch.Tensor] | None = None
    position_ids: list[torch.Tensor] | None = None
    packed_seq_lens: list[torch.Tensor] | None = None
    vllm_logprobs: list[torch.Tensor] | None = None
    dones: list[torch.Tensor] | None = None
    rewards: list[torch.Tensor] | None = None
```
### Import

```python
from open_instruct.rl_utils import pack_sequences, PackedSequences
```
## I/O Contract

### Inputs

| Name | Type | Description |
|---|---|---|
| `queries` | `list[list[int]]` | List of tokenized prompt sequences. |
| `responses` | `list[list[int]]` | List of tokenized response sequences (one per query, or K per query with GRPO). |
| `masks` | `list[list[int]]` | Tool masks for each response (1 = model-generated, 0 = tool output). Used when `mask_tool_use=True`. |
| `pack_length` | `int` | Maximum length of each packed sequence. |
| `pad_token_id` | `int` | Token ID used for padding; must not appear in queries. |
| `vllm_logprobs` | `list[list[float]]` | Per-token log-probabilities from vLLM generation (one per response token). |
| `min_num_batches` | `int` | Minimum number of packed batches to produce (ensures at least one per DP rank). |
| `mask_tool_use` | `bool` | Whether to apply tool use masks to response masks. |
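Per the `masks`/`mask_tool_use` contract above, tool-output tokens (mask value 0) can be dropped from a response mask so they contribute no loss. A minimal illustration of that masking step (not the library code):

```python
import torch

# Tokens belonging to response 1, per the response_masks encoding:
response_mask = torch.tensor([1, 1, 1, 1, 1])
# Tool mask for the same span: the middle tokens came from a tool call
tool_mask = torch.tensor([1, 1, 0, 0, 1])

# Elementwise product zeroes out tool-output tokens from the loss mask
masked = response_mask * tool_mask
# masked is tensor([1, 1, 0, 0, 1])
```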
### Outputs

| Name | Type | Description |
|---|---|---|
| Return value | `PackedSequences` | Dataclass containing packed tensors. `query_responses` are 1D tensors of length `pack_length`. `attention_masks` are 2D tensors of shape `(pack_length, pack_length)` with integer values encoding sequence membership. `response_masks` are 1D integer tensors where 0 = prompt/padding and N = sequence index of the response token. `position_ids` are 1D tensors with positions reset per sequence. |
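Given the `response_masks` encoding above (0 = prompt/padding, N = index of the N-th response), the tokens of any single response can be recovered with a boolean comparison:

```python
import torch

# Hand-built example following the stated encoding: two responses
# packed with their prompts into one buffer of length 10.
response_mask = torch.tensor([0, 0, 0, 1, 1, 0, 0, 2, 2, 2])
token_ids = torch.arange(100, 110)

# Select only the tokens generated for response 2:
second_response = token_ids[response_mask == 2]
# second_response is tensor([107, 108, 109])
```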
## Usage Examples

```python
import torch

from open_instruct.rl_utils import pack_sequences

queries = [[10, 20, 30], [40, 50]]
responses = [[100, 200], [300, 400, 500]]
masks = [[1, 1], [1, 1, 1]]
vllm_logprobs = [[-0.5, -1.0], [-0.3, -0.7, -1.2]]

packed = pack_sequences(
    queries=queries,
    responses=responses,
    masks=masks,
    pack_length=16,
    pad_token_id=0,
    vllm_logprobs=vllm_logprobs,
    min_num_batches=1,
)

print(f"Original sequences: {len(queries)}")
print(f"Packed sequences: {len(packed.query_responses)}")
print(f"Pack length: {packed.query_responses[0].shape}")

# Advantages are set externally after packing:
packed.advantages = [torch.zeros(16) for _ in packed.query_responses]
```
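The `min_num_batches`-driven reduction of `pack_length` mentioned in the description is not exercised by the example above. One plausible way such an effective pack length could be computed is sketched below; this is a hypothetical helper, and the exact formula in `open_instruct/rl_utils.py` may differ:

```python
import math


def effective_pack_length(total_tokens: int, pack_length: int,
                          min_num_batches: int) -> int:
    """Shrink pack_length so that packing total_tokens yields at least
    min_num_batches packs (hypothetical sketch; the library's exact
    formula may differ)."""
    if min_num_batches <= 1:
        return pack_length
    # Ceiling division spreads the tokens over min_num_batches packs
    return min(pack_length, math.ceil(total_tokens / min_num_batches))


# With one batch the configured pack_length is kept unchanged;
# with four batches over 100 tokens it shrinks to 25.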