
Implementation:Allenai Open instruct Pack Sequences

From Leeroopedia


Type Function + Dataclass
Source open_instruct/rl_utils.py:L184-339
Dependencies torch, numpy
Last Updated 2026-02-07 00:00 GMT

Overview

Concrete function and dataclass for packing variable-length query-response pairs into fixed-size sequences with 3D attention masks for efficient RL training, provided by the Open Instruct library.

Description

The pack_sequences() function and PackedSequences dataclass together implement the sequence packing strategy for GRPO training. The function:

  1. Filters out padding tokens from queries and responses.
  2. Greedily packs query-response pairs into buffers of size effective_pack_length.
  3. Automatically reduces pack_length when min_num_batches > 1 to ensure at least one packed sequence per distributed rank.
  4. Constructs 3D intra-document attention masks for each packed sequence.
  5. Resets position IDs at each sequence boundary.
  6. Aligns vLLM log-probabilities with the packed layout.
  7. Supports tool use masking via the mask_tool_use parameter.
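Steps 1 and 2 can be sketched as a simple greedy loop; this is an illustrative reimplementation under assumed behavior, not the library's code (which also handles logprob alignment and mask construction):

```python
def greedy_pack(queries, responses, pack_length, pad_token_id):
    """Greedily pack query+response pairs into buffers of at most pack_length tokens."""
    packs, seq_lens = [], []  # seq_lens[i] holds the per-pair lengths inside pack i
    for q, r in zip(queries, responses):
        # Step 1: drop padding tokens before packing
        seq = [t for t in q if t != pad_token_id] + [t for t in r if t != pad_token_id]
        # Step 2: start a new buffer when the current one cannot fit this pair
        if not packs or len(packs[-1]) + len(seq) > pack_length:
            packs.append([])
            seq_lens.append([])
        packs[-1].extend(seq)
        seq_lens[-1].append(len(seq))
    return packs, seq_lens
```

Note that pairs are never split across buffers: a pair that does not fit opens a new pack, so each buffer may end short of pack_length.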

The PackedSequences dataclass stores all packed tensors and metadata needed for the training forward pass, including query-responses, attention masks, response masks, position IDs, vLLM log-probabilities, and sequence boundary markers.
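Step 3 (shrinking pack_length so every distributed rank receives at least one packed sequence) might look roughly like the following; the function name and ceiling-division formula are assumptions for illustration, not the library's exact logic:

```python
def reduced_pack_length(pack_length, total_tokens, min_num_batches):
    """Hypothetical sketch: ensure at least min_num_batches packs are produced
    by capping the effective pack length. The real formula may differ."""
    if min_num_batches <= 1:
        return pack_length
    # Ceiling division spreads the total token count across the required packs.
    return min(pack_length, -(-total_tokens // min_num_batches))
```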

Usage

Called by the DataPreparationActor after generation and reward computation, before distributing data to learner GPUs. The output is consumed by prepare_collated_data_for_workers() which splits packed sequences across data-parallel ranks.

Code Reference

Source Location

open_instruct/rl_utils.py:L184-339

Signature

def pack_sequences(
    queries: list[list[int]],
    responses: list[list[int]],
    masks: list[list[int]],
    pack_length: int,
    pad_token_id: int,
    vllm_logprobs: list[list[float]],
    min_num_batches: int = 1,
    mask_tool_use: bool = False,
) -> PackedSequences:

Dataclass

@dataclass
class PackedSequences(Generic[T]):
    query_responses: list[torch.Tensor]
    """packed query and response (batch_size, pack_length)"""
    attention_masks: list[torch.Tensor]
    """3D attention mask for packed sequences (batch_size, pack_length, pack_length)"""
    response_masks: list[torch.Tensor]
    """bool response mask for packed sequences (batch_size, pack_length)"""
    original_responses: list[list[int]]
    """original response for broadcast (batch_size, response_length)"""
    advantages: list[torch.Tensor] | None = None
    position_ids: list[torch.Tensor] | None = None
    packed_seq_lens: list[torch.Tensor] | None = None
    vllm_logprobs: list[torch.Tensor] | None = None
    dones: list[torch.Tensor] | None = None
    rewards: list[torch.Tensor] | None = None

Import

from open_instruct.rl_utils import pack_sequences, PackedSequences

I/O Contract

Inputs

Name Type Description
queries list[list[int]] List of tokenized prompt sequences.
responses list[list[int]] List of tokenized response sequences (one per query, or K per query with GRPO).
masks list[list[int]] Tool masks for each response (1 = model-generated, 0 = tool output). Used when mask_tool_use=True.
pack_length int Maximum length of each packed sequence.
pad_token_id int Token ID used for padding; must not appear in queries.
vllm_logprobs list[list[float]] Per-token log-probabilities from vLLM generation (one per response token).
min_num_batches int Minimum number of packed batches to produce (ensures at least one per DP rank).
mask_tool_use bool Whether to apply tool use masks to response masks.
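When mask_tool_use=True, tool-produced tokens are excluded from the response mask so they contribute no loss. A minimal sketch of the effect (variable names are illustrative, not the library's internals):

```python
import torch

# 0 = prompt, 1 = response token belonging to the first packed sequence
response_mask = torch.tensor([0, 0, 1, 1, 1])
# 1 = model-generated, 0 = tool output inside the response
tool_mask = torch.tensor([1, 1, 1, 0, 1])

# Element-wise product zeroes out the tool-generated position
masked = response_mask * tool_mask
```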

Outputs

Name Type Description
Return value PackedSequences Dataclass containing the packed tensors:

  - query_responses: 1D tensors of length pack_length.
  - attention_masks: 2D tensors of shape (pack_length, pack_length) with integer values encoding sequence membership.
  - response_masks: 1D integer tensors where 0 = prompt/padding and N = sequence index of the response token.
  - position_ids: 1D tensors with positions reset at each sequence boundary.
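The relationship between sequence membership, the intra-document attention mask, and the reset position IDs can be sketched as follows. This is an illustrative reconstruction, assuming each packed token is labelled with a 1-based sequence index; the library's encoding is similar in spirit but not necessarily identical:

```python
import torch

def masks_from_seq_ids(seq_ids):
    """seq_ids labels each packed token with its sequence index,
    e.g. tensor([1, 1, 2, 2, 2]) for two sequences of lengths 2 and 3."""
    n = len(seq_ids)
    # Intra-document mask: a token attends only to earlier tokens
    # of the same sequence (causal AND same sequence index).
    same_seq = seq_ids.unsqueeze(0) == seq_ids.unsqueeze(1)
    causal = torch.tril(torch.ones(n, n, dtype=torch.bool))
    attention_mask = (same_seq & causal).int()
    # Position IDs restart at 0 at every sequence boundary.
    position_ids = torch.zeros_like(seq_ids)
    for s in torch.unique(seq_ids):
        idx = (seq_ids == s).nonzero(as_tuple=True)[0]
        position_ids[idx] = torch.arange(len(idx))
    return attention_mask, position_ids
```

For seq_ids = [1, 1, 2, 2, 2], token 2 (the first token of sequence 2) cannot attend to tokens 0-1, and its position ID restarts at 0.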

Usage Examples

from open_instruct.rl_utils import pack_sequences

queries = [[10, 20, 30], [40, 50]]
responses = [[100, 200], [300, 400, 500]]
masks = [[1, 1], [1, 1, 1]]
vllm_logprobs = [[-0.5, -1.0], [-0.3, -0.7, -1.2]]

packed = pack_sequences(
    queries=queries,
    responses=responses,
    masks=masks,
    pack_length=16,
    pad_token_id=0,
    vllm_logprobs=vllm_logprobs,
    min_num_batches=1,
)

print(f"Original sequences: {len(queries)}")
print(f"Packed sequences: {len(packed.query_responses)}")
print(f"Pack length: {packed.query_responses[0].shape}")

# Advantages are set externally after packing:
import torch
packed.advantages = [torch.zeros(16) for _ in packed.query_responses]

Related Pages

Implements Principle
