Heuristic: Volcengine verl Sequence Length Balancing
Metadata:
- Sources: Repo|verl|https://github.com/volcengine/verl
- Domains: Optimization, Distributed_Training
- Last Updated: 2026-02-07 17:00 GMT
Overview
Use the Karmarkar-Karp algorithm to partition variable-length sequences into balanced groups across data-parallel ranks, minimizing GPU idle time caused by stragglers.
Description
In RL training with variable-length sequences, naive partitioning creates stragglers: some GPUs receive much longer sequences while the rest sit idle waiting for them. verl implements the Karmarkar-Karp Largest Differencing Method to partition sequences into balanced workload groups. The workload of a sequence is estimated as 24576 * seqlen + seqlen², calibrated for 7B-class models (hidden_size=4096).
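The Largest Differencing Method can be sketched as follows. This is an illustration of the general k-way algorithm, not verl's actual implementation, and the function name kk_partition is hypothetical: each state holds k buckets of (sum, indices); the two states with the largest max-min spread are repeatedly merged, pairing large buckets with small ones so their differences cancel.

```python
import heapq

def kk_partition(workloads, k):
    # Sketch of the Karmarkar-Karp Largest Differencing Method for k-way
    # partitioning (illustration only, not verl's implementation).
    # Heap key: negated spread (max bucket sum - min bucket sum), so the
    # states with the largest spreads are merged first.
    heap = []
    for i, w in enumerate(workloads):
        buckets = [(w, [i])] + [(0, []) for _ in range(k - 1)]
        heapq.heappush(heap, (-w, i, buckets))  # a singleton's spread is w
    tie = len(workloads)                        # unique tie-breaker
    while len(heap) > 1:
        _, _, a = heapq.heappop(heap)           # two states with the
        _, _, b = heapq.heappop(heap)           # largest spreads
        # Pair a's largest bucket with b's smallest so differences cancel.
        merged = sorted(
            ((sa + sb, ia + ib) for (sa, ia), (sb, ib) in zip(a, reversed(b))),
            key=lambda x: -x[0],
        )
        spread = merged[0][0] - merged[-1][0]
        heapq.heappush(heap, (-spread, tie, merged))
        tie += 1
    return [indices for _, indices in heap[0][2]]

# Example: six workloads split across 2 ranks end up with sums 23 and 22.
groups = kk_partition([10, 9, 8, 7, 6, 5], k=2)
```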
Usage
Enable seq_balance mode in the training configuration. Most beneficial when sequence lengths within a batch have high variance.
The Insight
- Action: Enable sequence balancing via configuration
- Value: Workload formula: 24576 * seqlen + seqlen² (calibrated for hidden_size=4096)
- Trade-off: Adds overhead for the partitioning calculation but significantly reduces GPU idle time
- Additional tip: Place smaller micro-batches at both ends of pipeline to reduce warm-up/cool-down bubbles
Reasoning
Transformer attention FLOPs scale as 12 * hidden_size² * seqlen + 2 * hidden_size * seqlen². The quadratic term means longer sequences are disproportionately expensive. Dividing by the constant factor 2 * hidden_size gives the relative workload 6 * hidden_size * seqlen + seqlen²; with hidden_size = 4096, the linear coefficient is 6 * 4096 = 24576, matching the workload formula above. The Karmarkar-Karp algorithm produces near-optimal balanced partitions. Additionally, placing smaller micro-batches at the pipeline's ends reduces bubble overhead.
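As a quick arithmetic check of that calibration:

```python
hidden_size = 4096  # 7B-class model

# FLOPs ∝ 12 * h² * s + 2 * h * s²; dividing by the constant 2 * h
# leaves the relative workload 6 * h * s + s².
linear_coeff = (12 * hidden_size**2) // (2 * hidden_size)
print(linear_coeff)  # 24576, the constant in verl's workload formula
```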
Code Evidence
From verl/utils/seqlen_balancing.py:27-46:
def calculate_workload(seqlen_list: torch.Tensor) -> torch.Tensor:
    """workload ∝ 24576 * seqlen + seqlen²"""
    return 24576 * seqlen_list + seqlen_list**2
And from verl/utils/seqlen_balancing.py:406-416 (micro-batch placement):
# Place smaller micro-batches at both ends to reduce the bubbles
# exposed during the warm-up and cool-down.
micro_bsz_idx = micro_bsz_idx[::2][::-1] + micro_bsz_idx[1::2]
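The slicing trick can be traced on a toy list. The descending-workload sort order of micro_bsz_idx is an assumption here, made so the example matches the stated goal of small batches at both ends:

```python
# Hypothetical micro-batch workloads, assumed sorted in descending order.
sizes = [60, 50, 40, 30, 20, 10]
micro_bsz_idx = list(range(len(sizes)))  # [0, 1, 2, 3, 4, 5]

# Even positions reversed, then odd positions: the largest batches land
# in the middle and the smallest at the two ends.
order = micro_bsz_idx[::2][::-1] + micro_bsz_idx[1::2]
print([sizes[i] for i in order])  # [20, 40, 60, 50, 30, 10]
```

The resulting schedule ramps up from a small batch, peaks in the middle, and ramps back down, which shrinks the warm-up and cool-down bubbles at the pipeline's ends.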