Heuristic:Volcengine Verl Sequence Length Balancing

From Leeroopedia





Overview

Use the Karmarkar-Karp algorithm to partition sequences across data parallel ranks so that each rank receives a similar workload, minimizing the idle time caused by variable-length sequences.

Description

In RL training, sequence lengths vary widely, so naively splitting a batch across data parallel ranks creates stragglers: the GPUs that receive much longer sequences finish last while the others sit idle. verl implements the Karmarkar-Karp Largest Differencing Method to partition sequences into groups of roughly equal workload. The workload of a sequence is estimated as 24576 * seqlen + seqlen², where the constant 24576 = 6 * 4096 is calibrated for 7B models (hidden_size=4096).
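The partitioning idea can be sketched in plain Python. This is an illustrative k-way Largest Differencing Method, not verl's implementation: the function name `karmarkar_karp_partition`, its signature, and the example sequence lengths are assumptions made for this sketch, and it omits any equal-size constraint the real code may enforce.

```python
import heapq
from itertools import count


def karmarkar_karp_partition(workloads, k):
    """Split item indices into k groups with roughly equal total workload.

    Hypothetical helper for illustration only (not verl's API).
    """
    if not workloads:
        return [[] for _ in range(k)]
    tie = count()  # tie-breaker so heapq never has to compare the bin lists
    heap = []
    for idx, w in enumerate(workloads):
        # A state holds k bins of (total, member indices), sorted by total descending.
        bins = [(w, [idx])] + [(0, []) for _ in range(k - 1)]
        spread = bins[0][0] - bins[-1][0]
        # heapq is a min-heap, so push the negated spread to pop the largest first.
        heapq.heappush(heap, (-spread, next(tie), bins))
    while len(heap) > 1:
        _, _, a = heapq.heappop(heap)
        _, _, b = heapq.heappop(heap)
        # Pair a's heaviest bin with b's lightest bin, second heaviest with
        # second lightest, and so on, so the differences cancel out.
        merged = [(sa + sb, ia + ib) for (sa, ia), (sb, ib) in zip(a, reversed(b))]
        merged.sort(key=lambda bin_: bin_[0], reverse=True)
        spread = merged[0][0] - merged[-1][0]
        heapq.heappush(heap, (-spread, next(tie), merged))
    return [members for _, members in heap[0][2]]


# Example: balance made-up sequence lengths across 2 ranks using the
# workload estimate from the text.
seqlens = [4096, 512, 2048, 1024, 3072, 256]
workloads = [24576 * s + s * s for s in seqlens]
partitions = karmarkar_karp_partition(workloads, k=2)
totals = [sum(workloads[i] for i in part) for part in partitions]
```

Each merge step cannot increase the spread beyond the larger of the two spreads being merged, so the final imbalance between the heaviest and lightest group is at most the workload of the single most expensive sequence.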

Usage

Enable this when training with seq_balance mode. It helps most when the sequences in a batch vary widely in length.

The Insight

  • Action: Enable sequence balancing via configuration
  • Value: Workload formula: 24576 * seqlen + seqlen² (calibrated for hidden_size=4096)
  • Trade-off: Computing the partition adds some overhead, but it significantly reduces GPU idle time
  • Additional tip: Place smaller micro-batches at both ends of the pipeline to reduce warm-up/cool-down bubbles

Reasoning

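The FLOPs of a dense transformer block scale as 12 * hidden_size² * seqlen + 2 * hidden_size * seqlen². The linear term comes from the projection and MLP matrix multiplications, and the quadratic term comes from attention. Dividing by 2 * hidden_size gives 6 * hidden_size * seqlen + seqlen², which for hidden_size=4096 is the 24576 * seqlen + seqlen² used as the workload estimate. The quadratic term means longer sequences are disproportionately expensive, so balancing by sequence count or token count alone underestimates their cost. The Karmarkar-Karp algorithm is a heuristic that typically produces near-optimal balanced partitions. Additionally, placing smaller micro-batches at the pipeline ends reduces bubble overhead.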
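A minimal sketch of how the constant follows from the FLOPs expression, and how it could be recomputed for another hidden size. The helper names `transformer_block_flops` and `normalized_workload` are hypothetical and do not appear in verl; only the two formulas come from the text above.

```python
def transformer_block_flops(seqlen: int, hidden_size: int) -> int:
    """FLOPs estimate from the text: 12 * h^2 * s + 2 * h * s^2."""
    return 12 * hidden_size**2 * seqlen + 2 * hidden_size * seqlen**2


def normalized_workload(seqlen: int, hidden_size: int = 4096) -> int:
    """FLOPs divided by 2 * hidden_size: 6 * h * s + s^2.

    With hidden_size=4096 this is 24576 * seqlen + seqlen**2.
    """
    return 6 * hidden_size * seqlen + seqlen**2


# Compare an 8x longer sequence against a short one at hidden_size=4096.
short_len, long_len = 1024, 8192
ratio = normalized_workload(long_len) / normalized_workload(short_len)
print(f"{long_len // short_len}x longer sequence costs {ratio:.2f}x the workload")
```

Under this estimate the 8x longer sequence costs about 10.24x as much, which is the gap that balancing on token counts alone would miss.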

Code Evidence

From verl/utils/seqlen_balancing.py:27-46:

def calculate_workload(seqlen_list: torch.Tensor) -> torch.Tensor:
    """workload ∝ 24576 * seqlen + seqlen²"""
    return 24576 * seqlen_list + seqlen_list**2

And from verl/utils/seqlen_balancing.py:406-416 (micro-batch placement):

# Place smaller micro-batches at both ends to reduce the bubbles
# exposed during the warm-up and cool-down.
micro_bsz_idx = micro_bsz_idx[::2][::-1] + micro_bsz_idx[1::2]
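To see what the slicing expression does, here is a small standalone sketch. It assumes the list has already been sorted by workload in descending order before this line runs (the excerpt does not show that step), and the function name `place_small_at_ends` and the numbers are made up for illustration.

```python
def place_small_at_ends(sorted_desc):
    """Reorder a descending-sorted list so the largest items sit in the middle.

    Even positions (largest, 3rd largest, ...) are reversed to form the
    rising left half; odd positions form the falling right half.
    """
    return sorted_desc[::2][::-1] + sorted_desc[1::2]


# Hypothetical per-micro-batch workloads, sorted from heaviest to lightest.
micro_batch_workloads = [90, 70, 50, 30, 20, 10]
order = place_small_at_ends(micro_batch_workloads)
print(order)  # [20, 50, 90, 70, 30, 10]
```

The lightest micro-batches end up first and last, so the pipeline stages that are idle during warm-up and cool-down wait on short work rather than long work.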
