
Principle:Alibaba ROLL Rollout Scheduling

From Leeroopedia


Knowledge Sources
Domains: Distributed_Systems, Reinforcement_Learning, Agentic_AI
Last Updated: 2026-02-07 20:00 GMT

Overview

A distributed scheduling principle for coordinating asynchronous trajectory collection across multiple environment workers, using group-based queuing to assemble completed episodes into training batches.

Description

Rollout Scheduling orchestrates the collection of training trajectories from multiple environment instances running in parallel. The scheduler coordinates:

  • LLM inference requests: Routing generation requests from environment managers to the inference cluster
  • Group-based batching: Collecting all episodes within a group before yielding the group as a training batch, since GRPO and GiGPO reduce variance by scoring each episode relative to the rest of its group
  • Suspension/resumption: Pausing trajectory collection during model updates so that collected data stays on-policy
  • GPU sharing: Supporting a partial GPU mode in which inference GPUs are dynamically reassigned between generation and training

The GroupQueueManager buffers completed episodes and assembles complete groups into training batches.
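The group-based buffering described above can be sketched as follows. This is a minimal illustration, not the GroupQueueManager implementation: the `Episode`, `GroupQueue`, and `centered_rewards` names, the fixed `group_size`, and the mean-centering baseline are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Episode:
    group_id: int   # episodes with the same group_id belong to one group
    reward: float


class GroupQueue:
    """Hypothetical stand-in for a group queue; not the actual GroupQueueManager API."""

    def __init__(self, group_size):
        self.group_size = group_size
        self.pending = defaultdict(list)  # group_id -> episodes collected so far
        self.ready = []                   # complete groups, in completion order

    def put(self, episode):
        bucket = self.pending[episode.group_id]
        bucket.append(episode)
        if len(bucket) == self.group_size:
            # Group is complete: move it from the buffer to the ready list.
            self.ready.append(self.pending.pop(episode.group_id))

    def get_batch(self, num_groups):
        """Return num_groups complete groups as one flat batch, or None if too few are ready."""
        if len(self.ready) < num_groups:
            return None
        groups, self.ready = self.ready[:num_groups], self.ready[num_groups:]
        return [ep for group in groups for ep in group]


def centered_rewards(group):
    """Subtract the group mean reward, a simplified group-relative baseline."""
    mean = sum(ep.reward for ep in group) / len(group)
    return [ep.reward - mean for ep in group]


queue = GroupQueue(group_size=2)
queue.put(Episode(group_id=0, reward=1.0))
queue.put(Episode(group_id=1, reward=0.0))  # group 1 stays incomplete
first_try = queue.get_batch(num_groups=1)   # None: no group is complete yet
queue.put(Episode(group_id=0, reward=0.5))  # completes group 0
batch = queue.get_batch(num_groups=1)
advantages = centered_rewards(batch)
```

Only complete groups leave the queue, so a group-relative baseline such as the mean in `centered_rewards` is always computed over every episode of the group. The lone episode of group 1 stays buffered until its sibling arrives.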

Usage

Use this principle when coordinating asynchronous trajectory collection for agentic RL training. The scheduler manages the full lifecycle of rollout collection, including suspending collection while model parameters are updated and resuming it afterwards.

Theoretical Basis

Pseudo-code:

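# Abstract rollout scheduling (one training iteration)
scheduler.suspend()                         # Pause trajectory collection during the model update
model_update()                              # Apply the model parameter update
batch = scheduler.get_batch(batch_size=32)  # Collect completed groups into a training batch
train(batch)                                # Run the training step on the batch
scheduler.resume()                          # Resume collection with the updated policy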
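A runnable sketch of the suspend/resume lifecycle, using `asyncio` in place of a distributed runtime. Everything here is hypothetical: the `RolloutScheduler` class, its `worker` and `model_update` methods, and the sleep durations are stand-ins chosen for the example, not the ROLL API. Unlike the abstract pseudo-code, this sketch calls `get_batch` before `suspend`, because its `get_batch` does not resume workers itself and would otherwise block on paused workers.

```python
import asyncio


class RolloutScheduler:
    """Hypothetical sketch of a suspendable rollout scheduler (not the ROLL API)."""

    def __init__(self):
        self.completed = asyncio.Queue()  # finished episodes waiting to be batched
        self.running = asyncio.Event()    # set = collection allowed, clear = suspended
        self.running.set()
        self.policy_version = 0

    def suspend(self):
        self.running.clear()

    def resume(self):
        self.running.set()

    async def worker(self, worker_id):
        while True:
            # Re-check after waking: the gate may have been cleared again meanwhile.
            while not self.running.is_set():
                await self.running.wait()
            version = self.policy_version  # policy this episode is generated with
            await asyncio.sleep(0.001)     # stand-in for running one episode
            await self.completed.put((worker_id, version))

    async def get_batch(self, batch_size):
        return [await self.completed.get() for _ in range(batch_size)]

    async def model_update(self):
        await asyncio.sleep(0.005)         # stand-in for a parameter update
        self.policy_version += 1


async def main():
    scheduler = RolloutScheduler()
    workers = [asyncio.create_task(scheduler.worker(i)) for i in range(4)]
    batch_sizes, idle_ok = [], []
    for _ in range(3):
        batch = await scheduler.get_batch(batch_size=8)
        batch_sizes.append(len(batch))
        scheduler.suspend()
        await asyncio.sleep(0.01)          # let in-flight episodes drain
        queued = scheduler.completed.qsize()
        await scheduler.model_update()     # workers stay paused meanwhile
        idle_ok.append(scheduler.completed.qsize() == queued)
        scheduler.resume()
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return batch_sizes, idle_ok, scheduler.policy_version


batch_sizes, idle_ok, final_version = asyncio.run(main())
```

The `asyncio.Event` acts as a gate that workers check before starting each episode. Episodes already in flight when `suspend` is called still finish, which is why the sketch waits briefly before the update. `idle_ok` records that no new episodes were queued while the update ran.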

