Implementation:Hpcaitech ColossalAI Launch Distributed

From Leeroopedia


Knowledge Sources
Domains: Distributed_Computing, Infrastructure
Last Updated: 2026-02-09 00:00 GMT

Overview

launch_distributed() is ColossalChat's concrete tool for launching the Ray-based, producer-consumer distributed RL training infrastructure.

Description

launch_distributed() is the main orchestrator for distributed GRPO training. It initializes a Ray cluster, discovers GPU resources across nodes, creates producer and consumer Ray actors with appropriate GPU allocations, and starts their training loops.
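The actor layout can be pictured with a minimal Ray sketch. The classes and method names below are illustrative placeholders, not the actual ColossalChat producer/consumer actor definitions; they only show the pattern launch_distributed() automates: start a Ray cluster, create GPU-pinned producer and consumer actors, and run their loops to completion.

import ray

ray.init()  # start or connect to the Ray cluster

@ray.remote(num_gpus=1)
class Producer:
    """Hypothetical inference worker: generates rollouts for the trainers."""
    def loop(self):
        return "rollouts generated"

@ray.remote(num_gpus=1)
class Consumer:
    """Hypothetical trainer: consumes rollouts, runs GRPO updates, saves checkpoints."""
    def loop(self):
        return "policy updated"

# Create GPU-pinned actors (2 producers, 4 consumers) and block until their loops finish.
producers = [Producer.remote() for _ in range(2)]
consumers = [Consumer.remote() for _ in range(4)]
ray.get([actor.loop.remote() for actor in producers + consumers])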

Usage

Called from the main RL entry script (rl_example.py) with all training configuration parameters.
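A hypothetical slice of such an entry script is sketched below. The CLI flags are illustrative stand-ins, not rl_example.py's actual argument list, and the config dicts mirror the Usage Examples section further down.

import argparse

from coati.distributed.launch import launch_distributed

parser = argparse.ArgumentParser()
parser.add_argument("--model", default="Qwen/Qwen2.5-3B")          # hypothetical flag
parser.add_argument("--num-producers", type=int, default=2)         # hypothetical flag
parser.add_argument("--num-consumer-procs", type=int, default=4)    # hypothetical flag
args = parser.parse_args()

launch_distributed(
    num_producers=args.num_producers,
    num_proc_per_producer=1,
    num_consumer_procs=args.num_consumer_procs,
    num_episodes=1000,
    inference_batch_size=16,
    inference_microbatch_size=4,
    train_batch_size=64,
    train_minibatch_size=8,
    train_dataset_config={"path": "/data/train.jsonl"},
    inference_model_config={"pretrained": args.model},
    generate_config={"temperature": 0.7, "top_p": 0.9},
    train_model_config={"pretrained": args.model},
    grpo_config={"num_generations": 8, "temperature": 0.7},
    plugin_config={"plugin_type": "zero2", "zero_stage": 2},
)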

Code Reference

Source Location

  • Repository: ColossalAI
  • File: applications/ColossalChat/coati/distributed/launch.py
  • Lines: 36-191

Signature

def launch_distributed(
    num_producers: int,
    num_proc_per_producer: int,
    num_consumer_procs: int,
    num_episodes: int,
    inference_batch_size: int,
    inference_microbatch_size: int,
    train_batch_size: int,
    train_minibatch_size: int,
    train_dataset_config: Dict[str, Any],
    inference_model_config: Dict[str, Any],
    generate_config: Dict[str, Any],
    train_model_config: Dict[str, Any],
    grpo_config: Dict[str, Any],
    plugin_config: Dict[str, Any],
    tokenizer_config: Optional[Dict[str, Any]] = None,
    inference_backend: str = "transformers",
    num_generations: int = 8,
    master_addr: str = "localhost",
    master_port: int = 29500,
    core_algo: str = "GRPO",
    project_name: Optional[str] = None,
    save_interval: int = 100,
    save_dir: str = "./model",
    eval_dataset_config: Optional[Dict[str, Any]] = None,
    eval_interval: int = 100,
    n_behind: int = 0,
) -> None:
    """
    Launch distributed RL training with Ray-based producer-consumer architecture.
    """

Import

from coati.distributed.launch import launch_distributed

I/O Contract

Inputs

Name | Type | Required | Description
num_producers | int | Yes | Number of inference worker (producer) actors
num_consumer_procs | int | Yes | Number of consumer (trainer) processes, i.e. training GPUs
num_episodes | int | Yes | Total RL training episodes
train_dataset_config | Dict | Yes | Training dataset configuration
inference_model_config | Dict | Yes | Model config for inference
grpo_config | Dict | Yes | GRPO algorithm parameters
plugin_config | Dict | Yes | ColossalAI plugin configuration
inference_backend | str | No | "transformers" or "vllm" (default: "transformers")
num_generations | int | No | Responses per prompt for GRPO (default: 8)
Outputs

Name | Type | Description
Ray actors | RemoteRef | Producer and consumer actors running training loops
Checkpoints | Files | Periodic model checkpoints saved by consumers

Usage Examples

from coati.distributed.launch import launch_distributed

launch_distributed(
    num_producers=2,
    num_proc_per_producer=1,
    num_consumer_procs=4,
    num_episodes=1000,
    inference_batch_size=16,
    inference_microbatch_size=4,
    train_batch_size=64,
    train_minibatch_size=8,
    train_dataset_config={"path": "/data/math_prompts.jsonl"},
    inference_model_config={"pretrained": "Qwen/Qwen2.5-3B"},
    generate_config={"temperature": 0.7, "top_p": 0.9},
    train_model_config={"pretrained": "Qwen/Qwen2.5-3B"},
    grpo_config={"num_generations": 8, "temperature": 0.7},
    plugin_config={"plugin_type": "zero2", "zero_stage": 2},
    num_generations=8,
    save_interval=50,
    save_dir="./grpo_model",
)
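As a rough capacity check for the call above, assuming one GPU per producer process and one per consumer process (a common layout; the actual allocation depends on the backend and plugin configuration):

# Assumption: one GPU per producer process and per consumer process.
num_producers, num_proc_per_producer, num_consumer_procs = 2, 1, 4
total_gpus = num_producers * num_proc_per_producer + num_consumer_procs
print(f"Ray cluster needs at least {total_gpus} GPUs")  # 6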

Related Pages

  • Implements Principle
  • Requires Environment
