Implementation:Hpcaitech ColossalAI Launch Distributed
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Computing, Infrastructure |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete entry point, provided by ColossalChat, for launching the distributed producer-consumer RL training infrastructure on Ray.
Description
launch_distributed() is the main orchestrator for distributed GRPO training. It initializes a Ray cluster, discovers GPU resources across nodes, creates producer and consumer Ray actors with appropriate GPU allocations, and starts their training loops.
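The producer-consumer pattern it orchestrates can be illustrated with a toy, Ray-free sketch using plain threads and a bounded queue. All names here are illustrative stand-ins, not the actual ColossalChat API: producers correspond to inference actors generating rollouts, consumers to training actors running GRPO updates.

```python
import queue
import threading

def toy_producer(rollout_queue, num_batches, num_generations):
    """Stand-in for an inference actor: emits fake rollout batches."""
    for step in range(num_batches):
        # In GRPO, each prompt yields `num_generations` candidate responses.
        batch = [f"rollout-{step}-{g}" for g in range(num_generations)]
        rollout_queue.put(batch)
    rollout_queue.put(None)  # sentinel: production finished

def toy_consumer(rollout_queue, consumed):
    """Stand-in for a training actor: pulls batches and 'trains' on them."""
    while True:
        batch = rollout_queue.get()
        if batch is None:
            break
        consumed.append(len(batch))  # pretend to run one GRPO update

rollout_queue = queue.Queue(maxsize=4)  # bounded queue: backpressure on the producer
consumed = []
p = threading.Thread(target=toy_producer, args=(rollout_queue, 3, 8))
c = threading.Thread(target=toy_consumer, args=(rollout_queue, consumed))
p.start(); c.start(); p.join(); c.join()
print(consumed)  # [8, 8, 8]
```

In the real system, the queue is replaced by Ray actor communication and each side runs on its own GPU allocation, but the decoupling of generation from training is the same.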
Usage
Called from the main RL entry script (rl_example.py) with all training configuration parameters.
Code Reference
Source Location
- Repository: ColossalAI
- File: applications/ColossalChat/coati/distributed/launch.py
- Lines: 36-191
Signature
```python
def launch_distributed(
    num_producers: int,
    num_proc_per_producer: int,
    num_consumer_procs: int,
    num_episodes: int,
    inference_batch_size: int,
    inference_microbatch_size: int,
    train_batch_size: int,
    train_minibatch_size: int,
    train_dataset_config: Dict[str, Any],
    inference_model_config: Dict[str, Any],
    generate_config: Dict[str, Any],
    train_model_config: Dict[str, Any],
    grpo_config: Dict[str, Any],
    plugin_config: Dict[str, Any],
    tokenizer_config: Optional[Dict[str, Any]] = None,
    inference_backend: str = "transformers",
    num_generations: int = 8,
    master_addr: str = "localhost",
    master_port: int = 29500,
    core_algo: str = "GRPO",
    project_name: Optional[str] = None,
    save_interval: int = 100,
    save_dir: str = "./model",
    eval_dataset_config: Optional[Dict[str, Any]] = None,
    eval_interval: int = 100,
    n_behind: int = 0,
) -> None:
    """
    Launch distributed RL training with Ray-based producer-consumer architecture.
    """
```
Import
```python
from coati.distributed.launch import launch_distributed
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| num_producers | int | Yes | Number of inference worker actors |
| num_consumer_procs | int | Yes | Number of consumer (training) processes, i.e. training GPUs |
| num_episodes | int | Yes | Total RL training episodes |
| train_dataset_config | Dict | Yes | Dataset configuration |
| inference_model_config | Dict | Yes | Model config for inference |
| grpo_config | Dict | Yes | GRPO algorithm parameters |
| plugin_config | Dict | Yes | ColossalAI plugin configuration |
| inference_backend | str | No | "transformers" or "vllm" (default: "transformers") |
| num_generations | int | No | Responses per prompt for GRPO (default: 8) |
Outputs
| Name | Type | Description |
|---|---|---|
| Ray actors | RemoteRef | Producer and consumer actors running training loops |
| Checkpoints | Files | Periodic model checkpoints saved by consumers |
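The batch-size parameters are interrelated: microbatches subdivide the inference batch, and minibatches subdivide the training batch (gradient accumulation). A hypothetical sanity check sketching these divisibility rules follows; the rules are a plausible reading of the parameter names, and the real constraints live in launch.py:

```python
def check_batch_config(inference_batch_size: int,
                       inference_microbatch_size: int,
                       train_batch_size: int,
                       train_minibatch_size: int) -> None:
    """Illustrative divisibility checks (hypothetical helper, not part of coati)."""
    if inference_batch_size % inference_microbatch_size != 0:
        raise ValueError("inference_batch_size must be a multiple of inference_microbatch_size")
    if train_batch_size % train_minibatch_size != 0:
        raise ValueError("train_batch_size must be a multiple of train_minibatch_size")

# Values from the usage example below: 16/4 and 64/8 both divide evenly.
check_batch_config(16, 4, 64, 8)
```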
Usage Examples
```python
from coati.distributed.launch import launch_distributed

launch_distributed(
    num_producers=2,
    num_proc_per_producer=1,
    num_consumer_procs=4,
    num_episodes=1000,
    inference_batch_size=16,
    inference_microbatch_size=4,
    train_batch_size=64,
    train_minibatch_size=8,
    train_dataset_config={"path": "/data/math_prompts.jsonl"},
    inference_model_config={"pretrained": "Qwen/Qwen2.5-3B"},
    generate_config={"temperature": 0.7, "top_p": 0.9},
    train_model_config={"pretrained": "Qwen/Qwen2.5-3B"},
    grpo_config={"num_generations": 8, "temperature": 0.7},
    plugin_config={"plugin_type": "zero2", "zero_stage": 2},
    num_generations=8,
    save_interval=50,
    save_dir="./grpo_model",
)
```
Related Pages
Implements Principle
Requires Environment