
Environment:Hpcaitech ColossalAI GRPO Distributed Environment

From Leeroopedia


Knowledge Sources
Domains: Distributed_Training, Reinforcement_Learning, LLMs
Last Updated: 2026-02-09 03:00 GMT

Overview

Multi-node distributed environment with Ray cluster, optional vLLM/SGLang inference backends, and NCCL communication for GRPO (Group Relative Policy Optimization) training.

Description

The GRPO distributed training environment implements a Producer-Consumer architecture orchestrated via a Ray cluster. Producers run inference (generating rollouts from the policy model) using one of several selectable backends: transformers (default), vLLM, or SGLang. Consumers run the actual GRPO policy gradient training loop using ColossalAI's HybridParallelPlugin (supporting TP, PP, DP, EP, and SP parallelism) with HybridAdam optimizer.

Model weights are synchronized between consumers (training side) and producers (inference side) via ray_broadcast_tensor_dict, which uses ray.util.collective with the NCCL backend for GPU-to-GPU tensor broadcasts. The system supports multiple producers and multiple consumer processes, with producers allocated to lower-indexed GPU nodes and consumers to higher-indexed nodes. Ray's NodeAffinitySchedulingStrategy is used implicitly through manual node assignment.
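The node partitioning described above can be pictured as a simple split over node indices. The helper below is an illustrative sketch only; the function name and layout are assumptions, not the actual launch.py logic:

```python
def partition_nodes(node_indices, num_producers):
    """Assign lower-indexed nodes to producers (inference) and the
    remainder to consumers (training). Hypothetical helper for
    illustration; the real assignment happens during actor creation."""
    producers = node_indices[:num_producers]
    consumers = node_indices[num_producers:]
    return producers, consumers

# Example: 4 nodes, 2 producers -> producers on nodes 0-1, consumers on 2-3
producers, consumers = partition_nodes([0, 1, 2, 3], 2)
```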

vLLM and SGLang are optional inference backends imported via try/except blocks; if neither is installed, the system falls back to the HuggingFace transformers backend. Reward verification for math tasks uses math_verify and latex2sympy2_extended; code reward verification executes candidate code via pyext.

The environment also supports multiple RL algorithm variants via the same consumer class: GRPO, DAPO, REINFORCE_PPB, and RLOO (all mapped to GRPOConsumer in launch.py).
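The variant dispatch can be pictured as a name-to-class mapping. This is a sketch: the stand-in GRPOConsumer class and the lookup helper below are illustrative, not the verbatim launch.py source:

```python
class GRPOConsumer:
    """Stand-in for coati.distributed.grpo_consumer.GRPOConsumer."""
    pass

# All four variants resolve to the same consumer class; the algorithmic
# differences live inside GRPOConsumer's advantage computation.
ALGO_MAP = {
    "GRPO": GRPOConsumer,
    "DAPO": GRPOConsumer,
    "REINFORCE_PPB": GRPOConsumer,
    "RLOO": GRPOConsumer,
}

def get_consumer_cls(algo: str):
    # Reject algorithm names outside the supported variant set
    if algo not in ALGO_MAP:
        raise NotImplementedError(f"Unknown algorithm {algo}")
    return ALGO_MAP[algo]
```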

Usage

This environment is required whenever running distributed GRPO or GRPO-variant (DAPO, REINFORCE_PPB, RLOO) reinforcement learning training across multiple GPU nodes. It is activated by calling launch_distributed() from coati.distributed.launch.

System Requirements

Category | Requirement | Notes
OS | Linux | Required for NCCL and Ray
Hardware | Multiple NVIDIA GPUs | Separate GPUs for producers (inference) and consumers (training)
Network | High-bandwidth interconnect | NCCL requires fast GPU-to-GPU communication
Python | >=3.8 | Required by Ray and ColossalAI
PyTorch | >=2.1.0 | Per requirements.txt; version-specific API branching in comm.py for torch >=2.3.0 and >=1.13.0

Dependencies

Python Packages

Package | Required/Optional | Purpose
ray | Required | Cluster orchestration, remote actors (@ray.remote), collective communication groups
ray.util.collective | Required | NCCL-based broadcast/allreduce for tensor synchronization between producers and consumers
vllm | Optional | High-throughput inference backend; provides LLM and SamplingParams classes
sglang | Optional | Alternative inference backend via sgl.Engine (currently disabled in BACKEND_MAP due to process stalling)
colossalai>=0.4.7 | Required | HybridParallelPlugin, Booster, HybridAdam, CosineAnnealingWarmupLR
transformers>=4.39.3 | Required | Default inference backend, tokenizer, model loading
torch>=2.1.0 | Required | Core tensor operations, distributed training
math_verify | Required | Math answer verification via parse() and verify() in reward functions
latex2sympy2_extended | Required | LaTeX normalization for math reward via NormalizationConfig
pyext | Required | Code execution sandbox for code reward verification
wandb | Required | Experiment tracking and metric logging (used in both producer and consumer)
packaging | Required | PyTorch version comparison in comm.py
flash-attn | Required | Flash attention for efficient transformer computation
datasets==2.14.7 | Required | Dataset loading

All ColossalChat Training Environment Dependencies

The full list of base dependencies is specified in applications/ColossalChat/requirements.txt, including tqdm, loralib, langchain, tokenizers, fastapi, sse_starlette, sentencepiece, gpustat, tensorboard, ninja, tiktoken, and jsonlines.

Credentials

Variable | Purpose | Required
WANDB_API_KEY | Weights & Biases experiment tracking | Optional (wandb prompts interactively if not set)

Quick Install

# Core distributed dependencies
pip install "ray[default]" "colossalai>=0.4.7" "transformers>=4.39.3" "torch>=2.1.0"

# Reward verification
pip install math_verify latex2sympy2_extended pyext

# Optional inference backends (install one or both)
pip install vllm
# pip install sglang  # currently disabled in BACKEND_MAP

# Experiment tracking
pip install wandb

# Full ColossalChat requirements
pip install -r applications/ColossalChat/requirements.txt

Code Evidence

Ray Import (launch.py)

launch.py unconditionally imports Ray and uses it for cluster orchestration:

# applications/ColossalChat/coati/distributed/launch.py:6
import ray

Producers and consumers are created as Ray remote actors:

# applications/ColossalChat/coati/distributed/launch.py:120-121
producer = SimpleProducer.options(num_gpus=num_proc_per_producer).remote(...)

Optional vLLM Import (producer.py)

# applications/ColossalChat/coati/distributed/producer.py:28-31
try:
    from vllm import SamplingParams
except ImportError:
    LLM = None

Optional sglang and vLLM Imports (inference_backend.py)

# applications/ColossalChat/coati/distributed/inference_backend.py:11-19
try:
    import sglang as sgl
except ImportError:
    sgl = None

try:
    from vllm import LLM, SamplingParams
except ImportError:
    LLM = None

NCCL Communication via Ray Collective (comm.py)

# applications/ColossalChat/coati/distributed/comm.py:1-8
import ray
import ray.util.collective as cc
import torch
import torch.distributed.distributed_c10d as c10d
from packaging.version import Version

PyTorch Version-Specific API Branching (comm.py)

# applications/ColossalChat/coati/distributed/comm.py:14-19
if Version(torch.__version__) >= Version("2.3.0"):
    obj_tensor, size_tensor = c10d._object_to_tensor(obj, device=device, group=None)
elif Version(torch.__version__) >= Version("1.13.0"):
    obj_tensor, size_tensor = c10d._object_to_tensor(obj, device=device)
else:
    obj_tensor, size_tensor = c10d._object_to_tensor(obj)

Consumer Ray and Collective Imports (consumer.py)

# applications/ColossalChat/coati/distributed/consumer.py:4-5
import ray
import ray.util.collective as cc

GRPOConsumer Default Learning Rate (grpo_consumer.py)

# applications/ColossalChat/coati/distributed/grpo_consumer.py:76-79
self.optimizer = HybridAdam(
    self.policy_model.parameters(),
    lr=grpo_config.get("lr", 1e-6),
    weight_decay=grpo_config.get("weight_decay", 0.01),
)
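The dict.get pattern above is how hyperparameter defaults are resolved: any key absent from grpo_config silently falls back to the hard-coded default (lr 1e-6, weight_decay 0.01). A minimal illustration, with a hypothetical user config that overrides only the learning rate:

```python
grpo_config = {"lr": 5e-7}  # hypothetical user config overriding only lr

lr = grpo_config.get("lr", 1e-6)                      # user value wins
weight_decay = grpo_config.get("weight_decay", 0.01)  # falls back to default
```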

Math Reward Verification (reward_fn.py)

# applications/ColossalChat/coati/distributed/reward/reward_fn.py:24-25
from latex2sympy2_extended import NormalizationConfig
from math_verify import ExprExtractionConfig, LatexExtractionConfig, parse, verify

Common Errors

Error | Cause | Resolution
ModuleNotFoundError: No module named 'ray' | Ray not installed | pip install "ray[default]"
ImportError: vllm is not installed | vLLM backend selected but not installed | pip install vllm, or use inference_backend="transformers"
ImportError: sglang is not installed | SGLang backend selected but not installed | pip install sglang (note: the SGLang backend is currently disabled in BACKEND_MAP)
ModuleNotFoundError: No module named 'math_verify' | Reward verification dependency missing | pip install math_verify
ModuleNotFoundError: No module named 'latex2sympy2_extended' | LaTeX parsing dependency missing | pip install latex2sympy2_extended
AttributeError: _object_to_tensor() with unexpected arguments | PyTorch version mismatch with comm.py version branching | Ensure torch>=2.1.0; the code handles torch >=2.3.0, >=1.13.0, and older via branching
NCCL initialization failure | GPU communication misconfigured | Ensure all nodes have NCCL-compatible GPUs and network connectivity; check the NCCL_SOCKET_IFNAME and NCCL_IB_DISABLE env vars
ValueError: Unexpected backend {backend} | Invalid inference backend name | Use one of: "transformers", "vllm" (sglang is defined but commented out in BACKEND_MAP)
Rollout log file ... already exists | Resuming a run without cleaning up previous rollout logs | Delete the existing rollout log file or change the project name
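The "Unexpected backend" error above comes from a plain map lookup. A sketch, with stand-in backend classes and a BACKEND_MAP whose contents reflect the state described here (sglang commented out):

```python
class TransformersInferenceBackend:
    """Stand-in for the real transformers backend class."""
    pass

class VLLMInferenceBackend:
    """Stand-in for the real vLLM backend class."""
    pass

BACKEND_MAP = {
    "transformers": TransformersInferenceBackend,
    "vllm": VLLMInferenceBackend,
    # "sglang": SGLangInferenceBackend,  # disabled: stalls the process
}

def get_backend(name: str):
    # Any name outside the map triggers the "Unexpected backend" error
    if name not in BACKEND_MAP:
        raise ValueError(f"Unexpected backend {name}")
    return BACKEND_MAP[name]
```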

Compatibility Notes

  • PyTorch version branching: comm.py uses packaging.version.Version to detect the PyTorch version at runtime and selects the appropriate c10d._object_to_tensor / c10d._tensor_to_object API signatures. Three branches exist: torch >=2.3.0 (passes group=None), torch >=1.13.0 (passes device only), and older versions (no extra kwargs). The minimum required version per requirements.txt is torch >=2.1.0.
  • vLLM and SGLang are optional: Both are imported via try/except in inference_backend.py and producer.py. If not installed, they default to None and raise ImportError only when explicitly selected as backend. Note that the SGLang backend is currently commented out in BACKEND_MAP with the comment "sglang backend will stuck the process due to unknown reason".
  • Gloo backend fallback: ray_broadcast_tensor_dict in comm.py includes special handling for bfloat16 tensors when using the Gloo backend (which does not support bfloat16), temporarily casting to float16.
  • Algorithm variants: ALGO_MAP in launch.py maps "GRPO", "DAPO", "REINFORCE_PPB", and "RLOO" all to the same GRPOConsumer class, which implements algorithm-specific advantage computation internally.
  • ColossalAI version: Requires colossalai>=0.4.7 for HybridParallelPlugin and Booster support.
