Environment:Hpcaitech ColossalAI GRPO Distributed Environment
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Training, Reinforcement_Learning, LLMs |
| Last Updated | 2026-02-09 03:00 GMT |
Overview
Multi-node distributed environment with Ray cluster, optional vLLM/SGLang inference backends, and NCCL communication for GRPO (Group Relative Policy Optimization) training.
Description
The GRPO distributed training environment implements a Producer-Consumer architecture orchestrated via a Ray cluster. Producers run inference (generating rollouts from the policy model) using one of several selectable backends: transformers (default), vLLM, or SGLang. Consumers run the actual GRPO policy gradient training loop using ColossalAI's HybridParallelPlugin (supporting TP, PP, DP, EP, and SP parallelism) with HybridAdam optimizer.
Model weights are synchronized between consumers (training side) and producers (inference side) via ray_broadcast_tensor_dict, which uses ray.util.collective with the NCCL backend for GPU-to-GPU tensor broadcasts. The system supports multiple producers and multiple consumer processes, with producers allocated to lower-indexed GPU nodes and consumers to higher-indexed nodes. Ray's NodeAffinitySchedulingStrategy is used implicitly through manual node assignment.
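The producers-on-low-nodes, consumers-on-high-nodes split described above can be sketched as plain index arithmetic. This is a hypothetical helper for illustration only; the actual assignment logic lives in launch.py and may differ in detail.

```python
# Hypothetical sketch: producers occupy the lower-indexed nodes, consumers
# take the remaining higher-indexed ones. Illustrative only; not the actual
# launch.py implementation.

def assign_nodes(num_nodes: int, num_producers: int):
    """Split node indices between producers (low) and consumers (high)."""
    if num_producers >= num_nodes:
        raise ValueError("need at least one node left for consumers")
    producer_nodes = list(range(num_producers))
    consumer_nodes = list(range(num_producers, num_nodes))
    return producer_nodes, consumer_nodes

producers, consumers = assign_nodes(num_nodes=4, num_producers=1)
print(producers, consumers)  # [0] [1, 2, 3]
```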
vLLM and SGLang are optional inference backends imported via try/except blocks. If neither is installed, the system falls back to the HuggingFace transformers backend. Reward verification for math tasks uses math_verify and latex2sympy2_extended; code reward verification calls an external API via pyext.
The environment also supports multiple RL algorithm variants via the same consumer class: GRPO, DAPO, REINFORCE_PPB, and RLOO (all mapped to GRPOConsumer in launch.py).
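The variant dispatch can be pictured as a name-to-class map. The sketch below mirrors the ALGO_MAP pattern with a stand-in class; in launch.py the map values are the real GRPOConsumer.

```python
# Minimal sketch of the ALGO_MAP dispatch pattern: several algorithm names
# resolve to one consumer class. GRPOConsumer here is a stand-in, not the
# real coati.distributed.grpo_consumer.GRPOConsumer.

class GRPOConsumer:
    def __init__(self, algo: str):
        # algorithm-specific advantage computation keys off this name
        self.algo = algo

ALGO_MAP = {name: GRPOConsumer for name in ("GRPO", "DAPO", "REINFORCE_PPB", "RLOO")}

def make_consumer(algo: str) -> GRPOConsumer:
    if algo not in ALGO_MAP:
        raise ValueError(f"Unexpected algorithm {algo}")
    return ALGO_MAP[algo](algo)

print(type(make_consumer("DAPO")).__name__)  # GRPOConsumer
```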
Usage
This environment is required whenever running distributed GRPO or GRPO-variant (DAPO, REINFORCE_PPB, RLOO) reinforcement learning training across multiple GPU nodes. It is activated by calling launch_distributed() from coati.distributed.launch.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux | Required for NCCL and Ray |
| Hardware | Multiple NVIDIA GPUs | Separate GPUs for producers (inference) and consumers (training) |
| Network | High-bandwidth interconnect | NCCL requires fast GPU-to-GPU communication |
| Python | >=3.8 | Required by Ray and ColossalAI |
| PyTorch | >=2.1.0 | Per requirements.txt; version-specific API branching in comm.py for torch >=2.3.0 and >=1.13.0 |
Dependencies
Python Packages
| Package | Required/Optional | Purpose |
|---|---|---|
| ray | Required | Cluster orchestration, remote actors (`@ray.remote`), collective communication groups |
| ray.util.collective | Required | NCCL-based broadcast/allreduce for tensor synchronization between producers and consumers |
| vllm | Optional | High-throughput inference backend; provides `LLM` and `SamplingParams` classes |
| sglang | Optional | Alternative inference backend via `sgl.Engine` (currently disabled in BACKEND_MAP due to process stalling) |
| colossalai>=0.4.7 | Required | HybridParallelPlugin, Booster, HybridAdam, CosineAnnealingWarmupLR |
| transformers>=4.39.3 | Required | Default inference backend, tokenizer, model loading |
| torch>=2.1.0 | Required | Core tensor operations, distributed training |
| math_verify | Required | Math answer verification via `parse()` and `verify()` in reward functions |
| latex2sympy2_extended | Required | LaTeX normalization for math reward via `NormalizationConfig` |
| pyext | Required | Code execution sandbox for code reward verification |
| wandb | Required | Experiment tracking and metric logging (used in both producer and consumer) |
| packaging | Required | PyTorch version comparison in comm.py |
| flash-attn | Required | Flash attention for efficient transformer computation |
| datasets==2.14.7 | Required | Dataset loading |
All ColossalChat Training Environment Dependencies
The full list of base dependencies is specified in applications/ColossalChat/requirements.txt, including tqdm, loralib, langchain, tokenizers, fastapi, sse_starlette, sentencepiece, gpustat, tensorboard, ninja, tiktoken, and jsonlines.
Credentials
| Variable | Purpose | Required |
|---|---|---|
| WANDB_API_KEY | Weights & Biases experiment tracking | Optional (wandb will prompt interactively if not set) |
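Because wandb falls back to an interactive login prompt when the key is absent, a pre-flight check before launching a long multi-node job can be useful. A minimal sketch (the helper name is illustrative):

```python
# Sketch: verify WANDB_API_KEY is set before launching training, so wandb
# does not block on an interactive login prompt mid-job. Helper name is
# illustrative, not part of the codebase.
import os

def wandb_key_present() -> bool:
    return bool(os.environ.get("WANDB_API_KEY"))

os.environ["WANDB_API_KEY"] = "dummy-key-for-illustration"  # illustrative only
print(wandb_key_present())  # True
```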
Quick Install
```bash
# Core distributed dependencies (version specifiers quoted so the shell
# does not interpret ">=" as a redirection)
pip install "ray[default]" "colossalai>=0.4.7" "transformers>=4.39.3" "torch>=2.1.0"

# Reward verification
pip install math_verify latex2sympy2_extended pyext

# Optional inference backends (install one or both)
pip install vllm
# pip install sglang  # currently disabled in BACKEND_MAP

# Experiment tracking
pip install wandb

# Full ColossalChat requirements
pip install -r applications/ColossalChat/requirements.txt
```
Code Evidence
Ray Import (launch.py)
launch.py unconditionally imports Ray and uses it for cluster orchestration:
```python
# applications/ColossalChat/coati/distributed/launch.py:6
import ray
```
Producers and consumers are created as Ray remote actors:
```python
# applications/ColossalChat/coati/distributed/launch.py:120-121
producer = SimpleProducer.options(num_gpus=num_proc_per_producer).remote(...)
```
Optional vLLM Import (producer.py)
```python
# applications/ColossalChat/coati/distributed/producer.py:28-31
try:
    from vllm import SamplingParams
except ImportError:
    LLM = None
```
Optional sglang and vLLM Imports (inference_backend.py)
```python
# applications/ColossalChat/coati/distributed/inference_backend.py:11-19
try:
    import sglang as sgl
except ImportError:
    sgl = None

try:
    from vllm import LLM, SamplingParams
except ImportError:
    LLM = None
```
NCCL Communication via Ray Collective (comm.py)
```python
# applications/ColossalChat/coati/distributed/comm.py:1-8
import ray
import ray.util.collective as cc
import torch
import torch.distributed.distributed_c10d as c10d
from packaging.version import Version
```
PyTorch Version-Specific API Branching (comm.py)
```python
# applications/ColossalChat/coati/distributed/comm.py:14-19
if Version(torch.__version__) >= Version("2.3.0"):
    obj_tensor, size_tensor = c10d._object_to_tensor(obj, device=device, group=None)
elif Version(torch.__version__) >= Version("1.13.0"):
    obj_tensor, size_tensor = c10d._object_to_tensor(obj, device=device)
else:
    obj_tensor, size_tensor = c10d._object_to_tensor(obj)
```
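The same pick-the-signature-by-version pattern can be reproduced without torch installed. The sketch below hand-rolls a numeric version parser instead of using `packaging.version.Version`; all helper names are illustrative.

```python
# Self-contained sketch of comm.py's version-branching pattern. The parser
# and helper names are hypothetical; comm.py uses packaging.version.Version.

def parse_version(v: str) -> tuple:
    # "2.3.0+cu121" -> (2, 3, 0); local-build suffixes are ignored for brevity
    core = v.split("+")[0]
    return tuple(int(p) for p in core.split(".") if p.isdigit())

def object_to_tensor_kwargs(torch_version: str) -> dict:
    """Return the extra kwargs the matching _object_to_tensor signature takes."""
    ver = parse_version(torch_version)
    if ver >= (2, 3, 0):
        return {"device": "cpu", "group": None}   # torch >= 2.3.0 signature
    elif ver >= (1, 13, 0):
        return {"device": "cpu"}                  # torch >= 1.13.0 signature
    return {}                                     # older torch: no extra kwargs

print(object_to_tensor_kwargs("2.4.1"))   # {'device': 'cpu', 'group': None}
print(object_to_tensor_kwargs("1.12.0"))  # {}
```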
Consumer Ray and Collective Imports (consumer.py)
```python
# applications/ColossalChat/coati/distributed/consumer.py:4-5
import ray
import ray.util.collective as cc
```
GRPOConsumer Default Learning Rate (grpo_consumer.py)
```python
# applications/ColossalChat/coati/distributed/grpo_consumer.py:76-79
self.optimizer = HybridAdam(
    self.policy_model.parameters(),
    lr=grpo_config.get("lr", 1e-6),
    weight_decay=grpo_config.get("weight_decay", 0.01),
)
```
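The `grpo_config.get(key, default)` pattern above means an empty config silently falls back to lr=1e-6 and weight_decay=0.01. A small sketch of that fallback behavior (the helper name is illustrative):

```python
# Sketch of the grpo_config.get(...) default pattern: optimizer
# hyperparameters fall back to the documented defaults when absent.
# resolve_optimizer_args is a hypothetical helper for illustration.

def resolve_optimizer_args(grpo_config: dict) -> dict:
    return {
        "lr": grpo_config.get("lr", 1e-6),
        "weight_decay": grpo_config.get("weight_decay", 0.01),
    }

print(resolve_optimizer_args({}))            # {'lr': 1e-06, 'weight_decay': 0.01}
print(resolve_optimizer_args({"lr": 5e-7}))  # {'lr': 5e-07, 'weight_decay': 0.01}
```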
Math Reward Verification (reward_fn.py)
```python
# applications/ColossalChat/coati/distributed/reward/reward_fn.py:24-25
from latex2sympy2_extended import NormalizationConfig
from math_verify import ExprExtractionConfig, LatexExtractionConfig, parse, verify
```
Common Errors
| Error | Cause | Resolution |
|---|---|---|
| `ModuleNotFoundError: No module named 'ray'` | Ray not installed | `pip install ray[default]` |
| `ImportError: vllm is not installed` | vLLM backend selected but not installed | `pip install vllm` or use `inference_backend="transformers"` |
| `ImportError: sglang is not installed` | SGLang backend selected but not installed | `pip install sglang` (note: the SGLang backend is currently disabled in BACKEND_MAP) |
| `ModuleNotFoundError: No module named 'math_verify'` | Reward verification dependency missing | `pip install math_verify` |
| `ModuleNotFoundError: No module named 'latex2sympy2_extended'` | LaTeX parsing dependency missing | `pip install latex2sympy2_extended` |
| `AttributeError` from `_object_to_tensor()` called with unexpected arguments | PyTorch version mismatch with comm.py version branching | Ensure torch>=2.1.0; the code handles torch >=2.3.0, >=1.13.0, and older versions via branching |
| NCCL initialization failure | GPU communication misconfigured | Ensure all nodes have NCCL-compatible GPUs and network connectivity; check the `NCCL_SOCKET_IFNAME` and `NCCL_IB_DISABLE` env vars |
| `ValueError: Unexpected backend {backend}` | Invalid inference backend name | Use one of: `"transformers"`, `"vllm"` (sglang is defined but commented out in BACKEND_MAP) |
| Rollout log file ... already exists | Resuming a run without cleaning up previous rollout logs | Delete the existing rollout log file or change the project name |
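For the NCCL initialization failures above, a typical first step is setting NCCL's diagnostic and network-selection environment variables before launching. The interface name below is an example and must match your cluster's actual NIC; disabling InfiniBand is only appropriate when falling back to TCP is acceptable.

```shell
# Example NCCL debugging setup before launching training (values are
# illustrative; eth0 must match your cluster's network interface).
export NCCL_DEBUG=INFO            # verbose NCCL logs to diagnose init failures
export NCCL_SOCKET_IFNAME=eth0    # pin NCCL to a specific network interface
export NCCL_IB_DISABLE=1          # force TCP transport if InfiniBand is flaky
echo "NCCL_SOCKET_IFNAME=$NCCL_SOCKET_IFNAME"
```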
Compatibility Notes
- PyTorch version branching: `comm.py` uses `packaging.version.Version` to detect the PyTorch version at runtime and selects the appropriate `c10d._object_to_tensor`/`c10d._tensor_to_object` API signatures. Three branches exist: torch >=2.3.0 (passes `group=None`), torch >=1.13.0 (passes `device` only), and older versions (no extra kwargs). The minimum required version per `requirements.txt` is torch >=2.1.0.
- vLLM and SGLang are optional: both are imported via `try/except` in `inference_backend.py` and `producer.py`. If not installed, they default to `None` and raise `ImportError` only when explicitly selected as the backend. Note that the SGLang backend is currently commented out in `BACKEND_MAP` with the comment "sglang backend will stuck the process due to unknown reason".
- Gloo backend fallback: `ray_broadcast_tensor_dict` in `comm.py` includes special handling for `bfloat16` tensors when using the Gloo backend (which does not support bfloat16), temporarily casting to `float16`.
- Algorithm variants: `ALGO_MAP` in `launch.py` maps "GRPO", "DAPO", "REINFORCE_PPB", and "RLOO" all to the same `GRPOConsumer` class, which implements algorithm-specific advantage computation internally.
- ColossalAI version: requires `colossalai>=0.4.7` for `HybridParallelPlugin` and `Booster` support.
Related Pages
- Implementation:Hpcaitech_ColossalAI_Launch_Distributed
- Implementation:Hpcaitech_ColossalAI_SimpleProducer
- Implementation:Hpcaitech_ColossalAI_GRPOConsumer
- Implementation:Hpcaitech_ColossalAI_Ray_Broadcast_Tensor_Dict
- Implementation:Hpcaitech_ColossalAI_RLVRRewardModel
- Implementation:Hpcaitech_ColossalAI_PolicyLoss