Environment:Hpcaitech ColossalAI GRPO Distributed Environment
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Training, Reinforcement_Learning, LLMs |
| Last Updated | 2026-02-09 03:00 GMT |
Overview
Multi-node distributed environment with Ray cluster, optional vLLM/SGLang inference backends, and NCCL communication for GRPO (Group Relative Policy Optimization) training.
Description
The GRPO distributed training environment implements a Producer-Consumer architecture orchestrated via a Ray cluster. Producers run inference (generating rollouts from the policy model) using one of several selectable backends: transformers (default), vLLM, or SGLang. Consumers run the actual GRPO policy gradient training loop using ColossalAI's HybridParallelPlugin (supporting TP, PP, DP, EP, and SP parallelism) with HybridAdam optimizer.
Model weights are synchronized between consumers (training side) and producers (inference side) via ray_broadcast_tensor_dict, which uses ray.util.collective with the NCCL backend for GPU-to-GPU tensor broadcasts. The system supports multiple producers and multiple consumer processes, with producers allocated to lower-indexed GPU nodes and consumers to higher-indexed nodes. Ray's NodeAffinitySchedulingStrategy is used implicitly through manual node assignment.
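The producers-on-low-nodes, consumers-on-high-nodes split described above can be sketched as plain index arithmetic. This is a hypothetical helper for illustration only; the actual assignment logic lives in launch.py and may differ in detail.

```python
# Hypothetical sketch: producers occupy the lower-indexed nodes, consumers
# take the remaining higher-indexed ones. Illustrative only; not the actual
# launch.py implementation.

def assign_nodes(num_nodes: int, num_producers: int):
    """Split node indices between producers (low) and consumers (high)."""
    if num_producers >= num_nodes:
        raise ValueError("need at least one node left for consumers")
    producer_nodes = list(range(num_producers))
    consumer_nodes = list(range(num_producers, num_nodes))
    return producer_nodes, consumer_nodes

producers, consumers = assign_nodes(num_nodes=4, num_producers=1)
print(producers, consumers)  # [0] [1, 2, 3]
```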
vLLM and SGLang are optional inference backends imported via try/except blocks. If neither is installed, the system falls back to the HuggingFace transformers backend. Reward verification for math tasks uses math_verify and latex2sympy2_extended; code reward verification calls an external API via pyext.
The environment also supports multiple RL algorithm variants via the same consumer class: GRPO, DAPO, REINFORCE_PPB, and RLOO (all mapped to GRPOConsumer in launch.py).
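The variant dispatch can be pictured as a name-to-class map. The sketch below mirrors the ALGO_MAP pattern with a stand-in class; in launch.py the map values are the real GRPOConsumer.

```python
# Minimal sketch of the ALGO_MAP dispatch pattern: several algorithm names
# resolve to one consumer class. GRPOConsumer here is a stand-in, not the
# real coati.distributed.grpo_consumer.GRPOConsumer.

class GRPOConsumer:
    def __init__(self, algo: str):
        # algorithm-specific advantage computation keys off this name
        self.algo = algo

ALGO_MAP = {name: GRPOConsumer for name in ("GRPO", "DAPO", "REINFORCE_PPB", "RLOO")}

def make_consumer(algo: str) -> GRPOConsumer:
    if algo not in ALGO_MAP:
        raise ValueError(f"Unexpected algorithm {algo}")
    return ALGO_MAP[algo](algo)

print(type(make_consumer("DAPO")).__name__)  # GRPOConsumer
```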
Usage
This environment is required whenever running distributed GRPO or GRPO-variant (DAPO, REINFORCE_PPB, RLOO) reinforcement learning training across multiple GPU nodes. It is activated by calling launch_distributed() from coati.distributed.launch.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux | Required for NCCL and Ray |
| Hardware | Multiple NVIDIA GPUs | Separate GPUs for producers (inference) and consumers (training) |
| Network | High-bandwidth interconnect | NCCL requires fast GPU-to-GPU communication |
| Python | >=3.8 | Required by Ray and ColossalAI |
| PyTorch | >=2.1.0 | Per requirements.txt; version-specific API branching in comm.py for torch >=2.3.0 and >=1.13.0 |
Dependencies
Python Packages
| Package | Required/Optional | Purpose |
|---|---|---|
| ray | Required | Cluster orchestration, remote actors (`@ray.remote`), collective communication groups |
| ray.util.collective | Required | NCCL-based broadcast/allreduce for tensor synchronization between producers and consumers |
| vllm | Optional | High-throughput inference backend; provides `LLM` and `SamplingParams` classes |
| sglang | Optional | Alternative inference backend via `sgl.Engine` (currently disabled in BACKEND_MAP due to process stalling) |
| colossalai>=0.4.7 | Required | HybridParallelPlugin, Booster, HybridAdam, CosineAnnealingWarmupLR |
| transformers>=4.39.3 | Required | Default inference backend, tokenizer, model loading |
| torch>=2.1.0 | Required | Core tensor operations, distributed training |
| math_verify | Required | Math answer verification via `parse()` and `verify()` in reward functions |
| latex2sympy2_extended | Required | LaTeX normalization for math reward via `NormalizationConfig` |
| pyext | Required | Code execution sandbox for code reward verification |
| wandb | Required | Experiment tracking and metric logging (used in both producer and consumer) |
| packaging | Required | PyTorch version comparison in comm.py |
| flash-attn | Required | Flash attention for efficient transformer computation |
| datasets==2.14.7 | Required | Dataset loading |
All ColossalChat Training Environment Dependencies
The full list of base dependencies is specified in applications/ColossalChat/requirements.txt, including tqdm, loralib, langchain, tokenizers, fastapi, sse_starlette, sentencepiece, gpustat, tensorboard, ninja, tiktoken, and jsonlines.
Credentials
| Variable | Purpose | Required |
|---|---|---|
| WANDB_API_KEY | Weights & Biases experiment tracking | Optional (wandb will prompt interactively if not set) |
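Because wandb falls back to an interactive login prompt when the key is absent, a pre-flight check before launching a long multi-node job can be useful. A minimal sketch (the helper name is illustrative):

```python
# Sketch: verify WANDB_API_KEY is set before launching training, so wandb
# does not block on an interactive login prompt mid-job. Helper name is
# illustrative, not part of the codebase.
import os

def wandb_key_present() -> bool:
    return bool(os.environ.get("WANDB_API_KEY"))

os.environ["WANDB_API_KEY"] = "dummy-key-for-illustration"  # illustrative only
print(wandb_key_present())  # True
```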
Quick Install
```bash
# Core distributed dependencies (version specifiers quoted so the shell
# does not interpret ">=" as a redirection)
pip install "ray[default]" "colossalai>=0.4.7" "transformers>=4.39.3" "torch>=2.1.0"

# Reward verification
pip install math_verify latex2sympy2_extended pyext

# Optional inference backends (install one or both)
pip install vllm
# pip install sglang  # currently disabled in BACKEND_MAP

# Experiment tracking
pip install wandb

# Full ColossalChat requirements
pip install -r applications/ColossalChat/requirements.txt
```
Code Evidence
Ray Import (launch.py)
launch.py unconditionally imports Ray and uses it for cluster orchestration:
```python
# applications/ColossalChat/coati/distributed/launch.py:6
import ray
```
Producers and consumers are created as Ray remote actors:
```python
# applications/ColossalChat/coati/distributed/launch.py:120-121
producer = SimpleProducer.options(num_gpus=num_proc_per_producer).remote(...)
```
Optional vLLM Import (producer.py)
```python
# applications/ColossalChat/coati/distributed/producer.py:28-31
try:
    from vllm import SamplingParams
except ImportError:
    LLM = None
```
Optional sglang and vLLM Imports (inference_backend.py)
```python
# applications/ColossalChat/coati/distributed/inference_backend.py:11-19
try:
    import sglang as sgl
except ImportError:
    sgl = None

try:
    from vllm import LLM, SamplingParams
except ImportError:
    LLM = None
```
NCCL Communication via Ray Collective (comm.py)
```python
# applications/ColossalChat/coati/distributed/comm.py:1-8
import ray
import ray.util.collective as cc
import torch
import torch.distributed.distributed_c10d as c10d
from packaging.version import Version
```
PyTorch Version-Specific API Branching (comm.py)
```python
# applications/ColossalChat/coati/distributed/comm.py:14-19
if Version(torch.__version__) >= Version("2.3.0"):
    obj_tensor, size_tensor = c10d._object_to_tensor(obj, device=device, group=None)
elif Version(torch.__version__) >= Version("1.13.0"):
    obj_tensor, size_tensor = c10d._object_to_tensor(obj, device=device)
else:
    obj_tensor, size_tensor = c10d._object_to_tensor(obj)
```
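The same pick-the-signature-by-version pattern can be reproduced without torch installed. The sketch below hand-rolls a numeric version parser instead of using `packaging.version.Version`; all helper names are illustrative.

```python
# Self-contained sketch of comm.py's version-branching pattern. The parser
# and helper names are hypothetical; comm.py uses packaging.version.Version.

def parse_version(v: str) -> tuple:
    # "2.3.0+cu121" -> (2, 3, 0); local-build suffixes are ignored for brevity
    core = v.split("+")[0]
    return tuple(int(p) for p in core.split(".") if p.isdigit())

def object_to_tensor_kwargs(torch_version: str) -> dict:
    """Return the extra kwargs the matching _object_to_tensor signature takes."""
    ver = parse_version(torch_version)
    if ver >= (2, 3, 0):
        return {"device": "cpu", "group": None}   # torch >= 2.3.0 signature
    elif ver >= (1, 13, 0):
        return {"device": "cpu"}                  # torch >= 1.13.0 signature
    return {}                                     # older torch: no extra kwargs

print(object_to_tensor_kwargs("2.4.1"))   # {'device': 'cpu', 'group': None}
print(object_to_tensor_kwargs("1.12.0"))  # {}
```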
Consumer Ray and Collective Imports (consumer.py)
```python
# applications/ColossalChat/coati/distributed/consumer.py:4-5
import ray
import ray.util.collective as cc
```
GRPOConsumer Default Learning Rate (grpo_consumer.py)
```python
# applications/ColossalChat/coati/distributed/grpo_consumer.py:76-79
self.optimizer = HybridAdam(
    self.policy_model.parameters(),
    lr=grpo_config.get("lr", 1e-6),
    weight_decay=grpo_config.get("weight_decay", 0.01),
)
```
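The `grpo_config.get(key, default)` pattern above means an empty config silently falls back to lr=1e-6 and weight_decay=0.01. A small sketch of that fallback behavior (the helper name is illustrative):

```python
# Sketch of the grpo_config.get(...) default pattern: optimizer
# hyperparameters fall back to the documented defaults when absent.
# resolve_optimizer_args is a hypothetical helper for illustration.

def resolve_optimizer_args(grpo_config: dict) -> dict:
    return {
        "lr": grpo_config.get("lr", 1e-6),
        "weight_decay": grpo_config.get("weight_decay", 0.01),
    }

print(resolve_optimizer_args({}))            # {'lr': 1e-06, 'weight_decay': 0.01}
print(resolve_optimizer_args({"lr": 5e-7}))  # {'lr': 5e-07, 'weight_decay': 0.01}
```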
Math Reward Verification (reward_fn.py)
```python
# applications/ColossalChat/coati/distributed/reward/reward_fn.py:24-25
from latex2sympy2_extended import NormalizationConfig
from math_verify import ExprExtractionConfig, LatexExtractionConfig, parse, verify
```
Common Errors
| Error | Cause | Resolution |
|---|---|---|
| `ModuleNotFoundError: No module named 'ray'` | Ray not installed | `pip install ray[default]` |
| `ImportError: vllm is not installed` | vLLM backend selected but not installed | `pip install vllm` or use `inference_backend="transformers"` |
| `ImportError: sglang is not installed` | SGLang backend selected but not installed | `pip install sglang` (note: the SGLang backend is currently disabled in BACKEND_MAP) |
| `ModuleNotFoundError: No module named 'math_verify'` | Reward verification dependency missing | `pip install math_verify` |
| `ModuleNotFoundError: No module named 'latex2sympy2_extended'` | LaTeX parsing dependency missing | `pip install latex2sympy2_extended` |
| `AttributeError` from `_object_to_tensor()` called with unexpected arguments | PyTorch version mismatch with comm.py version branching | Ensure torch>=2.1.0; the code handles torch >=2.3.0, >=1.13.0, and older versions via branching |
| NCCL initialization failure | GPU communication misconfigured | Ensure all nodes have NCCL-compatible GPUs and network connectivity; check the `NCCL_SOCKET_IFNAME` and `NCCL_IB_DISABLE` env vars |
| `ValueError: Unexpected backend {backend}` | Invalid inference backend name | Use one of: `"transformers"`, `"vllm"` (sglang is defined but commented out in BACKEND_MAP) |
| Rollout log file ... already exists | Resuming a run without cleaning up previous rollout logs | Delete the existing rollout log file or change the project name |
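For the NCCL initialization failures above, a typical first step is setting NCCL's diagnostic and network-selection environment variables before launching. The interface name below is an example and must match your cluster's actual NIC; disabling InfiniBand is only appropriate when falling back to TCP is acceptable.

```shell
# Example NCCL debugging setup before launching training (values are
# illustrative; eth0 must match your cluster's network interface).
export NCCL_DEBUG=INFO            # verbose NCCL logs to diagnose init failures
export NCCL_SOCKET_IFNAME=eth0    # pin NCCL to a specific network interface
export NCCL_IB_DISABLE=1          # force TCP transport if InfiniBand is flaky
echo "NCCL_SOCKET_IFNAME=$NCCL_SOCKET_IFNAME"
```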
Compatibility Notes
- PyTorch version branching: `comm.py` uses `packaging.version.Version` to detect the PyTorch version at runtime and selects the appropriate `c10d._object_to_tensor`/`c10d._tensor_to_object` API signatures. Three branches exist: torch >=2.3.0 (passes `group=None`), torch >=1.13.0 (passes `device` only), and older versions (no extra kwargs). The minimum required version per `requirements.txt` is torch >=2.1.0.
- vLLM and SGLang are optional: both are imported via `try/except` in `inference_backend.py` and `producer.py`. If not installed, they default to `None` and raise `ImportError` only when explicitly selected as the backend. Note that the SGLang backend is currently commented out in `BACKEND_MAP` with the comment "sglang backend will stuck the process due to unknown reason".
- Gloo backend fallback: `ray_broadcast_tensor_dict` in `comm.py` includes special handling for `bfloat16` tensors when using the Gloo backend (which does not support bfloat16), temporarily casting to `float16`.
- Algorithm variants: `ALGO_MAP` in `launch.py` maps "GRPO", "DAPO", "REINFORCE_PPB", and "RLOO" all to the same `GRPOConsumer` class, which implements algorithm-specific advantage computation internally.
- ColossalAI version: requires `colossalai>=0.4.7` for `HybridParallelPlugin` and `Booster` support.
Related Pages
- Implementation:Hpcaitech_ColossalAI_Launch_Distributed
- Implementation:Hpcaitech_ColossalAI_SimpleProducer
- Implementation:Hpcaitech_ColossalAI_GRPOConsumer
- Implementation:Hpcaitech_ColossalAI_Ray_Broadcast_Tensor_Dict
- Implementation:Hpcaitech_ColossalAI_RLVRRewardModel
- Implementation:Hpcaitech_ColossalAI_PolicyLoss