Environment:Hpcaitech ColossalAI ColossalChat Training Environment

From Leeroopedia


Knowledge Sources
Domains RLHF, LLMs, Fine_Tuning
Last Updated 2026-02-09 03:00 GMT

Overview

Python environment with ColossalAI, PyTorch, and HuggingFace Transformers for SFT, DPO, KTO, ORPO, and GRPO training of large language models.

Description

ColossalChat's training environment provides a full stack for reinforcement learning from human feedback (RLHF) and preference-based fine-tuning of large language models. At its core, ColossalAI (>=0.4.7) supplies the distributed training backend, including Booster, HybridParallelPlugin, and DistCoordinator for orchestrating data, tensor, and pipeline parallelism across multiple GPUs. PyTorch (>=2.1.0) serves as the underlying deep learning framework, while HuggingFace Transformers (>=4.39.3) provides pretrained model architectures and tokenizers.

The dependency set covers the entire training lifecycle: datasets for loading and preprocessing corpora, loralib for parameter-efficient LoRA adapters, sentencepiece and tiktoken for tokenization, flash-attn for memory-efficient attention kernels, and wandb for optional experiment tracking. For GRPO reward verification workflows, math_verify, latex2sympy2_extended, and pyext handle mathematical expression parsing and evaluation. The environment also supports optional inference backends (vllm and sglang) for generation during reinforcement learning rollouts, and langchain, fastapi, and sse_starlette for serving and streaming interfaces.
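To make the GRPO reward-verification idea concrete, here is a standard-library-only sketch of the core pattern: normalize a model's answer and a reference answer, then emit a binary reward. The actual pipeline uses math_verify and latex2sympy2_extended for real symbolic equivalence; the helper names below are illustrative, not ColossalChat APIs.

```python
import re

def normalize_answer(ans: str) -> str:
    """Canonicalize an answer string so superficially different forms
    (e.g. "\\boxed{0.50}" vs "0.5") compare equal."""
    ans = ans.strip()
    m = re.search(r"\\boxed\{([^{}]*)\}", ans)  # unwrap \boxed{...} if present
    if m:
        ans = m.group(1)
    ans = ans.replace("$", "").replace(" ", "")
    try:
        return repr(float(ans))  # canonicalize numeric answers
    except ValueError:
        return ans.lower()

def exact_match_reward(prediction: str, reference: str) -> float:
    """Binary reward: 1.0 if normalized answers agree, else 0.0."""
    return 1.0 if normalize_answer(prediction) == normalize_answer(reference) else 0.0
```

Exact-match after normalization is only a baseline; symbolic checking (what math_verify provides) also accepts algebraically equivalent forms such as "1/2" vs "0.5".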

Usage

This environment is required for all ColossalChat supervised fine-tuning (SFT), direct preference optimization (DPO), Kahneman-Tversky optimization (KTO), odds-ratio preference optimization (ORPO), and group relative policy optimization (GRPO) training workflows. It is also needed for dataset preparation pipelines (both SFT and preference formats), checkpoint saving/loading, and model instantiation via AutoModelForCausalLM. Any script under applications/ColossalChat/ that imports from coati.trainer or coati.distributed depends on this environment.

System Requirements

Category | Requirement | Notes
OS | Linux | Windows not supported (use WSL)
Hardware | NVIDIA GPU | NCCL backend required for distributed training
Hardware | Multiple GPUs recommended | For distributed SFT/DPO with data/tensor parallelism
Python | Python >= 3.8 | Required by ColossalAI and PyTorch >= 2.1.0
CUDA | CUDA >= 11.8 | Must match PyTorch build; CUDA 12.x recommended for Ampere/Hopper GPUs
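The basic requirements in the table above can be checked before launching a job. A minimal preflight sketch using only the standard library (the `preflight` helper name is illustrative, not part of ColossalChat):

```python
import platform
import shutil
import sys

def preflight() -> dict:
    """Check the basic system requirements from the table above.
    Returns a dict of named boolean checks rather than raising, so the
    caller can decide which failures are fatal for its workflow."""
    return {
        "linux": platform.system() == "Linux",
        "python_ge_3_8": sys.version_info >= (3, 8),
        # nvidia-smi on PATH is a cheap proxy for a visible NVIDIA driver
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }
```

CUDA/PyTorch build matching still has to be verified separately (e.g. via `torch.version.cuda`), since it depends on the installed wheel.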

Dependencies

System Packages

  • cuda-toolkit (matching PyTorch CUDA version)
  • ninja = 1.11.1

Python Packages

Package | Version Constraint | Purpose
colossalai | >= 0.4.7 | Distributed training framework (Booster, HybridParallelPlugin, DistCoordinator)
torch | >= 2.1.0 | Deep learning framework
transformers | >= 4.39.3 | Pretrained model architectures and tokenizers (>= 4.39.1 for Qwen2, >= 4.51.0 for Qwen3)
datasets | == 2.14.7 | HuggingFace dataset loading and processing
sentencepiece | == 0.1.99 | SentencePiece tokenizer for LLaMA-family models
tiktoken | (latest) | BPE tokenizer used by GPT/Qwen-family models
loralib | (latest) | Low-Rank Adaptation (LoRA) parameter-efficient fine-tuning
tokenizers | (latest) | HuggingFace fast tokenizer backend
langchain | (latest) | LLM application framework utilities
fastapi | (latest) | REST API serving framework
sse_starlette | (latest) | Server-Sent Events for streaming responses
wandb | (latest) | Weights & Biases experiment tracking (optional)
flash-attn | (latest) | Flash Attention memory-efficient attention kernels (optional, requires CUDA)
jsonlines | (latest) | JSONL file reading/writing for training data
math_verify | (latest) | Mathematical answer verification for GRPO reward computation
latex2sympy2_extended | (latest) | LaTeX-to-SymPy conversion for GRPO math reward parsing
pyext | (latest) | Python extension utilities for GRPO reward verification

Optional Inference Backends

Package | Purpose
vllm | High-throughput inference backend for RL rollout generation
sglang | Structured generation language inference backend for RL rollouts

These are imported conditionally in coati/distributed/inference_backend.py and are not required for standard SFT/DPO training.

Credentials

  • WANDB_API_KEY: Weights & Biases API key for logging (optional)
  • HF_TOKEN: HuggingFace token for gated model access (optional)
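Since both credentials are optional, a script can report which ones are absent without aborting. A small sketch (the helper name is illustrative):

```python
import os

def check_optional_credentials(env=os.environ) -> list:
    """Return the names of optional credentials that are not set.
    A missing entry disables a feature (wandb logging, gated-model
    download) but does not block training."""
    optional = {
        "WANDB_API_KEY": "Weights & Biases logging",
        "HF_TOKEN": "gated HuggingFace model access",
    }
    return [name for name in optional if not env.get(name)]
```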

Quick Install

pip install "colossalai>=0.4.7" "torch>=2.1.0" "transformers>=4.39.3" "datasets==2.14.7" flash-attn "sentencepiece==0.1.99" wandb tiktoken jsonlines loralib

For GRPO reward verification support:

pip install math_verify latex2sympy2_extended pyext

For optional inference backends (RL rollout generation):

pip install vllm sglang
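After installation, the pins can be verified programmatically with the standard library's importlib.metadata (Python >= 3.8). A sketch:

```python
from importlib import metadata

def installed_version(package: str):
    """Return the installed version string for a distribution,
    or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def report(packages):
    """Map each distribution name to its installed version (or None)."""
    return {pkg: installed_version(pkg) for pkg in packages}
```

For example, `report(["colossalai", "torch", "transformers"])` quickly shows which required packages are missing before a training launch fails mid-import.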

Code Evidence

Qwen2 Transformers Version Check

From colossalai/shardformer/policies/qwen2.py (lines 43-45):

import transformers
assert transformers.__version__ >= "4.39.1", \
    "Qwen2 model requires transformers>=4.39.1"

Qwen3 Transformers Version Check

From colossalai/shardformer/policies/qwen3.py (lines 43-45):

import transformers
assert transformers.__version__ >= "4.51.0", \
    "Qwen3 model requires transformers>=4.51.0"
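Note that the asserts above compare version strings lexicographically, which misorders some versions (e.g. "4.9.0" sorts above "4.39.1" character by character). A tuple-based comparison avoids this; the sketch below uses only the standard library, though the packaging library's Version class is the usual robust tool.

```python
import re

def version_tuple(v: str):
    """Parse the leading numeric components of a version string into a tuple,
    ignoring local/pre-release suffixes (e.g. "4.51.0.dev0" -> (4, 51, 0))."""
    nums = re.findall(r"\d+", v.split("+")[0])
    return tuple(int(n) for n in nums[:3])

# Lexicographic string comparison gets this wrong:
assert ("4.9.0" >= "4.39.1") is True   # wrong: '9' > '3' as characters
# Tuple comparison gets it right:
assert (version_tuple("4.9.0") >= version_tuple("4.39.1")) is False
```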

Optional vllm/sglang Imports

From applications/ColossalChat/coati/distributed/inference_backend.py (lines 11-19):

try:
    import sglang
except ImportError:
    sglang = None

try:
    import vllm
except ImportError:
    vllm = None
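Given those conditional imports, rollout code must cope with whichever backend happens to be installed. A hedged sketch of the selection logic (the actual dispatch in inference_backend.py may differ; `select_backend` is an illustrative name):

```python
def select_backend(preferred: str, vllm=None, sglang=None) -> str:
    """Pick an available inference backend by name.
    `vllm`/`sglang` are the (possibly None) results of the try/except
    imports above; falls back to the other backend before failing."""
    available = {name: mod for name, mod in (("vllm", vllm), ("sglang", sglang))
                 if mod is not None}
    if not available:
        raise ImportError("install vllm or sglang for RL rollout generation")
    if preferred in available:
        return preferred
    return next(iter(available))  # fall back to whichever is installed
```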

SFT Trainer Core Imports

From applications/ColossalChat/coati/trainer/sft.py:

from colossalai.booster import Booster
from colossalai.shardformer import HybridParallelPlugin
from colossalai.cluster import DistCoordinator

Common Errors

Error | Cause | Fix
ImportError: No module named 'colossalai' | ColossalAI not installed | pip install "colossalai>=0.4.7"
AssertionError: Qwen2 model requires transformers>=4.39.1 | Transformers version too old for Qwen2 sharding policy | pip install "transformers>=4.39.3"
AssertionError: Qwen3 model requires transformers>=4.51.0 | Transformers version too old for Qwen3 sharding policy | pip install "transformers>=4.51.0"
RuntimeError: FlashAttention only supports Ampere GPUs or newer | GPU compute capability < 8.0 | Remove flash-attn or use a GPU with compute capability >= 8.0 (A100, H100, etc.)
ImportError: No module named 'flash_attn' | flash-attn not installed (optional dependency) | pip install flash-attn --no-build-isolation (requires CUDA toolkit and ninja)
RuntimeError: NCCL error | Distributed backend failure | Verify NCCL installation, ensure GPUs are visible via nvidia-smi, check MASTER_ADDR / MASTER_PORT env vars
datasets.builder.DatasetGenerationError | Pinned datasets==2.14.7 incompatible with newer HuggingFace Hub API | Ensure the huggingface_hub version is compatible with datasets 2.14.7
ImportError: No module named 'vllm' | vllm not installed for RL rollout inference backend | Install with pip install vllm, or switch to the sglang backend; not required for SFT/DPO

Compatibility Notes

  • flash-attn requires a CUDA-capable GPU with compute capability >= 8.0 (Ampere architecture or newer: A100, A10G, H100, etc.). It also requires the CUDA toolkit and ninja build system to compile from source.
  • Qwen2 models require transformers>=4.39.1. The ColossalChat requirements already specify >=4.39.3, which satisfies this constraint.
  • Qwen3 models require transformers>=4.51.0. If training Qwen3 architectures, the transformers version must be upgraded beyond the minimum >=4.39.3 specified in requirements.txt.
  • datasets==2.14.7 is pinned to a specific version. Upgrading may break data loading pipelines; downgrading huggingface_hub may be needed if API incompatibilities arise.
  • sentencepiece==0.1.99 is pinned. Other versions may produce different tokenization results affecting training reproducibility.
  • vllm and sglang are conditionally imported and only required for inference-backed RL training (e.g., GRPO with online rollouts). Standard SFT and DPO training does not require either package.
  • PyTorch >= 2.1.0 is required; PyTorch 2.x introduces torch.compile support and improved FSDP which ColossalAI leverages internally.
