Environment:Hpcaitech ColossalAI ColossalChat Training Environment
| Knowledge Sources | |
|---|---|
| Domains | RLHF, LLMs, Fine_Tuning |
| Last Updated | 2026-02-09 03:00 GMT |
Overview
Python environment with ColossalAI, PyTorch, and HuggingFace Transformers for SFT, DPO, KTO, ORPO, and GRPO training of large language models.
Description
ColossalChat's training environment provides a full stack for reinforcement learning from human feedback (RLHF) and preference-based fine-tuning of large language models. At its core, ColossalAI (>=0.4.7) supplies the distributed training backend, including Booster, HybridParallelPlugin, and DistCoordinator for orchestrating data, tensor, and pipeline parallelism across multiple GPUs. PyTorch (>=2.1.0) serves as the underlying deep learning framework, while HuggingFace Transformers (>=4.39.3) provides pretrained model architectures and tokenizers.
The dependency set covers the entire training lifecycle: datasets for loading and preprocessing corpora, loralib for parameter-efficient LoRA adapters, sentencepiece and tiktoken for tokenization, flash-attn for memory-efficient attention kernels, and wandb for optional experiment tracking. For GRPO reward verification workflows, math_verify, latex2sympy2_extended, and pyext handle mathematical expression parsing and evaluation. The environment also supports optional inference backends (vllm and sglang) for generation during reinforcement learning rollouts, and langchain, fastapi, and sse_starlette for serving and streaming interfaces.
Usage
This environment is required for all ColossalChat supervised fine-tuning (SFT), direct preference optimization (DPO), Kahneman-Tversky optimization (KTO), odds-ratio preference optimization (ORPO), and group relative policy optimization (GRPO) training workflows. It is also needed for dataset preparation pipelines (both SFT and preference formats), checkpoint saving/loading, and model instantiation via AutoModelForCausalLM. Any script under applications/ColossalChat/ that imports from coati.trainer or coati.distributed depends on this environment.
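The dataset preparation pipelines consume JSONL files, one JSON object per line. As a minimal stdlib-only sketch of that round trip (the field names here are illustrative assumptions, not the canonical coati dataset schema):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical SFT record -- these field names are illustrative only,
# not the canonical coati dataset schema.
sample = {
    "messages": [
        {"role": "user", "content": "What is RLHF?"},
        {"role": "assistant", "content": "Reinforcement learning from human feedback."},
    ]
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sft_sample.jsonl"
    # One JSON object per line -- the JSONL convention used by jsonlines.
    path.write_text(json.dumps(sample) + "\n", encoding="utf-8")
    records = [json.loads(line) for line in path.read_text(encoding="utf-8").splitlines()]

print(len(records))  # 1
```

In the real pipelines, the jsonlines package listed below handles this reading and writing.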
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux | Windows not supported (use WSL) |
| Hardware | NVIDIA GPU | NCCL backend required for distributed training |
| Hardware | Multiple GPUs recommended | For distributed SFT/DPO with data/tensor parallelism |
| Python | Python >= 3.8 | Required by ColossalAI and PyTorch >= 2.1.0 |
| CUDA | CUDA >= 11.8 | Must match PyTorch build; CUDA 12.x recommended for Ampere/Hopper GPUs |
Dependencies
System Packages
- cuda-toolkit (matching the PyTorch CUDA version)
- ninja = 1.11.1
Python Packages
| Package | Version Constraint | Purpose |
|---|---|---|
| colossalai | >= 0.4.7 | Distributed training framework (Booster, HybridParallelPlugin, DistCoordinator) |
| torch | >= 2.1.0 | Deep learning framework |
| transformers | >= 4.39.3 | Pretrained model architectures and tokenizers (>= 4.39.1 for Qwen2, >= 4.51.0 for Qwen3) |
| datasets | == 2.14.7 | HuggingFace dataset loading and processing |
| sentencepiece | == 0.1.99 | SentencePiece tokenizer for LLaMA-family models |
| tiktoken | (latest) | BPE tokenizer used by GPT/Qwen-family models |
| loralib | (latest) | Low-Rank Adaptation (LoRA) parameter-efficient fine-tuning |
| tokenizers | (latest) | HuggingFace fast tokenizer backend |
| langchain | (latest) | LLM application framework utilities |
| fastapi | (latest) | REST API serving framework |
| sse_starlette | (latest) | Server-Sent Events for streaming responses |
| wandb | (latest) | Weights & Biases experiment tracking (optional) |
| flash-attn | (latest) | Flash Attention memory-efficient attention kernels (optional, requires CUDA) |
| jsonlines | (latest) | JSONL file reading/writing for training data |
| math_verify | (latest) | Mathematical answer verification for GRPO reward computation |
| latex2sympy2_extended | (latest) | LaTeX-to-SymPy conversion for GRPO math reward parsing |
| pyext | (latest) | Python extension utilities for GRPO reward verification |
Optional Inference Backends
| Package | Purpose |
|---|---|
| vllm | High-throughput inference backend for RL rollout generation |
| sglang | Structured generation language inference backend for RL rollouts |
These are imported conditionally in coati/distributed/inference_backend.py and are not required for standard SFT/DPO training.
Credentials
- WANDB_API_KEY: Weights & Biases API key for logging (optional)
- HF_TOKEN: HuggingFace token for gated model access (optional)
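Both credentials are read from the process environment and both are optional. A small sketch of checking them before launching a run (`optional_credential` is a hypothetical helper, not a coati function):

```python
import os
from typing import Optional

def optional_credential(name: str) -> Optional[str]:
    """Return the credential value if the variable is set and non-empty, else None."""
    value = os.environ.get(name, "").strip()
    return value or None

# Neither key is required: wandb logging and gated-model downloads
# are simply unavailable when the variables are absent.
for key in ("WANDB_API_KEY", "HF_TOKEN"):
    status = "set" if optional_credential(key) else "not set (optional)"
    print(f"{key}: {status}")
```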
Quick Install
pip install "colossalai>=0.4.7" "torch>=2.1.0" "transformers>=4.39.3" "datasets==2.14.7" flash-attn "sentencepiece==0.1.99" wandb tiktoken jsonlines loralib
For GRPO reward verification support:
pip install math_verify latex2sympy2_extended pyext
For optional inference backends (RL rollout generation):
pip install vllm sglang
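After installing, a quick stdlib-only check confirms which packages are importable without touching CUDA or the network (the required/optional split below mirrors the tables above):

```python
import importlib.util

def installed(module_name: str) -> bool:
    """True if the top-level module can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None

required = ["colossalai", "torch", "transformers", "datasets"]
optional = ["flash_attn", "wandb", "vllm", "sglang"]

for name in required + optional:
    tag = "required" if name in required else "optional"
    print(f"{name:12s} {tag:8s} {'OK' if installed(name) else 'MISSING'}")
```

Note that the import name can differ from the pip name (flash-attn installs as `flash_attn`).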
Code Evidence
Qwen2 Transformers Version Check
From colossalai/shardformer/policies/qwen2.py (lines 43-45):
import transformers
assert transformers.__version__ >= "4.39.1", \
"Qwen2 model requires transformers>=4.39.1"
Qwen3 Transformers Version Check
From colossalai/shardformer/policies/qwen3.py (lines 43-45):
import transformers
assert transformers.__version__ >= "4.51.0", \
"Qwen3 model requires transformers>=4.51.0"
Optional vllm/sglang Imports
From applications/ColossalChat/coati/distributed/inference_backend.py (lines 11-19):
try:
import sglang
except ImportError:
sglang = None
try:
import vllm
except ImportError:
vllm = None
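The same module-or-None guard lets calling code pick whichever backend is actually importable. A hypothetical selector (`select_backend` is not a coati function, just an illustration of how these guards can be consumed):

```python
from typing import Optional

# Same conditional-import pattern as inference_backend.py:
try:
    import vllm
except ImportError:
    vllm = None
try:
    import sglang
except ImportError:
    sglang = None

def select_backend(preferred: str = "vllm") -> Optional[str]:
    """Return the name of an importable backend, preferring `preferred`.

    Hypothetical helper: returns the preferred backend when its import
    succeeded, otherwise any other available one, otherwise None.
    """
    available = {name: mod for name, mod in (("vllm", vllm), ("sglang", sglang))
                 if mod is not None}
    if preferred in available:
        return preferred
    return next(iter(available), None)

print(select_backend())  # backend name, or None if neither is installed
```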
SFT Trainer Core Imports
From applications/ColossalChat/coati/trainer/sft.py:
from colossalai.booster import Booster
from colossalai.shardformer import HybridParallelPlugin
from colossalai.cluster import DistCoordinator
Common Errors
| Error | Cause | Fix |
|---|---|---|
| `ImportError: No module named 'colossalai'` | ColossalAI not installed | `pip install "colossalai>=0.4.7"` |
| `AssertionError: Qwen2 model requires transformers>=4.39.1` | Transformers version too old for the Qwen2 sharding policy | `pip install "transformers>=4.39.3"` |
| `AssertionError: Qwen3 model requires transformers>=4.51.0` | Transformers version too old for the Qwen3 sharding policy | `pip install "transformers>=4.51.0"` |
| `RuntimeError: FlashAttention only supports Ampere GPUs or newer` | GPU compute capability < 8.0 | Remove flash-attn or use a GPU with compute capability >= 8.0 (A100, H100, etc.) |
| `ImportError: No module named 'flash_attn'` | flash-attn not installed (optional dependency) | `pip install flash-attn --no-build-isolation` (requires the CUDA toolkit and ninja) |
| `RuntimeError: NCCL error` | Distributed backend failure | Verify the NCCL installation, confirm GPUs are visible via `nvidia-smi`, and check the `MASTER_ADDR`/`MASTER_PORT` environment variables |
| `datasets.builder.DatasetGenerationError` | Pinned datasets==2.14.7 incompatible with a newer HuggingFace Hub API | Pin huggingface_hub to a version compatible with datasets 2.14.7 |
| `ImportError: No module named 'vllm'` | vllm not installed for the RL rollout inference backend | `pip install vllm` or switch to the sglang backend; not required for SFT/DPO |
Compatibility Notes
- flash-attn requires a CUDA-capable GPU with compute capability >= 8.0 (Ampere architecture or newer: A100, A10G, H100, etc.). It also requires the CUDA toolkit and the ninja build system to compile from source.
- Qwen2 models require transformers >= 4.39.1. The ColossalChat requirements already specify >= 4.39.3, which satisfies this constraint.
- Qwen3 models require transformers >= 4.51.0. When training Qwen3 architectures, transformers must be upgraded beyond the >= 4.39.3 minimum specified in requirements.txt.
- datasets == 2.14.7 is pinned to a specific version. Upgrading may break data loading pipelines; downgrading huggingface_hub may be needed if API incompatibilities arise.
- sentencepiece == 0.1.99 is pinned. Other versions may produce different tokenization results, affecting training reproducibility.
- vllm and sglang are conditionally imported and only required for inference-backed RL training (e.g., GRPO with online rollouts). Standard SFT and DPO training does not require either package.
- PyTorch >= 2.1.0 is required; PyTorch 2.x introduces torch.compile support and improved FSDP, which ColossalAI leverages internally.
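The pins and minimums above can be captured in a single requirements fragment (versions taken from the tables in this page; treat this as a summary, not the repository's canonical requirements.txt):

```
colossalai>=0.4.7
torch>=2.1.0
transformers>=4.39.3
datasets==2.14.7
sentencepiece==0.1.99
tiktoken
loralib
jsonlines
wandb
```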
Related Pages
- Implementation:Hpcaitech_ColossalAI_SFTTrainer
- Implementation:Hpcaitech_ColossalAI_DPOTrainer
- Implementation:Hpcaitech_ColossalAI_Prepare_Dataset_SFT
- Implementation:Hpcaitech_ColossalAI_Prepare_Dataset_Preference
- Implementation:Hpcaitech_ColossalAI_DataCollatorForPreferenceDataset
- Implementation:Hpcaitech_ColossalAI_Save_Checkpoint_SFT
- Implementation:Hpcaitech_ColossalAI_AutoModelForCausalLM_SFT