Environment:Hpcaitech ColossalAI ColossalChat Training Environment
| Knowledge Sources | |
|---|---|
| Domains | RLHF, LLMs, Fine_Tuning |
| Last Updated | 2026-02-09 03:00 GMT |
Overview
Python environment with ColossalAI, PyTorch, and HuggingFace Transformers for SFT, DPO, KTO, ORPO, and GRPO training of large language models.
Description
ColossalChat's training environment provides a full stack for reinforcement learning from human feedback (RLHF) and preference-based fine-tuning of large language models. At its core, ColossalAI (>=0.4.7) supplies the distributed training backend, including Booster, HybridParallelPlugin, and DistCoordinator for orchestrating data, tensor, and pipeline parallelism across multiple GPUs. PyTorch (>=2.1.0) serves as the underlying deep learning framework, while HuggingFace Transformers (>=4.39.3) provides pretrained model architectures and tokenizers.
The dependency set covers the entire training lifecycle: datasets for loading and preprocessing corpora, loralib for parameter-efficient LoRA adapters, sentencepiece and tiktoken for tokenization, flash-attn for memory-efficient attention kernels, and wandb for optional experiment tracking. For GRPO reward verification workflows, math_verify, latex2sympy2_extended, and pyext handle mathematical expression parsing and evaluation. The environment also supports optional inference backends (vllm and sglang) for generation during reinforcement learning rollouts, and langchain, fastapi, and sse_starlette for serving and streaming interfaces.
Usage
This environment is required for all ColossalChat supervised fine-tuning (SFT), direct preference optimization (DPO), Kahneman-Tversky optimization (KTO), odds-ratio preference optimization (ORPO), and group relative policy optimization (GRPO) training workflows. It is also needed for dataset preparation pipelines (both SFT and preference formats), checkpoint saving/loading, and model instantiation via AutoModelForCausalLM. Any script under applications/ColossalChat/ that imports from coati.trainer or coati.distributed depends on this environment.
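The dataset preparation pipelines consume JSONL files, one JSON object per line. As a minimal stdlib-only sketch of that round trip (the field names here are illustrative assumptions, not the canonical coati dataset schema):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical SFT record -- these field names are illustrative only,
# not the canonical coati dataset schema.
sample = {
    "messages": [
        {"role": "user", "content": "What is RLHF?"},
        {"role": "assistant", "content": "Reinforcement learning from human feedback."},
    ]
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sft_sample.jsonl"
    # One JSON object per line -- the JSONL convention used by jsonlines.
    path.write_text(json.dumps(sample) + "\n", encoding="utf-8")
    records = [json.loads(line) for line in path.read_text(encoding="utf-8").splitlines()]

print(len(records))  # 1
```

In the real pipelines, the jsonlines package listed below handles this reading and writing.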
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux | Windows not supported (use WSL) |
| Hardware | NVIDIA GPU | NCCL backend required for distributed training |
| Hardware | Multiple GPUs recommended | For distributed SFT/DPO with data/tensor parallelism |
| Python | Python >= 3.8 | Required by ColossalAI and PyTorch >= 2.1.0 |
| CUDA | CUDA >= 11.8 | Must match PyTorch build; CUDA 12.x recommended for Ampere/Hopper GPUs |
Dependencies
System Packages
- cuda-toolkit (matching the PyTorch CUDA version)
- ninja = 1.11.1
Python Packages
| Package | Version Constraint | Purpose |
|---|---|---|
| colossalai | >= 0.4.7 | Distributed training framework (Booster, HybridParallelPlugin, DistCoordinator) |
| torch | >= 2.1.0 | Deep learning framework |
| transformers | >= 4.39.3 | Pretrained model architectures and tokenizers (>= 4.39.1 for Qwen2, >= 4.51.0 for Qwen3) |
| datasets | == 2.14.7 | HuggingFace dataset loading and processing |
| sentencepiece | == 0.1.99 | SentencePiece tokenizer for LLaMA-family models |
| tiktoken | (latest) | BPE tokenizer used by GPT/Qwen-family models |
| loralib | (latest) | Low-Rank Adaptation (LoRA) parameter-efficient fine-tuning |
| tokenizers | (latest) | HuggingFace fast tokenizer backend |
| langchain | (latest) | LLM application framework utilities |
| fastapi | (latest) | REST API serving framework |
| sse_starlette | (latest) | Server-Sent Events for streaming responses |
| wandb | (latest) | Weights & Biases experiment tracking (optional) |
| flash-attn | (latest) | Flash Attention memory-efficient attention kernels (optional, requires CUDA) |
| jsonlines | (latest) | JSONL file reading/writing for training data |
| math_verify | (latest) | Mathematical answer verification for GRPO reward computation |
| latex2sympy2_extended | (latest) | LaTeX-to-SymPy conversion for GRPO math reward parsing |
| pyext | (latest) | Python extension utilities for GRPO reward verification |
Optional Inference Backends
| Package | Purpose |
|---|---|
| vllm | High-throughput inference backend for RL rollout generation |
| sglang | Structured generation language inference backend for RL rollouts |
These are imported conditionally in coati/distributed/inference_backend.py and are not required for standard SFT/DPO training.
Credentials
- WANDB_API_KEY: Weights & Biases API key for logging (optional)
- HF_TOKEN: HuggingFace token for gated model access (optional)
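Both credentials are read from the process environment and both are optional. A small sketch of checking them before launching a run (`optional_credential` is a hypothetical helper, not a coati function):

```python
import os
from typing import Optional

def optional_credential(name: str) -> Optional[str]:
    """Return the credential value if the variable is set and non-empty, else None."""
    value = os.environ.get(name, "").strip()
    return value or None

# Neither key is required: wandb logging and gated-model downloads
# are simply unavailable when the variables are absent.
for key in ("WANDB_API_KEY", "HF_TOKEN"):
    status = "set" if optional_credential(key) else "not set (optional)"
    print(f"{key}: {status}")
```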
Quick Install
pip install "colossalai>=0.4.7" "torch>=2.1.0" "transformers>=4.39.3" "datasets==2.14.7" flash-attn "sentencepiece==0.1.99" wandb tiktoken jsonlines loralib
For GRPO reward verification support:
pip install math_verify latex2sympy2_extended pyext
For optional inference backends (RL rollout generation):
pip install vllm sglang
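After installing, a quick stdlib-only check confirms which packages are importable without touching CUDA or the network (the required/optional split below mirrors the tables above):

```python
import importlib.util

def installed(module_name: str) -> bool:
    """True if the top-level module can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None

required = ["colossalai", "torch", "transformers", "datasets"]
optional = ["flash_attn", "wandb", "vllm", "sglang"]

for name in required + optional:
    tag = "required" if name in required else "optional"
    print(f"{name:12s} {tag:8s} {'OK' if installed(name) else 'MISSING'}")
```

Note that the import name can differ from the pip name (flash-attn installs as `flash_attn`).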
Code Evidence
Qwen2 Transformers Version Check
From colossalai/shardformer/policies/qwen2.py (lines 43-45):
import transformers
assert transformers.__version__ >= "4.39.1", \
"Qwen2 model requires transformers>=4.39.1"
Qwen3 Transformers Version Check
From colossalai/shardformer/policies/qwen3.py (lines 43-45):
import transformers
assert transformers.__version__ >= "4.51.0", \
"Qwen3 model requires transformers>=4.51.0"
Optional vllm/sglang Imports
From applications/ColossalChat/coati/distributed/inference_backend.py (lines 11-19):
try:
import sglang
except ImportError:
sglang = None
try:
import vllm
except ImportError:
vllm = None
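The same module-or-None guard lets calling code pick whichever backend is actually importable. A hypothetical selector (`select_backend` is not a coati function, just an illustration of how these guards can be consumed):

```python
from typing import Optional

# Same conditional-import pattern as inference_backend.py:
try:
    import vllm
except ImportError:
    vllm = None
try:
    import sglang
except ImportError:
    sglang = None

def select_backend(preferred: str = "vllm") -> Optional[str]:
    """Return the name of an importable backend, preferring `preferred`.

    Hypothetical helper: returns the preferred backend when its import
    succeeded, otherwise any other available one, otherwise None.
    """
    available = {name: mod for name, mod in (("vllm", vllm), ("sglang", sglang))
                 if mod is not None}
    if preferred in available:
        return preferred
    return next(iter(available), None)

print(select_backend())  # backend name, or None if neither is installed
```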
SFT Trainer Core Imports
From applications/ColossalChat/coati/trainer/sft.py:
from colossalai.booster import Booster
from colossalai.shardformer import HybridParallelPlugin
from colossalai.cluster import DistCoordinator
Common Errors
| Error | Cause | Fix |
|---|---|---|
| `ImportError: No module named 'colossalai'` | ColossalAI not installed | `pip install "colossalai>=0.4.7"` |
| `AssertionError: Qwen2 model requires transformers>=4.39.1` | Transformers version too old for the Qwen2 sharding policy | `pip install "transformers>=4.39.3"` |
| `AssertionError: Qwen3 model requires transformers>=4.51.0` | Transformers version too old for the Qwen3 sharding policy | `pip install "transformers>=4.51.0"` |
| `RuntimeError: FlashAttention only supports Ampere GPUs or newer` | GPU compute capability < 8.0 | Remove flash-attn or use a GPU with compute capability >= 8.0 (A100, H100, etc.) |
| `ImportError: No module named 'flash_attn'` | flash-attn not installed (optional dependency) | `pip install flash-attn --no-build-isolation` (requires the CUDA toolkit and ninja) |
| `RuntimeError: NCCL error` | Distributed backend failure | Verify the NCCL installation, confirm GPUs are visible via `nvidia-smi`, and check the `MASTER_ADDR`/`MASTER_PORT` environment variables |
| `datasets.builder.DatasetGenerationError` | Pinned datasets==2.14.7 incompatible with a newer HuggingFace Hub API | Pin huggingface_hub to a version compatible with datasets 2.14.7 |
| `ImportError: No module named 'vllm'` | vllm not installed for the RL rollout inference backend | `pip install vllm` or switch to the sglang backend; not required for SFT/DPO |
Compatibility Notes
- flash-attn requires a CUDA-capable GPU with compute capability >= 8.0 (Ampere architecture or newer: A100, A10G, H100, etc.). It also requires the CUDA toolkit and the ninja build system to compile from source.
- Qwen2 models require transformers >= 4.39.1. The ColossalChat requirements already specify >= 4.39.3, which satisfies this constraint.
- Qwen3 models require transformers >= 4.51.0. When training Qwen3 architectures, transformers must be upgraded beyond the >= 4.39.3 minimum specified in requirements.txt.
- datasets == 2.14.7 is pinned to a specific version. Upgrading may break data loading pipelines; downgrading huggingface_hub may be needed if API incompatibilities arise.
- sentencepiece == 0.1.99 is pinned. Other versions may produce different tokenization results, affecting training reproducibility.
- vllm and sglang are conditionally imported and only required for inference-backed RL training (e.g., GRPO with online rollouts). Standard SFT and DPO training does not require either package.
- PyTorch >= 2.1.0 is required; PyTorch 2.x introduces torch.compile support and improved FSDP, which ColossalAI leverages internally.
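The pins and minimums above can be captured in a single requirements fragment (versions taken from the tables in this page; treat this as a summary, not the repository's canonical requirements.txt):

```
colossalai>=0.4.7
torch>=2.1.0
transformers>=4.39.3
datasets==2.14.7
sentencepiece==0.1.99
tiktoken
loralib
jsonlines
wandb
```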
Related Pages
- Implementation:Hpcaitech_ColossalAI_SFTTrainer
- Implementation:Hpcaitech_ColossalAI_DPOTrainer
- Implementation:Hpcaitech_ColossalAI_Prepare_Dataset_SFT
- Implementation:Hpcaitech_ColossalAI_Prepare_Dataset_Preference
- Implementation:Hpcaitech_ColossalAI_DataCollatorForPreferenceDataset
- Implementation:Hpcaitech_ColossalAI_Save_Checkpoint_SFT
- Implementation:Hpcaitech_ColossalAI_AutoModelForCausalLM_SFT