
Principle:Hpcaitech ColossalAI Booster Plugin Configuration

From Leeroopedia


Knowledge Sources
Domains Distributed_Computing, Optimization
Last Updated 2026-02-09 00:00 GMT

Overview

A distributed training orchestration pattern that wraps model, optimizer, dataloader, and scheduler with parallelism strategies through a plugin-based abstraction.

Description

The Booster-Plugin pattern is ColossalAI's core abstraction for distributed training. The Booster acts as a unified interface that applies a selected Plugin (parallelism strategy) to transparently handle model sharding, gradient synchronization, memory optimization, and data distribution. This decouples the training logic from the distributed infrastructure.

Available plugins include:

  • TorchDDPPlugin: Standard data parallelism
  • LowLevelZeroPlugin: ZeRO stages 1/2 for optimizer state and gradient partitioning
  • GeminiPlugin: Heterogeneous memory management (CPU+GPU)
  • HybridParallelPlugin: Combined tensor/pipeline/sequence/data parallelism (3D parallelism)
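The sketch below shows how the plugins listed above are instantiated and handed to a Booster. It is a minimal illustration, assuming a recent ColossalAI release; keyword arguments such as stage, placement_policy, tp_size, and pp_size exist in current versions but their defaults and accepted values vary between releases.

```python
from colossalai.booster import Booster
from colossalai.booster.plugin import (
    TorchDDPPlugin,
    LowLevelZeroPlugin,
    GeminiPlugin,
    HybridParallelPlugin,
)

# Standard data parallelism: the model is replicated on every GPU.
ddp_plugin = TorchDDPPlugin()

# ZeRO stage 2: partition optimizer states and gradients across ranks.
zero_plugin = LowLevelZeroPlugin(stage=2)

# Heterogeneous (CPU+GPU) memory management for parameters.
gemini_plugin = GeminiPlugin(placement_policy="auto")

# 3D parallelism: 2-way tensor parallelism x 2-way pipeline parallelism;
# remaining ranks are used for data parallelism.
hybrid_plugin = HybridParallelPlugin(tp_size=2, pp_size=2)

# The Booster is constructed once with the chosen plugin.
booster = Booster(plugin=zero_plugin)
```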

Usage

Use this principle whenever training a model with ColossalAI. The plugin choice depends on model size, number of GPUs, and memory constraints. For models that fit on a single GPU, use DDP. For large models requiring memory optimization, use ZeRO or Gemini. For very large models requiring model parallelism, use HybridParallel.
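One way to encode this selection rule is a small helper like the following. The thresholds are illustrative placeholders, not recommendations from ColossalAI; only the mapping from regime to plugin follows the guidance above.

```python
from colossalai.booster.plugin import (
    TorchDDPPlugin,
    LowLevelZeroPlugin,
    GeminiPlugin,
    HybridParallelPlugin,
)

def choose_plugin(param_count: int, fits_on_one_gpu: bool):
    """Pick a parallelism plugin from model size and memory headroom.

    The numeric cutoffs below are hypothetical and should be tuned to the
    actual hardware (GPU memory, interconnect, number of ranks).
    """
    if fits_on_one_gpu:
        return TorchDDPPlugin()                        # plain data parallelism
    if param_count < 10_000_000_000:
        return LowLevelZeroPlugin(stage=2)             # shard optimizer states + grads
    if param_count < 30_000_000_000:
        return GeminiPlugin(placement_policy="auto")   # CPU+GPU heterogeneous memory
    return HybridParallelPlugin(tp_size=2, pp_size=2)  # tensor + pipeline parallelism
```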

Theoretical Basis

The Booster-Plugin pattern implements a strategy design pattern:

  1. Plugin Selection: Choose parallelism strategy based on hardware and model size
  2. Model Wrapping: The plugin wraps the model for distributed execution (e.g., sharding layers across GPUs for tensor parallelism)
  3. Optimizer Wrapping: The optimizer is wrapped to handle partitioned gradients and optimizer states
  4. DataLoader Wrapping: The dataloader is wrapped with distributed samplers
  5. Unified Interface: All training operations (backward, step, save) go through the Booster
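A minimal end-to-end sketch of steps 1 through 5 is shown below. MyModel and train_dataset are hypothetical user-defined objects, and API details (for example the arguments of colossalai.launch_from_torch) differ between ColossalAI versions.

```python
import colossalai
import torch
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin

colossalai.launch_from_torch()                    # initialize the distributed backend

plugin = LowLevelZeroPlugin(stage=2)              # step 1: plugin selection
booster = Booster(plugin=plugin)

model = MyModel()                                 # hypothetical user-defined model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()
dataloader = plugin.prepare_dataloader(train_dataset, batch_size=8, shuffle=True)

# Steps 2-4: wrap model, optimizer, criterion, and dataloader (optionally a
# lr_scheduler) so they cooperate with the chosen parallelism strategy.
model, optimizer, criterion, dataloader, _ = booster.boost(
    model, optimizer, criterion, dataloader
)

# Step 5: all training operations go through the Booster-wrapped objects.
for inputs, labels in dataloader:
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    booster.backward(loss, optimizer)             # plugin-aware backward pass
    optimizer.step()
    optimizer.zero_grad()

booster.save_model(model, "checkpoint", shard=True)  # sharded checkpointing
```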

Key ZeRO stages:

  • Stage 1: Partition optimizer states across ranks
  • Stage 2: Additionally partition gradients across ranks
  • Stage 3: Additionally partition model parameters across ranks
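In ColossalAI these stages map onto the plugins as follows: LowLevelZeroPlugin covers stages 1 and 2, while parameter partitioning in the spirit of stage 3 is provided through GeminiPlugin. The mapping below is a sketch; keyword names may differ between releases.

```python
from colossalai.booster.plugin import LowLevelZeroPlugin, GeminiPlugin

zero1 = LowLevelZeroPlugin(stage=1)   # partition optimizer states only
zero2 = LowLevelZeroPlugin(stage=2)   # additionally partition gradients
zero3_like = GeminiPlugin()           # parameters sharded and managed across CPU/GPU
```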

Related Pages

Implemented By
