Principle:Hpcaitech ColossalAI GRPO Consumer Setup
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Distributed_Computing |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A training-worker pattern in which a consumer receives experiences from producers and updates the policy model by optimizing the GRPO objective under ColossalAI distributed training.
Description
The GRPO Consumer is a training worker that receives experience batches (generated responses, log probabilities, rewards, advantages) from producer actors and updates the policy model. It uses ColossalAI's Booster for distributed training, supporting ZeRO and hybrid parallelism. After each update, it broadcasts updated weights back to producers.
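The receive, update, broadcast cycle described above can be sketched as a toy loop. This is a minimal illustration using only the Python standard library, not ColossalAI's consumer code: `run_consumer`, `toy_update`, the scalar "weight", and the shape of the batch dictionary are all hypothetical names chosen for the example, and a real consumer would run an optimizer step through the Booster instead of `toy_update`.

```python
import queue


def run_consumer(experiences, weights, update_fn, broadcast_fn, num_steps):
    """Toy consumer loop: pull a batch, update the policy, publish new weights."""
    for step in range(num_steps):
        batch = experiences.get()            # blocks until a producer has sent a batch
        weights = update_fn(weights, batch)  # one training step on this batch
        broadcast_fn(step, weights)          # hand the updated weights back to producers
    return weights


def toy_update(weights, batch, lr=0.5):
    # Hypothetical stand-in for an optimizer step:
    # nudge a scalar "weight" by the mean advantage of the batch.
    mean_adv = sum(batch["advantages"]) / len(batch["advantages"])
    return weights + lr * mean_adv


# Two experience batches, as if sent by producers (hypothetical batch layout).
experiences = queue.Queue()
experiences.put({"advantages": [1.0, -1.0, 2.0]})
experiences.put({"advantages": [0.5, 0.5]})

published = []  # records every (step, weights) pair that was "broadcast"
final_weights = run_consumer(
    experiences,
    0.0,
    toy_update,
    lambda step, w: published.append((step, w)),
    num_steps=2,
)
```

Passing the update and broadcast steps in as callables keeps the loop independent of any particular training backend, which is the design point the sketch is meant to show.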
Usage
Consumers are automatically created by launch_distributed(). Configure the number of consumer GPUs based on model size and available memory.
Theoretical Basis
The consumer minimizes the GRPO policy loss with importance sampling:
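As a minimal sketch of that objective, assuming the commonly used clipped-surrogate form of GRPO: the importance ratio is the exponential of the difference between the new and old log probabilities, and the loss is the negated minimum of the unclipped and clipped ratio times the advantage. The function names, the clip range `eps=0.2`, and the omission of any KL penalty term are assumptions made for illustration; they are not taken from ColossalAI's code.

```python
import math


def grpo_token_loss(logp_new, logp_old, advantage, eps=0.2):
    """Clipped-surrogate loss for one token (assumed form, no KL term)."""
    # Importance-sampling ratio between the current policy and the
    # policy that generated the sample.
    ratio = math.exp(logp_new - logp_old)
    # Keep the ratio inside [1 - eps, 1 + eps].
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    # Take the pessimistic (smaller) surrogate and negate it, so that
    # minimizing the loss maximizes the surrogate objective.
    return -min(ratio * advantage, clipped * advantage)


def grpo_batch_loss(logps_new, logps_old, advantages, eps=0.2):
    """Mean per-token loss over a batch of equal-length sequences."""
    losses = [
        grpo_token_loss(n, o, a, eps)
        for n, o, a in zip(logps_new, logps_old, advantages)
    ]
    return sum(losses) / len(losses)
```

For example, with a positive advantage and a ratio of 2.0, the clipped term caps the surrogate at 1.2 times the advantage, so a large policy shift earns no additional credit beyond the clip range.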