Implementation:Hpcaitech ColossalAI SimpleProducer
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Distributed_Computing |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Ray remote actor, provided by ColossalChat, that generates GRPO (Group Relative Policy Optimization) experiences through multi-response generation and reward scoring.
Description
SimpleProducer is a Ray remote actor that implements the inference side of distributed GRPO. It loads a policy model (optionally with vLLM backend), generates multiple responses per prompt, scores them with reward functions, computes group-relative advantages, and sends experience batches to consumer actors.
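The "group-relative advantage" step can be sketched as follows: each prompt's num_generations rewards are normalized against their own group statistics. This is a minimal illustration of the GRPO idea, not the exact normalization used in ColossalChat's source.

```python
# Sketch of group-relative advantage computation as used in GRPO.
# Illustrative only; ColossalChat's exact normalization may differ.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# One group = the rewards of num_generations responses to one prompt.
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Responses scoring above their group mean get positive advantages, those below get negative ones, so the policy gradient pushes probability mass toward the better responses within each group.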
Usage
Created automatically by launch_distributed(). The producer runs a continuous loop: receive weights -> generate experiences -> send to consumer.
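The loop above can be sketched as three stages per episode. The helper names (sync_weights, generate_batch, send_to_consumer) are placeholders for illustration, not the real ColossalChat API.

```python
# Minimal sketch of the producer loop: receive weights, generate
# experiences, send them to the consumer. Helper names are
# hypothetical stand-ins, not ColossalChat functions.
def producer_loop(num_episodes, sync_weights, generate_batch, send_to_consumer):
    for episode in range(num_episodes):
        weights = sync_weights()         # updated policy from consumer
        batch = generate_batch(weights)  # rollout + reward + advantage
        send_to_consumer(batch)          # experience handed to the trainer
```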
Code Reference
Source Location
- Repository: ColossalAI
- File: applications/ColossalChat/coati/distributed/producer.py
- Lines: 422-514
Signature
    @ray.remote
    class SimpleProducer(BaseProducer):
        def __init__(
            self,
            producer_idx: int,
            num_producers: int,
            num_consumer_procs: int,
            num_episodes: int,
            batch_size: int,
            train_dataset_config: Dict[str, Any],
            model_config: Dict[str, Any],
            generate_config: Dict[str, Any],
            tokenizer_config: Optional[Dict[str, Any]] = None,
            microbatch_size: int = 1,
            backend: str = "transformers",
            num_generations: int = 8,
            grpo_config: Dict[str, Any] = None,
        ):
            """GRPO inference producer actor."""

        def rollout(self, input_ids, attention_mask, ...) -> Dict:
            """Generate responses and compute rewards."""

        def loop(self) -> None:
            """Main producer loop: receive weights, generate, send experiences."""
Import
from coati.distributed.producer import SimpleProducer
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_config | Dict | Yes | Model name/path and loading configuration |
| batch_size | int | Yes | Inference batch size |
| num_generations | int | No | Responses per prompt (default: 8) |
| backend | str | No | Inference backend: "transformers" or "vllm" |
| Updated weights | Dict[str, Tensor] | Yes | Received from consumer via Ray broadcast |
Outputs
| Name | Type | Description |
|---|---|---|
| Experience dict | Dict[str, Tensor] | Contains input_ids, attention_mask, action_mask, action_log_probs, reward, advantages |
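The key set of the experience dict mirrors the table above. As a rough sketch of how such a record might be assembled (plain lists stand in for tensors; the helper and its signature are hypothetical, not ColossalChat code):

```python
# Hypothetical sketch of assembling one experience record.
# Plain lists stand in for tensors; keys match the output contract.
def build_experience(prompt_ids, response_ids, log_probs, reward, advantage):
    input_ids = prompt_ids + response_ids
    # action_mask is set only over generated (response) tokens,
    # so the loss ignores prompt positions.
    action_mask = [0] * len(prompt_ids) + [1] * len(response_ids)
    return {
        "input_ids": input_ids,
        "attention_mask": [1] * len(input_ids),
        "action_mask": action_mask,
        "action_log_probs": log_probs,
        "reward": reward,
        "advantages": advantage,
    }

exp = build_experience([1, 2, 3], [4, 5], [-0.1, -0.2], 1.0, 0.8)
```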
Usage Examples
    # Producers are created internally by launch_distributed().
    # Direct creation (for reference):
    import ray

    from coati.distributed.producer import SimpleProducer

    producer = SimpleProducer.options(
        num_gpus=1,
        scheduling_strategy=ray.util.scheduling_strategies.NodeAffinitySchedulingStrategy(
            node_id=ray.get_runtime_context().get_node_id(),
            soft=False,
        ),
    ).remote(
        producer_idx=0,
        num_producers=2,
        num_consumer_procs=4,
        num_episodes=1000,
        batch_size=16,
        train_dataset_config={"path": "data.jsonl"},
        model_config={"pretrained": "Qwen/Qwen2.5-3B"},
        generate_config={"temperature": 0.7},
        num_generations=8,
    )
Related Pages
Implements Principle
Requires Environment
Uses Heuristic