Implementation:Hpcaitech ColossalAI SimpleProducer
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Distributed_Computing |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Ray remote actor, provided by ColossalChat, that generates GRPO (Group Relative Policy Optimization) experiences through multi-response generation and reward scoring.
Description
SimpleProducer is a Ray remote actor that implements the inference side of distributed GRPO. It loads a policy model (optionally with vLLM backend), generates multiple responses per prompt, scores them with reward functions, computes group-relative advantages, and sends experience batches to consumer actors.
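The "group-relative advantage" step can be sketched as follows: each prompt's num_generations rewards are normalized against their own group statistics. This is a minimal illustration of the GRPO idea, not the exact normalization used in ColossalChat's source.

```python
# Sketch of group-relative advantage computation as used in GRPO.
# Illustrative only; ColossalChat's exact normalization may differ.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# One group = the rewards of num_generations responses to one prompt.
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Responses scoring above their group mean get positive advantages, those below get negative ones, so the policy gradient pushes probability mass toward the better responses within each group.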
Usage
Created automatically by launch_distributed(). The producer runs a continuous loop: receive weights -> generate experiences -> send to consumer.
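The loop above can be sketched as three stages per episode. The helper names (sync_weights, generate_batch, send_to_consumer) are placeholders for illustration, not the real ColossalChat API.

```python
# Minimal sketch of the producer loop: receive weights, generate
# experiences, send them to the consumer. Helper names are
# hypothetical stand-ins, not ColossalChat functions.
def producer_loop(num_episodes, sync_weights, generate_batch, send_to_consumer):
    for episode in range(num_episodes):
        weights = sync_weights()         # updated policy from consumer
        batch = generate_batch(weights)  # rollout + reward + advantage
        send_to_consumer(batch)          # experience handed to the trainer
```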
Code Reference
Source Location
- Repository: ColossalAI
- File: applications/ColossalChat/coati/distributed/producer.py
- Lines: 422-514
Signature
    @ray.remote
    class SimpleProducer(BaseProducer):
        def __init__(
            self,
            producer_idx: int,
            num_producers: int,
            num_consumer_procs: int,
            num_episodes: int,
            batch_size: int,
            train_dataset_config: Dict[str, Any],
            model_config: Dict[str, Any],
            generate_config: Dict[str, Any],
            tokenizer_config: Optional[Dict[str, Any]] = None,
            microbatch_size: int = 1,
            backend: str = "transformers",
            num_generations: int = 8,
            grpo_config: Dict[str, Any] = None,
        ):
            """GRPO inference producer actor."""

        def rollout(self, input_ids, attention_mask, ...) -> Dict:
            """Generate responses and compute rewards."""

        def loop(self) -> None:
            """Main producer loop: receive weights, generate, send experiences."""
Import
from coati.distributed.producer import SimpleProducer
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_config | Dict | Yes | Model name/path and loading configuration |
| batch_size | int | Yes | Inference batch size |
| num_generations | int | No | Responses per prompt (default: 8) |
| backend | str | No | Inference backend: "transformers" or "vllm" |
| Updated weights | Dict[str, Tensor] | Yes | Received from consumer via Ray broadcast |
Outputs
| Name | Type | Description |
|---|---|---|
| Experience dict | Dict[str, Tensor] | Contains input_ids, attention_mask, action_mask, action_log_probs, reward, advantages |
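The key set of the experience dict mirrors the table above. As a rough sketch of how such a record might be assembled (plain lists stand in for tensors; the helper and its signature are hypothetical, not ColossalChat code):

```python
# Hypothetical sketch of assembling one experience record.
# Plain lists stand in for tensors; keys match the output contract.
def build_experience(prompt_ids, response_ids, log_probs, reward, advantage):
    input_ids = prompt_ids + response_ids
    # action_mask is set only over generated (response) tokens,
    # so the loss ignores prompt positions.
    action_mask = [0] * len(prompt_ids) + [1] * len(response_ids)
    return {
        "input_ids": input_ids,
        "attention_mask": [1] * len(input_ids),
        "action_mask": action_mask,
        "action_log_probs": log_probs,
        "reward": reward,
        "advantages": advantage,
    }

exp = build_experience([1, 2, 3], [4, 5], [-0.1, -0.2], 1.0, 0.8)
```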
Usage Examples
    # Producers are created internally by launch_distributed().
    # Direct creation (for reference):
    import ray

    from coati.distributed.producer import SimpleProducer

    producer = SimpleProducer.options(
        num_gpus=1,
        scheduling_strategy=ray.util.scheduling_strategies.NodeAffinitySchedulingStrategy(
            node_id=ray.get_runtime_context().get_node_id(),
            soft=False,
        ),
    ).remote(
        producer_idx=0,
        num_producers=2,
        num_consumer_procs=4,
        num_episodes=1000,
        batch_size=16,
        train_dataset_config={"path": "data.jsonl"},
        model_config={"pretrained": "Qwen/Qwen2.5-3B"},
        generate_config={"temperature": 0.7},
        num_generations=8,
    )
Related Pages
Implements Principle
Requires Environment
Uses Heuristic