Implementation:Hpcaitech ColossalAI ExperienceMaker Base

Knowledge Sources	Hpcaitech_ColossalAI
Domains	Reinforcement Learning, RLHF, PPO
Last Updated	2026-02-09 00:00 GMT

Overview

Defines the Experience dataclass and the ExperienceMaker abstract base class used in PPO-based RLHF training.

Description

This module defines two core components of the ColossalChat RLHF system.

The Experience dataclass holds one batch of PPO experience data: sequences, action log probabilities, values, rewards, KL divergences, advantages, and the attention and action masks. It provides methods for device transfer (to_device) and memory pinning (pin_memory).

The ExperienceMaker abstract base class defines the interface for generating experience data. It holds references to the actor model, critic model, reward model, and initial (reference) model, and declares a single abstract method, make_experience, that subclasses must implement.
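The container behaviour described above can be illustrated with a small stand-in. This is a minimal sketch that does not import coati: MiniExperience is a hypothetical class with only three of the fields, and the body of to_device is an assumption about how such a transfer is commonly written, not a copy of the ColossalChat source.

```python
from dataclasses import dataclass, fields
from typing import Optional

import torch


@dataclass
class MiniExperience:
    """Hypothetical stand-in with a subset of the Experience fields."""

    sequences: torch.Tensor
    action_log_probs: torch.Tensor
    action_mask: Optional[torch.Tensor] = None

    @torch.no_grad()
    def to_device(self, device: torch.device) -> None:
        # Move every tensor field in place; optional fields may be None.
        for f in fields(self):
            value = getattr(self, f.name)
            if value is not None:
                setattr(self, f.name, value.to(device))


# Placeholder batch: 2 sequences of 8 tokens, 4 of which are actions.
exp = MiniExperience(
    sequences=torch.zeros(2, 8, dtype=torch.long),
    action_log_probs=torch.zeros(2, 4),
)
exp.to_device(torch.device("cpu"))
```

The sketch uses the CPU device so that it runs without a GPU; in training code the target would typically be a CUDA device.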

Usage

Use Experience as the standard data container throughout the PPO training pipeline. Subclass ExperienceMaker to implement custom experience generation strategies that coordinate the actor, critic, reward, and reference models.
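The subclassing pattern can be sketched without the real models. Everything below is illustrative: ExperienceMakerSketch is a hypothetical stand-in that mirrors the constructor arguments listed on this page (the attribute names are assumed), ToyExperienceMaker is an invented subclass, and plain callables replace the actor, critic, reward, and reference models. A real subclass would return an Experience rather than a dict.

```python
from abc import ABC, abstractmethod
from typing import Any


class ExperienceMakerSketch(ABC):
    """Hypothetical stand-in mirroring the ExperienceMaker constructor."""

    def __init__(self, actor: Any, critic: Any, reward_model: Any, initial_model: Any) -> None:
        # Attribute names are an assumption made for this sketch.
        self.actor = actor
        self.critic = critic
        self.reward_model = reward_model
        self.initial_model = initial_model

    @abstractmethod
    def make_experience(self, input_ids: Any, attention_mask: Any, **generate_kwargs: Any) -> Any:
        """Subclasses turn a batch of prompts into experience data."""


class ToyExperienceMaker(ExperienceMakerSketch):
    """Toy strategy: calls each model once and returns the raw results."""

    def make_experience(self, input_ids, attention_mask, **generate_kwargs):
        sequences = self.actor(input_ids, **generate_kwargs)
        return {
            "sequences": sequences,
            "values": self.critic(sequences),
            "reward": self.reward_model(sequences),
            "reference": self.initial_model(sequences),
        }


# Plain callables stand in for the four models.
maker = ToyExperienceMaker(
    actor=lambda ids, **kw: ids + [0] * kw.get("max_new_tokens", 1),
    critic=lambda seq: [0.0] * len(seq),
    reward_model=lambda seq: 1.0,
    initial_model=lambda seq: seq,
)
result = maker.make_experience([5, 6], attention_mask=[1, 1], max_new_tokens=2)

# The abstract base itself cannot be instantiated.
try:
    ExperienceMakerSketch(None, None, None, None)
    base_is_abstract = False
except TypeError:
    base_is_abstract = True
```

Keeping make_experience as the only abstract method means a new generation strategy has to override just one entry point while reusing the stored model references.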

Code Reference

Source Location

Repository: Hpcaitech_ColossalAI
File: applications/ColossalChat/coati/experience_maker/base.py
Lines: 1-90

Signature

@dataclass
class Experience:
    sequences: torch.Tensor
    action_log_probs: torch.Tensor
    values: torch.Tensor
    reward: torch.Tensor
    kl: torch.Tensor
    advantages: torch.Tensor
    attention_mask: Optional[torch.LongTensor]
    action_mask: Optional[torch.BoolTensor]

    @torch.no_grad()
    def to_device(self, device: torch.device) -> None: ...

    def pin_memory(self): ...

class ExperienceMaker(ABC):
    def __init__(
        self, actor: PreTrainedModel, critic: Critic,
        reward_model: RewardModel, initial_model: PreTrainedModel
    ) -> None: ...

    @abstractmethod
    def make_experience(
        self, input_ids: torch.Tensor, attention_mask: torch.Tensor, **generate_kwargs
    ) -> Experience: ...

Import

from coati.experience_maker.base import Experience, ExperienceMaker

I/O Contract

Inputs (ExperienceMaker.__init__)

Name	Type	Required	Description
actor	PreTrainedModel	Yes	The actor (policy) model for generating sequences
critic	Critic	Yes	The critic model for value estimation
reward_model	RewardModel	Yes	The reward model for computing rewards
initial_model	PreTrainedModel	Yes	The reference/initial model for KL divergence computation

Inputs (make_experience)

Name	Type	Required	Description
input_ids	torch.Tensor	Yes	Input token IDs (prompts)
attention_mask	torch.Tensor	Yes	Attention mask for the input
**generate_kwargs	dict	No	Additional generation parameters

Outputs (make_experience)

Name	Type	Description
return	Experience	A batch of experience data with sequences, log probs, values, rewards, KL, advantages, and masks

Usage Examples

from coati.experience_maker.base import Experience, ExperienceMaker
import torch

# Experience is the data container; the *_tensor and mask names below are
# placeholders for tensors prepared earlier in the pipeline
experience = Experience(
    sequences=sequences_tensor,
    action_log_probs=log_probs_tensor,
    values=values_tensor,
    reward=reward_tensor,
    kl=kl_tensor,
    advantages=advantages_tensor,
    attention_mask=attn_mask,
    action_mask=act_mask,
)

# Pin memory first: pinning applies to CPU tensors and speeds up host-to-device transfer
experience.pin_memory()

# Then move the experience to a specific device
experience.to_device(torch.device("cuda:0"))

Related Pages

Environment:Hpcaitech_ColossalAI_CUDA_GPU_Environment
