Implementation:Hpcaitech ColossalAI ExperienceMaker Base

From Leeroopedia


Knowledge Sources
Domains: Reinforcement Learning, RLHF, PPO
Last Updated: 2026-02-09 00:00 GMT

Overview

Defines the Experience dataclass and the ExperienceMaker abstract base class used in PPO-based RLHF training.

Description

This module defines two core components of the ColossalChat RLHF system. The Experience dataclass holds a batch of PPO experience data including sequences, action log probabilities, values, rewards, KL divergences, advantages, and attention/action masks, with methods for device transfer (to_device) and memory pinning (pin_memory). The ExperienceMaker abstract base class defines the interface for generating experience data, holding references to the actor model, critic model, reward model, and initial (reference) model, with a single abstract method make_experience that subclasses must implement.
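The container behavior described above can be sketched in plain PyTorch. This is a minimal illustration, not the coati implementation: `ExperienceSketch` is a hypothetical stand-in that mirrors the documented fields, the tensor shapes are illustrative assumptions, and the CUDA guard in `pin_memory` is an addition for this sketch so that it also runs on CPU-only machines.

```python
from dataclasses import dataclass, fields
from typing import Optional

import torch


@dataclass
class ExperienceSketch:
    """Hypothetical stand-in mirroring the fields of coati's Experience."""

    sequences: torch.Tensor
    action_log_probs: torch.Tensor
    values: torch.Tensor
    reward: torch.Tensor
    kl: torch.Tensor
    advantages: torch.Tensor
    attention_mask: Optional[torch.Tensor] = None
    action_mask: Optional[torch.Tensor] = None

    @torch.no_grad()
    def to_device(self, device: torch.device) -> None:
        # Move every tensor field in place; optional masks may be None.
        for f in fields(self):
            value = getattr(self, f.name)
            if value is not None:
                setattr(self, f.name, value.to(device))

    def pin_memory(self) -> "ExperienceSketch":
        # Pinning needs CPU tensors and a CUDA runtime; skip it otherwise
        # (this guard is an assumption of the sketch).
        if not torch.cuda.is_available():
            return self
        for f in fields(self):
            value = getattr(self, f.name)
            if value is not None and value.device.type == "cpu":
                setattr(self, f.name, value.pin_memory())
        return self


# Illustrative shapes only: 2 prompts of 3 tokens, 4 generated tokens each.
batch, prompt_len, num_actions = 2, 3, 4
exp = ExperienceSketch(
    sequences=torch.zeros(batch, prompt_len + num_actions, dtype=torch.long),
    action_log_probs=torch.zeros(batch, num_actions),
    values=torch.zeros(batch, num_actions),
    reward=torch.zeros(batch),
    kl=torch.zeros(batch),
    advantages=torch.zeros(batch, num_actions),
)

# Pin while the tensors are still on the host, then transfer.
target = torch.device("cuda" if torch.cuda.is_available() else "cpu")
exp.pin_memory()
exp.to_device(target)
```

Iterating over `dataclasses.fields` keeps the two methods in sync with the field list, so adding a tensor field does not require touching either method.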

Usage

Use Experience as the standard data container throughout the PPO training pipeline. Subclass ExperienceMaker to implement custom experience generation strategies that coordinate the actor, critic, reward, and reference models.
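As a rough sketch of what a subclass might look like, the example below implements `make_experience` with toy models. Everything here is an assumption made for illustration: `ToyExperience` and `ToyExperienceMaker` are hypothetical stand-ins for the coati classes (so the block runs without coati installed), `TinyLM` and `TinyScorer` are toy modules, decoding is greedy, the attention mask is ignored because the toy prompts have no padding, and advantages are simply reward minus value rather than a full PPO advantage estimate.

```python
import copy
from abc import ABC, abstractmethod
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class ToyExperience:
    """Hypothetical stand-in for coati's Experience (masks omitted)."""
    sequences: torch.Tensor
    action_log_probs: torch.Tensor
    values: torch.Tensor
    reward: torch.Tensor
    kl: torch.Tensor
    advantages: torch.Tensor


class ToyExperienceMaker(ABC):
    """Hypothetical stand-in mirroring the ExperienceMaker constructor."""

    def __init__(self, actor, critic, reward_model, initial_model) -> None:
        self.actor = actor
        self.critic = critic
        self.reward_model = reward_model
        self.initial_model = initial_model

    @abstractmethod
    def make_experience(self, input_ids, attention_mask, **generate_kwargs) -> ToyExperience:
        ...


class TinyLM(nn.Module):
    """Toy language model: token ids (B, S) -> logits (B, S, vocab)."""

    def __init__(self, vocab: int = 16, dim: int = 8) -> None:
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.head = nn.Linear(dim, vocab)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        return self.head(self.embed(ids))


class TinyScorer(nn.Module):
    """Toy critic / reward model: token ids (B, S) -> one scalar per sequence."""

    def __init__(self, vocab: int = 16, dim: int = 8) -> None:
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.head = nn.Linear(dim, 1)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        return self.head(self.embed(ids).mean(dim=1)).squeeze(-1)


def action_log_probs(model: nn.Module, sequences: torch.Tensor, num_actions: int) -> torch.Tensor:
    # Log-probability of each generated token given its prefix.
    log_probs = torch.log_softmax(model(sequences[:, :-1]), dim=-1)
    picked = log_probs.gather(-1, sequences[:, 1:].unsqueeze(-1)).squeeze(-1)
    return picked[:, -num_actions:]


class GreedyExperienceMaker(ToyExperienceMaker):
    @torch.no_grad()
    def make_experience(self, input_ids, attention_mask, **generate_kwargs) -> ToyExperience:
        # attention_mask is unused here: the toy prompts carry no padding.
        num_actions = generate_kwargs.get("max_new_tokens", 4)
        sequences = input_ids
        for _ in range(num_actions):  # greedy decoding, one token per step
            next_token = self.actor(sequences)[:, -1].argmax(dim=-1, keepdim=True)
            sequences = torch.cat([sequences, next_token], dim=1)
        log_probs = action_log_probs(self.actor, sequences, num_actions)
        ref_log_probs = action_log_probs(self.initial_model, sequences, num_actions)
        kl = (log_probs - ref_log_probs).mean(dim=1)  # simple per-sequence estimate
        values = self.critic(sequences)
        reward = self.reward_model(sequences)
        return ToyExperience(sequences, log_probs, values, reward, kl, reward - values)


torch.manual_seed(0)
actor = TinyLM()
# The reference model starts as a frozen copy of the actor, so KL is zero here.
maker = GreedyExperienceMaker(actor, TinyScorer(), TinyScorer(), copy.deepcopy(actor))
prompts = torch.randint(0, 16, (2, 3))
exp = maker.make_experience(prompts, torch.ones_like(prompts), max_new_tokens=4)
```

Because `make_experience` is abstract, instantiating the base class directly raises `TypeError`; only subclasses that implement the method can be constructed.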

Code Reference

Source Location

Signature

@dataclass
class Experience:
    sequences: torch.Tensor
    action_log_probs: torch.Tensor
    values: torch.Tensor
    reward: torch.Tensor
    kl: torch.Tensor
    advantages: torch.Tensor
    attention_mask: Optional[torch.LongTensor]
    action_mask: Optional[torch.BoolTensor]

    @torch.no_grad()
    def to_device(self, device: torch.device) -> None:
        ...

    def pin_memory(self):
        ...

class ExperienceMaker(ABC):
    def __init__(
        self, actor: PreTrainedModel, critic: Critic,
        reward_model: RewardModel, initial_model: PreTrainedModel
    ) -> None:
        ...

    @abstractmethod
    def make_experience(
        self, input_ids: torch.Tensor, attention_mask: torch.Tensor, **generate_kwargs
    ) -> Experience:
        ...

Import

from coati.experience_maker.base import Experience, ExperienceMaker

I/O Contract

Inputs (ExperienceMaker.__init__)

Name | Type | Required | Description
actor | PreTrainedModel | Yes | The actor (policy) model for generating sequences
critic | Critic | Yes | The critic model for value estimation
reward_model | RewardModel | Yes | The reward model for computing rewards
initial_model | PreTrainedModel | Yes | The reference/initial model for KL divergence computation

Inputs (make_experience)

Name | Type | Required | Description
input_ids | torch.Tensor | Yes | Input token IDs (prompts)
attention_mask | torch.Tensor | Yes | Attention mask for the input
**generate_kwargs | dict | No | Additional generation parameters

Outputs (make_experience)

Name | Type | Description
return | Experience | A batch of experience data with sequences, log probs, values, rewards, KL, advantages, and masks

Usage Examples

from coati.experience_maker.base import Experience, ExperienceMaker
import torch

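# Experience is the data container for one PPO batch. The *_tensor, attn_mask,
# and act_mask names below are placeholders for tensors computed elsewhere.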
experience = Experience(
    sequences=sequences_tensor,
    action_log_probs=log_probs_tensor,
    values=values_tensor,
    reward=reward_tensor,
    kl=kl_tensor,
    advantages=advantages_tensor,
    attention_mask=attn_mask,
    action_mask=act_mask,
)

# Pin host memory first for faster host-to-device transfer
# (only CPU tensors can be pinned)
experience.pin_memory()

# Then move the experience to a specific device
experience.to_device(torch.device("cuda:0"))
