Implementation:Hpcaitech ColossalAI Trainer Callback Base

From Leeroopedia
Revision as of 15:10, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Hpcaitech_ColossalAI_Trainer_Callback_Base.md)


Knowledge Sources
Domains Reinforcement Learning, Callbacks, Training
Last Updated 2026-02-09 00:00 GMT

Overview

Abstract base callback class for the ColossalChat trainer, defining lifecycle hook interfaces for the on-policy reinforcement learning training loop.

Description

Callback is an abstract base class that defines the interface for callbacks used by the OLTrainer (online learning trainer). It provides no-op default implementations for ten lifecycle hooks, a start/end pair for each of five phases: the fit as a whole, each episode, experience making, each learning epoch, and each learning batch. Hooks receive only the arguments relevant to their phase: the episode hooks take the episode index, the learning-epoch hooks take the epoch index, on_make_experience_end and on_learn_batch_end take the Experience involved, and the remaining hooks take no arguments.
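The hook layout described above can be sketched without coati installed. In the sketch below, the Callback class is a stand-in that mirrors the signatures listed on this page, and RecordingCallback, fire, and toy_fit are hypothetical names introduced only for illustration. The order in which toy_fit fires the hooks is an assumption about how a trainer might drive them, not a copy of the OLTrainer loop.

```python
from abc import ABC
from typing import Any, List


class Callback(ABC):
    """Stand-in for coati's base class: the ten hooks from this page, all no-ops."""

    def on_fit_start(self) -> None: pass
    def on_fit_end(self) -> None: pass
    def on_episode_start(self, episode: int) -> None: pass
    def on_episode_end(self, episode: int) -> None: pass
    def on_make_experience_start(self) -> None: pass
    def on_make_experience_end(self, experience: Any) -> None: pass
    def on_learn_epoch_start(self, epoch: int) -> None: pass
    def on_learn_epoch_end(self, epoch: int) -> None: pass
    def on_learn_batch_start(self) -> None: pass
    def on_learn_batch_end(self, experience: Any) -> None: pass


class RecordingCallback(Callback):
    """Overrides a subset of hooks; the rest keep the inherited no-op behavior."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def on_fit_start(self) -> None:
        self.events.append("fit_start")

    def on_episode_start(self, episode: int) -> None:
        self.events.append(f"episode_start:{episode}")

    def on_learn_batch_end(self, experience: Any) -> None:
        self.events.append("learn_batch_end")

    def on_fit_end(self) -> None:
        self.events.append("fit_end")


def fire(callbacks: List[Callback], hook: str, *args: Any) -> None:
    """Call the named hook on every callback, in registration order."""
    for callback in callbacks:
        getattr(callback, hook)(*args)


def toy_fit(callbacks: List[Callback], num_episodes: int = 1,
            num_epochs: int = 1, num_batches: int = 2) -> None:
    """Hypothetical loop; the hook ordering is an assumption, not coati's code."""
    fire(callbacks, "on_fit_start")
    for episode in range(num_episodes):
        fire(callbacks, "on_episode_start", episode)
        fire(callbacks, "on_make_experience_start")
        experience = {"episode": episode}  # placeholder for an Experience object
        fire(callbacks, "on_make_experience_end", experience)
        for epoch in range(num_epochs):
            fire(callbacks, "on_learn_epoch_start", epoch)
            for _ in range(num_batches):
                fire(callbacks, "on_learn_batch_start")
                fire(callbacks, "on_learn_batch_end", experience)
            fire(callbacks, "on_learn_epoch_end", epoch)
        fire(callbacks, "on_episode_end", episode)
    fire(callbacks, "on_fit_end")


recorder = RecordingCallback()
toy_fit([recorder])
print(recorder.events)
```

Because every hook has a no-op default, the dispatch helper can call all ten on every callback without checking which ones a subclass has overridden.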

Unlike the Ray callback base classes, this Callback is designed for the non-distributed (single-process) trainer and includes hooks for on_make_experience_start/end and on_learn_epoch_start/end instead of the update-oriented hooks in the Ray version.

Usage

Subclass Callback to implement custom logging, checkpointing, or performance monitoring during on-policy RLHF training. Override only the hooks relevant to your use case. Pass instances to the OLTrainer constructor via the callbacks parameter.
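As one illustration of the performance-monitoring use case, the sketch below times each episode by overriding only on_episode_start, on_episode_end, and on_fit_end. The Callback class here is again a stand-in so the snippet runs on its own, and EpisodeTimer with its clock parameter is a hypothetical name, not part of coati.

```python
import time
from abc import ABC
from typing import Any, Callable, Dict


class Callback(ABC):
    """Stand-in for coati.trainer.callbacks.base.Callback (all hooks are no-ops)."""

    def on_fit_start(self) -> None: pass
    def on_fit_end(self) -> None: pass
    def on_episode_start(self, episode: int) -> None: pass
    def on_episode_end(self, episode: int) -> None: pass
    def on_make_experience_start(self) -> None: pass
    def on_make_experience_end(self, experience: Any) -> None: pass
    def on_learn_epoch_start(self, epoch: int) -> None: pass
    def on_learn_epoch_end(self, epoch: int) -> None: pass
    def on_learn_batch_start(self) -> None: pass
    def on_learn_batch_end(self, experience: Any) -> None: pass


class EpisodeTimer(Callback):
    """Hypothetical callback that records the wall-clock duration of each episode."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock  # injectable so the timer can be tested deterministically
        self._started: Dict[int, float] = {}
        self.durations: Dict[int, float] = {}

    def on_episode_start(self, episode: int) -> None:
        self._started[episode] = self._clock()

    def on_episode_end(self, episode: int) -> None:
        # Duration is measured from the matching on_episode_start call.
        self.durations[episode] = self._clock() - self._started.pop(episode)

    def on_fit_end(self) -> None:
        total = sum(self.durations.values())
        print(f"{len(self.durations)} episodes, {total:.3f}s in total")


# Drive the hooks by hand; in real training the trainer would call them.
timer = EpisodeTimer()
timer.on_episode_start(0)
timer.on_episode_end(0)
timer.on_fit_end()
```

With coati installed, the stand-in class would be replaced by the import shown in the Import section, and the instance would be handed to the trainer through its callbacks parameter as described above.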

Code Reference

Source Location

Signature

class Callback(ABC):
    def on_fit_start(self) -> None: ...
    def on_fit_end(self) -> None: ...
    def on_episode_start(self, episode: int) -> None: ...
    def on_episode_end(self, episode: int) -> None: ...
    def on_make_experience_start(self) -> None: ...
    def on_make_experience_end(self, experience: Experience) -> None: ...
    def on_learn_epoch_start(self, epoch: int) -> None: ...
    def on_learn_epoch_end(self, epoch: int) -> None: ...
    def on_learn_batch_start(self) -> None: ...
    def on_learn_batch_end(self, experience: Experience) -> None: ...

Import

from coati.trainer.callbacks.base import Callback

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| episode | int | No | Episode index passed to on_episode_start/end |
| experience | Experience | No | Experience object passed to on_make_experience_end and on_learn_batch_end |
| epoch | int | No | Learning epoch index passed to on_learn_epoch_start/end |

Outputs

| Name | Type | Description |
|---|---|---|
| return | None | All callback methods return None |

Usage Examples

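from coati.trainer.callbacks.base import Callback
from coati.experience_maker import Experience

# Minimal logging callback: it only prints to stdout and overrides three hooks.
# All other hooks keep the no-op defaults inherited from Callback.
class PrintLoggingCallback(Callback):
    def on_episode_start(self, episode: int) -> None:
        print(f"Starting episode {episode}")

    def on_make_experience_end(self, experience: Experience) -> None:
        # The first dimension of the sequences tensor is the number of samples.
        batch_size = experience.sequences.shape[0]
        print(f"Generated {batch_size} experience samples")

    def on_fit_end(self) -> None:
        print("Training complete")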
