Implementation:Hiyouga LLaMA Factory Train Callbacks

From Leeroopedia


Knowledge Sources
Domains: Machine Learning, Training Infrastructure
Last Updated: 2026-02-06 19:00 GMT

Overview

Training callbacks for checkpoint management, progress logging, experiment tracking, and adapter conversion in LLaMA-Factory.

Description

The callbacks module defines five TrainerCallback subclasses and one utility function, fix_valuehead_checkpoint, that handle cross-cutting concerns shared across the training stages. FixValueHeadModelCallback separates value-head weights from decoder weights at checkpoint save time for PPO training. SaveProcessorCallback persists the processor (tokenizer + image processor) alongside model checkpoints. PissaConvertCallback handles PiSSA-to-LoRA adapter conversion at training start and end. LogCallback tracks training progress (timing, throughput, and VRAM statistics) and writes JSON log files via a background thread pool, with optional Web UI integration. ReporterCallback pushes hyperparameter configurations to Weights & Biases or SwanLab at training start.
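The progress bookkeeping attributed to LogCallback can be illustrated without the training stack. The sketch below is a standalone approximation, not the module's code: the class ProgressLogger, its method names, and the linear remaining-time estimate are assumptions made for illustration. Only the field names, the trainer_log.jsonl file name, and the use of a background thread pool are taken from this page.

```python
import json
import os
import time
from concurrent.futures import ThreadPoolExecutor
from datetime import timedelta
from typing import Any, Dict, Optional


class ProgressLogger:
    """Hypothetical stand-in for the progress-logging part of LogCallback."""

    def __init__(self, output_dir: str, total_steps: int) -> None:
        self.log_path = os.path.join(output_dir, "trainer_log.jsonl")
        self.total_steps = total_steps
        self.start_time = time.time()
        # One worker so that lines are appended in submission order.
        self.pool = ThreadPoolExecutor(max_workers=1)

    def build_record(
        self, current_steps: int, loss: float, now: Optional[float] = None
    ) -> Dict[str, Any]:
        """Derive percentage, elapsed time, and a linear remaining-time estimate."""
        now = time.time() if now is None else now
        elapsed = now - self.start_time
        per_step = elapsed / current_steps if current_steps > 0 else 0.0
        remaining = (self.total_steps - current_steps) * per_step
        if self.total_steps > 0:
            percentage = round(current_steps / self.total_steps * 100, 2)
        else:
            percentage = 100.0
        return {
            "current_steps": current_steps,
            "total_steps": self.total_steps,
            "loss": loss,
            "percentage": percentage,
            "elapsed_time": str(timedelta(seconds=int(elapsed))),
            "remaining_time": str(timedelta(seconds=int(remaining))),
        }

    def _append(self, record: Dict[str, Any]) -> None:
        # Runs on the worker thread; one JSON object per line.
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def log(self, current_steps: int, loss: float) -> Dict[str, Any]:
        """Build a record and hand the file write to the background pool."""
        record = self.build_record(current_steps, loss)
        self.pool.submit(self._append, record)
        return record

    def close(self) -> None:
        """Wait for pending writes before the process exits."""
        self.pool.shutdown(wait=True)
```

In this sketch, log() would be called from a logging hook and close() at the end of training. Handing the write to a pool keeps file I/O off the training loop, which is presumably why the real callback uses one.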

Usage

These callbacks are registered automatically by the training workflow modules. LogCallback and ReporterCallback are added for all training stages. FixValueHeadModelCallback is added for PPO training. SaveProcessorCallback is added when a processor is provided. PissaConvertCallback is added when PiSSA conversion is enabled.
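The registration rules above can be summarized as a small selection function. This is a sketch only: select_callbacks, its parameters, and the stage string "ppo" are hypothetical names, and the function returns class names rather than instances so that it runs without the training stack installed.

```python
from typing import List


def select_callbacks(stage: str, has_processor: bool, pissa_convert: bool) -> List[str]:
    """Return the names of the callbacks that apply to a training run.

    All parameter names are illustrative assumptions, not LLaMA-Factory options.
    """
    # Added for every training stage.
    callbacks = ["LogCallback", "ReporterCallback"]
    # Only PPO training uses a value-head model.
    if stage == "ppo":
        callbacks.append("FixValueHeadModelCallback")
    # Only meaningful when a processor was provided.
    if has_processor:
        callbacks.append("SaveProcessorCallback")
    # Only when PiSSA conversion is enabled.
    if pissa_convert:
        callbacks.append("PissaConvertCallback")
    return callbacks
```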

Code Reference

Source Location

Signature

def fix_valuehead_checkpoint(
    model: "AutoModelForCausalLMWithValueHead",
    output_dir: str,
    safe_serialization: bool,
) -> None:
    """Fix the valuehead checkpoint files by separating v_head weights."""

class FixValueHeadModelCallback(TrainerCallback):
    def on_save(self, args, state, control, **kwargs): ...

class SaveProcessorCallback(TrainerCallback):
    def __init__(self, processor: "ProcessorMixin") -> None: ...
    def on_save(self, args, state, control, **kwargs): ...
    def on_train_end(self, args, state, control, **kwargs): ...

class PissaConvertCallback(TrainerCallback):
    def on_train_begin(self, args, state, control, **kwargs): ...
    def on_train_end(self, args, state, control, **kwargs): ...

class LogCallback(TrainerCallback):
    def __init__(self) -> None: ...
    def on_init_end(self, args, state, control, **kwargs): ...
    def on_train_begin(self, args, state, control, **kwargs): ...
    def on_train_end(self, args, state, control, **kwargs): ...
    def on_log(self, args, state, control, **kwargs): ...
    def on_prediction_step(self, args, state, control, **kwargs): ...

class ReporterCallback(TrainerCallback):
    def __init__(
        self,
        model_args: "ModelArguments",
        data_args: "DataArguments",
        finetuning_args: "FinetuningArguments",
        generating_args: "GeneratingArguments",
    ) -> None: ...
    def on_train_begin(self, args, state, control, **kwargs): ...
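As an illustration of what ReporterCallback might hand to an experiment tracker, the sketch below groups four argument objects into one nested dictionary. It is an assumption-laden approximation: ExampleArgs stands in for the real argument classes, build_tracker_config is a hypothetical helper, and the key names simply mirror the constructor parameters in the signature above.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class ExampleArgs:
    """Hypothetical stand-in for ModelArguments, DataArguments, and so on."""

    name: str
    value: float = 0.0


def build_tracker_config(
    model_args: Any,
    data_args: Any,
    finetuning_args: Any,
    generating_args: Any,
) -> Dict[str, Dict[str, Any]]:
    """Collect one dictionary per argument group under a descriptive key."""
    return {
        "model_args": asdict(model_args),
        "data_args": asdict(data_args),
        "finetuning_args": asdict(finetuning_args),
        "generating_args": asdict(generating_args),
    }


# With a tracker installed, the result could be passed to something like
# wandb.config.update(config). That call is left out so the sketch stays
# runnable without external packages.
```

Keeping each group under its own key avoids collisions when two argument classes happen to share a field name.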

Import

from llamafactory.train.callbacks import (
    FixValueHeadModelCallback,
    SaveProcessorCallback,
    PissaConvertCallback,
    LogCallback,
    ReporterCallback,
)

I/O Contract

Inputs

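Name | Type | Required | Description
model | AutoModelForCausalLMWithValueHead | Yes (FixValueHead) | Value-head model whose checkpoint needs splitting
output_dir | str | Yes (FixValueHead) | Directory where checkpoint files are saved
safe_serialization | bool | Yes (FixValueHead) | Whether the checkpoint is stored in safetensors format
processor | ProcessorMixin | Yes (SaveProcessor) | Tokenizer/processor to save alongside checkpoints
model_args | ModelArguments | Yes (Reporter) | Model config for experiment tracking
data_args | DataArguments | Yes (Reporter) | Data config for experiment tracking
finetuning_args | FinetuningArguments | Yes (Reporter) | Fine-tuning config for experiment tracking
generating_args | GeneratingArguments | Yes (Reporter) | Generation config for experiment tracking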

Outputs

Name | Type | Description
fix_valuehead_checkpoint | None | Side effect: splits v_head weights into separate file, saves decoder weights
LogCallback logs | JSON file | Writes training progress (loss, lr, epoch, throughput, VRAM) to trainer_log.jsonl
ReporterCallback | None | Side effect: updates wandb/swanlab config with all argument dictionaries
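The weight split described for fix_valuehead_checkpoint can be sketched on a plain dictionary. This is not the module's implementation: split_valuehead_state_dict is a hypothetical helper, and the "v_head." and "pretrained_model." key prefixes are assumptions about how a value-head wrapper names its parameters. File loading and saving are omitted.

```python
from typing import Dict, Tuple, TypeVar

T = TypeVar("T")

# Assumed key prefixes; verify against the actual checkpoint before relying on them.
V_HEAD_PREFIX = "v_head."
WRAPPER_PREFIX = "pretrained_model."


def split_valuehead_state_dict(
    state_dict: Dict[str, T],
) -> Tuple[Dict[str, T], Dict[str, T]]:
    """Separate value-head entries from decoder entries.

    Returns (decoder_state_dict, v_head_state_dict).
    """
    decoder: Dict[str, T] = {}
    v_head: Dict[str, T] = {}
    for name, tensor in state_dict.items():
        if name.startswith(V_HEAD_PREFIX):
            v_head[name] = tensor
        elif name.startswith(WRAPPER_PREFIX):
            # Drop the wrapper prefix so the decoder loads as a plain model.
            decoder[name[len(WRAPPER_PREFIX):]] = tensor
        else:
            decoder[name] = tensor
    return decoder, v_head
```

Saving the two dictionaries to separate files would then let the decoder checkpoint be loaded without the value head.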

Usage Examples

from transformers import Trainer

from llamafactory.train.callbacks import LogCallback, ReporterCallback, SaveProcessorCallback

# model, processor, and the four *_args objects are assumed to have been
# created earlier by the training workflow.

# LogCallback is typically added automatically.
# It tracks: current_steps, total_steps, loss, eval_loss, lr, epoch,
# percentage, elapsed_time, remaining_time, throughput, VRAM usage
log_callback = LogCallback()

# ReporterCallback pushes config to experiment trackers
reporter = ReporterCallback(model_args, data_args, finetuning_args, generating_args)

# Callbacks can be passed to the Trainer constructor
trainer = Trainer(
    model=model,
    callbacks=[log_callback, reporter],
    # ... (remaining Trainer arguments)
)

# or registered afterwards via trainer.add_callback().
# SaveProcessorCallback preserves the processor in checkpoints
if processor is not None:
    trainer.add_callback(SaveProcessorCallback(processor))
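After a run, the log written by LogCallback can be inspected with a few lines of standard-library code. The sketch below assumes that trainer_log.jsonl holds one JSON object per line, as the extension suggests, and that records carry the field names listed in the example above. read_trainer_log and latest_value are hypothetical helpers, not part of LLaMA-Factory.

```python
import json
from typing import Any, Dict, List, Optional


def read_trainer_log(path: str) -> List[Dict[str, Any]]:
    """Parse a JSON Lines log into a list of records, skipping blank lines."""
    records: List[Dict[str, Any]] = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def latest_value(records: List[Dict[str, Any]], key: str) -> Optional[Any]:
    """Return the most recent value logged under key, or None if absent.

    Not every record is assumed to carry every field (for example, an
    evaluation record may have eval_loss but no loss).
    """
    for record in reversed(records):
        if key in record:
            return record[key]
    return None
```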
