Implementation:Hiyouga LLaMA Factory Train Callbacks
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Training Infrastructure |
| Last Updated | 2026-02-06 19:00 GMT |
Overview
Training callbacks for checkpoint management, progress logging, experiment tracking, and adapter conversion in LLaMA-Factory.
Description
The callbacks module defines five TrainerCallback subclasses and one utility function, fix_valuehead_checkpoint. Together they handle concerns shared across training stages:
- FixValueHeadModelCallback separates value-head weights from decoder weights at checkpoint save time for PPO training.
- SaveProcessorCallback persists the processor (tokenizer + image processor) alongside model checkpoints.
- PissaConvertCallback handles PiSSA-to-LoRA adapter conversion at training start and end.
- LogCallback tracks training progress (timing, throughput, VRAM statistics) and writes JSON Lines log records via a background thread pool, with optional Web UI integration.
- ReporterCallback pushes hyperparameter configurations to Weights & Biases or SwanLab at training start.
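The value-head split described above can be illustrated with plain dictionaries. This is a minimal sketch, not the module's code: the function name `split_valuehead_state_dict` is hypothetical, and the key prefixes `v_head.` and `pretrained_model.` are assumptions about how the wrapped model names its parameters.

```python
from typing import Any, Dict, Tuple

V_HEAD_PREFIX = "v_head."             # assumed key prefix for value-head weights
DECODER_PREFIX = "pretrained_model."  # assumed wrapper prefix on decoder weights


def split_valuehead_state_dict(
    state_dict: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a combined state dict into (decoder, value_head) dicts."""
    decoder: Dict[str, Any] = {}
    v_head: Dict[str, Any] = {}
    for name, tensor in state_dict.items():
        if name.startswith(V_HEAD_PREFIX):
            # Value-head weights keep their original names.
            v_head[name] = tensor
        elif name.startswith(DECODER_PREFIX):
            # Strip the wrapper prefix so the decoder loads as a plain model.
            decoder[name[len(DECODER_PREFIX):]] = tensor
        else:
            decoder[name] = tensor
    return decoder, v_head


# Toy example: lists stand in for tensors.
combined = {
    "pretrained_model.lm_head.weight": [0.1, 0.2],
    "v_head.summary.weight": [0.3],
    "v_head.summary.bias": [0.0],
}
decoder_weights, v_head_weights = split_valuehead_state_dict(combined)
print(sorted(decoder_weights))
print(sorted(v_head_weights))
```

In a real checkpoint the two dictionaries would then be written to separate files, with the file format chosen by the `safe_serialization` flag.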
Usage
These callbacks are registered automatically by the training workflow modules. LogCallback and ReporterCallback are added for all training stages. FixValueHeadModelCallback is added for PPO training. SaveProcessorCallback is added when a processor is provided. PissaConvertCallback is added when PiSSA conversion is enabled.
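The registration rules above can be summarized as a small decision function. This is a hedged sketch of the conditions only: `select_callback_names` and its parameters (`stage`, `has_processor`, `pissa_convert`) are hypothetical names chosen for illustration, and the ordering of the returned list is not meant to reflect the real workflow.

```python
from typing import List


def select_callback_names(stage: str, has_processor: bool, pissa_convert: bool) -> List[str]:
    """Return the names of the callbacks that would apply to a given run."""
    # Added for every training stage.
    names = ["LogCallback", "ReporterCallback"]
    if stage == "ppo":
        # Only PPO uses a value-head model whose checkpoint must be split.
        names.append("FixValueHeadModelCallback")
    if has_processor:
        names.append("SaveProcessorCallback")
    if pissa_convert:
        names.append("PissaConvertCallback")
    return names


sft_callbacks = select_callback_names("sft", has_processor=True, pissa_convert=False)
ppo_callbacks = select_callback_names("ppo", has_processor=False, pissa_convert=False)
print(sft_callbacks)
print(ppo_callbacks)
```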
Code Reference
Source Location
- Repository: Hiyouga_LLaMA_Factory
- File: src/llamafactory/train/callbacks.py
- Lines: 1-384
Signature
def fix_valuehead_checkpoint(
    model: "AutoModelForCausalLMWithValueHead",
    output_dir: str,
    safe_serialization: bool,
) -> None:
    """Fix the valuehead checkpoint files by separating v_head weights."""

class FixValueHeadModelCallback(TrainerCallback):
    def on_save(self, args, state, control, **kwargs): ...

class SaveProcessorCallback(TrainerCallback):
    def __init__(self, processor: "ProcessorMixin") -> None: ...
    def on_save(self, args, state, control, **kwargs): ...
    def on_train_end(self, args, state, control, **kwargs): ...

class PissaConvertCallback(TrainerCallback):
    def on_train_begin(self, args, state, control, **kwargs): ...
    def on_train_end(self, args, state, control, **kwargs): ...

class LogCallback(TrainerCallback):
    def __init__(self) -> None: ...
    def on_init_end(self, args, state, control, **kwargs): ...
    def on_train_begin(self, args, state, control, **kwargs): ...
    def on_train_end(self, args, state, control, **kwargs): ...
    def on_log(self, args, state, control, **kwargs): ...
    def on_prediction_step(self, args, state, control, **kwargs): ...

class ReporterCallback(TrainerCallback):
    def __init__(
        self,
        model_args: "ModelArguments",
        data_args: "DataArguments",
        finetuning_args: "FinetuningArguments",
        generating_args: "GeneratingArguments",
    ) -> None: ...
    def on_train_begin(self, args, state, control, **kwargs): ...
Import
from llamafactory.train.callbacks import (
FixValueHeadModelCallback,
SaveProcessorCallback,
PissaConvertCallback,
LogCallback,
ReporterCallback,
)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
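| model | AutoModelForCausalLMWithValueHead | Yes (fix_valuehead_checkpoint) | Value-head model whose checkpoint needs splitting |
| output_dir | str | Yes (fix_valuehead_checkpoint) | Directory where checkpoint files are saved |
| safe_serialization | bool | Yes (fix_valuehead_checkpoint) | Whether to save weights in safetensors format |
| processor | ProcessorMixin | Yes (SaveProcessorCallback) | Tokenizer/processor to save alongside checkpoints |
| model_args | ModelArguments | Yes (ReporterCallback) | Model config for experiment tracking |
| data_args | DataArguments | Yes (ReporterCallback) | Data config for experiment tracking |
| finetuning_args | FinetuningArguments | Yes (ReporterCallback) | Fine-tuning config for experiment tracking |
| generating_args | GeneratingArguments | Yes (ReporterCallback) | Generation config for experiment tracking |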
Outputs
| Name | Type | Description |
|---|---|---|
| fix_valuehead_checkpoint | None | Side effect: splits v_head weights into separate file, saves decoder weights |
| LogCallback logs | JSON Lines file | Writes training progress (loss, lr, epoch, throughput, VRAM) to trainer_log.jsonl, one JSON object per line |
| ReporterCallback | None | Side effect: updates wandb/swanlab config with all argument dictionaries |
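
Because trainer_log.jsonl holds one JSON object per line, it can be read back with the standard library alone. The sketch below is illustrative: `read_trainer_log` is a hypothetical helper, and the demo writes synthetic records using field names taken from this page rather than reading a real training run.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def read_trainer_log(path: str) -> List[Dict[str, Any]]:
    """Parse a JSON Lines log file into a list of dictionaries."""
    records: List[Dict[str, Any]] = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Demo with synthetic records (not output from a real run).
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "trainer_log.jsonl")
    with open(log_path, "w", encoding="utf-8") as f:
        f.write(json.dumps({"current_steps": 10, "total_steps": 100, "loss": 2.5}) + "\n")
        f.write(json.dumps({"current_steps": 20, "total_steps": 100, "loss": 2.1}) + "\n")
    records = read_trainer_log(log_path)

latest = records[-1]
print(latest["current_steps"], latest["loss"])
```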
Usage Examples
from transformers import Trainer

from llamafactory.train.callbacks import LogCallback, ReporterCallback, SaveProcessorCallback

# Assumes model, processor, model_args, data_args, finetuning_args, and
# generating_args were created earlier (for example, by the training workflow).

# LogCallback is typically added automatically.
# It tracks: current_steps, total_steps, loss, eval_loss, lr, epoch,
# percentage, elapsed_time, remaining_time, throughput, VRAM usage
log_callback = LogCallback()

# ReporterCallback pushes config to experiment trackers
reporter = ReporterCallback(model_args, data_args, finetuning_args, generating_args)

# Callbacks are registered via trainer.add_callback() or passed to Trainer init
trainer = Trainer(
    model=model,
    callbacks=[log_callback, reporter],
    # ... other Trainer arguments
)

# SaveProcessorCallback preserves processor in checkpoints
if processor is not None:
    trainer.add_callback(SaveProcessorCallback(processor))
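The percentage, elapsed_time, and remaining_time fields listed in the example can be derived from the step counters and a wall-clock measurement. The following is a sketch of one plausible calculation, assuming remaining time is extrapolated from the average time per completed step; `estimate_progress` is a hypothetical helper, not part of the module.

```python
from datetime import timedelta
from typing import Dict, Union


def estimate_progress(
    cur_steps: int, max_steps: int, elapsed_seconds: float
) -> Dict[str, Union[float, str]]:
    """Derive progress fields from step counts and elapsed wall-clock time."""
    # Average time per completed step; zero before the first step finishes.
    avg_per_step = elapsed_seconds / cur_steps if cur_steps > 0 else 0.0
    remaining_seconds = (max_steps - cur_steps) * avg_per_step
    percentage = round(cur_steps / max_steps * 100, 2) if max_steps > 0 else 100.0
    return {
        "percentage": percentage,
        "elapsed_time": str(timedelta(seconds=int(elapsed_seconds))),
        "remaining_time": str(timedelta(seconds=int(remaining_seconds))),
    }


progress = estimate_progress(cur_steps=25, max_steps=100, elapsed_seconds=50.0)
print(progress)
```

A linear extrapolation like this is simple but can drift when step time varies, for example when evaluation runs between training steps.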