

Implementation:Norrrrrrr lyn WAInjectBench torch save Checkpoint

From Leeroopedia
Knowledge Sources
Domains Model_Management, Deep_Learning
Last Updated 2026-02-14 16:00 GMT

Overview

Concrete tool for saving LLaVA fine-tuning checkpoints together with training metadata, using PyTorch's torch.save as called in the WAInjectBench train/llava-ft module.

Description

The checkpoint-saving code in train/llava-ft.py (L394-408) uses torch.save to write a dictionary containing the model state dict, the optimizer state dict, the epoch number, the best TPR (true positive rate), and the AMP (automatic mixed precision) configuration. The filename follows the pattern best_epoch{N}_tpr{X.XXXX}.pt under the --out_dir directory (default "runs/ft"). A checkpoint is saved only when the current epoch's TPR exceeds the previous best, so each saved file marks a new best result.
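Because the epoch and TPR are encoded in the filename, the best checkpoint in an output directory can be located without loading any file. The following is a minimal sketch of that idea, not part of WAInjectBench: the helper names parse_checkpoint_name and find_best_checkpoint are hypothetical, and the regular expression simply mirrors the filename pattern described above.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Mirrors the pattern best_epoch{N}_tpr{X.XXXX}.pt (TPR written with 4 decimals).
_CKPT_RE = re.compile(r"^best_epoch(\d+)_tpr(\d+\.\d{4})\.pt$")


def parse_checkpoint_name(name: str) -> Optional[Tuple[int, float]]:
    """Return (epoch, tpr) parsed from a checkpoint filename, or None if it does not match."""
    match = _CKPT_RE.match(name)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


def find_best_checkpoint(out_dir: str = "runs/ft") -> Optional[Path]:
    """Return the checkpoint with the highest TPR in out_dir (ties go to the later epoch)."""
    best_key = None
    best_path = None
    for path in Path(out_dir).glob("best_epoch*_tpr*.pt"):
        parsed = parse_checkpoint_name(path.name)
        if parsed is None:
            continue  # skip files that only loosely resemble the pattern
        epoch, tpr = parsed
        key = (tpr, epoch)
        if best_key is None or key > best_key:
            best_key, best_path = key, path
    return best_path
```

Within a single run the latest file is also the best one, since a file is written only on improvement. Comparing on TPR is still useful when one directory collects checkpoints from several runs.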

Usage

Called automatically within the validation loop when a new best TPR is achieved.

Code Reference

Source Location

Signature

if tpr > best_tpr:
    best_tpr = tpr
    best_path = os.path.join(args.out_dir, f"best_epoch{epoch}_tpr{tpr:.4f}.pt")
    torch.save(
        {
            "epoch": epoch,
            "model_state": model.state_dict(),
            "optimizer_state": optim.state_dict(),
            "best_tpr": best_tpr,
            "amp_enabled": state.use_amp,
            "amp_dtype": str(amp_dtype) if amp_dtype is not None else "fp32",
        },
        best_path
    )
    print(f"Saved: {best_path}")

Import

import torch
import os

I/O Contract

Inputs

Name | Type | Required | Description
model | nn.Module | Yes | Trained model whose state_dict to save
optim | torch.optim.AdamW | Yes | Optimizer whose state_dict to save
epoch | int | Yes | Current epoch number
best_tpr | float | Yes | Best TPR achieved
state | TrainState | Yes | AMP configuration
out_dir | str | Yes | Output directory (default "runs/ft")

Outputs

Name | Type | Description
.pt file | File | Checkpoint at {out_dir}/best_epoch{N}_tpr{X.XXXX}.pt containing model_state, optimizer_state, epoch, best_tpr, amp_enabled, amp_dtype
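A consumer of these files may want to confirm that a loaded dictionary carries every field listed in the contract before using it. The sketch below shows one way to do that; the helper name missing_checkpoint_keys and the EXPECTED_KEYS constant are hypothetical, and the key list is taken from the output description above.

```python
from typing import Any, List, Mapping

# Keys the checkpoint dictionary is documented to contain.
EXPECTED_KEYS = (
    "epoch",
    "model_state",
    "optimizer_state",
    "best_tpr",
    "amp_enabled",
    "amp_dtype",
)


def missing_checkpoint_keys(ckpt: Mapping[str, Any]) -> List[str]:
    """Return the documented keys that are absent from ckpt, in contract order."""
    return [key for key in EXPECTED_KEYS if key not in ckpt]
```

An empty result means the dictionary satisfies the contract; a non-empty one names the missing fields, which makes for a clearer error than a later KeyError.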

Usage Examples

Saving and Loading Checkpoints

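import os

import torch

# `model` and `optim` are assumed to be the fine-tuned model and its optimizer.
# torch.save does not create missing directories, so make sure the target exists.
os.makedirs("runs/ft", exist_ok=True)

# Save checkpoint
torch.save({
    "epoch": 2,
    "model_state": model.state_dict(),
    "optimizer_state": optim.state_dict(),
    "best_tpr": 0.9523,
    "amp_enabled": True,
    "amp_dtype": "torch.bfloat16",
}, "runs/ft/best_epoch2_tpr0.9523.pt")

# Load checkpoint for inference; map_location="cpu" avoids requiring the GPU it was saved from
ckpt = torch.load("runs/ft/best_epoch2_tpr0.9523.pt", map_location="cpu")
model.load_state_dict(ckpt["model_state"])
model.eval()  # switch layers such as dropout to inference behavior
print(f"Loaded epoch {ckpt['epoch']} with TPR={ckpt['best_tpr']}")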
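The checkpoint also stores the optimizer state and the AMP settings, so it can in principle be used to resume training as well as for inference. The sketch below illustrates this under stated assumptions: resume_from_checkpoint and parse_amp_dtype are hypothetical helpers, not part of WAInjectBench, and adding 1 to the stored epoch assumes that epoch had finished when the file was written. The amp_dtype field is a string ("fp32" or something like "torch.bfloat16"), so it has to be mapped back to a torch.dtype.

```python
import torch
from torch import nn


def parse_amp_dtype(name: str):
    """Map the stored amp_dtype string back to a torch.dtype ("fp32" means no AMP dtype was set)."""
    if name == "fp32":
        return None
    dtype = getattr(torch, name.split(".")[-1], None)  # "torch.bfloat16" -> torch.bfloat16
    if not isinstance(dtype, torch.dtype):
        raise ValueError(f"Unrecognized amp_dtype string: {name!r}")
    return dtype


def resume_from_checkpoint(path, model: nn.Module, optim: torch.optim.Optimizer, device="cpu"):
    """Restore model and optimizer in place; return (start_epoch, best_tpr, amp_enabled, amp_dtype)."""
    ckpt = torch.load(path, map_location=device)
    model.load_state_dict(ckpt["model_state"])
    optim.load_state_dict(ckpt["optimizer_state"])
    # Assumption: the stored epoch was completed, so training continues with the next one.
    start_epoch = ckpt["epoch"] + 1
    return start_epoch, ckpt["best_tpr"], ckpt["amp_enabled"], parse_amp_dtype(ckpt["amp_dtype"])
```

Returning best_tpr lets a resumed run keep the "save only on improvement" rule intact instead of starting the comparison from zero.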

Related Pages

Implements Principle
