Principle: OpenRLHF Model Checkpointing
| Knowledge Sources | |
|---|---|
| Domains | Training_Infrastructure, Distributed_Computing |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A persistence pattern that saves trained model weights from distributed DeepSpeed processes to a unified HuggingFace-compatible format on disk.
Description
Model Checkpointing in distributed training requires gathering sharded model parameters from all processes and saving them in a format usable for inference or further training. This principle handles ZeRO-3 parameter gathering, LoRA adapter extraction, and conversion to HuggingFace model format. It supports saving the full model or only LoRA adapter weights.
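The full-model vs. adapter-only choice can be sketched as a simple filter over the state dict. This is a minimal illustration, not OpenRLHF's actual code: the `select_weights` helper is hypothetical, and it assumes LoRA adapter entries follow PEFT's convention of containing "lora_" in their key. Plain lists stand in for tensors.

```python
# Sketch (hypothetical helper): choose which weights to persist.
# Assumes PEFT-style naming where adapter keys contain "lora_".

def select_weights(state_dict, lora_only=False):
    """Return the weights to save: the full dict, or only LoRA adapters."""
    if not lora_only:
        return dict(state_dict)
    return {k: v for k, v in state_dict.items() if "lora_" in k}

# Plain lists stand in for torch tensors.
full = {
    "model.layers.0.self_attn.q_proj.weight": [0.1],
    "model.layers.0.self_attn.q_proj.lora_A.weight": [0.2],
    "model.layers.0.self_attn.q_proj.lora_B.weight": [0.3],
}

# With lora_only=True, only the two adapter entries are kept.
adapters = select_weights(full, lora_only=True)
```

Saving only the adapters keeps checkpoints small (megabytes instead of gigabytes), at the cost of needing the base model available at load time.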
Usage
Use at the end of any training workflow to persist the trained model, or at intermediate checkpoints for fault tolerance. The saved model is compatible with HuggingFace's from_pretrained loading.
Theoretical Basis
In ZeRO-3 training, model parameters are sharded across all processes. Saving requires:
- Parameter gathering: each parameter's shards are all-gathered across ranks so the full tensor can be materialized
- State dict construction: only rank 0 assembles the full state dict, typically moving tensors to CPU to bound GPU memory
- LoRA extraction: if using LoRA, only the adapter weights are extracted and saved via PEFT
- Disk writing: rank 0 writes the model weights, config, and tokenizer to the output directory
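The gathering and rank-0 construction steps above can be simulated in a single process. This is a conceptual sketch only: real code would use DeepSpeed's gathering utilities and torch.distributed, whereas here each rank's shard is a plain list of floats and "gathering" is concatenation.

```python
# Single-process simulation of ZeRO-3 checkpointing steps.
# Each parameter is stored as a list of per-rank shards.

def gather_full_param(shards):
    """All-gather: concatenate per-rank shards into the full parameter."""
    full = []
    for shard in shards:
        full.extend(shard)
    return full

def build_state_dict(sharded_params, rank):
    """Only rank 0 materializes the full state dict; other ranks skip it."""
    if rank != 0:
        return None
    return {
        name: gather_full_param(shards)
        for name, shards in sharded_params.items()
    }

# Two ranks, each holding half of every parameter.
sharded = {
    "embed.weight": [[1.0, 2.0], [3.0, 4.0]],
    "lm_head.weight": [[5.0], [6.0]],
}

state_dict = build_state_dict(sharded, rank=0)   # full tensors on rank 0
nothing = build_state_dict(sharded, rank=1)      # non-zero ranks hold nothing
```

In the real distributed setting, all ranks must participate in the gather collectives even though only rank 0 keeps the result, which is why the gathering step cannot be guarded by a rank check.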